Using Primes calculation as benchmark
Author 
Message 
David Fran #1 / 38

Using Primes calculation as benchmark
My revised James VanBuskirk source (he posted his original in a recent topic) has been posted below and in comp.lang.pl1 newsgroup in response to a challenge and may be ofinterest here as well. ! module primes_1 contains subroutine compute_primes(n, count, numbers) implicit none integer, intent(in) :: n integer, intent(out) :: count integer, allocatable :: numbers(:) integer, parameter :: NB = bit_size(1) ! usually = 32 integer :: sieve(0:n/NB+1) ! bit table for possible primes integer :: i, j, upper sieve = 0 sieve(0) = 3 ! 0 and 1 aren't prime upper = sqrt(n+0.5d0) do i = 2, upper if (iand(sieve(i/NB),ishft(1,mod(i,NB))) == 0) then do j = i**2, n, i sieve(j/NB) = ior(sieve(j/NB),ishft(1,mod(j,NB))) end do end if end do count = 0 do i = 1, n if (iand(sieve(i/NB),ishft(1,mod(i,NB))) == 0) count = count+1 end do allocate( numbers(count) ) count = 0 do i = 1, n if (iand(sieve(i/NB),ishft(1,mod(i,NB))) == 0) then count = count+1 numbers(count) = i end if end do end subroutine compute_primes end module primes_1 !  program primes use primes_1 integer,allocatable :: numbers(:) integer :: clock1,clock2, rate, count, n = 10000000 call system_clock(clock1,rate) call compute_primes(n,count,numbers) call system_clock(clock2,rate) write(*,91) '#Primes = ', count, & '1st Prime = ', numbers(1), & 'Last Prime = ', numbers(count) 'time(sec) = ',(clock2clock1)/dble(rate) 91 format (3(a,io/),a,f0.2) end program primes ! CVF6.6b results running above on my 833Mhz PC  #Primes = 664579 1st Prime = 2 Last Prime = 9999991 time(sec) = 2.35

Thu, 14 Jul 2005 19:17:02 GMT 


James Van Buskir #2 / 38

Using Primes calculation as benchmark
Quote: > James VanBuskirk
OK, I took the time to look at some of your posts in c.l.p, and have some comments to make: o You speculated that my code was kludged up from some old C source, but as is the case with most of my examples it was written from scratch on the spot using CVF. o In one of the threads it was mentioned that pl1 has bit strings, I think. These really, to my mind, would make the [elided] code much clearer. Actually fortran can implement bit strings, too, as a singlebit LOGICAL KIND. M&R recommended their implementation as LOGICAL(0) but I am not aware of any vendors who have done so at the current time. o More than one responder pointed out that the code could have been made faster by various means, and this is indeed the case. Indeed, my fastest version can run up and count a sieve table to 1.0e9 in 0.641 seconds on my 1.4 GHz Athlon. I think the slowest part of the code may be where it generates the table of results, so all the things I do to speed it up may not help much if it is required that the output be an array of INTEGERs, one entry per prime. BTW, I tried the same executable on a 2.0 GHz P4 at Office Max (you would think that you could get kicked out of a store for taking a floppy down to the establishment and running some benchmarks, but I have found that the clerks are generally actively curious about the results you get on various machines...) and it took something like 2 seconds to achieve he same result. I think this may have been due mostly to the tiny cache size on that processor; my program actually can change the target cache size at runtime via user input, but doing so caused some Athlonspecific assembly code to kick in, resulting in an undefined opcode exception, and I never got around to stripping it off and recompiling with IFL for a P4 target. o I have noticed that you tend to take a stance of Fortran advocacy in comp.lang.pl1. From my experience, when folks advocate some nonFortran language in c.l.f., it doesn't tend to give me a positive impression of whatever language he advocates, nor of its practitioners. Granted, you do both stimulate and provide code examples and seem to be trying to get information about the capabilities and performance of pl1 rather than just saying that we must all convert to the evil debbildebbil religion of C++ or Java by decree of God Almighty, as the misguided practitioners of those latter languages chant from time to time in this Forum, but I can see how some c.l.p. participants might consider your approach to be a bit offputting. I realize that you sometimes take what seems to be a confrontational stance when you actually just want to persuade readers to put forth some extra effort to provide technical information, but I worry that some readers may take your approach the wrong way and get a bad impression of Fortran programmers as a consequence, surely not an outcome that you would consider desirable.

Fri, 15 Jul 2005 00:49:25 GMT 


Walt Brainer #3 / 38

Using Primes calculation as benchmark
There are two typos in the code (see code). I ran with NAG "f95 O" on a 500MHz Linux PC and got 3.41 sec. If any of these benchmarks make any sense at all (I am skeptical), then this one makes NAG look pretty good (so far). But wouldn't cpu_time be a better measure? Maybe not much differentI got 3.32 sec using it. However, the "obvious", "straightforward" way to do this doesn't take all that much longer4.81 sec with the NAG compiler. This is a modified version of an example of F code provided by{*filter*} Hendrickson. On an 800 MHz Windows XP (Intel) laptop, the F compiler runs the following "straightforward" version in 3.24 sec. !!program Sieve_Of_Eratosthenes ! Find prime numbers using array processing. ! Strike the Twos and strike the Threes ! The Sieve of Eratosthenes! ! When the multiplies sublime ! The numbers that are left, are prime. ! From: Drunkard's Walk, by Frederik Pohl module sieve integer, private :: last_number = 10000000 integer, public, dimension(:), allocatable :: numbers integer, public :: number_of_primes, last_prime public :: find_primes contains subroutine find_primes() integer :: stat, i, ac allocate (numbers(last_number), stat=stat) if (stat > 0) then print *, "numbers allocation problem" stop end if ! Initialize numbers array to 0, 2, 3, ..., last_number ! Zero instead of 1 because 1 is a special case. numbers = (/ 0, (ac, ac = 2, last_number) /) do i = 2, int(sqrt(real(last_number)))+1 if (numbers(i) /= 0) then ! if this number is prime numbers(2*i : last_number : i) = 0 ! eliminate all multiples endif ! Count the primes. number_of_primes = count (numbers /= 0) ! Find last prime do i = last_number, 1, 1 if (numbers(i) /= 0) then last_prime = i exit end if end do end subroutine find_primes end module sieve program Sieve_Of_Eratosthenes use sieve character(len=*), parameter :: & f91 = "(3(a,i0,/),a,f0.2)" real :: start, stop call cpu_time(start) call find_primes() call cpu_time(stop) print f91, "#Primes = ", number_of_primes, & "1st Prime = ", numbers(2), & "Last Prime = ", numbers(last_prime), & "time(sec) = ", stop  start end program Sieve_Of_Eratosthenes Quote:
> My revised James VanBuskirk source (he posted his original in a recent > topic) has been posted below and in comp.lang.pl1 newsgroup in > response > to a challenge and may be ofinterest here as well. > ! > module primes_1 > contains > subroutine compute_primes(n, count, numbers) > implicit none > integer, intent(in) :: n > integer, intent(out) :: count > integer, allocatable :: numbers(:) > integer, parameter :: NB = bit_size(1) ! usually = 32 > integer :: sieve(0:n/NB+1) ! bit table for possible primes > integer :: i, j, upper > sieve = 0 > sieve(0) = 3 ! 0 and 1 aren't prime > upper = sqrt(n+0.5d0) > do i = 2, upper > if (iand(sieve(i/NB),ishft(1,mod(i,NB))) == 0) then > do j = i**2, n, i > sieve(j/NB) = ior(sieve(j/NB),ishft(1,mod(j,NB))) > end do > end if > end do > count = 0 > do i = 1, n > if (iand(sieve(i/NB),ishft(1,mod(i,NB))) == 0) count = count+1 > end do > allocate( numbers(count) ) > count = 0 > do i = 1, n > if (iand(sieve(i/NB),ishft(1,mod(i,NB))) == 0) then > count = count+1 > numbers(count) = i > end if > end do > end subroutine compute_primes > end module primes_1 > !  > program primes > use primes_1 > integer,allocatable :: numbers(:) > integer :: clock1,clock2, rate, count, n = 10000000 > call system_clock(clock1,rate) > call compute_primes(n,count,numbers) > call system_clock(clock2,rate) > write(*,91) '#Primes = ', count, & > '1st Prime = ', numbers(1), & > 'Last Prime = ', numbers(count) , & !! < > 'time(sec) = ',(clock2clock1)/dble(rate) > 91 format (3(a,io/),a,f0.2)
i0 !! < Quote: > end program primes > ! CVF6.6b results running above on my 833Mhz PC  > #Primes = 664579 > 1st Prime = 2 > Last Prime = 9999991 > time(sec) = 2.35
 Walt Brainerd +18773556640 (voice & fax) The Fortran Company +15207601397 (outside USA)
Tucson, AZ 85750 USA http://www.***.com/

Fri, 15 Jul 2005 00:52:21 GMT 


David Fran #4 / 38

Using Primes calculation as benchmark
I replaced iand/ishft syntax in my previous version with bit functions. This surprisingly dropped runtime from 2.35 to 1.88 sec. Then I adopted Gordon S. 's recommendations made in clp newsgroup to scan odd entries only, this dropped runtime from 1.88 to 1.68 sec. However I was unable to get his odd bit table only with jj indexing to work. !  module primes_1 contains subroutine compute_primes(n, count, numbers) implicit none integer, intent(in) :: n integer, intent(out) :: count integer, allocatable :: numbers(:) integer, parameter :: NB = bit_size(1) ! usually = 32 integer :: sieve(0:n/NB+1) ! bit table for prime candidates integer :: i, j, upper sieve = 0 sieve(0) = 3 ! 0 and 1 aren't prime upper = sqrt(n+0.5d0) do i = 2, upper if (.not.btest(sieve(i/NB),ii/NB*NB)) then do j = i*i, n, i sieve(j/NB) = ibset(sieve(j/NB),jj/NB*NB) end do end if end do count = 1 ! 1st prime =2 count do i = 3,n,2 ! count odd entries only if (.not.btest(sieve(i/NB),ii/NB*NB)) count = count+1 end do allocate( numbers(count) ) count = 1 ; numbers(1) = 2 ! preset 1st prime do i = 3,n,2 ! count odd entries only if (.not.btest(sieve(i/NB),ii/NB*NB)) then count = count+1 numbers(count) = i end if end do end subroutine compute_primes end module primes_1 !  program primes use primes_1 integer,allocatable :: numbers(:) integer :: clock1,clock2, rate, count, n = 10000000 call system_clock(clock1,rate) call compute_primes(n,count,numbers) call system_clock(clock2,rate) write(*,91) '#Primes = ', count, & '1st Prime = ', numbers(1), & 'Last Prime = ', numbers(count), & 'Last Test# = ', n, & 'time(sec) = ',(clock2clock1)/dble(rate) 91 format (4(a,i0/),a,f0.2) end program primes ! results using CVF6.6b on 833Mhz computer ! #Primes = 664579 ! 1st Prime = 2 ! Last Prime = 9999991 ! Last Number= 10000000 ! time(sec) = 1.68

Fri, 15 Jul 2005 15:37:17 GMT 


#5 / 38

Using Primes calculation as benchmark

Wed, 18 Jun 1902 08:00:00 GMT 


David Fran #6 / 38

Using Primes calculation as benchmark
Quote: > o I have noticed that you tend to take a stance of Fortran > advocacy in comp.lang.pl1.
<snip many other comments > James, Your very lengthy critique of my comp.lang.pl1 messages can in the interest of brevity be summarized by responding to your quote above by making the following points. In case you havent noticed there are those who take a stance of PL/I advocacy here in comp.lang.fortran altho not to the degree taken by me over there. The PL/I FAQ states its more powerful than Fortran. Altho undoubtedly true at one time, Its not been true for many years. When I dont post anything for several weeks the level of new topics drops to about 1 or 2 per week reflecting what appears to be IBM's plan to cease their PL/I language development especially for the desktop user. There is only 1 compiler effectively available for the desktop, Personal PL/I at $113, altho there is a regular version as one of IBM's Visual Age languages but it packs a typical IBM pricetag at $2900). IBM is spending billions on Linux development, you would think that if they had any future plans they would have said something by now about a Linux version. In fact there has been NO news for desktop users since about 1999. Occasionally there are migrating to C/C++ VB, etc. topics. but until I showed up, Fortran wasnt being seriously considered as a migration candidate to the degree I think it is currently, even tho IBM's original name of their "new" language in 1965 was FortranV and it definitely shows its roots came from IBM FortranIV. BTW, notice I crossposted this to comp.lang.pl1, those making a response may wish to modify your reply's destination.

Fri, 15 Jul 2005 16:59:17 GMT 


#7 / 38

Using Primes calculation as benchmark

Wed, 18 Jun 1902 08:00:00 GMT 


James Van Buskir #8 / 38

Using Primes calculation as benchmark
Quote: > Gordon S. 's recommendations made in clp newsgroup to scan odd entries > only, this dropped runtime from 1.88 to 1.68 sec. However I was unable > to get his odd bit table only with jj indexing to work.
It's a bit tricky  you have to keep in mind that if the bit position in the sieve table (counting from position 0) is i, then the bit represents the number 2*i+1. Following this through consistently you get code that is measurably faster on my machine (Athlon 1.4 GHz) !  module primes_1 implicit none contains subroutine compute_primes(n, numbers) implicit none integer, intent(in) :: n integer count integer, allocatable :: numbers(:) integer, parameter :: NB = bit_size(1) ! usually = 32 integer :: sieve(0:(n1)/(2*NB)+1) ! bit table for prime candidates integer :: i, j, upper sieve = 0 sieve(0) = 1 ! 1 isn't prime upper = sqrt(n+0.5d0) do i = 1, (upper1)/2 if (.not.btest(sieve(i/NB),ii/NB*NB)) then do j = 2*i*(i+1), (n1)/2, 2*i+1 ! sieve(j/NB) = ibset(sieve(j/NB),jj/NB*NB) sieve(j/NB) = ibset(sieve(j/NB),modulo(j,NB)) ! sieve(j/NB) = ior(sieve(j/NB),ishft(1,modulo(j,NB))) end do end if end do count = 1 ! 1st prime =2 count do i = 1,(n1)/2 ! All entries are odd if (.not.btest(sieve(i/NB),ii/NB*NB)) count = count+1 end do allocate( numbers(count) ) count = 1 ; numbers(1) = 2 ! preset 1st prime do i = 1,(n1)/2 ! All entries are odd if (.not.btest(sieve(i/NB),ii/NB*NB)) then count = count+1 numbers(count) = 2*i+1 end if end do end subroutine compute_primes end module primes_1 !  program primes use primes_1 implicit none integer,allocatable :: numbers(:) integer :: clock1,clock2, rate, n = 10000000 call system_clock(clock1,rate) call compute_primes(n,numbers) call system_clock(clock2,rate) write(*,91) '#Primes = ', size(numbers), & '1st Prime = ', numbers(1), & 'Last Prime = ', numbers(size(numbers)), & 'Last Test# = ', n, & 'time(sec) = ',(clock2clock1)/dble(rate) 91 format (4(a,i0/),a,f0.2) end program primes Mod30 compression rather than mod2 compression as above could make the sieving a little faster still, but is needlessly complicated for your purpose. What really speeds up sieving is buffering: you want to completely sieve an L1 cachesized chunk of the table before moving on to the next chunk. This makes the pattern of memory access much less pernicious. In order to implement it you have to make a small sieve table that allows you to find all the primes that you will want to sieve the full sized table with, then you have to locate the position in the current chunk of each prime in the small table before you execute the sieve loop with it. In my code, I use a fixed buffer, but if that approach were adopted for your code you might have to sieve twice: the first time to find out how big the array you need to allocate will have to be, and the second to find the primes to put in the table. An alternative would be to use a rolling buffer: just move on to the next block of your big sieve table in memory rather than reinitializing a fixed buffer in memory each time you move on to the next chunk. That way after the sieving operation is complete you will have the entire sieve table available to you rather than just the last block. If the above sounds complicated, then all I can say is that it is, but it makes the sieving operation several times faster, especially if the sieve table is quite large. Of course it won't help a bit with the part of your code that writes the primes to your output array. I don't know whether you will be tempted to experiment with this rather complicated but highly efficient optimization, but I thought I would let you know about the next logical step up in complexity.

Fri, 15 Jul 2005 17:23:39 GMT 


David Fran #9 / 38

Using Primes calculation as benchmark
Quote: > There are two typos in the code (see code). > I ran with NAG "f95 O" on a 500MHz Linux PC > and got 3.41 sec. If any of these benchmarks > make any sense at all (I am skeptical), then > this one makes NAG look pretty good (so far).
Walt, You didnt try to translate my source very hard and as a result I dont think your version's timings can be compared.. since you dont run a loop to calc # of primes then allocate caller's array accordingly.. I would think taking your shortcuts would speed things up considerably for you, but note my version is twice as fast at 1.68 sec on same speed PC.

Fri, 15 Jul 2005 17:33:33 GMT 


#10 / 38

Using Primes calculation as benchmark

Wed, 18 Jun 1902 08:00:00 GMT 


Greg Linda #11 / 38

Using Primes calculation as benchmark
Quote:
>BTW, notice I crossposted this to comp.lang.pl1, those making a >response may wish to modify your reply's destination.
Please stop doing this, Dave. You're just annoying both groups. greg

Fri, 15 Jul 2005 17:38:33 GMT 


David Fran #12 / 38

Using Primes calculation as benchmark
Quote:
> >BTW, notice I crossposted this to comp.lang.pl1, those making a > >response may wish to modify your reply's destination. > Please stop doing this, Dave. You're just annoying both groups. > greg
Unlike you Greg, only a small % of my posts are annoying to readers.

Fri, 15 Jul 2005 19:09:37 GMT 


#13 / 38

Using Primes calculation as benchmark

Wed, 18 Jun 1902 08:00:00 GMT 


David Fran #14 / 38

Using Primes calculation as benchmark
Quote:
> > Gordon S. 's recommendations made in clp newsgroup to scan odd entries > > only, this dropped runtime from 1.88 to 1.68 sec. However I was unable > > to get his odd bit table only with jj indexing to work. > It's a bit tricky  you have to keep in mind that if the bit position > in the sieve table (counting from position 0) is i, then the bit > represents the number 2*i+1. Following this through consistently > you get code that is measurably faster on my machine (Athlon 1.4 GHz) > !  > module primes_1 > implicit none > contains > subroutine compute_primes(n, numbers) > implicit none > integer, intent(in) :: n > integer count > integer, allocatable :: numbers(:) > integer, parameter :: NB = bit_size(1) ! usually = 32 > integer :: sieve(0:(n1)/(2*NB)+1) ! bit table for prime candidates > integer :: i, j, upper > sieve = 0 > sieve(0) = 1 ! 1 isn't prime > upper = sqrt(n+0.5d0) > do i = 1, (upper1)/2 > if (.not.btest(sieve(i/NB),ii/NB*NB)) then > do j = 2*i*(i+1), (n1)/2, 2*i+1 > ! sieve(j/NB) = ibset(sieve(j/NB),jj/NB*NB) > sieve(j/NB) = ibset(sieve(j/NB),modulo(j,NB)) > ! sieve(j/NB) = ior(sieve(j/NB),ishft(1,modulo(j,NB))) > end do > end if > end do > count = 1 ! 1st prime =2 count > do i = 1,(n1)/2 ! All entries are odd > if (.not.btest(sieve(i/NB),ii/NB*NB)) count = count+1 > end do > allocate( numbers(count) ) > count = 1 ; numbers(1) = 2 ! preset 1st prime > do i = 1,(n1)/2 ! All entries are odd > if (.not.btest(sieve(i/NB),ii/NB*NB)) then > count = count+1 > numbers(count) = 2*i+1 > end if > end do > end subroutine compute_primes > end module primes_1 > !  > program primes > use primes_1 > implicit none > integer,allocatable :: numbers(:) > integer :: clock1,clock2, rate, n = 10000000 > call system_clock(clock1,rate) > call compute_primes(n,numbers) > call system_clock(clock2,rate) > write(*,91) '#Primes = ', size(numbers), & > '1st Prime = ', numbers(1), & > 'Last Prime = ', numbers(size(numbers)), & > 'Last Test# = ', n, & > 'time(sec) = ',(clock2clock1)/dble(rate) > 91 format (4(a,i0/),a,f0.2) > end program primes > Mod30 compression rather than mod2 compression as above > could make the sieving a little faster still, but is > needlessly complicated for your purpose. What really > speeds up sieving is buffering: you want to completely > sieve an L1 cachesized chunk of the table before moving > on to the next chunk. This makes the pattern of memory > access much less pernicious. In order to implement it > you have to make a small sieve table that allows you to > find all the primes that you will want to sieve the full > sized table with, then you have to locate the position > in the current chunk of each prime in the small table > before you execute the sieve loop with it. In my code, > I use a fixed buffer, but if that approach were adopted > for your code you might have to sieve twice: the first > time to find out how big the array you need to allocate > will have to be, and the second to find the primes to put > in the table. An alternative would be to use a rolling > buffer: just move on to the next block of your big sieve > table in memory rather than reinitializing a fixed buffer > in memory each time you move on to the next chunk. That > way after the sieving operation is complete you will have > the entire sieve table available to you rather than just > the last block. > If the above sounds complicated, then all I can say is that > it is, but it makes the sieving operation several times > faster, especially if the sieve table is quite large. Of > course it won't help a bit with the part of your code that > writes the primes to your output array. I don't know > whether you will be tempted to experiment with this rather > complicated but highly efficient optimization, but I thought > I would let you know about the next logical step up in > complexity.
Thanks for solving the mystery of the odd bit table indexing, Using above, lowers my 10m runtime from 1.68 to 0.92 sec BTW, n = 100m produces below output on my 833Mhz PC. #Primes = 5761455 1st Prime = 2 Last Prime = 99999989 Last Test# = 100000000 time(sec) = 10.78

Fri, 15 Jul 2005 19:31:10 GMT 


#15 / 38

Using Primes calculation as benchmark

Wed, 18 Jun 1902 08:00:00 GMT 


Page 1 of 3

[ 38 post ] 

Go to page:
[1]
[2] [3] 
