Using Primes calculation as benchmark 
Author Message
 Using Primes calculation as benchmark

My revised James VanBuskirk source (he posted his original in a recent
topic) has been posted below and in comp.lang.pl1 newsgroup in
response
to a challenge and may be ofinterest here as well.

!-------------------------------------
module primes_1
contains
subroutine compute_primes(n, count, numbers)
implicit none
integer, intent(in) :: n
integer, intent(out) :: count
integer, allocatable :: numbers(:)
integer, parameter :: NB = bit_size(1)  ! usually = 32
integer :: sieve(0:n/NB+1)    ! bit table for possible primes
integer :: i, j, upper

sieve = 0
sieve(0) = 3  ! 0 and 1 aren't prime
upper = sqrt(n+0.5d0)
do i = 2, upper
   if (iand(sieve(i/NB),ishft(1,mod(i,NB))) == 0) then
      do j = i**2, n, i
         sieve(j/NB) = ior(sieve(j/NB),ishft(1,mod(j,NB)))
      end do
   end if
end do
count = 0
do i = 1, n
   if (iand(sieve(i/NB),ishft(1,mod(i,NB))) == 0) count = count+1
end do
allocate( numbers(count) )
count = 0
do i = 1, n
   if (iand(sieve(i/NB),ishft(1,mod(i,NB))) == 0) then
      count = count+1
      numbers(count) = i
   end if
end do
end subroutine compute_primes
end module primes_1
! ---------------------------------------
program primes
use primes_1
integer,allocatable :: numbers(:)
integer :: clock1,clock2, rate, count, n = 10000000

call system_clock(clock1,rate)
call compute_primes(n,count,numbers)
call system_clock(clock2,rate)

write(*,91) '#Primes    = ', count, &
            '1st Prime  = ', numbers(1), &
            'Last Prime = ', numbers(count)
            'time(sec)  = ',(clock2-clock1)/dble(rate)
91 format (3(a,io/),a,f0.2)
end program primes

!--- CVF6.6b results running above on my 833Mhz PC ---

#Primes    = 664579
1st Prime  = 2
Last Prime = 9999991
time(sec)  = 2.35



Thu, 14 Jul 2005 19:17:02 GMT  
 Using Primes calculation as benchmark


Quote:
>            James VanBuskirk

OK, I took the time to look at some of your posts in c.l.p,
and have some comments to make:

o  You speculated that my code was kludged up from some
   old C source, but as is the case with most of my examples
   it was written from scratch on the spot using CVF.

o  In one of the threads it was mentioned that pl1 has bit
   strings, I think.  These really, to my mind, would make
   the [elided] code much clearer.  Actually fortran can
   implement bit strings, too, as a single-bit LOGICAL KIND.
   M&R recommended their implementation as LOGICAL(0) but I
   am not aware of any vendors who have done so at the
   current time.

o  More than one responder pointed out that the code could
   have been made faster by various means, and this is
   indeed the case.  Indeed, my fastest version can run up
   and count a sieve table to 1.0e9 in 0.641 seconds on my
   1.4 GHz Athlon.  I think the slowest part of the code
   may be where it generates the table of results, so all
   the things I do to speed it up may not help much if it is
   required that the output be an array of INTEGERs, one
   entry per prime.  BTW, I tried the same executable on a
   2.0 GHz P4 at Office Max (you would think that you could
   get kicked out of a store for taking a floppy down to
   the establishment and running some benchmarks, but I have
   found that the clerks are generally actively curious about
   the results you get on various machines...) and it took
   something like 2 seconds to achieve he same result.  I
   think this may have been due mostly to the tiny cache
   size on that processor; my program actually can change
   the target cache size at runtime via user input, but
   doing so caused some Athlon-specific assembly code to
   kick in, resulting in an undefined opcode exception, and
   I never got around to stripping it off and recompiling
   with IFL for a P4 target.

o  I have noticed that you tend to take a stance of Fortran
   advocacy in comp.lang.pl1.  From my experience, when folks
   advocate some non-Fortran language in c.l.f., it doesn't
   tend to give me a positive impression of whatever language
   he advocates, nor of its practitioners.  Granted, you do
   both stimulate and provide code examples and seem to be
   trying to get information about the capabilities and
   performance of pl1 rather than just saying that we must
   all convert to the evil debbil-debbil religion of C++ or
   Java by decree of God Almighty, as the misguided practitioners
   of those latter languages chant from time to time in this
   Forum, but I can see how some c.l.p. participants might
   consider your approach to be a bit off-putting.  I realize that
   you sometimes take what seems to be a confrontational stance
   when you actually just want to persuade readers to put forth
   some extra effort to provide technical information, but I
   worry that some readers may take your approach the wrong way
   and get a bad impression of Fortran programmers as a
   consequence, surely not an outcome that you would consider
   desirable.



Fri, 15 Jul 2005 00:49:25 GMT  
 Using Primes calculation as benchmark
There are two typos in the code (see code).

I ran with NAG "f95 -O" on a 500MHz Linux PC
and got 3.41 sec.  If any of these benchmarks
make any sense at all (I am skeptical), then
this one makes NAG look pretty good (so far).

But wouldn't cpu_time be a better measure?
Maybe not much different--I got 3.32 sec using it.

However, the "obvious", "straightforward" way to do
this doesn't take all that much longer--4.81 sec with
the NAG compiler.  This is a modified version of an
example of F code provided by{*filter*} Hendrickson.

On an 800 MHz Windows XP (Intel) laptop, the F compiler
runs the following "straightforward" version in 3.24 sec.

!!program Sieve_Of_Eratosthenes

!  Find prime numbers using array processing.

!  Strike the Twos and strike the Threes
!  The Sieve of Eratosthenes!
!  When the multiplies sublime
!  The numbers that are left, are prime.

!            From:  Drunkard's Walk, by Frederik Pohl

module sieve

integer, private :: last_number = 10000000
integer, public, dimension(:), allocatable :: numbers
integer, public :: number_of_primes, last_prime
public :: find_primes

contains

subroutine find_primes()

integer :: stat, i, ac

allocate (numbers(last_number), stat=stat)
if (stat > 0) then
    print *, "numbers allocation problem"
    stop
end if

!  Initialize numbers array to 0, 2, 3, ..., last_number--
!  Zero instead of 1 because 1 is a special case.
numbers = (/ 0, (ac, ac = 2, last_number) /)

do i = 2, int(sqrt(real(last_number)))+1
   if (numbers(i) /= 0) then              ! if this number is prime
     numbers(2*i : last_number : i) = 0   ! eliminate all multiples
   endif

!  Count the primes.
number_of_primes = count (numbers /= 0)

!  Find last prime

do i = last_number, 1, -1
    if (numbers(i) /= 0) then
       last_prime = i
       exit
    end if
end do

end subroutine find_primes

end module sieve

program Sieve_Of_Eratosthenes

use sieve

character(len=*), parameter :: &
       f91 = "(3(a,i0,/),a,f0.2)"

real :: start, stop

call cpu_time(start)
call find_primes()
call cpu_time(stop)

print f91, "#Primes    = ", number_of_primes, &
            "1st Prime  = ", numbers(2), &
            "Last Prime = ", numbers(last_prime), &
            "time(sec)  = ", stop - start

end program Sieve_Of_Eratosthenes

Quote:

> My revised James VanBuskirk source (he posted his original in a recent
> topic) has been posted below and in comp.lang.pl1 newsgroup in
> response
> to a challenge and may be ofinterest here as well.

> !-------------------------------------
> module primes_1
> contains
> subroutine compute_primes(n, count, numbers)
> implicit none
> integer, intent(in) :: n
> integer, intent(out) :: count
> integer, allocatable :: numbers(:)
> integer, parameter :: NB = bit_size(1)  ! usually = 32
> integer :: sieve(0:n/NB+1)    ! bit table for possible primes
> integer :: i, j, upper

> sieve = 0
> sieve(0) = 3  ! 0 and 1 aren't prime
> upper = sqrt(n+0.5d0)
> do i = 2, upper
>    if (iand(sieve(i/NB),ishft(1,mod(i,NB))) == 0) then
>       do j = i**2, n, i
>          sieve(j/NB) = ior(sieve(j/NB),ishft(1,mod(j,NB)))
>       end do
>    end if
> end do
> count = 0
> do i = 1, n
>    if (iand(sieve(i/NB),ishft(1,mod(i,NB))) == 0) count = count+1
> end do
> allocate( numbers(count) )
> count = 0
> do i = 1, n
>    if (iand(sieve(i/NB),ishft(1,mod(i,NB))) == 0) then
>       count = count+1
>       numbers(count) = i
>    end if
> end do
> end subroutine compute_primes
> end module primes_1
> ! ---------------------------------------
> program primes
> use primes_1
> integer,allocatable :: numbers(:)
> integer :: clock1,clock2, rate, count, n = 10000000

> call system_clock(clock1,rate)
> call compute_primes(n,count,numbers)
> call system_clock(clock2,rate)

> write(*,91) '#Primes    = ', count, &
>             '1st Prime  = ', numbers(1), &
>             'Last Prime = ', numbers(count)              , & !! <------------------
>             'time(sec)  = ',(clock2-clock1)/dble(rate)
> 91 format (3(a,io/),a,f0.2)

                  i0                !! <---------------------

Quote:
> end program primes

> !--- CVF6.6b results running above on my 833Mhz PC ---

> #Primes    = 664579
> 1st Prime  = 2
> Last Prime = 9999991
> time(sec)  = 2.35

--
Walt Brainerd         +1-877-355-6640 (voice & fax)
The Fortran Company   +1-520-760-1397 (outside USA)

Tucson, AZ 85750 USA   http://www.*-*-*.com/


Fri, 15 Jul 2005 00:52:21 GMT  
 Using Primes calculation as benchmark
I replaced iand/ishft syntax in my previous version with bit
functions.
This surprisingly dropped runtime from 2.35 to 1.88 sec.  Then I
adopted
Gordon S. 's recommendations made in clp newsgroup to scan odd entries
only, this dropped runtime from 1.88 to 1.68 sec. However I was unable
to get his odd bit table only with jj indexing to work.

! -----------------------------------------
module primes_1
contains
subroutine compute_primes(n, count, numbers)
implicit none
integer, intent(in) :: n
integer, intent(out) :: count
integer, allocatable :: numbers(:)
integer, parameter :: NB = bit_size(1)  ! usually = 32
integer :: sieve(0:n/NB+1)    ! bit table for prime candidates
integer :: i, j, upper

sieve = 0
sieve(0) = 3  ! 0 and 1 aren't prime
upper = sqrt(n+0.5d0)
do i = 2, upper
   if (.not.btest(sieve(i/NB),i-i/NB*NB)) then
      do j = i*i, n, i
         sieve(j/NB) = ibset(sieve(j/NB),j-j/NB*NB)
      end do
   end if
end do
count = 1        ! 1st prime =2 count
do i = 3,n,2     ! count odd entries only
   if (.not.btest(sieve(i/NB),i-i/NB*NB)) count = count+1
end do
allocate( numbers(count) )
count = 1 ; numbers(1) = 2   ! preset 1st prime
do i = 3,n,2     ! count odd entries only
   if (.not.btest(sieve(i/NB),i-i/NB*NB)) then
      count = count+1
      numbers(count) = i
   end if
end do
end subroutine compute_primes
end module primes_1
! ---------------------------------------
program primes
use primes_1
integer,allocatable :: numbers(:)
integer :: clock1,clock2, rate, count, n = 10000000

call system_clock(clock1,rate)
call compute_primes(n,count,numbers)
call system_clock(clock2,rate)

write(*,91) '#Primes    = ', count, &
            '1st Prime  = ', numbers(1), &
            'Last Prime = ', numbers(count), &
            'Last Test# = ', n, &
            'time(sec)  = ',(clock2-clock1)/dble(rate)
91 format (4(a,i0/),a,f0.2)
end program primes

! results using CVF6.6b on 833Mhz computer
! #Primes    = 664579
! 1st Prime  = 2
! Last Prime = 9999991
! Last Number= 10000000
! time(sec)  = 1.68



Fri, 15 Jul 2005 15:37:17 GMT  
 Using Primes calculation as benchmark


Wed, 18 Jun 1902 08:00:00 GMT  
 Using Primes calculation as benchmark



Quote:

> o  I have noticed that you tend to take a stance of Fortran
>    advocacy in comp.lang.pl1.

<snip many other comments >

James,
Your very lengthy critique of my comp.lang.pl1 messages can in the
interest of brevity be summarized by responding to your quote above
by making the following points.

In case you havent noticed there are those who take a stance of PL/I
advocacy here in comp.lang.fortran altho not to the degree taken by
me over there.

The PL/I FAQ states its more powerful than Fortran.
Altho undoubtedly true at one time, Its not been true for many years.

When I dont post anything for several weeks the level of new topics
drops to about 1 or 2 per week reflecting what appears to
be IBM's plan to cease their PL/I language development especially for
the desktop user. There is only 1 compiler effectively available for
the desktop, Personal PL/I at $113, altho there is a regular version
as one of IBM's Visual Age languages but it packs a typical IBM
price-tag at $2900).

IBM is spending billions on Linux development, you would think that if
they had any future plans they would have said something by now about
a Linux version. In fact there has been NO news for desktop users
since
about 1999.

Occasionally there are migrating to C/C++ VB, etc. topics.
but until I showed up, Fortran wasnt being seriously considered as
a migration candidate to the degree I think it is currently, even tho
IBM's original name of their "new" language in 1965 was Fortran-V
and it definitely shows its roots came from IBM Fortran-IV.

BTW, notice I cross-posted this to comp.lang.pl1, those making a
response
may wish to modify your reply's destination.



Fri, 15 Jul 2005 16:59:17 GMT  
 Using Primes calculation as benchmark


Wed, 18 Jun 1902 08:00:00 GMT  
 Using Primes calculation as benchmark


Quote:
> Gordon S. 's recommendations made in clp newsgroup to scan odd entries
> only, this dropped runtime from 1.88 to 1.68 sec. However I was unable
> to get his odd bit table only with jj indexing to work.

It's a bit tricky -- you have to keep in mind that if the bit position
in the sieve table (counting from position 0) is i, then the bit
represents the number 2*i+1.  Following this through consistently
you get code that is measurably faster on my machine (Athlon 1.4 GHz)

! -----------------------------------------
module primes_1
implicit none
contains
subroutine compute_primes(n, numbers)
implicit none
integer, intent(in) :: n
integer count
integer, allocatable :: numbers(:)
integer, parameter :: NB = bit_size(1)  ! usually = 32
integer :: sieve(0:(n-1)/(2*NB)+1)    ! bit table for prime candidates
integer :: i, j, upper

sieve = 0
sieve(0) = 1  ! 1 isn't prime
upper = sqrt(n+0.5d0)
do i = 1, (upper-1)/2
   if (.not.btest(sieve(i/NB),i-i/NB*NB)) then
      do j = 2*i*(i+1), (n-1)/2, 2*i+1
!         sieve(j/NB) = ibset(sieve(j/NB),j-j/NB*NB)
         sieve(j/NB) = ibset(sieve(j/NB),modulo(j,NB))
!         sieve(j/NB) = ior(sieve(j/NB),ishft(1,modulo(j,NB)))
      end do
   end if
end do
count = 1        ! 1st prime =2 count
do i = 1,(n-1)/2     ! All entries are odd
   if (.not.btest(sieve(i/NB),i-i/NB*NB)) count = count+1
end do
allocate( numbers(count) )
count = 1 ; numbers(1) = 2   ! preset 1st prime
do i = 1,(n-1)/2     ! All entries are odd
   if (.not.btest(sieve(i/NB),i-i/NB*NB)) then
      count = count+1
      numbers(count) = 2*i+1
   end if
end do
end subroutine compute_primes
end module primes_1
! ---------------------------------------
program primes
use primes_1
implicit none
integer,allocatable :: numbers(:)
integer :: clock1,clock2, rate, n = 10000000

call system_clock(clock1,rate)
call compute_primes(n,numbers)
call system_clock(clock2,rate)

write(*,91) '#Primes    = ', size(numbers), &
            '1st Prime  = ', numbers(1), &
            'Last Prime = ', numbers(size(numbers)), &
            'Last Test# = ', n, &
            'time(sec)  = ',(clock2-clock1)/dble(rate)
91 format (4(a,i0/),a,f0.2)
end program primes

Mod-30 compression rather than mod-2 compression as above
could make the sieving a little faster still, but is
needlessly complicated for your purpose.  What really
speeds up sieving is buffering: you want to completely
sieve an L1 cache-sized chunk of the table before moving
on to the next chunk.  This makes the pattern of memory
access much less pernicious.  In order to implement it
you have to make a small sieve table that allows you to
find all the primes that you will want to sieve the full-
sized table with, then you have to locate the position
in the current chunk of each prime in the small table
before you execute the sieve loop with it.  In my code,
I use a fixed buffer, but if that approach were adopted
for your code you might have to sieve twice: the first
time to find out how big the array you need to allocate
will have to be, and the second to find the primes to put
in the table.  An alternative would be to use a rolling
buffer: just move on to the next block of your big sieve
table in memory rather than reinitializing a fixed buffer
in memory each time you move on to the next chunk.  That
way after the sieving operation is complete you will have
the entire sieve table available to you rather than just
the last block.

If the above sounds complicated, then all I can say is that
it is, but it makes the sieving operation several times
faster, especially if the sieve table is quite large.  Of
course it won't help a bit with the part of your code that
writes the primes to your output array.  I don't know
whether you will be tempted to experiment with this rather
complicated but highly efficient optimization, but I thought
I would let you know about the next logical step up in
complexity.



Fri, 15 Jul 2005 17:23:39 GMT  
 Using Primes calculation as benchmark


Quote:
> There are two typos in the code (see code).

> I ran with NAG "f95 -O" on a 500MHz Linux PC
> and got 3.41 sec.  If any of these benchmarks
> make any sense at all (I am skeptical), then
> this one makes NAG look pretty good (so far).

Walt,
You didnt try to translate my source very hard and as a result
I dont think your version's timings can be compared..
since you dont run a loop to calc # of primes then allocate caller's
array accordingly..

I would think taking your shortcuts would speed things up considerably
for you, but note my version is twice as fast at 1.68 sec
on same speed PC.



Fri, 15 Jul 2005 17:33:33 GMT  
 Using Primes calculation as benchmark


Wed, 18 Jun 1902 08:00:00 GMT  
 Using Primes calculation as benchmark

Quote:

>BTW, notice I cross-posted this to comp.lang.pl1, those making a
>response may wish to modify your reply's destination.

Please stop doing this, Dave. You're just annoying both groups.

greg



Fri, 15 Jul 2005 17:38:33 GMT  
 Using Primes calculation as benchmark


Quote:


> >BTW, notice I cross-posted this to comp.lang.pl1, those making a
> >response may wish to modify your reply's destination.

> Please stop doing this, Dave. You're just annoying both groups.

> greg

Unlike you Greg, only a small % of my posts are annoying to readers.


Fri, 15 Jul 2005 19:09:37 GMT  
 Using Primes calculation as benchmark


Wed, 18 Jun 1902 08:00:00 GMT  
 Using Primes calculation as benchmark



Quote:



> > Gordon S. 's recommendations made in clp newsgroup to scan odd
entries
> > only, this dropped runtime from 1.88 to 1.68 sec. However I was
unable
> > to get his odd bit table only with jj indexing to work.

> It's a bit tricky -- you have to keep in mind that if the bit
position
> in the sieve table (counting from position 0) is i, then the bit
> represents the number 2*i+1.  Following this through consistently
> you get code that is measurably faster on my machine (Athlon 1.4
GHz)

> ! -----------------------------------------
> module primes_1
> implicit none
> contains
> subroutine compute_primes(n, numbers)
> implicit none
> integer, intent(in) :: n
> integer count
> integer, allocatable :: numbers(:)
> integer, parameter :: NB = bit_size(1)  ! usually = 32
> integer :: sieve(0:(n-1)/(2*NB)+1)    ! bit table for prime
candidates
> integer :: i, j, upper

> sieve = 0
> sieve(0) = 1  ! 1 isn't prime
> upper = sqrt(n+0.5d0)
> do i = 1, (upper-1)/2
>    if (.not.btest(sieve(i/NB),i-i/NB*NB)) then
>       do j = 2*i*(i+1), (n-1)/2, 2*i+1
> !         sieve(j/NB) = ibset(sieve(j/NB),j-j/NB*NB)
>          sieve(j/NB) = ibset(sieve(j/NB),modulo(j,NB))
> !         sieve(j/NB) = ior(sieve(j/NB),ishft(1,modulo(j,NB)))
>       end do
>    end if
> end do
> count = 1        ! 1st prime =2 count
> do i = 1,(n-1)/2     ! All entries are odd
>    if (.not.btest(sieve(i/NB),i-i/NB*NB)) count = count+1
> end do
> allocate( numbers(count) )
> count = 1 ; numbers(1) = 2   ! preset 1st prime
> do i = 1,(n-1)/2     ! All entries are odd
>    if (.not.btest(sieve(i/NB),i-i/NB*NB)) then
>       count = count+1
>       numbers(count) = 2*i+1
>    end if
> end do
> end subroutine compute_primes
> end module primes_1
> ! ---------------------------------------
> program primes
> use primes_1
> implicit none
> integer,allocatable :: numbers(:)
> integer :: clock1,clock2, rate, n = 10000000

> call system_clock(clock1,rate)
> call compute_primes(n,numbers)
> call system_clock(clock2,rate)

> write(*,91) '#Primes    = ', size(numbers), &
>             '1st Prime  = ', numbers(1), &
>             'Last Prime = ', numbers(size(numbers)), &
>             'Last Test# = ', n, &
>             'time(sec)  = ',(clock2-clock1)/dble(rate)
> 91 format (4(a,i0/),a,f0.2)
> end program primes

> Mod-30 compression rather than mod-2 compression as above
> could make the sieving a little faster still, but is
> needlessly complicated for your purpose.  What really
> speeds up sieving is buffering: you want to completely
> sieve an L1 cache-sized chunk of the table before moving
> on to the next chunk.  This makes the pattern of memory
> access much less pernicious.  In order to implement it
> you have to make a small sieve table that allows you to
> find all the primes that you will want to sieve the full-
> sized table with, then you have to locate the position
> in the current chunk of each prime in the small table
> before you execute the sieve loop with it.  In my code,
> I use a fixed buffer, but if that approach were adopted
> for your code you might have to sieve twice: the first
> time to find out how big the array you need to allocate
> will have to be, and the second to find the primes to put
> in the table.  An alternative would be to use a rolling
> buffer: just move on to the next block of your big sieve
> table in memory rather than reinitializing a fixed buffer
> in memory each time you move on to the next chunk.  That
> way after the sieving operation is complete you will have
> the entire sieve table available to you rather than just
> the last block.

> If the above sounds complicated, then all I can say is that
> it is, but it makes the sieving operation several times
> faster, especially if the sieve table is quite large.  Of
> course it won't help a bit with the part of your code that
> writes the primes to your output array.  I don't know
> whether you will be tempted to experiment with this rather
> complicated but highly efficient optimization, but I thought
> I would let you know about the next logical step up in
> complexity.

Thanks for solving the mystery of the odd bit table indexing,
Using above, lowers my 10m runtime from 1.68 to 0.92 sec

BTW, n = 100m produces below output on my 833Mhz PC.

#Primes    = 5761455
1st Prime  = 2
Last Prime = 99999989
Last Test# = 100000000
time(sec)  = 10.78



Fri, 15 Jul 2005 19:31:10 GMT  
 Using Primes calculation as benchmark


Wed, 18 Jun 1902 08:00:00 GMT  
 
 [ 38 post ]  Go to page: [1] [2] [3]

 Relevant Pages 

1. Using Primes calculation as benchmark

2. Using Prime calc. as benchmark

3. Using Prime calc. as benchmark

4. Searcing PL/I source for calculation of prime numbers

5. Searching PL/1 source for calculation of prime numbers

6. Searching PL/I source for calculation of prime numbers

7. Using Prime calc.

8. anyone using a PRIME?????

9. Using J as a FoxPro DDE calculation server

10. Report Calculation using a Formaula

11. Weird calculations using SUM

12. Date calculations using Report Writer or CW2.003

 

 
Powered by phpBB® Forum Software