Using Primes calculation as benchmark
Author Message Using Primes calculation as benchmark

My revised James VanBuskirk source (he posted his original in a recent
topic) has been posted below and in comp.lang.pl1 newsgroup in
response
to a challenge and may be ofinterest here as well.

!-------------------------------------
module primes_1
contains
subroutine compute_primes(n, count, numbers)
implicit none
integer, intent(in) :: n
integer, intent(out) :: count
integer, allocatable :: numbers(:)
integer, parameter :: NB = bit_size(1)  ! usually = 32
integer :: sieve(0:n/NB+1)    ! bit table for possible primes
integer :: i, j, upper

sieve = 0
sieve(0) = 3  ! 0 and 1 aren't prime
upper = sqrt(n+0.5d0)
do i = 2, upper
if (iand(sieve(i/NB),ishft(1,mod(i,NB))) == 0) then
do j = i**2, n, i
sieve(j/NB) = ior(sieve(j/NB),ishft(1,mod(j,NB)))
end do
end if
end do
count = 0
do i = 1, n
if (iand(sieve(i/NB),ishft(1,mod(i,NB))) == 0) count = count+1
end do
allocate( numbers(count) )
count = 0
do i = 1, n
if (iand(sieve(i/NB),ishft(1,mod(i,NB))) == 0) then
count = count+1
numbers(count) = i
end if
end do
end subroutine compute_primes
end module primes_1
! ---------------------------------------
program primes
use primes_1
integer,allocatable :: numbers(:)
integer :: clock1,clock2, rate, count, n = 10000000

call system_clock(clock1,rate)
call compute_primes(n,count,numbers)
call system_clock(clock2,rate)

write(*,91) '#Primes    = ', count, &
'1st Prime  = ', numbers(1), &
'Last Prime = ', numbers(count)
'time(sec)  = ',(clock2-clock1)/dble(rate)
91 format (3(a,io/),a,f0.2)
end program primes

!--- CVF6.6b results running above on my 833Mhz PC ---

#Primes    = 664579
1st Prime  = 2
Last Prime = 9999991
time(sec)  = 2.35

Thu, 14 Jul 2005 19:17:02 GMT  Using Primes calculation as benchmark

Quote:
>            James VanBuskirk

OK, I took the time to look at some of your posts in c.l.p,
and have some comments to make:

o  You speculated that my code was kludged up from some
old C source, but as is the case with most of my examples
it was written from scratch on the spot using CVF.

o  In one of the threads it was mentioned that pl1 has bit
strings, I think.  These really, to my mind, would make
the [elided] code much clearer.  Actually fortran can
implement bit strings, too, as a single-bit LOGICAL KIND.
M&R recommended their implementation as LOGICAL(0) but I
am not aware of any vendors who have done so at the
current time.

o  More than one responder pointed out that the code could
have been made faster by various means, and this is
indeed the case.  Indeed, my fastest version can run up
and count a sieve table to 1.0e9 in 0.641 seconds on my
1.4 GHz Athlon.  I think the slowest part of the code
may be where it generates the table of results, so all
the things I do to speed it up may not help much if it is
required that the output be an array of INTEGERs, one
entry per prime.  BTW, I tried the same executable on a
2.0 GHz P4 at Office Max (you would think that you could
get kicked out of a store for taking a floppy down to
the establishment and running some benchmarks, but I have
found that the clerks are generally actively curious about
the results you get on various machines...) and it took
something like 2 seconds to achieve he same result.  I
think this may have been due mostly to the tiny cache
size on that processor; my program actually can change
the target cache size at runtime via user input, but
doing so caused some Athlon-specific assembly code to
kick in, resulting in an undefined opcode exception, and
I never got around to stripping it off and recompiling
with IFL for a P4 target.

o  I have noticed that you tend to take a stance of Fortran
advocacy in comp.lang.pl1.  From my experience, when folks
advocate some non-Fortran language in c.l.f., it doesn't
tend to give me a positive impression of whatever language
he advocates, nor of its practitioners.  Granted, you do
both stimulate and provide code examples and seem to be
trying to get information about the capabilities and
performance of pl1 rather than just saying that we must
all convert to the evil debbil-debbil religion of C++ or
Java by decree of God Almighty, as the misguided practitioners
of those latter languages chant from time to time in this
Forum, but I can see how some c.l.p. participants might
consider your approach to be a bit off-putting.  I realize that
you sometimes take what seems to be a confrontational stance
some extra effort to provide technical information, but I
and get a bad impression of Fortran programmers as a
consequence, surely not an outcome that you would consider
desirable.

Fri, 15 Jul 2005 00:49:25 GMT  Using Primes calculation as benchmark
There are two typos in the code (see code).

I ran with NAG "f95 -O" on a 500MHz Linux PC
and got 3.41 sec.  If any of these benchmarks
make any sense at all (I am skeptical), then
this one makes NAG look pretty good (so far).

But wouldn't cpu_time be a better measure?
Maybe not much different--I got 3.32 sec using it.

However, the "obvious", "straightforward" way to do
this doesn't take all that much longer--4.81 sec with
the NAG compiler.  This is a modified version of an
example of F code provided by{*filter*} Hendrickson.

On an 800 MHz Windows XP (Intel) laptop, the F compiler
runs the following "straightforward" version in 3.24 sec.

!!program Sieve_Of_Eratosthenes

!  Find prime numbers using array processing.

!  Strike the Twos and strike the Threes
!  The Sieve of Eratosthenes!
!  When the multiplies sublime
!  The numbers that are left, are prime.

!            From:  Drunkard's Walk, by Frederik Pohl

module sieve

integer, private :: last_number = 10000000
integer, public, dimension(:), allocatable :: numbers
integer, public :: number_of_primes, last_prime
public :: find_primes

contains

subroutine find_primes()

integer :: stat, i, ac

allocate (numbers(last_number), stat=stat)
if (stat > 0) then
print *, "numbers allocation problem"
stop
end if

!  Initialize numbers array to 0, 2, 3, ..., last_number--
!  Zero instead of 1 because 1 is a special case.
numbers = (/ 0, (ac, ac = 2, last_number) /)

do i = 2, int(sqrt(real(last_number)))+1
if (numbers(i) /= 0) then              ! if this number is prime
numbers(2*i : last_number : i) = 0   ! eliminate all multiples
endif

!  Count the primes.
number_of_primes = count (numbers /= 0)

!  Find last prime

do i = last_number, 1, -1
if (numbers(i) /= 0) then
last_prime = i
exit
end if
end do

end subroutine find_primes

end module sieve

program Sieve_Of_Eratosthenes

use sieve

character(len=*), parameter :: &
f91 = "(3(a,i0,/),a,f0.2)"

real :: start, stop

call cpu_time(start)
call find_primes()
call cpu_time(stop)

print f91, "#Primes    = ", number_of_primes, &
"1st Prime  = ", numbers(2), &
"Last Prime = ", numbers(last_prime), &
"time(sec)  = ", stop - start

end program Sieve_Of_Eratosthenes

Quote:

> My revised James VanBuskirk source (he posted his original in a recent
> topic) has been posted below and in comp.lang.pl1 newsgroup in
> response
> to a challenge and may be ofinterest here as well.

> !-------------------------------------
> module primes_1
> contains
> subroutine compute_primes(n, count, numbers)
> implicit none
> integer, intent(in) :: n
> integer, intent(out) :: count
> integer, allocatable :: numbers(:)
> integer, parameter :: NB = bit_size(1)  ! usually = 32
> integer :: sieve(0:n/NB+1)    ! bit table for possible primes
> integer :: i, j, upper

> sieve = 0
> sieve(0) = 3  ! 0 and 1 aren't prime
> upper = sqrt(n+0.5d0)
> do i = 2, upper
>    if (iand(sieve(i/NB),ishft(1,mod(i,NB))) == 0) then
>       do j = i**2, n, i
>          sieve(j/NB) = ior(sieve(j/NB),ishft(1,mod(j,NB)))
>       end do
>    end if
> end do
> count = 0
> do i = 1, n
>    if (iand(sieve(i/NB),ishft(1,mod(i,NB))) == 0) count = count+1
> end do
> allocate( numbers(count) )
> count = 0
> do i = 1, n
>    if (iand(sieve(i/NB),ishft(1,mod(i,NB))) == 0) then
>       count = count+1
>       numbers(count) = i
>    end if
> end do
> end subroutine compute_primes
> end module primes_1
> ! ---------------------------------------
> program primes
> use primes_1
> integer,allocatable :: numbers(:)
> integer :: clock1,clock2, rate, count, n = 10000000

> call system_clock(clock1,rate)
> call compute_primes(n,count,numbers)
> call system_clock(clock2,rate)

> write(*,91) '#Primes    = ', count, &
>             '1st Prime  = ', numbers(1), &
>             'Last Prime = ', numbers(count)              , & !! <------------------
>             'time(sec)  = ',(clock2-clock1)/dble(rate)
> 91 format (3(a,io/),a,f0.2)

i0                !! <---------------------

Quote:
> end program primes

> !--- CVF6.6b results running above on my 833Mhz PC ---

> #Primes    = 664579
> 1st Prime  = 2
> Last Prime = 9999991
> time(sec)  = 2.35

--
Walt Brainerd         +1-877-355-6640 (voice & fax)
The Fortran Company   +1-520-760-1397 (outside USA)

Tucson, AZ 85750 USA   http://www.*-*-*.com/

Fri, 15 Jul 2005 00:52:21 GMT  Using Primes calculation as benchmark
I replaced iand/ishft syntax in my previous version with bit
functions.
This surprisingly dropped runtime from 2.35 to 1.88 sec.  Then I
Gordon S. 's recommendations made in clp newsgroup to scan odd entries
only, this dropped runtime from 1.88 to 1.68 sec. However I was unable
to get his odd bit table only with jj indexing to work.

! -----------------------------------------
module primes_1
contains
subroutine compute_primes(n, count, numbers)
implicit none
integer, intent(in) :: n
integer, intent(out) :: count
integer, allocatable :: numbers(:)
integer, parameter :: NB = bit_size(1)  ! usually = 32
integer :: sieve(0:n/NB+1)    ! bit table for prime candidates
integer :: i, j, upper

sieve = 0
sieve(0) = 3  ! 0 and 1 aren't prime
upper = sqrt(n+0.5d0)
do i = 2, upper
if (.not.btest(sieve(i/NB),i-i/NB*NB)) then
do j = i*i, n, i
sieve(j/NB) = ibset(sieve(j/NB),j-j/NB*NB)
end do
end if
end do
count = 1        ! 1st prime =2 count
do i = 3,n,2     ! count odd entries only
if (.not.btest(sieve(i/NB),i-i/NB*NB)) count = count+1
end do
allocate( numbers(count) )
count = 1 ; numbers(1) = 2   ! preset 1st prime
do i = 3,n,2     ! count odd entries only
if (.not.btest(sieve(i/NB),i-i/NB*NB)) then
count = count+1
numbers(count) = i
end if
end do
end subroutine compute_primes
end module primes_1
! ---------------------------------------
program primes
use primes_1
integer,allocatable :: numbers(:)
integer :: clock1,clock2, rate, count, n = 10000000

call system_clock(clock1,rate)
call compute_primes(n,count,numbers)
call system_clock(clock2,rate)

write(*,91) '#Primes    = ', count, &
'1st Prime  = ', numbers(1), &
'Last Prime = ', numbers(count), &
'Last Test# = ', n, &
'time(sec)  = ',(clock2-clock1)/dble(rate)
91 format (4(a,i0/),a,f0.2)
end program primes

! results using CVF6.6b on 833Mhz computer
! #Primes    = 664579
! 1st Prime  = 2
! Last Prime = 9999991
! Last Number= 10000000
! time(sec)  = 1.68

Fri, 15 Jul 2005 15:37:17 GMT  Using Primes calculation as benchmark

Wed, 18 Jun 1902 08:00:00 GMT  Using Primes calculation as benchmark

Quote:

> o  I have noticed that you tend to take a stance of Fortran

James,
Your very lengthy critique of my comp.lang.pl1 messages can in the
interest of brevity be summarized by responding to your quote above
by making the following points.

In case you havent noticed there are those who take a stance of PL/I
advocacy here in comp.lang.fortran altho not to the degree taken by
me over there.

The PL/I FAQ states its more powerful than Fortran.
Altho undoubtedly true at one time, Its not been true for many years.

When I dont post anything for several weeks the level of new topics
drops to about 1 or 2 per week reflecting what appears to
be IBM's plan to cease their PL/I language development especially for
the desktop user. There is only 1 compiler effectively available for
the desktop, Personal PL/I at \$113, altho there is a regular version
as one of IBM's Visual Age languages but it packs a typical IBM
price-tag at \$2900).

IBM is spending billions on Linux development, you would think that if
they had any future plans they would have said something by now about
a Linux version. In fact there has been NO news for desktop users
since

Occasionally there are migrating to C/C++ VB, etc. topics.
but until I showed up, Fortran wasnt being seriously considered as
a migration candidate to the degree I think it is currently, even tho
IBM's original name of their "new" language in 1965 was Fortran-V
and it definitely shows its roots came from IBM Fortran-IV.

BTW, notice I cross-posted this to comp.lang.pl1, those making a
response

Fri, 15 Jul 2005 16:59:17 GMT  Using Primes calculation as benchmark

Wed, 18 Jun 1902 08:00:00 GMT  Using Primes calculation as benchmark

Quote:
> Gordon S. 's recommendations made in clp newsgroup to scan odd entries
> only, this dropped runtime from 1.88 to 1.68 sec. However I was unable
> to get his odd bit table only with jj indexing to work.

It's a bit tricky -- you have to keep in mind that if the bit position
in the sieve table (counting from position 0) is i, then the bit
represents the number 2*i+1.  Following this through consistently
you get code that is measurably faster on my machine (Athlon 1.4 GHz)

! -----------------------------------------
module primes_1
implicit none
contains
subroutine compute_primes(n, numbers)
implicit none
integer, intent(in) :: n
integer count
integer, allocatable :: numbers(:)
integer, parameter :: NB = bit_size(1)  ! usually = 32
integer :: sieve(0:(n-1)/(2*NB)+1)    ! bit table for prime candidates
integer :: i, j, upper

sieve = 0
sieve(0) = 1  ! 1 isn't prime
upper = sqrt(n+0.5d0)
do i = 1, (upper-1)/2
if (.not.btest(sieve(i/NB),i-i/NB*NB)) then
do j = 2*i*(i+1), (n-1)/2, 2*i+1
!         sieve(j/NB) = ibset(sieve(j/NB),j-j/NB*NB)
sieve(j/NB) = ibset(sieve(j/NB),modulo(j,NB))
!         sieve(j/NB) = ior(sieve(j/NB),ishft(1,modulo(j,NB)))
end do
end if
end do
count = 1        ! 1st prime =2 count
do i = 1,(n-1)/2     ! All entries are odd
if (.not.btest(sieve(i/NB),i-i/NB*NB)) count = count+1
end do
allocate( numbers(count) )
count = 1 ; numbers(1) = 2   ! preset 1st prime
do i = 1,(n-1)/2     ! All entries are odd
if (.not.btest(sieve(i/NB),i-i/NB*NB)) then
count = count+1
numbers(count) = 2*i+1
end if
end do
end subroutine compute_primes
end module primes_1
! ---------------------------------------
program primes
use primes_1
implicit none
integer,allocatable :: numbers(:)
integer :: clock1,clock2, rate, n = 10000000

call system_clock(clock1,rate)
call compute_primes(n,numbers)
call system_clock(clock2,rate)

write(*,91) '#Primes    = ', size(numbers), &
'1st Prime  = ', numbers(1), &
'Last Prime = ', numbers(size(numbers)), &
'Last Test# = ', n, &
'time(sec)  = ',(clock2-clock1)/dble(rate)
91 format (4(a,i0/),a,f0.2)
end program primes

Mod-30 compression rather than mod-2 compression as above
could make the sieving a little faster still, but is
needlessly complicated for your purpose.  What really
speeds up sieving is buffering: you want to completely
sieve an L1 cache-sized chunk of the table before moving
on to the next chunk.  This makes the pattern of memory
access much less pernicious.  In order to implement it
you have to make a small sieve table that allows you to
find all the primes that you will want to sieve the full-
sized table with, then you have to locate the position
in the current chunk of each prime in the small table
before you execute the sieve loop with it.  In my code,
I use a fixed buffer, but if that approach were adopted
for your code you might have to sieve twice: the first
time to find out how big the array you need to allocate
will have to be, and the second to find the primes to put
in the table.  An alternative would be to use a rolling
buffer: just move on to the next block of your big sieve
table in memory rather than reinitializing a fixed buffer
in memory each time you move on to the next chunk.  That
way after the sieving operation is complete you will have
the entire sieve table available to you rather than just
the last block.

If the above sounds complicated, then all I can say is that
it is, but it makes the sieving operation several times
faster, especially if the sieve table is quite large.  Of
course it won't help a bit with the part of your code that
writes the primes to your output array.  I don't know
whether you will be tempted to experiment with this rather
complicated but highly efficient optimization, but I thought
I would let you know about the next logical step up in
complexity.

Fri, 15 Jul 2005 17:23:39 GMT  Using Primes calculation as benchmark

Quote:
> There are two typos in the code (see code).

> I ran with NAG "f95 -O" on a 500MHz Linux PC
> and got 3.41 sec.  If any of these benchmarks
> make any sense at all (I am skeptical), then
> this one makes NAG look pretty good (so far).

Walt,
You didnt try to translate my source very hard and as a result
I dont think your version's timings can be compared..
since you dont run a loop to calc # of primes then allocate caller's
array accordingly..

I would think taking your shortcuts would speed things up considerably
for you, but note my version is twice as fast at 1.68 sec
on same speed PC.

Fri, 15 Jul 2005 17:33:33 GMT  Using Primes calculation as benchmark

Wed, 18 Jun 1902 08:00:00 GMT  Using Primes calculation as benchmark

Quote:

>BTW, notice I cross-posted this to comp.lang.pl1, those making a

Please stop doing this, Dave. You're just annoying both groups.

greg

Fri, 15 Jul 2005 17:38:33 GMT  Using Primes calculation as benchmark

Quote:

> >BTW, notice I cross-posted this to comp.lang.pl1, those making a

> Please stop doing this, Dave. You're just annoying both groups.

> greg

Unlike you Greg, only a small % of my posts are annoying to readers.

Fri, 15 Jul 2005 19:09:37 GMT  Using Primes calculation as benchmark

Wed, 18 Jun 1902 08:00:00 GMT  Using Primes calculation as benchmark

Quote:

> > Gordon S. 's recommendations made in clp newsgroup to scan odd
entries
> > only, this dropped runtime from 1.88 to 1.68 sec. However I was
unable
> > to get his odd bit table only with jj indexing to work.

> It's a bit tricky -- you have to keep in mind that if the bit
position
> in the sieve table (counting from position 0) is i, then the bit
> represents the number 2*i+1.  Following this through consistently
> you get code that is measurably faster on my machine (Athlon 1.4
GHz)

> ! -----------------------------------------
> module primes_1
> implicit none
> contains
> subroutine compute_primes(n, numbers)
> implicit none
> integer, intent(in) :: n
> integer count
> integer, allocatable :: numbers(:)
> integer, parameter :: NB = bit_size(1)  ! usually = 32
> integer :: sieve(0:(n-1)/(2*NB)+1)    ! bit table for prime
candidates
> integer :: i, j, upper

> sieve = 0
> sieve(0) = 1  ! 1 isn't prime
> upper = sqrt(n+0.5d0)
> do i = 1, (upper-1)/2
>    if (.not.btest(sieve(i/NB),i-i/NB*NB)) then
>       do j = 2*i*(i+1), (n-1)/2, 2*i+1
> !         sieve(j/NB) = ibset(sieve(j/NB),j-j/NB*NB)
>          sieve(j/NB) = ibset(sieve(j/NB),modulo(j,NB))
> !         sieve(j/NB) = ior(sieve(j/NB),ishft(1,modulo(j,NB)))
>       end do
>    end if
> end do
> count = 1        ! 1st prime =2 count
> do i = 1,(n-1)/2     ! All entries are odd
>    if (.not.btest(sieve(i/NB),i-i/NB*NB)) count = count+1
> end do
> allocate( numbers(count) )
> count = 1 ; numbers(1) = 2   ! preset 1st prime
> do i = 1,(n-1)/2     ! All entries are odd
>    if (.not.btest(sieve(i/NB),i-i/NB*NB)) then
>       count = count+1
>       numbers(count) = 2*i+1
>    end if
> end do
> end subroutine compute_primes
> end module primes_1
> ! ---------------------------------------
> program primes
> use primes_1
> implicit none
> integer,allocatable :: numbers(:)
> integer :: clock1,clock2, rate, n = 10000000

> call system_clock(clock1,rate)
> call compute_primes(n,numbers)
> call system_clock(clock2,rate)

> write(*,91) '#Primes    = ', size(numbers), &
>             '1st Prime  = ', numbers(1), &
>             'Last Prime = ', numbers(size(numbers)), &
>             'Last Test# = ', n, &
>             'time(sec)  = ',(clock2-clock1)/dble(rate)
> 91 format (4(a,i0/),a,f0.2)
> end program primes

> Mod-30 compression rather than mod-2 compression as above
> could make the sieving a little faster still, but is
> needlessly complicated for your purpose.  What really
> speeds up sieving is buffering: you want to completely
> sieve an L1 cache-sized chunk of the table before moving
> on to the next chunk.  This makes the pattern of memory
> access much less pernicious.  In order to implement it
> you have to make a small sieve table that allows you to
> find all the primes that you will want to sieve the full-
> sized table with, then you have to locate the position
> in the current chunk of each prime in the small table
> before you execute the sieve loop with it.  In my code,
> I use a fixed buffer, but if that approach were adopted
> for your code you might have to sieve twice: the first
> time to find out how big the array you need to allocate
> will have to be, and the second to find the primes to put
> in the table.  An alternative would be to use a rolling
> buffer: just move on to the next block of your big sieve
> table in memory rather than reinitializing a fixed buffer
> in memory each time you move on to the next chunk.  That
> way after the sieving operation is complete you will have
> the entire sieve table available to you rather than just
> the last block.

> If the above sounds complicated, then all I can say is that
> it is, but it makes the sieving operation several times
> faster, especially if the sieve table is quite large.  Of
> course it won't help a bit with the part of your code that
> writes the primes to your output array.  I don't know
> whether you will be tempted to experiment with this rather
> complicated but highly efficient optimization, but I thought
> I would let you know about the next logical step up in
> complexity.

Thanks for solving the mystery of the odd bit table indexing,
Using above, lowers my 10m runtime from 1.68 to 0.92 sec

BTW, n = 100m produces below output on my 833Mhz PC.

#Primes    = 5761455
1st Prime  = 2
Last Prime = 99999989
Last Test# = 100000000
time(sec)  = 10.78

Fri, 15 Jul 2005 19:31:10 GMT  Using Primes calculation as benchmark

Wed, 18 Jun 1902 08:00:00 GMT

 Page 1 of 3 [ 38 post ] Go to page:   

Relevant Pages