Performance of Personal PL/I on a Thinkpad 600 (300MHz Pentium II) 

Disclaimer:  Ordinarily I neither respond to nor comment on David Frank's posts.
However, occasionally he posts something that raises a question of genuine
interest to those interested in PL/I or makes a statement that needs to be
rebutted.  I think his post today deserves comment on both grounds.

The Problem: Today David Frank posted timings of a Fortran program which
multiplied each element of an n x n array, a,  by a constant, x, in two ways:
one using the statement a=a*x and the second using two nested loops.  He stated
that for n=3000 each method took .551 sec on a 833MHz machine.  He went on to
say that he would expect a PL/I program that did the same thing to take about 5
times as long.

I was curious to see how well Personal PL/I for OS/2 running on my Thinkpad 600
(a 300MHz Pentium II) would do on this problem.  The program David Frank posted
consisted of two parts: 1) a separately compiled subroutine which performed the
desired multiplication of an array by a scalar (both passed as parameters) using
one or the other of the two methods mentioned above depending on a Boolean
parameter; and, 2) a main program which called the subroutine twice (once for
each value of the Boolean parameter) surrounding the calls with calls to the
system timing routine, and reported the two elapsed times.

For part 1 I wrote the following subroutine, compiled it, and added it to a
subroutine library:

%process mar(2,100);
  sub: proc(a,x,f) reorder;
   dcl (a(*,*),x) bin float(53),
    (i,j) bin fixed,
    f bit(1) aligned,
    (lbound,hbound) builtin;
   if f then a=a*x;
   else do i=lbound(a,1) to hbound(a,1);
    do j=lbound(a,2) to hbound(a,2); a(i,j)=a(i,j)*x; end;
    end;
   end sub;

Needless to say, it was not necessary to pass the size of the array as a
parameter since in PL/I the lbound and hbound builtin functions can be used to
find out the bounds of an array.

For part 2), I wrote the following main program:

%process mar(2,100);
  test: proc options(main) reorder;
   dcl siz bin fixed value(3000),
    a(siz,siz) bin float(53) init((siz*siz)(random)),
    (x value(1.23),y) bin float(53),
    (t1,t2,t3) char(17),
    sub entry((*,*) bin float(53),bin float(53),bit(1) aligned),
    sysprint file print,
    (datetime,random,secs) builtin;
   t1=datetime; call sub(a,x,'1'b); t2=datetime; call sub(a,x,'0'b); t3=datetime;
   y=secs(t2); put file(sysprint) edit(y-secs(t1),secs(t3)-y) (2 f(8,3));
   end test;

This program was compiled and executed and produced the following output:

    1.110   1.010

Over ten repeated executions, neither value varied by more than +/- .01.

Discussion:  First of all these values are approximately twice those obtained by
David Frank, while his machine operates at about 2.77 times the clock rate of my
machine.  Thus for this program, Personal PL/I appears to outperform FORTRAN.
The portion of the program being timed is computation bound.  In fact it
involves no I/O whatsoever.  This should lay to rest any claim that PL/I code is
inefficient in relation to FORTRAN for this problem.
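The arithmetic behind this comparison can be sketched in a few lines. Python is used purely for illustration; the timings and clock rates are the ones quoted above, and the helper name `scaled_to_slower_clock` is hypothetical. The sketch assumes run time is inversely proportional to clock rate, which is a crude model and only as good as that assumption.

```python
# Timings and clock rates quoted in the posts above.
FORTRAN_TIME_S = 0.551            # David Frank's Fortran timing (either method)
FORTRAN_CLOCK_MHZ = 833
PLI_TIMES_S = {"a=a*x": 1.110, "nested loops": 1.010}
PLI_CLOCK_MHZ = 300


def scaled_to_slower_clock(time_s, fast_mhz, slow_mhz):
    """Estimate the run time on the slower machine, ASSUMING time is
    inversely proportional to clock rate (illustrative model only)."""
    return time_s * fast_mhz / slow_mhz


if __name__ == "__main__":
    estimate = scaled_to_slower_clock(FORTRAN_TIME_S, FORTRAN_CLOCK_MHZ, PLI_CLOCK_MHZ)
    print(f"Fortran time scaled to 300 MHz: {estimate:.3f} s")
    for method, t in PLI_TIMES_S.items():
        print(f"PL/I {method}: {t:.3f} s, ratio to scaled Fortran: {t / estimate:.2f}")
```

Under that assumption both PL/I timings come in below the scaled Fortran figure, which is the basis for the claim above.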

Second, it is of interest to note that the time for the a=a*x version is about
.1 second slower than the nested loops.  I added the list option and examined
the assembler code.  It turns out that the a=a*x code uses 32 bit registers for
the subscripts, whereas I used 16 bit variables which the compiler places in 16
bit registers.  The cumulative effect of 9,003,000 16 bit increments as opposed
to 9,003,000 32 bit increments is about .1 sec.  I verified this by changing i
and j to bin fixed(31) in sub and running the program again.  Indeed the two
timings were now equal.
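The figure of 9,003,000 increments, and what a .1 second difference would mean per increment, can be checked with a short sketch. Python is used for illustration only and the function names are hypothetical. Charging the whole difference to the subscript increments follows the reasoning above, so the cycles-per-increment figure is a derived estimate, not a measurement.

```python
N = 3000                    # the array is N x N
CLOCK_HZ = 300e6            # 300 MHz Pentium II
TIME_DIFFERENCE_S = 0.1     # approximate difference reported above


def loop_increments(n):
    """Subscript increments in two nested loops of n iterations each:
    n inner increments per row, plus n outer increments."""
    return n * n + n


def cycles_per_increment(diff_s, increments, clock_hz):
    """Clock cycles per increment if the whole time difference is
    attributed to the increments (an assumption)."""
    return diff_s / increments * clock_hz


if __name__ == "__main__":
    count = loop_increments(N)
    print(f"increments: {count:,}")
    print(f"cycles per increment: {cycles_per_increment(TIME_DIFFERENCE_S, count, CLOCK_HZ):.2f}")
```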

It then occurred to me that by using a separately compiled subroutine rather
than inline code the compiler had to make worst case assumptions.  To analyze
the effect of this I tried two experiments.

First, I changed x in sub to a byvalue parameter, since otherwise the compiler
had to assume that x might be bound to an element of a and therefore had to
fetch it from storage for each element of a.  This change reduced the time for
the nested loops by about .1 second but had no effect on the time for the a=a*x
version.  The reason turned out to be that the compiler more or less got painted
into a corner in trying to keep x and the two anonymous subscripts in 32 bit
registers.  In the nested loop version, i and j were in 16 bit registers and
there were enough 32 bit registers to go around.  Actually it would still be
easy for a human programmer to improve the code, but it is certainly quite good.

Second, I eliminated sub altogether and simply placed the two computations in
line between the three calls to datetime.  This reduced the times to .590 and
.600.  Again over ten repetitions there were occasional variations of +/- .010
second.  With the inline computation, the compiler knew that the subscripts were
varying between 1 and 3000 and it achieved an almost two fold speedup by partly
unrolling the inner loop and eliminating integer multiplications.  Specifically,
it eliminated entirely the use of the imul instruction in the subscript
computations and instead used the ability of the hardware to multiply subscripts
by small powers of 2 (i.e., 8) on the fly.  Moreover it operated on six array
elements in each "iteration" reducing the number of comparisons and conditional
branches by a factor of six.  This worked because 6 is an exact divisor of 3000.
To see what would happen if this were not the case, I changed siz to 2999, a
prime number.  The timing did not change.  This time, for the a=a*x version, the
first three elements of each row were handled in the outer loop and four
elements were handled in each iteration of the inner loop.  However for the
explicit nested loop version the first two elements of each row were handled in
the outer loop and the inner loop dealt with three elements per iteration.
Obviously the bulk of the speedup comes from eliminating two imul instructions
per iteration.  I am at a loss to explain why the compiler chose a different
breakdown for the two essentially identical iterations.  In any event the
effective performance was for all intents and purposes identical and almost
twice as good as that obtained under worst case assumptions.
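The breakdowns described above (6 per iteration for 3000; 3 peeled plus 4 per iteration, or 2 peeled plus 3 per iteration, for 2999) follow from simple remainder arithmetic. The following is a minimal sketch in Python of the index bookkeeping only; the function name `scale_row_unrolled` is hypothetical, the short inner loop stands in for straight-line unrolled code, and nothing here reproduces the compiler's actual output or its performance effect.

```python
def scale_row_unrolled(row, x, unroll):
    """Multiply every element of row by x in place.

    The leftover len(row) % unroll elements are handled first (peeled),
    then the rest are processed `unroll` at a time.
    Returns (peeled_elements, main_loop_iterations).
    """
    n = len(row)
    peel = n % unroll
    for j in range(peel):                 # peeled elements
        row[j] *= x
    for j in range(peel, n, unroll):      # main loop: one test per group
        for k in range(unroll):           # stands in for unrolled multiplies
            row[j + k] *= x
    return peel, (n - peel) // unroll


if __name__ == "__main__":
    for n, unroll in ((3000, 6), (2999, 4), (2999, 3)):
        peel, iterations = scale_row_unrolled([1.0] * n, 1.23, unroll)
        print(f"n={n}, unroll={unroll}: peel {peel}, then {iterations} iterations")
```

With a row length of 3000 and a factor of 6 nothing needs peeling, while 2999 leaves 3 elements over for a factor of 4 and 2 elements over for a factor of 3, matching the two breakdowns reported.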



Tue, 28 Jun 2005 16:16:02 GMT  

Quote:
> I was curious to see how well Personal PL/I for OS/2 running on my Thinkpad 600
> (a 300MHz Pentium II) would do on this problem.  The program David Frank posted
> consisted of two parts: 1) a separately compiled subroutine which performed the
> desired multiplication of an array by a scalar (both passed as parameters) using
> one or the other of the two methods mentioned above depending on a Boolean
> parameter; and, 2) a main program which called the subroutine twice (once for
> each value of the Boolean parameter) surrounding the calls with calls to the
> system timing routine, and reported the two elapsed times.

For your tests:

   What was the level of optimization?
   Did you have the REORDER option specified?



Tue, 28 Jun 2005 19:22:32 GMT  

Quote:

> For your tests:

>    What was the level of optimization?

  opt(2)

Quote:
>    Did you have the REORDER option specified?

  Yes.


Wed, 29 Jun 2005 03:27:36 GMT  


Quote:
> Disclaimer:  Ordinarily I neither respond to nor comment on David Frank's posts.
> However, occasionally he posts something that raises a question of genuine
> interest to those interested in PL/I or makes a statement that needs to be
> rebutted.  I think his post today deserves comment on both grounds.

<snip>
> Discussion:  First of all these values are approximately twice those obtained by
> David Frank, while his machine operates at about 2.77 times the clock rate of my
> machine.  Thus for this program, Personal PL/I appears to outperform FORTRAN.
> The portion of the program being timed is computation bound.  In fact it
> involves no I/O whatsoever.  This should lay to rest any claim that PL/I code is
> inefficient in relation to FORTRAN for this problem.

If you send me your exe (or make it available for download from your site) I can make a Fortran vs. PL/I
comparison on my hardware leaving your hardware out of the equation.


Wed, 29 Jun 2005 15:49:39 GMT  

<snip>
: Discussion:  First of all these values are approximately twice those obtained by
: David Frank, while his machine operates at about 2.77 times the clock rate of my
: machine.  Thus for this program, Personal PL/I appears to outperform FORTRAN.
: The portion of the program being timed is computation bound.  In fact it
: involves no I/O whatsoever.  This should lay to rest any claim that PL/I code is
: inefficient in relation to FORTRAN for this problem.
<snip>
The benchmark is irrelevant for comparing compiler performance; its
main ingredient is memory bandwidth.
In the portion I removed you gave the exact routine you used for timing -- the
objective was to multiply each element of a 3000x3000 double precision array
by a double x. Your best timing was around 0.5s. So the result is around
18 megaflops -- mediocre for a 300MHz Pentium II. However, to do the
multiplication one has to read the array and then write it back, meaning
72 MB read and 72 MB written, which at 0.5s means 144MB read and 144MB written per
second -- I would say impressive for a 300MHz laptop.
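These figures can be reproduced with a short calculation. This is a sketch in Python, used for illustration only; the function names are hypothetical, MB is taken as 10^6 bytes, and a double is assumed to occupy 8 bytes.

```python
N = 3000                  # the array is N x N
BYTES_PER_DOUBLE = 8      # assumed size of a double precision value
BEST_TIME_S = 0.5         # "around 0.5s", as stated above


def array_bytes(n):
    """Bytes occupied by an n x n array of doubles."""
    return n * n * BYTES_PER_DOUBLE


def megaflops(n, time_s):
    """One multiplication per element, in millions per second."""
    return n * n / time_s / 1e6


def mb_per_second(n, time_s):
    """Traffic in one direction (read or write), in MB per second."""
    return array_bytes(n) / time_s / 1e6


if __name__ == "__main__":
    print(f"array size: {array_bytes(N) / 1e6:.0f} MB")
    print(f"rate: {megaflops(N, BEST_TIME_S):.0f} megaflops")
    print(f"traffic: {mb_per_second(N, BEST_TIME_S):.0f} MB/s read and the same written")
```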

--
                              Waldek Hebisch



Thu, 30 Jun 2005 04:39:52 GMT  
That's pretty rich, you wanting the exe made by a compiler you claim
doesn't even exist. I'll bet you're one of those Florida seniors who
can't even punch cards right; that would explain your sloppy Fortran
programming skills. BTW, exactly why have you got such a hard-on about
PL/1; did you sometime in the past have to stay on the ol' IBM 029
while the PL/1 programmers got the 3270s? Please provide details to

programs as Slashdot wants to do a story on cutting edge Fortran
nerds.


Thu, 30 Jun 2005 11:15:25 GMT  


Quote:

> <snip>
> : Discussion:  First of all these values are approximately twice those obtained by
> : David Frank, while his machine operates at about 2.77 times the clock rate of my
> : machine.  Thus for this program, Personal PL/I appears to outperform FORTRAN.
> : The portion of the program being timed is computation bound.  In fact it
> : involves no I/O whatsoever.  This should lay to rest any claim that PL/I code is
> : inefficient in relation to FORTRAN for this problem.
> <snip>
> The benchmark is irrelevant for comparing compiler performance; its
> main ingredient is memory bandwidth.
> In the portion I removed you gave the exact routine you used for timing -- the
> objective was to multiply each element of a 3000x3000 double precision array
> by a double x. Your best timing was around 0.5s. So the result is around
> 18 megaflops -- mediocre for a 300MHz Pentium II. However, to do the
> multiplication one has to read the array and then write it back, meaning
> 72 MB read and 72 MB written, which at 0.5s means 144MB read and 144MB written per
> second -- I would say impressive for a 300MHz laptop.

I would say not completely irrelevant.  The discussion on
comp.lang.fortran includes a case where the compiler makes a
temporary array, then copies the result to the destination.  That
would be twice the memory bandwidth required.  There are many cases
where PL/I must generate temporary arrays, where it allows array
operations that Fortran doesn't.  (Or didn't last I knew.)   So the
real question here is, when does the compiler generate a temporary
array?

Consider:

  CALL SUB(2*A);

where A is an array.  Except for built-in functions that the
compiler can inline, this requires a temporary array.  I don't know
if Fortran allows array expressions in subroutine arguments.
(Cross posted so someone can answer.)
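The "twice the memory bandwidth" point can be made concrete by counting element transfers. This is only a sketch in Python with hypothetical function names; it counts one read and one write per element touched and ignores caches, so it shows the ratio, not real timings.

```python
def traffic_in_place(n_elements):
    """a = a*x done element by element: each element is read once
    and written once."""
    return {"reads": n_elements, "writes": n_elements}


def traffic_with_temporary(n_elements):
    """temp = a*x followed by a = temp: the multiply reads a and writes
    temp, then the copy reads temp and writes a."""
    return {"reads": 2 * n_elements, "writes": 2 * n_elements}


if __name__ == "__main__":
    n = 3000 * 3000
    direct = traffic_in_place(n)
    via_temp = traffic_with_temporary(n)
    print(f"in place:       {direct['reads']:,} reads, {direct['writes']:,} writes")
    print(f"with temporary: {via_temp['reads']:,} reads, {via_temp['writes']:,} writes")
```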

-- glen



Thu, 30 Jun 2005 11:26:18 GMT  

Quote:




> I would say not completely irrelevant.  The discussion on
> comp.lang.fortran includes a case where the compiler makes a
> temporary array, then copies the result to the destination.  That
> would be twice the memory bandwidth required.  There are many cases
> where PL/I must generate temporary arrays, where it allows array
> operations that Fortran doesn't.  (Or didn't last I knew.)   So the
> real question here is, when does the compiler generate a temporary
> array.

> Consider:

>   CALL SUB(2*A);

> where A is an array.  Except for built-in functions that the
> compiler can inline, this requires a temporary array.  I don't know
> if Fortran allows array expressions in subroutine arguments.

It does.


Quote:
> -- glen



Thu, 30 Jun 2005 15:44:40 GMT  


Quote:
> That's pretty rich, you wanting the exe made by a compiler you claim
> doesn't even exist.

You sound shocked.  It's my guess no-one left here has written anything that anyone ever asked for before,
and I merely pointed out that IBM has removed ALL traces of its Personal PL/I from the PL/I web pages.

Quote:
> I'll bet you're one of those Florida seniors who can't even punch cards right;

Ouch! That brings back memories. But I have likewise described you guys as a bunch of babysitters of
obsolete PL/I applications on IBM big-iron monstrosities written by your current boss 20 yrs ago.
When he retires, look out!

Quote:
> that would explain your sloppy Fortran programming skills.

I keep trying to improve them. Have a look at the comp.lang.fortran message below, posted just yesterday,
showing use of my little GET_TIME program (I think it can even be translated by Robin), and suggest
how it can be improved..

---------------------- begin comp.lang.fortran message ----------
What time is it?</TITLE>
 US Naval Observatory Master Clock time ... </H2>
 Jan. 11, 2003,   16:33:16   Universal    Time
  Jan. 11, 2003,   11:33:16     Eastern Standard  Time
  Jan. 11, 2003,   10:33:16     Central Standard  Time
  Jan. 11, 2003,   09:33:16     Mountain Standard  Time
  Jan. 11, 2003,   08:33:16     Pacific Standard  Time
  Jan. 11, 2003,   07:33:16     Alaska Standard  Time
  Jan. 11, 2003,   06:33:16     Hawaii-Aleutian Standard  Time
 <P>
Time Service Department, US Naval Observatory</A

The above was output to my screen within 1 sec (I have a cable internet connection) by clicking my desktop link
to the GET_TIME program below (I suggest adjusting output text to bright white using desktop link properties)..

It illustrates use of my getfile program http://www.*-*-*.com/ which must be in the same
directory as the calling program..

PROGRAM GET_TIME
USE DFPORT
IMPLICIT NONE
INTEGER :: n, ios
CHARACTER(120) :: dat
CHARACTER(60) :: sUrl = ' http://www.*-*-*.com/ '
CHARACTER(20) :: filename = 'timer.pl'
LOGICAL :: exists

n = SYSTEM( 'GETFILE ' // sUrl // ' nomsg')  ! 2 command line args suppress normal getfile messages
INQUIRE (FILE=filename,EXIST=exists)       ! timer.pl file was downloaded
IF (exists) THEN
   OPEN (1,FILE=filename)
   DO
      READ (1,90,IOSTAT=ios) dat
      IF (ios /= 0) EXIT        ! end of file or read error: leave the loop so CLOSE below is reached
      n = INDEX(dat,'>')
      dat = dat(n+1:)           ! eliminate some html <...> text
      IF (dat /= ' ') WRITE (*,90) TRIM(dat)
   END DO
   CLOSE (1,STATUS='DELETE')              ! discard timer.pl file
END IF
WRITE (*,'(A)',ADVANCE='NO') 'hit <ENTER> to EXIT'   ! standard non-advancing output
READ (*,*)
STOP
90 FORMAT (A)
END PROGRAM
------------------------- end comp.lang.fortran message --------------

Quote:
>BTW, exactly why have you got such a hard-on about PL/1;

I read the PL/I FAQ and disagreed with the statement "more powerful than Fortran" so you can blame Robin
for my current presence, ha ha..
PLUS a couple of years back the majority here wasn't aware Fortran had moved on (methinks they are less ignorant
now).
PLUS if it wasn't for my cage-rattling posts you guys wouldn't have anything much to read in comp.lang.pl1
except "how to enlarge your {*filter*}"

Quote:
> did you sometime in the past have to stay on the ol' IBM 029
> while the PL/1 programmers got the 3270s? Please provide details to

> programs as Slashdot wants to do a story on cutting edge Fortran
> nerds.

Being a 68yo retiree, I'll take the description of being cutting edge anything as a compliment.
Since it's obvious you have google researched my posting history and thus know all about me,
you also know that dozens of testers have tried my benchmark   http://www.*-*-*.com/
on a dozen different computers/compilers..  However those compilers don't include PL/I, which appears
inadequate in syntax to allow a translation and/or lacks knowledgeable proponents of the features it does have
to translate this benchmark.
Thanks for the smile on my face while responding to your message..


Thu, 30 Jun 2005 16:49:04 GMT  

Quote:


> <snip>
> : Discussion:  First of all these values are approximately twice those obtained by
> : David Frank, while his machine operates at about 2.77 times the clock rate of my
> : machine.  Thus for this program, Personal PL/I appears to outperform FORTRAN.
> : The portion of the program being timed is computation bound.  In fact it
> : involves no I/O whatsoever.  This should lay to rest any claim that PL/I code is
> : inefficient in relation to FORTRAN for this problem.
> <snip>
> The benchmark is irrelevant for comparing compiler performance, its
> main ingredient is memory bandwidth:

The original reason for the benchmark was to check that the compiler wasn't degrading performance when using
array syntax vs. do loop operations.  I accept that PL/I appears NOT to degrade performance.
As you point out, Weinkam's results seem fishy when you realize the problem of carrying out these memory moves
on a 300MHz Pentium II (also remember this chip may be the original Celeron, which is a dog without the
on-chip cache).
My old 333MHz Celeron had more than double the performance of the non-cache 300MHz Celeron, either moving
data or multiplying arrays.
Quote:
> In the portion I removed you gave the exact routine you used for timing -- the
> objective was to multiply each element of a 3000x3000 double precision array
> by a double x. Your best timing was around 0.5s. So the result is around
> 18 megaflops -- mediocre for a 300MHz Pentium II. However, to do the
> multiplication one has to read the array and then write it back, meaning
> 72 MB read and 72 MB written, which at 0.5s means 144MB read and 144MB written per
> second -- I would say impressive for a 300MHz laptop.

> --
>                               Waldek Hebisch




Thu, 30 Jun 2005 17:31:58 GMT  

Quote:

> I don't know
> if Fortran allows array expressions in subroutine arguments.

Yes, in general it does.  There are situations where expressions are
not allowed, but those situations mostly have little to do with rank.
For example, you can't use an expression as an actual argument if the
dummy argument is intent(out); that applies to either scalars or arrays.

--
Richard Maine
email: my last name at domain
domain: isomedia dot com



Fri, 01 Jul 2005 01:43:07 GMT  
 