Performance comparisons of languages 
 Performance comparisons of languages

I have a small numerically intensive benchmark that I have used for
comparing the runtime performance of several languages. This benchmark comes
from optical modeling of wavefront aberrations where the phase of a complex
E-field is assembled from a linear combination of Zernike polynomials
(complete, orthogonal over the unit-disk). Essentially, this benchmark
computes an image plane from a linear combination of 15 other image planes,
then it takes the sine and cosine of every pixel to form the complex
E-field.

The benchmark compares the speed of producing the linear combination of
image arrays and the subsequent sine and cosine of each pixel in the result.
Two image plane sizes were used to judge the effects of memory cache
spillage. The sizes used were 128 x 128 (16384 elements/image) and 256 x 256
(65536 elements/image). All computations were performed in double precision,
the same for all languages compared.
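To make the computation concrete, here is a sketch of the benchmark kernel in NumPy (NumPy is not one of the tested languages; this is purely an illustration). The 15 random planes stand in for the Zernike polynomial images, and the names `benchmark_kernel`, `planes`, and `coeffs` are my own, not from the original code:

```python
import numpy as np

def benchmark_kernel(planes, coeffs):
    # phase = sum_k coeffs[k] * planes[k] -- a weighted sum of image planes
    phase = np.tensordot(coeffs, planes, axes=1)
    # complex E-field from the per-pixel cosine and sine of the phase
    return np.cos(phase) + 1j * np.sin(phase)

rng = np.random.default_rng(0)
planes = rng.random((15, 128, 128))   # 15 image planes, 128 x 128 doubles
coeffs = rng.random(15)               # random combination coefficients
efield = benchmark_kernel(planes, coeffs)
```

Since every pixel of the E-field is cos(x) + i sin(x), its magnitude is identically 1, which is a handy sanity check on any port of the benchmark.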

What I am hoping to find is the following: I use a language called IDL from
Research Systems Inc., a cousin of the PV-Wave numerical modeling language.
These are very primitive languages, but they are vectorized. I want to use a
higher level language so that my production code can run for a while with
the ability to generalize over operating conditions, instead of breaking at
every minor perturbation. (IDL doesn't even have the concept of an empty
list or vector).

At the same time, I hate to give up what speed I already have in runtime
performance with IDL. So as a reference mark, I coded this benchmark in
C/C++ using image planes filled with random numbers, and with random
coefficients. The highest performing C/C++ on my 350 MHz Pentium-II computer
is the Intel Reference Compiler. My main memory is 256 MBytes, and
essentially no disk paging activity occurred in any of these tests. My L-2
cache is 256 KBytes.

The operating system is Windows/NT Workstation 4.0 sp 3 (build 1381).

One hundred iterations of this problem were timed in each run, and the
amortized runtime per pixel is computed. The figures reported here represent
the average of three consecutive runs of each test. Calling the inverse of
the amortized time per pixel the speed of performance, I indicate the Intel
Reference Compiler as unit speed. All other results are relative to that
performance.

Test Languages:

    Intel Reference Compiler C/C++ 3.0
    Microsoft Visual C/C++ 5.0 sp. 3
    GNU C/C++ 2.7-97r2aBeta
    IDL 5.1
    SML/NJ 110.0.3
    Harlequin Dylan 1.1

The tests performed by all three C compilers and that of Dylan rely on a
high-performance templated collection of primitives for performing various
vectorized arithmetic routines.

IDL relies on code written in its native interpreted language.

SML/NJ utilizes strictly inherent capabilities -- using an ARRAY2
implementation based on its Unsafe.Real64Array structure. (Is this boxed? I
don't know.) Array bounds checking is utilized, but only as needed. When an
internal package routine knows that it will be iterating over the entire
array, no bounds checking is performed (I assume that the internal structure
routines are trusted). External requests for sub, and update, do get
scrutinized by a bounds check. Note that I have not tweaked the internal
workings of SML/NJ. This structure was written and executed at the normal
operator interactive panel.
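The checked-boundary / trusted-interior split described above can be sketched as follows (in Python, as a hypothetical stand-in for the SML structure; the class and method names `Array2`, `sub`, `update`, and `modify_all` mirror the post's terminology but the code is mine):

```python
class Array2:
    """2-D array: bounds-checked external access, unchecked internal loops."""
    def __init__(self, nrows, ncols, fill=0.0):
        self.nrows, self.ncols = nrows, ncols
        self.data = [fill] * (nrows * ncols)   # flat backing store

    def _check(self, r, c):
        if not (0 <= r < self.nrows and 0 <= c < self.ncols):
            raise IndexError((r, c))

    def sub(self, r, c):           # external read: scrutinized by a bounds check
        self._check(r, c)
        return self.data[r * self.ncols + c]

    def update(self, r, c, v):     # external write: scrutinized by a bounds check
        self._check(r, c)
        self.data[r * self.ncols + c] = v

    def modify_all(self, f):       # trusted internal routine: iterates the whole
        d = self.data              # array, so no per-element check is needed
        for i in range(len(d)):
            d[i] = f(d[i])

a = Array2(2, 3)
a.update(1, 2, 5.0)
a.modify_all(lambda x: x + 1.0)
```

The point is the one made above: when a routine knows it will touch every element, the per-element check buys nothing, so only `sub` and `update` pay for it.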

Now for the news you've been waiting to hear: Performance!

             Compiler        128x128        256x256
|----------  Intel Ref C       1.0            0.81
|---------   MSVC++            0.88           0.75
|-----       GNU C++           0.47           0.45
|----        SML/NJ            0.38           0.35
|-------     Dylan             0.70           0.58
|---         IDL 5.1           0.27           0.25

The 256x256 column shows some of the effects of spilling out of L2-cache.

I consider the above figures absolutely astounding in light of the high
level of abstractness in the written solution for SML!  And furthermore, not
only do I get to use higher-order logic and reasoning about the problem, it
also runs faster than the status quo in image processing!!! WOW!

Several points worth noting:

1. In order to achieve these speeds for SML I had to abandon the gratuitous
creation of new image arrays for every intermediate arithmetic result. That
gave speeds of 1/3 of IDL. Rather, I adopted 3-address coding style with
pre-allocated destination arrays. This is not as bad as it sounds, because
these arrays would normally be reused throughout our entire modeling runs
anyway.
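Point 1 can be made concrete with a small sketch (NumPy again, purely as an illustration; the original code was SML). `combine_fresh` allocates new intermediates for every term, while `combine_3addr` writes through preallocated destinations in the three-address op(src, src, dst) style described above; the function names are mine:

```python
import numpy as np

def combine_fresh(planes, coeffs):
    # Naive functional style: every "+" and "*" allocates a fresh array.
    acc = np.zeros(planes.shape[1:])
    for c, p in zip(coeffs, planes):
        acc = acc + c * p                 # two new arrays per term
    return acc

def combine_3addr(planes, coeffs, acc, tmp):
    # Three-address style: explicit op(src, src, dst) into preallocated
    # destination arrays, so the loop body allocates nothing.
    acc.fill(0.0)
    for c, p in zip(coeffs, planes):
        np.multiply(p, c, out=tmp)        # tmp <- c * p
        np.add(acc, tmp, out=acc)         # acc <- acc + tmp
    return acc

planes = np.arange(15 * 4 * 4, dtype=float).reshape(15, 4, 4)
coeffs = np.arange(1.0, 16.0)             # 15 coefficients
acc = np.empty((4, 4)); tmp = np.empty((4, 4))
fresh = combine_fresh(planes, coeffs)
three = combine_3addr(planes, coeffs, acc, tmp)
```

Both produce identical results; only the allocation behavior differs, which matches the observation that the destination arrays can simply be reused across entire modeling runs.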

2. Dylan achieves such intense performance by means of external C routines,
getting nearly 70 percent of the machine's capability. Standalone Dylan
performs at roughly 0.9 - 1.4 times the performance of IDL. The Dylan
external interface utilizes the same 3-address coding style as for SML.

At this point I am convinced of the viability of doing image
processing/modeling in a higher order language. My choices are CLisp/CLOS,
Dylan, and SML. All have strengths/weaknesses. I now have to decide which
one I will use for the next major project.

--
David McClain
Sr. Scientist
Raytheon Missile Systems Co.
Tucson, AZ



Fri, 01 Jun 2001 03:00:00 GMT  
 Performance comparisons of languages

Quote:

>At this point I am convinced of the viability of doing image
>processing/modeling in a higher order language. My choices are CLisp/CLOS,
>Dylan, and SML. All have strengths/weaknesses. I now have to decide which
>one I will use for the next major project.

Take a look at the RG (reverse graphics) system from Xerox PARC. They use a
new experimental technique called aspect-oriented programming to get the
quality of hand optimized code with all of the clarity of functional
decomposition.

Take care,
Macneil Shonle



Sat, 02 Jun 2001 03:00:00 GMT  
 Performance comparisons of languages


    Date: Mon, 14 Dec 1998 21:19:02 -0700

    One hundred iterations of this problem were timed in each run, and the
    amortized runtime per pixel is computed. The figures reported here represent
    the average of three consecutive runs of each test. Calling the inverse of
    the amortized time per pixel the speed of performance, I indicate the Intel
    Reference Compiler as unit speed. All other results are relative to that
    performance.

    Test Languages:

        Intel Reference Compiler C/C++ 3.0
        Microsoft Visual C/C++ 5.0 sp. 3
        GNU C/C++ 2.7-97r2aBeta
        IDL 5.1
        SML/NJ 110.0.3
        Harlequin Dylan 1.1

...

    Now for the news you've been waiting to hear: Performance!

                 Compiler        128x128        256x256
    |----------  Intel Ref C       1.0            0.81
    |---------   MSVC++            0.88           0.75
    |-----       GNU C++           0.47           0.45
    |----        SML/NJ            0.38           0.35
    |-------     Dylan             0.70           0.58
    |---         IDL 5.1           0.27           0.25

    The 256x256 column shows some of the effects of spilling out of L2-cache.

Just to be perfectly clear, this table says that if Intel Ref C takes
1 minute to do this benchmark, then GNU C++ takes just over 2 minutes.

    I consider the above figures absolutely astounding in light of the high
    level of abstractness in the written solution for SML!  And furthermore, not
    only do I get to use higher-order logic and reasoning about the problem, it
    also runs faster than the status quo in image processing!!! WOW!

    Several points worth noting:

    1. In order to achieve these speeds for SML I had to abandon the gratuitous
    creation of new image arrays for every intermediate arithmetic result. That
    gave speeds of 1/3 of IDL. Rather, I adopted 3-address coding style with
    pre-allocated destination arrays. This is not as bad as it sounds, because
    these arrays would normally be reused throughout our entire modeling runs
    anyway.

    2. Dylan achieves such intense performance by means of external C routines,
    getting nearly 70 percent of the machine's capability. Standalone Dylan
    performs at roughly 0.9 - 1.4 times the performance of IDL. The Dylan
    external interface utilizes the same 3-address coding style as for SML.

That is, standalone Dylan is 4 to 6 times _slower_ than Intel Ref C,
but by using the FFI to call out to some critical routines, you got it
to run at about 70% of the speed of Intel Ref C.

Would it be possible to send me the standalone Dylan version of this
benchmark (and your test data)?  I would like to (1) have a crack at
tuning it a bit (I'll send the result back to you), and (2) make sure
our compiler people see it in case there is anything obvious to be
done.  I will see if I can benchmark it here under both HD 1.1 and
under HD pre-2.0; the 2.0 compiler has much better support for such
things as limited vectors, which (if you aren't already using them)
can save lots of time boxing/unboxing FP data.

    At this point I am convinced of the viability of doing image
    processing/modeling in a higher order language. My choices are CLisp/CLOS,
    Dylan, and SML. All have strengths/weaknesses. I now have to decide which
    one I will use for the next major project.

You should definitely use Dylan.  ;-)

Seriously, I think there's a lot to be said for the hybrid approach of
using a language like Dylan for higher-level code, calling out to C
for bit-twiddling and floating point computations.  But a little
tuning might get straight Dylan to come even closer to C.



Sat, 02 Jun 2001 03:00:00 GMT  
 Performance comparisons of languages
I am surprised you get such poor performance from IDL. I would have expected
something near C performance for properly written code involving relatively
large arrays. The couple of times I have competed with respected C programmers
in my group, I have matched their performance. While IDL is not a compiled
language, its overhead should be small for large arrays, where its primitives
are optimized. Your results are counterintuitive, with relative performance not
increasing with array size. This suggests that, at least in this case, your
coding might be the problem rather than the implementation of the language. I
have two obvious questions:

1. Are you using array arithmetic as much as possible, and not relying on
array-element access or array subsections, where the language's overhead has a
serious impact?

2. Are you assuming that array indexing is as in C rather than as in Fortran?
Indexing in IDL should be as in Fortran, with the leftmost index varying most
rapidly. Be aware that the language's documentation plays Humpty Dumpty in
trying to make itself sound like C rather than Fortran: it notes that C is
commonly described as row major, but defines C array accesses as
[column, row] and IDL's accesses as [row, column].
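The layout difference behind question 2 can be sketched in NumPy (purely as an illustration; neither IDL nor C is run here). 'C' order is row-major and 'F' (Fortran/IDL) order is column-major, so the index that should vary fastest in a loop differs:

```python
import numpy as np

a_c = np.zeros((1000, 1000), order='C')   # row-major: last index contiguous
a_f = np.zeros((1000, 1000), order='F')   # column-major: first index contiguous

# Strides in bytes for 8-byte doubles: stepping along the contiguous index
# moves 8 bytes; stepping along the other index jumps a whole row/column.
print(a_c.strides)   # (8000, 8)
print(a_f.strides)   # (8, 8000)
```

Looping over the non-contiguous index in the inner loop touches a new cache line on every element, which is exactly the kind of mistake a C programmer can make when porting to a Fortran-ordered language like IDL.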

William B. Clodius



Sun, 03 Jun 2001 03:00:00 GMT  
 Performance comparisons of languages
William,

I have been using IDL now for nearly 10 years... The lack of growth with
increasing size is apparent in all the other higher-level languages too,
except for the Dylan + C-libs approach. All of the C-libs tests show about a
21% hit when the array size quadruples, while the higher-level languages show
only around a 7% hit. I thought about this for a while, and it seems to me
that it can be explained by the far greater utilization of L1 + L2 cache by
the C-based tests. Thus, when the array size quadruples and no longer fits in
the cache, the performance drop is greater than for those approaches which
utilized the caches less well to begin with.

I must say that when I started this test, I was under the impression too,
that IDL would be quite efficient overall. After all, I have been told by
them that the low-level vector routines have been written in C. However, in
light of these results, I have to wonder if they are messing around at the C
level doing things like array bounds checking, and what not...

Anyway, here is my IDL code... See for yourself what gives?

As an aside, I have dropped using their FFT routines for 2-D image
processing because, being routines that handle arbitrary image sizes rather
than just powers of 2, they run in O(N^2 (log2 N)^2) instead of the more
reasonable O(N^2 log2 N) that one gets using power-of-2 decimation. So for
the past two years, I have been using a high-performance, multithreaded, 2-D
FFT based on the Intel Math Kernel 1-D routines. This routine splits the
arrays into as many chunks as you have processors and runs the chunks in
parallel. On my machine here with only a single processor, I can do 256 x
256 complex/complex FFT's at around 30 Hz. Try that in IDL!
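The row/column decomposition behind such a 2-D FFT can be sketched as follows, with `numpy.fft` standing in for the Intel Math Kernel 1-D routines (an assumption; the actual multithreaded code is not shown in the post):

```python
import numpy as np

def fft2_from_1d(img):
    # 2-D FFT built from 1-D FFTs: transform every row, then every column.
    # The passes over rows (or columns) are independent, which is what lets
    # a multithreaded version split the array into chunks per processor.
    rows = np.fft.fft(img, axis=1)    # 1-D FFT along each row
    return np.fft.fft(rows, axis=0)   # then along each column

rng = np.random.default_rng(1)
img = rng.random((256, 256)) + 1j * rng.random((256, 256))
spectrum = fft2_from_1d(img)
```

The result agrees with a library 2-D FFT, since the 2-D transform is separable into 1-D transforms along each axis.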

Anyway, thanks for your interest...

DM
--------------------------------------------------------------------------
;; tstspeed.pro
;;
;; DM/RMSC 07/98
;; -----------------------------------------------
pro null,x,y,z
end

pro doitdp,limit
nx = 256
x = dindgen(nx,nx)
xarr = dblarr(nx,nx,15)
coff = 1 + findgen(15)
for ix = 0,14 do xarr[0,0,ix] = 0.01*dindgen(128,128)
nel = n_elements(x)
print,'N Elements: ',n_elements(x)
limit = long(limit)
nops = float(limit)*float(nel)
tstart = systime(1)
for jx=0L,limit-1 do begin
  null,x2,x,1.0d0
end
tstop = systime(1)
oh = tstop - tstart
t = 0
x2 = x
tstart = systime(1)
for jx=0L,limit-1 do begin
  x = dblarr(nx,nx)
  for kx = 0,14 do x = x + coff[kx]*xarr[*,*,kx]
  y = dcomplex(cos(x),sin(x))
end
tstop = systime(1)
print,'total: ',t
print,'NOps: ',n_elements(x)*float(limit)
dur = (tstop - tstart - oh)
print,'Dur: ', dur
print,'Overhead: ',oh
print,'ns/op: ',1e9*dur/nops
print,'ops/us: ',nops/dur*1e-6
end

Quote:

>I am surprised you get such poor performance from IDL. I would have expected
>something near C performance for properly coded code involving relatively
>large arrays. The couple of times I have competed with respected C
>programmers in my group I have matched their performance. While IDL is not a
>compiled language its overhead should be small for large arrays, where its
>primitives are optimized. Your results are counterintuitive with relative
>performance not increasing with array size. This suggests, that at least in
>this case, your coding might be a problem and not the implementation of the
>language. I have two obvious questions:

>1. Are you using array arithmetic as much as possible and not relying on
>array element or array subsections where the language's overhead has a
>serious impact?

>2. Are you assuming that array indexing is as in C and not as in Fortran?
>Indexing in IDL should be as in Fortran with the leftmost index varying most
>rapidly. Be aware that the language's documentation plays Humpty Dumpty, in
>trying to make itself sound like C rather than Fortran, defining itself
>noting that C is commonly described as row major, but defining C array
>accesses as [column, row], and IDL's accesses as [row,column].

>William B. Clodius



Sun, 03 Jun 2001 03:00:00 GMT  
 Performance comparisons of languages
Are you saying this because you believe it and because this is the Dylan
thread, or do you know something I don't know?

DM

Quote:


>...

>You should definitely use Dylan.  ;-)

>Seriously, I think there's a lot to be said for the hybrid approach of
>using a language like Dylan for higher-level code, calling out to C
>for bit-twiddling and floating point computations.  But a little
>tuning might get straight Dylan to come even closer to C.



Sun, 03 Jun 2001 03:00:00 GMT  
 Performance comparisons of languages
I have been following the Xerox PARC RG and aspect-oriented programming work
for some time now... I would like to get my hands on some of it, but PARC
seems unwilling to post their code -- only their papers seem to be available.
In fact, I checked again this morning after getting your input...

However, RG is used for production of fast code, not for anything
interactive. While that is great once the required processing is understood,
one needs to interact with the data in the beginning. The approach I am
considering will do both, though probably without the immense speedup that RG
would give.

Thanks for the interest!

DM

Quote:


>>At this point I am convinced of the viability of doing image
>>processing/modeling in a higher order language. My choices are CLisp/CLOS,
>>Dylan, and SML. All have strengths/weaknesses. I now have to decide which
>>one I will use for the next major project.

>Take a look at the RG (reverse graphics) system from Xerox PARC. They use a
>new experimental technique called aspect-oriented programming to get the
>quality of hand optimized code with all of the clarity of functional
>decomposition.

>Take care,
>Macneil Shonle



Sun, 03 Jun 2001 03:00:00 GMT  
 Performance comparisons of languages
I will be happy to provide my source codes for all of these tests to
everyone and anyone. Judge for yourselves. In fact, I am interested in just
how much difference is seen with differences in machine architecture, memory
size, etc.

Also, if any of you have some good ideas, I am eager to hear them. I can
never acquire too much knowledge!

Quote:
>--
>David McClain
>Sr. Scientist
>Raytheon Missile Systems Co.
>Tucson, AZ



Sun, 03 Jun 2001 03:00:00 GMT  
 Performance comparisons of languages
I should also point out, that one of the things I am watching in these tests
is just how easily I can achieve the max performance. What I find in Dylan,
is that straight Dylan code gets me a working prototype, but in order to get
reasonable production speed I need to carefully pepper the method headers
and bindings with well chosen types.

As a result, this is not an easy thing for my novice colleagues. I will
undoubtedly have to examine the "production" modules by hand to tune them to
acceptable performance. With ML I don't have to work this hard, but then
again, I work harder on other things there...

As a compromise best language, I can be easily convinced that Dylan is the
way to go... I can test out my ideas, ignoring performance, because as one
colleague correctly pointed out, there is much to be said for rapid
prototyping of correct code. If the program runs in 20 seconds as opposed to
2 seconds, it hardly matters because we will all go off and ponder the
results for several days afterward.

So why do I care so much about speed? Well, ease of writing code comes
first, maintainability second, along with the generality to adapt easily to
slight perturbations in the operating conditions. My original frustration
with IDL was that it is too nitpicking for many things, it fails to offer
good default behavior, and it is inconsistent. It appears to have been hacked
out of old Fortran over the past 15 years, and it shows. As I said, and as I
have complained to them often, the silly language doesn't even have the
concept of an empty list or vector!

DM

Quote:


>...

>Seriously, I think there's a lot to be said for the hybrid approach of
>using a language like Dylan for higher-level code, calling out to C
>for bit-twiddling and floating point computations.  But a little
>tuning might get straight Dylan to come even closer to C.



Sun, 03 Jun 2001 03:00:00 GMT  
 Performance comparisons of languages

Quote:

>    Now for the news you've been waiting to hear: Performance!

>                 Compiler        128x128        256x256
>    |----------  Intel Ref C       1.0            0.81
>    |---------   MSVC++            0.88           0.75
>    |-----       GNU C++           0.47           0.45
>    |----        SML/NJ            0.38           0.35
>    |-------     Dylan             0.70           0.58
>    |---         IDL 5.1           0.27           0.25

>    The 256x256 column shows some of the effects of spilling out of L2-cache.

>Just to be perfectly clear, this table says that if Intel Ref C takes
>1 minute to do this benchmark, then GNU C++ takes just over 2 minutes.

I must say that I was initially surprised by the poor showing of GNU C++. A
benchmark the other day had shown it about 20% faster than MSVC++, and I was
expecting to see the same this time around.

What I found is something we should all pay attention to... Benchmarks of
stupid simple little things, like how fast you can add a list of numbers, are
too small to give a reasonable representation of performance. Even my little
benchmark is dangerously small. Very tiny benchmarks give distorted views.

In fact, the same tiny benchmark that showed GNU C++ 20% faster than MSVC++
also showed the Intel Ref C++ as consistently 5% slower than MSVC++. What I
found in the current benchmark, with the help of Intel's VTune, was that a
larger body of code gave their compiler more opportunity to reorder, unroll
loops, and generally avoid pipeline stalls. That wasn't the case when I
compared a loop consisting of only 10 instructions. The current benchmark
produced several dozen lines of assembly.

Just a note of caution...

DM



Sun, 03 Jun 2001 03:00:00 GMT  
 Performance comparisons of languages
Quote:

> <snip>

As IDL code, your code (at least in the timed portion) was written as
something that is readily vectorizable. It is not clear to me how
straightforward a translation to another language would be; in translation,
this code may not be testing what you think it is testing. There are a lot of
potential optimizations in a compiler that can turn apparent ops into no-ops.
IDL is unlikely to make these optimizations, as it confines its standard
analyses to the expression level. Note that my comments precede the
questionable code.

Quote:
> ;; tstspeed.pro
> ;;
> ;; DM/RMSC 07/98
> ;; -----------------------------------------------

IDL could recognize this as a null operation and optimize it away.
Quote:
> pro null,x,y,z
> end

> pro doitdp,limit
> nx = 256
> x = dindgen(nx,nx)
> xarr = dblarr(nx,nx,15)
> coff = 1 + findgen(15)

I don't understand the implications of the 0's in xarr[0,0,ix]. It's an
idiom I have never seen in IDL. I have seen similar things in Fortran,
but the semantics need not be the same.
Quote:
> for ix = 0,14 do xarr[0,0,ix] = 0.01*dindgen(128,128)
> nel = n_elements(x)
> print,'N Elements: ',n_elements(x)
> limit = long(limit)
> nops = float(limit)*float(nel)
> tstart = systime(1)

Why calculate this? It seems as if you consider this as a measurement of
overhead of user defined procedure calls. The closest subsequent matchup
I can see is with y = dcomplex(cos(x),sin(x)), but language intrinsics
need not be implemented the same as user procedures and y =
dcomplex(cos(x),sin(x)) involves a number of intrinsic procedure calls.
Quote:
> for jx=0L,limit-1 do begin
>   null,x2,x,1.0d0
> end
> tstop = systime(1)
> oh = tstop - tstart
> t = 0
> x2 = x
> tstart = systime(1)

A smart compiler could recognize that this loop only needs to be computed
zero times (if limit <= 0) or once (if limit > 0), as all its calculations
are constants. A slightly smarter compiler could realize that y is never
used and only the size of x is used, so that none of the calculations
need to be performed.
Quote:
> for jx=0L,limit-1 do begin

This calculates a constant that can be propagated out of the loop.
Quote:
>   x = dblarr(nx,nx)

This loop is difficult to optimize, but its result is a constant that
could be propagated out of the loop.
Quote:
>   for kx = 0,14 do x = x + coff[kx]*xarr[*,*,kx]

y is never used, so its calculation can be optimized away by a compiler.
I expect it is the most expensive expression in IDL. At the statement level
it implies an array creation, two transcendental calculations, and
assignments.
Quote:
>   y = dcomplex(cos(x),sin(x))
> end
> tstop = systime(1)
> print,'total: ',t

The following is an odd definition of ops. I would have expected an
additional scale factor.
Quote:
> print,'NOps: ',n_elements(x)*float(limit)

Should you really be subtracting oh below? Note that on my machine it
scarcely matters; oh is so small.

Quote:
> dur = (tstop - tstart - oh)
> print,'Dur: ', dur
> print,'Overhead: ',oh
> print,'ns/op: ',1e9*dur/nops
> print,'ops/us: ',nops/dur*1e-6
> end
> <snip>

--

William B. Clodius              Phone: (505)-665-9370
Los Alamos Nat. Lab., NIS-2     FAX: (505)-667-3815
PO Box 1663, MS-C323            Group office: (505)-667-5776



Sun, 03 Jun 2001 03:00:00 GMT  
 Performance comparisons of languages

Quote:

> comments precede questionable code.

>> ;; tstspeed.pro
>> ;;
>> ;; DM/RMSC 07/98
>> ;; -----------------------------------------------
>IDL could recognize this as a null operation and optimize it away.
>> pro null,x,y,z
>> end

Yes, but IDL is not a smart compiler. Actually this snippet was a leftover
from earlier tests that another note warns about...

Quote:
>> pro doitdp,limit
>> nx = 256
>> x = dindgen(nx,nx)
>> xarr = dblarr(nx,nx,15)
>> coff = 1 + findgen(15)
>I don't understand the implications of the 0's in xarr[0,0,ix]. Its an
>idiom I have never seen in IDL. I have seen similar things in Fortran,
>but the semantics need not be the same.
>> for ix = 0,14 do xarr[0,0,ix] = 0.01*dindgen(128,128)

This is shorthand syntax for xarr[*,*,ix] = ...
It says that, starting at element location [0,0,ix], place the RHS array...
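The same full-slice assignment can be illustrated in NumPy (a small stand-in cube here, not the 256 x 256 x 15 array from the benchmark):

```python
import numpy as np

nx = 4
xarr = np.zeros((nx, nx, 3))     # small stand-in for the image-plane cube
plane = 0.01 * np.arange(nx * nx, dtype=float).reshape(nx, nx)

# NumPy analogue of IDL's xarr[0,0,ix] = plane, i.e. xarr[*,*,ix] = plane:
ix = 1
xarr[:, :, ix] = plane           # the RHS plane fills slice ix in place
```

The other slices are untouched; only the addressed plane is overwritten.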

Quote:
>> nel = n_elements(x)
>> print,'N Elements: ',n_elements(x)
>> limit = long(limit)
>> nops = float(limit)*float(nel)
>> tstart = systime(1)
>Why calculate this? It seems as if you consider this as a measurement of
>overhead of user defined procedure calls. The closest subsequent matchup
>I can see is with y = dcomplex(cos(x),sin(x)), but language intrinsics
>need not be implemented the same as user procedures and y =
>dcomplex(cos(x),sin(x)) involves a number of intrinsic procedure calls.

True enough, but this too was a leftover attempt to subtract out whatever
overhead the loop causes in IDL. It is measurable on the scale of
this test; I was attempting to measure the overhead of the outer
loop control.
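The overhead-subtraction idea is a standard timing technique: time an empty loop first, then subtract that from the timed kernel. A minimal sketch in Python (the names `noop`, `oh`, and `dur` are mine, chosen to mirror the IDL listing):

```python
import time

def noop(a, b, c):
    # stand-in for IDL's empty "pro null,x,y,z"
    pass

limit = 100_000

# pass 1: measure outer-loop and call overhead with a do-nothing body
t0 = time.perf_counter()
for _ in range(limit):
    noop(None, None, 1.0)
oh = time.perf_counter() - t0

# pass 2: measure the real work, then subtract the overhead estimate
t0 = time.perf_counter()
acc = 0.0
for _ in range(limit):
    acc += 1.0                 # stand-in for the real per-iteration work
dur = (time.perf_counter() - t0) - oh
```

As the discussion notes, this only subtracts cleanly if the empty loop's overhead actually matches the full loop's control overhead, which an optimizer can spoil.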

Quote:
>> for jx=0L,limit-1 do begin
>>   null,x2,x,1.0d0
>> end
>> tstop = systime(1)
>> oh = tstop - tstart
>> t = 0
>> x2 = x
>> tstart = systime(1)
>A smart compiler could recognize that this loop only needs to be executed
>zero times (if limit <= 0) or once (if limit > 0), as all its calculations
>are constants. A slightly smarter compiler could realize that Y is never
>used and only the size of x is needed, so that none of the calculations
>need to be performed.

Again, IDL is not a smart compiler. Rather, it compiles everything to a
p-code (rather like FORTH) and executes it. Why, it doesn't even have
short-circuit evaluation of conditional expressions!
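For contrast, here is what short-circuit evaluation looks like in a language that has it. In Python, `and` and `or` stop as soon as the result is determined, so the second operand is never evaluated (the `boom` helper is mine, purely for illustration):

```python
def boom():
    # would blow up if it were ever called
    raise RuntimeError("should never run")

# `and` stops at the first falsy operand; boom() is never called
result_and = False and boom()

# `or` stops at the first truthy operand; boom() is never called
result_or = True or boom()
```

Without short-circuiting, both operands of every conditional are evaluated unconditionally, which is the behavior being complained about here.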

Quote:
>> for jx=0L,limit-1 do begin
>this calculates a constant that can be propagated out of the loop
>>   x = dblarr(nx,nx)
>this loop is difficult to optimize, but its result is a constant that
>could be propagated out of the loop.
>>   for kx = 0,14 do x = x + coff[kx]*xarr[*,*,kx]
>Y is never used, so its calculation can be optimized away in a compiler.
>I expect it is the {*filter*} expression in IDL. On a statement level it
>implies an array creation, two transcendental calculations and
>assignments
>>   y = dcomplex(cos(x),sin(x))

You know... if IDL were smart, you would be absolutely correct about all the
possible optimizations. But I think the figures reported in the
benchmark rather prove how stupid the compiler is...

DM



Sun, 03 Jun 2001 03:00:00 GMT  
 Performance comparisons of languages
... from a posting on the comp.lang.ml thread...

Quote:

>You may find SAC worth looking at, it's at

>http://www.informatik.uni-kiel.de/~sacbase/index.html

Thanks, I just had a look at that URL. I should say that although I work
with numerically intensive models and data reduction, I have found that
approximately 80% of my effort is actually spent on non-numerical aspects,
such as file management, data acquisition automation, and other
organizational sorts of things.  That is what is so frustrating about IDL.

If all I needed were numerical modeling capabilities I should be quite happy
to stay with IDL -- it is interactive, vectorized, has nice built-in
graphics, etc.  But when it comes to doing anything non-numerical it is an
absolute nightmare!

I find it quite interesting that even though my analysis work is so heavily
tilted toward numerics, numerics accounts for only 20% of my programming
effort....

DM



Sun, 03 Jun 2001 03:00:00 GMT  
 Performance comparisons of languages

Quote:

> <snip>
> You know... If IDL were smart you are absolutely correct about all the
> possible optimizations. But I think that the figures reported in the
> benchmark sort of prove how stupid the compiler is...

> DM

My point was intended to be not that IDL could do this, but that a
compiler could, that you were comparing IDL with compiled languages, and
that similar optimizations could occur in a straightforward translation
of the code to another language. A perennial problem with benchmarks is
being sure that you are measuring what you think you are measuring. It
is common to write a benchmark that never uses its intermediate
calculations, to assume that what you are measuring is the effort of
performing those calculations (which real code would use), and then to
be surprised to discover that some fraction of them were optimized away,
when that optimization would not have been valid for real code.
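The usual defense against this pitfall is to consume every result and publish the aggregate, so no calculation is dead as far as the optimizer can tell. A hedged Python sketch of the pattern (the `kernel`/`sink` names are mine; CPython itself does no dead-code elimination, so this only illustrates the discipline a compiled-language benchmark needs):

```python
import time
import numpy as np

def kernel(x):
    # the work we intend to measure: the benchmark's per-pixel cos/sin
    return np.cos(x) + 1j * np.sin(x)

x = np.linspace(0.0, 1.0, 65536)    # one 256 x 256 plane's worth of pixels
sink = 0.0                          # accumulator that keeps results "live"

t0 = time.perf_counter()
for _ in range(10):
    y = kernel(x)
    sink += y.real[0]               # a cheap use of each result
dur = time.perf_counter() - t0

print(sink)                         # publishing the sink defeats dead-code elimination
```

Because the printed value depends on every iteration's result, a conforming optimizer cannot delete the loop body, and the measured time stays honest.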

--

William B. Clodius              Phone: (505)-665-9370
Los Alamos Nat. Lab., NIS-2     FAX: (505)-667-3815
PO Box 1663, MS-C323            Group office: (505)-667-5776



Sun, 03 Jun 2001 03:00:00 GMT  
 
 [ 14 post ] 
