
Looks like the French WIN!
Quote:
> Date: Thu, 17 Dec 1998 01:54:44 -0700
> I just finished my benchmarks in OCAML (very crude code no doubt), but even
> with the gratuitous creation of large arrays in every iteration the code
> clocks in at 0.63 of Intel Ref C++ for 128x128 and 0.54 for 256x256.
> No outboard C code was needed -- all of this is native OCAML!
> From the standpoint of speed, brevity of source code, and high level of
> non-numerical abstractness for all my other concerns above and beyond the
> numerical modeling aspects, it sure looks like OCAML is ONE HOT COMPILER!!
> ...
Dear David McClain,
thank you very much for your interest in Dylan. I have looked over
your benchmark code in Dylan, C++, and OCAML and have performed some
tests on my machine. I have compared C++ (MSVC++ version 6.0) and
Dylan (Harlequin Dylan version 2.0 alpha 2). The bottom line is that
your Dylan code runs at 0.63 of the speed of C++/MSVC++. Minor source
changes bring it to within 1% of C++/MSVC++, and unrolling the loop in
Dylan as you did in your C++ code yields performance equivalent to
C++/MSVC++.
For reference, the C++ source code can be found in
Dylan-Bench\Projects\Dylan\math-array\base_test\
and the original Dylan source code can be found in
Dylan-Bench\Projects\Dylan\fm-math-array\more-tests\
corresponding to the zip file on your web site. I ran the test for ten
iterations with an edge size of 128. The performance I measured for
MSVC++ on my 450 MHz Pentium II (NT 4.0 SP3) computer with 128 MB of RAM was:
ops/usec = 0.499512
nsec/op = 2001.95
I've assigned this performance a relative speed of 1.0. Your Dylan
code, run unchanged, achieves a relative speed of 0.66.
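The two MSVC++ figures are the same measurement in different units
(nsec/op = 1000 / (ops/usec), since 1 usec = 1000 nsec), and a relative
speed can be read as the baseline's time per operation divided by the
candidate's. A minimal C++ sketch of that arithmetic; the function names
are illustrative and not part of the benchmark code:

```cpp
#include <cassert>

// Throughput in operations per microsecond -> time per operation in
// nanoseconds (1 usec = 1000 nsec).
double nsec_per_op(double ops_per_usec) {
    assert(ops_per_usec > 0.0);
    return 1000.0 / ops_per_usec;
}

// Relative speed of a candidate against a baseline: baseline time per op
// divided by candidate time per op. 1.0 means "as fast as the baseline";
// 0.66 means the candidate needs about 1.5x as long per operation.
double relative_speed(double baseline_nsec_per_op, double candidate_nsec_per_op) {
    assert(candidate_nsec_per_op > 0.0);
    return baseline_nsec_per_op / candidate_nsec_per_op;
}
```

For example, nsec_per_op(0.499512) gives about 2001.95, matching the
figure above, and a relative speed of 0.66 corresponds to roughly
2001.95 / 0.66, or about 3033 nsec/op.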
There are a number of performance problems with your original Dylan
code. The biggest one is your attempt to coerce zern[0] with
as(<mvector>, zern[0])
Unfortunately, this is a rather inefficient coercion mechanism.
A more efficient mechanism would be to bind a typed local variable as
follows
let zern_0 :: <mvector> = zern[0];
and to use zern_0 in place of zern[0]. Doing this increases the
relative speed of the Dylan code to 0.88. (The type check is needed in
the first place because we currently don't fully support limited
vectors, only limited vectors of certain base types, such as double
floats. This should be remedied soon.)
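As a rough C++ analogy (not a description of how Harlequin Dylan
compiles this), the fix amounts to checking an element's type once,
outside the hot loop, and then working through a statically typed local.
Here MVector, Element, and sum_first are hypothetical stand-ins, and
std::variant plays the role of an element whose type is only known at
run time:

```cpp
#include <cassert>
#include <cstddef>
#include <variant>
#include <vector>

// Hypothetical stand-in for the benchmark's <mvector>: a vector of doubles.
using MVector = std::vector<double>;

// Loosely typed elements, roughly mirroring a collection whose element
// type the compiler cannot see statically.
using Element = std::variant<MVector, double>;

// Checks and binds the element type once, before the loop, much like
//   let zern_0 :: <mvector> = zern[0];
// std::get throws std::bad_variant_access if zern[0] is not an MVector.
double sum_first(const std::vector<Element>& zern) {
    assert(!zern.empty());
    const MVector& zern_0 = std::get<MVector>(zern[0]);
    double total = 0.0;
    for (std::size_t i = 0; i < zern_0.size(); ++i) {
        total += zern_0[i];  // typed access, no per-iteration type dispatch
    }
    return total;
}
```

Binding by reference also avoids copying the vector, which a converting
coercion might otherwise do on every use.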
The next inefficiency is the use of extra integer iteration variables
in your for loops. For example,
for(sval in src, ix from 0 below dst.size) ... end;
could be rewritten much more efficiently as
for(sval keyed-by ix in src) ... end;
because ix is already being computed internally. Applying this to all
the integer iteration variables improves the relative speed to 0.99.
This is, in fact, what you had done in the OCAML code.
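In C++ terms, the difference is between maintaining a separate counter
alongside the element iterator and deriving both the element and its
index from a single loop variable. A sketch; the function names and the
scaling operation are hypothetical, not taken from the benchmark:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Redundant form, analogous to
//   for (sval in src, ix from 0 below dst.size) ... end;
// Two induction variables advance in lockstep: an iterator over src and
// a separate integer counter ix bounded by dst.size().
void scale_with_counter(const std::vector<double>& src, std::vector<double>& dst, double k) {
    auto it = src.begin();
    for (std::size_t ix = 0; it != src.end() && ix < dst.size(); ++it, ++ix) {
        dst[ix] = k * *it;
    }
}

// Leaner form, analogous to
//   for (sval keyed-by ix in src) ... end;
// One index is both the position in src and the key into dst.
// Assumes dst is at least as long as src (hypothetical precondition).
void scale_keyed(const std::vector<double>& src, std::vector<double>& dst, double k) {
    assert(dst.size() >= src.size());
    for (std::size_t ix = 0; ix < src.size(); ++ix) {
        dst[ix] = k * src[ix];
    }
}
```

Both produce the same result when dst is at least as long as src; the
second simply has one fewer induction variable to maintain. Whether a
given compiler eliminates the redundant counter on its own varies.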
The final inefficiency is that the loop was not unrolled as you had
done in the C++ code. I wrote a new for-loop macro that makes loop
unrolling easy and used it to unroll the loop in mvaccum, as was done
in the C++ version. This yields a speed almost exactly equal to that
of the C++ version, that is, 1.00.
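Unrolling by a fixed factor with a remainder loop looks roughly like
this in C++. The actual signature of mvaccum and the unroll factor used
in the benchmark are not given here, so both are assumptions: an
element-wise dst += src, unrolled by 4.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical accumulate kernel standing in for mvaccum: dst[i] += src[i].
// The main loop is unrolled by 4; a remainder loop handles the last
// n % 4 elements so any length is processed correctly.
void mvaccum_unrolled(std::vector<double>& dst, const std::vector<double>& src) {
    assert(dst.size() == src.size());
    const std::size_t n = dst.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {  // written as i + 4 <= n to avoid unsigned underflow
        dst[i]     += src[i];
        dst[i + 1] += src[i + 1];
        dst[i + 2] += src[i + 2];
        dst[i + 3] += src[i + 3];
    }
    for (; i < n; ++i) {  // remainder
        dst[i] += src[i];
    }
}
```

The remainder loop keeps the unrolled version correct for lengths that
are not multiples of 4; for a length such as 128 it does no work.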
I achieved this performance with version 2.0 alpha 2 of Harlequin
Dylan, which has yet to be released. I have reason to believe that
version 1.1 would give similar results (although I can't test this, as
I'm on vacation and don't have access to that version). I would be
happy to make my changes to your source code available to you and to
anyone else.
I am delighted by your excitement about Dylan and am committed to
making you a continued happy customer. If you have any more
benchmarks that you would like me to look over, I would be more than
happy to do so.
jonathan bachrach
harlequin inc