Puzzler: Binomial Coefficients

Jim Weigang writes on Saturday, March 23:

Quote:

> Impressive! The remaining difference from assembler is practically

> insignificant compared to the improvement so far. In achieving a

> 15.1x speedup you've gone from 1.743 secs down to 0.115 secs; achieving

> a 19.4 ratio would involve saving only 0.025 additional secs, a mere

> 1.5% of the original figure. (As Clark Wiedmann used to remind me,

> speed ratios can be a deceptive measure of time savings.) ...

How far it has come matters less than how far it has to go.

But am I not complaining in this case: hand coding in C,

coming within 30% of hand-coded assembler is probably the best

that can be expected, esp. when the one-instruction quotient/

remainder facility plays such an important role.

Quote:

> My FACTOR{delta}APL function was a direct translation of the assembler

> program to APL using only scalar operations, and it was provided mostly

> to document the algorithm used by FACTOR. Roger's qco program uses

> vector operations, speeding it up greatly over pure scalar code, so it's

> not really comparable to FACTOR{delta}APL. Here's an APL version of

> qco: ...

Yes, I was aware of that difference. My interest in the model

is not as a direct translation of the C code. Generally,

I find such translations not as useful as a model written in the

best J style. This is the case whether the model is written after

the fact (as for q: here) or especially beforehand, when I am

groping for an understanding of the problem (see the model on the

gamma function posted a few weeks ago). Having written the model

"in the best J style", the hand translation into C is relatively

straightforward.

The timing on qco was to answer the question, regarding execution

speed, how close can the best J that I can write come to the best C

that I can write? (Answer: not yet as close as one would like.)

Quote:

> For the test case of 130000+{iota}5000, QCO runs in 10.78 secs

> (486/66, +II/386 v5.2), which I estimate to be about 40 times slower

> than Roger's latest q: . (But perhaps Roger could time QCO on +III

> using the same computer as for the J3.02x timings.) It is this factor

> of 40, not the 660, which should be compared with J's factor of 89.

I entered the QCO function into APL*PLUS III v1.2 running on NT on

a P90, and the result on the argument 130000+{iota} 5000 (in 0-origin)

is 4.636 seconds, so the estimate of a factor of 40 slower that q:

is very accurate.

q: J3.01 1.7425

q: J3.02x 0.1152

qco J3.02x 10.245

QCO APL*PLUS 4.636

In running QCO, I had to use just a stub for the function primesto,

as I do not have the assember version of that.

{del} z {=.} primesto n

z {=.} (n {>:} PRIMES)/PRIMES

{del}

On the argument in question, this takes 0.0003 seconds, which I

gather is same insignificant magnitude that the assembler function

would have taken, relative to the time for QCO itself.