J Performance Improvements

Joey K. Tuttle of Aptos, CA, uses J to analyze the bill from a

service provider, in the form of a 26 MB text file, and

to convert a 60 MB dBase file into "flat" files more amenable

for manipulations. The execution time for these applications

have been reduced by attacking several performance bottlenecks:

Unless noted otherwise, all execution times are seconds on

a 80486/50, running J/PC386.

a. ".y was used to convert a character matrix into a numeric

matrix, and was a major user of CPU time. I have implemented

as "JKT special" in ". , which scans the character matrix to detect

whether it is a representation of real numbers with the same

number of numbers in each row. For general y, the scan is

expected to fail almost immediately (with execution reverting

to the general routine), and therefore is expected to incur no

measurable performance penalty; for special y, when the scan

succeeds, a newly written routine is invoked.

The following table lists the ratio of execution times between

J2.06 (the current version) and Jx (the next version, with the

JKT special) for the expression ".y , where y=.": ?(m,n)$1e6 .

1 2 3 4 5 10 20 40 80 100 200 400 800

1 1.49 1.47 1.57 1.38 1.46 1.34 1.37 1.27 1.29 1.20 1.25 1.29 1.26

2 2.09 1.86 1.86 1.75 1.77 1.29 1.45 1.45 1.29 1.32 1.28 1.24 1.28

4 2.91 2.50 2.00 2.45 1.69 1.77 1.56 1.46 1.31 1.31 1.28 1.33 1.23

8 3.81 4.00 2.75 2.00 2.50 1.87 1.62 1.42 1.34 1.32 1.17 1.29 1.25

10 4.10 4.45 3.44 1.96 2.26 2.11 1.62 1.45 1.33 1.32 1.18 1.26 1.24

20 7.00 4.23 3.15 2.82 2.59 1.86 1.63 1.47 1.41 1.39 1.29 1.26 1.25

40 6.73 4.11 3.60 3.10 2.57 2.01 1.63 1.50 1.32 1.32 1.26 1.26 1.23

80 7.59 4.89 3.75 3.03 2.73 2.03 1.74 1.50 1.35 1.33 1.28 1.24 1.24

100 7.51 4.67 3.57 3.06 2.70 2.04 1.67 1.42 1.34

200 7.43 4.86 3.74 3.11 2.76 1.99 1.65 1.45 1.34

400 7.84 4.91 3.92 3.33 2.68 2.06 1.67 1.44 1.33

800 8.24 4.94 3.76 3.07 2.84 2.11 1.65 1.43 1.34

1000 8.13 5.06 3.94 3.26 2.92

2000 8.09 5.09 3.90 3.21 2.84

4000 8.34 5.15 3.87 3.25 2.84

8000 8.39 5.23 3.91 3.27 2.87

10000 8.39 5.23 3.94 3.29 2.91

20000 8.32 5.30 4.12 wf wf wf means J2.06 WS Full-ed but

40000 wf wf wf wf wf Jx computed the result

80000 wf wf wf

I conclude that (0) the new scan and conversion routine is at

least 20 percent faster than the old execute; and that (1) when the

number of rows is large relative to the number of numbers per row,

the new routine, essentially implementing integrated rank support,

can be up to 8 times faster than the old routine, with the improvement

coming from removing the overhead in invoking ". for each row (plus

the 20 percent).

b. The application was signalling WS Full on ;y , and taking a long

time to do so. y is a 600-element list of character matrices,

individually boxed, and the total number of characters is about 60K.

On investigation, it turns out that the character matrices have

a varying number of columns. In J2.06, ;y uses a fast routine

only in some special cases, and matrices with non-identical number

of columns is not one of them. The second JKT special is a new

linear general algorithm, replacing the old quadratic one.

The time comparison is quite dramatic:

The argument y are boxed character matrices with 1 to 10 rows and

n J2.06 Jx %

1 0.0003 0.0004 0.75

2 0.0006 0.0005 1.20

4 0.0011 0.0006 1.83

8 0.0027 0.0011 2.45

10 0.0033 0.0016 2.06

20 0.0087 0.0022 3.95

40 0.0258 0.0044 5.86

80 0.0868 0.0082 10.59

100 0.1264 0.0105 12.04

200 0.4290 0.0220 19.50

400 wf 0.0380

800 wf 0.0880

1000 wf 0.1040

2000 wf 0.2090

4000 wf 0.4230

8000 wf 0.8560

c. From the habit of long APL usage, *./ .= (and dot equals)

was used for classification. In this case, the performance bottleneck

was removed simply by using e. .

With these improvements (more precisely, with J functions which

model the improved primitives), the 60 MB dBase application runs

in 2.5 minutes instead of 20 and the 26 MB bill analysis runs

in 15 seconds instead of 140 (on a SPARCStation Classic).

I am indebted to Joey for reporting these performance bottlenecks,

and for his kind permission to describe his applications in this

note. The reader is urged to report other such performance quirks

so that they can be fixed. Cliff A. Reiter, in his APL95 paper

on Finite Automata, reports a quirk in the tesselation operator;

therefore, I expect that sometime there will be a "CAR Special".