J Performance Improvements 

Joey K. Tuttle of Aptos, CA, uses J to analyze the bill from a
service provider, in the form of a 26 MB text file, and
to convert a 60 MB dBase file into "flat" files more amenable
to manipulation.  The execution times for these applications
have been reduced by attacking several performance bottlenecks:

Unless noted otherwise, all execution times are seconds on
an 80486/50, running J/PC386.

a.  ".y was used to convert a character matrix into a numeric
matrix, and was a major user of CPU time.  I have implemented
a "JKT special" in ". , which scans the character matrix to detect
whether it is a representation of real numbers with the same
number of numbers in each row.  For general y, the scan is
expected to fail almost immediately (with execution reverting
to the general routine), and therefore is expected to incur no
measurable performance penalty; for special y, when the scan
succeeds, a newly written routine is invoked.
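
For concreteness, a minimal J sketch of the kind of measurement
involved (the names m, n, and y are illustrative only; 6!:2 is
the timing foreign, reporting seconds):

   m =. 100
   n =. 10
   y =. ": ? (m,n) $ 1e6   NB. format an m-by-n matrix of random integers
   $ y                     NB. shape: m rows by the formatted width
   6!:2 '". y'             NB. seconds to execute y back into numbers
   $ ". y                  NB. the result is again an m-by-n numeric matrix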

The following table lists the ratio of execution times between
J2.06 (the current version) and Jx (the next version, with the
JKT special) for the expression ".y , where y=.": ?(m,n)$1e6 .
Rows of the table are values of m and columns are values of n.

         1    2    3    4    5    10   20   40   80   100  200  400  800

    1   1.49 1.47 1.57 1.38 1.46 1.34 1.37 1.27 1.29 1.20 1.25 1.29 1.26
    2   2.09 1.86 1.86 1.75 1.77 1.29 1.45 1.45 1.29 1.32 1.28 1.24 1.28
    4   2.91 2.50 2.00 2.45 1.69 1.77 1.56 1.46 1.31 1.31 1.28 1.33 1.23
    8   3.81 4.00 2.75 2.00 2.50 1.87 1.62 1.42 1.34 1.32 1.17 1.29 1.25
   10   4.10 4.45 3.44 1.96 2.26 2.11 1.62 1.45 1.33 1.32 1.18 1.26 1.24
   20   7.00 4.23 3.15 2.82 2.59 1.86 1.63 1.47 1.41 1.39 1.29 1.26 1.25
   40   6.73 4.11 3.60 3.10 2.57 2.01 1.63 1.50 1.32 1.32 1.26 1.26 1.23
   80   7.59 4.89 3.75 3.03 2.73 2.03 1.74 1.50 1.35 1.33 1.28 1.24 1.24
  100   7.51 4.67 3.57 3.06 2.70 2.04 1.67 1.42 1.34
  200   7.43 4.86 3.74 3.11 2.76 1.99 1.65 1.45 1.34
  400   7.84 4.91 3.92 3.33 2.68 2.06 1.67 1.44 1.33
  800   8.24 4.94 3.76 3.07 2.84 2.11 1.65 1.43 1.34
 1000   8.13 5.06 3.94 3.26 2.92
 2000   8.09 5.09 3.90 3.21 2.84
 4000   8.34 5.15 3.87 3.25 2.84
 8000   8.39 5.23 3.91 3.27 2.87
10000   8.39 5.23 3.94 3.29 2.91
20000   8.32 5.30 4.12  wf   wf       wf means J2.06 WS Full-ed but
40000    wf   wf   wf   wf   wf                Jx computed the result
80000    wf   wf   wf  

I conclude that (0) the new scan and conversion routine is at
least 20 percent faster than the old execute; and that (1) when the
number of rows is large relative to the number of numbers per row,
the new routine, essentially implementing integrated rank support,
can be up to 8 times faster than the old routine, with the improvement
coming from removing the overhead of invoking ". for each row (plus
the 20 percent).

b.  The application was signalling WS Full on ;y , and taking a long
time to do so.  y is a 600-element list of character matrices,
individually boxed, and the total number of characters is about 60K.

On investigation, it turns out that the character matrices have
a varying number of columns.  In J2.06,  ;y  uses a fast routine
only in some special cases, and matrices with differing numbers
of columns are not one of them.  The second JKT special is a new
linear general algorithm, replacing the old quadratic one.
The time comparison is quite dramatic:

The arguments y are lists of n boxed character matrices, each with
1 to 10 rows and a varying number of columns; the last column is
the ratio J2.06 % Jx .

   n      J2.06    Jx        %

   1     0.0003  0.0004     0.75
   2     0.0006  0.0005     1.20
   4     0.0011  0.0006     1.83
   8     0.0027  0.0011     2.45
  10     0.0033  0.0016     2.06
  20     0.0087  0.0022     3.95
  40     0.0258  0.0044     5.86
  80     0.0868  0.0082    10.59
 100     0.1264  0.0105    12.04
 200     0.4290  0.0220    19.50
 400       wf    0.0380  
 800       wf    0.0880  
1000       wf    0.1040  
2000       wf    0.2090  
4000       wf    0.4230  
8000       wf    0.8560  
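
By way of illustration, a minimal J sketch of constructing such an
argument and razing it (hypothetical data; a, b, c, and y are
illustrative names, and 6!:2 is the timing foreign):

   a =. ": ? 3 4 $ 1e6     NB. character matrix with 3 rows
   b =. ": ? 5 2 $ 1e6     NB. 5 rows, a different width
   c =. ": ? 1 6 $ 1e6     NB. 1 row, wider still
   y =. a ; b ; c          NB. list of boxed, ragged character matrices
   $ ; y                   NB. raze: 9 rows, padded to the widest matrix
   6!:2 '; y'              NB. seconds to raze y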

c.  From long habit of APL usage, *./ .= (and dot equals)
was used for classification.  In this case, the performance bottleneck
was removed simply by using e. .
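
A small J sketch of the two formulations (hypothetical data; keys
and m are illustrative names):

   keys =. 3 2 $ 'ABCDEF'       NB. table of three 2-character codes
   m    =. 5 2 $ 'CDABEFCDAB'   NB. rows to be classified against keys
   m (*./ .=) |: keys           NB. old way: inner product, 5-by-3 boolean
   m e. keys                    NB. new way: is each row of m a row of keys?
   keys i. m                    NB. related: index in keys of each row of m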

With these improvements (more precisely, with J functions which
model the improved primitives), the 60 MB dBase application runs
in 2.5 minutes instead of 20 and the 26 MB bill analysis runs
in 15 seconds instead of 140 (on a SPARCStation Classic).
I am indebted to Joey for reporting these performance bottlenecks,
and for his kind permission to describe his applications in this
note.  The reader is urged to report other such performance quirks
so that they can be fixed.  Cliff A. Reiter, in his APL95 paper
on Finite Automata, reports a quirk in the tessellation operator;
therefore, I expect that sometime there will be a "CAR Special".


