APL2 on IBM's vector processors 
 APL2 on IBM's vector processors

I read an article in the Oct. 2 Computerworld that IBM has
added support to APL2 for their vector processors.  Does
anyone know anything about this?  I'd like to know what the
support is.  Is it embedded in the implementations of the
different primitive functions, or is it simply an auxiliary
processor or some other loosely coupled approach?

-- Ned          uunet!h-three!ned



Fri, 19 Mar 1993 19:32:00 GMT  
 APL2 on IBM's vector processors
;> I read an article in the Oct. 2 Computerworld that IBM has added
;> support to APL2 for their vector processors.  Does anyone know
;> anything about this?  I'd like to know what the support is.  Is it
;> embedded in the implementations of the different primitive
;> functions, or is it simply an auxiliary processor or some other
;> loosely coupled approach?

A number of primitives will use the vector processing facility (as
long as the arrays are large enough).  Also, some idioms are
recognized and dealt with specially.
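Roughly, the kind of size-threshold dispatch being described might look
like this in C (a sketch only; the threshold value and routine names are
hypothetical, not IBM's):

    #include <stddef.h>

    /* Hypothetical cutover point below which vector-unit startup cost
       outweighs the gain; a real interpreter would tune this per model. */
    #define VECTOR_THRESHOLD 64

    /* Scalar fallback: plain elementwise add. */
    static void add_scalar(const double *a, const double *b,
                           double *r, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            r[i] = a[i] + b[i];
    }

    /* Stand-in for a routine compiled to use the vector facility;
       on a 3090 VF this loop would run as vector instructions. */
    static void add_vector(const double *a, const double *b,
                           double *r, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            r[i] = a[i] + b[i];
    }

    /* The primitive checks the array size before choosing a code path. */
    void apl_plus(const double *a, const double *b, double *r, size_t n)
    {
        if (n >= VECTOR_THRESHOLD)
            add_vector(a, b, r, n);
        else
            add_scalar(a, b, r, n);
    }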
--



Fri, 19 Mar 1993 01:10:00 GMT  
 APL2 on IBM's vector processors

Quote:
>A number of primitives will use the vector processing facility (as
>long as the arrays are large enough).  Also, some idioms are
>recognized and dealt with specially.

I have heard that code is included to recognize when square roots
are called for, so that when, for example, an expression like

                A*.5

comes up, code for square roots is executed rather than code for
exponentiation.  I guess this makes the results on the Harris benchmark
look better than they used to.
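If that is right, the dispatch presumably amounts to checking the
exponent before falling back to the general power routine; a minimal C
sketch (the cases chosen here are guesses, not IBM's actual list):

    #include <math.h>

    /* Special-case cheap exponents before general exponentiation. */
    double apl_power(double base, double exponent)
    {
        if (exponent == 0.5)
            return sqrt(base);           /* A*.5 takes the sqrt path */
        if (exponent == 2.0)
            return base * base;          /* another plausible case */
        return pow(base, exponent);      /* general case */
    }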

--
    L. J. Dickey, Faculty of Mathematics, University of Waterloo.





Fri, 19 Mar 1993 22:51:00 GMT  
 APL2 on IBM's vector processors

Quote:

> I read an article in the Oct. 2 Computerworld that IBM has
> added support to APL2 for their vector processors.  Does
> anyone know anything about this?

We run APL2 on a vector head of a 3090, and have for quite some time. The
primitives invoke vectorized code in conditions where the implementers have
decided it is profitable. I believe this takes into account what size of
vector your machine has and how big the cache memory is, etc. It speeds up
several things by a long shot, but IBM has failed to appropriately vectorize
the otherwise useful quad-divide primitive. We find for large matrices that
the ESSL (Engineering and Scientific Subroutine Library) SVD (Singular Value
Decomposition) code runs up to thirty times faster. Fans of numerical
analysis will observe that the ESSL SVD routine (Golub-Reinsch split with
Chan) is a more robust tool for numerical linear algebra, while the Hanson-Lawson
QR algorithm (now with row balancing) is used for the quad-divide implementation
because it's normally faster than SVD. We almost never use the quad-divide
primitive any more when speed is an issue; we have written quad-NA functions
for ESSL as 'plug compatible' replacements.

We once asked IBM (APL2 systems) why they didn't vectorize quad-divide in the
correct way, and they mentioned that quad-divide had to have the exact same
'DOMAIN ERROR' behavior as previous APL, and this somehow got in their way.
(The point is that the SVD can do cases that the old method can't, so it will
sometimes return a correct answer instead of a DOMAIN ERROR. Uhh, we weren't
too impressed, either. This reasoning was behind their move to row-balanced
Hanson-Lawson, as well.)
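For concreteness, here is why an SVD-based solve can answer where an
exact-rank method signals DOMAIN ERROR: singular values below a tolerance
are simply dropped. A rough C sketch, assuming the factorization
A = U diag(s) V^T has already been computed elsewhere (e.g. by an ESSL
routine); all names below are illustrative:

    #include <stddef.h>

    /* Given the thin SVD A = U * diag(s) * V^T (U is m x n, V is n x n,
       both row-major), compute x = V * diag(1/s[i]) * U^T * b, treating
       singular values at or below tol as exactly zero.  A rank-deficient
       A just loses some terms instead of failing. */
    void svd_solve(size_t m, size_t n,
                   const double *U, const double *s, const double *V,
                   const double *b, double *x, double tol)
    {
        for (size_t j = 0; j < n; j++)
            x[j] = 0.0;

        for (size_t i = 0; i < n; i++) {
            if (s[i] <= tol)
                continue;                  /* dropped, not an error */

            double c = 0.0;                /* c = (column i of U) . b */
            for (size_t k = 0; k < m; k++)
                c += U[k * n + i] * b[k];
            c /= s[i];

            for (size_t j = 0; j < n; j++)
                x[j] += V[j * n + i] * c;  /* x += c * (column i of V) */
        }
    }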

It is now pretty well known that you have to time the primitives extensively
if you intend to use them on a large scale. A particular example is the
APL idiom for uniques in APL2:

(V iota V) = iota rho V ...

which some incarnations recognized as an idiom and interpreted without the
dyadic iota. Then they put in a very fast dyadic iota, which the idiom code
did not use. As a result, you could speed up the uniques construction by
breaking recognition of the idiom, rewriting it as:

(V iota V) = iota ravel rho V ...

and it would use the dyadic iota, and go much faster. You're on your own in
the APL world in this respect.
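For reference, what the idiom computes is just "the first occurrence of
each element is at its own position"; spelled out in C (hypothetical
names, and the quadratic search is exactly what a fast dyadic iota
replaces):

    #include <stddef.h>

    /* (V iota V) = iota rho V:  mask[i] is 1 when the first occurrence
       of v[i] in v is position i itself, i.e. v[i] is a first unique. */
    void uniques_mask(const double *v, size_t n, int *mask)
    {
        for (size_t i = 0; i < n; i++) {
            size_t first = 0;
            while (v[first] != v[i])     /* stops by first == i at latest */
                first++;
            mask[i] = (first == i);
        }
    }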

From the point of view of parallelism, APL primitives are in principle a
great way to exploit parallel hardware. The reason you usually don't get a
huge benefit is that the APL interpreter has to do so much overloading
on a primitive ( + can apply to numeric arrays of any rank, etc. ). The
problem is that you can't really vectorize all the shape and rank
processing that APL must do before it figures out which way to interpret
a primitive. That leaves a nontrivial serial task which bottlenecks the
parallel hardware. We found this to be true on more than one brand of
parallel machine. The recent introduction of array operations in Fortran 8x
is likely to take better advantage of parallel hardware in the near future.
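A crude picture of that serial prologue: before a primitive can touch
the vector hardware, the interpreter has to grind through checks like
these once per call, and none of it vectorizes (the data structure and
error codes below are invented for illustration):

    #include <stddef.h>

    typedef enum { T_BOOL, T_INT, T_FLOAT, T_CHAR } apl_type;

    typedef struct {
        apl_type type;
        size_t   rank;
        size_t   shape[8];
        void    *data;
    } apl_array;

    /* Conformability and domain checks for dyadic + : element type,
       rank, shape, scalar extension.  All inherently serial bookkeeping. */
    int plus_dispatch(const apl_array *a, const apl_array *b)
    {
        if (a->type == T_CHAR || b->type == T_CHAR)
            return -1;                        /* DOMAIN ERROR */

        if (a->rank != 0 && b->rank != 0) {   /* neither is a scalar */
            if (a->rank != b->rank)
                return -2;                    /* RANK ERROR */
            for (size_t i = 0; i < a->rank; i++)
                if (a->shape[i] != b->shape[i])
                    return -3;                /* LENGTH ERROR */
        }

        /* ...only now pick a scalar or vector code path... */
        return 0;
    }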

(Have I enraged enough APL implementers? Good. You guys need a poke every now
and then... :-) )

Later,
Andrew Mullhaupt

Disclaimer: The opinions expressed above are not necessarily those of Morgan
Stanley & Co., Inc.



Fri, 19 Mar 1993 15:12:00 GMT  
 