Sparse Array Storage 
 Sparse Array Storage


Sparse (or nearly sparse) arrays occur frequently in commercial data because of missing
values or partly used fields.  When storing such data as partly inverted structures, it is
desirable to save space and I/O by compressing out the empty characters or numbers.

The following tacit verbs can be used to pack and unpack such data prior to storage.
Since J stores bits at the character level, these too are packed and unpacked by pbit and upbit.

A fuller set, as well as related articles and J code for inversion and analysis, can be found at
http://www.*-*-*.com/ ~gordon/index.html

NB. Type 1 - pack empty ch or zero value fields  
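
The definitions themselves are not shown here.  A minimal sketch of the idea, assuming the
Type 1 pack keeps a boolean mask of the occupied positions together with the compressed
values (p1 and u1 are placeholder names, not the posted verbs):

NB. hedged sketch: pack out empty characters or zero values, and restore them
p1 =: 3 : 0
 f =. {. 1 {. 0 $ y     NB. fill of y's type: 0 for numeric, blank for literal
 b =. y ~: f            NB. mask of occupied positions
 b ; b # y              NB. mask (could itself be bit-packed) ; kept values
)
u1 =: 3 : 0
 'b v' =. y             NB. open the mask and the kept values
 f =. {. 1 {. 0 $ v
 (/: /: b) { (((# b) - +/ b) # f) , v   NB. scatter v back into the 1-positions of b
)

   'AB   C  D ' -: u1 p1 'AB   C  D '
1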


NB. basic bit pack used on flags and in others to compress bits
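
The pbit and upbit definitions are likewise missing here.  A plausible sketch, assuming
pbit packs a boolean vector into integers 31 bits at a time and upbit is given the
original bit count as its left argument:

NB. hedged sketch of pbit/upbit (not necessarily the posted definitions)
pbit =: 3 : 0
 n =. 31 * >. (# y) % 31     NB. round the bit count up to a multiple of 31
 #. _31 [\ n {. y            NB. zero-pad, split into 31-bit groups, evaluate in base 2
)
upbit =: 4 : 'x {. , (31 # 2) #: y'   NB. x is the original number of bits

   b =. ? 35 # 2             NB. 35 random bits, spanning two 31-bit groups
   b -: (# b) upbit pbit b
1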


NB. Type special - pack sparse arrays of numbers  
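
Again the posted verbs are missing.  A sketch in the spirit of the discussion later in the
thread (keep the shape, bit-pack the nonzero mask of the ravelled array, and keep the
nonzero values), reusing the pbit/upbit sketched above; psp and usp are placeholder names:

NB. hedged sketch: pack/unpack a sparse numeric array (uses the pbit/upbit above)
psp =: 3 : 0
 b =. 0 ~: , y               NB. mask of nonzero entries of the ravelled array
 ($ y) ; (pbit b) ; b # , y  NB. shape ; bit-packed mask ; nonzero values
)
usp =: 3 : 0
 'sh pm v' =. y              NB. shape, packed mask, values
 b =. (*/ sh) upbit pm       NB. recover the nonzero mask
 sh $ (/: /: b) { (((# b) - # v) # 0) , v   NB. scatter the values, restore the shape
)

   a =. 3 4 $ 0 0 7 0 0 5 0 0 0 0 2 0
   a -: usp psp a
1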


NB. utilities
NB. sor0: dispatch on datatype: literal (2 = 3!:0) goes to nots, numeric to not0
sor0 =: 3 : 'if. 2=3!:0 y. do. nots y. else. not0 y. end.'

ntoc =: ({&a.)    NB. numbers to characters: index into the alphabet a.
cton =: a.&i.     NB. characters to numbers: indices in a.
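
The verbs nots and not0 that sor0 dispatches to are not shown; plausible one-liners,
assuming they return the mask of non-blank and non-zero positions respectively:

nots =: ~: & ' '    NB. 1 where a character is not a blank
not0 =: ~: & 0      NB. 1 where a number is not zero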


No doubt there are faster and better ways to do some of these, but ...



Sat, 26 Sep 1998 03:00:00 GMT  
 Sparse Array Storage

Ken Ian Gordon writes on Tuesday, April 9:

Quote:
> NB. basic bit pack used on flags and in others to compress bits



(I rudely rearranged the order of presentation in the original
article to make my points.)

The "pbit" verb can be stated more succinctly by exploiting
negative infixes:  (-n)f\x  applies f to non-overlapping moving
"windows" of size n in items of x.  The window size here would
be 31, and the verb the identity [ .  Thus:



1

1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1

This version of "pbit" does not give the same results as the
original "pbit", but the differences are only in the "slop-over"
in the final 31 bits, and unpacks to the same result.
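
The restated expression itself did not come through above; a plausible rendering of the
suggestion (a reconstruction, not necessarily what was posted) is the fork below, with a
second line just to show the windowing:

   pbit =: [: #. _31 [\ ]   NB. identity over non-overlapping 31-item windows, then base 2
   $ _31 [\ 70 # 1          NB. 70 bits fall into 3 windows; the short last one is zero-filled
3 31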

Quote:
> NB. Type 1 - pack empty ch or zero value fields




I admit to a bias against constructs of the form (f g)~ or (f~ g)~ ,
because I get confused after one or two commutes keeping track of
which argument is which.  Thus I myself would favor:



The verb "spread", taking a boolean left argument b and a right
argument x, computes  (-.b) expand x  in a way much superior
to my own:

   (c*+/\c=.-.b) { (0{1{.0$x) , x

When I devised this solution (in 1987), I had a bias against

because in most APL implementations the second grade (let alone
the first) was slower than some linear-time circumlocutions.  
It was Arthur Whitney who first pointed out to me that with the
improvements to /: (and \:) there is no longer any reason to
avoid the simpler expressions.
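
The "spread" verb itself is not quoted in this excerpt; a guess at a grade-based
formulation in the spirit of these remarks, borrowing the fill trick from the expression
quoted above:

spread =: 4 : '(/: /: x) { (((# x) - +/ x) # {. 1 {. 0 $ y) , y'   NB. scatter y into the 1s of x

   1 0 1 1 0 spread 7 8 9
7 0 8 9 0

Because 0 $ y supplies a fill of the right type, the same verb spreads character data
with blanks.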

Finally, it may be more direct to retain the original boolean
mask rather than its negation, whence the decoding function can
be exactly "expand".  (My troubles with negation are similar to
my troubles with commute.)  Thus:
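
The expressions that followed "Thus:" are not shown; a sketch consistent with the
suggestion, reusing the spread above (pk and unpk are placeholder names):

pk   =: 3 : '(0 ~: y) ; (0 ~: y) # y'    NB. keep the mask itself, plus the nonzero items
unpk =: 3 : '(> 0 { y) spread > 1 { y'   NB. decoding is then exactly an expand

   (unpk pk 0 5 0 0 7 2 0) -: 0 5 0 0 7 2 0
1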




Quote:
> NB. Type special - pack sparse arrays of numbers



As far as I can tell, these functions are equivalent to



That is, to pack, retain the shape, then apply the vector case
to the ravelled argument; to unpack, reshape the unpacked vector.
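
In code, the equivalence described might look like this (a sketch reusing the pk/unpk
pair above; pka and uka are illustrative names):

pka =: 3 : '(< $ y) , pk , y'        NB. keep the shape, pack the ravelled vector
uka =: 3 : '(> 0 { y) $ unpk }. y'   NB. unpack the vector, then reshape

   a =. 2 4 $ 0 3 0 0 1 0 0 9
   a -: uka pka a
1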

Quote:
> NB. utilities
> ntoc =: ({&a.)
> cton =: a.&i.

By the way, the interpreter "knows" that "ntoc" and "cton"
are inverses.  Thus:

   {&a. ^: _1
a.&i.
   a.&i. ^: _1
{&a. :.(a.&i.)



Sun, 27 Sep 1998 03:00:00 GMT  
 Sparse Array Storage
There never was -any- reason to use double grade; both the
grade functions return permutations, and the inverse of a
permutation P is calculated cheaply (and, by inspection,
correctly) by

   I[P]{is}{iota}{rho}I{is}P

// mike

--



Wed, 30 Sep 1998 03:00:00 GMT  
 Sparse Array Storage
Roger Hui writes on Wednesday, April 10:

Quote:
> When I devised this solution (in 1987), I had a bias against

> because in most APL implementations the second grade (let alone
> the first) was slower than some linear-time circumlocutions.
> It was Arthur Whitney who first pointed out to me that with the
> improvements to /: (and \:) there is no longer any reason to
> avoid the simpler expressions.

Mike Kent writes on Saturday, April 13:

Quote:
> There never was -any- reason to use double grade; both the
> grade functions return permutations, and the inverse of a
> permutation P is calculated cheaply (and, by inspection,
> correctly) by

>    I[P]{is}{iota}{rho}I{is}P

Oh, this is the cheaper "linear-time circumlocution" I had
in mind.  Actually, the above expression is incomplete;
one really needs to say:

 i{lev}  i[p]{is}{iota}{rho}i{is}p   or    
 i,0{rho}i[p]{is}{iota}{rho}i{is}p

because the required result is i, not i[p].
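
For reference, the two alternatives transcribed into J (p and i are just illustrative
names; p} is J's indexed amend):

   p =. 2 0 3 1
   /: p                     NB. single grade: the inverse permutation directly
1 3 0 2
   i =. p
   i =. (i. # p) p} i       NB. the circumlocution: i[p] gets 0 1 2 ...
   i
1 3 0 2

In J the amend already yields the whole updated array, so the i[p] issue does not arise
in this transcription.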

There are several reasons why one might use double grade up,
or single grade up on an argument known to be a permutation:

. It is shorter:  {grade up}p vs. the above.
. It is more direct.
. It is more reliable:  There is no need to avoid conflicts
  in the use of the name i, and there is no need to localize i.

And with the improved performance of grade (in A+, J, and K), /:p is
only slightly slower than, or even faster than, the circumlocution.
(Since you have access to K, you might try the timings in K.)



Wed, 30 Sep 1998 03:00:00 GMT  
 