Interesting Programming Problem, Need Some Help
Author 
Message 
Greg Hei #1 / 7

Interesting Programming Problem, Need Some Help
Quote:
> ...Given a square array (NxN) filled with zeros and ones
(in practice, there are many more zeros than ones), what is the distribution of patterns of size 1, 2, 3, ...., where a pattern is defined any string of adjacent 1's, either horizontally or /and vertically distributed, and the number of "singlets" is also of interest Sounds like a transitive closure problem over the relation of adjacency of occupied cells. A unique representative for each connected component could also be found using the UNION/ FIND algorithms of Aho Hopcroft Ullman in almost linear time. Quote: > APL and fortran solutions preferred, but any language appreciated. Some background in J... D m .1.1...1 ..11.1.. ..111.11 .1....1. .11...1. ..1..... 11.1.... .1..11.1 To select the cells adj=: ,# [:i. [:*/ $ adj m 1 3 7 10 11 13 18 19 20 22 23 25 30 33 34 38 42 48 49 51 57 60 61 63 of which these adj m*.1.!.0"1 m 10 18 19 22 33 48 60 are connected rowwise to these 1+adj m*.1.!.0"1 m 11 19 20 23 34 49 61 and similarly these adj m*.1.!.0 m 3 10 11 22 25 30 34 49 are connected columnwise to these 8+adj m*.1.!.0 m 11 18 19 30 33 38 42 57 Give such relations (not forgetting the diagonal) one just needs to construct the transitive closure then select and count unique rows. Several threads in comp.lang.apl have mentioned algorithms for transitive closure (16 hits in dejanews for "transitive and closure"). In particular Tom Chwastyk quotes one in APL (as opposed to the J above;) greg heil
http://www.***.com/

Wed, 18 Jun 1902 08:00:00 GMT 


Greg Hei #2 / 7

Interesting Programming Problem, Need Some Help
Quote:
> > ...Given a square array (NxN) filled with zeros and ones
(in practice, there are many more zeros than ones), what is the distribution of patterns of size 1, 2, 3, ...., where a pattern is defined any string of adjacent 1's, either horizontally or /and vertically distributed, and the number of "singlets" is also of interest Quote: > Sounds like a transitive closure problem over the relation
of adjacency of occupied cells. A unique representative for each connected component could also be found using the UNION/ FIND algorithms of Aho Hopcroft Ullman in almost linear time. Now that i think of it UNION/FIND is the way to go. Given the set of equations between cells developed below the UNION algorithm builds a forest of trees. When two trees are equated it walks up one (FIND) planting the other (and each node on the path) at the top of the other. Processing all equations thusly gives one tree for each connected component. A second pass is required to count the sizes of the trees. This stage uses FIND which starts at a node (occupied cell in the original array) and walks to the root (bringing up all nodes on the path) at the top, one would increment a counter. At the end of this stage you have a labeling as well. Timing of the algorithm is _ever so slightly_ super linear, by a factor related to the inverse of Ackermans function. The tree walk/lifting is a tad tricky but i did do it in '73 in APL (w/o AHU's help;) and could probably be induced to do it again if you find AHU inadequate. Quote: > Some background in J... > D m > .1.1...1 > ..11.1.. > ..111.11 > .1....1. > .11...1. > ..1..... > 11.1.... > .1..11.1 > To select the cells > adj=: ,# [:i. [:*/ $ > adj m > 1 3 7 10 11 13 18 19 20 22 23 25 30 33 34 38 42 48 49 51 57 60 61 63 > of which these > adj m*.1.!.0"1 m > 10 18 19 22 33 48 60 > are connected rowwise to these > 1+adj m*.1.!.0"1 m > 11 19 20 23 34 49 61 > and similarly these > adj m*.1.!.0 m > 3 10 11 22 25 30 34 49 > are connected columnwise to these > 8+adj m*.1.!.0 m > 11 18 19 30 33 38 42 57
Gathering the above together adj=: ,# [:i. [:*/ $ left=: 13 : '0 1+/adj y.*.1.!.0"1 y.' above=: 13 : '(0,#y.)+/ adj y.*.1.!.0 y.' (left,.above) m 10 18 19 22 33 48 60 3 10 11 22 25 30 34 49 11 19 20 23 34 49 61 11 18 19 30 33 38 42 57 Which are the equations that would need to be run on the forest seeded by adj m Using more complicated connectivity criteria, this is a common step in image analysis. greg heil
http://www.scn.org/tl/anvil

Wed, 18 Jun 1902 08:00:00 GMT 


Jim Weigan #3 / 7

Interesting Programming Problem, Need Some Help
Leonard Howell wrote on May 21: Quote: > Here is a interesting problem that perhaps someone has solved before > or has an idea of how to program the solution. Given a square array > (NxN) filled with zeros and ones (in practice, there are many more > zeros than ones), what is the distribution of patterns of size 1, 2, > 3, ...., where a pattern is defined any string of adjacent 1's, > either horizontally or /and vertically distributed, and the number > of "singlets" is also of interest To simply the problem, I'm > ignoring diagonal connections. For example, in the following array, > there are: 5 ones, 1 two, 1 three, 2 fours, and 1 six.
This is a variation/subset of the "blob finder" problem. Once you know the frequency of different size blobs, it seems likely that a followup request will be to identify the individual blobs in the matrix. Here's my algorithm for finding blobs: * Visit each nonzero cell (top to bottom, left to right) * If the cell is 1, assign it a new blob index (>= 2) * Examine the following neighbors of the cell: . . . . X n n n n * If a neighbor is 1, replace it with X's blob index * If a neighbor is not 1 or 0, remember that blob n is equivalent to blob X * After visiting all the cells, make another pass through the matrix, recoding the initial blob indices to merge equivalent blobs and use sequential indices The blob population counts can be accumulated by bumping a counter whenever you change a 1 to a blob index. This algorithm has enough inherent iterativeness that I didn't spend much time trying to "parallelize" it for fast APL execution. A straightforward iterative coding (for the APL+ dialect) is given below. It would make an interesting test for Bob Bernecky's APEX APL compiler, or it could be translated to a conventional compiled language. If it's really going to be run using an interpreter, further improvements could be made. (In particular, the :For K loop could be parallelized easily if Dyalog APL's P[i]{<}.+1 feature were available in APL+.) As written, the program follows links along diagonal lines in addition to vertical and horizontal lines. Line [16] can be modified if you want to ignore diagonal neighbors. If you don't want the blob indices and can't afford the space taken by the integer version of the matrix, the code could be modified to work with just two rows of the Boolean matrix at a time (as Bill Huber suggested). Here are some examples: B{<}1=?10 15{rho}4 B 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 1 1 1 0 1 0 0 1 0 0 0 1 1 1 1 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 DISP BLOBS B . . . . . . B . . . . A . . . 3 1 < three singletons . . . . B B B . . . . . . . . 2 2 < two doubles . . . . . . . . . . . . . . . 2 4 C C . . . D D . . . . . E . E 1 5 . . . . . . . . . . . . . E . 1 13 < one 13item blob . . . . . . . . . F . . E . E . . . . F . . G . F . . . . . . H . . F F . . . . F F F . I . . H . . . F F F F . . . . . . H H . . . F . . . . . . . . B8 0 0 0 0 0 0 1 0 Leonard's example 1 0 0 1 1 0 1 1 0 1 1 0 1 1 0 0 1 1 1 0 0 0 1 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 1 1 1 0 0 0 1 0 1 0 0 0 0 1 DISP BLOBS1 B8 BLOBS1 ignores diagonal neighbors . . . . . . A . 5 1 B . . C C . A A 1 2 . D D . C C . . 1 3 D D D . . . E . 2 4 . . D . . . . F 1 6 . G . . . . . F . . H H H . . . I . H . . . . J And here's the code: {del} Z{<}BLOBS B;A;C;F;I;J;K;P;R;S;T;U
[18]
[22]
[30] :Endif
[32] :Endif [33]
[36] T{<}{enclose}I J+K
[38] :Select A [39] :Caselist 0 U
[45] ((R=R[A])/R){<}U
[48] :Endselect [49] :Endfor [50]
[53]
[55] P{<}C{take}P [56]
[59] U{<}T={iota}{rho}T{<}R
[61] T{<}0,T[R]1
[64] Z{<}T[1+Z] [65]
[69] F{<}T[;2]GROUPEDSUM(1{take}{rho}T){rho}1 {+
[70] [71] Z{<}(1 1{drop}{neg}1 {neg}1{drop}Z) ({reverse}F) {+
{del} {del} Z{<}G GROUPEDSUM D;L;T
[13] D{<}D[T]
{del} {del} Z{<}DISP A;F
[2] Z F{<}A [3] Z{<}((2{times}2{pick}{rho}Z){rho}1 0)\('.',65{drop}#AV)[1+Z] [4] Z{<}Z F {del} For more information about the {keywords} used above, see http://www.chilton.com/~jimw/aplascii.html . Jim

Wed, 18 Jun 1902 08:00:00 GMT 


nhbkm.. #4 / 7

Interesting Programming Problem, Need Some Help
: Here is a interesting problem that perhaps someone has solved : before or has an idea of how to program the solution. Given a : square array (NxN) filled with zeros and ones (in practice, there : are many more zeros than ones), what is the distribution of patterns : of size 1, 2, 3, ...., where a pattern is defined any string of adjacent : 1's, either horizontally or /and vertically distributed, and the number : of "singlets" is also of interest To simply the problem, I'm ignoring : diagonal connections. For example, in the following array, there are: : 5 ones, 1 two, 1 three, 2 fours, and 1 six. : 0 0 0 0 0 0 1 0 : 1 0 0 1 1 0 1 1 : 0 1 1 0 1 1 0 0 : 1 1 1 0 0 0 1 0 : 0 0 1 0 0 0 0 1 : 0 1 0 0 0 0 0 1 : 0 0 1 1 1 0 0 0 : 1 0 1 0 0 0 0 1 : APL and FORTRAN solutions preferred, but any language appreciated. : Thanks, Leonard Howell A pretty simple solution using Fortran can be found at ftp://caver.muk.unihannover.de/pub/f90/area2d.f90 It is probably less efficient than those proposed by the previous posters. But efficiency was not very important for the purpose I have written it for. (The alternative was counting manually :) OTOH it's not too bad, can work both with and without diagonal connections and it works reliably even for the strangest input. The algorithm is working on logical arrays and is looking recursively for all .TRUE. neighbours of a .TRUE. cell and marks them as .FALSE. in a workcopy of the input array. In other words: it is traversing trees, each covering one continous area. This is repeated for all "still.TRUE." cells in a single pass. The maximum possible recursion depth can be huge. In worst case it equals the number of cells of the largest area. Thus, to avoid stack overflows, the scan function is implemented without using recursion but mimics its own stack by using a linked list of visited cells. Regards Michael  ***************************************************************************** Michael Steffens Institut f. Meteorologie u. Klimatologie Tel.: +495117624413 Universitaet Hannover
PGP fingerprint = FA BE 6C 1C F6 C3 EC 33 DD 42 6B 7F DE CF 84 B8

Wed, 18 Jun 1902 08:00:00 GMT 


Leonard Howel #5 / 7

Interesting Programming Problem, Need Some Help
Jim, Many thanks for the program. I also downloaded your APLASCII.w3 workspace and when I tried to run the programs, I got an evolution error from line 2 of DISP which looks like Z F{<}A. I replaced it with (Z F){<}A and that took care of the problem. I'm using APL+WIN Ver. 3  is there some "setting" that needs to be set to accept code such as Z F{<}A ? Also, I ran BLOBS several times and found a case where the program apparently needs some tweaking. I haven't dug through the logic yet enough to know where to make changes, so I thought you'd like to take a look at the following case and decide best how to fine tune BLOBS. A 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 1 0 0 1 1 0 0 1 1 0 1 1 1 0 0 0 0 0 0 0 1 0 0 1 1 0 1 1 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 1 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0 1 1 0 1 0 0 BLOBS A 0 0 1 0 0 0 0 0 0 0 3 1 8 0 0 0 0 8 0 0 0 0 1 2 0 8 8 0 0 8 8 0 0 2 1 5 8 0 8 8 8 0 0 0 0 0 1 21 0 0 8 0 0 8 8 0 3 3 0 8 0 0 8 0 8 0 0 0 0 0 0 0 0 0 0 8 0 0 0 0 0 0 0 5 0 8 8 8 0 0 0 0 0 0 5 8 0 0 0 0 4 0 5 5 0 5 0 0 note that the 5's should really be 8's. Thanks, Leonard

Wed, 18 Jun 1902 08:00:00 GMT 


Leonard Howel #6 / 7

Interesting Programming Problem, Need Some Help
I want to thank everyone for the tremendous response and and help I received on this problem. I have received programs in C++, FORTRAN90, J, and APL to solve this problem, as well as important verbal insights on this problem and its associated programming. Right now APL is the only language I feel very confortable in programming, and for that reason I chose to first concentrate on Jim Weigang's provided APL program. However, I am also e{*filter*}d about trying to learn J in the near future when I can get a little money to buy it as it looks very powerful too. I've got FORTRAN90 but I still only use FORTRAN77, so again it's a matter of finding the time to sit down and learn it. Below is a note I sent to Jim W. this morning regarding this problem and everyone's help, and I thought I'd post it here for those that are interested. Thanks so much for the apl program and all the analysis you did. It will most likely be an important part of a large Monte Carlo simulation of the performance characteristics of the Orbiting array of Wideangle Light collectors (OWL): a Pair of Earth Orbiting "Eyes" to Study Air Showers Initiated by 10^20 Electron Volt Quanta. The matrix cells can be thought of as pixels of the detector and the blobs will be flashes caused by incident cosmic rays and photon noise in the atmosphere (e.g., star light reflections). We will probably want to add other features for discriminating the blobs, e.g., look for straight line patterns, at a later date. We hope to first fly a proof of concept instrument on a high altitude balloon with something like a 50 by 50 pixel detector (that should run very fast !), and then later a spaceborne detector on a satellite with a 1000 by 1000 pixel imager. For the next few days I will be getting more familiar with the science, etc., and then resume programming. The apl news group and the Fortran group has been a tremendous help on this problem and it really underscores the utility of the Internet, news groups, etc., and I will give appropriate references to you and others that have contributed when I document (and hopefully publish) what comes out of all this. We will likely have a homepage for this project before too long and I'll post things of interest there too. (you can take a look at one of our other project homepages at http://www.***.com/ ). I'll post this to the apl newsgroup too so everyone can get a feel for the application. I'm e{*filter*}d to see that apl can solve the large array problem (1000 by 1000) in realistic time, and I guess from everything I've been exposed to from the many helpful responses I received, it is time to think about learning another language. J looks very interesting and powerful, and then I also received a C++ program and a FORTRAN90 program in response to this post, so I will need to look at them sometime too, as I do have a C compiler (came free with my Fortran compiler). I don't have J but might buy it when I can get some more money here at work. I recently bought a Micron PC for $2500 and then the APL+WIN 3.0 upgrade was $1200, so I'm at a point where I need to be productive with what I've got, at least for a while, before I go asking for more software tools. Thanks again and I'll *talk* to you soon, Leonard

Wed, 18 Jun 1902 08:00:00 GMT 


Aaron Kulki #7 / 7

Interesting Programming Problem, Need Some Help
Hmmmmm, whether you realize it or not, I think you hit on a really good solution... Quote:
> > > Here is a interesting problem that perhaps someone has solved > > > before or has an idea of how to program the solution. Given a > > > square array (NxN) filled with zeros and ones (in practice, there > > > are many more zeros than ones), what is the distribution of patterns > > > of size 1, 2, 3, ...., where a pattern is defined any string of adjacent > > > 1's, either horizontally or /and vertically distributed, and the number > > > of "singlets" is also of interest To simply the problem, I'm ignoring > > > diagonal connections. For example, in the following array, there are: > > > 5 ones, 1 two, 1 three, 2 fours, and 1 six. > > > 0 0 0 0 0 0 1 0 > > > 1 0 0 1 1 0 1 1 > > > 0 1 1 0 1 1 0 0 > > > 1 1 1 0 0 0 1 0 > > > 0 0 1 0 0 0 0 1 > > > 0 1 0 0 0 0 0 1 > > > 0 0 1 1 1 0 0 0 > > > 1 0 1 0 0 0 0 1 > > > APL and FORTRAN solutions preferred, but any language appreciated. > > > Thanks, Leonard Howell > >  > > I can't offer code, but here is a simple approach. > > The general idea is to scan all the cells in the matrix and count the > > runs of which each cell is the anchor. A cell is an anchor of a run if > > it is the leftmost or bottommost cell in the run (those directions are > > chosen arbitrarily). Taking a vertical run as an example, a cell is the > > bottommost of a run if its value is one and the position below it is > > not. Now count ones up the column to find the run length. If the > > length exceeds one, increment the corresponding frequency counter. The > > same idea works in other directions. Since each run has exactly one > > anchor, this algorithm can't count a row more than once. If the code > > looks for runs in four different directions, it will find each run. If > > a cell has four runs of length one, it's a singleton. > > To make the programming easier in FORTRAN, start by appending rows of > > zeros above and below the matrix, and appending columns of zeros at its > > left and right edges. Then loop over only the original matrix. > > I think the algorithm can be executed for all cells in parallel for APL, > > but I don't see how to count singletons cheaply in APL. > >  > > Jonathan Fry > > Developer > > SPSS Inc.
>  > Kindly ignore my reply above. It solves a different problem. >  > Jonathan Fry > Developer > SPSS Inc.
 Aaron R. Kulkis Unix Systems Administrator  I speak for me, not my employer.  Threads: a poor substitute that you implement in an OS when it can't handle true context switching properly.

Wed, 18 Jun 1902 08:00:00 GMT 


