Interesting Programming Problem, Need Some Help
Author Message
Interesting Programming Problem, Need Some Help

Quote:

> ...Given a square array (NxN) filled with zeros and ones

(in practice, there are many more zeros than ones), what
is the distribution of patterns of size 1, 2, 3, ...., where
a pattern is defined any string of adjacent 1's, either
horizontally or /and vertically distributed, and the number
of "singlets" is also of interest

Sounds like a transitive closure problem over the relation
of adjacency of occupied cells. A unique representative for
each connected component could also be found using the UNION/
FIND algorithms of Aho Hopcroft Ullman in almost linear time.

Quote:
> APL and fortran solutions preferred, but any language appreciated.

Some background in J...

D m
.1.1...1
..11.1..
..111.11
.1....1.
.11...1.
..1.....
11.1....
.1..11.1

To select the cells
1 3 7 10 11 13 18 19 20 22 23 25 30 33 34 38 42 48 49 51 57 60 61 63
of which these
10 18 19 22 33 48 60
are connected rowwise to these
11 19 20 23 34 49 61
and similarly these
3 10 11 22 25 30 34 49
are connected columnwise to these
11 18 19 30 33 38 42 57

Give such relations (not forgetting the diagonal) one
just needs to construct the transitive closure then
select and count unique rows.

Several threads in comp.lang.apl have mentioned algorithms
for transitive closure (16 hits in dejanews for "transitive
and closure"). In particular Tom Chwastyk quotes one in
APL (as opposed to the J above;)

greg heil

http://www.*-*-*.com/

Wed, 18 Jun 1902 08:00:00 GMT
Interesting Programming Problem, Need Some Help

Quote:

> > ...Given a square array (NxN) filled with zeros and ones

(in practice, there are many more zeros than ones), what
is the distribution of patterns of size 1, 2, 3, ...., where
a pattern is defined any string of adjacent 1's, either
horizontally or /and vertically distributed, and the number
of "singlets" is also of interest

Quote:
> Sounds like a transitive closure problem over the relation

of adjacency of occupied cells. A unique representative for
each connected component could also be found using the UNION/
FIND algorithms of Aho Hopcroft Ullman in almost linear time.

Now that i think of it UNION/FIND is the way to go. Given the
set of equations between cells developed below the UNION algorithm
builds a forest of trees. When two trees are equated it walks up
one (FIND) planting the other (and each node on the path) at the top
of the other. Processing all equations thusly gives one tree for each
connected component. A second pass is required to count the sizes
of the trees. This stage uses FIND which starts at a node (occupied
cell in the original array) and walks to the root (bringing up all
nodes on the path) at the top, one would increment a counter. At
the end of this stage you have a labeling as well.

Timing of the algorithm is _ever so slightly_ super linear, by
a factor related to the inverse of Ackermans function. The tree
walk/lifting is a tad tricky but i did do it in '73 in APL (w/o
AHU's help;) and could probably be induced to do it again if you

Quote:
> Some background in J...
>         D m
> .1.1...1
> ..11.1..
> ..111.11
> .1....1.
> .11...1.
> ..1.....
> 11.1....
> .1..11.1
> To select the cells
>         adj=: ,# [:i. [:*/ \$
> 1 3 7 10 11 13 18 19 20 22 23 25 30 33 34 38 42 48 49 51 57 60 61 63
> of which these
> 10 18 19 22 33 48 60
> are connected rowwise to these
> 11 19 20 23 34 49 61
> and similarly these
> 3 10 11 22 25 30 34 49
> are connected columnwise to these
> 11 18 19 30 33 38 42 57

Gathering the above together

left=: 13 : '0 1+/adj y.*.1|.!.0"1 y.'
above=: 13 : '(0,#y.)+/ adj y.*.1|.!.0 y.'
(left,.above) m
10 18 19 22 33 48 60  3 10 11 22 25 30 34 49
11 19 20 23 34 49 61 11 18 19 30 33 38 42 57

Which are the equations that would need to be run on the forest
seeded by

Using more complicated connectivity criteria, this is a common
step in image analysis.

greg heil

http://www.scn.org/tl/anvil

Wed, 18 Jun 1902 08:00:00 GMT
Interesting Programming Problem, Need Some Help

Leonard Howell wrote on May 21:

Quote:
> Here is a interesting problem that perhaps someone has solved before
> or has an idea of how to program the solution.  Given a square array
> (NxN) filled with zeros and ones (in practice, there are many more
> zeros than ones), what is the distribution of patterns of size 1, 2,
> 3, ...., where a pattern is defined any string of adjacent 1's,
> either horizontally or /and vertically distributed, and the number
> of "singlets" is also of interest  To simply the problem, I'm
> ignoring diagonal connections. For example, in the following array,
> there are:  5 ones, 1 two, 1 three, 2 fours, and 1 six.

This is a variation/subset of the "blob finder" problem.  Once you know
the frequency of different size blobs, it seems likely that a followup
request will be to identify the individual blobs in the matrix.

Here's my algorithm for finding blobs:

* Visit each nonzero cell (top to bottom, left to right)

* If the cell is 1, assign it a new blob index (>= 2)

* Examine the following neighbors of the cell:
. . .
. X n
n n n

* If a neighbor is 1, replace it with X's blob index

* If a neighbor is not 1 or 0, remember that blob n is
equivalent to blob X

* After visiting all the cells, make another pass through the
matrix, recoding the initial blob indices to merge equivalent
blobs and use sequential indices

The blob population counts can be accumulated by bumping a counter
whenever you change a 1 to a blob index.

This algorithm has enough inherent iterativeness that I didn't spend
much time trying to "parallelize" it for fast APL execution.  A
straightforward iterative coding (for the APL+ dialect) is given below.
It would make an interesting test for Bob Bernecky's APEX APL compiler,
or it could be translated to a conventional compiled language.  If it's
really going to be run using an interpreter, further improvements could
be made.  (In particular, the :For K loop could be parallelized easily
if Dyalog APL's P[i]{<-}.+1 feature were available in APL+.)

to vertical and horizontal lines.  Line [16] can be modified if you
want to ignore diagonal neighbors.  If you don't want the blob indices
and can't afford the space taken by the integer version of the matrix,
the code could be modified to work with just two rows of the Boolean
matrix at a time (as Bill Huber suggested).

Here are some examples:

B{<-}1=?10 15{rho}4
B
0 0 0 0 0 0 1 0 0 0 0 1 0 0 0
0 0 0 0 1 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 0 0 0 1 1 0 0 0 0 0 1 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 1 0 0 1 0 1
0 0 0 0 1 0 0 1 0 1 0 0 0 0 0
0 1 0 0 1 1 0 0 0 0 1 1 1 0 1
0 0 1 0 0 0 1 1 1 1 0 0 0 0 0
0 1 1 0 0 0 1 0 0 0 0 0 0 0 0

DISP BLOBS B
. . . . . . B . . . . A . . .     3  1    <- three singletons
. . . . B B B . . . . . . . .     2  2    <- two doubles
. . . . . . . . . . . . . . .     2  4
C C . . . D D . . . . . E . E     1  5
. . . . . . . . . . . . . E .     1 13    <- one 13-item blob
. . . . . . . . . F . . E . E
. . . . F . . G . F . . . . .
. H . . F F . . . . F F F . I
. . H . . . F F F F . . . . .
. H H . . . F . . . . . . . .

B8
0 0 0 0 0 0 1 0                 Leonard's example
1 0 0 1 1 0 1 1
0 1 1 0 1 1 0 0
1 1 1 0 0 0 1 0
0 0 1 0 0 0 0 1
0 1 0 0 0 0 0 1
0 0 1 1 1 0 0 0
1 0 1 0 0 0 0 1

DISP BLOBS1 B8             BLOBS1 ignores diagonal neighbors
. . . . . . A .     5 1
B . . C C . A A     1 2
. D D . C C . .     1 3
D D D . . . E .     2 4
. . D . . . . F     1 6
. G . . . . . F
. . H H H . . .
I . H . . . . J

And here's the code:

{del} Z{<-}BLOBS B;A;C;F;I;J;K;P;R;S;T;U

[18]

[22]

[30]          :Endif

[32]      :Endif
[33]

[36]          T{<-}{enclose}I J+K

[38]          :Select A
[39]          :Caselist 0 U

[45]              ((R=R[A])/R){<-}U

[48]          :Endselect
[49]      :Endfor
[50]

[53]

[55]  P{<-}C{take}P
[56]

[59]  U{<-}T={iota}{rho}T{<-}R

[61]  T{<-}0,T[R]-1

[64]  Z{<-}T[1+Z]
[65]

[69]  F{<-}T[;2]GROUPEDSUM(1{take}{rho}T){rho}1 {+

[70]
[71]  Z{<-}(1 1{drop}{neg}1 {neg}1{drop}Z) ({reverse}F) {+

{del}

{del} Z{<-}G GROUPEDSUM D;L;T

[13]  D{<-}D[T]

{del}

{del} Z{<-}DISP A;F

[2]   Z F{<-}A
[3]   Z{<-}((2{times}2{pick}{rho}Z){rho}1 0)\('.',65{drop}#AV)[1+Z]
[4]   Z{<-}Z F
{del}

http://www.chilton.com/~jimw/aplascii.html .

Jim

Wed, 18 Jun 1902 08:00:00 GMT
Interesting Programming Problem, Need Some Help

: Here is a interesting problem that perhaps someone has solved
: before or has an idea of how to program the solution.  Given a
: square array (NxN) filled with zeros and ones (in practice, there
: are many more zeros than ones), what is the distribution of patterns
: of size 1, 2, 3, ...., where a pattern is defined any string of adjacent
: 1's, either horizontally or /and vertically distributed, and the number
: of "singlets" is also of interest  To simply the problem, I'm ignoring
: diagonal connections. For example, in the following array, there are:
:  5 ones, 1 two, 1 three, 2 fours, and 1 six.

: 0 0 0 0 0 0 1 0
: 1 0 0 1 1 0 1 1
: 0 1 1 0 1 1 0 0
: 1 1 1 0 0 0 1 0
: 0 0 1 0 0 0 0 1
: 0 1 0 0 0 0 0 1
: 0 0 1 1 1 0 0 0
: 1 0 1 0 0 0 0 1

: APL and FORTRAN solutions preferred, but any language appreciated.
: Thanks, Leonard Howell

A pretty simple solution using Fortran can be found at

ftp://caver.muk.uni-hannover.de/pub/f90/area2d.f90

It is probably less efficient than those proposed by the previous posters.
But efficiency was not very important for the purpose I have written it
for. (The alternative was counting manually :-)
OTOH it's not too bad, can work both with and without diagonal connections
and it works reliably even for the strangest input.
The algorithm is working on logical arrays and is looking recursively for
all .TRUE. neighbours of a .TRUE. cell and marks them as .FALSE. in a
work-copy of the input array. In other words: it is traversing trees,
each covering one continous area. This is repeated for all "still-.TRUE."
cells in a single pass.
The maximum possible recursion depth can be huge. In worst case it equals
the number of cells of the largest area. Thus, to avoid stack overflows,
the scan function is implemented without using recursion but mimics its own
stack by using a linked list of visited cells.

Regards
Michael
--
*****************************************************************************
Michael Steffens                     Institut f. Meteorologie u. Klimatologie
Tel.: +49-511-7624413                                   Universitaet Hannover

PGP fingerprint = FA BE 6C 1C F6 C3 EC 33  DD 42 6B 7F DE CF 84 B8

Wed, 18 Jun 1902 08:00:00 GMT
Interesting Programming Problem, Need Some Help

Jim,

and when I tried to run the programs, I got an evolution error from line 2
of DISP which looks like Z F{<-}A.  I replaced it with (Z F){<-}A and that
took care of the problem.  I'm using APL+WIN Ver. 3 - is there some
"setting" that needs to be set to accept code such as Z F{<-}A  ?

Also, I ran BLOBS several times and found a case where the program
apparently needs some tweaking.  I haven't dug through the logic yet enough
to know where to make changes, so I thought you'd like to take a look at the
following case and decide best how to fine tune BLOBS.

A
0 0 1 0 0 0 0 0 0 0
1 0 0 0 0 1 0 0 0 0
0 1 1 0 0 1 1 0 0 1
1 0 1 1 1 0 0 0 0 0
0 0 1 0 0 1 1 0 1 1
0 1 0 0 1 0 1 0 0 0
0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 1 0 1 1 1
0 0 0 0 0 0 1 1 0 0
0 0 1 0 1 1 0 1 0 0

BLOBS A
0 0 1 0 0 0 0 0 0 0    3  1
8 0 0 0 0 8 0 0 0 0    1  2
0 8 8 0 0 8 8 0 0 2    1  5
8 0 8 8 8 0 0 0 0 0    1 21
0 0 8 0 0 8 8 0 3 3
0 8 0 0 8 0 8 0 0 0
0 0 0 0 0 0 0 8 0 0
0 0 0 0 0 5 0 8 8 8
0 0 0 0 0 0 5 8 0 0
0 0 4 0 5 5 0 5 0 0

note that the 5's should really be 8's.

Thanks, Leonard

Wed, 18 Jun 1902 08:00:00 GMT
Interesting Programming Problem, Need Some Help

I want to thank everyone for the tremendous response and and help I received
on this problem.  I have received programs in C++, FORTRAN90,  J, and APL to
solve this problem, as well as important verbal insights on this problem and
its associated programming.  Right now APL is the only language I feel very
confortable in programming, and for that reason I chose to first concentrate
on Jim Weigang's provided APL program.  However, I am also e{*filter*}d about
trying to learn J in the near future when I can get a little money to buy it
as it looks very powerful too.  I've got FORTRAN90 but I still only use
FORTRAN77, so again it's a matter of finding the time to sit down and learn
it.  Below is a note I sent to Jim W. this morning regarding this problem
and everyone's help, and I thought I'd post it here for those that are
interested.

Thanks so much for the apl program and all the analysis you did.  It will
most likely be an important part of a large Monte Carlo simulation of the
performance characteristics of the Orbiting array of Wide-angle Light
collectors (OWL): a Pair of Earth Orbiting "Eyes" to Study Air Showers
Initiated by 10^20 Electron Volt Quanta.  The matrix cells can be thought of
as pixels of the detector and the blobs will be flashes caused by incident
cosmic rays and photon noise in the atmosphere (e.g., star light
reflections).  We will probably want to add other features for
discriminating the blobs, e.g., look for straight line patterns, at a later
date.  We hope to first fly a proof of concept instrument on a high altitude
balloon with something like a 50 by 50 pixel detector (that should run very
fast !), and then later a spaceborne detector on  a satellite with a 1000 by
1000 pixel imager.

For the next few days I will be getting more familiar with the science,
etc., and then resume programming.  The apl news group and the Fortran group
has been a tremendous help on this problem and it really underscores the
utility of the Internet, news groups, etc., and I will give appropriate
references to you and others that have contributed when I document (and
hopefully publish) what comes out of all this.  We will likely have a
homepage for this project before too long and I'll post things of interest
there too.  (you can take a look at one of our other project homepages at
http://www.*-*-*.com/ ).

I'll post this to the apl newsgroup too so everyone can get a feel for the
application.  I'm e{*filter*}d to see that apl can solve the large array problem
(1000 by 1000) in realistic time, and I guess from everything I've been
exposed to from the many helpful responses I received, it is time to think
about learning another language.  J looks very interesting and powerful, and
then I also received a C++ program and a FORTRAN90 program in response to
this post, so I will need to look at them sometime too, as I do have a C
compiler (came free with my Fortran compiler).  I don't have J but might buy
it when I can get some more money here at work.  I recently bought a Micron
PC for \$2500 and then the APL+WIN 3.0 upgrade was \$1200, so I'm at a point
where I need to be productive with what I've got, at least for a while,
before I go asking for more software tools.

Thanks again and I'll *talk* to you soon,

Leonard

Wed, 18 Jun 1902 08:00:00 GMT
Interesting Programming Problem, Need Some Help

Hmmmmm, whether you realize it or not, I think you hit on a really
good solution...

Quote:

> > > Here is a interesting problem that perhaps someone has solved
> > > before or has an idea of how to program the solution.  Given a
> > > square array (NxN) filled with zeros and ones (in practice, there
> > > are many more zeros than ones), what is the distribution of patterns
> > > of size 1, 2, 3, ...., where a pattern is defined any string of adjacent
> > > 1's, either horizontally or /and vertically distributed, and the number
> > > of "singlets" is also of interest  To simply the problem, I'm ignoring
> > > diagonal connections. For example, in the following array, there are:
> > >  5 ones, 1 two, 1 three, 2 fours, and 1 six.

> > > 0 0 0 0 0 0 1 0
> > > 1 0 0 1 1 0 1 1
> > > 0 1 1 0 1 1 0 0
> > > 1 1 1 0 0 0 1 0
> > > 0 0 1 0 0 0 0 1
> > > 0 1 0 0 0 0 0 1
> > > 0 0 1 1 1 0 0 0
> > > 1 0 1 0 0 0 0 1

> > > APL and FORTRAN solutions preferred, but any language appreciated.
> > > Thanks, Leonard Howell

> > --
> > I can't offer code, but here is a simple approach.

> > The general idea is to scan all the cells in the matrix and count the
> > runs of which each cell is the anchor.  A cell is an anchor of a run if
> > it is the leftmost or bottommost cell in the run (those directions are
> > chosen arbitrarily).  Taking a vertical run as an example, a cell is the
> > bottommost of a run if its value is one and the position below it is
> > not.  Now count ones up the column to find the run length.  If the
> > length exceeds one, increment the corresponding frequency counter.  The
> > same idea works in other directions.  Since each run has exactly one
> > anchor, this algorithm can't count a row more than once.  If the code
> > looks for runs in four different directions, it will find each run.  If
> > a cell has four runs of length one, it's a singleton.

> > To make the programming easier in FORTRAN, start by appending rows of
> > zeros above and below the matrix, and appending columns of zeros at its
> > left and right edges.  Then loop over only the original matrix.

> > I think the algorithm can be executed for all cells in parallel for APL,
> > but I don't see how to count singletons cheaply in APL.
> > --------------------
> > Jonathan Fry
> > Developer
> > SPSS Inc.

> --
> Kindly ignore my reply above.  It solves a different problem.
> --------------------
> Jonathan Fry
> Developer
> SPSS Inc.

--

Aaron R. Kulkis

--------------------------------
I speak for me, not my employer.
--------------------------------

Threads: a poor substitute that you implement in an OS
when it can't handle true context switching properly.

Wed, 18 Jun 1902 08:00:00 GMT

 Page 1 of 1 [ 7 post ]

Relevant Pages