Good code/bad code & looping 
Author Message
 Good code/bad code & looping

Some recent posts on c.l.a have drawn me out of lurk mode to complain.
My apologies to the unnamed author whose material I'm commenting on.
I'm trying to criticize the ideas, which are fairly widespread, not the
person who happened to utter them here.  I hope my comments will not
discourage further postings by the author.

   A recent posting claimed that for "most things that one wants to do,
.. there's a better way to do it in APL than using a loop".  This may
be true, depending on what you want to do, but I believe it gives an
incorrect impression about the use of loops in APL code.  My experience
is that virtually all real-life nontrivial tasks involve the use of
loops and ifs in APL, just not as many as in other languages.  Many of
these tasks simply cannot be coded without loops.  Others can be coded
either with or without loops.

   Some loops can be eliminated by using operations on larger arrays.
Doing so usually involves a tradeoff between the number of primitive
functions executed and the amount of memory used.  A certain amount of
loop elimination is important for achieving reasonable efficiency in
APL.  But once the loop elimination has yielded an array size of O(1000)
elements, there is little CPU efficiency to be gained by trading a loop
for a larger array operation.  Doing so just results in increased
memory consumption, not significantly faster execution.

   But relentless elimination of every possible loop also has a human
cost: it can make APL code very difficult to write, painful for others
to read, and much harder to modify.  Some examples:

 - If you want to change a program to handle a special case in a loop,
   it's trivial to add an if statement that detects the case and executes
   the special code.  If the loop is being performed instead via array
   operations, the change may be much more difficult to implement, and it
   may involve inefficient techniques such as performing the special code
   for every element and then zeroing out the result for non-special
   elements.

 - One project I'm working on contains code written by a programmer who
   went overboard eliminating loops.  The problem does not intrinsically
   require a lot of memory, but his code gives WS FULL even with 4
   megabytes of workspace available.  So we're having to modify the code
   to pull some of the loops out of the (nested) array operations. The
   code is virtually incomprehensible, with {each}, {mix}, and {split}
   splattered all over the place, and about five everchanging dimensions/
   layers of arrays/nesting.  The rewritten code is easier to understand
   because it involves juggling one or two fewer dimensions at once, with
   the removed dimensions being dealt with at a higher level using loops.
   A loop can be a useful tool in reducing a complex problem to a
   simpler, more easily understood level.

 - In his APL85 paper (on pg. 82), Clark Wiedmann describes a
   programming problem posed to STSC employees.  The task was to model a
   state-switching device that toggled state based on signals from two
   separate "on" and "off" wires.  The device can be modeled trivially

   12 programmers contributed 14 distinct solutions.  Nine of the first
   ten were incorrect.  [The correct one was the looping version.]  The
   eleventh solution (by Jeff Chilton) was the first correct one that was
   non-iterative.  It was:

   1{drop}((1,OFF>ON)PORSCAN 0,ON>OFF){/=}(1,OFF{/=}ON)PNESCAN 0,ON^OFF

   The two subroutines [are partitioned or-scan and notequal-scan, from
   Bob Smith's APL79 paper]."  Clark estimated that the "quest for a
   non-looping solution cost about $13,000 in programming time and
   correspondence."  A year or two after this description was published,
   someone sent Clark a stunningly short solution that everyone had
   missed.  One should not underestimate the amount of work it sometimes
   takes to find noniterative solutions in APL, or assume that finding
   such solutions is worth the amount of time spent on them.

   The posting goes on to call looping code "bad" and programmers that
write such code "lazy".  Nonlooping code is "good, idiomatic, APL".
Personally, I don't think this attitude is at all helpful in trying to
increase the use of APL.  In other languages, if a program works
correctly and is documented, you're done and you use the program.  If
the APL community demands that users to live up to some higher standard
of writing "good" nonlooping programs, we are imposing an extra burden
not imposed by other languages, and this will turn users away from APL.
Not everyone is skilled enough to write nonlooping solutions, and not
every loop should be eliminated.  I think it's vitally important that
nonlooping solutions be presented to novices in terms of, "Look, here's
an _easier_ way of doing the job, and here's a clear explanation of how
it works.  It's efficient, and once you see how it works, it really
isn't that obscure."  Nonlooping solutions that can't be presented in
this way, because they are outrageously complex, should not be put
forward as examples of good APL programming.

   Manugistsc's control structures are described as an "abomination"
that "{*filter*} the language", resulting in something that is "purportedly
APL."  Come on, they're just another way of writing branch statements.
If you don't use :IF X, you'll end up doing the same thing with
{goto}(~X)/L1.  If they seem clunky (in comparison to symbolic
alternatives that have been proposed), perhaps that will discourage
excessive use of them.  And if they seem too conventional, bear in mind
that many ideas which seem good on paper turn out to be a bad idea when
you try them in practice.  Most of the proposals for more "APLish"
control structures have not been put into practice, and nobody can be
sure whether or not they would really be palatable.  By adopting the
standard control structures found in other languages, STSC implemented a
tried and true (and already familiar) notation.  They did so without
making massive changes to the language.  You can easily write a program
that translates :keywords to branch statements when porting APL*PLUS III
code to other APLs.  I think they did a fine job, and if you don't like
the result you can always stick to branching and labels.

                                                Jim



Fri, 14 Mar 1997 05:13:54 GMT  
 Good code/bad code & looping

Quote:

>Some recent posts on c.l.a have drawn me out of lurk mode to complain.
>My apologies to the unnamed author whose material I'm commenting on.

Wasn't me :-) but I hope I did not give the impression that I think good
APL must be "loopless" or completely aviod gotos.  Near the beginning of
this thread (Sept 1) I wrote the following in a reply to Eke:

"I would guess, that a simple if-then-else block would alleviate this problem
tremendously.  I.e., most of the labels and gotos are if-then-else rather
than loops.  The problem is that visually they are indistinguishable so the
combination looks like spagetti.  Right now APLers are forced to write code
that is/appear flow-oriented because there is no if-then-else (blocks)...

"Attractive ideas do not necessarily make good proposals.  Everybody proposes
something different because nothing yet works...

"I think it will have to be a modest proposal (I really hate to say that).
Something like a simple if-then-else nested block, or perhaps a GOSUB."

I will say again, that a defect shared by many (most?) previous proposals
is that "control" is not founded on the basic lessons of structured
programming.  Specifically, I'm referring to controls that are single-line
oriented, unnested/unnestable, lacking an "else" provision, or "unsafe"
functional expressions which amount to executable strings.  In the past
month, we discussed multi-line nested blocks and lexically scoped local
functions at length.  I think this is real progress.  I also think these
facilities _can_ and should be added to APL.

Quote:
>   But relentless elimination of every possible loop also has a human
>cost: it can make APL code very difficult to write, painful for others
>to read, and much harder to modify...

One should write code in the loopless _or any other_ style only if it makes
the code less complex (yes a subjective criterion), or the performance benefits
are really critical.  Sometimes loopless code is both idiomatic AND efficient.
(Note I did not say loopless=idiomatic, or loopy not-in idiomatic.)

Quote:
> - One project I'm working on contains code written by a programmer who
>   went overboard eliminating loops...  The
>   code is virtually incomprehensible, with {each}, {mix}, and {split}
>   splattered all over the place, and about five everchanging dimensions/
>   layers of arrays/nesting...

I will take it on faith that the above code really is incomprehensible, i.e.
BAD.  In general however, "high dimensional" or "deep" APL programming CAN
be simpler than loops etc.  Although an old study shows "most" APLers seldom
if ever use dimensions greater than three, I don't think this should be taken
as suggesting that > 3D (or dyadic transpose) is BAD.  If a company expects
that future maintenance will be done by APLers who are high-D disadvantaged,
the company/manager should explain it IN ADVANCE to the "clever" programmer.

Quote:
>   The posting goes on to call looping code "bad" and programmers that
>write such code "lazy".  Nonlooping code is "good, idiomatic, APL".
>Personally, I don't think this attitude is at all helpful in trying to
>increase the use of APL.  In other languages, if a program works
>correctly and is documented, you're done and you use the program.  If
>the APL community demands that users to live up to some higher standard
>of writing "good" nonlooping programs, we are imposing an extra burden
>not imposed by other languages, and this will turn users away from APL.

I'm not sure I agree (to both points).  In a _compiled_ language, maybe,
possibly, to some extent, in the world of 100% correct specifications...
If the task requires high efficiency, then _interpreted_ solutions must
be tuned, at the very least, if not flatly discarded.  As for the other
point, I'm not sure that telling people "yes you can write loops in APL"
will do much good either.  What would be the point of using slow-interpreted
structure-deficient funny-symbols incompatible-keyboard APL if one cannot
or is not interested in writing "good nonlooping programs" when the need
arises?  The APLer has to know the "tricks of the trade" because the need
_will_ arise.  The "extra burden" is imposed not by matters of philosophy
but by interpretive overhead.

(I agree that a novice--or expert--shouldn't be admonished for writing loops,
unless it is bad code or grossly expensive.)

Quote:
>Not everyone is skilled enough to write nonlooping solutions, and not
>every loop should be eliminated.  

On the other hand, not everyone is skilled enough to write looping solutions,
and not every hard APL idiom should be eliminated.

Quote:
>I think it's vitally important that
>nonlooping solutions be presented to novices in terms of, "Look, here's
>an _easier_ way of doing the job, and here's a clear explanation of how
>it works.  It's efficient, and once you see how it works, it really
>isn't that obscure."  Nonlooping solutions that can't be presented in
>this way, because they are outrageously complex, should not be put
>forward as examples of good APL programming.

Agreed.  However, they might embody important techniques that are useful
elsewhere.  For example, the dyadic rotate in flat APL for expressing
"parallelism".  Or, uses of unequal or various other scan.  Or, loopless
code to transform a string into a table.  These _are_ nonobvious, even
complex tricks (now rendered somewhat obsolete by nested APL), but for a
long time they were an essential part of the APL craft.  Should they no
longer be taught?  I would like to hear people's opinion on this one.

Quote:
>If they [Manugistics control structs] seem clunky (in comparison to symbolic
>alternatives that have been proposed), perhaps that will discourage
>excessive use of them.

This is an interesting argument in favor of "clunkiness" (not my term).

Quote:
>And if they seem too conventional, bear in mind
>that many ideas which seem good on paper turn out to be a bad idea when
>you try them in practice.  Most of the proposals for more "APLish"
>control structures have not been put into practice, and nobody can be
>sure whether or not they would really be palatable.

Nested blocks and lexically scoped subroutines have been effectively used
since Algol60, and I _think_ they will be "palatable" in APL.  (Function
arrays on the other hand may or may not be...)

Quote:
>By adopting the
>standard control structures found in other languages, STSC implemented a
>tried and true (and already familiar) notation.

Flow control alone does not structured programming make.  I do not object
(much) to the "notation"--only to the lack of _real_ structured programming
facilities like safe subroutines.  REAL STRUCTURE (in your words, "standard
control structures") is a lot more than if-then-else and loop.

Quote:
>... if you don't like
>the result you can always stick to branching and labels.

That's not a very useful argument is it?  

I said earlier, I just don't want to see another decade of complacency as a
result of this breakthrough.




Sat, 15 Mar 1997 02:33:05 GMT  
 Good code/bad code & looping
In case it wasn't clear from my original posting, I'm not speaking out
against noniterative APL code, just against the practice of blindly
branding looping code as "bad" and scolding beginners who haven't
learned how to avoid unnecessary looping.

   Bill Chang's examples of dyadic rotate, notequal and other scans, the
vanilla-APL methods of vector-to-matrix conversion, and manipulation of
n-dimensional arrays are all important basic techniques that I think
should be taught to fledgling APLers.  (Even to nested APLers who think
they don't need flat-array techniques.)  These techniques are not what I
would call "outrageously complex".  Modeling the state machine with
partition operations is an example where I think loop avoidance went too
far to be considered reasonable.

   Yes, complex nonlooping solutions are sometimes dictated by
requirements of CPU efficiency.  But the vast majority of code is not
that critical, either because the application runs fast enough that
there's no need to improve it, or because the section of code is not a
CPU bottleneck.  Rarely do I find APL users saying, "The profiler showed
that this routine uses 20% of the execution time.  How can I speed it
up?".  Instead, some APLers seek noniterative solutions for all tasks,
without considering whether such a solution is necessary, worth the
cost, or clear enough to be maintainable.

   I should point out that not all APL programmers do this.  (Certainly
not _me_, ha ha.)  An APLer with enough experience and a healthy
disregard for what purists think of his code can dig right in to a
problem, instinctively lay out loops and ifs for the parts that need it,
and write non-iterative code for the doable tasks within the loops.

   Thinking more about that application with the incomprehensible code
that gives WS FULLs, I realized that although the WS FULLs result from
eliminating too many loops, the incomprehensiblity stems in large part
from a lack of comments.  Here's a concrete example.  A colleague
reported that the following line was giving a WS FULL:

   ANN{<-}{mix}{each}{mix}{each}{mix}{each}{mix}{each}
      {split}{split}{split}{split}1 3 5 4 2{transpose}
      {mix}(({shape}POS),2,NAA,iben){reshape}ANN

   (This is written for APL*PLUS, evolution level 1.  {mix}A is
   {disclose}A in APL2; it removes a level of nesting and puts the
   revealed dimension(s) to the right of existing dimensions.  {split}A
   is {enclose}[{shape}{shape}A]A in APL2; it hides the last dimension,
   sinking it into a new level of nesting.)

The code had no comments describing the structure of ANN.  It turns
out that ANN has dimensions [POS;2,NAA,iben][16].  That is, it's a
nested matrix having a row for each element of POS, and a column for
each element of a raveled 2-by-NAA-by-iben array (three dimensions
raveled into one).  Each item of the matrix is a 16-element vector.  POS
has shape 11, NAA is 86, iben is 7, and the data is floating point, so
the array contains 1.7 megabytes worth of data.  The colleague figured
out that the following loop did the same job, without resorting to small
arrays or getting a WS FULL (and it ran faster to boot):

    I{<-}0
    R{<-}({shape}POS){reshape}0
   L10:{->}(({shape}R)<I{<-}I+1)/L11
    R[I]{<-}{enclose}2 4 3 1{transpose}{mix}(2,NAA,NBEN){reshape}ANN[I;]
    {->}L10
   L11:ANN{<-}R

From the loop, you can easily see that the result is a vector and that
each item is a 4-dimensional array.  It turns out that the goal of the
original statement is to transform the structure of ANN from

             [POS;2,NAA,iben] [16]
to
             [POS] [16;2;iben;NAA]

Too bad there wasn't a comment explaining this, not only to clarify the
original statement, but also to aid in understanding subsequent
statements that use the transformed ANN.  (Yes, I realize that the
splits and mix-eaches can be replaced with {enclose}[2 3 4 5], but this
isn't available in the version of APL*PLUS we're using.)

   Regarding nested blocks and lexically scoped subroutines:  I think
these fall into the category of "making major changes to the language".
Given the lack of agreement among vendors about how to extend APL, I'm
glad STSC didn't try to make these changes unilaterally.  Doing so would
almost certainly widen the gulf between major APL implementations in a
way that would make porting applications much more difficult.  Adding
control structures today in no way rules out the possibility of adding
other enhancements such as these in the future.

                                                Jim
~



Sun, 16 Mar 1997 05:29:07 GMT  
 Good code/bad code & looping

<Much good stuff about sensible use of loops in APL, sacrificed to the
 bandwidth God>

I agree whole heartedly with Jim, and can add an example of my own.  Several
years ago, I had to rescue a fortran program from our mainframe, and convert
it to APL.  The program analyzes the characteristics of microwave transmission
lines, and uses a great deal of numerical integration.  The FORTRAN version
was loop-city, and I was bound and determined to get rid of as many of them
as possible.  I also wanted to be able to run it on reasonably small machines,
and write it in a manner that wasn't too obscure.

I started with all of the simple loops that created various arrays in the
original program.  That was easy.  The main computational section contained
(including sub-routine calls) about 7 nested loops.  By writing a function
to handle each level of nesting as an added array dimension, I didn't have
to mentally wrestle with too many dimensions at once.  The inner-most
function creates a vector.  The next function calls that to create a matrix.
the next one calls that to create a 3-dimensional array, and so on.  Once
everything is set up, the array is 'collapsed' in a similar manner, one
dimension per function.  I eventually managed to consolidate a lot of this
without confusing things too much, but setting up this initial hierarchy was
essential for my sanity.

When the dust cleared, the program worked, but if I divided the integrations
into fine enough increments, the arrays got too big, and I got a WS FULL.
Because everything was 'layered', it was easy to go in and re-install a
loop at a point with a fairly 'shallow' dimension, and then it ran fine.

At the risk of tying two threads together, this also points out why I try
to write 'functions' more than 'programs'.  I view the top level function
as a 'program', because it ties everything together into an operational
unit, and it duplicates the FORTRAN program.  I didn't write _any_ of the
low level functions because I thought I would use them elsewhere.  Some
people believe that re-useability is the only excuse for not wrapping
everything into one huge function.  I wrote it that way because I'm not very
bright.  I CAN't keep all this stuff straight unless I break it into small
blocks.  It also means that I (or anyone else) can come back to it after
a few years, and have no trouble seeing how things work.  I'm sick and
tired of people telling me that APL is a 'write only language', and that it
can't be maintained because it's too cryptic.  One of the most experienced
APL programmers we have here refuses to use comments because he claims he
can always recreate code faster than he can figure out how he did it last
time.  I think this sort of programming is inexcusable, and contributes
to the bad reputation APL has in many quarters.

I will now step down from the soapbox I seem to have inadvertently stepped
onto, and get back to work.

Doug White
MIT Lincoln Laboratory



Sat, 15 Mar 1997 23:19:08 GMT  
 Good code/bad code & looping
Jim Weigang:
: Modeling the state machine with partition operations is an example
: where I think loop avoidance went too far to be considered
: reasonable.

State machine transition is easily expressible in J, using ({ {&M),
where M is the transition matrix.  For example:

or

(start is the starting state -- a row index into M [typically a simple
scalar], and M is typically a matrix of states (row indices into M)).

APL*PLUS/III (or whatever it's called) doesn't provide an easy way of
doing this (you must expose the internal guts of the machine to the
global name space) unless you use CASE.  So, CASE is a way of getting
around the problems of representing constant data in functions.

:     I{<-}0
:     R{<-}({shape}POS){reshape}0
:    L10:{->}(({shape}R)<I{<-}I+1)/L11
:     R[I]{<-}{enclose}2 4 3 1{transpose}{mix}(2,NAA,NBEN){reshape}ANN[I;]
:     {->}L10
:    L11:ANN{<-}R
: From the loop, you can easily see that the result is a vector and
: that each item is a 4-dimensional array.  It turns out that the goal
: of the original statement is to transform the structure of ANN from
:              [POS;2,NAA,iben] [16]
: to
:              [POS] [16;2;iben;NAA]
: Too bad there wasn't a comment explaining this, not only to clarify
: the original statement, but also to aid in understanding subsequent
: statements that use the transformed ANN.  (Yes, I realize that the
: splits and mix-eaches can be replaced with {enclose}[2 3 4 5], but
: this isn't available in the version of APL*PLUS we're using.)

Once again, we see "control structures" being used to work around the
absence of an APL feature.  I suspect this would be even more obvious
had the algorithm been expressed in J.  [Consider the immediate
simplification that could be realized using the rank operator.  Then
consider that this whole business of enclosed arrays is probably a
workaround for not having the rank operator.]

--
Raul D. Miller           n =: p*q             NB. 9<##:##:n [.large prime p, q

                         NB.  public e, n, y
                         x -: n&|&(*&y)^:d 1  NB. 1=(d*e)+.p*&<:q



Tue, 18 Mar 1997 00:16:18 GMT  
 Good code/bad code & looping
Jim Weigang:
: >   The posting goes on to call looping code "bad" and programmers
: >that write such code "lazy".  Nonlooping code is "good, idiomatic,
: >APL".  Personally, I don't think this attitude is at all helpful
: >in trying to increase the use of APL.  In other languages, if a
: >program works correctly and is documented, you're done and you use
: >the program.  If the APL community demands that users to live up
: >to some higher standard of writing "good" nonlooping programs, we
: >are imposing an extra burden not imposed by other languages, and
: >this will turn users away from APL.

Aside from the oversimplified nature of this "nonloopy is good"
statement, I don't think you're doing the APL community justice on
this point.  Basically, the APL community has a stance against code
bloat.

In my opinion, what scares users away is the nonstandard interface
required to program APL.  You can't program in APL using "text".  (Or,
you can to some degree, but there are so many transliteration schemes
that "APL" isn't a meaningful choice.)  Quality control folklore comes
in as a rationalization to avoid this issue.

--
Raul D. Miller           n =: p*q             NB. 9<##:##:n [.large prime p, q

                         NB.  public e, n, y
                         x -: n&|&(*&y)^:d 1  NB. 1=(d*e)+.p*&<:q



Mon, 17 Mar 1997 21:31:58 GMT  
 Good code/bad code & looping
Raul D. Miller:

Quote:

>            ... ([ handler) ...

That's certainly the most elegant way to deal with pesky side effects
I've ever seen in J.  Thanks for that idiom!

                                                Martin Neitzel



Tue, 18 Mar 1997 22:42:52 GMT  
 
 [ 7 post ] 

 Relevant Pages 

1. Good code/bad code & looping

2. Good/Bad code

3. good code, bad hacks: what are the symptoms?

4. Anyone have a good coding style example for coding up FIR FILTER?(EASY METHOD)

5. Dylan coding style & best practices

6. P-code, T-code, and Uni-Code Intermediate Languages

7. P-code, T-code and Uni-Code Intermediate Languages

8. Example of Python code (C code to Python code)

9. good code/good language

10. Imaginary numbers & J: good news/bad news

11. ## Cadence: The Good, The Bad, & The Ugly ##

12. ## Cadence: The Good, The Bad, & The Ugly ##

 

 
Powered by phpBB® Forum Software