Puzzler: Comparing Functions
Author Message
Puzzler: Comparing Functions

Here's the "Limbering Up" puzzler from the 1994 Quarter 3 Zark APL Tutor
News.  Unlike last quarter's Numbers-to-Words problem, this one is
nontrivial, to say the least!  By the way, two of the four published
solutions to the Numbers-to-Words problem were posted on comp.lang.apl
(mine and Ray Powell's).

*       *       *       *

When maintaining a large APL application, you will occasionally find
yourself face-to-face with two copies of the same function.  Yet the
copies are not the same.  Perhaps two different programmers enhanced the
same function in different ways.  You would like to know how the
functions are different.

The first objective is to bring the two functions together so they can
be compared.  Yet, if you copy both functions into the workspace, only
one will survive (since they have the same name!)  To compare the
functions you must convert them to character array representations via
#CR (canonical representation) or #VR (visual representation).  For
example:

)COPY WS1 MODEL
SAVED...
CR1{<-}#CR'MODEL'
)COPY WS2 MODEL
SAVED...
CR2{<-}#CR'MODEL'

Now you have two character matrices that "look like" the function.  Your
task is to write a function CRCOMARE (or VRCOMPARE if you're using #VR)
that returns/displays the differences between the functions, like so:

CR1 CRCOMPARE CR2
[1]  A+B+C
[2]  D{<-}3
---- replaced by ----
[1]  A+B+C & D{<-}3

[5]  A{<-}B{<-}0 0{reshape}0

[6]  A{<-}0 0{reshape}0
[7]  B{<-}0 0{reshape}0
---- deleted ----

[9]  Z{<-}0.01{times}D
---- replaced by ----
[7]  Z{<-}D{divide}100

If using #VR, make sure you ignore the line numbers, which are part of
the visual representation, when comparing the old and new functions.
Otherwise, if a line is inserted between [6] and [6], each line in the
new function from [7] up will appear different because of its new line
number.

If using #CR, make sure you display the line numbers with the modified
lines of code.

Mail your solution on paper or APL*PLUS PC or APL*PLUS II disk to:

Zark Incorporated
23 Ketchbrook Lane
Ellington, CT   06029
U.S.A.

The notable functions and their authors' names will be printed in the
next issue of the Zark APL Tutor News.  Entries must be received by
October 15, 1994.  Good luck and happy limbering.

Fri, 14 Mar 1997 06:15:16 GMT
Puzzler: Comparing Functions

Quote:

>Here's the "Limbering Up" puzzler from the 1994 Quarter 3 Zark APL Tutor
>News.  Unlike last quarter's Numbers-to-Words problem, this one is
>nontrivial, to say the least!

This is the "diff" utility in Unix, and coincidentally the solution is a
dynamic programming algorithm _very_ similar to the "approximate string
matching" code I just reposted.  The differences are (1) boundary conditions;
(2) equality of lines instead of equality of symbols; and (3) saving the
rows and back pointers so the optimal edit sequence (diff) can be recovered.
(Richard Levine said he added ASM to the Toronto Toolkit for this very
application but I don't think a new version of TT ever came out...)

Sat, 15 Mar 1997 03:05:18 GMT
Puzzler: Comparing Functions

Quote:

>>Here's the "Limbering Up" puzzler from the 1994 Quarter 3 Zark APL Tutor
>>News.  Unlike last quarter's Numbers-to-Words problem, this one is
>>nontrivial, to say the least!
>This is the "diff" utility in Unix, and coincidentally the solution is a
>dynamic programming algorithm _very_ similar to the "approximate string
>matching" code I just reposted.  The differences are (1) boundary conditions;
>(2) equality of lines instead of equality of symbols; and (3) saving the
>rows and back pointers so the optimal edit sequence (diff) can be recovered.
>(Richard Levine said he added ASM to the Toronto Toolkit for this very
>application but I don't think a new version of TT ever came out...)

The puzzler is indeed best solved by using the Unix diff algorithm (or
similar) which is quite complex.  However, it can also be solved in a
non-optimal fashion with a simpler step-through comparison procedure.  I
think perhaps that's the method they expected people to use (although it
has a tendency to produce large difference reports with certain coding
styles).

I'm more interested in the diff algorithm.  I've written my own passable
version of same in APL2 and wrapped a very nice function comparison and
formatting routine around it.  However, I've had to deal with some strange
situations that have come up with it.  Since you mentioned this, you
appear to have or know about other APL diff implementations.  I would like
to review other implementations and compare it to my own and see if I can
come up with some improvements.  Can you perhaps point me to somewhere I
might be able to find some source code?

I got a copy of your approximate string matching algorithm but haven't had
a chance to review it yet.  I'm hoping to see some interesting stuff in
there (though the code looks impressively short for a runnable version).

TIA

Davin Church

Thu, 20 Mar 1997 02:43:48 GMT
Puzzler: Comparing Functions
After years of using a very crude comparison program (which just showed
lines in version 1 not found in version 2 and vice versa), a friend sent
me a paper by Webb Miller and Eugene Myers entitled "A File Comparison
Program", from _Software--Practice and Experience, Vol. 15(11), pp
1025-1040 (Nov 1985)_.  Here's the abstract:

This paper presents a simple method for computing a shortest
sequence of insertion and deletion commands that converts one given
file to another.  The method is particularly efficient when the
difference between the two files is small compared to the files'
lengths.  In experiments performed on typical files, the program
often ran four times faster than the UNIX _diff_ command.

Notice that the output from this program is the (or one of the)
_shortest_ possible sequence of insert/delete commands.  No hokey edge
conditions here; this is the real stuff!  I translated the published C
program to APL and used it directly for a while.  Later, I recoded the
iterative core of the program in 80386 assembler, and it now runs nice
and fast.

The output format is different from what Zark is asking for in their
puzzler.  Here's a sample of the format I settled on:

-[15]  old line deleted from version 1
-[16]  another deleted line
+[15]  replacement line in version 2
----------
-[30]  deleted
----------

A "-" in the first column indicates a line found only in the first
version; a "+" marks a line from the second version.  Horizontal lines
separate groups of changes, indicating unchanged lines common to the two
versions.

I've used this program a lot for a couple of years, and I'm generally
pleased with the results.  There are times, however, when it is hard to
tell what has changed when looking at only the original version and the
difference.  (As happens when there are more than two versions of a
program and you're trying to merge multiple differences into a single
version.)  In these situations, I would have found it helpful for the
program to display not just the changed lines, but also a few lines
above and below the changed lines to give some idea of the context.
This could be done pretty easily, as the output formatting code is just
simple APL.

The pre-assembler version of the program is listed below.  As usual,
it's written in plain vanilla APL [except for a few diamonds (&) and a
dyadic grade to sort a character matrix in MATIOTA], so everyone should
be able to run it.  By the way, use an assembler version of MATIOTA (aka
ROWFIND) if you have one.  #IO is assumed to be 1.  APL*PLUS and APL2
users can translate the listing below back to APL symbols using the
APLASCII workspace available via ftp from watserv1.waterloo.edu.

I checked with Gene Myers, and he says that the C program is
the GNU diff program.

Jim

{del}X MMCOMP Y;A;B;C;D;DL;I;K;KS;L;LD;M;N;O;P;Q;R;T;U;CHG;CHGi;CHGn;OK

[6]    DL{<-}X OVER Y

+}lines

[9]    B{<-}DL MATIOTA Y
[10]   M{<-}''{reshape}{shape}A & N{<-}''{reshape}{shape}B

+}identical lines at end

[13]   KS{<-}LD{<-}(2+M+N){reshape}0

[17]   CHG{<-}1000 3{reshape}1{take}0 {neg}1 & CHGn{<-}''{reshape}{shape}{+
+}CHG & CHGi{<-}0

+}last identical line
[31]   LD[O]{<-}R
[32]   L{<-}({neg}1 1)[1+R=M]
[33]   U{<-}(1 {neg}1)[1+R=N]
[34]   D{<-}0
[35]   {->}(~L>U)/L1
[36]   #{<-}'No differences.'
[37]   {->}0
[38]

+}---

+}version:

[43]   K{<-}L-2 & OK{<-}O+K

[46]

[48]   Q{<-}LD[OK+1]
[49]   {->}(K=-D)/L3
[50]   P{<-}LD[OK-1]
[51]   {->}((K=D){or}Q<P)/L4
[52]  L3:R{<-}Q+1
[53]   MMAPPEND(-R),0,KS[OK+1]
[54]   {->}L5
[55]  L4:R{<-}P
[56]   MMAPPEND R,(R+K),KS[OK-1]
[57]  L5:KS[OK]{<-}CHGi
[58]   C{<-}R+K
[59]  L6:{->}((R{>=}M){or}C{>=}N)/L7
[60]   {->}(A[P{<-}R+1]{/=}B[Q{<-}C+1])/L7
[61]   R{<-}P & C{<-}Q
[62]   {->}L6
[63]  L7:LD[OK]{<-}R
[64]

[66]   {->}(R{/=}M)/L8 & L{<-}K+2
[67]  L8:{->}(C{/=}N)/L2 & U{<-}K-2
[68]   {->}L2
[69]  L9:L{<-}L-1 & U{<-}U+1
[70]   {->}L1

+}---
[74]

[76]  L10:I{<-}KS[OK]

[84]

[86]   P{<-}(|CHG[I;1])+{neg}1{times}0>CHG[I;1]
[87]  L12:{->}(P{>=}(|CHG[I;1])+{neg}1{times}0>CHG[I;1])/L13

[89]  L13:P{<-}|CHG[I;1]
[90]   {->}(CHG[I;1]<0)/L14
[91]   #{<-}DTB'+',(6{take}'[',({format}{neg}1+CHG[I;2]),']'),Y[CHG[I;2];]
[92]   {->}L15
[93]  L14:#{<-}DTB'-',(6{take}'[',({format}{neg}1+-CHG[I;1]),']'),X[-CHG[I;{+
+}1];]

[95]   {->}(I{/=}0)/L12
[96]

{del}

{del}MMAPPEND A

[4]    CHGn{<-}''{reshape}{shape}CHG
[5]   L1:CHG[CHGi;]{<-}A
{del}

{del}Z{<-}A OVER B;S

[2]    A{<-}(S{<-}{neg}2{take}1 1,{shape}A){reshape}A
[3]    S{<-}0 1{times}S{max}{shape}B{<-}({neg}2{take}1 1,{shape}B){reshape}B
[4]    Z{<-}((S{max}{shape}A){take}A),[#IO](S{max}{shape}B){take}B
{del}

{del}Z{<-}A MATIOTA B;C;F;N;P;R;T

+}different row

+}was a vector

{del}

{del}Z{<-}DTB A;B

[2]    {->}(2>{shape}{shape}B{<-}A{/=}' ')/L1
[3]    B{<-}{or}{slashbar}B
[4]   L1:Z{<-}({reverse}{or}\{reverse}B)/A
{del}

Sun, 23 Mar 1997 09:19:32 GMT
Puzzler: Comparing Functions
Many thanks to Jim Wiegang for publishing his APL code.
I have not done this for a while, but I
used to (1) write out two ascii transliteration files, (2) run 'diff',
(3) read the result back into APL, and (4) convert the contents back to APL
symbols.  For large functions (the kind John R Clark does not like)
this was the fastest way at my disposal.  For smaller functions
it was quicker to do it all in APL.  I think that my DIFF function
is not as good as the one Jim produced, but was workable for small
matrices.

Lee

--
Prof. Leroy J.{*filter*}ey, Faculty of Mathematics, U of Waterloo, Canada  N2L 3G1

...!uunet!watmath!ljdickey
1-519-888-4567, ext 5559

Tue, 01 Apr 1997 05:40:53 GMT

 Page 1 of 1 [ 6 post ]

Relevant Pages