Andrew Mullhaupt:

Iverson clearly set out to prove that it was possible to have a less readable language than APL without using a unique, runic character set.

Funny, that's what most of the APL programmers who I know say about _any_ language which doesn't use the APL character set.

Wait a while. As extensible as any language purports to be, as the years go by, people end up having to extend the language by reclaiming the syntax errors. If the language survives, the syntax errors will go away.

I dunno... what kind of extensions did you have in mind?

> delim=: short_seq in not_in esc
> out=: ; transl {~ alph i. delim <;.2 in

Andrew Mullhaupt:

Ah yes. It's all becoming clear to me. This means that you must have gone wrong somewhere. Should you be using so many alphabetic characters?

Obviously an invitation for more documentation:

e. is a set membership operator. It checks each item in the left argument for membership in the set of objects in the right argument. The result is boolean, and each 1 or 0 corresponds to an element in the left argument. For example,

   'This is a test' e. 'hers'
0 1 0 1 0 0 1 0 0 0 0 1 1 0

Here, the first element of the result is 0 because 'T' does not occur in 'hers'.
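
For readers who don't read J, here is a rough Python analogue of dyadic e. It is only a sketch: the function name is hypothetical, and it handles flat strings and lists, not J's general arrays.

```python
def member_of(left, right):
    """Rough analogue of J's  left e. right  for flat sequences:
    1 for each item of left that occurs anywhere in right, else 0."""
    pool = set(right)
    return [1 if item in pool else 0 for item in left]

print(member_of('This is a test', 'hers'))
# [0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0]
```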

-. is logical negation. It just changes 0s to 1s and vice versa.

the result of e. and feeds it into -. So the line that says

defines a boolean operation which returns 1 for each element of the left argument which is not in the right argument.
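
Membership followed by negation can be sketched in Python the same way. This mirrors the behaviour described, not the J code itself. The names `negate` and `not_in` are hypothetical stand-ins, and the sample string is the 'ab/c///de/f/g/' used later on.

```python
def negate(bits):
    """Analogue of monadic -. on a boolean list: swap 0s and 1s."""
    return [1 - b for b in bits]

def not_in(left, right):
    """Membership test (as with e.) fed into negation (as with -.):
    1 for each item of left that does NOT occur in right."""
    pool = set(right)
    return negate([1 if item in pool else 0 for item in left])

print(not_in('ab/c///de/f/g/', '/'))
# [1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0]
```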

}. drops an item off the front of an array. For example,

   }. 1 2 3 4
2 3 4

, is a generic catenate operation. For example,

   2 3 4 , 0
2 3 4 0

& curries an infix operation by fixing one of the arguments. The result is a prefix operation. Therefore, the part of the code which says

defines a left shift operation. There are other ways of defining left shift, but that's not important here.
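
The same pieces in Python: drop the first item, catenate, and currying with functools.partial. This is a sketch under assumptions. The J line itself isn't shown here, so the order (append the 0, then drop the first item) is assumed; it gives the same result as dropping first and appending afterwards. `behead`, `catenate`, `append_zero` and `shift_left` are hypothetical names.

```python
from functools import partial

def behead(seq):
    """Analogue of monadic }. : drop the first item of a list."""
    return list(seq)[1:]

def catenate(left, right):
    """Analogue of dyadic , restricted to a list on the left and a
    single item on the right (enough to mimic  ,&0 )."""
    return list(left) + [right]

# Analogue of ,&0 : fix the right argument of catenate at 0,
# turning a two-argument function into a one-argument one.
append_zero = partial(catenate, right=0)

def shift_left(bits):
    """Left shift (assumed order): append a 0, then drop the first item."""
    return behead(append_zero(bits))

print(behead([1, 2, 3, 4]))          # [2, 3, 4]
print(catenate([2, 3, 4], 0))        # [2, 3, 4, 0]
print(shift_left([1, 1, 0, 1, 0]))   # [1, 0, 1, 0, 0]
```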

>: is similar to C's >= (in other words, it returns 1 where the left argument is greater than or equal to the right argument). The reason >: is used instead of >= lies in J's parsing rules: in J, >: is a single token while >= is two tokens. If you feed >: boolean arguments, the result behaves according to this truth table:

               right arg
          >:     0  1
   left    0     1  0
   arg     1     1  1
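
In Python terms, >: on 0/1 values is just >= with the result turned back into an integer. This small sketch (the name `ge` is hypothetical) regenerates the truth table, with rows for the left argument and columns for the right:

```python
def ge(left, right):
    """Analogue of dyadic >: on booleans: 1 where left >= right."""
    return int(left >= right)

for left in (0, 1):
    print(left, [ge(left, right) for right in (0, 1)])
# 0 [1, 0]
# 1 [1, 1]
```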

Next, a sequence of two functions standing on its own (J calls this a hook) results in a derived function. If f and g are functions, and x is data,

   (f g) x

is equivalent to

   x (f g) x

which is equivalent to

   x f g x

which is equivalent to

   x f (g x)

It's equivalent to other things, but I'll stop here.
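
The monadic hook is easy to express as a higher-order function. In this sketch (the name `hook` is hypothetical), f takes two arguments and g takes one. The example is the J hook (+ -) 5, which is 5 + (-5), i.e. 0.

```python
def hook(f, g):
    """Analogue of a monadic J hook (f g):  (f g) x  means  x f (g x)."""
    return lambda x: f(x, g(x))

add = lambda x, y: x + y
neg = lambda x: -x
print(hook(add, neg)(5))   # 0
```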

Anyways, the function definition

applies to a boolean list and returns a 1 for each 1 in the original. It also returns a 1 for each 0 in the original which has a 0 to the right [and, because ,&0 shifts a 0 onto the right end of the list, you're guaranteed that the rightmost element of the result is a 1].

But short_seq returns a 0 for each 0 in the argument which has a 1 to the right. In other words, if 0s in the argument mark each occurrence of an escape character, 0s in the result mark each occurrence of an escape character followed by a non-escape character.

For example:

   short_seq 1 1 0 1 0 0 0 1 1 0 1 0 1 0
1 1 0 1 1 1 0 1 1 0 1 0 1 1

Or, if 'in' is a variable holding the text 'ab/c///de/f/g/', and 'esc' is a variable holding '/', then

   delim=: short_seq in not_in esc

will set delim to: 1 1 0 1 1 1 0 1 1 0 1 0 1 1
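
Both examples can be reproduced with a Python sketch of short_seq as the hook ">: shift". The helper names are hypothetical. The post's variable 'in' is renamed to `text` because `in` is a Python keyword.

```python
def not_in(left, right):
    """1 for each item of left that does not occur in right
    (membership, as with e., then negated, as with -.)."""
    pool = set(right)
    return [0 if item in pool else 1 for item in left]

def shift_left(bits):
    """Append a 0 on the right, then drop the first item."""
    return (list(bits) + [0])[1:]

def short_seq(bits):
    """Hook-style  >: shift : 1 where a bit is >= its right-hand
    neighbour (a 0 is assumed past the end of the list)."""
    return [int(b >= s) for b, s in zip(bits, shift_left(bits))]

print(short_seq([1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0]))
# [1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1]

text = 'ab/c///de/f/g/'   # the post's 'in'
esc = '/'
delim = short_seq(not_in(text, esc))
print(delim)
# [1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1]
```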

;. is a functional which will apply a function to each part of a sequence. If f is a function, n is a number, x is a boolean array with a 1 indicating delimiting characters, and y is the sequence to be parsed,

   x f;.n y

will apply f to each of the subsequences in y indicated by x. The number n indicates whether the delimiters are leading or trailing, and whether or not the delimiters are to be seen by f. If n is 2, delimiters are trailing, and delimiters are seen by f.

< when used as a monadic operation is analogous to & used as a monadic operation in C. In other words, it returns a reference to an array. In J, array references have a print representation which consists of a box drawn around the contents of the array.

So, with the sample I've been using ('in' defined as 'ab/c///de/f/g/'), monadic < is applied to the following sequences:

'a'
'b'
'/c'
'/'
'/'
'/d'
'e'
'/f'
'/g'
'/'

And, the print representation of that is:

+-+-+--+-+-+--+-+--+--+-+
|a|b|/c|/|/|/d|e|/f|/g|/|
+-+-+--+-+-+--+-+--+--+-+
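
The cut and the boxing can be mimicked in Python, with a list of strings standing in for the list of boxes. This is a sketch: `cut_trailing` is a hypothetical name, and it covers only the n=2 case (trailing delimiters, kept in each piece). Items after the last 1 are left out of every piece, which is how the code below behaves.

```python
def cut_trailing(marks, seq):
    """Rough analogue of  marks <;.2 seq : split seq into pieces, each
    ending at a position where marks is 1; the delimiter stays in its
    piece. Items after the last 1 are not put into any piece."""
    pieces, start = [], 0
    for i, m in enumerate(marks):
        if m:
            pieces.append(seq[start:i + 1])
            start = i + 1
    return pieces

delim = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1]
print(cut_trailing(delim, 'ab/c///de/f/g/'))
# ['a', 'b', '/c', '/', '/', '/d', 'e', '/f', '/g', '/']
```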

i. is a lookup function: for each item of the right argument, it finds where that item occurs in the left argument and returns its index. For example, if 'alph' is a list which contains character sequences (either single characters or escaped characters), such as:

+-+-+-+-+-+-+-+-+-+-+-+--+--+--+--+--+--+--+--+--+--+
|/|a|b|c|d|e|f|g|h|i|j|/a|/b|/c|/d|/e|/f|/g|/h|/i|/j|
+-+-+-+-+-+-+-+-+-+-+-+--+--+--+--+--+--+--+--+--+--+

then

   alph i. delim <;.2 in

would have the result

1 2 13 0 0 14 5 16 17 0
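
A Python version of the lookup, as a sketch with hypothetical names. As in J, an item that isn't found gets the index len(table), one past the end.

```python
def index_of(table, items):
    """Analogue of  table i. items : the index in table of each item,
    or len(table) if the item is absent (J gives #table for a miss)."""
    return [table.index(x) if x in table else len(table) for x in items]

letters = 'abcdefghij'
alph = ['/'] + list(letters) + ['/' + c for c in letters]
pieces = ['a', 'b', '/c', '/', '/', '/d', 'e', '/f', '/g', '/']
print(index_of(alph, pieces))
# [1, 2, 13, 0, 0, 14, 5, 16, 17, 0]
```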

Ideally, alph would contain every ASCII character and every escape sequence, but for test cases that is not necessary.

{ is an indexing function. x{y is analogous to y[x] in C.

~ is a functional which reverses the order of arguments to an operation. So, a{~b is equivalent to b{a in J, and is similar to a[b] in C.
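
Indexing and argument swapping in Python, as a sketch. `from_` and `flip` are hypothetical names: `from_` plays the part of { and `flip` the part of ~.

```python
def from_(indices, array):
    """Analogue of  indices { array : array[i] for each index i."""
    return [array[i] for i in indices]

def flip(f):
    """Analogue of ~ : swap the two arguments of f."""
    return lambda a, b: f(b, a)

data = ['x', 'y', 'z']
print(from_([2, 0], data))         # ['z', 'x']
print(flip(from_)(data, [2, 0]))   # ['z', 'x'], i.e.  data {~ 2 0
```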

So, if transl is a translate table, for instance

+-+-+-+-+-+-+-+-+-+-+-+---+---+---+---+---+---+---+---+---+---+
|/|a|b|c|d|e|f|g|h|i|j|APE|BAT|CAT|DOG|ELF|FLY|GNU|HOT|ILK|JON|
+-+-+-+-+-+-+-+-+-+-+-+---+---+---+---+---+---+---+---+---+---+

then

   transl {~ 1 2 13 0 0 14 5 16 17 0

would yield

+-+-+---+-+-+---+-+---+---+-+
|a|b|CAT|/|/|DOG|e|FLY|GNU|/|
+-+-+---+-+-+---+-+---+---+-+
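
The translate step in Python is a plain list comprehension over the indices. This is a sketch that builds the same 21-entry table shown above.

```python
letters = 'abcdefghij'
words = ['APE', 'BAT', 'CAT', 'DOG', 'ELF', 'FLY', 'GNU', 'HOT', 'ILK', 'JON']
transl = ['/'] + list(letters) + words

indices = [1, 2, 13, 0, 0, 14, 5, 16, 17, 0]
translated = [transl[i] for i in indices]   # like  transl {~ indices
print(translated)
# ['a', 'b', 'CAT', '/', '/', 'DOG', 'e', 'FLY', 'GNU', '/']
```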

; when used monadically will take a list of array references and catenate those arrays together. For example

   ; transl {~ 1 2 13 0 0 14 5 16 17 0

yields

abCAT//DOGeFLYGNU/
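
Putting the whole line together, here is a Python sketch of the pipeline behind `out=: ; transl {~ alph i. delim <;.2 in`. It is an analogue, not a translation of the J. All function names are hypothetical, and a piece missing from alph simply raises ValueError here.

```python
def not_in(left, right):
    """1 for each item of left that does not occur in right."""
    pool = set(right)
    return [0 if c in pool else 1 for c in left]

def short_seq(bits):
    """1 where a bit is >= its right neighbour (0 assumed past the end)."""
    shifted = list(bits[1:]) + [0]
    return [int(b >= s) for b, s in zip(bits, shifted)]

def cut_trailing(marks, seq):
    """Pieces of seq, each ending where marks is 1 (delimiter kept)."""
    pieces, start = [], 0
    for i, m in enumerate(marks):
        if m:
            pieces.append(seq[start:i + 1])
            start = i + 1
    return pieces

def translate(text, esc, alph, transl):
    delim = short_seq(not_in(text, esc))
    pieces = cut_trailing(delim, text)
    indices = [alph.index(p) for p in pieces]    # alph i. pieces
    return ''.join(transl[i] for i in indices)   # ; transl {~ indices

letters = 'abcdefghij'
alph = ['/'] + list(letters) + ['/' + c for c in letters]
transl = ['/'] + list(letters) + ['APE', 'BAT', 'CAT', 'DOG', 'ELF',
                                  'FLY', 'GNU', 'HOT', 'ILK', 'JON']
print(translate('ab/c///de/f/g/', '/', alph, transl))
# abCAT//DOGeFLYGNU/
```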

>To provide full generality, you'd want to replace the definition
>of 'short_seq' with something a little more involved (like a state
>machine with more than two states)...

Yes. Preferably a universal self-replicating multi-tape Turing machine with proof of undecidability for every input, which rewrites itself into the shortest possible J representation and composes poetry, too.

Um... I don't think that's necessary. All I was trying to say was that if you want a slash in the result, you might want to make it so that '//' returns a slash. If you don't need that (for instance, if you want '/s' to return a slash), then the above code would do fine.
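
One way to get the '//' behaviour is sketched here in Python rather than J. The idea is to scan left to right and let an escape character swallow whatever follows it, so '//' comes out as a single token. This is a plain scanner, not the short_seq/;. approach used in the J code. The name `tokenize` and the small lookup table are hypothetical, and tokens missing from the table pass through unchanged.

```python
def tokenize(text, esc='/'):
    """Split text into tokens: an escape character pairs with whatever
    follows it (so '//' is one token); other characters stand alone.
    A lone escape at the very end becomes its own token."""
    tokens, i = [], 0
    while i < len(text):
        if text[i] == esc and i + 1 < len(text):
            tokens.append(text[i:i + 2])
            i += 2
        else:
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenize('ab/c///de/f/g/'))
# ['a', 'b', '/c', '//', '/d', 'e', '/f', '/g', '/']

table = {'//': '/', '/c': 'CAT', '/d': 'DOG'}
print(''.join(table.get(t, t) for t in tokenize('a//b/c')))
# a/bCAT
```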

[Yes, I recognize Andrew's comments as sarcasm, but I think he's laying it on a little thick. Anyways, hopefully this will have cleared up any major questions about that section of code...]

--