Parsing 
Author Message
 Parsing


Alan Graham writes on Monday, April 1:

Quote:
> The "function" of the blank is crap.
> In APL2, A(B) is a two-item vector if A and B are data.
> In J, F(G) is a Fork if A and B are functions.

"Fork" is the case of three functions juxtaposed in isolation,
not two; two functions juxtaposed is a "hook".  One can indeed
make hook a conjunction and introduce a symbol for it; however,
assigning an interpretation to a train of two functions,
together with fork (and a train of one function being just
the function), permit a train of any length to be interpreted.

Quote:
> It's NOT the blank, it's what are the semantics of justaposed words?
> Consider two words (tokens) without an explicit "operation" between
> them as in A B

> APL\360
> 1. function application when A is a monadic function and B is data
> 2. two-item vector when A and B are numeric constants

> APL2
> 1. function application
> 2. two-item vector when A and B are data (notice simpler than APL\360)

> Dyalog APL
> 1. & 2. (same as APL2)
> 3. Derived function when A is a function or data and B is an operator

> J
> 1. & 2. (same as APL\360)
> 3. (same as Dyalog APL)
> 4. Fork when A and B are functions

> It looks like J has the most definitions for the juxtaposition of a
> pair of words.  ...

How are rules counted?  If rule 1 in APL2, "function application",
suffices to handle both (function data) and (function operator),
it must necessarily be a more complex rule than rule 1 in
Dyalog APL, which apparently requires rule 3 to handle separately
the case (function operator).  Moreover, how complex are the objects
that these rules manipulate and produce?  If arbitrarily complex
objects are permitted, _one_ rule would suffice.

The J parsing rules are specified as a 10 by 5 table in section
II E of the dictionary.  The table is taken directly from the
interpreter; that is, these really are the rules, and not the
rules as you wish they were.  The table, pasted into here from
the C source, reads as follows.  (The dictionary uses a bold font
to encode the numeric triplets, reducing the number of columns to 5.)

#define AVN     (     ADV+VERB+NOUN)
#define CAVN    (CONJ+ADV+VERB+NOUN)
#define EDGE    (MARK+ASGN+LPAR)

PT cases[] = {
 EDGE,      VERB,      NOUN, ANY,       monad,   1,2,1,
 EDGE+AVN,  VERB,      VERB, NOUN,      monad,   2,3,2,
 EDGE+AVN,  NOUN,      VERB, NOUN,      dyad,    1,3,2,
 EDGE+AVN,  VERB+NOUN, ADV,  ANY,       adv,     1,2,1,
 EDGE+AVN,  VERB+NOUN, CONJ, VERB+NOUN, conj,    1,3,1,
 EDGE+AVN,  VERB,      VERB, VERB,      trident, 1,3,1,
 EDGE,      CAVN,      CAVN, CAVN,      trident, 1,3,1,
 EDGE,      CAVN,      CAVN, ANY,       bident,  1,2,1,
 NAME+NOUN, ASGN,      CAVN, ANY,       is,      0,2,1,
 LPAR,      CAVN,      RPAR, ANY,       punc,    0,2,0,

Quote:
};

The J interpreter is built around this table.

A sentence is placed on a queue and is parsed from right to left;
as parsing proceeds, successive words are moved from the queue onto
a stack.  Eligibility for execution is defined by the above rules,
and is completely determined by the classes of the first four words
on the stack:  The classes are compared with the first four columns
of the table, and the first row that agrees in all four columns
is selected.  The words indicated by the numeric triplet are subject
to the action shown in column 4 (monad, dyad, etc.), and are replaced
by the result.  If no row is satisfied, the next word is transferred
from the queue.

As can be surmised from the table, the words being manipulated
are nouns, verbs, adverbs, and conjunctions (these can be
produced by the actions), and left parenthesis, right parenthesis,
copula (ASGN), name, and end-of-sentence (MARK) (these can not
be produced by the actions).  Since eligibility for execution
depends _only_ on the classes of the words, these objects are
honest-to-goodness APL/J objects, the same nouns, verbs, adverbs,
and conjunctions available to the user.


"APL & J one more time", posted on March 12, the J parsing rules
are the APL\360 parsing rules with the following modifications
and extensions:

- The anomalous [] and [;] are removed.
- The anomalous {jot} is removed.
- The anomalous niladic functions are removed.
- Function assignment is permitted.  Since this replaces what
  would be an error in APL\360, it is a consistent extension.
- Indirect assignment is permitted.  Since this replaces what
  would be an error in APL\360, it is a consistent extension.
- The trident and bident rules are added, implementing forks
  and hooks.  Since these replace what would be errors in APL\360,
  they are consistent extensions.

Quote:
> ... The rule that a vector of numeric constants is one word
> is stange.

Not only is this not "stange", but it is the treatment in APL\360.
This was so even in the early days, when vector constants were written
(1,2,3,4) instead of 1 2 3 4, but Larry Breed nevertheless took
pains to treat the vector as a single object.  

Moreover, if you treat 1 2 3 4 as four separate words, do you also treat
'abcd' as four separate words (or is it six)?  If not, why is 'abcd',
a 4-element character constant, syntactically different from 1 2 3 4,
a 4-element numeric constant?  If so ('abcd' as separate words),
I would be interested in how you treat the quote.  And if you really
do treat 1 2 3 4 as four separate words, I would be interested in
seeing the timing results from your parser for 1 ... 100000.

Quote:
> The juxtaposition rules for Zero are the same as for Dyalog APL,
> e.g. F G is a SYNTAX ERROR.

Actually, F G in Dyalog APL is not a syntax error but the function

is hook.)  At least, this was the case in 1995, when that was the
answer given in this forum.

Quote:
> I have nothing against Fork, I would define an explicit symbol for
> Fork. (J is the missing fork ... there must be a joke in there
> somewhere.)

I would be interested in seeing such a definition, and in seeing how
phrases such as  mean=:+/ * #  would look in your definition of fork.

Quote:
> I have nothing against an explicit Link function ( link:{_y _x}. ).
> It is so easy to make. I wonder why it should be primitive.


being so easily defined, should not be primitive.  But why stop there?  
Why not have only "nand" (or "nor") as the one primitive, and define
everything else in terms of it.

Quote:
> It feels like 1982 again. ;-)

Since 1982, a number of papers and books have been published that
shed much light on this topic:

K.E. Iverson, Rationalized APL, 1983 1 6.
K.E. Iverson, A Dictionary of APL, Quote-Quad, 1987 9.
K.E. Iverson and E.E. McDonnell, Phrasal Forms, APL89, 1989 8.
K.E. Iverson, J Introduction and Dictionary, 1996.

In particular, Table 2 on page 38 of "A Dictionary of APL"
presents a model of the parsing process; you may find it instructive
to enter these APL functions and try them on a few examples.



Sun, 20 Sep 1998 03:00:00 GMT  
 Parsing

Quote:

> - The anomalous [] and [;] are removed.
> - The anomalous {jot} is removed.
> - The anomalous niladic functions are removed.

For a simple user I think these characteristics of J as differences
from APL are the the most obvious and some people have a hard time
get adjusted to and understand.

Once you get past this barrier of understanding these three items you
will really begin to love J and you can not understand how you could
stand these things in APL.

[ and ] are heavily used in APL to get at individual items of data
and of course [;] to index into a matrix.

Because they are not there in J you will have to learn how to do things
the J way if you want to use J.

Niladic functions are a more obvious blessing.

In J typing the name of a verb simply shows you its contents.

In order for a verb to run you have to assign it an argument.

It is enough to give it an empty right argument and you do not have
to use it in your verb at all.

The names of the arguments you pass to the verb are always x. for a
left argument and y. for a right argument.

Because x. and or y. can be multidimensional boxed arrays that is not
a limitation at all.

Inside a verb you can also read and assign a noun globally by using =:

If you only use =. then the noun is assigned locally and does not exist
when you exit a verb. There is no need to declare a noun local as you
need to in APL.

I would say that the benefits of J above APL are endless but they are
many. And I see it more and more clearly every day.

Which proves that you do not need to be an expert to learn and love J
Even a person like me can.

--

http://www2.simi.is/~gosi
http://www.jsoftware.com



Tue, 22 Sep 1998 03:00:00 GMT  
 Parsing
Bjorn Helgason writes on Friday, April 5:

Quote:

> > - The anomalous [] and [;] are removed.
> > - The anomalous {jot} is removed.
> > - The anomalous niladic functions are removed.

> For a simple user I think these characteristics of J as differences
> from APL are the the most obvious and some people have a hard time
> get adjusted to and understand.

> Once you get past this barrier of understanding these three items you
> will really begin to love J and you can not understand how you could
> stand these things in APL.

> [ and ] are heavily used in APL to get at individual items of data
> and of course [;] to index into a matrix.  ...

Actually, I meant that the anomalous [] axis, the [;] indexing,
the (;) "mixed output", are removed, but I see that I didn't say
it clearly or completely enough.  The case between these anomalous
constructs, [ ] axis, [;] indexing, (;) mixed output, {jot},
and niladic functions, and their replacements in J, is more finely
balanced than depicted in your article.


Tue, 22 Sep 1998 03:00:00 GMT  
 Parsing

Roger Hui writes on Wednesday, April 3:

Quote:
>"Fork" is the case of three functions juxtaposed in isolation,
>not two; two functions juxtaposed is a "hook".  One can indeed
>make hook a conjunction and introduce a symbol for it; however,
>assigning an interpretation to a train of two functions,
>together with fork (and a train of one function being just
>the function), permit a train of any length to be interpreted.

Thanks for the correction.
I had forgotten that there are two function "strand" interpretations:
Hook: A B and Fork: A B C.
So, the interpretation of function strands A B C D is 1-Hook and 1-Fork.
Is it A B (C D), or (A B) C D, or A (B C) D?

Quote:
>> It's NOT the blank, it's what are the semantics of justaposed words?
>> Consider two words (tokens) without an explicit "operation" between
>> them as in A B

>> APL\360
>> 1. function application when A is a monadic function and B is data
>> 2. two-item vector when A and B are numeric constants

>> APL2
>> 1. function application
>> 2. two-item vector when A and B are data (notice simpler than APL\360)

>> Dyalog APL
>> 1. & 2. (same as APL2)
>> 3. Derived function when A is a function or data and B is an operator

>> J
>> 1. & 2. (same as APL\360)
>> 3. (same as Dyalog APL)
>> 4. Fork when A and B are functions

>> It looks like J has the most definitions for the juxtaposition of a
>> pair of words.  ...

>How are rules counted?  If rule 1 in APL2, "function application",
>suffices to handle both (function data) and (function operator),
>it must necessarily be a more complex rule than rule 1 in
>Dyalog APL, which apparently requires rule 3 to handle separately
>the case (function operator).  Moreover, how complex are the objects
>that these rules manipulate and produce?  If arbitrarily complex
>objects are permitted, _one_ rule would suffice.

Because APL2 has not implemented general F G (as in +/) I don't count it.
If you type +/ alone or try to assign it, you get a SYNTAX ERROR.

Dyalog APL (a better APL2) has a general F G, so I count it.

- Show quoted text -

Quote:
>The J parsing rules are specified as a 10 by 5 table in section
>II E of the dictionary.  The table is taken directly from the
>interpreter; that is, these really are the rules, and not the
>rules as you wish they were.  The table, pasted into here from
>the C source, reads as follows.  (The dictionary uses a bold font
>to encode the numeric triplets, reducing the number of columns to 5.)

>#define AVN     (     ADV+VERB+NOUN)
>#define CAVN    (CONJ+ADV+VERB+NOUN)
>#define EDGE    (MARK+ASGN+LPAR)

>PT cases[] = {
> EDGE,      VERB,      NOUN, ANY,       monad,   1,2,1,
> EDGE+AVN,  VERB,      VERB, NOUN,      monad,   2,3,2,
> EDGE+AVN,  NOUN,      VERB, NOUN,      dyad,    1,3,2,
> EDGE+AVN,  VERB+NOUN, ADV,  ANY,       adv,     1,2,1,
> EDGE+AVN,  VERB+NOUN, CONJ, VERB+NOUN, conj,    1,3,1,
> EDGE+AVN,  VERB,      VERB, VERB,      trident, 1,3,1,
> EDGE,      CAVN,      CAVN, CAVN,      trident, 1,3,1,
> EDGE,      CAVN,      CAVN, ANY,       bident,  1,2,1,
> NAME+NOUN, ASGN,      CAVN, ANY,       is,      0,2,1,
> LPAR,      CAVN,      RPAR, ANY,       punc,    0,2,0,
>}>;

>The J interpreter is built around this table.

>A sentence is placed on a queue and is parsed from right to left;
>as parsing proceeds, successive words are moved from the queue onto
>a stack.  Eligibility for execution is defined by the above rules,
>and is completely determined by the classes of the first four words
>on the stack:  The classes are compared with the first four columns
>of the table, and the first row that agrees in all four columns
>is selected.  The words indicated by the numeric triplet are subject
>to the action shown in column 4 (monad, dyad, etc.), and are replaced
>by the result.  If no row is satisfied, the next word is transferred
>from the queue.

>As can be surmised from the table, the words being manipulated
>are nouns, verbs, adverbs, and conjunctions (these can be
>produced by the actions), and left parenthesis, right parenthesis,
>copula (ASGN), name, and end-of-sentence (MARK) (these can not
>be produced by the actions).  Since eligibility for execution
>depends _only_ on the classes of the words, these objects are
>honest-to-goodness APL/J objects, the same nouns, verbs, adverbs,
>and conjunctions available to the user.

I appreciate that you give us the rules as code rather than as just
English.
It would be better if we had J code to describe J instead of assembly
language.
(C is my favorite semi-portable assembly language!)
This 10 by 5 table is way too complex.
The rules for APL\360 and APL2 are simple once you remove brackets and
jot.
J is worse with the addition of function strands.

- Show quoted text -

Quote:

>"APL & J one more time", posted on March 12, the J parsing rules
>are the APL\360 parsing rules with the following modifications
>and extensions:

>- The anomalous [] and [;] are removed.
>- The anomalous {jot} is removed.
>- The anomalous niladic functions are removed.
>- Function assignment is permitted.  Since this replaces what
>  would be an error in APL\360, it is a consistent extension.
>- Indirect assignment is permitted.  Since this replaces what
>  would be an error in APL\360, it is a consistent extension.
>- The trident and bident rules are added, implementing forks
>  and hooks.  Since these replace what would be errors in APL\360,
>  they are consistent extensions.

>> ... The rule that a vector of numeric constants is one word
>> is stange.

>Not only is this not "stange", but it is the treatment in APL\360.
>This was so even in the early days, when vector constants were written
>(1,2,3,4) instead of 1 2 3 4, but Larry Breed nevertheless took
>pains to treat the vector as a single object.  

It's too bad about Larry's added special case for numeric vector
constants.
Jim Brown's generalisation of this to Vector Notation is nice: it also
simplifies the rules.
Unfortunately, it also confuses some very bright people (like Bjorn and
Roger).
Beginners have no problem understanding that 1 2 3 is the same as 1 (1+1)
3
or even 1(1+1)3.
APL2 even generalizes A[I] B[J] C[K] to be the expected answer.

Quote:
>Moreover, if you treat 1 2 3 4 as four separate words, do you also treat
>'abcd' as four separate words (or is it six)?  If not, why is 'abcd',
>a 4-element character constant, syntactically different from 1 2 3 4,
>a 4-element numeric constant?  If so ('abcd' as separate words),
>I would be interested in how you treat the quote.  And if you really
>do treat 1 2 3 4 as four separate words, I would be interested in
>seeing the timing results from your parser for 1 ... 100000.

The "explanation" that 1 2 3 is a single word, works but is very hard for
humans to swallow.
Come on it's THREE words, the numbers 1, 2, and 3.
The interpreter recognizes the special case of a numeric vector constant
for performance reasons.
This special case is even in APL2 (mainframe at least).
Give me a break Roger; 'abcd' is one word. Any quoted string is one word.

Quote:
>> The juxtaposition rules for Zero are the same as for Dyalog APL,
>> e.g. F G is a SYNTAX ERROR.

>Actually, F G in Dyalog APL is not a syntax error but the function

>is hook.)  At least, this was the case in 1995, when that was the
>answer given in this forum.

You are right. Zero does not have function strands of any length.
Zero does have Hook (just like Dyalog APL) as F & G.

Quote:
>> I have nothing against Fork, I would define an explicit symbol for
>> Fork. (J is the missing fork ... there must be a joke in there
>> somewhere.)

>I would be interested in seeing such a definition, and in seeing how
>phrases such as  mean=:+/ * #  would look in your definition of fork.

No problem:
   mean:+/ tine * fork #
If you defined the operators "fork" and "tine", where tine builds the
operand to fork.

Quote:
>> I have nothing against an explicit Link function ( link:{_y _x}. ).
>> It is so easy to make. I wonder why it should be primitive.


>being so easily defined, should not be primitive.  But why stop there?  
>Why not have only "nand" (or "nor") as the one primitive, and define
>everything else in terms of it.

No, I do not take an extreme minimalistic approach --- or I'd like Pascal
that does not even have power!
J and original APL\360 have too many primitives. To understand and debug
an APL (or J) program written by
someone else, I need to know Domino, Inner, and Outer products even if I
don't care about Linear Algebra.
Yet the darn language (APL) doesn't even have a simple IF which every
programmer on the planet needs!
Thanks for putting it into J.
How many primitives are right? Just enough and no more!

Quote:
>> It feels like 1982 again. ;-)
>Since 1982, a number of papers and books have been published that
>shed much light on this topic:

>K.E. Iverson, Rationalized APL, 1983 1 6.
>K.E. Iverson, A Dictionary of APL, Quote-Quad, 1987 9.
>K.E. Iverson and E.E. McDonnell, Phrasal Forms, APL89, 1989 8.
>K.E. Iverson, J Introduction and Dictionary, 1996.

>In particular, Table 2 on page 38 of "A Dictionary of APL"
>presents a model of the parsing process; you may find it instructive
>to enter these APL functions and try them on a few examples.

And very nice papers they are. I've read them all.
Write your J parser in J in a few simple lines and show us the simple
rules.

Alan Graham



Thu, 24 Sep 1998 04:00:00 GMT  
 
 [ 4 post ] 

 Relevant Pages 

1. Confused by 'PARSE ARG' or 'PARSE VAR'

2. Odd ,,Parse error: parse error, unexpected T_VARIABLE''

3. Need Objects that Parse RTF files!!!

4. Problem parsing XML file

5. Compiler in 5.1 doesn't parse as before

6. Parsing Verbs

7. Parsing and Tracing J functions

8. APL Transliterations (and parsing)

9. j parsing in 2.9

10. MVS parsing and scanning

11. how to construct OLE interface classes OR html parsing

12. goodies parsing .ini file?

 

 
Powered by phpBB® Forum Software