Parsing PL/I 

Browsing through some old literature I came across an interesting
article on PL/I compiler construction (see attached).  In particular the
authors enumerated cases where lookahead was required to augment a
straight recursive descent parser.

Since I'm aware of at least two projects to build a PL/I parser I
thought I'd post the syntax exceptions here.  The entire article is
worth reading.

[ pli_parse.txt 5K ]
Abrahams, Paul W. "The CIMS PL/I Compiler"
_Proceedings of the SIGPLAN Symposium on Compiler Construction_
Aug 6-10, 1979 pp. 107-116

Appendix A.

        In this appendix we give examples of those constructs in PL/I that
cannot be distinguished using an LL(1) grammar, and indicate how those
constructs are distinguished in CIMS PL/I.  In the examples, BAL indicates a
parenthesis-balanced sequence of symbols.  As our only objective is to
distinguish the constructs from each other, we don't need to analyze their
internal structure or validate them; any errors will be caught during the
actual parse.  For each construct, we give examples of sequences that are
difficult to distinguish; indicate the syntactic category of each
sequence; and give the method used to distinguish the sequences.

        1.      ON ERROR SYSTEM  ;      /* SYSTEM IS A KEYWORD */
                ON ERROR SYSTEM=3;      /* SYSTEM=3 IS AN ASSIGNMENT */

                   If SYSTEM is followed by ";" it is a keyword;  otherwise not.

        2.      GET:            /*      GET IS A STATEMENT-NAME */
                GET DATA;       /*      GET-STATEMENT */
                GET->X=5;    /*      ASSIGNMENT-STATEMENT */
                GET.X=5;        /*      ASSIGNMENT-STATEMENT */
                GET=5;          /*      ASSIGNMENT-STATEMENT */
                GET,X=5;        /*      ASSIGNMENT-STATEMENT */
                STOP;           /*      STOP-STATEMENT */
                GET(3):         /*      GET IS A STATEMENT-NAME */
                GET(3)->X=5; /*      ASSIGNMENT-STATEMENT */
                GET(3).X=5;     /*      ASSIGNMENT-STATEMENT */
                GET(3)=5;       /*      ASSIGNMENT-STATEMENT */
                GET(3),X=5;     /*      ASSIGNMENT-STATEMENT */
                STOP(3):        /*      STOP IS A STATEMENT-NAME */

In order to distinguish statement keywords, statement-names, and
identifiers in any context other than "IF(BAL)=" or "DECLARE(BAL),",
examine the token T that follows the initial token of the statement.
If T is anything other than "(", the category of T determines whether
the initial token is a keyword, a statement-name, or an identifier.
If T is "(", then find the matching ")" and make the determination
from the token that follows it, in the same way as after the initial
token.
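
A minimal Python sketch of this prescan, under stated assumptions: the
tokenizer is deliberately crude (no comments, no PL/I special characters
in names), and the mapping from T to a category is inferred from the
examples listed under item 2, since the text does not spell it out.  The
names tokenize, skip_balanced and classify_initial_token are hypothetical.

```python
import re

def tokenize(stmt):
    # Crude tokenizer: "->", quoted strings, words/integers, any other symbol.
    return re.findall(r"->|'[^']*'|\w+|\S", stmt)

def skip_balanced(tokens, i):
    # tokens[i] must be "("; return the index just past its matching ")".
    depth = 0
    for j in range(i, len(tokens)):
        if tokens[j] == "(":
            depth += 1
        elif tokens[j] == ")":
            depth -= 1
            if depth == 0:
                return j + 1
    raise ValueError("unbalanced parentheses")

def classify_initial_token(stmt):
    # Not meant for the "IF(BAL)=" and "DECLARE(BAL)," contexts.
    tokens = tokenize(stmt)
    i = 1
    if i < len(tokens) and tokens[i] == "(":
        i = skip_balanced(tokens, i)      # look past the BAL group
    t = tokens[i] if i < len(tokens) else ";"
    if t == ":":
        return "statement-name"
    if t in ("->", ".", "=", ","):        # assumption: these imply a reference
        return "identifier"
    return "keyword"

for s in ["GET:", "GET DATA;", "GET->X=5;", "GET(3):", "GET(3),X=5;", "STOP;"]:
    print(s, "=>", classify_initial_token(s))
```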

        3.      IF(5)=THEN+THEN;        /* ASSIGNMENT-STATEMENT */
                IF(5)=THEN+THEN THEN;   /* IF-STATEMENT */

In order to distinguish an if-statement from an assignment-statement
when the statement begins with "IF(BAL)=", look for ";" or for "THEN"
not preceded by an operator, ".", ",", or "(".  If ";" is found first
then the statement is an assignment-statement; otherwise it is an
if-statement.
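
The scan for item 3 might look roughly like the following Python sketch.
It assumes the input already starts with "IF(BAL)=", uses a crude
tokenizer, and takes the OPERATORS set as a guess at what counts as an
operator; none of these names come from the paper.

```python
import re

# Assumption: single-character operator symbols plus the "->" qualifier.
OPERATORS = set("+-*/=<>&|^~¬") | {"->"}
BLOCKERS = OPERATORS | {".", ",", "("}

def tokenize(stmt):
    # Crude tokenizer: "->", quoted strings, words/integers, any other symbol.
    return re.findall(r"->|'[^']*'|\w+|\S", stmt)

def skip_balanced(tokens, i):
    # tokens[i] must be "("; return the index just past its matching ")".
    depth = 0
    for j in range(i, len(tokens)):
        if tokens[j] == "(":
            depth += 1
        elif tokens[j] == ")":
            depth -= 1
            if depth == 0:
                return j + 1
    raise ValueError("unbalanced parentheses")

def is_if_statement(stmt):
    # Precondition (not checked): stmt begins with IF ( BAL ) =
    tokens = tokenize(stmt)
    eq = skip_balanced(tokens, 1)         # index of the "=" after IF(BAL)
    prev = tokens[eq]
    for tok in tokens[eq + 1:]:
        if tok == ";":
            return False                  # ";" found first: assignment
        if tok.upper() == "THEN" and prev not in BLOCKERS:
            return True                   # a THEN that cannot be an operand
        prev = tok
    return True                           # "otherwise"; the real parse reports errors

print(is_if_statement("IF(5)=THEN+THEN;"))
print(is_if_statement("IF(5)=THEN+THEN THEN;"))
```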

        4.      DECLARE(A,B),C(3)=CHAR(12) ;    /* ASSIGNMENT-STATEMENT */
                DECLARE (A,B),C(3) CHAR(12);    /* DECLARE-STATEMENT */
                DECLARE(A,B),C(3)          ;    /* DECLARE-STATEMENT */

In order to distinguish a declare-statement from an assignment-statement
when the statement begins with "DCL(BAL),", look for ";" or for an
integer, identifier, or ")" followed by an identifier or ";".  If we see
";" first (and it is not preceded by an integer, identifier, or ")")
then the statement is an assignment-statement; otherwise it is a
declare-statement.

        5.      ELSE(SIZE):X=3;         /*      (SIZE) IS A CONDITION-PREFIX */
                ELSE(SIZE)  =3;         /*      ASSIGNMENT TO ELSE(SIZE) */
                ELSE(2)     =3;         /*      ASSIGNMENT TO ELSE(2) */
                ELSE(2):   X=3;         /*      ELSE(2) IS A STATEMENT-NAME */
                ELSE:      X=3;         /*      ELSE IS A STATEMENT-NAME */
                ELSE STOP;              /*      ELSE IS A KEYWORD */
                ELSE ;                  /*      ELSE IS A KEYWORD */
                ON ERROR        SNAP  ; /*      SNAP IS A KEYWORD */
                ON ERROR        SNAP=5; /*      ASSIGNMENT TO SNAP */

"SNAP" following an on-statement behaves like "ELSE".   If  "ELSE"
is followed by ":", then the "ELSE" is a statement-name.  If "ELSE" is
followed by a token other than "(" or ":", then the "ELSE" is a keyword.
If the "ELSE" is followed by "(", then classify "ELSE" according to
(1) whether the material up to the matching ")" contains an identifier,
and (2) whether the token after the matching ")" is ":".
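
A rough Python sketch of the "ELSE" rule follows.  The text names the two
tests for the "(" case but not how they combine, so the combination below
is inferred from the seven ELSE examples in item 5 and is an assumption.
The sketch follows the rule as worded and covers only "ELSE"; it does not
attempt the "SNAP=5" case.  All function names are hypothetical.

```python
import re

def tokenize(stmt):
    # Crude tokenizer: "->", quoted strings, words/integers, any other symbol.
    return re.findall(r"->|'[^']*'|\w+|\S", stmt)

def skip_balanced(tokens, i):
    # tokens[i] must be "("; return the index just past its matching ")".
    depth = 0
    for j in range(i, len(tokens)):
        if tokens[j] == "(":
            depth += 1
        elif tokens[j] == ")":
            depth -= 1
            if depth == 0:
                return j + 1
    raise ValueError("unbalanced parentheses")

def classify_else(stmt):
    # Precondition (not checked): the first token of stmt is ELSE.
    tokens = tokenize(stmt)
    t = tokens[1]
    if t == ":":
        return "statement-name"
    if t != "(":
        return "keyword"
    end = skip_balanced(tokens, 1)
    inside = tokens[2:end - 1]            # material between "(" and ")"
    # Simplification: an identifier is a word starting with a letter.
    has_identifier = any(re.fullmatch(r"[A-Za-z_]\w*", tok) for tok in inside)
    followed_by_colon = end < len(tokens) and tokens[end] == ":"
    if followed_by_colon:
        # (SIZE): reads as a condition-prefix, (2): as a subscripted label.
        return "keyword" if has_identifier else "statement-name"
    return "identifier"                   # e.g. assignment to ELSE(2)

for s in ["ELSE(SIZE):X=3;", "ELSE(SIZE)=3;", "ELSE(2):X=3;", "ELSE STOP;"]:
    print(s, "=>", classify_else(s))
```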

        6.      DO WHILE(P=0);  /* DO WHILE */
                DO WHILE(P)=0;  /* DO WITH CONTROL VARIABLE WHILE(P) */

Find the ")" matching the "(" after "WHILE".  If the next token is
";"  the  statement  is  a do-while;  otherwise the statement is a
do-statement with control variable.
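
As a small Python sketch of this check (the tokenizer is crude and the
function names are hypothetical; the input is assumed to start with
"DO WHILE("):

```python
import re

def tokenize(stmt):
    # Crude tokenizer: "->", quoted strings, words/integers, any other symbol.
    return re.findall(r"->|'[^']*'|\w+|\S", stmt)

def skip_balanced(tokens, i):
    # tokens[i] must be "("; return the index just past its matching ")".
    depth = 0
    for j in range(i, len(tokens)):
        if tokens[j] == "(":
            depth += 1
        elif tokens[j] == ")":
            depth -= 1
            if depth == 0:
                return j + 1
    raise ValueError("unbalanced parentheses")

def is_do_while(stmt):
    # Precondition (not checked): tokens are DO, WHILE, "(", ...
    tokens = tokenize(stmt)
    after = skip_balanced(tokens, 2)      # token following the matching ")"
    return after < len(tokens) and tokens[after] == ";"

print(is_do_while("DO WHILE(P=0);"))   # do-while
print(is_do_while("DO WHILE(P)=0;"))   # control variable WHILE(P)
```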

        7.      PUT LIST((A(I) DO I=1 TO 10));  /* DO WITH ITERATED LIST */
                PUT LIST((A(I)) );              /* LIST WITH SINGLE ITEM (A(I)) */

To determine whether a "(" in a data-list starts a parenthesized item
or an iteration-list, look for "DO" not preceded by ".", ",", "(", or
an operator, and not within nested parentheses.  If such a "DO" is
found before the matching ")" is found, the original "(" starts an
iteration-list, and otherwise it is part of a parenthesized item.
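
One way to sketch the data-list scan in Python is shown here.  The caller
is assumed to pass the index of the "(" in question; the OPERATORS set is
a guess at what counts as an operator, and the function names are
hypothetical.

```python
import re

# Assumption: single-character operator symbols plus the "->" qualifier.
OPERATORS = set("+-*/=<>&|^~¬") | {"->"}
BLOCKERS = OPERATORS | {".", ",", "("}

def tokenize(stmt):
    # Crude tokenizer: "->", quoted strings, words/integers, any other symbol.
    return re.findall(r"->|'[^']*'|\w+|\S", stmt)

def starts_iteration_list(tokens, i):
    # tokens[i] must be the "(" being classified inside a data-list.
    depth = 0                             # nesting relative to tokens[i]
    prev = tokens[i]
    for tok in tokens[i + 1:]:
        if tok == "(":
            depth += 1
        elif tok == ")":
            if depth == 0:
                return False              # matching ")" reached, no DO seen
            depth -= 1
        elif tok.upper() == "DO" and depth == 0 and prev not in BLOCKERS:
            return True                   # top-level DO that is not an operand
        prev = tok
    return False

toks = tokenize("PUT LIST((A(I) DO I=1 TO 10));")
print(starts_iteration_list(toks, 3))     # index 3 is the second "("
toks = tokenize("PUT LIST((A(I)) );")
print(starts_iteration_list(toks, 3))
```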

        8.      DCL A(50) INITIAL       ((N+6),12);     /*      (N+6) IS A SINGLE ITEM */
                DCL A(50) INITIAL       ((N+6));        /*      (N+6) IS A SINGLE ITEM */
                DCL A(50) INITIAL       ((N+6)3);       /*      N+6 ITERATES A SINGLE ITEM */
                DCL A(50) INITIAL       ((N+6)-3);      /*      N+6 ITERATES A SINGLE ITEM */
                DCL A(50) INITIAL       ((N+6)*);       /*      N+6 ITERATES A SINGLE ITEM */
                DCL A(50) INITIAL       ((N+6)J);       /*      N+6 ITERATES A SINGLE ITEM */
                DCL A(50) INITIAL       ((6) 'A');      /*      6 IS A STRING REPETITION FACTOR */
                DCL A(50) INITIAL       ((6)(3)'A');    /*      6 ITERATES A SINGLE ITEM */
                DCL A(50) INITIAL       ((6)(3,'A'));   /*      6 ITERATES A LIST */

To classify the "(" at the beginning of an "INITIAL" list, find the
matching ")" and look at the following token T.  If T is anything other
than "(", we can classify immediately.  If T is "(", then see if T is
followed immediately by the sequence integer, ")", string.  If so, the
first "(" introduces iteration over a single item; otherwise it
introduces iteration over a list.
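
The "INITIAL" classification can be sketched in Python as below.  The text
says only that the case where T is not "(" can be classified immediately,
so the three immediate outcomes are inferred from the examples in item 8
and are an assumption, as are the function names and the crude tokenizer.

```python
import re

def tokenize(stmt):
    # Crude tokenizer: "->", quoted strings, words/integers, any other symbol.
    return re.findall(r"->|'[^']*'|\w+|\S", stmt)

def skip_balanced(tokens, i):
    # tokens[i] must be "("; return the index just past its matching ")".
    depth = 0
    for j in range(i, len(tokens)):
        if tokens[j] == "(":
            depth += 1
        elif tokens[j] == ")":
            depth -= 1
            if depth == 0:
                return j + 1
    raise ValueError("unbalanced parentheses")

def classify_initial_paren(tokens, i):
    # tokens[i] must be the first "(" inside the INITIAL(...) list.
    j = skip_balanced(tokens, i)          # index of T
    t = tokens[j]
    if t in (",", ")"):
        return "single item"
    if t.startswith("'"):
        return "string repetition factor"
    if t != "(":
        return "iterates a single item"
    # T is "(": check for the sequence integer, ")", string after it.
    if (j + 3 < len(tokens) and tokens[j + 1].isdigit()
            and tokens[j + 2] == ")" and tokens[j + 3].startswith("'")):
        return "iterates a single item"
    return "iterates a list"

for text in ["((N+6),12)", "((N+6)3)", "((6) 'A')", "((6)(3)'A')", "((6)(3,'A'))"]:
    print(text, "=>", classify_initial_paren(tokenize(text), 1))
```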

Sun, 22 Dec 2002 03:00:00 GMT  