On 21 Dec 2000 17:46:17 +0100, Yoann Padioleau
Quote:
>> PCCTS with ANTLR ( http://www.antlr.org/)
>> It's easy to use and good to control. Much better than lex and yacc.
>Why is pccts/antlr better than yacc?
I tried to use lex and yacc on my special kind of language and got
nothing but problems. It was really hard to write rules so that yacc
could parse the language without errors. Error handling was even
worse: how do you tell the user that he has to recompile again and
again to find only one syntax error per compilation, since yacc makes
it hard to do otherwise? And error messages like 'I have found an
error somewhere near line 354' don't make error finding easier. And
even if you solve this problem, you only have a parser, not a
compiler. Now you must define actions for the rules. But the rules
don't look like what you had in mind when designing your language,
since they need to be heavily adapted for yacc.
So I tried to write my own parser generator and compiler, but that did
not work either. One day I found pccts and tried it. It took me 2
days (!) to write the lexer and parser to recognize my language. After
a week I could construct a usable syntax tree and had fully featured
syntax error handling (line, column, expected rule and found token).
After 3 weeks I had ported the code generation from my hand-written
compiler. And the generated compiler is fast: it takes 4 seconds on a
200MHz processor to compile 600kb of code. That's why I think pccts is
much better than yacc.
Here is some more info about pccts:
pccts can generate code for different programming languages; I have
only used the C/C++ version.
With antlr you describe the lexer and the parser. Token declarations
use regular expressions, and for each token you can define an action
that is executed when the token is recognized. With a set of special
lexer functions you can control the lexer's behaviour: there are
functions to track lines and columns, skip a token, append to or
replace the recognized text, and so on. Here is an example:
#token ID "[a-zA-Z][a-zA-Z0-9]*"
#token "//(~[\n \r])* (\n\r | \n | \r\n | \r)"
<< skip(); newline(); set_begcol(0); set_endcol(0); >>
It is possible to define your own functions within the grammar file
and call them from parser actions. On recognition errors, special
functions are called for further error processing; by default they
just dump errors to stderr, but they can be overridden for more
complex error handling. The parsed tokens are stored in objects of a
user-defined class (C++) derived from an abstract class; if you don't
need this, you can use the standard classes too. The lexer also
supports different 'lexer classes' and switching between them. This
makes it possible to handle the same input differently depending on
the context, for example switching between normal code and block
comments, as in the sketch below.
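Here is a minimal sketch of such a switch, using the standard
#lexclass and mode() mechanism; the class and token names are made up
for illustration:

#lexclass START
#token COMMENT_BEGIN "/\*"  << skip(); mode(BLOCK_COMMENT); >>

#lexclass BLOCK_COMMENT
#token COMMENT_END   "\*/"  << skip(); mode(START); >>  // back to code
#token COMMENT_EOL   "\n"   << skip(); newline(); >>    // track lines
#token COMMENT_CHAR  "~[]"  << skip(); >>               // ignore the rest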
The parsed tokens are made available as a stream, and this stream is
used by the parser. Parser rules are defined in EBNF, and actions can
be used here too. It is also possible to choose between rule
alternatives based on a semantic predicate, which sometimes makes
defining the rules easier: you may, for example, pick the right
alternative depending on whether an identifier names a variable or a
method (see the sketch below). And with the parsing technology used,
which allows infinite lookahead where needed, even complex grammars
can be implemented easily.
There are two possible methods to use the rules. They can behave like
functions, taking arguments and returning a result:
start
: <<int r;>> // init-action declares local var r
expr[3,4] > [r] <<printf("result %d\n", r);>>
;
expr[int a, int b] > [int result]
: i:INT <<$result = $a+$b+atoi($i->getText());>>
;
Or they can construct a syntax tree from the token stream. This can be
done automatically through the rule structure, or 'manually' through
actions that build the tree. With automatic construction it is
possible to suppress tokens, or to make a given token the new root of
the rule's subtree. With these methods it is very simple to construct
the desired syntax tree.
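For example, here is a small sketch of automatic tree construction
using the standard '^' (make this token the root) and '!' (suppress
this token) operators; the token and rule names are assumptions:

expr
  : mult_expr ( PLUS^ mult_expr )*   // '^' makes PLUS the subtree root
  ;
mult_expr
  : atom ( MULT^ atom )*
  ;
atom
  : INT
  | LPAREN! expr RPAREN!             // '!' drops the parentheses
  ;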
The resulting tree can be processed with a tree walker. To construct
one, simply use the program 'sorcerer'. The structure of the tree is
described with special rules, and within the rule actions you can
transform the tree for further processing or do other work, typically
compiler tasks like semantic checking and code generation.
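For illustration, a sketch of sorcerer rules that walk an expression
tree like the one built above; the '#( root children )' notation
matches a subtree, and the actions are just placeholders:

expr
  : #( PLUS expr expr )  << /* e.g. emit code for an addition */ >>
  | #( MULT expr expr )  << /* e.g. emit code for a multiplication */ >>
  | i:INT                << /* e.g. load the constant i */ >>
  ;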
I hope this description answers your question. If you want to know
more, read the pccts book, which is available for download as a PDF.
Merry Christmas,
Philipp