HELP needed on a simple Parse::RecDescent program (problem: some rules are matched twice) 
Author Message
 HELP needed on a simple Parse::RecDescent program (problem: some rules are matched twice)

Hi All,

I am a new user of Parse::RecDescent.  I tried to use v1.80 to
implement a simple parser with a simple grammar for parsing c/c++/java
code.  The result looks good except that 5 rules seem to be matched
twice (see output and expected output below).  I could not figure out
why.  Any help would be greatly appreciated.  Please e-mail

Regards,

Eric Liao

Code here:
=======================================================================
use strict;
use Parse::RecDescent;

$::RD_AUTOACTION = q {
                        for ($::i = 0; $::i<= $#item; $::i++) {
                                print "($::i) $item[$::i] ";
                        }
                                        print "\n";
                };

$::grammar = q{

program         :       BRACE_LEFT statement(s) BRACE_RIGHT

statement       :       expression SEMICOLON

expression      :       unary_expr PLUS_OP expression
                        |       unary_expr

unary_expr      :       UNARY_OP unary_expr
                        |       brack_expr

brack_expr      :       PAREN_LEFT expression PAREN_RIGHT
                        |       token_expr

token_expr      :       constant
                        |       IDENTIFIER

constant        :       NUMBER
                        |       STRING

SEMICOLON       :       ';'
BRACE_LEFT      :       '{'
BRACE_RIGHT     :       '}'
UNARY_OP        :       '-'
PLUS_OP         :       '+'
                        |       '-'
PAREN_LEFT      :       '('
PAREN_RIGHT     :       ')'
NUMBER          :       /\\d+/
STRING          :       /"[^"\\\\]+(\\\\.[^"\\\\]*)*"/
IDENTIFIER      :       /[a-z]\\w*/i

Quote:
};

my $parser = new Parse::RecDescent($::grammar)  or  die "invalid
grammar";
undef $/;
my $text = <DATA>;
defined $parser->program($text) or die "malformed program";

__DATA__
{
   "hello";

Quote:
}

================================================================
Actual output:

(0) BRACE_LEFT (1) {
(0) STRING (1) "hello"
(0) constant (1) 1
(0) token_expr (1) 1
(0) brack_expr (1) 1
(0) unary_expr (1) 1
(0) STRING (1) "hello"
(0) constant (1) 1
(0) token_expr (1) 1
(0) brack_expr (1) 1
(0) unary_expr (1) 1
(0) expression (1) 1
(0) SEMICOLON (1) ;
(0) statement (1) 1 (2) 1
(0) BRACE_RIGHT (1) }
(0) program (1) 1 (2) ARRAY(0x1d1fecc) (3) 1

===========================================
Expected output:

(0) BRACE_LEFT (1) {
(0) STRING (1) "hello"
(0) constant (1) 1
(0) token_expr (1) 1
(0) brack_expr (1) 1
(0) unary_expr (1) 1
(0) expression (1) 1
(0) SEMICOLON (1) ;
(0) statement (1) 1 (2) 1
(0) BRACE_RIGHT (1) }
(0) program (1) 1 (2) ARRAY(0x1d1fecc) (3) 1



Sat, 16 Aug 2003 07:32:09 GMT  
 HELP needed on a simple Parse::RecDescent program (problem: some rules are matched twice)

Eric> I am a new user of Parse::RecDescent.  I tried to use v1.80 to
Eric> implement a simple parser with a simple grammar for parsing c/c++/java
Eric> code.  The result looks good except that 5 rules seem to be matched
Eric> twice (see output and expected output below).  I could not figure out
Eric> why.  Any help would be greatly appreciated.  Please e-mail

You can't print out your result while you are parsing it, because you
can't "unprint" a backtrack.  You backtracked in "expression" from the
first subrule to the second subrule, but you'd already printed out the
result of a successful first step of that first subrule.

Don't do that. Pass the data upstairs, and have the final top-level
rule do all the work.

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095

Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!



Sat, 16 Aug 2003 07:59:00 GMT  
 HELP needed on a simple Parse::RecDescent program (problem: some rules are matched twice)

Quote:
> expression :       unary_expr PLUS_OP expression
>                    |       unary_expr

Here you have a rule with two productions that both start off the same
way.  Each production is attempted in order.  In the case of your
actual data:

Quote:
> __DATA__
> {
>    "hello";
> }

The first production fails, and then the parser back-tracks and tries
the second production.  Since you are printing the data for every
production, you see the productions that comprise this failed
attempt.  This explains the extra output you saw.

Here is the relevant section from the docs:

       Note that this behaviour is quite different from the
       "prefer the longer match" behaviour of yacc. For example,
       if yacc were parsing the rule:

               seq : 'A' 'B'
                   | 'A' 'B' 'C'

       upon matching "AB" it would look ahead to see if a 'C' is
       next and, if so, will match the second production in
       preference to the first. In other words, yacc effectively
       tries all the productions of a rule breadth-first in
       parallel, and selects the "best" match, where "best" means
       longest (note that this is a gross simplification of the
       true behaviour of yacc but it will do for our purposes).

       In contrast, `Parse::RecDescent' tries each production
       depth-first in sequence, and selects the "best" match,
       where "best" means first. This is the fundamental
       difference between "bottom-up" and "recursive descent"
       parsing.

HTH...
--
Ren Maddox



Sat, 16 Aug 2003 07:49:17 GMT  
 HELP needed on a simple Parse::RecDescent program (problem: some rules are matched twice)

say such a terrible thing:

Quote:

>Eric> I am a new user of Parse::RecDescent.  I tried to use v1.80 to
>Eric> implement a simple parser with a simple grammar for parsing c/c++/java
>Eric> code.  The result looks good except that 5 rules seem to be matched
>Eric> twice (see output and expected output below).  I could not figure out
>Eric> why.  Any help would be greatly appreciated.  Please e-mail

>You can't print out your result while you are parsing it, because you
>can't "unprint" a backtrack.  You backtracked in "expression" from the
>first subrule to the second subrule, but you'd already printed out the
>result of a successful first step of that first subrule.

>Don't do that. Pass the data upstairs, and have the final top-level
>rule do all the work.

Alternatively, you could make the grammar LL(1) so there is no need for
backtracking. There is only the one problematic non-terminal so it
shouldn't be too hard. Simply replace this:

expression      :       unary_expr PLUS_OP expression
                        |       unary_expr

with this:

expression      :       unary_expr plus_expression

plus_expression :       PLUS_OP expression
                        |       # nothing

--

Fortune's real live weird band names #13:

Alien Sex Fiend



Sat, 16 Aug 2003 11:42:58 GMT  
 HELP needed on a simple Parse::RecDescent program (problem: some rules are matched twice)
Thanks, Randal and Ren, for your fast response!   I got it running
now!

Eric



Sat, 16 Aug 2003 15:57:22 GMT  
 HELP needed on a simple Parse::RecDescent program (problem: some rules are matched twice)
Gwyn,

Thanks!  It seems that LL has stricter conditions than LR. I am still
learning it.  I don't mind using either type of grammar and I don't
mind the backtracking, as long as the grammar is correct, has no left
recursion, and is good for Parse::Recdescent.  My initial problem was
solved.  As was pointed out, I simply printed at the wrong place (I
have since then learned how to pass the appropriate data upwards.),
and I naively thought that

execution of action = successful match of subrule

which is incorrect.



Quote:
>>You can't print out your result while you are parsing it, because you
>>can't "unprint" a backtrack.  You backtracked in "expression" from the
>>first subrule to the second subrule, but you'd already printed out the
>>result of a successful first step of that first subrule.

>>Don't do that. Pass the data upstairs, and have the final top-level
>>rule do all the work.

>Alternatively, you could make the grammar LL(1) so there is no need for
>backtracking. There is only the one problematic non-terminal so it
>shouldn't be too hard. Simply replace this:

>expression      :       unary_expr PLUS_OP expression
>                        |       unary_expr

>with this:

>expression      :       unary_expr plus_expression

>plus_expression :       PLUS_OP expression
>                        |       # nothing



Sun, 17 Aug 2003 05:54:10 GMT  
 HELP needed on a simple Parse::RecDescent program (problem: some rules are matched twice)

say such a terrible thing:

Quote:
>and I naively thought that

>execution of action = successful match of subrule

>which is incorrect.

Well one of the advantages of using an LL(1) grammar is that this is in fact
correct. It's not always easy to construct an LL(1) (or even LL(k))
grammar for a given language though. Way off topic for this group though
:)

--

Education is the process of driving a set of prejudices down your throat.

                -- Martin H. Fischer



Sun, 17 Aug 2003 12:48:12 GMT  
 HELP needed on a simple Parse::RecDescent program (problem: some rules are matched twice)
You're right. The <leftop> and <rightop> directives do make life
easier!  Glad you mentioned it.

On Tue, 27 Feb 2001 09:32:29 -0800, Terrence Monroe Brannon

Quote:

>I think Parse::RecDescent has a LEFTOP directive to facilitate this parsing
>idiom.



>> say such a terrible thing:

>> >Eric> I am a new user of Parse::RecDescent.  I tried to use v1.80 to
>> >Eric> implement a simple parser with a simple grammar for parsing c/c++/java
>> >Eric> code.  The result looks good except that 5 rules seem to be matched
>> >Eric> twice (see output and expected output below).  I could not figure out
>> >Eric> why.  Any help would be greatly appreciated.  Please e-mail

>> >You can't print out your result while you are parsing it, because you
>> >can't "unprint" a backtrack.  You backtracked in "expression" from the
>> >first subrule to the second subrule, but you'd already printed out the
>> >result of a successful first step of that first subrule.

>> >Don't do that. Pass the data upstairs, and have the final top-level
>> >rule do all the work.

>> Alternatively, you could make the grammar LL(1) so there is no need for
>> backtracking. There is only the one problematic non-terminal so it
>> shouldn't be too hard. Simply replace this:

>> expression      :       unary_expr PLUS_OP expression
>>                         |       unary_expr

>> with this:

>> expression      :       unary_expr plus_expression

>> plus_expression :       PLUS_OP expression
>>                         |       # nothing

>> --

>> Fortune's real live weird band names #13:

>> Alien Sex Fiend



Sun, 17 Aug 2003 16:15:11 GMT  
 
 [ 8 post ] 

 Relevant Pages 

1. HELP needed on a simple Parse::RecDescent program (problem: some rules are matched twice)

2. Parse::RecDescent: Matching rules in arbitrary input ?

3. Help: Problem with simple parsing and Parse::RecDescent

4. Converting SQL89 YACC rule to Parse::RecDescent

5. Parse::RecDescent -- Variable-sequence rules?

6. Parse::RecDescent: need help with actions and return values

7. Need help w/ Parse::RecDescent

8. need help with demo_logic example of Parse::RecDescent

9. Need help with Parse::RecDescent grammar

10. need help with demo_logic example of Parse::RecDescent

11. Need help with Parse::RecDescent grammar

12. Parse::RecDescent and catenating (no skip match)

 

 
Powered by phpBB® Forum Software