DCG Limitations? 
Author Message
 DCG Limitations?

Hi.
I'm very new to prolog (1 week), but I'm an OK
engineer and I learn fast :-).

I started writing around the PRESS code from the
Sterling and Shapiro book, using SWI prolog and
XPCE to make a gui for an equation solver. (to
help my daughter with her maths homework :-)

I thought to parse "proper" algebraic expressions
(so "2xy" means "2 * x * y" etc.) into equivalent
prolog structures, I wrote a tokenizing DFA to do the
preliminary parsing but very quickly hit a brick wall
wrt DCGs.

My problem is this:

I know that:

expr(X - Y) --> expr(X), ['-'], primary(Y).

won't work, because of the direct recursion on expr(X).

I know that I can rewrite this as:

expr(X - Y) --> primary(X), ['-'], expr(Y).

and that will parse *but* the associativity of the
operators is wrong. given [x, -, y, -, z] this builds
a right associative (x - (y - z)), instead of the correct
left associative ((x - y) - z).

Thoughts anyone?
--
Bill Hails



Sat, 11 Jun 2005 00:38:51 GMT  
 DCG Limitations?

Quote:
> I thought to parse "proper" algebraic expressions
> (so "2xy" means "2 * x * y" etc.) into equivalent
> prolog structures, I wrote a tokenizing DFA to do the
> preliminary parsing but very quickly hit a brick wall
> wrt DCGs.

In similar situations, I usually use op/3 to declare any required operators.
Then I just use the built-in parser in SWI to parse the term, using
read_term/2 or atom_to_term/3.

Quote:
> expr(X - Y) --> primary(X), ['-'], expr(Y).

> and that will parse *but* the associativity of the
> operators is wrong. given [x, -, y, -, z] this builds
> a right associative (x - (y - z)), instead of the correct
> left associative ((x - y) - z).

> Thoughts anyone?

A simple and quick solution (though a bit inefficient) would be to keep
track of the "depth" of the recursion. If each production contains at least
one terminal, then the maximum depth of the recursion is the length of the
parse string. Simply 'fail' if the recursion becomes too deep.

-Benjamin Johnston



Sat, 11 Jun 2005 07:31:22 GMT  
 DCG Limitations?

Quote:

>> (so "2xy" means "2 * x * y" etc.)

> In similar situations, I usually use op/3 to declare any required
> operators. Then I just use the built-in parser in SWI to parse the term,
> using read_term/2 or atom_to_term/3.

That's exactly what I was doing at first, just using the
built-in math operators, but my typos kept crashing the
program :-) I was typing "2x" instead of 2 * x which set
me off thinking about a parser. Especially nice for
things like 2ysin3 => 2 * y * sin(3).

Quote:
>> expr(X - Y) --> primary(X), ['-'], expr(Y).

>> and that will parse *but* the associativity of the
>> operators is wrong. given [x, -, y, -, z] this builds
>> a right associative (x - (y - z)), instead of the correct
>> left associative ((x - y) - z).

>> Thoughts anyone?

> A simple and quick solution (though a bit inefficient) would be to keep
> track of the "depth" of the recursion. If each production contains at
> least one terminal, then the maximum depth of the recursion is the length
> of the parse string. Simply 'fail' if the recursion becomes too deep.

> -Benjamin Johnston

Thanks, I'll give that a try. I have one case where a production doesn't
contain a terminal but I can set some reasonably high arbitrary limit and
fail on that.

The reason I'm still thinking this is a general problem with DCGs is that
I've found two implementations of mini languages in Prolog
(the one in Sterling and Shapiro and another from an archive) and both
dodge the issue by either accepting the mathematecally incorrect right
associativity or by making the mathematical operators non-associative.

Is anyone aware of any other implementations of algebraic parsers in
Prolog?

--
Bill Hails



Sat, 11 Jun 2005 17:56:29 GMT  
 DCG Limitations?

Quote:

> The reason I'm still thinking this is a general problem with DCGs is that
> I've found two implementations of mini languages in Prolog
> (the one in Sterling and Shapiro and another from an archive) and both
> dodge the issue by either accepting the mathematecally incorrect right
> associativity or by making the mathematical operators non-associative.
> Is anyone aware of any other implementations of algebraic parsers in
> Prolog?

One other common trick is to interpret the DCG rules by a bottom-up
parser. For instance a left-corner parser. This is explained e.g. in
Pereira and Shieber, Prolog and Natural Language Analysis.

Gj

--
Gertjan van Noord Alfa-informatica, RUG,  Postbus 716, 9700 AS Groningen
vannoord at let dot rug dot nl            http://www.let.rug.nl/~vannoord



Sat, 11 Jun 2005 19:47:21 GMT  
 DCG Limitations?

Quote:


>> The reason I'm still thinking this is a general problem with DCGs is that
>> I've found two implementations of mini languages in Prolog
>> (the one in Sterling and Shapiro and another from an archive) and both
>> dodge the issue by either accepting the mathematecally incorrect right
>> associativity or by making the mathematical operators non-associative.

>> Is anyone aware of any other implementations of algebraic parsers in
>> Prolog?

> One other common trick is to interpret the DCG rules by a bottom-up
> parser. For instance a left-corner parser. This is explained e.g. in
> Pereira and Shieber, Prolog and Natural Language Analysis.

I'll see if I can find that book.

In the meantime I've abandoned DCGs altogether, It's actually proved
quite pleasantly easy to write a parser from scratch with just basic
prolog, it's not quite working yet, but essentially it's just rules
like:

rational(Tokens, Expr, Remaining) :-
    int(Tokens, N, ['/'|Rest]),
    int(Rest, D, Remaining),
    Expr = N / D.

int([Num|Rremaining], Num, Remaining) :-
    integer(Num).

the point being that if I guarantee to consume at least one token for
each production I have to either fail or get to the end of the string.
I'm not a computer scientist, but I think that is pretty much what any
bottom-up parser will do.

Quote:
> Gj

--
Bill Hails


Sun, 12 Jun 2005 02:54:53 GMT  
 DCG Limitations?

Quote:

> In the meantime I've abandoned DCGs altogether, It's actually proved
> quite pleasantly easy to write a parser from scratch with just basic
> prolog, it's not quite working yet, but essentially it's just rules
> like:
> rational(Tokens, Expr, Remaining) :-
>     int(Tokens, N, ['/'|Rest]),
>     int(Rest, D, Remaining),
>     Expr = N / D.
> int([Num|Rremaining], Num, Remaining) :-
>     integer(Num).
> the point being that if I guarantee to consume at least one token for
> each production I have to either fail or get to the end of the string.
> I'm not a computer scientist, but I think that is pretty much what any
> bottom-up parser will do.

This is a much stronger restriction than neccessary. But this is beyond
comp.lang.prolog, presumably.

Gj

--
Gertjan van Noord Alfa-informatica, RUG,  Postbus 716, 9700 AS Groningen
vannoord at let dot rug dot nl            http://www.let.rug.nl/~vannoord



Sun, 12 Jun 2005 07:46:14 GMT  
 DCG Limitations?

Quote:

> I'll see if I can find that book.

Easy:

<http://www.mtome.com/pnla-digital.html>

Quote:
> In the meantime I've abandoned DCGs altogether, It's actually proved
> quite pleasantly easy to write a parser from scratch with just basic
> prolog, it's not quite working yet, but essentially it's just rules
> like:

> rational(Tokens, Expr, Remaining) :-
>   int(Tokens, N, ['/'|Rest]),
>   int(Rest, D, Remaining),
>   Expr = N / D.

> int([Num|Rremaining], Num, Remaining) :-
>   integer(Num).

> the point being that if I guarantee to consume at least one token for
> each production I have to either fail or get to the end of the string.
> I'm not a computer scientist, but I think that is pretty much what any
> bottom-up parser will do.

This is more strict that necessary, as Gertjan said. You may want to look at
<http://www.ifs.org.uk/~popx/prolog/tools/files/read.pl> for a complete
Prolog parser written in bottom-up operator precedence style. It could be
simplified and modified for your purposes. It goes together with  the
tokenizer <http://www.ifs.org.uk/~popx/prolog/tools/files/rdtok.gen>. Both
are written in a somewhat obsolete dialect, you may need to edit here and
there.

-- F



Mon, 13 Jun 2005 05:26:37 GMT  
 DCG Limitations?

Quote:

>I thought to parse "proper" algebraic expressions
>(so "2xy" means "2 * x * y" etc.) into equivalent
>prolog structures, I wrote a tokenizing DFA to do the
>preliminary parsing but very quickly hit a brick wall
>wrt DCGs.
...
>expr(X - Y) --> expr(X), ['-'], primary(Y).

>won't work, because of the direct recursion on expr(X).

One good way to handle this problem is to eliminate left recursion
by left-factoring the grammar.  For example, instead of

        expr --> expr, ['-'], primary.
        expr --> primary.

you can use

        expr --> primary, expr2.

        expr2 --> ['-'], primary, expr2.
        expr2 --> [].

The latter grammar parses the same language as the former,
but has the advantage that it is  not left recursive.
It's easy enough to construct the same (left-associative) parse tree
as you'd expect for the original grammar, by passing in the parse tree
for the left-hand-side to `expr2'.  Here's an example of how to do it.
This is an extract taken from one of the sample programs in the Mercury
distribution (samples/calculator.m).  If you delete the :- pred and :-
type declarations, the code should be valid Prolog.  (Some of the names
used are slightly different than in your original example: this uses
`minus/2' instead of `-/2' and `term' instead of `primary'.)

:- type expr
        ---> number(int)
        ;       plus(expr, expr)
        ;       minus(expr, expr)
        ;       times(expr, expr)
        ;       div(expr, expr).

:- pred expr(expr::out, list(char)::in, list(char)::out) is semidet.
expr(Expr) -->
        factor(Factor),
        expr2(Factor, Expr).

:- pred expr2(expr::in, expr::out, list(char)::in, list(char)::out) is semidet.
expr2(Factor, Expr) -->
        ( ['+'] -> factor(Factor2), expr2(plus( Factor, Factor2), Expr)
        ; ['-'] -> factor(Factor2), expr2(minus(Factor, Factor2), Expr)
        ; { Expr = Factor }
        ).

:- pred factor(expr::out, list(char)::in, list(char)::out) is semidet.
factor(Factor) -->
        term(Term),
        factor2(Term, Factor).

:- pred factor2(expr::in, expr::out, list(char)::in, list(char)::out)
        is semidet.
factor2(Term, Factor) -->
        ( ['*'] -> term(Term2), factor2(times(Term,Term2), Factor)
        ; ['/'] -> term(Term2), factor2(div(  Term,Term2), Factor)
        ; { Factor = Term }
        ).

--

The University of Melbourne         |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.



Tue, 14 Jun 2005 08:18:42 GMT  
 DCG Limitations?

Quote:



>> I'll see if I can find that book.
> Easy:

> <http://www.mtome.com/pnla-digital.html>

>[...]

> This is more strict that necessary, as Gertjan said. You may want to look
> at <http://www.ifs.org.uk/~popx/prolog/tools/files/read.pl> for a complete
> Prolog parser written in bottom-up operator precedence style. It could be
> simplified and modified for your purposes. It goes together with  the
> tokenizer <http://www.ifs.org.uk/~popx/prolog/tools/files/rdtok.gen>. Both
> are written in a somewhat obsolete dialect, you may need to edit here and
> there.

> -- F

Thanks for those links, I have a working parser now.

--
Bill Hails



Tue, 14 Jun 2005 22:31:06 GMT  
 
 [ 9 post ] 

 Relevant Pages 

1. DCG troubles

2. DCG problem

3. DCG help ASAP!!!

4. DCG help

5. DCG's compared to Haskell monads

6. DCG question.

7. Parsing C++ with dcg?

8. dcg

9. DCG grammer result manipulation

10. DCG Parser

11. DCG help!!!!

12. DCG Help!

 

 
Powered by phpBB® Forum Software