Splitting apart regular expressions... 
Author Message
 Splitting apart regular expressions...

Hi,

I'm working on a program to take a regular expression and turn it into a
simple categorical grammar.

Most of that isn't too hard, but I'm having trouble getting started. Here is
the problem:

I need to take a regular expression that looks like this:

((aUb+)U(cUd))

and split it into a tree which would look like this:

            U
      /             \
    U              U
  /    \          /     \
a     b+     c     d
        |
       b

The operators are . + and U. The U and . are both binary and the + is unary.
So an expression could also look like this:

((a.b)+ . ((xUy) U c) . b+)

So does anyone have any idea how I would do this? Also, does anyone have
suggestions on how I would represent this in perl?

Thanks for any and all suggestions,

Allen

--

http://www.*-*-*.com/



Mon, 23 Jul 2001 03:00:00 GMT  
 Splitting apart regular expressions...

: I'm working on a program to take a regular expression and turn it into a
: simple categorical grammar.

: Most of that isn't too hard, but I'm having trouble getting started. Here is
: the problem:

: I need to take a regular expression that looks like this:

: ((aUb+)U(cUd))

: and split it into a tree which would look like this:

[snip]

: The operators are . + and U. The U and . are both binary and the + is unary.

You need to write an expression parser; probably a simple recursive
descent parser will do, though you could use a parser-generator tool like
Parse:Yapp or Parse::RecDescent.  See any textbook on compilers for the
details.  You might want to use 1-symbol lookahead in your lexer to make
your parser see 'ab' as 'a.b' which is easier to parse.



Tue, 24 Jul 2001 03:00:00 GMT  
 Splitting apart regular expressions...


Quote:
>So does anyone have any idea how I would do this? Also, does anyone have
>suggestions on how I would represent this in perl?

The Regex.pm module that I wrote for the Perl Journal last year has a
function in it that does this for a limited subset of regular
expressions, and would not be difficult to extend.  It might give you
some ideas.

In particular, it returns a tree structure in Lisp style: If A, B, ...
are the structures that represent the subtrees of a tree T, and R
is the data at the root od T, then T is represented by

        [ T => A, B, ... ]

For example, the regex (a|b+)c  turns into

   [ CONCAT =>
     [ ALTERN =>
       [ LITERAL => 'a' ],
       [ PLUS => [ LITERAL => 'b' ]],
     ],
     [ LITERAL => 'c' ]
   ]

It's at <URL:http://www.plover.com/~mjd/perl/Regex/>.  Be sure to pick
up the patch that adds support for the dot metacharacter.



Tue, 24 Jul 2001 03:00:00 GMT  
 
 [ 3 post ] 

 Relevant Pages 

1. split and regular expression modifiers

2. Regular expression instead of split

3. Help with a regular expression and split

4. In search of a regular expression for split.

5. Regular expression / split question

6. Help need with regular expressions and try to find the expression .t

7. Looking for script to take e-mail apart

8. referencing scalars apart from subroutine calls

9. Ripping apart a scalar - Important!

10. Rip apart my code

11. Tell apart file URLs from dir URLs?

12. Need a regular Express for Perl split command.

 

 
Powered by phpBB® Forum Software