FAQ: How do I find matching/nesting anything? 

This message is one of several periodic postings to comp.lang.perl.misc
intended to make it easier for perl programmers to find answers to
common questions. The core of this message represents an excerpt
from the documentation provided with every Standard Distribution of
Perl.

+
  How do I find matching/nesting anything?

    This isn't something that can be done in one regular expression, no
    matter how complicated. To find something between two single characters,
    a pattern like "/x([^x]*)x/" will get the intervening bits in $1. For
    multi-character delimiters, something more like "/alpha(.*?)omega/"
    would be needed. But neither of these deals with nested patterns, nor
    can they. For that you'll have to write a parser.
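
    As a minimal sketch of the two flat cases described above, using
    made-up sample strings, the following also shows how the same pattern
    goes wrong once the delimiters nest:

```perl
use strict;
use warnings;

# Single-character delimiters: capture everything between two x's.
my $flat = "aaxhelloxbb";
my ($between) = $flat =~ /x([^x]*)x/;
print "single-char: [$between]\n";

# Multi-character delimiters: non-greedy match between alpha and omega.
my $multi = "alpha one omega alpha two omega";
my @parts = $multi =~ /alpha(.*?)omega/g;
print "multi-char: [$_]\n" for @parts;

# Nested delimiters: the non-greedy match stops at the first "omega",
# so the capture is cut short and still contains an unmatched "alpha".
my $nested_text = "alpha outer alpha inner omega tail omega";
my ($wrong) = $nested_text =~ /alpha(.*?)omega/;
print "nested: [$wrong]\n";
```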

    If you are serious about writing a parser, there are a number of modules
    or oddities that will make your life a lot easier. There are the CPAN
    modules Parse::RecDescent, Parse::Yapp, and Text::Balanced; and the
    byacc program.
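
    For balanced brackets specifically, Text::Balanced may already be
    enough. The sketch below assumes a made-up input string and uses
    extract_bracketed in list context, where it returns the balanced
    piece and the remainder and leaves the original string untouched:

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $text = "(a (b c) (d (e))) and the rest";

# In list context extract_bracketed returns the balanced substring
# found at the start of the text, followed by whatever remains.
my ($extracted, $remainder) = extract_bracketed($text, '()');

print "extracted: $extracted\n";
print "remainder: $remainder\n";
```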

    One simple destructive, inside-out approach that you might try is to
    pull out the smallest nesting parts one at a time:

        while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//s) {
            # $1 holds the innermost text that was just removed;
            # do something with it here
        }
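
    As a worked illustration on a made-up string, the sketch below runs
    the substitution without the /g flag, so that each pass removes one
    innermost pair and $1 refers to exactly that piece. The names $text
    and @found are only for this example:

```perl
use strict;
use warnings;

my $text = "BEGIN a BEGIN b END c BEGIN d END e END";
my @found;

# Each pass removes one BEGIN...END pair that contains no further
# BEGIN or END, so the innermost pieces come out first.
while ($text =~ s/BEGIN((?:(?!BEGIN)(?!END).)*)END//s) {
    push @found, $1;
}

print "found: [$_]\n" for @found;
```

    Because the approach is destructive, $text is empty once every pair
    has been pulled out; work on a copy if the original is still needed.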

    A more complicated and sneaky approach is to make Perl's regular
    expression engine do it for you. This is courtesy Dean Inada, and rather
    has the nature of an Obfuscated Perl Contest entry, but it really does
    work:

        # $_ contains the string to parse
        # BEGIN and END are the opening and closing markers for the
        # nested text.

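        @( = ('(','');
        @) = (')','');
        ($re=$_)=~s/((BEGIN)|(END)|.)/$)[!$3]\Q$1\E$([!$2]/gs;
        @$ = (eval{/$re/},$@!~/unmatched/i);
        print join("\n",@$[0..$#$]) if( $$[-1] );
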
-

Documents such as this have been called "Answers to Frequently
Asked Questions" or FAQ for short.  They represent an important
part of the Usenet tradition.  They serve to reduce the volume of
redundant traffic on a news group by providing quality answers to
questions that keep coming up.

If you are somehow irritated by seeing these postings you are free
to ignore them or add the sender to your killfile.  If you find
errors or other problems with these postings please send corrections
or comments to the posting email address or to the maintainers as
directed in the perlfaq manual page.

Answers to questions about LOTS of stuff, mostly not related to
Perl, can be found by pointing your news client to


or to the many thousands of other useful Usenet news groups.

Note that the FAQ text posted by this server may have been modified
from that distributed in the stable Perl release.  It may have been
edited to reflect the additions, changes and corrections provided
by respondents, reviewers, and critics to previous postings of
these FAQ. The complete text of these FAQ is available on request.

The perlfaq manual page contains the following copyright notice.

  AUTHOR AND COPYRIGHT

    Copyright (c) 1997-1999 Tom Christiansen and Nathan
    Torkington.  All rights reserved.

This posting is provided in the hope that it will be useful but
does not represent a commitment or contract of any kind on the part
of the contributors, authors or their agents.

                                                           04.21
--
    This space intentionally left blank



Sun, 16 May 2004 02:17:01 GMT  
 FAQ: How do I find matching/nesting anything?

Quote:

>+
>  How do I find matching/nesting anything?
>    This isn't something that can be done in one regular expression, no
>    matter how complicated. To find something between two single characters,
>    a pattern like "/x([^x]*)x/" will get the intervening bits in $1. For
>    multiple ones, then something more like "/alpha(.*?)omega/" would be
>    needed. But none of these deals with nested patterns, nor can they. For
>    that you'll have to write a parser.

[Replace with something like...]

Prior to Perl 5.6.0 (???) this wasn't something that could be done in one
regular expression, no matter how complicated. To find something between
two single characters, a pattern like "/x([^x]*)x/" will get the
intervening bits in $1. For multi-character delimiters, something more
like "/alpha(.*?)omega/" would be needed. But neither of these deals with
nested patterns.

To do that, you need the "deferred interpolation" provided by the
experimental C</(??{$var})/> construct, which evaluates its code at match
time and uses the result as a pattern. For example:

        our $nested = qr{
                           \(                  # Opening bracket
                           (?:                 # either
                              (?> [^()]+ )     # Non-parens without backtracking
                           |                   # or
                              (??{ $nested })  # Recursively nested brackets
                           )*                  # Repeat as necessary
                           \)                  # Closing bracket
                        }x;

Note that C<$nested> has to be a package variable, not a lexical.
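
A minimal usage sketch, assuming a made-up input string: the pattern is
repeated here, with the C<our> declaration on its own line so that the
name is already visible inside the pattern under C<use strict>, and then
applied with a capturing group to pull out each top-level bracketed chunk.

```perl
use strict;
use warnings;

our $nested;
$nested = qr{
    \(                   # opening bracket
    (?:
        (?> [^()]+ )     # non-parens without backtracking
    |
        (??{ $nested })  # recursively nested brackets
    )*
    \)                   # closing bracket
}x;

my $text = "f(a, g(b, (c)), d) + h(e)";

# Each match is one complete, balanced top-level group.
my @groups = $text =~ /($nested)/g;
print "group: $_\n" for @groups;
```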

For more complex nestings you may find it easier to write a parser.
[etc. as before]

[Damian]



Sun, 16 May 2004 20:47:40 GMT  