Bizarre regexp behavior (bug?) 

: Could someone please explain this regexp behavior?
:
: $tmp = 'TEST';
: $tmp =~ s/(ES)*//; # $tmp is still 'TEST', not 'TT'
: $tmp =~ s/(ST)*//; # $tmp is still 'TEST', not 'TE'
: $tmp =~ s/(TE)*//; # $tmp is now 'ST' as expected
:
: Any help explaining this would be greatly appreciated. Please respond via
: email as well as to the list. Thanks.
:

It is far more likely a bee in your bonnet than a bug in your Perl.

Did you try using /g?  Then it works. Look:

        $_ = 'TT';
        $matched = /(ES)*/;
        print "Yup, it matched on nothing, just like I thought.\n"
                if $matched;

Grouping the letters (ES) and then placing the zero-or-more match
specification after it guarantees a match, because zero occurrences
always succeed. Since the first position in the string "matches" it,
no further scan is done, and the replacement is done there.  Since the
match is empty and the replacement is '', nothing changes.

Try:

        $tmp = 'TEST';
        $tmp =~ s/(ES)*/something/;
        print $tmp, "\n";
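
To see where each pattern from the original question actually matches, here is a minimal sketch. It assumes a Perl new enough to have the @- and @+ offset arrays (5.6 or later, so newer than some of the versions mentioned in this thread); the %first_match hash is just a name made up for the illustration.

```perl
use strict;
use warnings;

# For each pattern from the original question, record where the first
# match starts and how many characters it consumed.
my %first_match;
for my $pat ('(ES)*', '(ST)*', '(TE)*') {
    my $str = 'TEST';
    if ($str =~ /$pat/) {
        # $-[0] and $+[0] are the start and end offsets of the whole match.
        $first_match{$pat} = [ $-[0], $+[0] - $-[0] ];
        printf "%-6s matched at offset %d, length %d\n",
            $pat, @{ $first_match{$pat} };
    }
}
```

All three patterns match at offset 0. (ES)* and (ST)* consume nothing there, while (TE)* consumes two characters, which is why only the third substitution changes the string.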

--
Regards,
Mike Heins     [mailed and posted]   http://www.*-*-*.com/ ~mikeh ___       ___
                                    Internet Robotics        |_ _|____ |_ _|
Be patient. God isn't               131 Willow Lane, Floor 2  | ||  _ \ | |
finished with me yet.               Oxford, OH  45056         | || |_) || |
 -- unknown                                                  |___|  _ <|___|



Sun, 07 Feb 1999 03:00:00 GMT  


Quote:

>Could someone please explain this regexp behavior?

>$tmp = 'TEST';
>$tmp =~ s/(ES)*//; # $tmp is still 'TEST', not 'TT'
>$tmp =~ s/(ST)*//; # $tmp is still 'TEST', not 'TE'
>$tmp =~ s/(TE)*//; # $tmp is now 'ST' as expected

>Any help explaining this would be greatly appreciated. Please respond via
>email as well as to the list. Thanks.

The regex engine in perl starts looking for a match at the left of the
string, and settles on the first match it finds (matching as many characters
as possible), so

  $tmp =~ s/(ES)*//;

looks for 0 or more ES strings at the beginning of $tmp and immediately
matches on 0 of them right at the beginning of the string.  Similarly, the
(ST)* matches 0 of them right at the beginning of the string.  In the
(TE)* case it matches against one occurrence of TE at the beginning of
the string, which is a longer sequence of characters than none.

With the * quantifier you must remember that a 0-times match is legal.
If matching 0 times lets the regex succeed at the current position (and
that chunk of the expression can't consume more characters there by
matching 1 or more times), then the regex engine will not speculatively
continue to see whether there are longer matches further down the string.
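
If what you actually wanted was 'TT', here is a rough sketch of two ways to get it. The /g route was already suggested earlier in the thread; the + quantifier is my own addition, and the variable names are only for illustration.

```perl
use strict;
use warnings;

# 1. Require at least one ES, so the empty match at offset 0 is no
#    longer legal and the engine has to move on to offset 1.
(my $with_plus = 'TEST') =~ s/(ES)+//;

# 2. Keep the * but add /g, so the substitution carries on past the
#    empty match at offset 0 and reaches the ES further along.
(my $with_g = 'TEST') =~ s/(ES)*//g;

print "$with_plus $with_g\n";
```

Both variables end up holding 'TT'. The + form is probably closer to the intent, since it never matches the empty string in the first place.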

There are several good articles in issues of The Perl Journal (check out
http://orwant.www.media.mit.edu/the_perl_journal/) which start to strip
away some of the air of mystery that people seem to insist on shrouding
regexes in.

Hope this helps,

Mike
--

http://www.stok.co.uk/~mike/       |   PGP fingerprint FE 56 4D 7D 42 1A 4A 9C
http://www.token.net/~mike/        |                   65 F3 3F 1D 27 22 B7 41



Sun, 07 Feb 1999 03:00:00 GMT  

Quote:



> >Could someone please explain this regexp behavior?

IT'S A BUG

Quote:

> >$tmp = 'TEST';
> >$tmp =~ s/(ES)*//; # $tmp is still 'TEST', not 'TT'
> >$tmp =~ s/(ST)*//; # $tmp is still 'TEST', not 'TE'
> >$tmp =~ s/(TE)*//; # $tmp is now 'ST' as expected

> >Any help explaining this would be greatly appreciated. Please respond via
> >email as well as to the list. Thanks.

> The regex engine in perl starts looking for a match at the left of the
> string, and settles on the first match it finds (matching as many characters
> as possible), so

WRONG, WRONG, WRONG.

The behavior you describe only occurs with the Perl 5 "non-greedy"
quantifier.

The first pattern SHOULD have matched the longest string that fits
the pattern, that is, ES.  I tried changing the source string to TESEST
and the result was 'TT' as it should have been.

Also, 4.036 gave the same result as 5.003.  Hmm.  Scary!

Perl's regular expression engine is "greedy" by default, matching the
longest sequence of characters that matches the entire pattern.

        -joseph

--
76% of all CGI questions posted in comp.lang.perl.misc are answered by:
"CGI.pm.  LWP.  http://www.perl.com/CPAN/modules/01modules.index.html."
....

Proprietor, 5 Sigma Productions          P.O. Box 6250 Chandler AZ 85246




Mon, 08 Feb 1999 03:00:00 GMT  


: > >Could someone please explain this regexp behavior?
:
: IT'S A BUG
:

Do you have to shout? When you are wrong?

: > >
: > >$tmp = 'TEST';
: > >$tmp =~ s/(ES)*//; # $tmp is still 'TEST', not 'TT'
: > >$tmp =~ s/(ST)*//; # $tmp is still 'TEST', not 'TE'
: > >$tmp =~ s/(TE)*//; # $tmp is now 'ST' as expected
: > >
: > >Any help explaining this would be greatly appreciated. Please respond via
: > >email as well as to the list. Thanks.
: >
: > The regex engine in perl starts looking for a match at the left of the
: > string, and settles on the first match it finds (matching as many characters
: > as possible), so
:
: WRONG, WRONG, WRONG.

That is correct.  You are.

:
: The behavior you describe only occurs with the Perl 5 "non-greedy"
: quantifier.

No. That would be true for .*ES, but not for (ES)*. Try this:

        $tmp = 'TEST';
        $tmp =~ s/(ES)*/XX/g;
        print $tmp, $/;

See what I mean? The definition of greedy is not "the longest string".
Quoting from perlre:

     The standard quantifiers are all "greedy", in that they
     match as many occurrences as possible (given a particular starting
     location) without causing the pattern to fail.
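
The "given a particular starting location" part can be sketched with the TESEST string mentioned earlier in the thread. This is only an illustration: taking substr($str, 1) is my stand-in for "start at offset 1", and the variable names are made up. The (?:...) groups need Perl 5, and @- and @+ need 5.6 or later.

```perl
use strict;
use warnings;

my $str = 'TESEST';

# Unanchored: the leftmost match is the empty string at offset 0,
# because (ES)* can succeed there by matching zero times.
$str =~ /(ES)*/ or die "a * pattern should always match";
my $leftmost_len = $+[0] - $-[0];

# Starting from offset 1 instead: the greedy form takes both copies of ES...
my ($greedy) = substr($str, 1) =~ /^((?:ES)*)/;

# ...while the non-greedy form is satisfied with none.
my ($lazy) = substr($str, 1) =~ /^((?:ES)*?)/;

printf "leftmost length %d, greedy '%s', lazy '%s'\n",
    $leftmost_len, $greedy, $lazy;
```

Greedy versus non-greedy only decides how much is taken once a starting position has been fixed. Which starting position wins is decided first, and it is always the leftmost one where the pattern can succeed at all.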

--
Regards,
Mike Heins     [mailed and posted]  http://www.iac.net/~mikeh ___       ___
                                    Internet Robotics        |_ _|____ |_ _|
Be patient. God isn't               131 Willow Lane, Floor 2  | ||  _ \ | |
finished with me yet.               Oxford, OH  45056         | || |_) || |
 -- unknown                                                  |___|  _ <|___|



Tue, 09 Feb 1999 03:00:00 GMT  




Quote:
> Oops, this is tricky.  This is working correctly although it may not
> seem that it is.  It was Stok's wording that got me going.  Sorry.

> Obviously Perl is NOT selecting the longest match it can find in the
> source string.

> However (and this is the important part), it IS selecting the longest
> match it can find AT THE STARTING POSITION WHERE IT FIRST FINDS A MATCH.

Wrong again.

Ilya



Tue, 09 Feb 1999 03:00:00 GMT  
 