regex greediness problem (possible bug?) 

Summary: I'm trying to match a regular expression containing
non-printable characters (e.g. "\xFF" or "\x00").  The pattern
contains a greediness modifier (".*?"), but doesn't work if a
non-printable character follows the modifier.  If I use a printable
character after the "?", the pattern suddenly works...  Has anyone
else encountered this problem, and, if so, does anyone have a
solution?

Detail: I'm using perl v. 5.6.1 on cygwin-nt.  I'm trying to match
various image file headers (GIF, JPEG, PNG, etc) and extract the width
and height of the images.  I use the following style of regex pattern:
   $jpghdr = qr/^\xFF\xD8\xFF\xE0..JFIF\x00.*?\xFF\xC0...(..)(..)/s;
When I include the '?' to make ".*" non-greedy, the pattern stops
working.  If I don't include the '?', the pattern matches, but
greedily: ".*" runs to the last \xFF\xC0 in the data instead of the
first, so the wrong bytes are captured.  Aside from the fact that
using regular expressions to process chunk-based files is odd, the
problem remains that the non-greedy quantifier stops working if
followed by a non-printable character.  Quoting and the like
(\Q..\E, etc.) didn't help.  ActiveState's perl (v 5.6.0) seems to
exhibit the same behavior.  Ideas, anyone?
     Igor



Sun, 16 May 2004 03:39:11 GMT  
[posted & mailed]

On Nov 27, Igor Pechtchanski said:

Quote:
>Summary: I'm trying to match a regular expression containing
>non-printable characters (e.g. "\xFF" or "\x00").  The pattern
>contains a greediness modifier (".*?"), but doesn't work if a
>non-printable character follows the modifier.  If I use a printable
>character after the "?", the pattern suddenly works...  Has anyone
>else encountered this problem, and, if so, does anyone have a
>solution?

This bug was fixed in September (by me :) ).  The problem is a line in
the regex engine source that doesn't allow for a high-bit character
right after a non-greedy quantifier.  This was fix 12031, which has yet
to make it out to a released version of Perl.
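
Until a release carries that fix, one way around it is to skip the
regex and walk the JPEG segments directly, which also suits the
chunk-based layout the original post mentions.  The following is only
a sketch under stated assumptions: jpeg_size is a hypothetical helper
name (not from the thread), it handles only baseline SOF0 (\xFF\xC0)
frames like the original pattern does, and it assumes every segment
before SOF0 carries a two-byte big-endian length.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): walk the JPEG segments and
# return (width, height) from the first SOF0 (0xFFC0) segment, or an
# empty list if the data does not look like a JPEG header.
sub jpeg_size {
    my ($data) = @_;
    return unless substr($data, 0, 2) eq "\xFF\xD8";    # SOI marker
    my $pos = 2;
    while ($pos + 4 <= length $data) {
        # Each segment: 0xFF, marker byte, 16-bit big-endian length
        my ($ff, $marker, $len) = unpack 'C C n', substr($data, $pos, 4);
        return unless $ff == 0xFF;                      # lost sync, give up
        if ($marker == 0xC0) {
            return if $pos + 9 > length $data;          # truncated SOF0
            # After the marker: length (2), precision (1), height (2), width (2)
            my ($height, $width) = unpack 'n n', substr($data, $pos + 5, 4);
            return ($width, $height);
        }
        return if $len < 2;                             # malformed length
        $pos += 2 + $len;         # length counts its own two bytes
    }
    return;
}

# Synthetic data for illustration only: SOI, a 16-byte APP0/JFIF
# segment, then an SOF0 segment describing a 640x480 image.
my $sample = "\xFF\xD8"
    . "\xFF\xE0" . pack('n', 16) . "JFIF\x00" . ("\x00" x 9)
    . "\xFF\xC0" . pack('n', 17) . "\x08" . pack('n n', 480, 640)
    . ("\x00" x 10);

my ($w, $h) = jpeg_size($sample);
print "$w x $h\n";    # prints "640 x 480"
```

Note that SOF0 stores the height before the width, so the two captures
in the original pattern come back in that order as well.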

--

RPI Acacia brother #734   http://www.perlmonks.org/   http://www.cpan.org/
** Look for "Regular Expressions in Perl" published by Manning, in 2002 **



Sun, 16 May 2004 04:53:35 GMT  
 