Regular Expression madness 

I am having a strange problem with regular expressions. I am reading an
HTML page from a file and storing it in $Page{raw}. Then I try:

   $Page{raw} =~ m!Simple re!ig;

It fails to find a match even when I know there is a match in the file
and in $Page{raw}.

If I say:

   $temp = $Page{raw};
   $temp =~ m!Simple re!ig;

then it does match.

More strangely if I edit the file so it contains little more than the
regular expression that I want to match then:

   $Page{raw} =~ m!Simple re!ig;

works again.

If I successively shorten the string using

   $Page{raw} = substr($Page{raw}, 1);

in an attempt to locate the source of the problem then I get a match
after removing only the first char (a '<' ).

To make matters even worse, it only seems to be a problem on certain
sets of data files.  The same re on other data files works without any
problems.

At first I thought it might be some strange Unicode problem but if I
say:

   use bytes;

at the top of the file it doesn't change anything.  

Does anyone have any suggestions as to what could be the problem? It's
driving me absolutely nuts.

FVA



Tue, 21 Sep 2004 02:39:40 GMT  

Quote:

> I am having a strange problem with regular expressions. I am reading an
> HTML page from a file and storing it in $Page{raw}. Then I try:

>    $Page{raw} =~ m!Simple re!ig;

Is the g necessary?  In scalar context it makes each match start where the
previous successful match on that scalar ended.  That position is kept per
scalar, and a failed match resets it to the start of the string, e.g.

  DB<1> $Page{raw} = 'Sex, Sleep, Eat, Drink, Dream'

  DB<2> $Page{raw} =~ m!s!ig && print 'OK at ', pos($Page{raw})
OK at 1
  DB<3> $Page{raw} =~ m!s!ig && print 'OK at ', pos($Page{raw})
OK at 6
  DB<4> $Page{raw} =~ m!s!ig && print 'OK at ', pos($Page{raw})

  DB<5> $Page{raw} =~ m!s!ig && print 'OK at ', pos($Page{raw})
OK at 1

Note that this position is maintained per scalar, so making a copy will
start you back at the beginning.
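
A minimal sketch of how this could produce the symptoms in the question, assuming an earlier scalar-context /g match on $Page{raw} had already succeeded. The page string is a made-up stand-in for the real file, and all variable names other than $Page{raw} and $temp are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the page read from the file.
my %Page = (raw => '<html><body>Simple re</body></html>');

# 1st scalar-context /g match: succeeds, pos() now sits just past the match.
my $first     = $Page{raw} =~ m!Simple re!ig ? 1 : 0;
my $pos_first = pos($Page{raw});

# 2nd attempt resumes from pos(), finds nothing more, fails, and clears pos().
my $second     = $Page{raw} =~ m!Simple re!ig ? 1 : 0;
my $pos_second = pos($Page{raw});

# 3rd attempt starts from the beginning again and succeeds.
my $third = $Page{raw} =~ m!Simple re!ig ? 1 : 0;

# A copy carries the string but not the saved position, so it matches
# even though $Page{raw} itself fails on its next attempt.
my $temp     = $Page{raw};
my $copy_hit = $temp      =~ m!Simple re!ig ? 1 : 0;
my $orig_hit = $Page{raw} =~ m!Simple re!ig ? 1 : 0;

# Assigning a new value to the scalar also clears the saved position.
my $again = $Page{raw} =~ m!Simple re!ig ? 1 : 0;   # pos() is past the match again
$Page{raw} = substr($Page{raw}, 1);                 # modify the string, as in the question
my $after_assign = $Page{raw} =~ m!Simple re!ig ? 1 : 0;

print "first=$first second=$second third=$third ",
      "copy=$copy_hit orig=$orig_hit after_assign=$after_assign\n";
```

This should print first=1 second=0 third=1 copy=1 orig=0 after_assign=1, which lines up with the copy matching and with the match reappearing after the substr assignment.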

  perldoc -f pos
  perldoc perlre

Hope this helps,

Mike

--

http://www.stok.co.uk/~mike/       | GPG PGP Key      1024D/059913DA

http://www.starnix.com/            |                  75D2 9EC4 C1C0 0599 13DA



Tue, 21 Sep 2004 03:05:38 GMT  

Thank you, that did it.  I had developed the bad habit of writing ig
at the end of most of my re's, regardless of whether or not they were
needed.  I will have to work to break that habit.
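
For anyone comparing the options, here is a small sketch that reuses the sample string from the reply. It is only an illustration, and the variable names are hypothetical. It contrasts a plain test without g, the same test with g, an explicit reset of pos(), and a list-context use where g is actually wanted:

```perl
use strict;
use warnings;

my $raw = 'Sex, Sleep, Eat, Drink, Dream';

# Plain existence test: no /g, so every attempt starts at the beginning.
my @plain;
push @plain, ($raw =~ m!s!i ? 1 : 0) for 1 .. 4;

# Same test with /g in scalar context: the third attempt fails.
my @global;
push @global, ($raw =~ m!s!ig ? 1 : 0) for 1 .. 4;

# If /g is kept, the saved position can be cleared explicitly first.
pos($raw) = undef;
my $after_reset = $raw =~ m!s!ig ? 1 : 0;

# /g is useful in list context, e.g. for counting all matches.
pos($raw) = undef;
my $count = () = $raw =~ m!s!ig;

print "plain=@plain global=@global after_reset=$after_reset count=$count\n";
```

This should print plain=1 1 1 1 global=1 1 0 1 after_reset=1 count=2. For a simple "does it match" test, leaving off the g is the least surprising choice.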

Thanks again

FVA



Tue, 21 Sep 2004 04:13:59 GMT  
 