Repeated regex matching. 

Let's say I have a string:

$foo = "popopop";

and then I have this code:

while(($foo =~ m/pop/gi)){
   print "MATCHED\n";
}

This prints the word MATCHED two times instead of 3
times.  I.e. the string

popopop
|   |   |
It should see matches here

but instead it's seeing matches

popopop
|       |

there.  

How do I get the desired behavior out of a regex?  I.e. how
do I get a subsequent regex to return a match that can include
part of, but not all of, a previous match?

--
David Allen
http://www.*-*-*.com/



Wed, 18 Jun 1902 08:00:00 GMT  
 Repeated regex matching.

>Let's say I have a string:

>$foo = "popopop";

>and then I have this code:

>while(($foo =~ m/pop/gi)){
>   print "MATCHED\n";
>}

>This prints the word MATCHED two times instead of 3
>times.

That's because after the first match, the current position in the
string, held in pos($foo), is now 3, so the next match attempt starts
at the 4th character.
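A minimal sketch of what the original loop sees, assuming the same "popopop" string. The helper name `match_end_positions` is hypothetical; it only records pos() after each successful match. Just two end positions come back, and the second match begins at offset 4 because the search resumed at offset 3.

```perl
use strict;
use warnings;

# Hypothetical helper: run the plain m//g loop and record pos() after each match.
sub match_end_positions {
    my ($string) = @_;
    my @ends;
    while ($string =~ m/pop/gi) {
        # pos() is the offset just past the end of the current match,
        # which is also where the next m//g attempt will begin.
        push @ends, pos($string);
    }
    return @ends;
}

print join(", ", match_end_positions("popopop")), "\n";    # prints "3, 7"
```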

If you want it to back up a character each time, you could do something
like this:

  $foo = "popopop";

  while(($foo =~ m/pop/gi)){
     print "MATCHED: ", pos($foo), "\n";
     pos($foo)--;   # back up one char so the next match can overlap
  }

  -Ken



 Repeated regex matching.

> Let's say I have a string:

> $foo = "popopop";

> and then I have this code:

> while(($foo =~ m/pop/gi)){
>    print "MATCHED\n";
> }

> This prints the word MATCHED two times instead of 3
> times.  I.e. the string

That's right: m//g finds non-overlapping occurrences.

> How do I get the desired behavior out of a regex?  I.e. how
> do I get a subsequent regex to return a match that can include
> part of, but not all of, a previous match?

Lookahead assertion.

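while(($foo =~ m/p(?=op)/gi)){
    # Only the "p" is consumed; the look-ahead checks "op" without
    # eating it, so the next search starts right after this "p".
    print "MATCHED\n";
}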

--
     \\   ( )
  .  _\\__[oo

 .  l___\\
  # ll  l\\
 ###LL  LL\\


 Repeated regex matching.

>Let's say I have a string:

>$foo = "popopop";

>and then I have this code:

>while(($foo =~ m/pop/gi)){
>   print "MATCHED\n";
>}

>This prints the word MATCHED two times instead of 3
>times.

[snip overlapping matches]

>How do I get the desired behavior out of a regex?  I.e. how
>do I get a subsequent regex to return a match that can include
>part of, but not all of, a previous match?

   perldoc -f pos

while(($foo =~ m/(pop)/gi)){
   print "MATCHED '$`'\n";      # $` holds the text before this match
   pos($foo) -= length($1) - 1; # one char beyond start of this match
}

--
    Tad McClellan                          SGML Consulting

    Fort Worth, Texas


 Repeated regex matching.

> Let's say I have a string:

> $foo = "popopop";

> and then I have this code:

> while(($foo =~ m/pop/gi)){
>    print "MATCHED\n";
> }

> This prints the word MATCHED two times instead of 3
> times.  I.e. the string
> ...
> How do I get the desired behavior out of a regex?  I.e. how
> do I get a subsequent regex to return a match that can include
> part of, but not all of, a previous match?

There is the brute force approach:

  my $foo = "poPOPopo1234POp";
  for (map{ $foo =~ /^.{$_}(pop)/i } 0..length $foo) {
     print "MATCHED $_\n";
  }

but it sure is ugly. :o)

Swee Heng



 Repeated regex matching.

MCMXCIII in <URL::">
() Let's say I have a string:
()
() $foo = "popopop";
()
() and then I have this code:
()
() while(($foo =~ m/pop/gi)){
()    print "MATCHED\n";
() }
()
() This prints the word MATCHED two times instead of 3
() times.  I.e. the string
()
() popopop
() |   |   |
() It should see matches here

If you use a proportional font and do vertical formatting, you're
guaranteed that 99% of the people have no clue what you mean.

It should match twice: once at the first "pop", and once at the
last "pop". The middle "pop" overlaps the first, so it will not match.
The regex engine picks up *after* the position where it finished.

() but instead it's seeing matches
()
() popopop
() |       |
()
() there.  

Nah, I don't think so... The latter | is way after the string... ;-)

() How do I get the desired behavior out of a regex?  I.e. how
() do I get a subsequent regex to return a match that can include
() part of, but not all of, a previous match?

Use a look-ahead:

    while ($foo =~ /(?=pop)/g) { print "MATCHED at ", pos($foo), "\n" }
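The same look-ahead idea can also report what was matched: a capture group placed inside the zero-width look-ahead still records its text, even though the overall match consumes nothing. A sketch, assuming a hypothetical helper named `overlapping_matches` and reusing the mixed-case test string from earlier in the thread:

```perl
use strict;
use warnings;

# Hypothetical helper: list every (start offset, matched text) pair for
# "pop", overlaps included, case-insensitively.
sub overlapping_matches {
    my ($string) = @_;
    my @found;
    # (?=(pop)) matches an empty string, so nothing is consumed. Perl refuses
    # a second empty match at the same position, so the next attempt starts
    # one character later. The capture inside the look-ahead still records
    # the text that "pop" matched there.
    while ($string =~ /(?=(pop))/gi) {
        push @found, [ $-[0], $1 ];
    }
    return @found;
}

my $foo = "poPOPopo1234POp";
printf "MATCHED %s at %d\n", $_->[1], $_->[0] for overlapping_matches($foo);
```

If only the text is needed, list context does the looping for you: `my @texts = $foo =~ /(?=(pop))/gi;` returns just the captured strings.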

Abigail
--

0)x299=>C=>(0)x9=>XC=>(0)x39=>L=>(0)x9=>XL=>(0)x29=>X=>IX=>0=>0=>0=>V=>IV=>0=>0
=>I=>$r=-2449231+gm_julian_day+time);do{until($r<$#r){$_.=$r[$#r];$r-=$#r}for(;
!$r[--$#r];){}}while$r;$,="\x20";print+$_=>September=>MCMXCIII=>()'



 Repeated regex matching.

> > How do I get the desired behavior out of a regex?  I.e. how
> > do I get a subsequent regex to return a match that can include
> > part of, but not all of, a previous match?

> There is the brute force approach:

>   my $foo = "poPOPopo1234POp";
>   for (map{ $foo =~ /^.{$_}(pop)/i } 0..length $foo) {
>      print "MATCHED $_\n";
>   }

> but it sure is ugly. :o)

Very crude as well. Make sure you don't use it! Read Abigail's suggestion
about using look-aheads instead. I apologise for the hideous monstrosity I
created.

Swee Heng



 Repeated regex matching.

>  $foo = "popopop";

>  while(($foo =~ m/pop/gi)){
>     print "MATCHED: ", pos($foo), "\n";
>     pos($foo)--;
>  }

Of course, this assumes we know the pattern in advance.  If we don't,
we could do it like this:

  $foo = "popopop";
  while ($foo =~ m/pop/gi) {
      print "MATCHED: ", pos($foo), "\n";
      pos($foo) -= length($&) - 1;   # restart one char past this match's start
  }
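When the pattern arrives as a compiled qr// object, the same restart-one-past-the-start trick can be wrapped in a helper. This sketch uses the hypothetical name `match_starts` and reads offsets from @- and @+ rather than $&. In Perls older than 5.20, any mention of $&, $` or $' anywhere in a program made every regex match in it slower; @- and @+ carry no such penalty.

```perl
use strict;
use warnings;

# Hypothetical helper: every starting offset at which $re matches in
# $string, overlaps allowed, with the text matched there.
sub match_starts {
    my ($string, $re) = @_;
    my @hits;
    while ($string =~ /$re/g) {
        my ($start, $end) = ($-[0], $+[0]);    # offsets of this match
        push @hits, [ $start, substr($string, $start, $end - $start) ];
        # Guard: an empty match at the very end cannot move any further.
        last if $start >= length $string;
        # Restart one character past the start of this match; this is what
        # pos($foo) -= length($&) - 1 does, without touching $&.
        pos($string) = $start + 1;
    }
    return @hits;
}

for my $hit (match_starts("popopop", qr/pop/i)) {
    print "MATCHED '$hit->[1]' at $hit->[0]\n";
}
```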

This still does not find all the ways the pattern can match the given
string, merely every possible starting position.  This is not an issue
if the pattern has no regex metacharacters, but in that case it makes
much more sense to use index() instead:

  $foo = "popopop";
  for (my $pos = 0; ($pos = index $foo, 'pop', $pos) >= 0; $pos++) {
      print "FOUND: $pos\n";
  }

The difference in the output is because the former code gives the
ending and the latter the starting positions of the matches.
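To see that limitation with a pattern that does contain metacharacters, take qr/p\w*p/ on "popopop". The start-position loop reports one greedy match per start: "popopop", "popop" and "pop". A brute-force sketch that tests every substring against the fully anchored pattern finds all six. The helper name `all_matches` is hypothetical; the approach costs O(n^2) regex calls and ignores empty matches, so it is only meant to illustrate the difference.

```perl
use strict;
use warnings;

# Hypothetical brute-force helper: test every non-empty substring against
# the pattern anchored at both ends, so every (start, end) pair the pattern
# can match is listed. Quadratic in the string length.
sub all_matches {
    my ($string, $re) = @_;
    my @hits;
    for my $start (0 .. length($string) - 1) {
        for my $len (1 .. length($string) - $start) {
            my $candidate = substr($string, $start, $len);
            push @hits, [ $start, $candidate ] if $candidate =~ /\A(?:$re)\z/;
        }
    }
    return @hits;
}

print "$_->[0]: $_->[1]\n" for all_matches("popopop", qr/p\w*p/);
```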

--
Ilmari Karonen - http://www.sci.fi/~iltzu/
Please ignore Godzilla and its pseudonyms - do not feed the troll.


