Regexp multiple string matches 

 After working on a Perl 4.036 script that extracts all text between two strings, I've
come across a problem which I am not able to resolve. I made a small demo script
that demonstrates what my script is doing (wrong?) but on a smaller scale:

==========================================================================
# make a string with multiple matching pairs so we can parse it
$s = "This is STRING0 inside of string pairs STRING1 and STRING0 more inside STRING1 pairs.";

# now try to pick out what's in between the pairs.
$s =~ /STRING0(.*)STRING1/ && ($m = $1);

# print this mess out
print "s: [$s]\nm: [$m]\n";
==========================================================================

The output is as follows:

s: [This is STRING0 inside of string pairs STRING1 and STRING0 more inside STRING1 pairs.]
m: [ inside of string pairs STRING1 and STRING0 more inside ]

..but I really *intended* for the output to be:

s: [This is STRING0 inside of string pairs STRING1 and STRING0 more inside STRING1 pairs.]
m: [ inside of string pairs ]

..which is the string in between the FIRST pair of delimiters. It seems to be matching
the first "STRING0" with the last "STRING1" instead of the *next* "STRING1", which is
what I want.

..help!



Sun, 22 Dec 1996 01:59:37 GMT  

I immediately posted off the top of my head, then regretted it.  My solution
worked for your explicit case, but not for the general case.  I canceled it,
hopefully in time :-)

Here's one way to do it: (Probably not the best way.)

# make a string with multiple matching pairs so we can parse it
$s = "This is STRING0 inside of string pairs STRING1 and STRING0 more inside STRING1 pairs.";

# now try to pick out what's in between the pairs.
$s =~ /STRING0(.*)STRING1/ && ($m = $1);

# ADD THIS LINE: it cuts $m off at the first STRING1 left inside it
$m =~ s/STRING1.*//;

# print this mess out
print "s: [$s]\nm: [$m]\n";

PS: My erroneous solution had a pattern match which looked like:
    $s =~ /STRING0([^STRING1]*)STRING1/ && ($m = $1);

    but I forgot that I am saying NOT S AND NOT T AND NOT R... instead of
    NOT STRING1.
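
To see the difference concretely, here is a small sketch. The second test
string is made up for illustration and is not from the original script. The
character class only happens to work on the original string because the text
between the delimiters is all lower case:

```perl
# [^STRING1] is a character class: it excludes the single characters
# S, T, R, I, N, G and 1, not the word "STRING1".
$s_ok  = "This is STRING0 inside of string pairs STRING1 and STRING0 more inside STRING1 pairs.";
$s_bad = "This is STRING0 a Test case STRING1 pairs.";   # made-up example

# Matches, but only because nothing between the delimiters is S, T, R, I, N, G or 1.
$m1 = ($s_ok =~ /STRING0([^STRING1]*)STRING1/) ? $1 : '(no match)';

# Does not match: the capital T in "Test" is one of the excluded characters.
$m2 = ($s_bad =~ /STRING0([^STRING1]*)STRING1/) ? $1 : '(no match)';

print "m1: [$m1]\nm2: [$m2]\n";
```

The first pattern match picks out " inside of string pairs ", while the second
one fails outright and leaves m2 as "(no match)".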

Pat

-----------------------------------------------------------------------
| Patrick Martin    | My opinions | World's Crummiest JAPH:           |

| Computer Engineer --------------| The First Attempt:                |
| Advance Geophysical Corporation | print 'Just another perl hacker`; |
-----------------------------------------------------------------------



Sun, 22 Dec 1996 03:37:58 GMT  

Quote:

> After working on a Perl 4.036 script that extracts all text between two strings, I've
>come across a problem which I am not able to resolve. I made a small demo script
>that demonstrates what my script is doing (wrong?) but on a smaller scale:

>==========================================================================
># make a string with multiple matching pairs so we can parse it
>$s = "This is STRING0 inside of string pairs STRING1 and STRING0 more inside STRING1 pairs.";

># now try to pick out what's in between the pairs.
>$s =~ /STRING0(.*)STRING1/ && ($m = $1);

># print this mess out
>print "s: [$s]\nm: [$m]\n";
>==========================================================================
>..which is the string in between the FIRST two pairs. It seems to be matching the first
>"STRING0" with the last "STRING1" instead of the *next* "STRING1", which is what I
>want.

That's right: the .* in the regexp is being greedy, so it grabs as much as
it can and only stops at the last STRING1.  If you can install perl5 then
there's a simple solution: use a postfix ? on the .* to do non-greedy matching:

#!/usr/bin/perl5

# make a string with multiple matching pairs so we can parse it
$s = "This is STRING0 inside of string pairs STRING1 and STRING0 more inside STRING1 pairs.";

# now try to pick out what's in between the pairs.
$s =~ /STRING0(.*?)STRING1/ && ($m = $1);

# print this mess out
print "s: [$s]\nm: [$m]\n";

produces:

s: [This is STRING0 inside of string pairs STRING1 and STRING0 more inside STRING1 pairs.]
m: [ inside of string pairs ]
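
Since the original script is meant to extract all of the text between the
pairs, the same non-greedy pattern can probably be combined with the /g
modifier in list context to collect every pair in one go. This is only a
sketch under that assumption (perl5 only); the array name @all is made up
for the example:

```perl
$s = "This is STRING0 inside of string pairs STRING1 and STRING0 more inside STRING1 pairs.";

# In list context, a match with /g returns the captured text of every match.
@all = ($s =~ /STRING0(.*?)STRING1/g);

# print each piece that was found between a pair
foreach $m (@all) {
    print "m: [$m]\n";
}
```

On the demo string this should give two pieces, " inside of string pairs "
and " more inside ".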

Perl5 seems pretty stable even though it's in alpha still - all of my new
code uses it and it doesn't break too much old code (which could do with
a re-write anyway ;-)

If you're stuck with perl4 you might want to split the string on the
second delimiter and process each of the fragments in turn.
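
A rough sketch of that split approach follows. It sticks to constructs that
should not need perl5 (no "my", no .*?), but it has not been tried on a real
perl4, and the names @frags, @found and $frag are invented for the example:

```perl
$s = "This is STRING0 inside of string pairs STRING1 and STRING0 more inside STRING1 pairs.";

# Split on the second delimiter.  The -1 limit keeps trailing empty fields,
# so the last fragment is always the text after the final STRING1.
@frags = split(/STRING1/, $s, -1);

# That last fragment was never closed by a STRING1, so throw it away.
pop(@frags);

# In each remaining fragment, keep whatever follows the first STRING0.
@found = ();
foreach $frag (@frags) {
    $i = index($frag, 'STRING0');
    next if $i < 0;                      # no opening delimiter in this piece
    push(@found, substr($frag, $i + length('STRING0')));
}

foreach $m (@found) {
    print "m: [$m]\n";
}
```

The first element of @found is the " inside of string pairs " piece that was
wanted, and the rest of the array holds the later pairs.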

Hope this helps,

Mike

--
The "usual disclaimers" apply.    | Meiko
Mike Stok                         | 130C Baker Ave. Ext

Meiko tel: (508) 371 0088 x124    |



Sun, 22 Dec 1996 03:38:05 GMT  
 