Negating (and retaining match) of a regular expression 

I have a string that contains the following text:

<p><font size="4"><b>INS Relationship with
    State/Local Law Enforcement</b></font></p>
<p align="left">INS Action....

What I'm trying to do is remove the formatting around the "INS
Relationship with
    State/Local Law Enforcement" (yes, that is a \n\t sequence).
Basically, I've attempted the following:

$html =~ s/<font[ \n\t]*size="4><b>([^<\/b>]*)<\/b>/<span
class="subheading">\1<\/span>/igs;

The problem is that it doesn't treat the text within the brackets as
one expression. Instead it matches any single character that isn't <,
/, b, or >, rather than text that doesn't contain the entire string
"</b>".

And it seems like the ?! syntax doesn't allow you to retain the match
in the number variables.  Help, please!  How can I replace the <font>
and <b> tags with the <span> tag as I've indicated above?

Rachael



Fri, 05 Nov 2004 20:02:20 GMT  
 Negating (and retaining match) of a regular expression

Quote:

> I have a string that contains the following text:

> <p><font size="4"><b>INS Relationship with
>     State/Local Law Enforcement</b></font></p>
> <p align="left">INS Action....

> What I'm trying to do is remove the formatting around the "INS
> Relationship with
>     State/Local Law Enforcement" (yes, that is a \n\t sequence).
> Basically, I've attempted the following:

> $html =~ s/<font[ \n\t]*size="4><b>([^<\/b>]*)<\/b>/<span

                                ^^ fails to match `size="4"'
Quote:
> class="subheading">\1<\/span>/igs;

                     ^^  use $1, not \1, in the replacement string

Quote:

> The problem is that it doesn't treat the text within the brackets as
> one expression. Instead it matches any single character that isn't <,
> /, b, or >, rather than text that doesn't contain the entire string
> "</b>".

Sorry, I don't follow.  If you want to match everything up until </b>,
why not say something like

/(.*?)<\/b>/s
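
As a rough illustration of the difference (the sample string and variable names here are made up, not taken from the original post): [^<\/b>] is a character class, so it rejects any single <, /, b, or > and stops at the first "b" in the text, while a non-greedy (.*?) with /s runs up to the first literal </b>.

```perl
use strict;
use warnings;

# Hypothetical sample text, not from the original post.
my $text = "<b>abc\n  def</b> tail";

# Character class: matches a run of characters that are none of < / b >.
# It stops at the "b" in "abc", so the following <\/b> cannot match.
my $class_matched = $text =~ /<b>([^<\/b>]*)<\/b>/ ? 1 : 0;

# Non-greedy match: take as little as possible up to the first </b>.
# The /s modifier lets '.' match the embedded newline.
my ($lazy) = $text =~ /<b>(.*?)<\/b>/s;

print "class matched: $class_matched\n";
print "lazy capture: [$lazy]\n";
```

The class-based pattern fails on any heading that happens to contain a "b", which is probably why it looked as if the brackets were being ignored.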

Quote:
> And it seems like the ?! syntax doesn't allow you to retain the match
> in the number variables.  Help, please!  How can I replace the <font>
> and <b> tags with the <span> tag as I've indicated above?

(?!...) is a zero-width assertion: it consumes no characters and its
parentheses don't capture, so nothing inside it is ever stored in a
numbered variable.
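
To keep the text anyway, one option is to put ordinary capturing parentheses around the part that does consume characters. A minimal sketch, assuming a made-up sample string:

```perl
use strict;
use warnings;

# Hypothetical sample text for illustration.
my $text = "<b>one</b><b>two</b>";

# (?!</b>) only checks that "</b>" does not start at the current position;
# it consumes nothing. The '.' after it consumes one character.
# The outer parentheses capture the whole run into $1.
my ($first) = $text =~ m{<b>((?:(?!</b>).)*)</b>}s;

print "captured: $first\n";
```

The inner group is written as (?:...) so that only the outer parentheses set a numbered variable.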

If you want to remove all HTML formatting you might consider using
HTML::TokeParser or HTML::Parser.  For your particular example, I
think you're trying to say

$html =~ s|<b>(                 #  capture into group 1
               (?:(?!</b>).)    #  while we don't see </b>, get a char
               *                #  as many times as we can
              )</b>
          |<SPAM>$1</SPAM>|sx;  # insert <SPAM>.  'x' allows whitespace;
                                # 's' lets '.' match newlines

...but this could be rewritten without the fuss, if you use non-greedy
searching.  See perlre(1).

$html =~ s|<b>(.*?)</b>|<SPAM>$1</SPAM>|sx;
                 ^ ? means non-greedy.
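
Applied to the snippet from the original question, the same idea might look like the sketch below. It assumes the markup always has the exact shape <font size="4"><b>...</b></font>; anything looser is better handled with a real HTML parser.

```perl
use strict;
use warnings;

# Sample text copied from the question.
my $html = <<'END';
<p><font size="4"><b>INS Relationship with
    State/Local Law Enforcement</b></font></p>
<p align="left">INS Action....
END

# Replace the <font><b>...</b></font> wrapper with a single <span>.
# 'g' replaces every occurrence, 'i' ignores tag case,
# 's' lets '.' match the newline inside the heading.
$html =~ s{<font\s+size="4">\s*<b>(.*?)</b>\s*</font>}
          {<span class="subheading">$1</span>}gis;

print $html;
```

Using braces as delimiters avoids having to escape the slashes in the closing tags.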

--
John Borwick



Fri, 05 Nov 2004 20:53:14 GMT  
 