Multiline pattern matching with command line invocation 

I am trying to invoke perl from the command line to change a series of
links across an entire Web site, and some of the links have internal
\n's which HTML doesn't mind, but which causes perl's pattern matching
to miss them.

<A HREF="somelink.html" TARGET="new">

or

<A HREF="somelink.html"
TARGET="new">

What I want to do is

perl -p -i.bak -e 's#(<A HREF=")(.*?html">)#$1/SomeDir/$2#m;' *.html

but it seems to be choking on the 'm' of 's###m'. Is there any way to
enable multiline pattern matching for command line perl?

TIA

--
Clay Shirky



Thu, 03 Dec 1998 03:00:00 GMT  

 [courtesy cc of this posting sent to cited author via email]

In comp.lang.perl.misc,

:I am trying to invoke perl from the command line to change a series of
:links across an entire Web site, and some of the links have internal
:\n's which HTML doesn't mind, but which causes perl's pattern matching
:to miss them.
:
:<A HREF="somelink.html" TARGET="new">
:
:or
:
:<A HREF="somelink.html"
:TARGET="new">
:
:What I want to do is
:
:perl -p -i.bak -e 's#(<A HREF=")(.*?html">)#$1/SomeDir/$2#m;' *.html
:
:but it seems to be choking on the 'm' of 's###m'. Is there any way to
:enable multiline pattern matching for command line perl?

Yes, but that pattern ain't gonna cut it anyway.  /m doesn't do what
you think it does: it lets ^ and $ match internally in a multiline
string.  But you don't have one of those yet.  So you could use -000 or
-076 or -0777, and then you'll need a /s.  

For non-marginal cases and an imperfect solution, use

    #!/usr/bin/perl -0777 -p -i.bak
    # $2 is the quote character, so the link path is $3
    s#(<\s*A\s+HREF\s*=\s*(['"]))(.*?html\2.*?>)#$1/SomeDir/$3#sig

But you might wish to consider reading up on something like

    http://www.*-*-*.com/

for why this is harder than it appears.

--tom
--

    "With a PC, I always felt limited by the software available.  
    On Unix, I am limited only by my own knowledge."       --Peter J. Schoenster



Thu, 03 Dec 1998 03:00:00 GMT  

Quote:
>: [Can I] enable multiline pattern matching for command line perl?
>    #!/usr/bin/perl -0777 -p -i.bak
>    s#(<\s*A\s+HREF\s*=\s*(['"]))(.*?html\2.*?>)#$1/SomeDir/$3#sig

When I try this from the command line, it still seems to choke on the
-e statement if the substitution ends in an 's' (s///s), but not if
it doesn't (s///). Is there a command line invocation I'm missing?

--
Clay Shirky



Thu, 03 Dec 1998 03:00:00 GMT  

[ mailed and posted ]

in file

Quote:
> <A HREF="somelink.html" TARGET="new">
> <A HREF="somelink.html"
> TARGET="new">

> What I want to do is
> perl -p -i.bak -e 's#(<A HREF=")(.*?html">)#$1/SomeDir/$2#m;' *.html

> but it seems to be choking on the 'm' of 's###m'. Is there any way to
> enable multiline pattern matching for command line perl?

What do you mean, "choking on the 'm' of 's###m'"?

  It seems to work fine if you change either the input file or the perl
code so that they match.  You are looking for a substring /html">/
which does not appear in your input file.  If you change your
one-liner to this,

   perl -p -i.bak -e 's#(<A HREF=")(.*?html")#$1/SomeDir/$2#m;'
                                          ^^^
you get this from perl 5.001-1n
   <A HREF="/SomeDir/somelink.html" TARGET="new">
   <A HREF="/SomeDir/somelink.html"
   TARGET="new">

Hmm, this doesn't require multiple line matching.
So something like (.*?html".*>) might be an attempt
to match the newline and the TARGET section.

Actually, I knew this wasn't going to work on multiple lines
because -p will only read one line at a time.  man perlre
says that //m makes $ match before a newline in a multiple
line string.  If you want . to match a newline, you have to
use //s.   So if you can redefine a "line" to be a whole
<block> of html, then you could use s###s.

For instance:
  perl -e '$/=">";' -p  -e 's#(<A HREF=")(.*?html.*">)#$1/SomeDir/$2#s;' temp
generates
  <A HREF="/SomeDir/somelink.html" TARGET="new">
  <A HREF="/SomeDir/somelink.html"
  TARGET="new">

I have no idea if the -e before the -p puts the $/ assignment outside the
loop, nor much knowledge of html to suggest an appropriate regexp
to match on.  You might be aware of greedy .* and nested <<>> structs.

WIA
Joel Graber       --  non-popoty
--

-- office phone: 214-480-2665  pager: 214-597-0822



Tue, 08 Dec 1998 03:00:00 GMT  
 

 

 