Can't Match Multi-Line Pattern 

I'm sure this is easy and am just missing the boat.
    I'm trying to change text between two HTML tags ("<B>text</B>"
becomes "t^De^Dx^Dt^D"). If the HTML tags are on the same line,
all is right with the world. If the tags span a line, life is not
worth living.
    Per the perl FAQ, I've tried undef'ing $/ and I still get no love.
    I've also tried changing...
s/($pre.*?$post)/markup($1,$char)/ge;
    ...to...
s/($pre.*?$post)/markup($1,$char)/gem;    # m should match multi-line?
    ...with the exact same results.
    Code below. Please be nice. Perl ans

    Matt

$char = "\004";      #the character to add between
$pre = "<[B|b]>";    #the prefix
$post = "<\/[B|b]>"; #the suffix

print doit($char,$pre,$post, \$x);

sub doit
{
   my $char = shift;
   my $pre = shift;
   my $post = shift;
   my $retval = '';

   while (<>) {
      s/($pre.*?$post)/markup($1,$char)/ge;
      $retval .= $_;
   }
   return $retval;

}

sub markup
{
   my $x = shift;
   my $char = shift;

   $x =~ s/($pre|$post)//g;
   $x =~ s/(.)/$1$char/g;

 return ($x);

}

+----------------------------------------------------------------+
| Matt Steinhoff                                    407-420-6121 |



Mon, 22 Jan 2001 03:00:00 GMT  
[Posted to comp.lang.perl.misc and a copy mailed.]



> I'm sure this is easy and am just missing the boat.
>     I'm trying to change text between two HTML tags ("<B>text</B>"
> becomes "t^De^Dx^Dt^D"). If the HTML tags are on the same line,
> all is right with the world. If the tags span a line, life is not
> worth living.
>     Per the perl FAQ, I've tried undef'ing $/ and I still get no love.
>     I've also tried changing...
> s/($pre.*?$post)/markup($1,$char)/ge;
>     ...to...
> s/($pre.*?$post)/markup($1,$char)/gem;    # m should match multi-line?
>     ...with the exact same results.

You were on the right track to undef $/ and read the whole file into one
string.  Where you went wrong is with the modifier '/m', which only
affects the behavior of '^' and '$'.  What you should use is '/s', which
causes '.' to match "\n".
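Applied to the original code, the advice comes down to two changes: get the whole input into one string, then add /s to the substitution. Below is a minimal sketch, assuming the text is already in a single scalar. The sub names follow the original post, but the argument order, the qr// objects and the [Bb] class are my own choices (the original [B|b] is a character class that also matches a literal "|").

```perl
use strict;
use warnings;

my $char = "\004";        # character appended after each marked-up character
my $pre  = qr{<[Bb]>};    # opening tag; [B|b] would also accept "<|>"
my $post = qr{</[Bb]>};   # closing tag

# Strip the tags and follow every non-newline character with $char.
sub markup {
    my ($text, $char) = @_;
    $text =~ s/$pre|$post//g;
    $text =~ s/(.)/$1$char/g;   # no /s: a newline inside the tags stays unmarked
    return $text;
}

# $string holds the whole input, so a tag pair may span lines.
sub doit {
    my ($string, $char) = @_;
    # /s lets .*? cross newlines; /m would change nothing (no ^ or $ here)
    $string =~ s/($pre.*?$post)/markup($1, $char)/gse;
    return $string;
}

# Usage: slurp all input, then transform it in one pass.
# print doit(do { local $/; <> }, $char);
```

Keeping markup() without /s preserves the original behaviour of leaving line breaks inside the bold text untouched; add /s there too if the newline should also be followed by $char.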

--
Larry Rosler
Hewlett-Packard Laboratories
http://www.hpl.hp.com/personal/Larry_Rosler/



Mon, 22 Jan 2001 03:00:00 GMT  
: I'm sure this is easy and am just missing the boat.
:     I'm trying to change text between two HTML tags ("<B>text</B>"
: becomes "t^De^Dx^Dt^D"). If the HTML tags are on the same line,
: all is right with the world. If the tags span a line, life is not
: worth living.
:     Per the perl FAQ, I've tried undef'ing $/ and I still get no love.

By this, I presume you mean that you've read the entire file into a single
scalar, a necessary (but not sufficient) step toward doing this with one
substitution.
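For the "necessary" half, a common idiom is to localize $/ only while reading, so the change cannot leak into the rest of the program. A minimal sketch: slurp() is a hypothetical helper, and the in-memory filehandle just stands in for a real file. The last two lines show the "not sufficient" half, since the slurped text still needs /s before .*? will cross a newline.

```perl
use strict;
use warnings;

# Hypothetical helper: return everything remaining on $fh as one string.
sub slurp {
    my ($fh) = @_;
    local $/;              # $/ is undef only until this sub returns
    return scalar <$fh>;   # with no record separator, one read gets it all
}

open my $fh, '<', \"x <B>bold\nstill bold</B> y\n" or die "open: $!";
my $all = slurp($fh);

print "without /s: ", ($all =~ /<B>.*?<\/B>/  ? "match" : "no match"), "\n";
print "with /s:    ", ($all =~ /<B>.*?<\/B>/s ? "match" : "no match"), "\n";
```

This prints "no match" for the first test and "match" for the second.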

:     I've also tried changing...
: s/($pre.*?$post)/markup($1,$char)/ge;
:     ...to...
: s/($pre.*?$post)/markup($1,$char)/gem;    # m should match multi-line?
:     ...with the exact same results.

For some reason, the /m and /s modifiers cause far more confusion than
seems warranted.  See Friedl's brilliant explanation of these in
_Mastering Regular Expressions_ for full enlightenment.  In brief,
though:

/m has only one effect -- changing how ^ and $ are interpreted.  Without
/m, they match only at the beginning and end of the entire string; with
it, they can also match at internal newlines.  Your regex has no ^ or $,
so /m is irrelevant.

/s has only one effect -- changing how . is interpreted.  Without /s, .
matches any character except newline.  With it, it matches any character
*including* newline.  You want your .* above to span newlines, so /s is
what you need.
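The two rules can be checked side by side. A small illustrative sketch; the sample string is made up for the purpose:

```perl
use strict;
use warnings;

my $str = "first line\nsecond line\n";

# /m affects only ^ and $: they may also match at internal line boundaries.
my @no_m   = $str =~ /^(\w+)/g;     # ("first")
my @with_m = $str =~ /^(\w+)/mg;    # ("first", "second")

# /s affects only ".": it may also match "\n".
my ($no_s)   = $str =~ /(line.*line)/;    # undef: "." stops at the newline
my ($with_s) = $str =~ /(line.*line)/s;   # "line\nsecond line"

print "without /m: @no_m\n";
print "with /m:    @with_m\n";
print "without /s: ", (defined $no_s   ? "match" : "no match"), "\n";
print "with /s:    ", (defined $with_s ? "match" : "no match"), "\n";
```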

Hope this helps...

---------------------------------------------------------------------

 --*--    Home Page: http://www.cinenet.net/users/cberry/home.html
   |      Member of The HTML Writers Guild: http://www.hwg.org/  
       "Every man and every woman is a star."



Mon, 22 Jan 2001 03:00:00 GMT  

> For some reason, the /m and /s modifiers cause far more confusion than
> seems warranted.  See Friedl's brilliant explanation of these in

That's because in both perlre and the Camel they are essentially
not explained.

> _Mastering Regular Expressions_ for full enlightenment.  In brief,
> though:

> /m has only one effect -- changing how ^ and $ are interpreted.  Without
> /m, they match only at the beginning and end of the entire string; with
> it, they can also match at internal newlines.  Your regex has no ^ or $,
> so /m is irrelevant.

> /s has only one effect -- changing how . is interpreted.  Without /s, .
> matches any character except newline.  With it, it matches any character
> *including* newline.  You want your .* above to span newlines, so /s is
> what you need.

Your explanation certainly helped me.  Before I had to stop and go
to a meeting, I was researching this identical point at work,
having forgotten which /[ms] did which.  I didn't get as far as
Friedl, but did check the two I mentioned above and found
no help there.


Mon, 22 Jan 2001 03:00:00 GMT  
: > For some reason, the /m and /s modifiers cause far more confusion than
: > seems warranted.  See Friedl's brilliant explanation of these in
:
: That's because in both perlre and the Camel they are essentially
: not explained.

I found this at the first match for '/m' in perlre (5.004_04):

  By default, the "^" character is guaranteed to match at only
  the beginning of the string, the "$" character at only the  
  end (or before the newline at the end) and Perl does certain
  optimizations with the assumption that the string contains  
  only one line.  Embedded newlines will not be matched by "^"
  or "$".  You may, however, wish to treat a string as a      
  multi-line buffer, such that the "^" will match after any  
  newline within the string, and "$" will match before any    
  newline.  At the cost of a little more overhead, you can do
  this by using the /m modifier on the pattern match operator.
  (Older programs did this by setting $*, but this practice is
  now deprecated.)                                            

  To facilitate multi-line substitutions, the "." character  
  never matches a newline unless you use the /s modifier,    
  which in effect tells Perl to pretend the string is a single
  line--even if it isn't.  The /s modifier also overrides the
  setting of $*, in case you have some (badly behaved) older  
  code that sets it in another module.                        

Seems relatively straightforward to me, though perhaps the wording of the
second paragraph could be improved a bit.
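The "(or before the newline at the end)" clause in the first perlre paragraph is the detail most often missed. A small sketch with a made-up string, contrasting the default behaviour of ^ and $ with /m:

```perl
use strict;
use warnings;

my $s = "alpha\nbeta\n";

# Without /m: ^ matches only at the start of the string, and $ only at the
# end or just before a newline that ends the string.
my $beta_end   = $s =~ /beta$/  ? 1 : 0;   # 1: $ sits before the final "\n"
my $alpha_end  = $s =~ /alpha$/ ? 1 : 0;   # 0: that "\n" is embedded
my $beta_start = $s =~ /^beta/  ? 1 : 0;   # 0: ^ ignores embedded newlines

# With /m: ^ may match after, and $ before, any newline in the string.
my $alpha_end_m  = $s =~ /alpha$/m ? 1 : 0;   # 1
my $beta_start_m = $s =~ /^beta/m  ? 1 : 0;   # 1

print "$beta_end $alpha_end $beta_start | $alpha_end_m $beta_start_m\n";
```

This prints `1 0 0 | 1 1`.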

: Your explanation certainly helped me.  Before I had to stop and go
: to a meeting, I was researching this identical point at work,
: having forgotten which /[ms] did which.  I didn't get as far as
: Friedl, but did check the two I mentioned above and found
: no help there.

Friedl explains them in glorious detail and emphasizes the "/m affects
*only* ^$, /s affects *only* ." tack which I took in my previous message.
However, all the same semantics are there in perlre.




Mon, 22 Jan 2001 03:00:00 GMT  
 [courtesy cc of this posting sent to cited author via email]


:Friedl explains them in glorious detail and emphasizes the "/m affects
:*only* ^$, /s affects *only* ." tack which I took in my previous message.
:However, all the same semantics are there in perlre.

But that's false: /s affects ^ and $ as well, insulating them from
unpleasant surprises out of $* settings.

--tom

#!/usr/bin/perl -00

use strict;

use FileHandle;
$| = 1;

my $DELAY = 2;

my $RANDOMIZE = 1;

while (<DATA>) {
    next if /===/;
    my %record;
    (undef, %record) = split /^([^:\s]+):\s*/m;



    else                     { die "unknown record $_" }

}

rand_questions();

exit;

show_full_quiz();

####################################

sub rand_questions {

    for (my $qnum = 0; $qnum < $#questions; $qnum++) {
        system('clear');
        $q = $questions[$qnum];
        question("Q".(1+$qnum), $q->{Question});
        my $anum = 'a';


            answer($anum++, $a->{Answer});
        }

        delay(5);

        print "\nANSWERS:\n\n";
        for ( my $i = 0; $i <= $#answers; $i++ ) {
            my $a = $answers[$i];
            explain( ( 'a' .. 'z' )[$i], $a->{Correct} . $a->{Why} );
            print "\n";
            delay(2 + length($a->{Why})/50);
        }
        delay(5);
    }

}

sub delay {
    my $count = shift;
    sleep(1 + $DELAY * $count);

}

sub rand_question {

    system('clear');
    question('Q', $q->{Question});
    my $anum = 'a';


        answer($anum++, $a->{Answer});
    }

    delay(3);

    print "\nANSWERS:\n\n";
    for ( my $i = 0; $i <= $#answers; $i++ ) {
        answer( ( 'a' .. 'z' )[$i],
                $answers[$i]{Correct} .
                $answers[$i]{Why} );
        print "\n";
        delay(1);
    }
    print "\n";
    delay(5);

}

sub show_full_quiz {
    my $qnum = '01';

        fwrite ($qnum++, $q->{Question});
        my $anum = 'a';

            fwrite(' '.$anum++.':', $a->{Answer});
        }
        print "\n";
    }

}

sub scramble {


    srand(time() ^ ($$ + ($$ << 15)));


}

sub fwrite {

    sub question { STDOUT->format_name("Question"); &fwrite; }
    sub answer { STDOUT->format_name("Answer"); &fwrite; }
    sub explain { STDOUT->format_name("Explain"); &fwrite; }


$num, $text
  ~~  ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
      $text

.

format Answer =

$num, $text
  ~~        ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
      $text
.


$num, $text
~~    ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
      $text
.

    write;

}

__DATA__

Question:   How do you produce a reference to a list?
Type:       References
Difficulty: 6/7 (Hard)


Correct:    No.


Correct:    No.
Why:        That makes a reference to a newly allocated anonymous
            array, and populates it with a copy of the contents


Correct:    No.
Why:        The backslash operator is distributive across a list, and

            Well.  In list context.  In scalar context, it's a strange
            way to get a reference to the function &c.

Answer:     You can't.
Correct:    Yes.
Why:        A list is not an array, although in many places one may be
            used for the other.  An array has an AV allocated, whereas a
            list is just some values on a stack somewhere.  You cannot
            alter the length of a list, for example, any more than
            you could alter a number by saying something like 23++.
            While an array contains a list, it is not a list itself.

========================================================================

Question:   What happens when you return a reference to a private variable?
Type:       References
Difficulty: 4/7 (Medium)

Answer:     You get a core dump later when you use it.
Correct:    No.
Why:        Perl is not C or C++.  

Answer:     The underlying object is silently copied.
Correct:    No.
Why:        Even though the reference returned is for all intents
            and purposes a copy of the original (Perl uses return
            by reference), the underlying referent has not changed.

Answer:     The Right Thing (tm).
Correct:    Yes.
Why:        Perl keeps track of your variables, whether dynamic or
            otherwise, and doesn't free things before you're done using
            them.

Answer:     The compiler doesn't let you.
Correct:    No.
Why:        Perl seldom stops you from doing what you want to do,
            and tries very hard to do what you mean to do.  This
            is one of those cases.

========================================================================

Question:   Why aren't Perl's patterns regular expressions?
Type:       Regular expressions
Difficulty: 3/7 (Medium)

Answer:     Because Perl patterns have backreferences.
Correct:    Yes.
Why:        A regular expression by definition must be
            able to determine the next state in the finite
            automaton without requiring any extra memory
            to keep around previous state.  A pattern /([ab]+)c\1/
            requires the state machine to remember old
            states, and thus disqualifies such patterns
            as being regular expressions in the classic sense
            of the term.

Answer:     Because Perl allows both minimal matching and maximal
            matching in the same pattern.
Correct:    No.
Why:        The mere presence of minimal and maximal repetitions
            does not disqualify a language from being "regular".

Answer:     Because Perl uses a non-deterministic finite automaton
            rather than a deterministic finite automaton.
Correct:    No.
Why:        Both NFAs and DFAs can be used to solve regular
            expressions.  Given an NFA, a DFA for it can be constructed,
            and vice versa.  For example, classical grep uses an NFA,
            while classical egrep uses a DFA.  Whether a pattern matches
            a particular string doesn't change, but where the match
            occurs may.  In any case, they're both regular.  However,
            an NFA can also be modified to handle backtracking, while
            a DFA cannot.

Answer:     Because Perl patterns can have look-ahead assertions
            and negations.
Correct:    No.
Why:        The `(?=foo)' and `(?!foo)' constructs no more violate
            whether the language is regular than do `^' and `$',
            which are also zero-width statements.  

========================================================================

Question:   What happens to objects lost in "unreachable" memory,
            such as the object returned by Ob->new() in
            `{ my $ap; $ap = [ Ob->new(), \$ap ]; }' ?
Type:       Objects
Difficulty: 4/7 (Medium)

Answer:     Their destructors are called when that interpreter thread
            shuts down.
Correct:    Yes.
Why:        When the interpreter exits, it first does an exhaustive
            search looking for anything that it allocated.  This allows
            Perl to be used in embedded and multithreaded applications
            safely, and furthermore guarantees correctness of object
            code.

Answer:     Their destructors are called when the memory becomes unreachable.
Correct:    No.
Why:        Under the current implementation, the reference-counted
            garbage collection system will not notice that the object
            in $ap's array cannot be reached, because the array reference
            itself never has its reference count go to zero.

Answer:     Their destructors are never called.
Correct:    No.
Why:        That would be very bad, because then you could have objects
            whose class-specific cleanup code didn't get called ever.

Answer:     Perl doesn't support destructors.
Correct:    No.
Why:        A class's DESTROY function, or that of its base classes,
            is called for any cleanup.  It is not expected to deallocate
            memory, however.

========================================================================

Question:   How do you give functions private variables that
            retain their values between calls?
Type:       Subroutines, Scoping
Difficulty: 5/7 (Medium)

Answer:     Perl doesn't support that.
Correct:    No.
Why:        It would be difficult to keep private state in a
            function otherwise.

Answer:     Include them as extra parameters in the prototype list,
            but don't pass anything in at that slot.
Correct:    No.
Why:        Perl is not the Korn shell, nor anything like that.
            If you tried this, your program probably wouldn't
            even compile.

Answer:     Use localized globals.
Correct:    No.
Why:        The local() operator merely saves the old value of a global
            variable, restoring that value when the block in which the
            local occurred exits.  Once the subroutine exits, the
            temporary value is lost.  Before then, other functions
            can access the temporary value of that global variable.

Answer:     Create a scope surrounding that sub that contains lexicals.
Correct:    Yes.
Why:        Only lexical variables are truly private, and they will
            persist even when their block exits if something still
            cares about them.  Thus:
                { my $i = 0; sub next_i { $i++ } sub last_i { --$i } }
            creates two functions that share a
...
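The last, truncated quiz answer gives a sub private, persistent state by wrapping it in a block that declares a lexical. Below is that one-liner as a runnable sketch; the sample calls and printed values are added for illustration and are not part of the quiz.

```perl
use strict;
use warnings;

# $i is visible only inside this block, but both subs keep referring to it
# after the block has run, so it persists between calls.
{
    my $i = 0;
    sub next_i { $i++ }
    sub last_i { --$i }
}

print next_i(), next_i(), next_i(), "\n";   # prints 012
print last_i(), "\n";                       # prints 2
```

The block has to execute before the subs are first called, otherwise `$i` is still undefined when they run; placing it near the top of the file, as here, takes care of that.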




Tue, 23 Jan 2001 03:00:00 GMT  
 [courtesy cc of this posting sent to cited author via email]

In comp.lang.perl.misc,

:/s has only one effect -- changing how . is interpreted.  

You forgot $*.  Writing /^From:?/s is perfectly sensible.

       m   Treat string as multiple lines.  That is, change "^" and "$"
           from matching at only the very start or end of the string to the
           start or end of any line anywhere within the string,

       s   Treat string as single line.  That is, change "." to match
           any character whatsoever, even a newline, which it normally would
           not match.

       The /s and /m modifiers both override the $* setting.  That is,
       no matter what $* contains, /s (without /m) will force "^" to
       match only at the beginning of the string and "$" to match only
       at the end (or just before a newline at the end) of the string.
       Together, as /ms, they let the "." match any character whatsoever,
       while yet allowing "^" and "$" to match, respectively, just after
       and just before newlines within the string.
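The /ms combination in that last paragraph is easiest to see on text where both behaviours are needed at once. A sketch on a made-up, mailbox-like string (not data from the thread): the pattern needs /m so ^ and $ can anchor at each header line, and /s so .*? can run on to the next line.

```perl
use strict;
use warnings;

my $mbox = "From: alice\nSubject: hi\n\nbody text\nFrom: bob\nSubject: re\n";

# Each match starts at a line beginning "From:" and runs lazily, across
# newlines, to the end of the next "Subject:" line.
my @both   = $mbox =~ /(^From:.*?^Subject:[^\n]*$)/msg;
my @s_only = $mbox =~ /(^From:.*?^Subject:[^\n]*$)/sg;   # ^ only at string start
my @m_only = $mbox =~ /(^From:.*?^Subject:[^\n]*$)/mg;   # "." cannot reach the next line

printf "/ms: %d, /s: %d, /m: %d\n", scalar @both, scalar @s_only, scalar @m_only;
print "[$_]\n" for @both;
```

Only the /ms version finds the two header pairs; with either flag alone the pattern never matches.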

--tom
--
    : The ksh scripts do not have a problem with it.
    That's because ksh doesn't much mind opening up security holes.  The
    absence of taint checks is not exactly a feature.



Tue, 23 Jan 2001 03:00:00 GMT  
: In comp.lang.perl.misc,

: :/s has only one effect -- changing how . is interpreted.  
:
: You forgot $*.  Writing /^From:?/s is perfectly sensible.

To quote another post of yours today:

  Sometimes clarity of explanation must overrule "oh by the ways".  
  I thought about this point, and decided to avoid its mention, as it
  didn't appear particularly practical for this poster's purposes.

I couldn't have put it better myself.  Since $* is deprecated, and
omitting its mention significantly reduces the complexity of explaining /m
and /s, I felt justified in choosing to ignore it.




Tue, 23 Jan 2001 03:00:00 GMT  

: :Friedl explains them in glorious detail and emphasizes the "/m affects
: :*only* ^$, /s affects *only* ." tack which I took in my previous message.
: :However, all the same semantics are there in perlre.
:
: But that's false: /s affects ^ and $ as well, insulating them from
: unpleasant surprises out of $* settings.

As per my other response on this thread, $* is deprecated, not mentioning
it makes /s and /m much easier to explain, and I took the "oh by the way"
exemption per your message on another thread.

[Fragment of Perl quiz snipped -- what was this doing here?]




Tue, 23 Jan 2001 03:00:00 GMT  
Thanks to Craig and the rest who assisted. Replacing /m with /s
worked.
    Perl is pretty darn nifty.

    Matt


> /m has only one effect -- changing how ^ and $ are interpreted.  Without
> /m, they match only at the beginning and end of the entire string; with
> it, they can also match at internal newlines.  Your regex has no ^ or $,
> so /m is irrelevant.

> /s has only one effect -- changing how . is interpreted.  Without /s, .
> matches any character except newline.  With it, it matches any character
> *including* newline.  You want your .* above to span newlines, so /s is
> what you need.

> Hope this helps...




Tue, 23 Jan 2001 03:00:00 GMT  
 

 Relevant Pages 

1. Pattern bug matching whitespace in multi-line match?

2. regex to match a multi line pattern

3. Multi line pattern matches?

4. Multi-line pattern matching?

5. pattern matching in multi-line strings fails under perl4.034

6. regex to match a multi line pattern

7. Multi-line pattern matching

8. Multi line pattern match

9. Multiline pattern matching with command line invocation

10. Multi-line pattern matching in 5.001n

11. multiline, multi pattern match

12. bug in anchored, multiline pattern match

 

 