bug in anchored, multiline pattern match 
Author Message
 bug in anchored, multiline pattern match

I think this bug has already been exposed, but what the hey.  When
processing an anchored, multiline pattern match, perl gets a little
confused if there's a blank line in the search string.  The following
fragment finds a bug instead of a match:

$* = 1;
$block = <<\EOF;
abcdefg

blank line kills the match!
EOF
print $block =~ /^b/ . "\n";

The loop that scans for newlines gets a little e{*filter*}d and increments
the pointer twice when it hits a newline.  Satisfyingly easy to fix:

*** regexec.c   Sun Jun 10 03:15:37 1990
--- regexec.c   Sun Jun 10 03:20:53 1990
***************
*** 208,216 ****
                        /* for multiline we only have to try after newlines */
                        if (s > string)
                            s--;
!                       for (; s < strend; s++) {
!                           if (*s == '\n') {
!                               if (++s < strend && regtry(prog, s))
                                    goto got_it;
                            }
                        }
--- 208,216 ----
                        /* for multiline we only have to try after newlines */
                        if (s > string)
                            s--;
!                       while (s < strend) {
!                           if (*s++ == '\n') {
!                               if (s < strend && regtry(prog, s))
                                    goto got_it;
                            }
                        }



Thu, 26 Nov 1992 18:47:56 GMT  
 bug in anchored, multiline pattern match
: I think this bug has already been exposed, but what the hey.  When
: processing an anchored, multiline pattern match, perl gets a little
: confused if there's a blank line in the search string.

Actually, this is the first I heard of this one, that I can recall.  Thanks.
It'll be fixed in the next patch.

While we're on the subject of the next patch, I've got things arranged now
that you can link in C libraries to make special versions of Perl.  You do
this by writing a glue routine that translates Perl subroutine calls
to C subroutine calls.  You can easily link in multiple independent
glue routines.  On some machines it might be possible for one of the glue
routines to be a fasl'er, though I haven't written one yet.  I do have
a glue file for BSD curses already.  X windows, anyone?

It's easy to write a glue routine because it's preprocessed from definitions
like:

    CASE int flushok
    I       WINDOW*         win
    I       bool            boolf
    END

In fact, I've even got a script that scans manual pages and spits out
definitions like that above based on the synopsis, presuming it is
reasonably well formed.

Included below is a VERY rudimentary pager using the curses version of
perl.  The help text fibs--it's just ripped out of less to get more than
one page of help text, since I wanted to test recursive calls of the
pager.  Of note is that the pager can call itself recursively just by saying

    local(*lines) = *helplines;
    local(line);
    &pagearray;

Note also the use of an eval to trap any fatal errors and call &endwin
to clean up.

What you are seeing here is the beginnings of a prototype for a new version
of rn.

Larry

#!./curseperl

eval <<'EndOfMain';   $evaloffset = 3;    # line number of this line

    $| = 1;             # command buffering on stdout
    &initterm;
    &inithelp;
    &slurpfile && &pagearray;

EndOfMain

&endwin;


    print "";         # force flush of stdout


Quote:
}

exit;

################################################################################

sub initterm {

    &initscr; &cbreak; &noecho; &scrollok($stdscr, 1);
    &defbell unless defined &bell;

    $lines = $LINES; $lines1 = $lines - 1; $lines2 = $lines - 2;
    $cols = $COLS;   $cols1  = $cols  - 1; $cols2  = $cols  - 2;;

    $dl = &getcap('dl');
    $al = &getcap('al');
    $ho = &getcap('ho');
    $ce = &getcap('ce');

Quote:
}

sub slurpfile {
    while (<>) {
        s/^(\t+)/'        ' x length($1)/e;
        &expand($_) if /\t/;
        if (length($_) < $cols) {

        }
        else {
            while ($_ && $_ ne "\n") {

                substr($_,0,$cols) = '';
            }
        }
    }
    1;

Quote:
}

sub drawscreen {
    &move(0,0);
    for ($line .. $line + $lines2) {
        &addstr($lines[$_]);
    }
    &clrtobot;
    &percent;
    &refresh;

Quote:
}

sub expand {
    while (($off = index($_[0],"\t")) >= 0) {
        substr($_[0], $off, 1) = ' ' x (8 - $off % 8);
    }

Quote:
}

sub pagearray {
    $line = 0;

    $| = 1;

    for (&drawscreen;;&drawscreen) {

        $ch = &getch;
        $ch = "j" if $ch eq "\n";

        if ($ch eq ' ') {
            last if $percent >= 100;
            &move(0,0);
            $line += $lines1;
        }
        elsif ($ch eq 'b') {
            $line -= $lines1;
            &move(0,0);
            $line = 0 if $line < 0;
        }
        elsif ($ch eq "j") {
            $line += 1;
            if ($dl) {
                print $ho, $dl;
                &mvcur(0,0,$lines2,0);
                print $ce,$lines[$line+$lines2],$ce;
                &wmove($curscr,0,0);
                &wdeleteln($curscr);
                &wmove($curscr,$lines2,0);
                &waddstr($curscr,$lines[$line+$lines2]);
            }
            &wmove($stdscr,0,0);
            &wdeleteln($stdscr);
            &wmove($stdscr,$lines2,0);
            &waddstr($stdscr,$lines[$line+$lines2]);
            &percent;
            &refresh;
            redo;
        }
        elsif ($ch eq "k") {
            next if $line <= 0;
            $line -= 1;
            if ($al) {
                print $ho, $al, $ce, $lines[$line];
                &wmove($curscr,0,0);
                &winsertln($curscr);
                &waddstr($curscr,$lines[$line]);
            }
            &wmove($stdscr,0,0);
            &winsertln($stdscr);
            &waddstr($stdscr,$lines[$line]);
            &percent;
            &refresh;
            redo;
        }
        elsif ($ch eq "\f") {
            &clear;
        }
        elsif ($ch eq "q") {
            last;
        }
        elsif ($ch eq "h") {
            &clear;
            &help;
            &clear;
        }
        else {
            &bell;
        }
    }

Quote:
}

sub defbell {
    eval q#
        sub bell {
            print "\007";
        }
    #;

Quote:
}

sub help {
    local(*lines) = *helplines;
    local($line);
    &pagearray;

Quote:
}

# This help message is borrowed from the "less" program.

sub inithelp {

      Commands marked with * may be preceeded by a number, N.

  h              Display this help.
  q              Exit.

  f, SPACE    *  Forward  N lines, default one screen.
  b           *  Backward N lines, default one screen.
  e, j, CR    *  Forward  N lines, default 1 line.
  y, k        *  Backward N lines, default 1 line.
  d           *  Forward  N lines, default 10 or last N to d or u command.
  u           *  Backward N lines, default 10 or last N to d or u command.
  r              Repaint screen.
  R              Repaint screen, discarding buffered input.

  /pattern    *  Search forward for N-th line containing the pattern.
  ?pattern    *  Search backward for N-th line containing the pattern.
  n           *  Repeat previous search (for N-th occurence).

  g           *  Go to line N, default 1.
  G           *  Like g, but default is last line in file.
  p, %        *  Position to N percent into the file.
  m<letter>      Mark the current position with <letter>.
  '<letter>      Return to a previously marked position.
  ''             Return to previous position.

  E [file]       Examine a new file.
  N           *  Examine the next file (from the command line).
  P           *  Examine the previous file (from the command line).
  =              Print current file name.
  V              Print version number of "less".

  -<flag>        Toggle a command line flag.
  +cmd           Execute the less cmd each time a new file is examined.

  !command       Passes the command to a shell to be executed.
  v              Edit the current file with $EDITOR.
EOT

        s/$/\n/;
    }

Quote:
}

sub percent {
    &standout;

      &move($lines1,0);
      &addstr("($percent%)");
    &standend;
    &clrtoeol;
Quote:
}



Sat, 28 Nov 1992 02:55:35 GMT  
 bug in anchored, multiline pattern match

Quote:

>Included below is a VERY rudimentary pager using the curses version of
>perl.                                                ^^^^^^^^^^^^^^

Is this using Henk P. Penning's cterm.pl stuff?  Or stuff we haven't seen yet?

Thankx.


Coastal Imaging Lab
OSU--Oceanography
Corvallis, OR  97331         503-737-3251



Sun, 29 Nov 1992 18:43:02 GMT  
 bug in anchored, multiline pattern match

: >
: >Included below is a VERY rudimentary pager using the curses version of
: >perl.                                                ^^^^^^^^^^^^^^
: >
:
: Is this using Henk P. Penning's cterm.pl stuff?  Or stuff we haven't seen yet?

Stuff you haven't seen yet.  I don't intend to provide curses in normal perl,
but after the next patch you'll be able to build a version of perl that
supports whatever your favorite libraries are.  It basically is a way for
you to make C subroutines and variables look like Perl subroutines and
variables.

Larry



Mon, 30 Nov 1992 05:35:35 GMT  
 bug in anchored, multiline pattern match


Quote:
>but after the next patch you'll be able to build a version of perl that
>supports whatever your favorite libraries are.  It basically is a way for
>you to make C subroutines and variables look like Perl subroutines and
>variables.

Can't you do some sort of dynamic loading so we don't get too many perl
executables?  What about stripped down perls and a perl compiler which
generates smaller stand-alone scripts?  Where is perl going, anyway?

-Mitch

"First there was GNU emacs; it had stuff like mail, news, ftp, csh, and
lisp built in, and filled 32 Gigabytes of Cray-2 memory per user.  Then
the GNU project expanded and gave us bison, ghostscript, bash, gcc,
gas, g++, gawk, pax, and a slew of other great stuff which forced us to
buy a few more Gigabytes of memory for the expanding cray-2 cluster.
GNU is rumored to come out with an operating system, but I suspect it
will require at least 4 Terabytes to boot the kernel...

Uzi (unix for z80) runs in 32K; v6 ran in 64K.  v7 kernels are under
80K for 68020s.



Mon, 30 Nov 1992 20:30:06 GMT  
 bug in anchored, multiline pattern match

Quote:

>It's easy to write a glue routine because it's preprocessed from definitions
>like:

>    CASE int flushok
>    I       WINDOW*         win
>    I       bool            boolf
>    END

>In fact, I've even got a script that scans manual pages and spits out
>definitions like that above based on the synopsis, presuming it is
>reasonably well formed.

This sounds wonderful.  We hacked our own interface for use with our
DBMS, but it will be nice not to have to worry whether the next patch
will break it.

From the synopsis above, though, it looks like the C functions won't be
able to return arrays.  We acheive this by returning values via a
function that adds the values one at a time to the return stack, and
can return numbers or strings depending on say the type of the database
column passed as an argument.  The C functions also know whether they
are in an array context or not.  Of course, these are custom functions,
which call the underlying DBMS routines.  If no support is given for
arrays, we can rewrite the functions in perl accessing the glued DBMS
routines.

Even without array returns it will be a boon.  Once we glue in our
screen-handling stuff, we can ditch C-programming altogether :-).
--

FGH Decision Support Systems Pty Ltd    | ..!uunet!fgh.fgh.oz.au!andrew



Mon, 30 Nov 1992 18:38:44 GMT  
 bug in anchored, multiline pattern match
: From the synopsis above, though, it looks like the C functions won't be
: able to return arrays.  We acheive this by returning values via a
: function that adds the values one at a time to the return stack, and
: can return numbers or strings depending on say the type of the database
: column passed as an argument.  The C functions also know whether they
: are in an array context or not.  Of course, these are custom functions,
: which call the underlying DBMS routines.  If no support is given for
: arrays, we can rewrite the functions in perl accessing the glued DBMS
: routines.

You can return arrays, but you just have to write that particular case of
the glue routine by hand.  I've already done one--it's not too difficult,
modulo the usual difficulties of making sure the stack is long enough.
The wantarray variable is also available.

Here's what a handwritten case to return an array value looks like:

    case US_dts_errlist:
        if (!wantarray) {
            str_numset(st[0], (double) dts_nerr);
            return sp;
        }
        astore(stack, sp + dts_nerr, Nullstr);          /* extend stack */
        st = stack->ary_array + sp;                  /* possibly realloced */
        for (i = 0; i < dts_nerr; i++) {
            tmps = dts_errlist[i];
            st[i] = str_2static(str_make(tmps,strlen(tmps)));
        }
        return sp + dts_nerr - 1;

Not too hard, as you see.

Larry



Tue, 01 Dec 1992 13:57:45 GMT  
 bug in anchored, multiline pattern match

Quote:
>Can't you do some sort of dynamic loading so we don't get too many perl
>executables?

If you want to put in a dynamic loader, Larry is providing a way for you
to do it.

Oh, you want HIM to do it?

TAANSTAFL, dude.
--



Tue, 01 Dec 1992 20:47:14 GMT  
 
 [ 11 post ] 

 Relevant Pages 

1. question regarding perl pattern match anchor

2. multiline, multi pattern match

3. pattern matching and multilines (sort of)

4. Multiline pattern matching with command line invocation

5. Pattern bug matching whitespace in multi-line match?

6. BUG in string pattern match

7. BUG: regexp pattern matching

8. Can I change the word boundary anchoring pattern?

9. HTML AnChoring Text Pattern

10. RE switch for fully-anchored matches?

11. Anchor matches at start/end of string?

12. Problem w/anchored match

 

 
Powered by phpBB® Forum Software