Using $`, $&, and $' without regex overhead

    I have an idea for enabling use of the $` ($PREMATCH), $& ($MATCH), and
$' ($POSTMATCH) match variables without incurring the extra overhead on
every regex if they're only used infrequently.

The perlre manpage says:

    "Once perl sees that you need one of $&, $` or $' anywhere in the
program, it has to provide them on each and every pattern match. This can
slow your program down."

    For this reason, I usually avoid these match variables in my scripts.
However, perlre also says, "some algorithms really appreciate them."  I've
thought that it would be good if there were a way to enable these
variables only if a user specifically "requests" that they be
filled, to avoid the overhead on every regex.  Here's an idea for
accomplishing this:

(1) Create a new pragma to enable automatic generation of the $`, $&, and $'
variables in every regex.  Perhaps it could be called "use regex_match",
although the name is not important.  I'll use this name in this post for
discussion purposes.

(2) If a user specifically enables the "use regex_match" pragma, the three
match variables would be enabled for every regex.

(3) If the user says "no regex_match", the match variables would only be
enabled if the user specifically "requests" them (more later).

(4) If neither "use regex_match" nor "no regex_match" is present, the
decision on whether to enable the match variables would follow the rule
currently in Perl (i.e., if any of these match variables appears in the
script, the match variables must be generated for each and every regex).

(5) If the user says "no regex_match", the match variables could be
specifically enabled for a particular regex by use of a new option on the
m// or s/// operators (I'll call the option /r, just to make up a name).
Thus, the user could write:

    no regex_match;
    "Jack and Jill" =~ m/(and)/r;
    print "pre-match = $`\n";   # prints "Jack "
    print "match = $&\n";       # prints "and"
    print "post-match = $'\n";  # prints " Jill"

yet subsequent regexes would not be subject to the overhead of generating
$`, $&, and $'.
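
For comparison, later perls offer something close to the per-regex opt-in sketched above. The following is a minimal sketch, assuming Perl 5.10 or later, where the /p modifier and the ${^PREMATCH}, ${^MATCH}, and ${^POSTMATCH} variables exist; it is not the "regex_match" pragma or the /r option proposed here, which remain hypothetical.

```perl
use strict;
use warnings;

my $text = "Jack and Jill";
my ($pre, $match, $post);

# /p asks perl to keep the matched string around for this regex only,
# so the three pieces can be read without mentioning $`, $& or $'.
if ($text =~ /and/p) {
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}

print "pre-match  = '$pre'\n";    # 'Jack '
print "match      = '$match'\n";  # 'and'
print "post-match = '$post'\n";   # ' Jill'
```

Only the regex carrying /p is asked to preserve the match data, which is the same trade-off the proposal is after.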

Perhaps there are reasons why this approach would not be workable (or
useful), but I thought I'd toss this out for discussion.  Any comments?

Brett



Tue, 24 Jul 2001 03:00:00 GMT  

Quote:

>    I have an idea for enabling use of the $` ($PREMATCH), $&
>    ($MATCH), and $' ($POSTMATCH) match variables without incurring
>    the extra overhead on every regex if they're only used
>    infrequently.
>    [SNIP]

Excellent suggestion, but more complex than it needs to be, I think.

How about this instead:

        Retain the current behaviour (i.e. set $`, $&, and $'
        everywhere if they're used anywhere) as the default.

        However, if any regexes in the source use the new modifier
        (and personally I favour /v, for "variables"), then only
        those regexes which actually use the modifier will set
        $`, $&, and $'.

In other words, if the compiler sees a $`, $&, or $', then it looks to
see if any regexes are marked as the source of these values (i.e. if
any have a /v modifier). If so, only those regexes perform the extra
work of setting up those values. If no regex is explicitly identified
as the culprit, then all regexes are suspect and the extra work must be
done everywhere.
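
Short of a new modifier, the same three values can be recovered for one particular match without naming $`, $&, or $' at all. This is a minimal sketch, assuming Perl 5.6 or later, where the @- and @+ arrays hold the start and end offsets of the last successful match; the variable names are only illustrative.

```perl
use strict;
use warnings;

my $text = "Jack and Jill";
my ($pre, $match, $post);

if ($text =~ /and/) {
    # $-[0] is the offset where the whole match starts,
    # $+[0] is the offset just past its end.
    $pre   = substr($text, 0, $-[0]);
    $match = substr($text, $-[0], $+[0] - $-[0]);
    $post  = substr($text, $+[0]);
}

print "pre-match  = '$pre'\n";    # 'Jack '
print "match      = '$match'\n";  # 'and'
print "post-match = '$post'\n";   # ' Jill'
```

Because the work is done by substr on demand, regexes elsewhere in the program are not affected.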

Damian



Tue, 24 Jul 2001 03:00:00 GMT  

Quote:

>         Retain the current behaviour (i.e. set $`, $&, and $'
>         everywhere if they're used anywhere) as the default.

Great, this would mean no change whatsoever to existing scripts.

Quote:

>         However, if any regexes in the source use the new modifier
>         (and personally I favour /v, for "variables"), then only
>         those regexes which actually use the modifier will set
>         $`, $&, and $'.

Here's some work to be done by the parser and/or optimizer.

So, one more vote in favour of the above suggestion: mine.



Tue, 24 Jul 2001 03:00:00 GMT  

Quote:

>    I have an idea for enabling use of the $` ($PREMATCH), $& ($MATCH), and
>$' ($POSTMATCH) match variables without incurring the extra overhead on
>every regex if they're only used infrequently.

I've had pretty much the same ideas lately. In short, it would involve:

 - a new pragmatic module, which would explicitly enable/disable
assignment to those variables
 - default behaviour (pragma not mentioned) same as before

It looks like the latter automatically sets a bit in the compiler. The
pragma could explicitly set/reset that bit, but also set another (until
now, unused) bit to indicate that the module is mentioned, and that the
default bit is overridden.

The pragma would thus serve as a manual override of the automatic
behaviour.

This could work, except:

 a) What if a module uses this pragma, which would activate the
override, and the main program doesn't mention it, so that would need
automatic activation?
 b) It doesn't work.

This is disturbing. By doing some experiments, I found that the
import/unimport  behaviour is NOT as mentioned in the docs in the
Integer module. A bug? It looks like it. A serious one.

Here's what I came up with. Create a module file "Mark.pm", with this
code:

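# file Mark.pm
package Mark;

$used++;                    # counts how often the file is loaded

sub import {                # called at compile time by "use Mark"
        $imported = "yes";
}

sub unimport {              # called at compile time by "no Mark"
        $imported = "no";
}

1;
__END__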

This is a test script. I saved it as "mark.t", but the name isn't
relevant. The printed results follow after "__END__".

#! perl -w
# testfile mark.t

no Mark;

BEGIN {
    print "Pre: Mark used: $Mark::used, imported: $Mark::imported\n";
}

sub used {
    use Mark;
    BEGIN {
      print "used: Mark used: $Mark::used, imported: $Mark::imported\n";
    }
    1;
}

sub default {
  BEGIN {
   print "default: Mark used: $Mark::used, imported: $Mark::imported\n";
  }
  0;
}

BEGIN {
    print "Post: Mark used: $Mark::used, imported: $Mark::imported\n";
}

__END__
Pre: Mark used: 1, imported: no
used: Mark used: 1, imported: yes
default: Mark used: 1, imported: yes
Post: Mark used: 1, imported: yes

Integer.pm uses exactly the same mechanism, as does strict.pm. So the
results for those modules must be identical to those here. From the docs
for Integer.pm:

Quote:
> This tells the compiler that it's okay to use integer operations
> from here to the end of the enclosing BLOCK.

So, the last 2 lines SHOULD have said: "imported: no", because that was
the situation in the outer scope. Or am I missing something?

Oh, btw, this is the DJGPP port v. 5.004_02.

        Bart.



Tue, 24 Jul 2001 03:00:00 GMT  

Quote:
>This is disturbing. By doing some experiments, I found that the
>import/unimport  behaviour is NOT as mentioned in the docs in the
>Integer module. A bug? It looks like it. A serious one.

Step down from red to yellow alert. It *does* work for strict.pm and
integer.pm, but in a way that is not too obvious.

I've played a little "what if..." game: what if $^H (compiler
preferences stored in bits of this variable) was saved on entering the
block containing "use", and restored on exit? A lot like "local", but at
compile time. I tested it, and *bingo*. It seems to be working. See the
code below.

The intuitive thing, for me, would have been that if a "use" is done in
a block, that an unimport() for this module would happen at the end of
the block. That doesn't happen. So even if you put "use" in a block,
most results will remain after the block scope is left. Only any effects
on $^H are undone. Dirty.

Not all future/experimental pragmas can depend on $^H alone: we'd soon
run out of bits, and everything would need official approval from the
Perl top. You can't use the same bit for more than one application.

# file Mark.pm
package Mark;

$used++;

sub import {
        $^H |= 1;  # just like integer.pm
        $imported = "yes";
}

sub unimport {
        $^H &= ~1;  # like integer.pm
        $imported = "no";
}

1;
__END__

#! perl -w
# testfile mark.t

no Mark;
print "Runtime: imported: $Mark::imported; Bitflags: $^H\n";

BEGIN {
   print "Pre: used: $Mark::used, imported: $Mark::imported; Bitflags: $^H\n";
}

sub used {
   use Mark;
   BEGIN {
      print "used: imported: $Mark::imported; Bitflags: $^H\n";
   }
   1;
}

BEGIN {
   print "Post: imported: $Mark::imported; Bitflags: $^H\n";
}

__END__
Pre: used: 1, imported: no; Bitflags: 0
used: imported: yes; Bitflags: 1
Post: imported: yes; Bitflags: 0
Runtime: imported: yes; Bitflags: 0

Anyway, if $^H were used to control use of the match variables, this
problem would not occur. BTW how many bits are still free?

And the other problem I mentioned, i.e. if this module is used from
within a module, but not in the main script, well, that can be
circumvented by putting "use" or "no" for this module in an enclosing
block. That way, $^H would be restored at the time the main program gets
compiled. And that's nice.

        Bart.



Tue, 24 Jul 2001 03:00:00 GMT  

Quote:

> [...pondering of a new pragma...]
> This is disturbing. By doing some experiments, I found that the
> import/unimport  behaviour is NOT as mentioned in the docs in the
> Integer module. A bug? It looks like it. A serious one.

A good percentage of the bugs in Perl aren't.  ;-)

You appear to be misunderstanding the documentation for the integer
pragma.

Quote:
> Here's what I came up with. Create a module file "Mark.pm", with this
> code:

> # file mark.pm
> package Mark;

> $used++;

> sub import {
>   $imported = "yes";
> }

> sub unimport {
>   $imported = "no";
> }

> 1;
> __END__
> Pre: Mark used: 1, imported: no
> used: Mark used: 1, imported: yes
> default: Mark used: 1, imported: yes
> Post: Mark used: 1, imported: yes
> Integer.pm uses exactly the same mechanism, as does strict.pm.

No, they don't.  The import and unimport methods in integer.pm and
strict.pm (all lowercase, because they're pragmas) modify the special
variable $^H.  Your module modifies a generic variable.

The description of $^H in perlvar (5.005_02) offers a small hint:

  $^H

    The current set of syntax checks enabled by use strict and other
    block scoped compiler hints.  See the documentation of strict for
    more details.

Note, "block scoped compiler hints".  Presumably, modifications to $^H
are localized to the block in which they occur, whereas your
modifications to $Mark::imported are global.

Quote:
> So the
> results for those modules must be identical as those here. From the docs
> from Integer.pm:

> > This tells the compiler that it's okay to use integer operations
> > from here to the end of the enclosing BLOCK.

> So, the last 2 lines SHOULD have said: "imported: no", because that was
> the situation in the outer scope. Or am I missing something?

That quote does not describe *how* the compiler is told that it's okay
to use integer operations to the end of the enclosing block.  It so
happens that this is done by modifying a localized special variable.
Modifying a non-localized generic variable will not work, as you
discovered.
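
The block scoping described above is easy to observe from the outside, without looking at $^H at all. A minimal sketch, using only the standard integer pragma; the variable names are illustrative:

```perl
use strict;
use warnings;

my $before = 10 / 3;        # ordinary floating-point division

my $inside;
{
    use integer;            # hint applies from here to the end of this block
    $inside = 10 / 3;       # integer division
}

my $after = 10 / 3;         # hint is gone again once the block is closed

printf "before=%.3f inside=%d after=%.3f\n", $before, $inside, $after;
# before=3.333 inside=3 after=3.333
```

The division after the block is compiled without the hint even though nothing ever called unimport, which is the localized behaviour that a plain package variable cannot reproduce.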

So, back to the thread topic...  The suggested new pragma *will* work.
All it needs is the use of a bit or two in $^H.

--


    /                                  http://www.ziplink.net/~rjk/
        "It's funny 'cause it's true ... and vice versa."



Tue, 24 Jul 2001 03:00:00 GMT  
[A complimentary Cc of this posting was sent to Bart Lateur]


Quote:
> Not all future/experimental pragma's can depend on $^H alone: we'd soon
> run out of bits, and everything would need official approval from the
> Perl top. You can't use the same bit for more than one application.

You do not need to.  You use %^H instead (it does not propagate to
eval()s yet, but probably it will soon).
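
To make the %^H idea concrete, here is a minimal sketch of a user-defined, lexically scoped pragma. It assumes Perl 5.10 or later, where the compile-time contents of %^H can be read back at run time through element 10 of caller(). The package name MatchVarsDemo and the key 'MatchVarsDemo/on' are made up for illustration, and the package is defined inline so the example fits in one file.

```perl
use strict;
use warnings;

BEGIN {
    package MatchVarsDemo;                   # hypothetical pragma name
    $INC{'MatchVarsDemo.pm'} = __FILE__;     # let "use"/"no" work without a file

    # Each pragma picks its own key in %^H, so no shared bit is needed.
    sub import   { $^H{'MatchVarsDemo/on'} = 1 }
    sub unimport { $^H{'MatchVarsDemo/on'} = 0 }

    # Report whether the pragma is in effect in the caller's lexical scope.
    sub in_effect {
        my $hints = (caller(0))[10];
        return $hints && $hints->{'MatchVarsDemo/on'} ? 1 : 0;
    }
}

my @results;
push @results, MatchVarsDemo::in_effect();          # outside any "use"
{
    use MatchVarsDemo;
    push @results, MatchVarsDemo::in_effect();      # switched on
    {
        no MatchVarsDemo;
        push @results, MatchVarsDemo::in_effect();  # switched off in inner block
    }
    push @results, MatchVarsDemo::in_effect();      # outer setting restored
}
push @results, MatchVarsDemo::in_effect();          # back to the default

print "@results\n";   # 0 1 0 1 0
```

The settings are restored at each closing brace by perl itself, just as with the bits in $^H.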

Ilya



Wed, 25 Jul 2001 03:00:00 GMT  

Quote:
>  b) It doesn't work.

> This is disturbing. By doing some experiments, I found that the
> import/unimport  behaviour is NOT as mentioned in the docs in the
> Integer module.

No, it's because your imports and unimports are all happening at
compile time, all at once.  But they don't affect the way the code is
actually compiled, and by the time it runs, it's all over.  That makes
`unimport' useless for regular users.
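
The timing can be shown with a small sketch. The module name NoteDemo is hypothetical and defined inline; it only records when its import and unimport methods run.

```perl
use strict;
use warnings;

our @events;

BEGIN {
    package NoteDemo;                        # hypothetical module, defined inline
    $INC{'NoteDemo.pm'} = __FILE__;          # let "use"/"no" work without a file
    sub import   { push @main::events, 'import' }
    sub unimport { push @main::events, 'unimport' }
}

push @events, 'run time starts';

sub never_called {
    use NoteDemo;       # import() fires while this line is being compiled
    return;
}

no NoteDemo;            # unimport() fires at compile time as well

push @events, 'run time ends';

print join(', ', @events), "\n";
# import, unimport, run time starts, run time ends
```

Both calls are finished before the first run-time statement executes, and the import inside never_called happens even though the sub is never invoked.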

I've complained about this in the past.   Back in November I posted an
article about `lexically scoped use' with a suggestion on how
user-defined pragmas might be enabled, although I think that the
proposal there wouldn't actually be applicable to the thing we're
discussing now---this needs to be more magical than what I was
proposing.

If you want to see the article again, it is at
        http://www.plover.com/~mjd/perl/lexuse.txt



Wed, 25 Jul 2001 03:00:00 GMT  

Quote:
> Um, I believe that that would be absolutely terrible.

No, great.

Quote:
> Let's say that part of my script says:

> $foo =~ /(and)/;
> $bar = $`;

> I then upgrade to your version.  It works.  However, I add to a
> completely different part of the script:

> $bang =~ /something/v;

The moment you start using m//v you should be aware of this; you are doing
it on purpose, aren't you? You could then scan the rest of your script for
$[`&'] and add /v where needed.

Needless to say, the parser/optimizer should turn this option on or off
per file, because a require of some other file that has not yet been
updated could otherwise defeat the original intent :)

Quote:
> My use of $` in the first fragment instantly becomes incorrect.
> Changing one part of the program shouldn't affect the other.

Yes it should, as long as it is in the same file (see above).


Sun, 05 Aug 2001 03:00:00 GMT  
 