Efficiency of some regexp patterns 
Author Message
 Efficiency of some regexp patterns



Quote:
>I don't know how class negation and the "?" modififier work behind
>the scenes. Is it quicker to match against this kind of pattern

>    <foo[^>]+>

>or against this version of almost the same thing

>    <foo.+?>

It depends a lot on what version of Perl you are using.  

In most versions, the first one is faster, because it backtracks only
once; the second regex backtracks at each character position until the
..+? finally matches enough characters to reach the ">".

In 5.6.1, I think .*? has an optimization so that it doesn't do this;
so it may be faster.

Careful benchmarking is the only really good answer.

Quote:
>Isn't [^c] tested somehow as $current_char ne "c"?  

It is.

Quote:
>Why does a broader character set matter?

I don't see how [^xyz] could be substantially slower than [xyz].
"[^c]" might be slower than plain "c", because it would invoke the
character class mechanism, whereas "c" can use special case logic that
looks ahead a fixed number of bytes.  In principle, "[c]" (a
single-character class) could be just as fast as "c" but I don't think
anyone has ever bothered to put in that optimization.

Quote:
>PS: These questions were posted to comp.lang.perl.moderated some days
>    ago, but there was not too feedback there.

rd
($p{$_})&6];$p{$_}=/ ^$P/ix?$P:close$_}keys%p}p;p;p;p;p;map{$p{$_}=~/^[P.]/&&
close$_}%p;wait until$?;map{/^r/&&<$_>}%p;$_=$d[$q];sleep rand(2)if/\S/;print


Mon, 14 Jun 2004 09:44:31 GMT  
 Efficiency of some regexp patterns

Quote:
> In principle, "[c]" (a single-character class) could be
> just as fast as "c" but I don't think anyone has ever
> bothered to put in that optimization.

I occasionally use "[ ]" to emphasize that I'm definately
looking for a space, but not any other whitespace
characters.  I know it's not necessary to put in the
brackets, but whitspace is so often the "extra" that I like
to hilight it when it's the "focus".

And in my papplications, it's not a big deal to me that I've
(possibly) wasted some cycles to do so.

--
Michael R. Wolf
    All mammals learn by playing!



Tue, 15 Jun 2004 00:03:13 GMT  
 Efficiency of some regexp patterns

Quote:

> > In principle, "[c]" (a single-character class) could be
> > just as fast as "c" but I don't think anyone has ever
> > bothered to put in that optimization.

> I occasionally use "[ ]" to emphasize that I'm definately
> looking for a space, but not any other whitespace
> characters.  I know it's not necessary to put in the
> brackets, but whitspace is so often the "extra" that I like
> to hilight it when it's the "focus".

Look into the /x modifier.  Among other things, it lets the parser
ignore unquoted white space.  Turned the other way, you must quote
white space to make it count, so it will stand out.  A one-character
character class [x] just looks wrong.

Anno



Tue, 15 Jun 2004 20:03:44 GMT  
 Efficiency of some regexp patterns

Quote:

> Look into the /x modifier.  Among other things, it lets
> the parser ignore unquoted white space.  Turned the other
> way, you must quote white space to make it count, so it
> will stand out.  A one-character character class [x] just
> looks wrong.

And "\ " looks better to you?  Either way, it's a way to
draw attention to the space.  It feels like a preference,
not a right/wrong thing.  What's "wrong" about a class of
one?  Why is a class with 2, 3, 4, or more members OK, but 1
is not?

--
Michael R. Wolf
    All mammals learn by playing!



Thu, 17 Jun 2004 11:07:43 GMT  
 Efficiency of some regexp patterns

Quote:
> not a right/wrong thing.  What's "wrong" about a class of
> one?  Why is a class with 2, 3, 4, or more members OK, but 1
> is not?

Because a class is a means to hedge your bets.  Why hedge when you know
exactly what you're looking for?  Make the space stand out as '\040' to
enhance readability.

jimbo
;-)



Sun, 18 Jul 2004 23:49:54 GMT  
 Efficiency of some regexp patterns

Quote:


> > not a right/wrong thing.  What's "wrong" about a class of
> > one?  Why is a class with 2, 3, 4, or more members OK, but 1
> > is not?

> Because a class is a means to hedge your bets.  Why hedge when you know
> exactly what you're looking for?  Make the space stand out as '\040' to
> enhance readability.

I see your point when you frame it as a hedge.  But putting
brackets around it was my attempt to make it stand out.  I
don't always remeber that octal 40 is a space, and I know
that some of my readers don't even know that on good days.
Of course, there's no real way to assure folks that the
contents of the brackets is a space and not a lucky tab.
Your suggestion is as bad as mine when we stack up
undesirable traits :-) And I admit that I liked your
suggestion when I read it.  But remember I wrote mine a long
time ago, and I just read yours today.  Style and preference
-- they're a phase-of-the moon thing.

Back to the original "efficiency" issue -- I think that a
single character match is constant time with either a
self-match or a character class.

--
Michael R. Wolf
    All mammals learn by playing!



Wed, 21 Jul 2004 15:04:43 GMT  
 Efficiency of some regexp patterns

Quote:
> time ago, and I just read yours today.  Style and preference
> -- they're a phase-of-the moon thing.

Yeah, I sorta agree.  Except when I don't. :)

Quote:
> Back to the original "efficiency" issue -- I think that a
> single character match is constant time with either a
> self-match or a character class.

I'm sure we're talking trivial here.  There must be some extra overhead
involved parsing the character class as opposed to a single character, but,
truth be told, for the volumes of data I process per execution, I don't
really care.  You probably feel the same way.

jimbo
;-)



Thu, 22 Jul 2004 00:12:53 GMT  
 Efficiency of some regexp patterns

[...]

Quote:
> I'm sure we're talking trivial here.  There must be some
> extra overhead involved parsing the character class as
> opposed to a single character, but, truth be told, for the
> volumes of data I process per execution, I don't really
> care.  You probably feel the same way.

Yup!  

I always try to optomize the inconsistant, slow, lazy,
expensive, error-prone part of the programming cycle.  And
that of course would be ME.

I'm not terribly interested in optomizing the meticulous,
fast, diligent, cheap, accurate part of the programming
cycle.  That of course would be the computer.

My billing rate always dominates the run-time computer cost
for the projects I do.

--
Michael R. Wolf
    All mammals learn by playing!



Fri, 30 Jul 2004 12:55:23 GMT  
 
 [ 8 post ] 

 Relevant Pages 

1. Efficiency of some regexp patterns

2. Efficiency of some regexp patterns

3. Efficiency of some regexp patterns

4. Oracle + Delphi5: Transactions and Locking

5. Variant does not reference an automation object

6. ReadOnly

7. Need help

8. adding a new field to existing table

9. Pattern Matching Efficiency

10. Q: beginner,patterns,efficiency

11. regexp pattern matching

12. Matched pattern in regexp?

 

 
Powered by phpBB® Forum Software