Efficiency of some regexp patterns 
Author Message
 Efficiency of some regexp patterns

I don't know how class negation and the "?" modififier work behind the
scenes. Is it quicker to match against this kind of pattern

    <foo[^>]+>

or against this version of almost the same thing

    <foo.+?>

? Why?

Why negation of classes in Unicode is supposed to be slow in general?
Isn't [^c] tested somehow as $current_char ne "c"?  Why does a broader
character set matter?

-- fxn



Sat, 05 Jun 2004 04:58:20 GMT  
 Efficiency of some regexp patterns


Quote:
>I don't know how class negation and the "?" modififier work behind the
>scenes. Is it quicker to match against this kind of pattern

>    <foo[^>]+>

>or against this version of almost the same thing

>    <foo.+?>

I have problems answering this question. Even by looking at it
carefully and reading the Camel Book page 63, it's hard to decide.
There's a comparison made for every characters until it matches a > in
both patterns, but the difference between minimal and greedy matching
makes it tough to decide. Will the engine try many matches in the
string with the first pattern? That would make the difference and I
would tend to say that the second pattern is faster, at least for
longer strings with many tags. Any Perl expert here to give us an
insight about greedy matching?

Quote:
>Why negation of classes in Unicode is supposed to be slow in general?
>Isn't [^c] tested somehow as $current_char ne "c"?  Why does a broader
>character set matter?

[^c] is the original character set minus "c". If our original set is
Unicode, our new set is big: the cardinality of the Unicode set - 1.


wrong here, but I think that the test will look more like:


        if ($current_char eq $char) { ... }

Quote:
}

Is the pattern matching algorithm optimized to look at the new set and
decide if it would be faster to use eq with the new set or ne with the
negation of the new set? I don't know. But I'm sure that if your
character set is composed of half the characters of the Unicode set,
then it's going to be slower.

Sbastien Nadeau
Technicien en informatique
Bibliothque de l'Universit Laval



Mon, 07 Jun 2004 05:54:43 GMT  
 
 [ 2 post ] 

 Relevant Pages 

1. Oracle + Delphi5: Transactions and Locking

2. Variant does not reference an automation object

3. ReadOnly

4. Need help

5. Efficiency of some regexp patterns

6. Efficiency of some regexp patterns

7. Efficiency of some regexp patterns

8. Pattern Matching Efficiency

9. Q: beginner,patterns,efficiency

10. adding a new field to existing table

11. Text Editor > 64K

12. QR2 Calculated field

 

 
Powered by phpBB® Forum Software