Many of you probably already know this -- but like me, you might have
temporarily "misplaced" the information.
I had a generalized syslog processing tool that I hacked up for
a special case, stripping out code unnecessary for that purpose,
hoisting tests up to a higher level to avoid expensive operations
such as 'split', and so on.
The special case tool then ran several times *slower* than the
original :(
I thought it was due to a difference in how I was handling I/O, but
testing showed that not to be the case, so I profiled with -d:SmallProf .
The expensive line turned out to be a pattern match of the form
/X.*Y.*Z/, where X was a plain string and Y and Z were simple
(?:a|b) alternations.
I replaced this with two pattern matches in succession, matching
against first Y and then Z (and assumed X would always be present for my
purpose).
Even though the new code had two matches instead of one, the result
was twice as fast -- probably because perl no longer had to do
expensive backtracking on the .*'s.
So... the tip for today is that if you are doing a lot of matches,
processing large files, then multiple matches in succession might be
MUCH faster for you than using .*'s to join required subsequences.
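A sketch of the rewrite, using Python's re module (which, like perl's
engine, backtracks on .*). The sshd/accepted/password strings are
made-up stand-ins for the post's X, Y, and Z:

```python
import re

# One combined pattern: X (a plain string) joined to two simple
# alternations Y and Z by .*'s -- the slow version from the post.
combined = re.compile(r"sshd.*(?:accepted|failed).*(?:password|publickey)")

# The rewrite: match Y and Z in succession, assuming X is always
# present in the input (as the post does).
pat_y = re.compile(r"(?:accepted|failed)")
pat_z = re.compile(r"(?:password|publickey)")

def matches_combined(line):
    return combined.search(line) is not None

def matches_split(line):
    # Search for Z only after the end of the Y match, preserving the
    # ordering that .*Y.*Z implies.
    m = pat_y.search(line)
    return m is not None and pat_z.search(line, m.end()) is not None

line = "Jan  1 sshd[42]: accepted password for root"
assert matches_combined(line) == matches_split(line)
```

The two successive searches each scan left to right without ever
backing up, whereas the combined pattern can retry the .*'s at many
positions before failing.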
======
I keep forgetting that the regex engine doesn't use a finite state
machine. I don't think I've ever written a perl script that
required the more general power. Does 5.8 have built-in finite-state
capabilities?
--
I don't know if there's destiny,
but there's a decision! -- Wim Wenders (WoD)