
Substituting many sparsely distributed strings in many files
Hi all,
I don't believe this is a FAQ. Apologies if it is!
I've written a program (based on the "Substituting Strings in Many
Files" example in the O'Reilly Nutshell Reference) that replaces many
(100s) of strings in many (10s-100s) of large (100-10000 line) files.
But it runs like the proverbial dead dog. :-(
The slowdown occurs because I execute an m// or s/// for every
single string on every single line. Can anyone suggest a faster
search-and-replace algorithm?
The fragment below shows my current algorithm. %map is an associative
array with match/replace pairs.
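For concreteness, here is a made-up pair of entries in the style the
fragment below expects (the real initialization is elided); the names
are hypothetical, and the second key contains whitespace, which is what
triggers the quoted-string branch in the code:

```perl
# Hypothetical entries; the real %map initialization is elided below.
%map = (
    'oldname',      'newname',
    'Old Product',  'New Product',   # whitespace => quoted-string rule
);

# each() walks the match/replace pairs one at a time.
while (($from, $to) = each %map) {
    print "$from -> $to\n";
}
```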
Thanks in advance for any suggestions,
Ron Verstappen
Teknekron Software Systems, Inc.
[%map-initialization stuff deleted]
while (<>) {
    # In-place edit: at each new input file, rename the original to
    # *.bak and reopen the original name for output.
    if ($ARGV ne $oldargv) {
        rename($ARGV, $ARGV . '.bak');
        open(ARGVOUT, ">$ARGV");
        select(ARGVOUT);
        $oldargv = $ARGV;
    }
    # Only lines containing a word character can possibly match; all
    # others fall straight through to the continue block, which prints
    # every line exactly once.
    if (/\w/) {
        while (($from, $to) = each %map) {
            # This cheap match is much faster than the s/// below and
            # usually fails, which speeds the program up by about 50%.
            next unless /$from/;
            if ($from =~ /\s/) {
                # A $from containing whitespace must appear enclosed
                # in double quotes (or escaped double quotes).
                s/(\\"|")$from\1/$1$to$1/g;
            } else {
                # Otherwise $from may appear unquoted, but if it opens
                # with a double quote it must also close with one.
                s/(\\"|"|[^"]\b)$from\1/$1$to$1/g
                    || s/^$from\b/$to/
                    || s/\b$from$/$to/;
            }
        }
    }
}
continue {
    print;    # prints to the original filename via ARGVOUT
}
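One direction I've been wondering about, sketched below and untested
against the real data: precompile a single alternation of all the keys
so each line is scanned once, instead of once per key. The pairs here
are hypothetical, and the sketch ignores the double-quoting rules above;
quotemeta keeps keys with spaces or metacharacters literal, and sorting
longest-first keeps a longer key from losing to one of its substrings:

```perl
# Hypothetical pairs standing in for the real %map.
%map = ('foo', 'FOO', 'bar baz', 'BARBAZ');

# One escaped alternation of every key, longest first so a longer
# key wins over any key that is a substring of it.
$pat = join '|',
       map { quotemeta }
       sort { length($b) <=> length($a) } keys %map;

$_ = 'call foo before "bar baz" ends';
s/($pat)/$map{$1}/g;        # single pass over the line
print "$_\n";               # call FOO before "BARBAZ" ends
```

The matched text itself ($1) is the hash key, so one s///g both finds
and selects the replacement for every string on the line.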