Substituting many sparsely distributed strings in many files

Hi all,

I don't believe this is a FAQ. Apologies if it is!

I've written a program (based on the "Substituting Strings in Many
Files" example in the O'Reilly Nutshell Reference) that replaces many
(100s) of strings in many (10s-100s) of large (100-10000 line) files.

But it runs like a proverbial dead-dog. :-(

The slowdown occurs because I execute an m// or s/// for every
single string on every single line. Can anyone suggest a faster search
and replace algorithm?

The fragment below shows my current algorithm. %map is an associative
array with match/replace pairs.

Thanks in advance for any suggestions,
Ron Verstappen
Teknekron Software Systems, Inc.

[%map initialization stuff deleted]

while (<>) {
    if ($ARGV ne $oldargv) {
        rename($ARGV, $ARGV . '.bak');
        open(ARGVOUT, ">$ARGV");
        select(ARGVOUT);
        $oldargv = $ARGV;
    }
    if (! /\w/) {
        # if this line has no word characters in it at all, then
        # skip straight to the continue block, which prints it
        next;
    }
    else {
        while (($from,$to) = each %map) {
            if (! /$from/) {
                # this test is faster than the s/// below and is usually true
                # so helps speed up the program by about 50%
                next;
            }
            else {
                if ($from =~ /\s/) {
                    # if the $from string contains a whitespace character then
                    # this string must be enclosed in double-quotes (or
                    # escaped double-quotes).
                    s/(\\\"|\")$from\1/$1$to$1/g;
                } else {
                    # otherwise the $from string may be without double-quotes,
                    # but if it opens with a double-quote then it must also
                    # close with one.
                    s/(\\\"|\"|[^\"]\b)$from\1/$1$to$1/g ||
                        s/^$from\b/$to/ ||
                            s/\b$from$/$to/;
                }
            }
        }
    }

}

continue {
    print;     # this prints to original filename
}



Tue, 11 Mar 1997 09:05:46 GMT  

: I've written a program (based on the "Substituting Strings in Many
: Files" example in the O'Reilly Nutshell Reference) that replaces many
: (100s) of strings in many (10s-100s) of large (100-10000 line) files.
: But it runs like a proverbial dead-dog.  :-(

One common approach is to precompile the patterns instead of using
$-expansion to construct them on the fly. This can yield moderate
speedups, though how well it applies here (given the munging done on
$from) is unclear to me.
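For readers on a later Perl, one way to precompile is the qr// operator (it arrived in 5.005, after this thread was written). The sketch below is only an illustration: the contents of %map and the helper names compile_map and apply_map are made up, and it uses plain \b word boundaries rather than the double-quote rules in the original script.

```perl
use strict;
use warnings;

# Hypothetical match/replace pairs; the real %map was not posted.
my %map = (
    old_name  => 'new_name',
    tmp_count => 'counter',
);

# Compile each pattern once, up front, instead of re-interpolating
# $from into a regex for every line of every file.
sub compile_map {
    my ($map) = @_;
    return map { [ qr/\b\Q$_\E\b/, $map->{$_} ] } sort keys %$map;
}

# Apply every precompiled pattern to one line and return the result.
sub apply_map {
    my ($line, @compiled) = @_;
    for my $pair (@compiled) {
        my ($re, $to) = @$pair;
        $line =~ s/$re/$to/g;
    }
    return $line;
}

my @compiled = compile_map(\%map);
print apply_map("tmp_count = old_name + 1;\n", @compiled);
```

This still runs one substitution per pattern per line, so it trims the regex compilation cost without changing the overall amount of matching work.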

But I noticed that you went to some lengths in your code to enforce
token boundaries on the substitutions.  I note that IF (mind you, IF)
the substitutions are always or often on tokens, then it will be much
more efficient to tokenize the input stream, and then simply look up
each incoming token as $replacementof{$token}.

( Of course, you'd have to take care to treat what one would normally
  think of as whitespace *between* tokens, or comments, or whatnot,
  as a special case of a non-matching token, just so the inputs don't
  get their whitespace smushed. )
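The tokenizing idea can be sketched roughly as follows, assuming a token is a run of word characters and everything else (whitespace, punctuation) passes through untouched. Only the hash name %replacementof comes from the text above; its contents and the helper name replace_tokens are invented for illustration, and the sketch does not handle the quoted, whitespace-containing strings from the original script.

```perl
use strict;
use warnings;

# Hypothetical contents; only the hash name comes from the post.
my %replacementof = (
    old_name  => 'new_name',
    tmp_count => 'counter',
);

# Split a line into alternating word and non-word runs. The capturing
# parentheses make split() return the separators too, so whitespace
# and punctuation survive unchanged when the pieces are rejoined.
sub replace_tokens {
    my ($line, $table) = @_;
    return join '',
        map { exists $table->{$_} ? $table->{$_} : $_ }
        split /(\W+)/, $line;
}

print replace_tokens("tmp_count  = old_name + 1;\n", \%replacementof);
```

Each token costs one hash lookup, so the work per line depends on the number of tokens in it rather than on the number of entries in the table.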




Wed, 12 Mar 1997 00:36:18 GMT  
I have found a simple solution to the problem I posed yesterday (so
please forget my earlier request for assistance):


   : I've written a program (based on the "Substituting Strings in Many
   : Files" example in the O'Reilly Nutshell Reference) that replaces many
   : (100s) of strings in many (10s-100s) of large (100-10000 line) files.
   : But it runs like a proverbial dead-dog.  :-(

This was very simply solved by adding the following two lines before the
main processing loop...

        undef $/;
        $* = 1;

This means I am now treating each file as one line. The improvement is
terrific. Pretty basic for seasoned Perl-programmers, I know, but this
was my first of these types of programs.
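On current Perls the $* variable is gone (it was removed in 5.10), and the /m modifier on an individual regex does the same job. A minimal sketch of the same whole-file approach under that assumption follows; the contents of %map and the helper names slurp and substitute_all are hypothetical, and only the two line-anchored substitutions are shown.

```perl
use strict;
use warnings;

# Hypothetical match/replace pairs; the real %map was not posted.
my %map = (
    old_name  => 'new_name',
    tmp_count => 'counter',
);

# Read a whole file into one string. local $/ undefines the input
# record separator only for the duration of this sub.
sub slurp {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot read $path: $!";
    local $/;
    my $text = <$in>;
    close $in;
    return $text;
}

# Run each substitution once over the whole text. The /m modifier
# makes ^ and $ match at every embedded line boundary, which is what
# $* = 1 used to switch on globally.
sub substitute_all {
    my ($text, $map) = @_;
    for my $from (sort keys %$map) {
        my $to = $map->{$from};
        $text =~ s/^\Q$from\E\b/$to/mg;    # at the start of a line
        $text =~ s/\b\Q$from\E$/$to/mg;    # at the end of a line
    }
    return $text;
}

print substitute_all("old_name = 1\nx = tmp_count\n", \%map);
```

With the file held as one string, each pattern is applied once per file instead of once per line, which is where the speedup described above comes from.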

Cheers!
Ron Verstappen



Wed, 12 Mar 1997 08:32:01 GMT  
 


 

 