Several Perl questions related to speed and memory use 

[I had earlier posted this to comp.lang.perl but I see it's not read as much
as this one; sorry for the multiple postings.]

I am developing a Perl script (5.00503 on NT) that processes many lines of
text. I am looking into ways to speed it up. Unfortunately, because of
company regulations, I cannot post all of my script here.

For one pattern match, I would like to find out if there is a faster way to
do this. In the code fragment below, the $csCookie is composed of a number
of keyword=value pairs, separated by semicolons. The XYZZY value is composed of
upper- and lower-case letters and/or digits. Is there a faster way to do
this pattern match? I could possibly use the \w character, and ignore the
fact that it would allow the underscore character to be in the string also,
if this was significantly faster than the pattern used below. Or, what if I
used /XYZZY=(\w*)/ and, if that succeeded, then looked for the underscore
- would that be faster than what I am using below?

if ($csCookie =~ m/XYZZY=([A-Za-z0-9]+)/) {
    $ValidStrings++;
    return $1;
}

When I run my script on 30 files (averaging around 125,000 KB each), the
rate at which the script processes each file decreases as it progresses
through the 30 files. The script uses anonymous arrays, hashes, etc. But, I
am fairly certain I have gotten all the "memory leaks" out of it. So, I'm
not sure if memory is the problem; what are some of the common reasons for a
decrease in speed as Perl runs?

If I wanted to check to make sure I was undef'ing all my references and
deleting hash keys no longer used, is there a way to dump or look at all the
memory my script has allocated and see the reference counts?

Thanks for your help,
Jack Stansbury



Wed, 18 Jun 1902 08:00:00 GMT  
 Several Perl questions related to speed and memory use

Quote:

>[I had earlier posted this to comp.lang.perl but I see it's not read as much
>as this one; sorry for the multiple postings.]

That newsgroup was deleted in 1995.

Quote:
>For one pattern match, I would like to find out if there is a faster way to
>do this. In the code fragment below, the $csCookie is composed of a number
>of keyword=value pairs, separated by semicolons. The XYZZY value is composed of
>upper- and lower-case letters and/or digits. Is there a faster way to do
>this pattern match? I could possibly use the \w character, and ignore the
>fact that it would allow the underscore character to be in the string also,
>if this was significantly faster than the pattern used below. Or, what if I
>used /XYZZY=(\w*)/ and, if that succeeded, then looked for the underscore
>- would that be faster than what I am using below?

The new regex would probably be equivalently fast; perldoc Benchmark if
you want to find out.

Quote:
>if ($csCookie =~ m/XYZZY=([A-Za-z0-9]+)/) {

(I don't have an answer to your other questions.)

--

The Internet stock bubble didn't burst on 1999-11-08.  Hurrah!
<URL:http://www.pobox.com/~kragen/bubble.html>



Wed, 18 Jun 1902 08:00:00 GMT  
 Several Perl questions related to speed and memory use

Quote:

> Unfortunately, because of company regulations, I cannot post all of my
> script here.

That's okay; we don't want to see all of your script. But if you want us
to help with part of it, we'll need to see part. (Of course, you could let
us help with a similar script....)

Quote:
> Is there a faster way to do this pattern match?
> if ($csCookie =~ m/XYZZY=([A-Za-z0-9]+)/) {

Probably not much faster, if at all. But if you really want to know
whether one approach is faster than another, use Benchmark.
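
A minimal sketch of such a comparison, assuming a made-up cookie string (the real data was never posted). The sub names and the sample value are hypothetical. cmpthese comes from the standard Benchmark module in later Perl releases; on an older Perl, timethese from the same module does the timing without the comparison table.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample data standing in for $csCookie.
my $cookie = 'lang=en; XYZZY=abc123DEF; theme=dark';

# The original character class.
sub by_class { return $cookie =~ /XYZZY=([A-Za-z0-9]+)/ ? $1 : undef }

# The \w variant, which would also accept underscores.
sub by_word  { return $cookie =~ /XYZZY=(\w+)/ ? $1 : undef }

# A negative count means "run each candidate for at least that many CPU seconds".
cmpthese(-1, {
    char_class => \&by_class,
    word_class => \&by_word,
});
```

The numbers only mean something when the sample string resembles the real input, so it is worth pasting in a few representative lines before drawing conclusions.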

Quote:
> When I run my script on 30 files (averaging around 125,000 KB each),
> the rate at which the script processes each file decreases as it
> progresses through the 30 files.

You're probably doing something wrong. See whether you can write a small
self-contained program which we can run to see the same behavior. It
should be fewer than a dozen lines, if possible, and it shouldn't use
external data.

Cheers!

--
Tom Phoenix       Perl Training and Hacking       Esperanto
Randal Schwartz Case:     http://www.rahul.net/jeffrey/ovs/



Wed, 18 Jun 1902 08:00:00 GMT  
 Several Perl questions related to speed and memory use

Quote:

> [I had earlier posted this to comp.lang.perl but I see it's not read as much
> as this one; sorry for the multiple postings.]

That is because comp.lang.perl has been dead for a while now. You
shouldn't post there anymore.

Quote:
> I am developing a Perl script (5.00503 on NT) that processes many lines of
> text. I am looking into ways to speed it up. Unfortunately, because of
> company regulations, I cannot post all of my script here.

A lot of people around here frown upon that kind of attitude.

Quote:
> For one pattern match, I would like to find out if there is a faster way to
> do this. In the code fragment below, the $csCookie is composed of a number
> of keyword=value pairs, separated by semicolons. The XYZZY value is composed of
> upper- and lower-case letters and/or digits. Is there a faster way to do
> this pattern match? I could possibly use the \w character, and ignore the
> fact that it would allow the underscore character to be in the string also,
> if this was significantly faster than the pattern used below. Or, what if I
> used /XYZZY=(\w*)/ and, if that succeeded, then looked for the underscore
> - would that be faster than what I am using below?

I doubt it. I highly doubt that this is the bottleneck in your
program. You could probably do something like:

        /XYZZY=([^_\s]+)/

but again, I don't think it will make a big difference.

Since you are using a recent version of Perl, look into using the qr//
operator to compile your regular expression first.

Quote:
> if ($csCookie =~ m/XYZZY=([A-Za-z0-9]+)/) {
>     $ValidStrings ++;
>     return $1;
> }

> When I run my script on 30 files (averaging around 125,000 KB each), the
> rate at which the script processes each file decreases as it progresses
> through the 30 files. The script uses anonymous arrays, hashes, etc. But, I
> am fairly certain I have gotten all the "memory leaks" out of it. So, I'm
> not sure if memory is the problem; what are some of the common reasons for a
> decrease in speed as Perl runs?

You're probably doing something wrong. Maybe you need to clear some
hash or array before you start reading each file. It's very hard to
say since you don't show us any real code.

Quote:
> If I wanted to check to make sure I was undef'ing all my references and
> deleting hash keys no longer used, is there a way to dump or look at all the
> memory my script has allocated and see the reference counts?

No, but I would limit the scope of any variable to the absolute
minimum, just to be sure.
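
One way to picture that advice, as a sketch only: do the per-file work inside a sub so that every structure it builds is lexical and is released when the sub returns. The sub name process_lines and the sample lines are hypothetical, and the lists stand in for file contents.

```perl
use strict;
use warnings;

# Hypothetical per-file worker: count distinct valid XYZZY values.
sub process_lines {
    my (@lines) = @_;
    my %seen;                       # fresh for every call, nothing carries over
    my $valid = 0;
    for my $line (@lines) {
        next unless $line =~ /XYZZY=([A-Za-z0-9]+)/;
        $valid++ unless $seen{$1}++;
    }
    return $valid;                  # %seen goes out of scope here
}

# Stand-ins for the contents of two input files.
my @file_a = ('XYZZY=abc', 'XYZZY=abc', 'XYZZY=def');
my @file_b = ('XYZZY=abc');
print process_lines(@file_a), " ", process_lines(@file_b), "\n";
```

If a hash like %seen were declared at file scope instead, it would keep growing across all 30 files, which is one plausible cause of a gradual slowdown.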

HTH,
--Ala



Wed, 18 Jun 1902 08:00:00 GMT  
 Several Perl questions related to speed and memory use


Quote:

>I doubt it. I highly doubt that this is the bottleneck in your
>program. You could probably do something like:

>    /XYZZY=([^_\s]+)/

>but again, I don't think it will make a big difference.

Have you benchmarked?

Quote:
>Since you are using a recent version of Perl, look into using the qr//
>operator to compile your regular expression first.

The expression he posted will be compiled at compile-time anyway; it
doesn't interpolate any variables.
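
For contrast, a sketch of the case where qr// does earn its keep: a pattern that interpolates a variable. The helper make_matcher and the key name passed to it are hypothetical; nothing like it appears in the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: build one precompiled pattern per key name.
sub make_matcher {
    my ($key) = @_;
    # \Q...\E quotes any regex metacharacters in the key.
    my $re = qr/\Q$key\E=([A-Za-z0-9]+)/;   # compiled once, here
    return sub { return $_[0] =~ $re ? $1 : undef };
}

my $match_xyzzy = make_matcher('XYZZY');
print $match_xyzzy->('a=1; XYZZY=abc123; b=2'), "\n";
```

With a literal pattern such as the one posted, there is nothing to interpolate, so this buys nothing.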
--

The Internet stock bubble didn't burst on 1999-11-08.  Hurrah!
<URL:http://www.pobox.com/~kragen/bubble.html>


Wed, 18 Jun 1902 08:00:00 GMT  
 Several Perl questions related to speed and memory use

Quote:

> Since you are using a recent version of Perl, look into using the qr//
> operator to compile your regular expression first.

I don't think that will make any difference, since the pattern in question
should be compiled just once in any case. Unless I'm mistaken about what
you're talking about. Cheers!

--
Tom Phoenix       Perl Training and Hacking       Esperanto
Randal Schwartz Case:     http://www.rahul.net/jeffrey/ovs/



Wed, 18 Jun 1902 08:00:00 GMT  
 Several Perl questions related to speed and memory use


Quote:
> > For one pattern match, I would like to find out if there is a faster way to
> > do this. In the code fragment below, the $csCookie is composed of a number
> > of keyword=value pairs, separated by semicolons. The XYZZY value is composed of
> > upper- and lower-case letters and/or digits. Is there a faster way to do
> > this pattern match? I could possibly use the \w character, and ignore the
> > fact that it would allow the underscore character to be in the string also,
> > if this was significantly faster than the pattern used below. Or, what if I
> > used /XYZZY=(\w*)/ and, if that succeeded, then looked for the underscore
> > - would that be faster than what I am using below?

> I doubt it. I highly doubt that this is the bottleneck in your
> program. You could probably do something like:

>    /XYZZY=([^_\s]+)/

That will match against all kinds of unwanted punctuation characters.  
You need this:

        /XYZZY=([^_\W]+)/
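
A small sketch of the difference, using a made-up cookie string that contains punctuation. The variable names are hypothetical. For plain ASCII data [^_\W] accepts the same characters as [A-Za-z0-9]; under locale or Unicode rules \w can match more, so the two are not guaranteed identical everywhere.

```perl
use strict;
use warnings;

# Hypothetical sample: the value is followed by punctuation and a semicolon.
my $cookie = 'XYZZY=abc-123!x; other=1';

my ($loose) = $cookie =~ /XYZZY=([^_\s]+)/;        # stops only at '_' or whitespace
my ($tight) = $cookie =~ /XYZZY=([^_\W]+)/;        # word characters minus '_'
my ($orig)  = $cookie =~ /XYZZY=([A-Za-z0-9]+)/;   # the class from the original post

print "loose: $loose\n";
print "tight: $tight\n";
print "orig:  $orig\n";
```

The loose class runs through the hyphen, the exclamation mark and the semicolon separator, while the other two stop at the first non-alphanumeric character.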

--
(Just Another Larry) Rosler
Hewlett-Packard Laboratories
http://www.hpl.hp.com/personal/Larry_Rosler/



Wed, 18 Jun 1902 08:00:00 GMT  
 Several Perl questions related to speed and memory use
[posted & mailed]

Quote:

> I am developing a Perl script (5.00503 on NT) that processes many lines of
> text. I am looking into ways to speed it up.

Then you will want to determine where it's slow.

perldoc perlfaq3

    How do I profile my Perl programs?
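
Short of a full profiler, timing each file separately is a cheap way to confirm the slowdown and see where it starts. A sketch, assuming Time::HiRes is available (it ships with later Perl releases and was a separate install for 5.005); the helper timed and the stand-in workload are hypothetical.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Run a code ref and return (its result, elapsed wall-clock seconds).
sub timed {
    my ($code, @args) = @_;
    my $t0     = Time::HiRes::time();
    my $result = $code->(@args);
    return ($result, Time::HiRes::time() - $t0);
}

# Hypothetical stand-in for "process one file".
my $work = sub { my $n = 0; $n += $_ for 1 .. $_[0]; return $n };

for my $file_no (1 .. 3) {
    my ($sum, $secs) = timed($work, 100_000);
    printf "file %d: %.4f s\n", $file_no, $secs;
}
```

If the per-file time climbs steadily while the files stay the same size, something is accumulating between files.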

--
Rick Delaney



Wed, 18 Jun 1902 08:00:00 GMT  
 Several Perl questions related to speed and memory use

Quote:

>Have you benchmarked?

Yes, I did try benchmark after some of you suggested that. I changed the one
pattern to use index and substr instead, because that was a little faster
than the m/XYZZY=([A-Za-z0-9]+)/ I was using, according to the benchmark.
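
The index/substr version was not posted, so the following is only a guess at its shape. The sub name cookie_value is hypothetical, and it assumes each value ends at a semicolon or at the end of the string. Unlike the regex, it does not check that the value is alphanumeric.

```perl
use strict;
use warnings;

# Extract the value of one key from a "k=v; k=v" string without a regex.
sub cookie_value {
    my ($cookie, $key) = @_;
    my $needle = "$key=";
    my $start  = index($cookie, $needle);
    return if $start < 0;                      # key not present
    $start += length $needle;                  # skip past "KEY="
    my $end = index($cookie, ';', $start);     # value ends at next ';'
    $end = length $cookie if $end < 0;         # ...or at end of string
    return substr($cookie, $start, $end - $start);
}

my $value = cookie_value('lang=en; XYZZY=abc123; t=1', 'XYZZY');
print "$value\n";
```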

I've been referring to the text on the Perl CD with the six (?) books on it.
There is a section there that describes things one can do to speed up a Perl
script, and I've been through most if not all of them. With those, I have
been able to speed up the script to process the data lines much faster than
it was. However, I am still concerned about the slowdown in speed as it
processes more files. I understand some of you all's concern about me not
posting my script, but hey, I do like my job! :-)

My main concern was in getting a grip on the memory aspects of the script,
and being able to answer questions like: Am I leaving dangling refs to
memory that I thought I was deleting or undefing? Is the accumulation of
unused-but-not-deleted memory slowing down the script? Exactly what memory
has been allocated but not deleted? Can I dump all of memory that I have
allocated? If Perl has ways to do this kind of analysis, please point me in
that direction.
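
For the reference-count part of the question, one option is the B module that ships with Perl. A sketch, with a hypothetical helper named refcount; Devel::Peek's Dump function prints the same REFCNT field along with the rest of a value's internals.

```perl
use strict;
use warnings;
use B ();

# Reference count of whatever the given reference points at.
# $_[0] is used directly so that no extra copy of the reference
# is made inside the sub while counting.
sub refcount {
    return B::svref_2object($_[0])->REFCNT;
}

my $inner = [1, 2, 3];
my %table = (key => $inner);    # a second reference to the same array

my $before = refcount($inner);
delete $table{key};             # drop the hash's reference
my $after  = refcount($inner);

print "before delete: $before, after delete: $after\n";
```

This inspects one value at a time; it does not give a dump of everything the script has allocated.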

I am not calling the malloc function, as someone asked me in an e-mail. I
just allocate anonymous hashes and arrays somewhat like this:

my %Table;

$Table{$key} = $ref1 = [];
$ref1->[0] = $value1;
$ref1->[1] = $value2;
$ref1->[2] = $ref2 = [];

and so forth. In the script, the hash entry points to an array A of 5
entries. The last entry in the array A is a ref pointing to an array B of 4
entries. Each entry in the array B is a ref pointing to an array C of 6
entries. If I do a delete $Table{$key}, I hope that is deleting all the
arrays involved in that hash entry.
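
Whether that hope holds depends on whether anything else still refers to the inner arrays. A sketch that checks it, assuming a Perl new enough to have Scalar::Util's weaken (not available on 5.005); the variable names are hypothetical. A variable such as $ref1 or $ref2 that is still in scope counts as such a leftover reference.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my %Table;
my $key = 'k1';

# Build hash entry -> array A -> array B, as in the description.
$Table{$key} = [ 'v1', 'v2', [ 'b1', 'b2' ] ];

# Weak references observe the arrays without keeping them alive;
# they become undef once the array they point at is freed.
my $watch_a = $Table{$key};     weaken($watch_a);
my $watch_b = $Table{$key}[2];  weaken($watch_b);

# A leftover strong reference to array B.
my $leftover = $Table{$key}[2];

delete $Table{$key};
my $a_freed       = !defined $watch_a;   # A had no other owner
my $b_still_alive = defined $watch_b;    # B is held by $leftover

undef $leftover;
my $b_freed = !defined $watch_b;         # now B is gone too

printf "A freed: %s, B still alive: %s, B freed after undef: %s\n",
    map { $_ ? 'yes' : 'no' } $a_freed, $b_still_alive, $b_freed;
```

So delete does release the whole nested structure, but only the parts that no other variable still points at.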

Thanks for all the great responses! It's been fun learning this language!

Jack Stansbury


