Perl is too slow!!! 

Got your attention? Good.

Here is my problem: I have a file containing 5000 blocks of 3240 lines of text
separated by a blank line. I'd like to be able to read in these blocks
one at a time and split them into an array. BUT, I'd like to do this
as quickly as possible.

The simple solution:

        $/ = "\n\n"; # para mode
        while(<>) {
                chop; chop; # take off the trailing newlines

        }

That paragraph-mode version runs much slower than the line-by-line version below (why?):

        $/ = "\n"; # normal and not necessary except for clarity
        while (<>) {
                if ($_ eq "\n") { # blank line separator

                } else {
                        chop; # strip newline
                        push(vals,$_); # add line to end of array
                }
        }

Hmmm, what's the fastest way of doing this (maybe I should actually
look at the perl code?)? If I want to read in an array of these lines
as rapidly as possible, should I write my own usub (seems kind of
idiotic)?

--
---

Dietrich Kappe



Mon, 17 Jun 1996 03:30:43 GMT  

:Got your attention? Good.
:
:Here is my problem: I have a file containing 5000 blocks of 3240 lines of text
:separated by a blank line. I'd like to be able to read in these blocks
:one at a time and split them into an array. BUT, I'd like to do this
:as quickly as possible.
:
:The simple solution:
:
:       $/ = "\n\n"; # para mode
:       while(<>) {
:               chop; chop; # take off the trailing newlines

:       }

:The former runs much slower than the latter (why?):
:
:       $/ = "\n"; # normal and not necessary except for clarity
:       while (<>) {
:               if ($_ eq "\n") { # blank line separator

:               } else {
:                       chop; # strip newline
:                       push(vals,$_); # add line to end of array


:               }
:       }

It's expensive to create a humongous string (3240 lines times maybe 50 bytes
per line is about 160k each).  It's also expensive to go and split it into an
array just to throw the big string away.
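
The two strategies under discussion can be sketched side by side. This is only an illustration, not the original poster's program: the sample data, the in-memory filehandle, and the sub names read_blocks_paragraph and read_blocks_lines are assumptions made up for the example. The first sub builds one large string per block and then splits it; the second never builds that string.

```perl
use strict;
use warnings;

# Hypothetical sample data: two small blocks separated by a blank line.
my $data = "a1\na2\na3\n\nb1\nb2\n";

# Strategy 1: paragraph mode. Each read builds one big string,
# which is split into lines and then discarded.
sub read_blocks_paragraph {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";
    local $/ = "\n\n";                      # read up to each blank line
    my @blocks;
    while (my $block = <$fh>) {
        $block =~ s/\n+\z//;                # strip the trailing newlines
        push(@blocks, [ split(/\n/, $block) ]);
    }
    close($fh);
    return @blocks;
}

# Strategy 2: line mode. Lines go straight into the array;
# a blank line closes the current block.
sub read_blocks_lines {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";
    my (@blocks, @vals);
    while (my $line = <$fh>) {
        if ($line eq "\n") {
            push(@blocks, [@vals]) if @vals;
            @vals = ();
        } else {
            chomp($line);
            push(@vals, $line);
        }
    }
    push(@blocks, [@vals]) if @vals;        # last block has no trailing blank line
    close($fh);
    return @blocks;
}

my @by_paragraph = read_blocks_paragraph($data);
my @by_line      = read_blocks_lines($data);
printf "%d blocks (paragraph mode), %d blocks (line mode)\n",
    scalar(@by_paragraph), scalar(@by_line);
```

Both subs return the same list of blocks; which one is faster on real data would have to be measured, for example with the core Benchmark module.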

:Hmmm, what's the fastest way of doing this (maybe I should actually
:look at the perl code?)? If I want to read in an array of these lines
:as rapidly as possible, should I write my own usub (seems kind of
:idiotic)?

Pre-allocating the array will help a wee bit the first time, but
otherwise it will already be there.
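
A minimal sketch of that pre-allocation, assuming the block size of 3240 lines from the question. The sub name fill_block and the tiny in-memory input are hypothetical. The array is extended once with $#vals and then reused for every block, with lines stored by index so it never has to grow.

```perl
use strict;
use warnings;

my $lines_per_block = 3240;          # block size taken from the question

my @vals;
$#vals = $lines_per_block - 1;       # pre-extend the array once

# Read one block into the existing array by index and
# return the number of lines stored.
sub fill_block {
    my ($fh, $vals) = @_;
    my $n = 0;
    while (my $line = <$fh>) {
        last if $line eq "\n";       # blank line ends the block
        chomp($line);
        $vals->[$n++] = $line;
    }
    return $n;
}

# Hypothetical input: a two-line block, a blank line, a one-line block.
my $data = "x\ny\n\nz\n";
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";

my $first_count = fill_block($fh, \@vals);
my $first_line  = $vals[0];
my $next_count  = fill_block($fh, \@vals);   # same array, reused
print "first block: $first_count lines, next block: $next_count lines\n";
```

Slots beyond the returned count still hold whatever an earlier block left there, so the caller has to rely on the count rather than on the array length.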

--tom
--

      "Will Hack Perl for Fine Food and Fun"
        Boulder Colorado  303-444-3212



Wed, 19 Jun 1996 01:02:37 GMT  

   :                    push(vals,$_); # add line to end of array


:-) Then why did Larry make it optional?

   It's expensive to create a humongous string (3240 lines times maybe 50 bytes
   per line is about 160k each). It's also expensive to go and split it into an
   array just to throw the big string away.

If I want to know what the 1098th line is, and based on that, change
the 2317th line, all in that single string, then regular expressions:

        1) Turn out to have built in limits that fall short of this
        sort of problem (modifying the regular expression maximum size
        in regcomp.c helps, but I haven't checked if this introduces
        any problems into the regexp code).

        2) Are ugly ugly ugly!
        s/^(.*\n){1097}(OH\n)(.*\n){1218}(Col[^u]*mbus\n)/$1$2$3Columbus\n/

        The above doesn't quite work (the (regexp){n} constructs only capture
        the last occurrence of regexp).
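
For comparison, the same "check line 1098, change line 2317" edit is short once the block is held as an array of lines rather than one string. This is a hedged sketch only: the sub name fix_line and the four-line stand-in block are made up for the example.

```perl
use strict;
use warnings;

# If line $probe (1-based) equals $expect, set line $target (1-based)
# to $new. Returns 1 when the change was made, 0 otherwise.
sub fix_line {
    my ($lines, $probe, $expect, $target, $new) = @_;
    my $seen = $lines->[$probe - 1];
    return 0 unless defined $seen && $seen eq $expect;
    $lines->[$target - 1] = $new;
    return 1;
}

# Small stand-in for one block; a real block would have 3240 lines.
my @lines   = ('a', 'OH', 'b', 'Colxmbus');
my $changed = fix_line(\@lines, 2, 'OH', 4, 'Columbus');
print join("\n", @lines), "\n";
```

On a full block the call would be fix_line(\@lines, 1098, 'OH', 2317, 'Columbus').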

The whole point of this exercise is modifying a dataset that will
eventually consist of 40000 "blocks" (observations) by roughly 20000
"lines" (variables). All of the tricks and tradeoffs of dealing with
this gargantuan dataset in smaller chunks have already been considered
and used. I have written a few programs that can perform simple
modifications on the dataset, and do so in a very speedy manner. The more
complex these simple tools become, the more I see myself actually
implementing yet another programming language. I want to avoid this. I
want to use perl as the base language for this work, rather than
invent my own.

I have played around with doing so in straight perl, without any
usubs, but I've been unable to get the performance I want. Sigh.

Well, now you know my story. :-)

--
---

Dietrich Kappe



Wed, 19 Jun 1996 03:17:18 GMT  

|>The whole point of this exercise is modifying a dataset that will
|>eventually consist of 40000 "blocks" (observations) by roughly 20000
|>"lines" (variables). All of the tricks and tradeoffs of dealing with
|>this gargantuan dataset in smaller chunks have already been considered
|>and used. I have written a few programs that can perform simple
|>modifications on the dataset, and do so in a very speedy manner. The more
|>complex these simple tools become, the more I see myself actually
|>implementing yet another programming language. I want to avoid this. I
|>want to use perl as the base language for this work, rather than
|>invent my own.
|>
|>I have played around with doing so in straight perl, without any
|>usubs, but I've been unable to get the performance I want. Sigh.
|>
|>Well, now you know my story. :-)

There is an expression - "horses for courses".  When you select a language
or languages for a particular project, you need to consider the requirements
of the project.  I also once tried writing a program to manipulate vast amounts
of information in perl, only to have somebody else write it in C and have
their version run a lot faster.

Perl is fast.  However the other main advantage of perl is that it is easily
modified and is therefore easy to change as requirements change.  With a bit
of thought it is easy to write perl code that is almost trivial to change in
some expected ways.  Therefore I've written our project's build tool in perl,
since I know I'll need to extend it in the future and so the aspects of perl
that make it easily modifiable are perfect for this application.

However perl is not always as fast as straight C code.  If you want to munge
vast amounts of data, but keep perl as your base language, my advice would
be to write a few underlying utilities in C and call them, as separate
programs, from perl.  This should give you the maximum flexibility with the
greatest speed.
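
Calling such a utility from perl might look like the sketch below. No real C utility is assumed: the running perl binary ($^X) stands in for the compiled program so the example is self-contained, and the sub name run_utility is hypothetical. The list form of open runs the command without going through a shell.

```perl
use strict;
use warnings;

# Run an external command and return its output lines, chomped.
sub run_utility {
    my (@command) = @_;
    open(my $out, '-|', @command)
        or die "cannot run $command[0]: $!";
    my @lines = <$out>;
    close($out)
        or die "$command[0] failed, status $?";
    chomp(@lines);
    return @lines;
}

# Stand-in for a compiled C utility: a child perl that prints three lines.
my @result = run_utility($^X, '-e', 'print "$_\n" for 1 .. 3');
print scalar(@result), " lines returned\n";
```

Every call pays for starting a process and copying the data through a pipe, so this pays off only when each call hands over a large piece of work.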

Lezz

P.S. But then again, what do I know?!  :-)



Wed, 19 Jun 1996 05:35:40 GMT  

   However perl is not always as fast as straight C code.  If you want to munge
   vast amounts of data, but keep perl as your base language, my advice would
   be to write a few underlying utilities in C and call them, as separate
   programs, from perl.  This should give you the maximum flexibility with the
   greatest speed.

   Lezz

   P.S. But then again, what do I know?!  :-)

A great deal, it seems! :-)

Your last suggestion is actually what we're doing right now. The
problem with this approach, however, is that the communication between
the utilities and perl is slow and disjointed. Also, the whole system
will eventually be used by non-programmer economists (sigh), so coding
specific data munging tasks in C is not a realistic option (but then
again, is perl a realistic option? Evidence suggests the answer is
"yes").

My current half baked plan is to implement "fake" arrays (via user
variables as "described" :-) in /usub), that will be read from and
written to files via read and write system calls in usubs, but the
dimensions of which will remain persistent (avoiding malloc() calls).

Is this doable? I don't know yet. Is it wise? Probably not. :-)

--
---

Dietrich Kappe



Wed, 19 Jun 1996 07:52:31 GMT  


:
:   :                   push(vals,$_); # add line to end of array
:

:
::-) Then why did Larry make it optional?

Larry's smarter now than he used to be, for certain values of smarter.
What you're seeing there is a vestige of an ancient version of Perl.
It's more consistent to *ALWAYS* use the type specifier.
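
With the type specifier written out, the loop body from the original question would read roughly as follows. This is a sketch with made-up input lines, not the poster's actual program.

```perl
use strict;
use warnings;

my @vals;
for my $line ("first\n", "second\n") {
    chomp(my $copy = $line);   # strip the newline from a copy
    push(@vals, $copy);        # @vals, not the bare word vals
}
print scalar(@vals), " lines stored\n";
```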

Today, seeing people write this:

    push(Red,  Blue);
    push(Blue, Red);

really makes me cringe.

-tom
--

      "Will Hack Perl for Fine Food and Fun"
        Boulder Colorado  303-444-3212



Wed, 19 Jun 1996 23:48:03 GMT  

Quote:

>If I want to know what the 1098th line is, and based on that, change
>the 2317th line, all in that single string, then regular expressions:

>    1) Turn out to have built in limits that fall short of this
>    sort of problem (modifying the regular expression maximum size
>    in regcomp.c helps, but I haven't checked if this introduces
>    any problems into the regexp code).

A while ago I think I read a post from Larry that said somebody was
rewriting the regex code to remove these limits.

Quote:
>    2) Are ugly ugly ugly!
>    s/^(.*\n){1097}(OH\n)(.*\n){1218}(Col[^u]*mbus\n)/$1$2$3Columbus\n/

>    The above doesn't quite work (the (regexp){n} constructs only capture
>    the last occurrence of regexp).

No, but if you put parens around them it does:

s/^((.*\n){1097})(OH\n)((.*\n){1218})(Col[^u]*mbus\n)/$1$3$4Columbus\n/
   ^^            ^     ^^            ^                ^^^^^^
   12            3     45            6
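
The difference is easier to see on a scaled-down case. In this sketch the repeat counts are cut to {2} and {3} and the eight-line sample text is invented; the group numbering is the same as in the full-size expression above.

```perl
use strict;
use warnings;

# Line 3 is "OH"; line 7 is the misspelled city to repair.
my $text = "a\nb\nOH\nc\nd\ne\nColxmbus\nz\n";

# Without the outer parentheses, $1 and $3 hold only the last
# repetition of each group, so the other lines are lost.
(my $broken = $text) =~
    s/^(.*\n){2}(OH\n)(.*\n){3}(Col[^u]*mbus\n)/$1$2$3Columbus\n/;

# With the outer parentheses, $1 and $4 hold every repeated line.
(my $fixed = $text) =~
    s/^((.*\n){2})(OH\n)((.*\n){3})(Col[^u]*mbus\n)/$1$3$4Columbus\n/;

print "without outer parens:\n$broken";
print "with outer parens:\n$fixed";
```

Here $broken keeps only "b", "OH", "e", the repaired line, and "z", while $fixed keeps all eight lines.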

Michael D'Errico



Fri, 21 Jun 1996 09:30:25 GMT  
Quote:

>[...]
>Larry's smarter now than he used to be, for certain values of smarter.
>What you're seeing there is a vestige of an ancient version of Perl.
>It's more consistent to *ALWAYS* use the type specifier.

>Today, seeing people write this:

>    push(Red,  Blue);
>    push(Blue, Red);

>really makes me cringe.

Me too! In perl 4 with -w I didn't get a warning when I tried the above.

In perl 5, will this "vestige of an ancient version of Perl", and others
like it, produce warnings?

Quote:
>-tom
>--

>      "Will Hack Perl for Fine Food and Fun"
>    Boulder Colorado  303-444-3212

Regards,
Tim Bunce.


Fri, 21 Jun 1996 19:05:18 GMT  


:>[...]
:>Larry's smarter now than he used to be, for certain values of smarter.
:>What you're seeing there is a vestige of an ancient version of Perl.
:>It's more consistent to *ALWAYS* use the type specifier.
:>
:>Today, seeing people write this:
:>
:>    push(Red,  Blue);
:>    push(Blue, Red);
:>
:>really makes me cringe.
:>
:Me too! In perl 4 with -w I didn't get a warning when I tried the above.
:
:In perl 5, will this "vestige of an ancient version of Perl", and others
:like it, produce warnings?

    % perl5 -cwe 'push(Red,  Blue)'
    Possible typo: "Red" at /tmp/perl-ea15087 line 1.
    /tmp/perl-ea15087 syntax OK

    % perl5 -cwe 'push(red,  blue)'
    "red" may clash with future reserved word at /tmp/perl-ea15088 line 1.
    "blue" may clash with future reserved word at /tmp/perl-ea15088 line 1.
    Possible typo: "red" at /tmp/perl-ea15088 line 1.
    /tmp/perl-ea15088 syntax OK

--tom
--

      "Will Hack Perl for Fine Food and Fun"
        Boulder Colorado  303-444-3212



Fri, 21 Jun 1996 23:22:26 GMT  


: :>[...]
: :>Larry's smarter now than he used to be, for certain values of smarter.
: :>What you're seeing there is a vestige of an ancient version of Perl.
: :>It's more consistent to *ALWAYS* use the type specifier.
: :>
: :>Today, seeing people write this:
: :>
: :>    push(Red,  Blue);
: :>    push(Blue, Red);
: :>
: :>really makes me cringe.
: :>
: :Me too! In perl 4 with -w I didn't get a warning when I tried the above.
: :
: :In perl 5, will this "vestige of an ancient version of Perl", and others
: :like it, produce warnings?
:
:     % perl5 -cwe 'push(Red,  Blue)'
:     Possible typo: "Red" at /tmp/perl-ea15087 line 1.

In alpha 4 it'll say


Similarly if you say "keys foo" you get

    Hash %foo missing the % in argument 1 of keys() at - line 2.

By the way, I just added :: as a package delimiter.  While I was at it
I made

        package foo::bar;

create a nested package.  If you say

        package foo::bar;
        print *xyz;

it prints out *foo::bar::xyz.  Unfortunately, there isn't any way yet to
specify the current package as a starting point, so you can't get
at that variable by saying

        package foo;
        print $bar::xyz;

Pity.
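
The lookup rule described above can be checked with a short sketch. It assumes a released Perl 5 rather than the alpha under discussion, and the variable names are made up: the relative name $bar::xyz refers to package bar, while the fully qualified $foo::bar::xyz reaches the nested package.

```perl
use strict;
use warnings;

package foo::bar;
our $xyz = 'nested';
my $glob_name = '' . *xyz;     # a stringified glob carries the full package name

package foo;
no warnings 'once';            # $bar::xyz is deliberately mentioned only once
my $relative  = defined $bar::xyz ? 'found' : 'not found';   # package bar, not foo::bar
my $qualified = $foo::bar::xyz;                              # fully qualified name

package main;
print "$glob_name / $relative / $qualified\n";
```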

Larry



Tue, 25 Jun 1996 10:54:56 GMT  
 