Reg. Exp. on a whole file 

I have recently upgraded from Perl 5.00503 to 5.6.1.

One of the scripts I have been working on recently reads in a large
ASCII file and looks for 12 lines that look like
P #.###

where #.### are floating point numbers (and sometimes have - signs
and powers of ten).

I wrote a simple script to slurp the whole file in 5.00503 and
pull out what I need (based on ideas from the _Perl Cookbook_):

   open(F,"$filename") or die "Cannot open $filename: $!\n";

   undef $/;
   my $file = <F>;
   while ($file =~ /P\s(\-?\d+\.\d*\w?(\+|\-)?\d*)\s/g) {

   }
   close(F);

Under 5.6.1 this script takes forever, specifically

Benchmark: timing 1 iterations of slurp...

Under 5.00503

Benchmark: timing 10000 iterations of slurp...
     slurp: 273 wallclock secs (172.66 usr + 93.23 sys = 265.89 CPU)

If I replace the complicated regexp above (\-?\d+\.\d*\w?(\+|\-)?\d*)
with the simple (\S+), I find for 5.6.1

Benchmark: timing 1000 iterations of slurp...

while for perl5.00503

Benchmark: timing 1000 iterations of slurp...
     slurp: 27 wallclock secs (17.01 usr +  9.36 sys = 26.37 CPU)

5.6.1 is now almost as fast as 5.00503 but still slower....

Any ideas what I am doing wrong?  I will note here that 5.6.1 was
built with gccversion=2.95.2 19991024 (release) while 5.00503 was
built with gccversion=2.8.1 for the same machine (osname=solaris,
osvers=2.6, archname=sun4-solaris).  I get similar results with the
Solaris compiler.

Thanks
Brad Holden



Mon, 29 Sep 2003 08:40:16 GMT  
: One of the scripts I have been working on recently reads in a large
: ASCII file and looks for 12 lines that look like
: P #.###
:
: where #.### are floating point numbers (and sometimes have - signs
: and powers of ten).
:
: I wrote a simple script to slurp the whole file in 5.00503 and
: pull out what I need (based on ideas from the _Perl Cookbook_):

I'm not sure what's causing your slowdown, but there are a few style
issues you may wish to consider.

:    open(F,"$filename") or die "Cannot open $filename: $!\n";

Those double-quotes around $filename are superfluous.


:    undef $/;
:    my $file = <F>;

My preferred idiom for slurping a file looks like

  my $file = do { local $/; <F> };

This avoids leaving $/ undefined for any other reads which may occur
elsewhere in the script.  Maintenance coders (including yourself in a year
or so) will thank you.
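
To make the scoping point concrete, here is a minimal sketch, assuming a throwaway sample file created with File::Temp and a lexical filehandle (neither is in the original script). It slurps the file and then checks that $/ is back to its default newline afterwards.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample file, created only for this illustration.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "P 1.234\nP -5.6e-3\nother text\n";
close $tmp or die "Cannot close $path: $!\n";

open(my $fh, '<', $path) or die "Cannot read $path: $!\n";

# "local $/" undefines the input record separator only inside the do block;
# the previous value is restored automatically when the block exits.
my $file = do { local $/; <$fh> };
close $fh;

# Record whether later line-by-line reads would still behave normally.
my $separator_restored = (defined $/ && $/ eq "\n") ? 1 : 0;

# Count the newlines in the slurped text (list-assignment counting idiom).
my $line_count = () = $file =~ /\n/g;

print "slurped $line_count lines; separator restored: $separator_restored\n";
```

With a bare "undef $/;" instead, the separator would stay undefined for every read that follows in the program.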

:    while ($file =~ /P\s(\-?\d+\.\d*\w?(\+|\-)?\d*)\s/g) {

:    }

Rather than matching and then pushing in a while loop, why not just collect all the matches at once with a list-context match?
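
A minimal sketch of what a loop-free version could look like, assuming the values are wanted in an array. The sample string and the @values name are made up for illustration, and the simple (\S+) capture from the original post stands in for the full pattern.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the slurped file contents.
my $file = "header\nP 1.234\nnoise\nP -5.6e-3\nP 7.0E+2\n";

# In list context, a /g match returns every captured substring at once,
# so no explicit while loop or push is needed.
my @values = $file =~ /P\s(\S+)\s/g;

print "$_\n" for @values;
```

Note that list context returns one item per capture group per match, so a pattern with two sets of capturing parentheses yields two items for each hit.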


As for the regex itself, there are several ways to improve it:

  /P\s(-?\d+\.\d*\w?[-+]?\d*)\s/g
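
Two things differ from the original pattern: the hyphen no longer carries a needless backslash, and the (\+|\-) alternation has become the character class [-+], which also drops the second capture group. A rough sketch of the effect on a list-context match, using invented sample data:

```perl
use strict;
use warnings;

# Hypothetical sample data in the "P #.###" format described in the post.
my $file = "P 1.234\nP -5.6e-3\nP 7.0E+2\n";

# Original pattern: two capture groups, so each hit yields two items
# (the inner one is undef when no exponent sign is present).
my @old = $file =~ /P\s(\-?\d+\.\d*\w?(\+|\-)?\d*)\s/g;

# Revised pattern: one capture group, so each hit yields just the number.
my @new = $file =~ /P\s(-?\d+\.\d*\w?[-+]?\d*)\s/g;

printf "original pattern: %d items\n", scalar @old;
printf "revised pattern:  %d items\n", scalar @new;
```

This says nothing about the speed difference between the two Perl versions; it only shows why the single-group form is easier to use when collecting results.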

:    close(F);

--
   |   Craig Berry - http://www.cinenet.net/~cberry/
 --*--  "When the going gets weird, the weird turn pro."
   |               - Hunter S. Thompson



Sun, 05 Oct 2003 01:08:03 GMT  


> :    open(F,"$filename") or die "Cannot open $filename: $!\n";

> Those double-quotes around $filename are superfluous.

Rather than kill the quotes, even better would be to insert
the langle:

   open(F,"< $filename") or die "Cannot read $filename: $!\n";

(For discussion, see the Ram book, the _Perl Cookbook_, recipes 7.1 and 7.2.)
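
To see why the explicit mode matters, here is a small sketch, assuming a scratch directory from File::Temp and a deliberately hostile "filename" that starts with the write-mode character (both invented for the demonstration).

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory and data file, created only for this illustration.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/data.txt";

open(my $out, '>', $path) or die "Cannot write $path: $!\n";
print {$out} "P 1.234\n";
close $out or die "Cannot close $path: $!\n";

# Hypothetical hostile "filename" that begins with the write-mode character.
my $filename = "> $path";

# With an explicit "<", the rest of the string is taken as a name to read.
# No file has that name, so the open fails and the data is untouched.
my $explicit_ok = open(my $in, "< $filename") ? 1 : 0;
close $in if $explicit_ok;
my $size_after_explicit = (stat $path)[7];

# Without the explicit mode, the leading ">" is obeyed: the file is
# opened for writing and truncated.
my $bare_ok = open(my $bad, $filename) ? 1 : 0;
close $bad if $bare_ok;
my $size_after_bare = (stat $path)[7];

print "explicit '<': opened=$explicit_ok, size=$size_after_explicit\n";
print "bare name:    opened=$bare_ok, size=$size_after_bare\n";
```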

--
John Porter

Any technology distinguishable from magic is insufficiently advanced.



Sun, 05 Oct 2003 01:41:06 GMT  



>> :    open(F,"$filename") or die "Cannot open $filename: $!\n";

>> Those double-quotes around $filename are superfluous.

>Rather than kill the quotes, even better would be to insert
>the langle:

>   open(F,"< $filename") or die "Cannot read $filename: $!\n";

>(For discussion, see the Ram, recipes 7.1, 7.2.)

For perl-5.6.0 or later, you can use the 3-arg open.

    open F,"<",$filename or die "Cannot read $filename: $!\n";
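
As a sketch of what the 3-arg form buys you, assuming a scratch directory from File::Temp and a made-up filename that ends in a space: the 3-arg open takes the name literally, while the 2-arg form strips the trailing whitespace and looks for a different file.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory; the odd filename is hypothetical.
my $dir      = tempdir(CLEANUP => 1);
my $filename = "$dir/report ";    # note the trailing space

open(my $out, '>', $filename) or die "Cannot write '$filename': $!\n";
print {$out} "P 1.234\n";
close $out or die "Cannot close '$filename': $!\n";

# 3-arg open: mode and name are separate, so the name is used as-is.
my $three_arg_ok = 0;
my $first_line;
if (open(my $in, '<', $filename)) {
    $three_arg_ok = 1;
    $first_line   = <$in>;
    close $in;
}

# 2-arg open: trailing whitespace is stripped from the name, so Perl
# looks for "$dir/report", which does not exist.
my $two_arg_ok = open(my $in2, "< $filename") ? 1 : 0;
close $in2 if $two_arg_ok;

print "3-arg open ok: $three_arg_ok\n";
print "2-arg open ok: $two_arg_ok\n";
```

The parenthesis-free call above relies on "or" binding more loosely than the argument list; with "||" the test would attach to $filename instead of to the result of open.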

        -Joe
--
See http://www.inwap.com/ for PDP-10 and "ReBoot" pages.



Fri, 10 Oct 2003 09:47:27 GMT  
 