Speed problems with Python vs. Perl

Last week I wrote a simple Python program and found it terribly slow. I
then retried with Perl and got much better performance. The programs simply
read a file and split each line on whitespace, something I have to do very
often for data processing.

Here are the two programs:

----------------------------------------------------------

#!/usr/bin/python

import sys
import re

whitespace = re.compile("\s+")

def main():

    icount = 0
    for line in sys.stdin.readlines():
        icount = icount + 1
        f = whitespace.split(line)
    print "Total lines read: " + `icount`

if __name__ == '__main__':
    main()

--------------------------------------------------------

#!/usr/bin/perl

$icount = 0;

while(<>) {
  $icount++;

  @f = split;   # split the line on whitespace
}

print "Total lines read: $icount\n";

---------------------------------------------------------

I ran the two programs with line splitting enabled, and then again with the
splitting line commented out. Here are the results:

Perl:

with line splitting

Total lines read: 12212
0.34user 0.00system 0:00.34elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (264major+34minor)pagefaults 0swaps

without line splitting

Total lines read: 12212
0.10user 0.00system 0:00.10elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (250major+34minor)pagefaults 0swaps

Python:

with line splitting

Total lines read: 12212
1.93user 0.01system 0:01.94elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (236major+311minor)pagefaults 0swaps

without line splitting

Total lines read: 12212
0.20user 0.00system 0:00.20elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (233major+311minor)pagefaults 0swaps

As you can see, without the splitting (only reading and counting the lines)
Perl is twice as fast, but once I split on whitespace, Python gets more
than 5 times slower than Perl.

These results were obtained on a 650 MHz AMD. On my home machine, a 266 MHz
Celeron, the performance gap is even worse. Without line splitting the
programs are about the same speed, but with splitting enabled the Python
version becomes 10 times slower than the Perl version (5 seconds against
0.5 seconds).

I wonder why this happens. Both languages are interpreted. Is it the
fault of the implementation of the re module? Has anybody had a similar
experience?

Georg Umgiesser




Sun, 14 Sep 2003 23:12:58 GMT  
 Speed problems with Python vs. Perl

Quote:

> ----------------------------------------------------------

> #!/usr/bin/python

> import sys
> import re

> whitespace = re.compile("\s+")

> def main():

>     icount = 0
>     for line in sys.stdin.readlines():
>         icount = icount + 1
>         f = whitespace.split(line)
>     print "Total lines read: " + `icount`

> if __name__ == '__main__':
>     main()

> --------------------------------------------------------

assuming you're using Python 2.1, the following version is
about 16 times faster on my box:

import sys

def main():

    icount = 0
    for line in sys.stdin.xreadlines():
        icount += 1
        f = line.split()
    print "Total lines read", icount

if __name__ == '__main__':
    main()

for a backwards-compatible version of xreadlines, use
the double-loop pattern from:

http://effbot.org/guides/readline-performance.htm
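
roughly, that pattern applied to the script above looks like this (an
untested sketch; it uses string.split so that it also runs on 1.5.2):

import string
import sys

def main():
    icount = 0
    while 1:
        # read a large chunk of lines at a time (sizehint is in bytes)
        lines = sys.stdin.readlines(100000)
        if not lines:
            break
        for line in lines:
            icount = icount + 1
            f = string.split(line)
    print "Total lines read:", icount

if __name__ == '__main__':
    main()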

Cheers /F



Sun, 14 Sep 2003 23:37:33 GMT  
 Speed problems with Python vs. Perl
Did you try avoiding re for this simple task and using the string method
'split' instead (or the string module if you are using 1.5)? It defaults to
splitting on any whitespace and should be faster:
f = line.split()
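
On 1.5.x, the rough equivalent using the string module would be:

import string
f = string.split(line)   # also splits on any run of whitespace by default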

Matthias



Sun, 14 Sep 2003 23:44:53 GMT  
 Speed problems with Python vs. Perl

Quote:
> Last week I wrote a simple Python program and found it terribly slow. I
> then retried with Perl and got much better performance. The programs
> simply read a file and split each line on whitespace, something I have to
> do very often for data processing.

[SNIP]

Quote:
> As you can see, without the splitting (only reading and counting the
> lines) Perl is twice as fast, but once I split on whitespace, Python gets
> more than 5 times slower than Perl.

> These results were obtained on a 650 MHz AMD. On my home machine, a 266
> MHz Celeron, the performance gap is even worse. Without line splitting
> the programs are about the same speed, but with splitting enabled the
> Python version becomes 10 times slower than the Perl version (5 seconds
> against 0.5 seconds).

Why Perl's REs are faster, I'll leave for someone else to explain. Perhaps
Perl special-cases common REs more than Python does. However, you are
suffering from Overuse of Regular Expressions (ORE [tm]). Specifically, you
can use string.split, or in Python 2.0 you can use the split method on
strings:

def main152(): # Should work on Python 1.5.2
    import sys, string
    icount = 0
    for line in sys.stdin.readlines():
        icount = icount + 1
        f = string.split(line)
    print "Total lines read: " + `icount`

or

def main20(): # Should work on python >= 2.0
    icount = 0
    for line in sys.stdin.readlines():
        icount += 1
        f = line.split()
    print "Total lines read: " + `icount`

or

def main21(): # Should? work on Python >= 2.1
    icount = 0
    for line in sys.stdin.xreadlines(): # Might speed things up a bit....
        icount += 1
        f = line.split()
    print "Total lines read: " + `icount`

I haven't actually benchmarked (or even tried) any of these, but I'd be
curious to see how the last one in particular stacks up against its Perl
equivalent. I don't have Perl installed at the moment, though, so I leave
that as an exercise for the reader....

-lazy tim



Sun, 14 Sep 2003 23:51:53 GMT  
 Speed problems with Python vs. Perl

Quote:

> assuming you're using Python 2.1, the following version is
> about 16 times faster on my box:

> import sys

> def main():

>     icount = 0
>     for line in sys.stdin.xreadlines():
>         icount += 1
>         f = line.split()
>     print "Total lines read", icount

> if __name__ == '__main__':
>     main()

You can't necessarily blame this on the line I/O performance, because you
used line.split() where he used the split method on a compiled regular
expression object.  Testing just the difference between those two, I see a
10-fold improvement with line.split().
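
(Roughly the kind of comparison I mean -- just a sketch with a made-up
sample line, not the exact test I ran:)

import re, time

whitespace = re.compile("\s+")
line = "one two  three\tfour five six seven eight nine ten\n"

t = time.clock()
for i in xrange(100000):
    f = whitespace.split(line)
print "re split:    ", time.clock() - t

t = time.clock()
for i in xrange(100000):
    f = line.split()
print "string split:", time.clock() - t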

In this case that is valid, since you are splitting on whitespace in both
Perl and Python, but it would be interesting to see whether Python still
holds up under more complex regular expressions.

I went ahead and used

--- Python Script ---

import re
import sys

r = re.compile(":+")

def main():
    icount = 0
    while 1:
        lines = sys.stdin.readlines(50000)
        if not lines:
            break
        for line in lines:
            icount += 1
            f = r.split(line)

if __name__=="__main__": main()

--- End Python Script ---

vs.

--- Perl Script ---
$icount = 0;

while(<>) {
  $icount++;

  @f = split /:+/;   # split on runs of colons
}

--- End Perl Script ---

with Python 2.0 and Perl 5.6.0 on a Celeron 300A using a 242000 line
input file and I got the following results:

python: 61.10s user 0.42s system 87% cpu 1:10.37 total
perl:   11.31s user 0.10s system 76% cpu 14.960 total

admittedly Python is being hurt by the slower 2.0 line I/O, but I
would guess that regular expressions are still hurting performance.

--

"Parity is for farmers." Seymour Cray on his machines lack of parity
"I guess farmers buy a lot of computers." Seymour Cray on including parity



Mon, 15 Sep 2003 00:08:25 GMT  
 Speed problems with Python vs. Perl

Quote:

> with Python 2.0 and Perl 5.6.0 on a Celeron 300A using a 242000 line
> input file and I got the following results:

> python: 61.10s user 0.42s system 87% cpu 1:10.37 total
> perl:   11.31s user 0.10s system 76% cpu 14.960 total

> admittedly Python is being hurt by the slower 2.0 line I/O, but I
> would guess that regular expressions are still hurting performance.

s/regular expressions/re.split/g

note that re.split is written in Python, while I'm pretty
sure Perl's split is written in C.

to get a better idea of how fast the regular expression
engine itself is, run your benchmark with "findall" instead
of "split".

Cheers /F



Mon, 15 Sep 2003 00:46:14 GMT  
 Speed problems with Python vs. Perl

Quote:
>for a backwards-compatible version of xreadlines, use
>the double-loop pattern from:

>http://effbot.org/guides/readline-performance.htm

Looking at your code there, I saw readline-example-3.py:

# readline-example-3.py
file = open("sample.txt")
while 1:
    lines = file.readlines(100000)
    if not lines:
        break
    for line in lines:
        pass # do something

Then I got to the library reference manual:

readlines ([sizehint])
   Read until EOF using readline() and return a list containing the
   lines thus read. If the optional sizehint argument is present,
   instead of reading up to EOF, whole lines totalling approximately
   sizehint bytes (possibly after rounding up to an internal buffer
   size) are read. Objects implementing a file-like interface may choose
   to ignore sizehint if it cannot be implemented, or cannot be
   implemented efficiently.

I think this must be new in Python 2.0; I really don't remember it from my
1.5.2 days (maybe I overlooked it). So I have two questions:

1) Which platforms implement this? Do I need to check the source code to
find out?
2) Is there any documentation about buffer sizes for different platforms?

Carlos Ribeiro



Mon, 15 Sep 2003 01:33:48 GMT  
 Speed problems with Python vs. Perl

Quote:

> I haven't actually benchmarked (or even tried) any of these, but I'd
> be curious to see how the last one in particular stacks up against
> its Perl equivalent. I don't have Perl installed at the moment,
> though, so I leave that as an exercise for the reader....

$ cat tsplit.py
#!/usr/bin/python
import sys, string

i = 0
for line in sys.stdin.readlines():
    i = i + 1
    f = string.split(line)

print "Total lines read: " + `i`
$ cat tsplit.pl
#!/usr/bin/perl
$icount = 0;

while(<>) {
  $icount++;

  @f = split;
}

print "Total lines read: $icount\n";
$ time ./tsplit.py < manual.txt
Total lines read: 32542

real    0m4.865s
user    0m4.780s
sys     0m0.090s
$ time ./tsplit.pl < manual.txt
Total lines read: 32542

real    0m4.452s
user    0m3.140s
sys     0m0.060s

Fairly comparable.  I don't have a version with xreadlines() yet, so I
can't say how much improvement that would add to the Python times.
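
An xreadlines version would presumably look something like this (an
untested sketch; needs Python 2.1):

#!/usr/bin/python
import sys

i = 0
for line in sys.stdin.xreadlines():
    i = i + 1
    f = line.split()

print "Total lines read: " + `i`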

--
Bob Kline

http://www.rksystems.com



Mon, 15 Sep 2003 01:17:40 GMT  
 


 

 