Speed Tweaks 
 Speed Tweaks

I'm writing a very simple program to do a very simple thing.  I've got a
data file containing three floating point numbers per line.  (Describing
a point in 3D space.)  I want to find the bounding box which contains
them all.  Trivial stuff, except that I've got about 200MB of data
points.

Currently my python script takes 1:41 to process a 650,000 point file,
whereas a compiled C version takes 9 seconds.  How close can Python get
to C's performance in this task?  What's the big bottleneck?

ObILovePython: I don't mean to use this post to say that Python sucks.
I love Python.  I'm responsible for my company's now-widespread use of
it.  I'm just curious how it can be tweaked in this case.

===========================================================
#!/usr/bin/env python

import sys,string

def getBBox(input):
    max = map(float,string.split(input.readline()))
    min = max[:]
    line = input.readline()
    while (line):
        line = map(float,string.split(line))
        for i in range(0,len(line)):
            if line[i] < min[i]:
                min[i] = line[i]
            if line[i] > max[i]:
                max[i] = line[i]
        line = input.readline()
    print min
    print max

if __name__ == "__main__":
    if len(sys.argv) == 1:
        print "Usage:",sys.argv[0],"[inputfiles]"
        sys.exit(-1)

    for arg in sys.argv[1:]:
        file = open(arg,'r')
        getBBox(file)

===========================================================
#include <stdio.h>
#include <string.h>

void minmax(FILE *fd) {
  double line[3];
  double min[3];
  double max[3];
  fscanf(fd,"%lf %lf %lf",&min[0],&min[1],&min[2]);
  max[0]=min[0]; max[1]=min[1]; max[2]=min[2];

  while( EOF != fscanf(fd,"%lf %lf %lf",&line[0],&line[1],&line[2]) ) {
    for (int j=0; j<3; j++) {
      if (line[j] < min[j]) {
        min[j] = line[j];
      }
      if (line[j] > max[j]) {
        max[j] = line[j];
      }
    }
  }

  printf("%lf %lf %lf\n",min[0],min[1],min[2]);
  printf("%lf %lf %lf\n",max[0],max[1],max[2]);

}

int main(int ARGC, char **ARGV) {
  if (ARGC == 1) {
    minmax(stdin);
  }
  else for (int i=1; i<ARGC; i++) {
    printf("%s\n",ARGV[i]);
    FILE *fd = fopen(ARGV[i],"r");
    minmax(fd);
  }

}



Mon, 19 May 2003 03:00:00 GMT  
 Speed Tweaks

Quote:

>I'm writing a very simple program to do a very simple thing.  I've got a
>data file containing three floating point numbers per line.  (Describing
>a point in 3D space.)  I want to find the bounding box which contains
>them all.

There are a couple of things to do.  The performance boost which comes
up most often is to use readlines with a sizehint instead of readline:

  while 1:
    lines = input.readlines(1000000)
    if not lines:
        break
    for line in lines:
      ...

Another thing to try is to use the built-in min and max functions
rather than doing the tests yourself.  (Note that the original code
calls its lists min and max, which shadow those builtins, so they'd
need renaming, say to min_val and max_val.)  As in:

  min_val[i] = min(min_val[i], line[i])
  max_val[i] = max(max_val[i], line[i])

Since there are only three values, you don't need the generality of
looping over range(len(line)).  Instead, you could use local variables.
The lookup time for these is much faster.

  x, y, z = map(float, string.split(line))
  min_x = min(min_x, x)
  max_x = max(max_x, x)
   ...

Another thing to try is to use min's and max's ability to work
across a whole list of elements:

def getBBox(input):
  data = map(float,string.split(input.readline()))
  min_x = max_x = data[0]
  min_y = max_y = data[1]
  min_z = max_z = data[2]

  while 1:
    lines = input.readlines(1000000)
    if not lines:
        break
    data = map(string.split, lines)
    x_data = map(lambda coord, float=float: float(coord[0]), data)
    min_x = min(min_x, min(x_data))
    max_x = max(max_x, max(x_data))
    y_data = map(lambda coord, float=float: float(coord[1]), data)
    min_y = min(min_y, min(y_data))
    max_y = max(max_y, max(y_data))
    z_data = map(lambda coord, float=float: float(coord[2]), data)
    min_z = min(min_z, min(z_data))
    max_z = max(max_z, max(z_data))

  print [min_x, min_y, min_z]
  print [max_x, max_y, max_z]

Of course, I haven't actually tested the performance of these
suggestions...

                    Andrew



Mon, 19 May 2003 03:00:00 GMT  
 Speed Tweaks

Quote:
> I'm writing a very simple program to do a very simple thing.  I've got a
> data file containing three floating point numbers per line.  (Describing
> a point in 3D space.)  I want to find the bounding box which contains
> them all.  Trivial stuff, except that I've got about 200MB of data
> points.

> Currently my Python script takes 1:41 to process a 650,000 point file,
> whereas a compiled C version takes 9 seconds.  How close can Python get
> to C's performance in this task?  What's the big bottleneck?

> ===========================================================
> #!/usr/bin/env python

> import sys,string

> def getBBox(input):
>     max = map(float,string.split(input.readline()))
>     min = max[:]
>     line = input.readline()
>     while (line):
>         line = map(float,string.split(line))
>         for i in range(0,len(line)):
>             if line[i] < min[i]:
>                 min[i] = line[i]
>             if line[i] > max[i]:
>                 max[i] = line[i]
>         line = input.readline()
>     print min
>     print max

> if __name__ == "__main__":
>     if len(sys.argv) == 1:
>         print "Usage:",sys.argv[0],"[inputfiles]"
>         sys.exit(-1)

>     for arg in sys.argv[1:]:
>         file = open(arg,'r')
>         getBBox(file)

> ===========================================================

using Numeric is a great way to munch on big arrays like this.
also, as we've heard so many times, using "readline()" on large
textfiles is a killer.

this is just a rough job off the top of my head. it should run
a bit faster, but i'm sure there's still a lot of room for
improvement. btw, this is for python 2.0

====================================================
#!/usr/bin/env python

import sys
from Numeric import *

def getBBox(input):
    data = array(map(float, input.read().split()), Float)
    x_values = data[::3]
    y_values = data[1::3]
    z_values = data[2::3]
    range_x = min(x_values), max(x_values)
    range_y = min(y_values), max(y_values)
    range_z = min(z_values), max(z_values)

    print range_x[0], range_y[0], range_z[0]
    print range_x[1], range_y[1], range_z[1]

if __name__ == "__main__":
    if len(sys.argv) == 1:
        print "Usage:", sys.argv[0], "[inputfiles]"
        sys.exit(-1)

    for arg in sys.argv[1:]:
        getBBox(open(arg))
========================================================

sadly, i haven't tested this out, so there might be some
typos.. (still seems ok after a second look). the 'fancy'
slicing is just indexing every 3rd value in the list of numbers.

instead of slicing, you could reshape the array and transpose
it so data[0] would be all the X values, data[1] would be all
the Y values. but i thought the method above would be a little
clearer.
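
for completeness, here's a rough (and equally untested) sketch of that
reshape/transpose variant, using the same Numeric calls as above:

====================================================
#!/usr/bin/env python
# untested sketch of the reshape/transpose variant described above
import sys
from Numeric import *

def getBBox(input):
    # read everything, convert to floats, build one flat array
    data = array(map(float, input.read().split()), Float)
    # reshape to Nx3, then transpose so data[0] is all the X values,
    # data[1] all the Y values, data[2] all the Z values
    data = transpose(reshape(data, (-1, 3)))
    print min(data[0]), min(data[1]), min(data[2])
    print max(data[0]), max(data[1]), max(data[2])
====================================================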

if you're doing anything with 3D i think Numeric will be
pretty important for chewing through all the numbers.

good luck! i'd like to hear if this works, and what kind of
speed it gets. like i mentioned, it can probably be tuned
even further, just have to get the 'gurus' on it. :]



Mon, 19 May 2003 03:00:00 GMT  
 Speed Tweaks

Let me throw in a suggestion of my own piled on top of Andrews....


Quote:


> >I'm writing a very simple program to do a very simple thing.  I've got a
> >data file containing three floating point numbers per line.  (Describing
> >a point in 3D space.)  I want to find the bounding box which contains
> >them all.

> There are a couple of things to do.  The performance boost which comes
> up most often is to use readlines with a sizehint over readline,

[SNIP]

1) get and install Numeric

import Numeric, string
def getBBox(input):
    data = map(float,string.split(input.readline()))
    min_d = max_d = Numeric.array(data)
    while 1:
        lines = input.readlines(1000000)
        if not lines:
            break
        data = map(float, string.split(string.join(lines))) # gets individual #s
        adata = Numeric.reshape(data, (-1, 3)) # This gives an Nx3 array
        tmin = Numeric.minimum.reduce(adata, 0)
        tmax = Numeric.maximum.reduce(adata, 0)
        min_d = Numeric.minimum(min_d, tmin)
        max_d = Numeric.maximum(max_d, tmax)
    print min_d
    print max_d

Quote:
> Of course, I haven't actually tested the performance of these
> suggestions...

Me neither....

-tim



Mon, 19 May 2003 03:00:00 GMT  
 Speed Tweaks

Quote:

> I'm writing a very simple program to do a very simple thing.  I've got a
> data file containing three floating point numbers per line.  (Describing
> a point in 3D space.)  I want to find the bounding box which contains
> them all.  Trivial stuff, except that I've got about 200MB of data
> points.

> Currently my Python script takes 1:41 to process a 650,000 point file,
> whereas a compiled C version takes 9 seconds.  How close can Python get
> to C's performance in this task?  What's the big bottleneck?

Not sure, but, first, we can make input a bit faster.  I measured the
runtimes on a 1000-random-points file with time.clock calls around
the getBBox call, repeating the file 3 times, and with your chosen
input idiom I see (on this old Win98 box):

D:\Python2>python bb.py bb.txt
bb: processed 9999 points in file bb.txt (time 1.791896)
bb: processed 9999 points in file bb.txt (time 1.733645)
bb: processed 9999 points in file bb.txt (time 1.705533)

D:\Python2>python bb.py bb.txt
bb: processed 9999 points in file bb.txt (time 1.921506)
bb: processed 9999 points in file bb.txt (time 1.766448)
bb: processed 9999 points in file bb.txt (time 1.748607)

i.e. a best-time of around 1.70 seconds; by changing the
processing loop to:

    while 1:
        lines = input.readlines(32000)
        if not lines: break
        for line in lines:
            npo = npo+1
            line = map(float,string.split(line))
            for i in range(0,len(line)):
                if line[i] < min[i]:
                    min[i] = line[i]
                if line[i] > max[i]:
                    max[i] = line[i]

I now see:

D:\Python2>python bb1.py bb.txt
bb1: processed 9999 points in file bb.txt (time 1.581872)
bb1: processed 9999 points in file bb.txt (time 1.563185)
bb1: processed 9999 points in file bb.txt (time 1.545119)

D:\Python2>python bb1.py bb.txt
bb1: processed 9999 points in file bb.txt (time 1.685102)
bb1: processed 9999 points in file bb.txt (time 1.585440)
bb1: processed 9999 points in file bb.txt (time 1.613935)

i.e., a best-time of about 1.55 seconds, about 10% less.

Then, we can eliminate some overhead by tweaking the
by-line loop body to:

        line = map(float,line.split())
        for i in 0,1,2:
            if line[i] < min[i]:
                min[i] = line[i]
            elif line[i] > max[i]:
                max[i] = line[i]

and we now measure:

D:\Python2>python bb2.py bb.txt
bb2: processed 9999 points in file bb.txt (time 1.581455)
bb2: processed 9999 points in file bb.txt (time 1.350018)
bb2: processed 9999 points in file bb.txt (time 1.319527)

D:\Python2>python bb2.py bb.txt
bb2: processed 9999 points in file bb.txt (time 1.484811)
bb2: processed 9999 points in file bb.txt (time 1.355571)
bb2: processed 9999 points in file bb.txt (time 1.369087)

a best-time of 1.35, another 10% less from the original.

Now some more tweaking: unrolling the inner-loop and
saving a few lookups-in-global-space...:

    m=map; f=float
    xmax,ymax,zmax = m(f,input.readline().split())
    xmin,ymin,zmin = xmax,ymax,zmax
    while 1:
        lines = input.readlines(32000)
        if not lines: break
        for line in lines:
            npo = npo+1
            x,y,z = m(f,line.split())
            if x>xmax: xmax=x
            elif x<xmin: xmin=x
            if y>ymax: ymax=y
            elif y<ymin: ymin=y
            if z>zmax: zmax=z
            elif z<zmin: zmin=z
    return npo, (xmin,ymin,zmin), (xmax,ymax,zmax)

D:\Python2>python bb3.py bb.txt
bb3: processed 9999 points in file bb.txt (time 1.129470)
bb3: processed 9999 points in file bb.txt (time 1.126154)
bb3: processed 9999 points in file bb.txt (time 1.107795)

D:\Python2>python bb3.py bb.txt
bb3: processed 9999 points in file bb.txt (time 1.340585)
bb3: processed 9999 points in file bb.txt (time 1.144205)
bb3: processed 9999 points in file bb.txt (time 1.143439)

This gives a best-time of 1.11, another saving of over 10%.

No doubt more can be done, and we're still far from C
performance, but these simple, typical tweaks cut the
minimum-of-6-runs time from 1.70 to 1.11 seconds, a
reduction of over a third.
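
(For reference, the timing harness is not shown above; the following is
just a sketch that matches the output format.  It assumes getBBox starts
with npo = 0, counts each point, and returns npo and the two corners as
in the last version; the loop variable and the "bb3" label are purely
illustrative.)

import sys, time

def main():
    for arg in sys.argv[1:]:
        for rep in range(3):                  # three timed passes per file
            input = open(arg, 'r')
            start = time.clock()
            npo, mins, maxs = getBBox(input)  # as returned by the last version
            stop = time.clock()
            input.close()
            print "bb3: processed %d points in file %s (time %f)" % (
                npo, arg, stop - start)

if __name__ == '__main__':
    main()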

Alex



Tue, 20 May 2003 07:04:54 GMT  
 Speed Tweaks

Quote:

> I'm writing a very simple program to do a very simple thing.  I've got a
> data file containing three floating point numbers per line.  (Describing
> a point in 3D space.)  I want to find the bounding box which contains
> them all.  Trivial stuff, except that I've got about 200MB of data
> points.

> Currently my Python script takes 1:41 to process a 650,000 point file,
> whereas a compiled C version takes 9 seconds.  How close can Python get
> to C's performance in this task?  What's the big bottleneck?

After some more twiddling, I arrived at the following code, which processes a
648,000 point file in ~30 seconds versus ~20 seconds for the C code. (I
apparently have a slower machine than you.) If I recklessly extrapolate to
your machine (30 s x 9/20, roughly 14 s), it should run in ~14 seconds. Of
course, I don't guarantee that it's actually correct.....

-tim

import Numeric, string

def getBBox(input):
    data = map(float, string.split(input.readline()))
    min_d = max_d = Numeric.array(data)
    extra = ""
    while 1:
        raw = input.read(2000)
        if not raw and not extra:
            break
        if raw:
            lastN = string.rfind(raw, "\n")
            if lastN == -1:
                # no newline in this chunk yet; keep accumulating
                extra = extra + raw
                continue
            raw, extra = extra + raw[:lastN+1], raw[lastN+1:]
        else:
            # file exhausted; flush whatever partial line is left
            raw, extra = extra, ""
        data = map(float, string.split(raw))
        if not data:
            continue
        adata = Numeric.reshape(data, (-1, 3)) # This gives an Nx3 array
        tmin = Numeric.minimum.reduce(adata, 0)
        tmax = Numeric.maximum.reduce(adata, 0)
        min_d = Numeric.minimum(min_d, tmin)
        max_d = Numeric.maximum(max_d, tmax)
    print min_d
    print max_d



Tue, 20 May 2003 08:34:27 GMT  
 Speed Tweaks
Tim's version is twice as fast as my version. Duh :-)

__Janko

--
  Institut fuer Meereskunde             phone: 49-431-597 3989
  Dept. Theoretical Oceanography        fax  : 49-431-565876

  24105 Kiel, Germany



Tue, 20 May 2003 09:17:24 GMT  
 Speed Tweaks
Have found something faster :-). It uses sscanf found on
ftp.python.org.

Time for fminmax
[  6.83916778e-06   8.81031156e-05   5.12786246e-05]
[ 0.99991876  0.99989396  0.99999774]
3.40321207047
Time for getBBox
[6.8391677814400002e-06, 8.8103115558599995e-05, 5.12786245963e-05]
[0.99991875886899995, 0.99989396333699998, 0.99999773502300005]
5.61707508564

from Numeric import *
import RandomArray
import sscanf
import time

def write_data():
    fo=open('test.dat','w')
    for i in range(10):
        a=RandomArray.random((2000,3))
        for line in range(a.shape[0]):
            fo.write(' '.join( map(str,a[line,:])))
            fo.write('\n')
    fo.close()

def fminmax(input):
    minres=[]
    maxres=[]
    tstart=time.time()
    while 1:
        a=[]
        lines=input.readlines(10000)
        if not lines:
            break
        for line in lines:
            a.append(sscanf.sscanf(line,'%f %f %f'))
        minres.append(minimum.reduce(array(a),0))
        maxres.append(maximum.reduce(array(a),0))
    print minimum.reduce(array(minres),0)
    print maximum.reduce(array(maxres),0)
    print time.time()-tstart

import sys,string

def getBBox(input):
    tstart=time.time()
    max = map(float,string.split(input.readline()))
    min = max[:]
    line = input.readline()
    while (line):
        line = map(float,string.split(line))
        for i in range(0,len(line)):
            if line[i] < min[i]:
                min[i] = line[i]
            if line[i] > max[i]:
                max[i] = line[i]
        line = input.readline()
    print min
    print max
    print time.time()-tstart

if __name__ == '__main__':

    write_data()
    input=open('test.dat','r')
    print 'Time for fminmax'
    fminmax(input)
    input.close()
    input=open('test.dat','r')
    print 'Time for getBBox'
    getBBox(input)

HTH,
__Janko

--
  Institut fuer Meereskunde             phone: 49-431-597 3989
  Dept. Theoretical Oceanography        fax  : 49-431-565876

  24105 Kiel, Germany



Tue, 20 May 2003 09:09:54 GMT  
 