sorting efficiency in perl 
Author Message
 sorting efficiency in perl

I'm sorting a fairly large (200000+ records) file in perl (and then
doing some other stuff to it...).  It seems to be taking a long time,
but I'm not sure if it is overly long.  Does someone know if I'd be
better off: 1) moving to some other language, 2) using system("sort ..
3) some other option that I don't know about, 4) going out for coffee.
More details - I'm sorting on a 5 field numeric key, so I've written a
little subroutine for the sort logic.

BTW - I've pushed option 4 about as far as it will go.

Most answers will be much appreciated :)

Thanks,

Mike



Sun, 29 Dec 1996 01:29:00 GMT  
 sorting efficiency in perl


Quote:

>I'm sorting a fairly large (200000+ records) file in perl (and then
>doing some other stuff to it...).  It seems to be taking a long time,
>but I'm not sure if it is overly long.  Does someone know if I'd be
>better off: 1) moving to some other language, 2) using system("sort ..
>3) some other option that I don't know about, 4) going out for coffee.
>More details - I'm sorting on a 5 field numeric key, so I've written a
>little subroutine for the sort logic.

Benchmark method 2 and 4 with a watch! The answer to your question
depends very much on system ressources, so no universal answer is
available to your question. Perl uses qsort internally. Your system
provides qsort. Your system provides virtual memory, which may be
implemented better or worse. As long as you can sort the entire array
within memory, external sort should not gain anything. But things can
become disastrous, if your VM starts doing the wrong thing.

Of course, your subroutine might be the reason, too. I don't know.

--andreas



Sun, 29 Dec 1996 15:52:18 GMT  
 sorting efficiency in perl


Quote:

>I'm sorting a fairly large (200000+ records) file in perl (and then
>doing some other stuff to it...).  It seems to be taking a long time,
>but I'm not sure if it is overly long.  Does someone know if I'd be
>better off: 1) moving to some other language, 2) using system("sort ..
>3) some other option that I don't know about, 4) going out for coffee.
>More details - I'm sorting on a 5 field numeric key, so I've written a
>little subroutine for the sort logic.

Tom Christiansen showed me a great way to sort lots of data.  In my
situation I was sorting large arrays of OIDs.  He (and others) packed
each OID down to a number and pushed it onto array. Then create yet
another array from 1 to the number of OIDs (this array hence refered
to as indexes).  These indexes are then sorted according to the packed
representation of the OID, and the resulting array is used to index
the array containing the original OID data.

Very similiar to an indirect sort (see Algorithms by Sedgewick page 104).

I hope this helps.

(raig

--
-------------------------------------------------------------
Craig H. Smith                "Try not. Do, or do not.
Graduate Student               There is no try." - Yoda
University of Oregon

-------------------------------------------------------------
'Discussion is an exchange of knowledge;
   argument an exchange of ignorance.' - Robert Quillen
-------------------------------------------------------------



Sun, 29 Dec 1996 22:48:03 GMT  
 sorting efficiency in perl

"(raig'o> Tom Christiansen showed me a great way to sort lots of data.  In my
"(raig'o> situation I was sorting large arrays of OIDs.  He (and others) packed
"(raig'o> each OID down to a number and pushed it onto array. Then create yet
"(raig'o> another array from 1 to the number of OIDs (this array hence refered
"(raig'o> to as indexes).  These indexes are then sorted according to the packed
"(raig'o> representation of the OID, and the resulting array is used to index
"(raig'o> the array containing the original OID data.

"(raig'o> Very similiar to an indirect sort (see Algorithms by Sedgewick page 104).

And also *verrrry* similar to what I just posted about sorting IP
addresses in a parallel thread.

require "rod-serling.pl"; # :-)

print "Just another Perl hacker," # but not what the media calls "hacker!" :-)

--
Name: Randal L. Schwartz / Stonehenge Consulting Services (503)777-0095
Keywords: Perl training, UNIX[tm] consulting, video production, skiing, flying

Phrase: "Welcome to Portland, Oregon ... home of the California Raisins!"



Mon, 30 Dec 1996 00:18:43 GMT  
 sorting efficiency in perl

[..some deleted..]
: And also *verrrry* similar to what I just posted about sorting IP
: addresses in a parallel thread.

: Name: Randal L. Schwartz / Stonehenge Consulting Services (503)777-0095
: Keywords: Perl training, UNIX[tm] consulting, video production, skiing, flying

: Phrase: "Welcome to Portland, Oregon ... home of the California Raisins!"

and vary (sic) similar to what I came up with (thanks for the clues Randal).
I'm seeing about a 3x speed increase.

Thanks to everyone who posted and emailed me with suggestions - they
were all very helpful.

Mike



Mon, 30 Dec 1996 02:54:42 GMT  
 
 [ 5 post ] 

 Relevant Pages 

1. re-sorting a sorted list (and sorting by date)

2. efficiency in data-processing perl script ...

3. Interpreted or compiled -- Perl efficiency

4. references for perl efficiency

5. long string efficiencies in perl

6. Efficiency/New Perl Book: return @array vs \@array?

7. perl memory efficiency

8. Efficiency/New Perl Book: return @array vs \@array

9. Perl efficiency question

10. efficiency, perl vs C

11. perl efficiency tip

12. references for perl efficiency

 

 
Powered by phpBB® Forum Software