randomize.c 
Author Message
 randomize.c


Quote:
>The following short C program reads in lines from stdin, randomizes
>their order, and then writes out the newly ordered lines to stdout.

I couldn't resist rewriting it in perl.

#! /usr/bin/perl

$limit = $#lines;
$first = 0;
srand;

while ($first <= $limit) {
   $r = rand($limit - $first) + $first;
   print $lines[$r];
   $lines[$r] = $lines[$first++];

Quote:
}

--




Thu, 16 Feb 1995 11:41:21 GMT  
 randomize.c

Quote:

>   >The following short C program reads in lines from stdin, randomizes
>   >their order, and then writes out the newly ordered lines to stdout.

>   I couldn't resist rewriting it in perl.

>   [lots of lines deleted]


--
Gisle Aas               |  snail: Boks 114 Blindern, N-0314 Oslo, Norway
Norsk Regnesentral      |  X.400: G=Gisle;S=Aas;O=nr;P=uninett;C=no



Fri, 17 Feb 1995 20:36:20 GMT  
 randomize.c

I considered using splice and decided not to take the risk of a
possibly inefficient implementation.  Later I was told that the perl
book already has such a program.

Just how efficient is splice?  I guess it depends on whether arrays in
perl are internally implemented as linked lists.
--




Sat, 18 Feb 1995 02:39:03 GMT  
 randomize.c

Because it's not random?

Try

--

This .signature virus is a self-referential statement that is true - but
you will only be able to consistently believe it if you copy it to your own
.signature file!



Sat, 18 Feb 1995 07:15:32 GMT  
 randomize.c

:

A rand() returns a number *exclusive* of its endpoint, so you'd really want


: I considered using splice and decided not to take the risk of a
: possibly inefficient implementation.  Later I was told that the perl
: book already has such a program.

Unless you know how things are implemented, it's best to try it both
ways and see which is faster.  Most builtins are fairly efficient,
though you can occasionally beat them if you have a problem that's
weird in some way.  This one's weird enough that you might be able
to beat splice by copying the actual strings like you did.  As it happens,
on a sparc the splice wins by about 20%.

In general, you can assume that I haven't been totally stupid.  :-)

: Just how efficient is splice?  I guess it depends on whether arrays in
: perl are internally implemented as linked lists.

Arrays are *not* implemented as linked lists.  That would be tragic.
It would force all subscripting operations (including splice) to be O(n).
Yech.

Arrays are implemented in C as an array of pointers to scalar headers
which in turn point to the actual values.  Splice merely needs to copy
some number of pointers to fill the hole.  It figures out whether to
copy up from the front or down from the back.  It's true that that's
also O(n), or maybe O(n/2) or so, but machines are often optimized to
move blocks of memory quickly, and a splice on the middle of an array
is fairly rare as operations go.  Subscripting, however, happens
constantly, so we optimize for that.

Larry



Sun, 19 Feb 1995 05:03:32 GMT  
 randomize.c

Quote:
>This one's weird enough that you might be able
>to beat splice by copying the actual strings like you did.  As it happens,
>on a sparc the splice wins by about 20%.

Taking this advice, I went ahead and tried it!  I ran my benchmarks on
a SPARCstation-1+ with 64 megs of main memory.  In addition to
"ize.copy" (my original perl version, which copies lines) and "splice"
(splice version, from camel book p20), I also tried "ize.ind", which
only copies integers, at the cost of an extra level of indirection for
subscripting.

This turned out to be quite a bit of fun.

     program     no. of data lines     real time, min:sec
     -------     ----------            ------------------
     ize.copy      1,914               0:01.42
     ize.ind       1,914               0:01.35
     splice        1,914               0:01.06
     ize.copy     78,474               0:53
     ize.ind      78,474               1:01
     splice       78,474               6:51

For typical uses (small amount of input), there isn't much difference.
For atypical uses (e.g., randomizing 40 csh manuals), the traditional
array method wins over splice.

Session transcript follows, and then sources for each program.

===== begin edited session log =====
% ls -F
ize.copy*  ize.ind*   splice*
% man csh | col -b > small.data
% repeat 40 cat small.data >> big.data
% wc -l *.data
   76560 big.data
    1914 small.data
   78474 total
% time ize.copy < small.data > junk
1.1u 0.2s 0:01.41 97.1% 0+430k 3+12io 0pf+0w
% time ize.ind < small.data > junk
1.1u 0.2s 0:01.36 101.4% 0+468k 0+12io 0pf+0w
% time splice < small.data > junk
0.8u 0.2s 0:01.06 102.8% 0+422k 0+8io 0pf+0w
% time ize.copy < small.data > junk
1.1u 0.2s 0:01.35 101.4% 0+429k 0+9io 0pf+0w
% time ize.ind < small.data > junk
1.1u 0.2s 0:01.34 101.4% 0+469k 0+9io 0pf+0w
% time splice < small.data > junk
0.8u 0.2s 0:01.06 102.8% 0+420k 0+9io 0pf+0w
% time ize.copy < big.data > junk
43.3u 8.5s 0:52.76 98.2% 0+8217k 1+363io 0pf+0w
% time ize.ind < big.data > junk
42.8u 17.5s 1:01.76 97.8% 0+10278k 2+361io 0pf+0w
% time splice < big.data > junk
379.8u 14.9s 6:52.05 95.7% 0+8861k 3+384io 0pf+0w
% time ize.copy < big.data > junk
43.1u 8.8s 0:52.94 98.2% 0+8179k 7+362io 0pf+0w
% time ize.ind < big.data > junk
42.9u 15.5s 0:59.25 98.7% 0+10226k 4+362io 0pf+0w
% time splice < big.data > junk
379.4u 12.5s 6:51.03 95.3% 0+8853k 10+385io 0pf+0w

===== begin ize.copy =====
#! /usr/bin/perl

$limit = $#lines;
$first = 0;
srand;

while ($first <= $limit) {
   $r = rand($limit - $first) + $first;
   print $lines[$r];
   $lines[$r] = $lines[$first++];

Quote:
}

===== begin ize.ind =====
#! /usr/bin/perl


$first = 0;
srand;

while ($first <= $limit) {
   $r = rand($limit - $first) + $first;
   print $lines[$ind[$r]];
   $ind[$r] = $first++;

Quote:
}

===== begin splice =====
#! /usr/bin/perl
# from camel book p20



Quote:
}

===== end =====
--




Sun, 19 Feb 1995 16:54:15 GMT  
 randomize.c

Quote:
>      ......                      Subscripting, however, happens
>constantly, so we optimize for that.

Are associative arrays optimized?  For example, many times I do:

if ( $foo{$thing} ne $frob ) {
        $foo{$thing} += $fratz;

Quote:
}

Is there a one-entry cache or does it traverse the hash tables twice?

Tom
--

"Oh! I thought it was one of those useless demos of everything that
    a GUI builder could do." -Anonymous person watching demo of
         Solaris 2.0's graphical tool for managing NIS+



Sun, 19 Feb 1995 21:50:55 GMT  
 randomize.c

:
: >      ......                      Subscripting, however, happens
: >constantly, so we optimize for that.
:
: Are associative arrays optimized?  For example, many times I do:
:
: if ( $foo{$thing} ne $frob ) {
:       $foo{$thing} += $fratz;
: }
:
: Is there a one-entry cache or does it traverse the hash tables twice?

Well, traverse is perhaps a misleading word, since it usually lands
right on the entry in question.  But it does calculate the hash value
twice.  It doesn't know offhand that the value of $thing hasn't changed
in the meanwhile.  It could be made to.  But you lose most of the
advantage if you have to compare the two strings to find out.  The right
way to do that is to associate the hash value with the string, and
invalidate it if the string changes.  I'll need to redo how scalars
are stored for Perl 5 anyway, so something like that might sneak in.
Certainly an evaled string ought to cache its parse tree, and this is
quite similar.

Larry



Tue, 21 Feb 1995 03:05:04 GMT  
 
 [ 9 post ] 

 Relevant Pages 

1. How to randomize a list?

2. Summary: how to randomize a list

3. randomize list

4. Randomize in Perl

5. Randomize in Perl

6. Randomize number between -0.5 and 0.5

7. randomized file "loader"

8. randomizing results from a mysql database

9. Randomizing a list

10. Subject : randomizing a hash

11. Randomize array.....

12. Randomize Please Help

 

 
Powered by phpBB® Forum Software