Q on Guttman-Rosler 
Author Message
 Q on Guttman-Rosler

  k> I want to sort an array of records.  The records are strings
  k> consisting of tab-delimited fields.  The field I want to sort by
  k> represents a float (in ASCII).  Furthermore, the first field
  k> represents the record's (1-based) position in the array.  Therefore,
  k> after sorting the array, I must update the value of the first field
  k> in each record.

  k> Currently, I do something like:

  k>   my $i = 1;


  k>       sort { $b->[5] <=> $a->[5] }

  k> but I would like to use the Guttman-Rosler Transform (packed-default
  k> sort) instead.

  k> It seems to me that the best way to extract the sort field, and to
  k> update the position field afterwards, is to first transform the
  k> record into an anonymous array (as I do above).  The problem is
  k> that I don't know how to turn this anonymous array into a payload
  k> suitable for the packed-default sort, and how to append it to the
  k> packed key.  Any help would be much appreciated.

since you need to change a field later, why not keep the split records
around and just sort on their floats with their index numbers as the
payload.

the paper at http://www.*-*-*.com/
packed sort (either endian) for a float. just append the index number
and that will be used to reorder the input records to output
records. and you can then add the updated index number (similar to what
you do above.

this is rough (untested) code for how i would do it. it could be
converted to the tighter map/sort/map if desired but it is fairly
complex so i would keep it this way.

# this was copied from the sort paper. it could be improved to assign
# into the symbol table one of two different subs. it would make the
# endian test be purely compile time.

BEGIN {
  my $big_endian =
    pack('N', 1) eq
    pack('L', 1);
  sub double_sort ($) {
    ($big_endian ?
      pack 'd', $_[0] :
      reverse pack 'd', $_[0]) ^
      ($_[0] < 0 ? "\xFF" x 8 :
          "\x80" . "\x00" x 7)
  }

Quote:
}

# untested but should work and be faster than the above:

BEGIN {
  no strict 'refs' ;

  *{'double_sort'} = pack('N', 1) eq pack('L', 1) ?
                sub ($) { pack 'd', $_[0] ^
                                ($_[0] < 0 ?
                                        "\xFF" x 8 : "\x80" . "\x00" x 7)
                } :
                sub ($) {
                        reverse pack 'd', $_[0] ^
                                ($_[0] < 0 ?
                                        "\xFF" x 8 : "\x80" . "\x00" x 7)
                } ;
   }

# this just splits the records into a LoL


# this creates a packed sort key based on the float with the index for payload


# this gets the sorted indices and uses them to to index into the
# pre-sorted records. it then assigns the new index number and makes a
# tabbed record out of it.

my $i ;

                my $row = $pre_sort_records[ $_ ] ;
                $row->[0] = ++$i ;


--

----- Stem and Perl Development, Systems Architecture, Design and Coding ----
Search or Offer Perl Jobs  ----------------------------   http://www.*-*-*.com/
Damian Conway Perl Classes - January 2003 -- http://www.*-*-*.com/



Sun, 24 Jul 2005 20:48:37 GMT  
 Q on Guttman-Rosler
Quote:

> I want to sort an array of records.  The records are strings
> consisting of tab-delimited fields.  The field I want to sort by
> represents a float (in ASCII).  Furthermore, the first field
> represents the record's (1-based) position in the array.  Therefore,
> after sorting the array, I must update the value of the first field
> in each record.

[snip]

Try this:

   BEGIN {
      8 == length pack 'd'
         or die "Expected double length to be 8";
   }
   use constant BIGENDIAN => pack('N', 1) eq pack('L', 1);
   my $i = 0;

      grep substr($_, 0, index($_, "\t", 8)-1, ++$i),
      sort
      map {
         my $n = (split /\t/, $_, 6)[5];
         (BIGENDIAN ?
            pack('d', $n) : reverse pack('d', $n))
            ^ ($n < 0 ? "\xFF" x 8 : "\x80")) . $_

   __END__

[untested]

By avoiding subroutine calls, and because the BIGENDIAN ?...:... gets
constant folded so only the code for one of the two branches remains,
this *should* be faster than either of the solutions uri posted.

--
"So, who beat the clueless idiot today?"
"Well, we flipped for it, but when Kuno
 landed, he wasn't in any shape to fight."
"Next time, try flipping a *coin.*"



Mon, 25 Jul 2005 01:22:12 GMT  
 Q on Guttman-Rosler

Quote:


> >Try this:
[some code]
> That's a nice optimization; it is indeed faster than the version
> with the sub calls.  It had three bugs/typos that I've fixed below:

>    BEGIN {
>       8 == length pack 'd'
>          or die "Expected double length to be 8";
>    }
>    use constant BIGENDIAN => pack('N', 1) eq pack('L', 1);
>    my $i = 0;

>       grep substr($_, 0, index($_, "\t", 8), ++$i),

Doh!  Since index() returns a 0 based index, and the third arg of substr
is a length, it's already 1 less than the length, so I didn't need to
subtract 1.  Thanks.

Quote:
>       sort
>       map {
>          my $n = (split /\t/, $_, 6)[5];
>          (BIGENDIAN ?
>             pack('d', $n) : reverse pack('d', $n))
>             ^ ($n < 0 ? "\xFF" x 8 : "\x80" . "\x00" x 7) . $_

I see two changes here -- parens, and the ' . "\x00" x 7' part.  There
is no effect at all from xoring with "\x00" bytes... I *intentionally*
left that out.  As to the parens, I left out a left paren, and you
'fixed' it by removing a right paren.  So, it should be:

         ((BIGENDIAN ?
            pack('d', $n) : reverse pack('d', $n))
            ^ ($n < 0 ? "\xFF" x 8 : "\x80")) . $_

(This may be effectively identical to leaving out those parens, but it's
generally clearer to have more parens than fewer.)

Quote:

>    __END__

> kj

--
"So, who beat the clueless idiot today?"
"Well, we flipped for it, but when Kuno
 landed, he wasn't in any shape to fight."
"Next time, try flipping a *coin.*"


Mon, 25 Jul 2005 19:41:44 GMT  
 
 [ 3 post ] 

 Relevant Pages 

1. looking for guttman scaling algorithm

2. Larry Rosler interview on perl.com!

3. Stanford Perl Mongers 10/26: Larry Rosler

4. Cancelled by Larry Rosler

5. Perl 4a12e Solaris 2.3 build Qs

6. 2 Perl Style Qs

7. : novice Qs: minimal input loops, field specifiers

8. Qs on writing an HTTPS client

9. Multidimensional asociative array syntax Qs

10. Qs on writing an HTTPS client

11. Qs on Perl memory usage on Linux and Win32 systems

12. Qs for professional Perl/CGI developers

 

 
Powered by phpBB® Forum Software