Author 
Message 
Mark Jason Domin #1 / 11

Median of Three
In an early article about programming style, Kernighan and Plaugher look at an example where the programmer is trying to compute the minimum of three values, approximately as follows: if ($x > $y) { if ($y > $z) { $min = $z; } else { $min = $y; } } else { if ($x > $z) { $min = $z; } else { $min = $x; } } They point out that you can make the code much shorter and simpler like this: $min = $x; if ($min > $y) { $min = $y } if ($min > $z) { $min = $z } This also generalizes better to more than three items. Suppose you want to find the median of three values instead of the minimum. Is there an analogous short version?
rd ($p{$_})&6];$p{$_}=/ ^$P/ix?$P:close$_}keys%p}p;p;p;p;p;map{$p{$_}=~/^[P.]/&& close$_}%p;wait until$?;map{/^r/&&<$_>}%p;$_=$d[$q];sleep rand(2)if/\S/;print

Wed, 05 May 2004 12:18:22 GMT 


Ben Pfaf #2 / 11

Median of Three
Quote: > In an early article about programming style, Kernighan and Plaugher > look at an example where the programmer is trying to compute the > minimum of three values, approximately as follows: > if ($x > $y) { > if ($y > $z) { > $min = $z; > } else { > $min = $y; > } > } else { > if ($x > $z) { > $min = $z; > } else { > $min = $x; > } > } > They point out that you can make the code much shorter and simpler > like this: > $min = $x; > if ($min > $y) { $min = $y } > if ($min > $z) { $min = $z } > This also generalizes better to more than three items. > Suppose you want to find the median of three values instead of the > minimum. Is there an analogous short version?
I think the answer to this question is "no". I will try to back up that answer with some lessthanairtight arguments. In the process, I'll assume that x, y, and z all have different values. This is mostly a mathematical or computer science question, not a programming or Perl question, so I'll use languageneutral notations in my discussion. First: Suppose that we compare x and y, as in the outer comparison in the long form above. Afterward, we are able to eliminate either x or y as the minimum value. But regardless of the result, either could still be the median, so we don't learn as much from a single comparison in the median case. Second (really just a restatement of the first): There are six possible orderings for a triple of different real numbers: x < y < z x < z < y z < x < y y < x < z y < z < x z < y < x Suppose again that we first compare x and y, as in either form above. Whether we are looking for the minimum or the median, this comparison eliminates either the first or the second line of possibilities. If x < y, then there are only two possibilities left for the minimum (either x or z), so we can decide in one more comparison; but there are still three possibilities left for the median, so we need two more comparisons. So the median is inherently more difficult to compute than the minimum, even if there are only three values, and we can't really expect that the code for it will be as simple. Third: Suppose that we're using an array `p' of three elements p(0), p(1), p(2), instead of three discrete values. Let each of the Boolean variables a, b, c the value 1 if the expression p(0) < p(1), p(0) < p(2), p(1) < p(2), respectively, is true, 0 otherwise. Finally, let the output be the 2bit index into `p', one of 00, 01, or 10 (representing 0, 1, or 2), of the minimum or the median, as desired. Adopting the typical digital logic text notation of juxtaposition as logical AND, suffix of ' as logical NOT, and addition as logical OR, we come up with the following as minimal sumofproducts solutions for the MSB and LSB of the minimum: MSB: b'c' LSB: a'c For the median, on the other hand: MSB: b'c + bc' LSB: a'b'c' + abc This looks to me like further evidence that median is more complex than minimum, even among only three choices. *shrug* At this point I'm pretty convinced that there's no simple solution. Maybe someone cleverer than me can come up with one. I'd be surprised, but in a pleasant way. 
Stanford PhD Student  MSU Alumnus  Debian Maintainer  GNU Developer Personal webpage: http://www.msu.edu/~pfaffben

Wed, 05 May 2004 14:25:18 GMT 


Keith Kell #3 / 11

Median of Three
BEGIN PGP SIGNED MESSAGE Hash: SHA1
Quote: > In an early article about programming style, Kernighan and Plaugher > look at an example where the programmer is trying to compute the > minimum of three values, approximately as follows: [snip] > They point out that you can make the code much shorter and simpler > like this: > $min = $x; > if ($min > $y) { $min = $y } > if ($min > $z) { $min = $z } > This also generalizes better to more than three items. > Suppose you want to find the median of three values instead of the > minimum. Is there an analogous short version?
This should work, but I don't think it's easily generalizable: $i = $x; $j = $y; if ($x > $y) {$i = $y; $j = $x;} if ($i > $z) { $med = $i } if ($j > $z) { $med = $z } $med = $j; The difficulty lies in the difference between the median and the minimum: the minimum is less than everything, so it's easier to generalize than the median, which is less than some items but greater than others. Thus the extra comparison (and an additional implicit comparison) and dummy variables. I believe the above will work if any or all of the values are equal, which is quite nice.  keith  
public key: http://wombat.sanfrancisco.ca.us/kkeller/public_key alt.os.linux.slackware FAQ: http://wombat.sanfrancisco.ca.us/perl/fom BEGIN PGP SIGNATURE Version: GnuPG v1.0.4 (GNU/Linux) Comment: For info see http://www.gnupg.org iEYEARECAAYFAjv2FFcACgkQhVcNCxZ5ID/sHQCfVGHc33EtbZJbeeZ1E6o8FkfP pk4AmgInRaQ0INa4aWpAcVhDi0x7axqi =S7jm END PGP SIGNATURE

Wed, 05 May 2004 15:40:24 GMT 


James Wetter #4 / 11

Median of Three
Quote: > ...They point out that you can make the code much shorter and simpler >like this: > $min = $x; > if ($min > $y) { $min = $y } > if ($min > $z) { $min = $z } >This also generalizes better to more than three items. >Suppose you want to find the median of three values instead of the >minimum. Is there an analogous short version?
In the case of three values the median is guaranteed to be closest to the mean, so we can take the mean of the absolute value of the diffs from the mean and then use that. Something like. sub median { my $mean;
my $difmin = $difs[0]; my $med = $_[0]; if ($difmin > $difs[1]) { $difmin = $difs[1]; $med = $_[1]; } if ($difmin > $difs[2]) { $difmin = $difs[2]; $med = $_[2]; } $med; Quote: }
This doesn't work for more values, alas. 
"Well, enough of these vague generalities. On to the vague specifics."  Larry Wall, in "Apocalypse 3".

Thu, 06 May 2004 00:27:11 GMT 


James Wetter #5 / 11

Median of Three
... Quote: >This should work, but I don't think it's easily generalizable: > $i = $x; $j = $y; > if ($x > $y) {$i = $y; $j = $x;} > if ($i > $z) { $med = $i } > if ($j > $z) { $med = $z } > $med = $j;
This code gives $med = 5 for $x = 3, $y = 5, $z = 4, so it's wrong. 
"Well, enough of these vague generalities. On to the vague specifics."  Larry Wall, in "Apocalypse 3".

Thu, 06 May 2004 00:28:26 GMT 


Tony Curti #6 / 11

Median of Three
Quote: >> On 17 Nov 2001 11:27:11 0500,
> In the case of three values the median is guaranteed to > be closest to the mean, so we can take the mean of the > absolute value of the diffs from the mean and then use > that. Something like. > ... > This doesn't work for more values, alas.
Statistics::Descriptive has a "median" method. Seems easiest unless you're explicitly looking for interesting ways of finding the median.  Oh! I've said too much. Smithers, use the amnesia ray.

Thu, 06 May 2004 00:33:36 GMT 


Bruce Van Alle #7 / 11

Median of Three
Quote:
>> In an early article about programming style, Kernighan and Plaugher >> look at an example where the programmer is trying to compute the >> minimum of three values, approximately as follows:
[snip] Quote: >> $min = $x; >> if ($min > $y) { $min = $y } >> if ($min > $z) { $min = $z } >> This also generalizes better to more than three items. >> Suppose you want to find the median of three values instead of the >> minimum. Is there an analogous short version? >I think the answer to this question is "no". I will try to back
[snip] Quote: >Second (really just a restatement of the first): There are six >possible ***orderings*** for a triple of different real numbers: > x < y < z x < z < y z < x < y > y < x < z y < z < x z < y < x
Quote: >The difficulty lies in the difference between the median and the >minimum: the minimum is less than everything, so it's easier to >generalize than the median, which is ***less than some items but >greater than others***. Thus the extra comparison (and an additional >implicit comparison) and dummy variables.
Seems like finding the median is topologically a sorting issue; I've marked in *** the relevant phrases in Ben's and Keith's comments quoted above. Using Perl's sort() and int() functions, plus array indexing, allows:
#7 So look to sorting algorithms when thinking of how to structure the $x : $y : $z : ... comparisons. (And this seems to support the argument that finding the median is inherently more complex than finding the minimum...) 1;   Bruce __bruce_van_allen__santa_cruz_ca__

Thu, 06 May 2004 02:16:22 GMT 


MarkJason Dominu #8 / 11

Median of Three
[mailed and posted] Quote: Bruce Van Allen writes: > Seems like finding the median is topologically a sorting issue ... > So look to sorting algorithms when thinking of how to structure the > $x : $y : $z : ... comparisons.
I think that although that's true in general, it's the wrong intuition for this particular question. If we were trying to find the minimum, looking to sorting algorithms would produce a large ifelse tree of comparisons like the 'before' code in the Kernighan and Plaugher example I posted: if ($x > $y) { if ($y > $z) { $min = $z; } else { $min = $y; } } else { if ($x > $z) { $min = $z; } else { $min = $x; } } But it turns out that simpler code, not obviously based on sorting, is possible. If you run quicksort on three items and look at what it is doing, you'll find that it follows a tree very much like the one in the 'before' code.

Thu, 06 May 2004 02:48:02 GMT 


MarkJason Dominu #9 / 11

Median of Three
[mailed and posted] Quote: > Second (really just a restatement of the first): There are six > possible orderings for a triple of different real numbers: > x < y < z x < z < y z < x < y > y < x < z y < z < x z < y < x > Suppose again that we first compare x and y, as in either form > above. Whether we are looking for the minimum or the median, > this comparison eliminates either the first or the second line of > possibilities. If x < y, then there are only two possibilities > left for the minimum (either x or z), so we can decide in one > more comparison; but there are still three possibilities left for > the median, so we need two more comparisons. So the median is > inherently more difficult to compute than the minimum, even if > there are only three values, and we can't really expect that the > code for it will be as simple.
That's a great argument. It got me thinking about the informationtheoretic situation. My thinking went something like this: Initially, you have no information about the order; it could be any of six possibilities. To determine which certainly requires three comparisons in the worst case. But it's not clear that you do need to determine which possibility. At the end, you might know that $y was the median without knowing whether $x or $z was larger than $y. This only requires 1.57 bits of information, so you mighbt not need more than two comparisons to find out, if they are the right ones. (If this seems implausible, note that the 'improved' version of the minimumvalue code does precisely this.) You points out that an initial comparison, say between x and y, eliminates three possibilities, but not the three you need to narrow down the situation for the median. This makes sense, and I agree that it shows that the initial comparison (if there is one) shouldn't be directly between two of the items. So then I got to thinking about comparisons that would eliminate three possibilites, but not necessarily the three in the top row or the three in the bottom row. It occurred to me that a different way to proceed is to note that the sequence x,y,z,x,y,z either contains an increasing sequence of three values or a decreasing sequence of three values, but not both, and that perhaps there's some way to find out which of these two situations holds without knowing anything else about the relative order of the three numbers. And it turns out that there *is* a way to do that:
my $n = ($x$y)*($y$z)*($z$x); my $s = $n < 0 ? 'decreasing' : 'increasing';
This was heartening, but then I realized that this doesn't actually give you any information about the median. Any of the three numbers could be median, regardless of whether the sequence is increasing or decreasing. But still, maybe it's a step in the right direction. But then I thought, suppose z is median. Then the difference between x and y must be larger than the difference between x and z or the difference between y and z, because z is between x and y. So if we could find the largest of the three *differences*, that would tell us which element was in the middle! And we already have twocomparison code for computing the maximum. So here's my first cut at analogous code to produce the median: my $diff = abs($x$y); my $median = $z; if ($diff < abs($x$z)) { $median = $y; $diff = abs($x$z) } if ($diff < abs($y$z)) { $median = $x; } This is rather goofy, because it's so much less efficient than the ifelse tree, and unlike the analogous minimumvalue code, it's not obvious how it works. But it does work and I don't think anyone can say it isn't analogous. I expect (and hope) someone will come up with an improvement to this, but even if nobody does, I'll be happy, because my original question asked for analogous code, and at the time I asked, I expected that the answer would be that there was no such thing. This is probably the best Usenet experience I've had this year, because other folks' articles got me thinking in a different way and I came up with an answer I don't think I would have otherwise that I didn't think could be found at all. Thanks, especially to Ben and Bruce!

Thu, 06 May 2004 03:12:21 GMT 


Keith Kell #10 / 11

Median of Three
BEGIN PGP SIGNED MESSAGE Hash: SHA1
Quote:
> ... >>This should work, but I don't think it's easily generalizable: >> $i = $x; $j = $y; >> if ($x > $y) {$i = $y; $j = $x;} >> if ($i > $z) { $med = $i } >> if ($j > $z) { $med = $z } >> $med = $j; > This code gives $med = 5 for $x = 3, $y = 5, $z = 4, so it's wrong.
Serves me right for trying to write math proofs late at night. It also fails for x=6,y=5,z=4. The way to fix it is to change the last line to else { $med = $j } I'd implemented it as a Perl sub initially, with return statements, then decided to make it less Perlish. I'll never do it again. :)  keith  
public key: http://wombat.sanfrancisco.ca.us/kkeller/public_key alt.os.linux.slackware FAQ: http://wombat.sanfrancisco.ca.us/perl/fom BEGIN PGP SIGNATURE Version: GnuPG v1.0.4 (GNU/Linux) Comment: For info see http://www.gnupg.org iEYEARECAAYFAjv2wfUACgkQhVcNCxZ5ID8peACfa9X9l3GSdxoWAV6qmQWML56d ZJMAniUlY5vWx6eteEBJ7R2DebYCx9ch =CnxC END PGP SIGNATURE

Thu, 06 May 2004 04:00:56 GMT 


Benjamin Goldber #11 / 11

Median of Three
[snip] Quote: > >> Suppose you want to find the median of three values instead of the > >> minimum. Is there an analogous short version? [snip] > Seems like finding the median is topologically a sorting issue; I've > marked in *** the relevant phrases in Ben's and Keith's comments > quoted above. Using Perl's sort() and int() functions, plus array > indexing, allows:
> #7 > So look to sorting algorithms when thinking of how to structure the > $x : $y : $z : ... comparisons. (And this seems to support the > argument that finding the median is inherently more complex than > finding the minimum...)
It doesn't make sense to me to sort the entire array when what we want
kind of partial sort, as follows: sub _sign { $_[0] < 0 ? 1 : $_[0] > 0 : 1 : 0 }
return if $n < 0  $n > $#$list;
if( $n <= $#{$parts[0]} ) { $list = $parts[0]; next }
if( $n <= $#{$parts[1]} ) { $list = $parts[1]; last }
$list = $parts[2]; } $$list[$n]; Quote: }
[untested] Where sort takes O(N log N) time to act on an N item list, nthmin only takes O(N) time [on average] with an N item list. It's worst case times is O(N**2), but that only happens if every partition is bad. That's nearly impossible unless the data was *constructed* for that to occur, and since partition selection is randomized, it's kinda difficult to construct bad data. NB: To make the speed of this comparable to perl's builtin sort, you would have to write it in XS.  Klein bottle for rent  inquire within.

Fri, 07 May 2004 12:08:52 GMT 


