Median of Three
Author Message Median of Three

In an early article about programming style, Kernighan and Plaugher
look at an example where the programmer is trying to compute the
minimum of three values, approximately as follows:

if (\$x > \$y) {
if (\$y > \$z) {
\$min = \$z;
} else {
\$min = \$y;
}
} else {
if (\$x > \$z) {
\$min = \$z;
} else {
\$min = \$x;
}
}

They point out that you can make the code much shorter and simpler
like this:

\$min = \$x;
if (\$min > \$y) { \$min = \$y }
if (\$min > \$z) { \$min = \$z }

This also generalizes better to more than three items.

Suppose you want to find the median of three values instead of the
minimum.  Is there an analogous short version?

rd
(\$p{\$_})&6];\$p{\$_}=/ ^\$P/ix?\$P:close\$_}keys%p}p;p;p;p;p;map{\$p{\$_}=~/^[P.]/&&
close\$_}%p;wait until\$?;map{/^r/&&<\$_>}%p;\$_=\$d[\$q];sleep rand(2)if/\S/;print

Wed, 05 May 2004 12:18:22 GMT  Median of Three

Quote:
> In an early article about programming style, Kernighan and Plaugher
> look at an example where the programmer is trying to compute the
> minimum of three values, approximately as follows:

>         if (\$x > \$y) {
>           if (\$y > \$z) {
>             \$min = \$z;
>           } else {
>             \$min = \$y;
>           }
>         } else {
>           if (\$x > \$z) {
>             \$min = \$z;
>           } else {
>             \$min = \$x;
>           }
>         }

> They point out that you can make the code much shorter and simpler
> like this:

>         \$min = \$x;
>         if (\$min > \$y) { \$min = \$y }
>         if (\$min > \$z) { \$min = \$z }

> This also generalizes better to more than three items.

> Suppose you want to find the median of three values instead of the
> minimum.  Is there an analogous short version?

I think the answer to this question is "no".  I will try to back
up that answer with some less-than-airtight arguments.  In the
process, I'll assume that x, y, and z all have different values.
This is mostly a mathematical or computer science question, not a
programming or Perl question, so I'll use language-neutral
notations in my discussion.

First: Suppose that we compare x and y, as in the outer
comparison in the long form above.  Afterward, we are able to
eliminate either x or y as the minimum value.  But regardless of
the result, either could still be the median, so we don't learn
as much from a single comparison in the median case.

Second (really just a restatement of the first): There are six
possible orderings for a triple of different real numbers:

x < y < z       x < z < y       z < x < y
y < x < z       y < z < x       z < y < x

Suppose again that we first compare x and y, as in either form
above.  Whether we are looking for the minimum or the median,
this comparison eliminates either the first or the second line of
possibilities.  If x < y, then there are only two possibilities
left for the minimum (either x or z), so we can decide in one
more comparison; but there are still three possibilities left for
the median, so we need two more comparisons.  So the median is
inherently more difficult to compute than the minimum, even if
there are only three values, and we can't really expect that the
code for it will be as simple.

Third: Suppose that we're using an array `p' of three elements
p(0), p(1), p(2), instead of three discrete values.  Let each of
the Boolean variables a, b, c the value 1 if the expression p(0)
< p(1), p(0) < p(2), p(1) < p(2), respectively, is true, 0
otherwise.  Finally, let the output be the 2-bit index into `p',
one of 00, 01, or 10 (representing 0, 1, or 2), of the minimum or
the median, as desired.  Adopting the typical digital logic text
notation of juxtaposition as logical AND, suffix of ' as logical
NOT, and addition as logical OR, we come up with the following as
minimal sum-of-products solutions for the MSB and LSB of the
minimum:
MSB: b'c'
LSB: a'c
For the median, on the other hand:
MSB: b'c + bc'
LSB: a'b'c' + abc
This looks to me like further evidence that median is more
complex than minimum, even among only three choices.

*shrug*  At this point I'm pretty convinced that there's no
simple solution.  Maybe someone cleverer than me can come up with
one.  I'd be surprised, but in a pleasant way.

--

Stanford PhD Student - MSU Alumnus - Debian Maintainer - GNU Developer
Personal webpage: http://www.msu.edu/~pfaffben

Wed, 05 May 2004 14:25:18 GMT  Median of Three
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Quote:

> In an early article about programming style, Kernighan and Plaugher
> look at an example where the programmer is trying to compute the
> minimum of three values, approximately as follows:
[snip]
> They point out that you can make the code much shorter and simpler
> like this:

>         \$min = \$x;
>         if (\$min > \$y) { \$min = \$y }
>         if (\$min > \$z) { \$min = \$z }

> This also generalizes better to more than three items.

> Suppose you want to find the median of three values instead of the
> minimum.  Is there an analogous short version?

This should work, but I don't think it's easily generalizable:

\$i = \$x; \$j = \$y;
if (\$x > \$y) {\$i = \$y; \$j = \$x;}
if (\$i > \$z) { \$med = \$i }
if (\$j > \$z) { \$med = \$z }
\$med = \$j;

The difficulty lies in the difference between the median and the
minimum: the minimum is less than everything, so it's easier to
generalize than the median, which is less than some items but
greater than others.  Thus the extra comparison (and an additional
implicit comparison) and dummy variables.

I believe the above will work if any or all of the values are
equal, which is quite nice.

- --keith

- --

public key:  http://wombat.san-francisco.ca.us/kkeller/public_key
alt.os.linux.slackware FAQ:  http://wombat.san-francisco.ca.us/perl/fom

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.4 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iEYEARECAAYFAjv2FFcACgkQhVcNCxZ5ID/sHQCfVGHc33EtbZJbeeZ1E6o8FkfP
pk4AmgInRaQ0INa4aWpAcVhDi0x7axqi
=S7jm
-----END PGP SIGNATURE-----

Wed, 05 May 2004 15:40:24 GMT  Median of Three

Quote:
> ...They point out that you can make the code much shorter and simpler
>like this:

>        \$min = \$x;
>        if (\$min > \$y) { \$min = \$y }
>        if (\$min > \$z) { \$min = \$z }

>This also generalizes better to more than three items.

>Suppose you want to find the median of three values instead of the
>minimum.  Is there an analogous short version?

In the case of three values the median is guaranteed to be closest to
the mean, so we can take the mean of the absolute value of the diffs
from the mean and then use that.  Something like.

sub median
{
my \$mean;

my \$difmin = \$difs; my \$med = \$_;
if (\$difmin > \$difs) { \$difmin = \$difs; \$med = \$_; }
if (\$difmin > \$difs) { \$difmin = \$difs; \$med = \$_; }
\$med;

Quote:
}

This doesn't work for more values, alas.
--

"Well, enough of these vague generalities. On to the vague specifics."
- Larry Wall, in "Apocalypse 3".

Thu, 06 May 2004 00:27:11 GMT  Median of Three

...

Quote:
>This should work, but I don't think it's easily generalizable:

>    \$i = \$x; \$j = \$y;
>    if (\$x > \$y) {\$i = \$y; \$j = \$x;}
>    if (\$i > \$z) { \$med = \$i }
>    if (\$j > \$z) { \$med = \$z }
>    \$med = \$j;

This code gives \$med = 5 for \$x = 3, \$y = 5, \$z = 4, so it's wrong.

--

"Well, enough of these vague generalities. On to the vague specifics."
- Larry Wall, in "Apocalypse 3".

Thu, 06 May 2004 00:28:26 GMT  Median of Three

Quote:
>> On 17 Nov 2001 11:27:11 -0500,

> In the case of three values the median is guaranteed to
> be closest to the mean, so we can take the mean of the
> absolute value of the diffs from the mean and then use
> that.  Something like.
> ...
> This doesn't work for more values, alas.

Statistics::Descriptive has a "median" method.  Seems
easiest unless you're explicitly looking for interesting
ways of finding the median.

--
Oh!  I've said too much.  Smithers, use the amnesia ray.

Thu, 06 May 2004 00:33:36 GMT  Median of Three

Quote:

>>  In an early article about programming style, Kernighan and Plaugher
>>  look at an example where the programmer is trying to compute the
>>  minimum of three values, approximately as follows:

[snip]

Quote:
>>          \$min = \$x;
>>          if (\$min > \$y) { \$min = \$y }
>>          if (\$min > \$z) { \$min = \$z }

>>  This also generalizes better to more than three items.

>>  Suppose you want to find the median of three values instead of the
>>  minimum.  Is there an analogous short version?

>I think the answer to this question is "no".  I will try to back

[snip]

Quote:
>Second (really just a restatement of the first): There are six
>possible ***orderings*** for a triple of different real numbers:

>         x < y < z       x < z < y       z < x < y
>         y < x < z       y < z < x       z < y < x

Quote:
>The difficulty lies in the difference between the median and the
>minimum: the minimum is less than everything, so it's easier to
>generalize than the median, which is ***less than some items but
>greater than others***.  Thus the extra comparison (and an additional
>implicit comparison) and dummy variables.

Seems like finding the median is topologically a sorting issue; I've
marked in *** the relevant phrases in Ben's and Keith's comments
quoted above. Using Perl's sort() and int() functions, plus array
indexing, allows:

#7

So look to sorting algorithms when thinking of how to structure the
\$x : \$y : \$z : ... comparisons. (And this seems to support the
argument that finding the median is inherently more complex than
finding the minimum...)

1;

--

- Bruce

__bruce_van_allen__santa_cruz_ca__

Thu, 06 May 2004 02:16:22 GMT  Median of Three

[mailed and posted]

Quote:
Bruce Van Allen writes:
> Seems like finding the median is topologically a sorting issue ...
> So look to sorting algorithms when thinking of how to structure the
> \$x : \$y : \$z : ... comparisons.

I think that although that's true in general, it's the wrong intuition
for this particular question.  If we were trying to find the minimum,
looking to sorting algorithms would produce a large if-else tree of
comparisons like the 'before' code in the Kernighan and Plaugher
example I posted:

if (\$x > \$y) {
if (\$y > \$z) {
\$min = \$z;
} else {
\$min = \$y;
}
} else {
if (\$x > \$z) {
\$min = \$z;
} else {
\$min = \$x;
}
}

But it turns out that simpler code, not obviously based on sorting, is
possible.

If you run quicksort on three items and look at what it is doing,
you'll find that it follows a tree very much like the one in the
'before' code.

Thu, 06 May 2004 02:48:02 GMT  Median of Three

[mailed and posted]

Quote:
> Second (really just a restatement of the first): There are six
> possible orderings for a triple of different real numbers:

>         x < y < z       x < z < y       z < x < y
>         y < x < z       y < z < x       z < y < x

> Suppose again that we first compare x and y, as in either form
> above.  Whether we are looking for the minimum or the median,
> this comparison eliminates either the first or the second line of
> possibilities.  If x < y, then there are only two possibilities
> left for the minimum (either x or z), so we can decide in one
> more comparison; but there are still three possibilities left for
> the median, so we need two more comparisons.  So the median is
> inherently more difficult to compute than the minimum, even if
> there are only three values, and we can't really expect that the
> code for it will be as simple.

That's a great argument.  It got me thinking about the
information-theoretic situation.  My thinking went something like
this: Initially, you have no information about the order; it could be
any of six possibilities.  To determine which certainly requires three
comparisons in the worst case.  But it's not clear that you do need to
determine which possibility.  At the end, you might know that \$y was
the median without knowing whether \$x or \$z was larger than \$y.  This
only requires 1.57 bits of information, so you mighbt not need more
than two comparisons to find out, if they are the right ones.  (If
this seems implausible, note that the 'improved' version of the
minimum-value code does precisely this.)

You points out that an initial comparison, say between x and y,
eliminates three possibilities, but not the three you need to narrow
down the situation for the median.  This makes sense, and I agree that
it shows that the initial comparison (if there is one) shouldn't be
directly between two of the items.

So then I got to thinking about comparisons that would eliminate three
possibilites, but not necessarily the three in the top row or the
three in the bottom row.

It occurred to me that a different way to proceed is to note that the
sequence x,y,z,x,y,z either contains an increasing sequence of three
values or a decreasing sequence of three values, but not both, and
that perhaps there's some way to find out which of these two
situations holds without knowing anything else about the relative
order of the three numbers.  And it turns out that there *is* a way to
do that:

my \$n = (\$x-\$y)*(\$y-\$z)*(\$z-\$x);
my \$s = \$n < 0 ? 'decreasing' : 'increasing';

This was heartening, but then I realized that this doesn't actually
give you any information about the median.  Any of the three numbers
could be median, regardless of whether the sequence is increasing or
decreasing.  But still, maybe it's a step in the right direction.

But then I thought, suppose z is median.  Then the difference between
x and y must be larger than the difference between x and z or the
difference between y and z, because z is between x and y.  So if we
could find the largest of the three *differences*, that would tell us
which element was in the middle!  And we already have two-comparison
code for computing the maximum.  So here's my first cut at analogous
code to produce the median:

my \$diff = abs(\$x-\$y); my \$median = \$z;
if (\$diff < abs(\$x-\$z)) { \$median = \$y; \$diff = abs(\$x-\$z) }
if (\$diff < abs(\$y-\$z)) { \$median = \$x; }

This is rather goofy, because it's so much less efficient than the
if-else tree, and unlike the analogous minimum-value code, it's not
obvious how it works.  But it does work and I don't think anyone can
say it isn't analogous.

I expect (and hope) someone will come up with an improvement to this,
but even if nobody does, I'll be happy, because my original question
asked for analogous code, and at the time I asked, I expected that the
answer would be that there was no such thing.  This is probably the
best Usenet experience I've had this year, because other folks'
articles got me thinking in a different way and I came up with an
answer I don't think I would have otherwise that I didn't think could
be found at all.

Thanks, especially to Ben and Bruce!

Thu, 06 May 2004 03:12:21 GMT  Median of Three
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Quote:

> ...
>>This should work, but I don't think it's easily generalizable:

>>        \$i = \$x; \$j = \$y;
>>        if (\$x > \$y) {\$i = \$y; \$j = \$x;}
>>        if (\$i > \$z) { \$med = \$i }
>>        if (\$j > \$z) { \$med = \$z }
>>        \$med = \$j;

> This code gives \$med = 5 for \$x = 3, \$y = 5, \$z = 4, so it's wrong.

Serves me right for trying to write math proofs late at night.
It also fails for x=6,y=5,z=4.  The way to fix it is to change
the last line to

else { \$med = \$j }

I'd implemented it as a Perl sub initially, with return statements,
then decided to make it less Perl-ish.  I'll never do it again.  :)

- --keith

- --

public key:  http://wombat.san-francisco.ca.us/kkeller/public_key
alt.os.linux.slackware FAQ:  http://wombat.san-francisco.ca.us/perl/fom

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.4 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iEYEARECAAYFAjv2wfUACgkQhVcNCxZ5ID8peACfa9X9l3GSdxoWAV6qmQWML56d
ZJMAniUlY5vWx6eteEBJ7R2DebYCx9ch
=CnxC
-----END PGP SIGNATURE-----

Thu, 06 May 2004 04:00:56 GMT  Median of Three
[snip]

Quote:
> >>  Suppose you want to find the median of three values instead of the
> >>  minimum.  Is there an analogous short version?
[snip]
> Seems like finding the median is topologically a sorting issue; I've
> marked in *** the relevant phrases in Ben's and Keith's comments
> quoted above. Using Perl's sort() and int() functions, plus array
> indexing, allows:

>      #7

> So look to sorting algorithms when thinking of how to structure the
> \$x : \$y : \$z : ... comparisons. (And this seems to support the
> argument that finding the median is inherently more complex than
> finding the minimum...)

It doesn't make sense to me to sort the entire array when what we want

kind of partial sort, as follows:

sub _sign { \$_ < 0 ? -1 : \$_ > 0 : 1 : 0 }

return if \$n < 0 || \$n > \$#\$list;

if( \$n <= \$#{\$parts} ) { \$list = \$parts; next }

if( \$n <= \$#{\$parts} ) { \$list = \$parts; last }

\$list = \$parts;
}
\$\$list[\$n];

Quote:
}

[untested]

Where sort takes O(N log N) time to act on an N item list, nthmin only
takes O(N) time [on average] with an N item list.

It's worst case times is O(N**2), but that only happens if every
partition is bad.  That's nearly impossible unless the data was
*constructed* for that to occur, and since partition selection is
randomized, it's kinda difficult to construct bad data.

NB: To make the speed of this comparable to perl's builtin sort, you
would have to write it in XS.

--
Klein bottle for rent - inquire within.

Fri, 07 May 2004 12:08:52 GMT

 Page 1 of 1 [ 11 post ]

Relevant Pages