Quote:

> Well I got a lot of replies very fast from that but to take some of you to

> task...

> The card sort IS O(n) because of the simple fact that doubling the number of

> cards EXACTLY doubles the time taken to sort those cards.

We agree on the facts and it's pointless for us to quibble over terminology, but

I point out to you that the most authoritative authors, such as Knuth and Aho,

Hopcroft, and Ullman characterize radix sort as O(n*p), where n is the number of

records and p is the number of passes. Certainly in a situation where p can be

regarded as a predetermined constant, then O(n*p)=O(n), so we do not disagree,

but if that is not the case ...

Asymptotic formulas characterizing the performance of algorithms are usually

derived over all possible inputs, not just a special subset. Also these formulas

must be applied with great caution. Several algorithms may have the same

asymptotic behavior but drastically different constants of proportionality.

Moreover there are other considerations that these formulas fail to address.

Deciding which of several algorithms to use in a specific case can be a

complicated business.

Quote:

> The sort key length is NOT related to the number of records (as someone

> claimed... ) for example if you are sorting on the last 8 columns of a card

> it is always 8 regardless of the number of cards you have.

> I agree that the number of passes depends on the key length and alpha zones

> etc but that is constant for a given sort and independent of the number of

> cards. All other sorts depend in some way on the sort key length, they all

> have to do some kind of key comparison and some keys are just physically

> faster to compare than others but since this is generally a constant for a

> given sort and not related in any way to the record count which is why the

> key length does not appear in the O( ) calculation for any sort algorithm.

The formulas for the time complexity of sort algorithms based on key comparison

are usually given in terms of the number of comparisons required. The

proportionality constant includes the cost of doing a comparison as well as any

associated subscript or pointer computations and data movement. (In contexts

where the cost of a comparison depends on n, the constant isn't really a

constant. As Marvin Minsky put it, some constants are more constant than

others.) In the case of radix sort what is being counted is not comparisons but

digit examinations.
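
To illustrate the point, here is a hypothetical sketch (not from the original post) of an LSD radix sort on fixed-width decimal keys that counts digit examinations rather than comparisons; the function and variable names are invented for illustration.

```python
# Illustrative sketch: LSD radix sort on non-negative integer keys of at
# most key_width decimal digits, counting digit examinations (not
# comparisons). All names here are hypothetical.

def radix_sort(records, key_width):
    examinations = 0
    for p in range(key_width):            # one pass per digit position
        buckets = [[] for _ in range(10)]
        for r in records:
            digit = (r // 10 ** p) % 10   # examine one digit of the key
            examinations += 1
            buckets[digit].append(r)
        # concatenating the buckets in order keeps the sort stable
        records = [r for b in buckets for r in b]
    return records, examinations

data = [802, 24, 2, 66, 170, 75, 90, 45]
sorted_data, count = radix_sort(data, 3)
# count is exactly len(data) * 3: n records times p passes
```

The examination count is n*p by construction, independent of the initial order of the records.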

Quote:

> (remember O( ) is not about how LONG a particular sort will take, but how

> sorting x records compares with sorting y records by the same key.)

That's one use that such a formula can be put to. It can also be used to

compare different algorithms, in which case the value of the proportionality

constant and the assumptions made to derive the formula must be considered

carefully to come to a correct conclusion.

Quote:

> Another characteristic of the card sort machine is that it makes no

> difference whatsoever whether the cards were originally in some good, bad or

> totally random order...it still takes O(n).

> It also makes no difference if some or all the cards have the same sort key,

> it still takes exactly the same time.

> (Someone mentioned Radix sort... if that's what I think it is then it

> doesn't handle duplicate keys... or am I thinking of something else?)

Card sorting in the manner we are talking about is radix sorting. Consult any

good textbook.

Quote:

> Lets think, just for fun...

> The time taken for a card sorter to sort n records with key length k

> will be C * n * k where C= some constant dependent on the speed of the

> machine. Note on a graph, this is a straight line as n is increased.

It really is A + (B + C*n)*k, where B is the between-pass overhead, and A, which can

be positive or negative, adjusts for the fact that the overhead may be different

on the first and/or last passes.
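
A quick sketch of this timing model, with invented machine constants (A, B and C are not measurements from any real card sorter), shows why doubling n does not quite double the total time:

```python
# Hypothetical timing model for a card sorter; the constants are made up.

def card_sort_time(n, k, A=1.0, B=5.0, C=0.002):
    """Wall-clock model: per-pass overhead B, per-card cost C, k passes,
    plus a correction A for differing first/last-pass overhead."""
    return A + (B + C * n) * k

# The fixed overhead A + B*k is unaffected by the number of cards, so
# doubling n falls slightly short of doubling the total time.
t1 = card_sort_time(10_000, 8)
t2 = card_sort_time(20_000, 8)
```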

Quote:

> Quicksort can sort in Q * n * log2(n) where Q is some constant dependent

> on the speed of the machine....(assume that the effect of k is too small to

> matter here) Note on a graph, this line curves up as n is increased .....

There is a big difference between these two formulas. The one for radix sorting

gives the time to sort n records on k digits independent of the original

ordering. The formula for Quicksort approximates the average time to sort n

records averaged over all n! possible permutations asymptotically as n

approaches infinity. In the worst case Quicksort can do on the order of n^2

comparisons.
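
The gap between the average and the worst case can be seen with a small sketch (not from the post): a Quicksort with a first-element pivot, counting one key comparison per element partitioned, run on random versus already-sorted input.

```python
# Illustrative sketch: Quicksort with a first-element pivot, counting one
# key comparison per element partitioned against the pivot.

import random

def quicksort(a):
    """Return (sorted list, number of key comparisons)."""
    if len(a) <= 1:
        return a, 0
    pivot, rest = a[0], a[1:]
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    ls, lc = quicksort(left)
    rs, rc = quicksort(right)
    return ls + [pivot] + rs, lc + rc + len(rest)

n = 256
_, avg_like = quicksort(random.sample(range(10_000), n))  # roughly n*log2(n)
_, worst = quicksort(list(range(n)))  # already sorted: exactly n*(n-1)/2
```

On sorted input the first-element pivot degenerates into (n-1) + (n-2) + ... + 1 = n(n-1)/2 comparisons, while the random case stays near n*log2(n).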

Quote:

> Whatever the values of Q and C and k, these two lines will cross and when

> they do, both sorts will sort in the same wall-clock time.....

> this means that for a sufficiently humungous value of "n" an old

> mechanical card sort machine will be able to complete the sort FASTER than

> QuickSort on a supercomputer....

a) In theory, yes. In practice, work out how big n has to be for this to be

"so" and see if there is enough paper in the world to make the necessary cards

or any super computers with enough memory to store the records to be sorted. In

many cases something else breaks before the formula even becomes applicable.
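
A back-of-the-envelope sketch makes the point concrete. The constants below are invented (C for the per-card per-pass time of a mechanical sorter, Q for the per-comparison time of a fast computer); neither number comes from the original post.

```python
# Hypothetical crossover calculation: solve C*n*k = Q*n*log2(n) for n.
# All constants are assumptions, not measurements.

import math

C = 0.06   # seconds per card per pass (~1000 cards/minute, assumed)
Q = 1e-8   # seconds per comparison on a fast computer (assumed)
k = 8      # passes for an 8-column key

# C*n*k = Q*n*log2(n)  =>  log2(n) = C*k/Q
log2_n = C * k / Q
# n = 2**log2_n is far too large to compute directly; count its digits:
digits_in_n = int(log2_n * math.log10(2)) + 1
# n would have over 14 million decimal digits -- no card stock (or
# supercomputer memory) could hold that many records.
```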

b) That's why the sort packages that are generally used in large scale data

processing installations read in control statements that specify such things as

the record length and number of records in the dataset to be sorted as well as

an exact specification of the sort key and the type of comparison to be done.

The package then tailors the method to the situation.

Quote:

> I'm glad i'm not the poor old operator on that shift ;-)

I certainly agree with that.