Card Columns (was Why did they make ... ?) 
Author Message
 Card Columns (was Why did they make ... ?)

Quote:

> A card-sorter-like sort _is_ possible, and can be faster than QuickSort
> when measured in actual machine cycles if procedure-call overhead is
> high -- which it is in a recursive context in generalized mainframe
> assembler, due to the lack of a stack.

Lets not forget many sorts are sensitive to the sort order
of the data already in sort order. I think quicksort is rather
bad in this area. Is that type of sort called a merge sort?
Very useful if core* memory is limited on your machine.
Ben.
* program memory.


Tue, 05 Jul 2005 08:30:44 GMT  
 Card Columns (was Why did they make ... ?)
On Thu, 16 Jan 2003 16:04:56 -0800, "James J. Weinkam"

Quote:


>> With a large scaling factor k where k is the key length
>> determining the number of passes required. Adding 1 to k has the
>> same effect on time required as doubling n: so the time
>> complexity is O(n*2^(k-1)), linear in n, exponential in k, does
>> that sound about right?  

>No.

>The time complexity is O(n*k), assuming no columns contain alpahnumeric data and
>ignoring time between passes.  Of course, if you want to claim that k is a fixed

I think you're right -- I was over-complexifying the analysis!
What are you sorting if not alphanumeric data?
Do you mean only numeric data, or only enough codes to fit in the
pockets?
And we are talking about a card sorter, right?

Quote:
>pre determined constant then you can claim it is O(n)!  But really k varies from
>problem to problem and can't honestly be claimed to be a constant unless you
>want to use k=80.  It is true that once k is fixed, if you double n you double
>the time.

Which is what the previous poster claimed.

Thanks. Take care, Brian Inglis         Calgary, Alberta, Canada
--

    fake address                use address above to reply






Tue, 05 Jul 2005 08:39:57 GMT  
 Card Columns (was Why did they make ... ?)
Russ - I did program fortran, BASIC and a bunch of other languages
with a Baudot model 15.

.gt.  .eq.  etc were how the special characters were specified, and a
translate
program was used to create the source for submission. (this was on a Project
Genie machine in the early 60's.)

PL/I, however, was the first language that I know of that actually supported
an alternate character set from day 1.

/s/ Bill Turner, wb4alm

Quote:



>>Moreover some ttys could print only 72 columns.

>This is true, but I'm certain that the 72-column convention in
>programming languages was well-established on punched cards for
>the reason I gave. Programming was done on punched cards well
>before systems advanced to the stage where interaction from a
>terminal, for programming or other time-sharing functions, was
>very common.

>I'd need to see some evidence that this isn't just a coincidence.
>I have an old Teletype Model 15 sitting in my garage, vintage WW
>II. And it has exactly 72 print columns. It also has a very
>limited 5-bit character set that doesn't include things like '=',
>'+', or '*'.  It would be hard to do FORTRAN with that, much less
>PL/I or C.  The ASCII TTY didn't appear until the mid-'60s.



Tue, 05 Jul 2005 09:16:16 GMT  
 Card Columns (was Why did they make ... ?)
Well I got a lot of replies very fast from that but to take some of you to
task...

The card sort IS O(n) because of the simple fact that doubling the number of
cards EXACTLY doubles the time taken to sort those cards.

 The sort key length is NOT related to the number of records (as someone
claimed... ) for example if you are sorting on the last 8 columns of a card
it is always 8 regardless of the number of cards you have.

I agree that the number of passes depends on the key length and alpha zones
etc but that is constant for a given sort and independent of the number of
cards.  All other sorts depend in some way on the sort key length, they all
have to do some kind of key comparison and some keys are just physically
faster to compare than others but since this is generally a constant for a
given sort and not related in any way to the record count which is why the
key length does not appear in the O( ) calculation for any sort algorithm.
(remember O( ) is not about how LONG a particular sort will take, but how
sorting x records compares with sorting y records by the same key.)

Another characteristic of the card sort machine is that it makes no
difference whatsoever whether the cards were originally in some good, bad or
totally random order...it still takes O(n).
It also makes no difference if some or all the cards have the same sort key,
it still takes exactly the same time.

(Someone mentioned Radix sort... if that's what I think it is then it
doesn't handle duplicate keys... or am I thinking of something else?)

Lets think, just for fun...
    The time taken for a card sorter to sort n records with key length k
will be  C * n * k  where C= some constant dependent on the speed of the
machine. Note on a graph, this is a straight line as n is increased.

 Quicksort can sort in Q * n * log2(n)  where Q is some constant dependent
on the speed of the machine....(assume that the effect of k is too small to
matter here)    Note on a graph, this line curves up as n is increased .....

Whatever the values of Q and C and k, these two lines will cross and when
they do, both sorts will sort in the same wall-clock time.....

 this means that for a sufficiently humungous value of "n"  an old
mechanical card sort machine will be able to complete the sort FASTER than
QuickSort on a supercomputer....

I'm glad i'm not the poor old operator on that shift ;-)

True


Quote:

> > Interesting to think that a single card sorter is effectively  Order(n)
...
> > i.e. double the records = double the time exactly.
> > How many so called fast algorithms can match that ?

> Thats just for one pass.  The number of passes is equal to the number of
columns
> in the sort key plus the number of columns which are alphanumeric as
opposed to
> numeric (one pass for the zones and a second for the digits).

> If there are no duplicate keys this is at least n*log(n) for some base
(probably
> between 10 and 26).  On the other hand if there are many duplicate keys
and the
> number of records sorted greatly exceeds the maximum number of unique keys
that
> can be represented in the given number of columns then the performance can
be
> far better than n*log(n). Properly done, card sorting is also stable.
I.e.,
> records with identical keys retain their original ordering.  (Better not
drop
> the deck.)

> Technically, sorting in this manner on a card sorted is an example of
radix
> sorting, so called because the method is based on the structure of the
> representation of the key in positional notation in some base (radix).

> Ordinary comparison sorting relies solely on the ordering of the keys and
does
> not depend in any way on how they are encoded.



Tue, 05 Jul 2005 10:15:52 GMT  
 Card Columns (was Why did they make ... ?)



Quote:

> > Interesting to think that a single card sorter is effectively  Order(n)
...
> > i.e. double the records = double the time exactly.
> > How many so called fast algorithms can match that ?

> Cute, but there's a bit of a fallacy hiding there.  The width of the
> sorting field will normally be proportional to the logarithm of the
> number of records, which brings things back to the standard n*log n
> time.

Eh? I don't know what you're on but can I have some ;-)

If i sort everyone in my street (all 8 of them)  by 'phone number , the sort
key length will be 11.
IF I then sort everyone in the UK (say 50 million or so if you include the
mobiles) by 'phone number, the sort key length will be 11..

There's no relationship whatever between the sort key length and the number
of records.



Tue, 05 Jul 2005 10:22:17 GMT  
 Card Columns (was Why did they make ... ?)
Quote:

> Perhaps yes, perhaps no.  Say key is account number and the
record is customer
> account record.  Then indeed what you say is true.  Such keys
are usually
> structured to identify the issuing bank or whatever,

geographical region, etc as

Quote:
> well as the customer and also usually involve self checking
digits, so in that
> case performance will be n*log(n) or worse.

Well, as I said, my remarks were predicated on it being an
"optimal" key.  That is, one
with the shortest possible keylengh for the number of distinct
values being sorted.

Quote:
> On the other hand with the same key but sorting transaction

records there can be
Quote:
> many transactions for the same account.  It real applications
of this kind the
> batch of transcations sorted at one time is generally way too
small to realize
> performance better than n*log(n), but there are examples of
applications where
> there can be a huge number of records but a very short key with
many duplicates
> and in such cases the performance indeed is better than

n*log(n).  This is in

Quote:
> fact getting back to the real meaning of sort that someone else
pointed out:
> dropping into bins as opposed to arranging in order.

Yeah, that was me adding that little bit of linguistic trivia.
Sorters would "sort", in the
original English meaning of the word, and arranging the cards in
order was done
by a combination of the sorter mechanism and the operator's
card-restacking actions.

On the subject of non-optimal keys, I'm reminded of a minor
mystery that bugged me back in
High School that I didn't figure out until after I started
working at IBM as a CE and getting
trained on sorters:

There was a girl in my class named Bonny Holst, who was in
several classes with me. The
teacher's roll books were generated by punched card tab
equipment.  Bonny's name was
always listed just *before* mine (Russell Holsclaw), even though
the order should have been
the reverse in an alphabetic sort. Years later, I figured out
that they probably only sorted
on the first four letters of the last name, and probably a few
letters of the first name.
So, instead of "Holst, Bonny" and "Holsclaw, Russell", we were
probably sorted with keys
like "HOLSBONN" and "HOLSRUSS", which would have put me after
Bonny.  The practice of alpha
sorts that didn't include all columns in the field was intended
to reduce the number of
passes of the cards through the sorter, because MOST of the time,
only a few letters produced
a reasonably accurate alpha sort.  When sorting data in a
computer, there is little reason
not to include the whole key ... it makes virtually no difference
in the timing. On a card-
sorter, it makes a whole world of difference, especially since an
alphabetic sort required
two passes per column, one for the "zone" punches, and one for
the "numeric".  Some
sorters had a special "alpha-sort" mode which allowed many of the
cards to be passed
through only once per column, so you wound up with something like
1.66 passes per column.
That mode added some logic to the selection of chute blades,
rather than the card falling
in to a pocket depending solely on which non-suppressed punch it
saw first.

(Chute blades: a stack of thin strips of metal running from one
end of the sorter to the other.
The sort mechanism opened up this stack when sensing the punch in
a card so that it slipped
in just below the right one.  Each blade was bent downward so as
to deflect the card into the
selected pocket as the rollers were carrying it along. There were
13 pockets, one for each
row in the card, plus a "reject" pocket to receive cards with no
punches in the selected column.)



Tue, 05 Jul 2005 10:27:25 GMT  
 Card Columns (was Why did they make ... ?)



Quote:
>  The width of the
> sorting field will normally be proportional to the logarithm of the
> number of records, which brings things back to the standard n*log n
> time.

Having thought about this a little, I now realise what you are implying is
that to have large numbers of records one tends to need large keys to
uniquely destinguish them...

That may be true in whatever area you work but in mine (Project Management
ISV), unique key sorts are very rare indeed.

 We tend to sort "events" by date or duration, "costs" by value or currency,
"people" by name or address or age, "things" by name, size, color, weight or
location.  In all these cases the key lengths are unrelated to the number of
records and in most, duplicate keys are the norm rather than the exception.

I would be fascinated to learn what kind of things you sort.
Chris.



Tue, 05 Jul 2005 11:12:49 GMT  
 Card Columns (was Why did they make ... ?)

Quote:

> probably sorted with keys like "HOLSBONN" and "HOLSRUSS", which would have
> put me after Bonny.  The practice of alpha sorts that didn't include all
> columns in the field was intended to reduce the number of passes of the
> cards through the sorter, because MOST of the time, only a few letters
> produced a reasonably accurate alpha sort.  When sorting data in a

Yep.   When sorting my name for passenger manifests in the USAF (about
30 years ago) we only sorted on the first 3 columns.  But we ALWAYS
followed that with a sequence check with the collator.   Depending
how many out-of-sequence cards were kicked out we'd either insert
by hand, sort then insert by hand, or sort, sequence check, and
then use the collator to merge.

Sounds like the operators at your school DP center (remember when it
was called DP :-) were lazy!

// marc



Tue, 05 Jul 2005 12:00:37 GMT  
 Card Columns (was Why did they make ... ?)

Quote:




>>>>Default seems to be at col 72, in my experience.  Can't

>>imagine why....

>>>>:->  I mean, who's going to add a comment in the final 8

>>cols??

>>>Columns 73-80 were for card numbering.

>>I've mentioned this before, but for those who missed it:

>>The tradition of using only 1-72 in mainframe compilers dates
>>back to the 704 through 7094 computers, whose online card
>>readers only read in row-binary, and would read two 36-bit
>>words from each row of punches, 24 words per card.
>>Conversion to characters was done by software.

>>Columns 73-80 were ignored because the reader literally
>>couldn't pick them up, unless you dropped in
>>another control panel that was jumpered differently.
>>[...]
> Moreover some ttys could print only 72 columns.

Yes, but TTY's were not used on the IBM 704.  The fact that FORTRAN,
JCL, COBOL, PL/I -- virtually everything used on IBM mainframe computers
except RPG -- ignores columns 73-80 depends on that bit of 50's hardware.

--
John W. Kennedy
"The poor have sometimes objected to being governed badly;
the rich have always objected to being governed at all."
   -- G. K. Chesterton, "The Man Who Was Thursday"



Tue, 05 Jul 2005 12:40:00 GMT  
 Card Columns (was Why did they make ... ?)

Quote:


>> A card-sorter-like sort _is_ possible, and can be faster than QuickSort
>> when measured in actual machine cycles if procedure-call overhead is
>> high -- which it is in a recursive context in generalized mainframe
>> assembler, due to the lack of a stack.
> Lets not forget many sorts are sensitive to the sort order
> of the data already in sort order. I think quicksort is rather
> bad in this area. Is that type of sort called a merge sort?
> Very useful if core* memory is limited on your machine.

Where the merge sort really takes off is if you are only interested in
counts and/or totals.  Every time two records compare equal, add the
second record to the first record and then throw the second record away.

--
John W. Kennedy
"The poor have sometimes objected to being governed badly;
the rich have always objected to being governed at all."
   -- G. K. Chesterton, "The Man Who Was Thursday"



Tue, 05 Jul 2005 12:40:02 GMT  
 Card Columns (was Why did they make ... ?)

Quote:

> I'd need to see some evidence that this isn't just a coincidence.
> I have an old Teletype Model 15 sitting in my garage, vintage WW
> II. And it has exactly 72 print columns. It also has a very
> limited 5-bit character set that doesn't include things like '=',
> '+', or '*'.  It would be hard to do FORTRAN with that, much less
> PL/I or C.  The ASCII TTY didn't appear until the mid-'60s.

Nevertheless they did it:

  & was used for +
  - stood for itself
  * was used for multiply
  / for divide
  # for =

  % for left parenthesis
  {losenge} for right parenthesis
  For relational operations .LT. .LE. .EQ. .NE. .GE. and .GT. were used.


only way to introduce characters was in Holerith literals if I recall correctly.

You can encode anything if you're determined enough.



Tue, 05 Jul 2005 15:06:36 GMT  
 Card Columns (was Why did they make ... ?)

Quote:

> What are you sorting if not alphanumeric data?
> Do you mean only numeric data, or only enough codes to fit in the
> pockets?
> And we are talking about a card sorter, right?

Right.

If a key column may only contain a digit from 0 to 9 there is a single hole in
that column.  If it may contain any alphanumeric character, there are two holes
for the letters but still only the one for a digit.  Any cards with two holes in
the column being sorted on normally go into the reject or multipunch pocket.  It
is possible to set up the sorter to look at either the digits (1-9) only or the
zones only (0,11,12).  First do the digits.  Then do zones for the reject deck,
then do the zones for the 1-9 decks combined in that order, then put the pieces
together.  If special characters are involved some characters require three
holes and sorting gets trickier.



Tue, 05 Jul 2005 15:31:58 GMT  
 Card Columns (was Why did they make ... ?)


Quote:

>> probably sorted with keys like "HOLSBONN" and "HOLSRUSS", which would
have
>> put me after Bonny.  The practice of alpha sorts that didn't include all
>> columns in the field was intended to reduce the number of passes of the
>> cards through the sorter, because MOST of the time, only a few letters
>> produced a reasonably accurate alpha sort.  When sorting data in a

>Yep.   When sorting my name for passenger manifests in the USAF (about
>30 years ago) we only sorted on the first 3 columns.  But we ALWAYS
>followed that with a sequence check with the collator.   Depending
>how many out-of-sequence cards were kicked out we'd either insert
>by hand, sort then insert by hand, or sort, sequence check, and
>then use the collator to merge.

>Sounds like the operators at your school DP center (remember when it
>was called DP :-) were lazy!

Or they only rented an hour on the sorter.

/BAH

Subtract a hundred and four for e-mail.



Tue, 05 Jul 2005 20:29:19 GMT  
 Card Columns (was Why did they make ... ?)


If you want to have a discussion, DO NOT TOP POST, please.

Quote:
>Well I got a lot of replies very fast from that but to take some of you to
>task...

>The card sort IS O(n) because of the simple fact that doubling the number
of
>cards EXACTLY doubles the time taken to sort those cards.

Nope.  Go do some card sort work.  You also have to consider
how fast the operator is.  I preferred doing large decks because
I could stack and remove as the sorting happened.  I wouldn't say
double based on the card count.  It's the start and stop that
took the most time.

/BAH

Subtract a hundred and four for e-mail.



Tue, 05 Jul 2005 20:35:11 GMT  
 Card Columns (was Why did they make ... ?)


Quote:
> Nevertheless they did it:

>  ... etc. etc.
> Actually, I don't think the first version of FORTRAN even


Quote:
> only way to introduce characters was in Holerith literals if I
recall correctly.

> You can encode anything if you're determined enough.

OK, I didn't argue that people didn't find a way to use TTY's to
adapt.
And, yes, I've been aware of the Figs and Ltrs mode on the model
15 since the very first day I saw one, about
45 years ago.

Nevertheless, I remain certain that the idea of compilers
scanning 1-72, and ignoring the last 8 columns of the
card originated with the row-binary card readers on those first
36-bit machines.

The alternate character set that PL/I suppported was to
accomodate the limited character sets of earlier keypunches and
printers, which could accomodate no more than 48 printer glyphs
(that's only 12 special characters beyond the letters and numeric
digits.)

Again... although it's an interesting coincidence that TTY's ALSO
had a 72-character line-length, I'm still holding to the
assertion that the practice of leaving out the last 8 columns in
the card was originally motivated by the limitations imposed by
the hardware in which FORTRAN was developed. Everything else
mentioned in your posts came later. Everything.



Tue, 05 Jul 2005 21:27:58 GMT  
 
 [ 373 post ]  Go to page: [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [24] [25]

 Relevant Pages 
 

 
Powered by phpBB® Forum Software