All Sorts of Sorts 
Author Message
 All Sorts of Sorts

 Date: 02-07-90 (17:50)              Number: 2869 (Echo)
   To: ALL                           Refer#: NONE
 From: ZAFAR ESSAK                     Read: (N/A)
 Subj: LEARN TO SORT                 Status: PUBLIC MESSAGE

 This is not meant to detract from Ian Green's Modula-2 discussions
 where he has been discussing SORTS, but rather may shed more light on
 the topic; at least from a novice sorter's viewpoint.

 I would also like to apologize at the outset for the lack of references
 to the results of the Forth Sort contest that was held on GEnie.  I
 would appreciate a note from anyone who recalls the name(s) of the
 files that pertain to that contest.  I have looked over the various
 files related to sorts on the BCFB and found the descriptions of sorts
 by Norm Arnold in SORT_WIN.ZIP most helpful.

 To begin using and understanding SORTS I needed a generalized approach
 that would allow applying the SORT methods to a variety of data
 elements.  This requires the use of a few variables and deferred
 execution words.  Vector variables were chosen for the deferred words.
 There is an obvious performance penalty in using a generalized approach
 such as this but I felt it would give me an opportunity to understand
 the various SORT methods as well as allow experimentation on different
 data elements.

 The most striking, though simple, find was discovered through
 experimenting with consecutive SORTS.  Data elements containing several
 fields were first sorted on one field and then sorted on a second
 field.  Of the three sorts tried (BUBBLE, SHELL, and QWIK) only the
 BUBBLE Sort preserved the ordering of the first sort within the second
 sort.  Unfortunately, the BUBBLE sort is by far the slowest of the
 three, taking five times longer than the QWIK sort and twice as long as
 the SHELL sort.

 The next few messages contain the Forth code used.  This is essentially
 Forth-83 Standard which has implications in the DO...+LOOP where a
 negative integer is used.  A simple example of an array sort was used
 to initially test the functioning of the sorts.  This is illustrated
 with the code but requires a RANDOM number generator, several of which
 are available on the board.
 ---
  * Via Qwikmail 2.01

 NET/Mail : British Columbia Forth Board - Burnaby BC - (604)434-5886  
-----
This message came from GEnie via willett through a semi-automated process.



Tue, 28 Jul 1992 08:41:44 GMT  
 All Sorts of Sorts

 Date: 02-07-90 (17:50)              Number: 2870 (Echo)
   To: ALL                           Refer#: NONE
 From: ZAFAR ESSAK                     Read: (N/A)
 Subj: LEARN TO SORT                 Status: PUBLIC MESSAGE

 \ SORTS.FLY     Part 1 - Generalizing the Sort Function

 ((
 For a sort utility to have flexibility some words have deferred
 actions.  Essentially, when sorting a collection of data elements,
 two elements are compared at a time.  Depending on the results of the
 comparison the placement of two elements in the data collection may
 require that they be exchanged.

 The comparison test itself may need to be modified to suit the data
 elements.  Also the exchange routine will need to be modified for the
 particular data collection.

 The following assumes the data elements are fixed length and uses the
 space at PAD when comparing and exchanging elements.

 ))

 ONLY FORTH ALSO DEFINITIONS   DECIMAL

 VARIABLE DATA.MAX# ( --adr)     \ maximum number of data elements

 VARIABLE DATA.WIDTH ( --adr)    \ width of each data element



 VARIABLE ~DATA! ( --adr)        \ Vector for method to store data


 ((
 The following word uses PAD as the buffer and accesses different
 parts of the buffer for temporary storage of different data elements.

 ))


 : EXCHANGE ( n1,n2--)   \ n1,n2 are indexes to the data elements
     2DUP >R >R >R >R

     2 data.buf  R> DATA!    1 data.buf  R> DATA! ;

 ((
 In the following comparison of data elements the flag returned shall
 be -1, 0, or 1 to indicate if the data at adr1 is less than, equal
 to, or greater than the data at adr2, respectively.

 ))

 VARIABLE ~SORT? ( --adr)        \ Vector action for comparison test


 ((
 Various SORT algorhythms require the following variables be set by
 any applications that call the sort.

             DATA.MAX#
             DATA.WIDTH

             ~DATA!
             ~SORT?

 ))

 \ The following is a sort of an array of 2 byte data elements

 VARIABLE startdata ( --adr)




 : CELL.SORT ( --)
     2 DATA.WIDTH !

     ['] CELL!        ~DATA! !
     ['] CELL.COMPARE ~SORT? ! ;

                                                       Part 2...
 ---
  * Via Qwikmail 2.01

 NET/Mail : British Columbia Forth Board - Burnaby BC - (604)434-5886  
-----
This message came from GEnie via willett through a semi-automated process.



Tue, 28 Jul 1992 08:41:48 GMT  
 All Sorts of Sorts

 Date: 02-07-90 (17:50)              Number: 2871 (Echo)
   To: ALL                           Refer#: NONE
 From: ZAFAR ESSAK                     Read: (N/A)
 Subj: LEARN TO SORT                 Status: PUBLIC MESSAGE

 \ SORTS.FLY     Part 2 - Different Sort Methods

 ((
 The BUBBLE SORT compares the first element with the second element.
 If necessary they are exchanged.  Then the second element is compared
 with the third, etc.  Thus, the largest element will be moved to the
 end of the list while most others only move down one position.  The
 result is an unsorted list with a length one less than the previous
 list.  The list size is decreased and the sort repeated until the
 list size is zero.

 Consecutive sorts will preserve the effects of prior sorts.

 ))

 : BUBBLE ( --)

     ?DO I 0
         ?DO I DUP 1+
             0 data.buf 1 data.buf
             2OVER 2OVER

                 IF EXCHANGE ELSE 2DROP THEN
             LOOP
     -1 +LOOP ;

 ((
 The SHELL SORT is similar to the bubble sort except that instead of
 comparing each element with the next element it is compared with an
 element further down the list.  However, it does not preserve the
 effects of prior sorts when consecutive sorts are performed.
 When the sort is started the gap between the numbers being compared
 is set equal to half the length of the list.  When the routine can
 go through the list without trading the gap is halved and the
 routine repeated.  When the gap equals zero the sort is finished.

 ))

 VARIABLE gap ( --adr)
 VARIABLE traded ( --adr)    \ flag to indicate if elements exchanged

 : SHL ( --)



         ?DO traded OFF I 0

                 0 data.buf 1 data.buf
                 2OVER 2OVER

                     IF EXCHANGE traded ON   ELSE 2DROP  THEN
                 LOOP

         -1 +LOOP
     RECURSE ;


 ((
 The QWIK SORT is much faster than either the Bubble or Shell sorts.
 Furthermore, it does not preserve the effects of prior sorts when
 consecutive sorts are performed.

 The middle element of the unsorted list is used to divide the list
 into two piles.  The highest pile is in turn divided into two and
 this process repeated until the top pile is a collection of sorted
 elements.  Then attention is focused on the preceeding pile to sort
 it.  This is repeated until all the "piles" have been sorted.

 ))

 : QSORT| ( l,r--)                   \ QWIK SORT by Dan Phillips
     2DUP < NOT IF 2DROP EXIT THEN

     2DUP
         BEGIN SWAP

                   1 data.buf 0 data.buf SORT? 0<
             WHILE 1+
             REPEAT SWAP

                       1 data.buf 0 data.buf SORT? 0>
                 WHILE 1-
                 REPEAT 2DUP > NOT
                     IF 2DUP EXCHANGE SWAP 1+ SWAP 1-
                     THEN 2DUP >
         UNTIL -ROT SWAP
     RECURSE RECURSE ;


 ---
  * Via Qwikmail 2.01

 NET/Mail : British Columbia Forth Board - Burnaby BC - (604)434-5886  
-----
This message came from GEnie via willett through a semi-automated process.



Tue, 28 Jul 1992 08:41:52 GMT  
 All Sorts of Sorts

 Date: 02-07-90 (17:50)              Number: 2872 (Echo)
   To: ALL                           Refer#: NONE
 From: ZAFAR ESSAK                     Read: (N/A)
 Subj: LEARN TO SORT                 Status: PUBLIC MESSAGE

 \ SORTS.FLY     Part 3 - Testing the Sort Methods

 \ Test the cell sort

 INCLUDE \FORTH\RANDOM.TXT   \ Random number generator

 100 DATA.MAX# !

 CELL.SORT               \ Initialize Sort Function Variables


 testdata startdata !

 : printdata ( --)


 : BTEST ( --)

     printdata CR
     ." Go " CR
     BUBBLE
     printdata CR ;

 : STEST ( --)

     printdata CR
     ." Go " CR
     SHELL.SORT
     printdata CR ;

 : QTEST ( --)

     printdata CR
     ." Go " CR
     QSORT
     printdata CR ;

 ((
 Test times:
             BTEST = 9.76 secs on Turbo XT
             STEST = 5.16 secs on Turbo XT
             QTEST = 2.68 secs on Turbo XT

 ))

 \ SORTS.FLY   The End.
 ---
  * Via Qwikmail 2.01

 NET/Mail : British Columbia Forth Board - Burnaby BC - (604)434-5886  
-----
This message came from GEnie via willett through a semi-automated process.



Tue, 28 Jul 1992 08:41:56 GMT  
 All Sorts of Sorts

 Date: 02-08-90 (09:25)              Number: 2874 (Echo)
   To: ZAFAR ESSAK                   Refer#: 2869
 From: IAN GREEN                       Read: 02-08-90 (09:25)
 Subj: LEARN TO SORT                 Status: PUBLIC MESSAGE

    Sorting is a bit more subtle than you seem to realize. In my comm
 program, I use two sorts, one is the bubble sort where I sort the phone
 book, and the other is a stack oriented partition sort.
    Granted the bubble sort is slow on unordered random data, data that
 is already partially sorted can be sorted much faster with the bubble
 sort because it is able to exploit the partial ordering. Stack oriented
 sorts like the Quick Sort, cannot exploit this and thus are not all that
 suitable to partially ordered data like real data-bases.
    My phone book (in my comm program) is 500 entries, rather small
 compared to major data-bases, but still representative. Since I modify
 at most 2 or 3 entries, it makes sense to use the bubble sort since the
 data is already sorted from the last modification. Because it is always
 partially ordered at each modification, the logic of algorithm choice
 becomes obvious.
    My apologies for not having a Forth example for you.

 Ian Green

 NET/Mail : British Columbia Forth Board - Burnaby BC - (604)434-5886  
-----
This message came from GEnie via willett through a semi-automated process.



Wed, 29 Jul 1992 07:05:28 GMT  
 All Sorts of Sorts
[I'm porting this message in case anyone on the c.l.f side of the world
 would like to have access to any of the files mentioned in this post.
 I can arrange to have them uploaded to simtel20 (takes approx. 1 week)
 if you drop me a note at one of the addresses at the end of this message.
 -Doug]

 Date: 02-08-90 (13:44)              Number: 2875 (Echo)
 From: JACK BROWN                      Read: 02-08-90 (09:25)
 Subj: LEARN TO SORT                 Status: PUBLIC MESSAGE

 >I would also like to apologize at the outset for the lack of references
 >to the results of the Forth Sort contest that was held on GEnie.  I
 >would appreciate a note from anyone who recalls the name(s) of the
 >files that pertain to that contest.  I have looked over the various
 >files related to sorts on the BCFB and found the descriptions of sorts
 >by Norm Arnold in SORT_WIN.ZIP most helpful.
 >
 Zafar...  I did a  ZIPpy dir scan on BCFB  "  ZIP  SORT  "  and came up
 with the following references to sorting Forth.

 DVD&KNKR.ZIP     5527  10-17-89  Entry in the Forth Dimensions sort
 contest
 FIGSORT.ZIP      4470  11-09-89  Sort contest entry from H. Hansen
 RADXSORT.ZIP     2853  10-22-89  DK Elvey's Forth Radix sort. Contest
 entry.
 SORTCONT.ZIP    10814  07-24-89  Instructions for FIG's  SORT contest by
                                  Dennis Ruffer.

 SORTDEMO.ZIP    12928  01-04-88  Multi-Forth SortDemo&Gloss--Five Diff
 Sorts
 4THSORT.ZIP      2361  12-14-87  Some FORTH sorts (fig-FORTH)
 BATCHER.ZIP      1961  12-13-87  Batcher's Sort from Forth Dim (F83)
 SORT_WIN.ZIP     9318  09-27-88  Norm Arnold's Contest Winning Sorts.
 VISISORT.ZIP     9318  10-04-88  Visible sort routines - bubble, shell,
 kwik
 QSORT312.ZIP    40960  02-14-88  Good Sort Package for DOS (used by
 WORDNDX)
 SORT-OUT.ZIP   156155  08-02-89  Henning Hansen's wonderful sorting
                    demo.  Written in LMI Forth.  Includes source for
                    for sorts and windowing package.

 SORTLM.ZIP       2255  12-15-87  Adaption of Will Baden's Quicksort &
 SWORDS
 SORTZ80.ZIP      3211  12-15-87  Compact/fast sort for Z80 by S. Vance
 SORTURF.ZIP      1844  03-15-88  Sort and execution arrays for UR/Forth
 QSORT322.ZIP    41522  10-01-89  Great quick sort utility

 NET/Mail : British Columbia Forth Board - Burnaby BC - (604)434-5886  
-----
This message came from GEnie via willett through a semi-automated process.



Wed, 29 Jul 1992 07:05:32 GMT  
 All Sorts of Sorts

 Date: 02-09-90 (07:29)              Number: 2876 (Echo)
   To: IAN GREEN                     Refer#: 2874
 From: STEVE PALINCSAR                 Read: NO
 Subj: LEARN TO SORT                 Status: PUBLIC MESSAGE

 How does the selection sort compare to the bubble sort in your
 application?  In my (*very*) limited experience I found the selection
 sort to be considerably faster (between 2 & 4 times faster) in sorting
 80 character long strings quantity 100-250.

 Selection sort is also, according to Knuth, a stable method, preserving
 the original order for identical keys.  It's also extremely easy to code
 and understand -- far more so than the bubble sort.
-----
This message came from GEnie via willett through a semi-automated process.



Thu, 30 Jul 1992 11:22:56 GMT  
 All Sorts of Sorts

 Date: 02-09-90 (16:04)              Number: 2877 (Echo)
   To: IAN GREEN                     Refer#: 2874
 From: ZAFAR ESSAK                     Read: NO
 Subj: LEARN TO SORT                 Status: PUBLIC MESSAGE

 Ian,

     Your assumption that the tests I did were on random lists is
 absolutely correct.  Your comments on partly ordered lists resulting in
 admirable performances for the BUBBLE Sort got me to review the code.
 The following contains a minor revision to exit as soon as no
 exchanging of items is required.

 \ The BUBBLE Sort

 VARIABLE gap ( --adr)
 VARIABLE traded ( --adr)    \ flag to indicate if elements exchanged

 : pair.sort ( n1,n2--)            \ : pair.sort ( n1,n2--)



         IF EXCHANGE traded ON     \         IF EXCHANGE traded ON
         ELSE 2DROP THEN ;         \         ELSE 2DROP THEN ;

 : BUB ( --)

         ?DO traded OFF I 0


         -1 +LOOP ;

 : BUBBLE ( --) 1 gap ! BUB ;

 \ The SHELL Sort

 : SHL ( --)

         2/ gap ! BUB RECURSE ;


 This provides for some better factoring as well as performance.

 I have also gone through an early (1986) issue of Forth Dimensions and
 have added the BATCHER'S Sort to the set.  Here it is.

 ((
 BATCHER'S SORT is slightly slower, but more robust than Qwik sort
 and has consistent sorting times.  Depending on the input data, in
 some cases Batcher's sort may be faster than Qwik sort.

 Unfortunately, it too does not preserve the effects of prior sorts
 when consecutive sorts are performed.

 Batcher's sort consists of three nested loops.  The innermost loop
 compares pairs of keys that are completely independent.  Thus
 parallel computing could be applied for very fast sorting.

 From Forth Dimensions Nov/Dec 86 p 39 by John Konopka

 The names of the variables were chosen to be consistent with the
 description of the algorithm given by Knuth.

 ))

 CREATE 2** ( --adr)
        1 ,    2 ,    4 ,     8 ,    16 ,    32 ,    64 ,  128 ,
      256 ,  512 , 1024 ,  2048 ,  4096 ,  8192 , 16384 ,


 VARIABLE DD ( --adr)    VARIABLE NN ( --adr)    VARIABLE PP ( --adr)
 VARIABLE QQ ( --adr)    VARIABLE RR ( --adr)    VARIABLE TT ( --adr)

 : SELECT-T ( --)

     1- 14 MIN TT ! ;

 : INNER-LOOP ( --)



             THEN LOOP ;

 : Q-TEST ( --)




         THEN ;


 : BATCHER ( --)



         BEGIN INNER-LOOP Q-TEST

     UNTIL ;

 ---
  * Via Qwikmail 2.01

 NET/Mail : British Columbia Forth Board - Burnaby BC - (604)434-5886  
-----
This message came from GEnie via willett through a semi-automated process.



Thu, 30 Jul 1992 11:23:00 GMT  
 All Sorts of Sorts

 Date: 02-10-90 (08:33)              Number: 2880 (Echo)
   To: ZAFAR ESSAK                   Refer#: 2877
 From: IAN GREEN                       Read: NO
 Subj: LEARN TO SORT                 Status: PUBLIC MESSAGE

    Even though I am not much of a Forther, I am still willing to help
 Forth programmer's make the best of what they are doing.
    The added exit in the program will indeed improve the performance
 dramatically, especially in partially ordered data. If you saw my
 earlier discourse on sorts in the Modula-2 conference, you may have
 noticed how the use of a guard could improve the bubble sort even more.
 The reason is simple, if the zero-eth element of the array is set to the
 minimum value of the data type in question, then one of the two compares
 can be effectively eliminated by virtue of the fact that the bubbling
 process is now guaranteed to terminate at the top of the array.
    Later I plan to post so additional sorts in the Modula-2 conference
 so keep your eyes open.

 Ian Green

 NET/Mail : British Columbia Forth Board - Burnaby BC - (604)434-5886  
-----
This message came from GEnie via willett through a semi-automated process.



Fri, 31 Jul 1992 03:26:44 GMT  
 All Sorts of Sorts

 Date: 02-10-90 (08:39)              Number: 2881 (Echo)
   To: STEVE PALINCSAR               Refer#: 2876
 From: IAN GREEN                       Read: NO
 Subj: LEARN TO SORT                 Status: PUBLIC MESSAGE

    In the real world, the selection sort is so inefficient it isn't
 funny. Real databases are massive. Consider Brokerage XYZ whose has an
 army of analysists working on finding the next LBO candidate.
    If we needed to extract from the database some information and sort
 it so that it could be readily browsed and furthur parsed, huge amounts
 of sorting and selecting is required. It therefor behooves you to use
 the fastest algorithms available in order to get so 'performance' out of
 the query lest the client get impatient. Combined with insider
 information it is then practicle to make hundreds of millions of
 dollars, all tax free (at least under present rules). You don't need
 lots of money, all you need is smart software and a few well placed
 friends.

 Ian Green

 NET/Mail : British Columbia Forth Board - Burnaby BC - (604)434-5886  
-----
This message came from GEnie via willett through a semi-automated process.



Fri, 31 Jul 1992 03:26:48 GMT  
 All Sorts of Sorts

 Date: 02-11-90 (08:41)              Number: 2884 (Echo)
   To: IAN GREEN                     Refer#: NONE
 From: STEVE PALINCSAR                 Read: NO
 Subj: SORTING ALGORITHMS]           Status: PUBLIC MESSAGE

 As a matter of fact, even as august an authority as Knuth says that for
 some sorts of data, the selection sort happens to be the best one
 available.  But besides that, your assumption that "the real world"
 consists only of massive databases simply takes my breath away.

 As it happens, the application that I translated KNuth into Forth for is
 a real-world application for a task I do at work.  Every month I do a
 search on a Library of Congress bibliographic database on their inhouse
 SCORPIO system and I download a list of their reports added since the
 previous issue.  (That's usually between 60 and 100 or so items.)  The
 entries are downloaded in report number order.  My program strips off
 the unnecessary material from each citation, and puts the cites in
 author title order.

 In fact, I think the requirement to order the data came about because
 one of the editors of the publication secretly wished to eliminate the
 column, and thought that the amount of manual effort required to do so
 would cause me to simply abandon the effort.  Ha!  Using this program it
 takes me about 30 seconds to process the data.  As it so happens, she's
 taking over production of the column, and was astounded at how easy it
 was...

 Incidentally, if you'll stop to consider the situation, you'll see that
 the bubble sort is less efficient than the selection sort - so by your
 own criteria you oughtn't be using it!  But as we both know, there are
 times in the real world that you need to sort relatively small numbers
 of items, and in such cases more complex algorithms may be even slower
 than the simple ones.  Even Knuth says so...  ;-)
-----
This message came from GEnie via willett through a semi-automated process.



Sat, 01 Aug 1992 10:31:37 GMT  
 All Sorts of Sorts

 Date: 02-13-90 (14:57)              Number: 2901 (Echo)
   To: ZAFAR ESSAK                   Refer#: 2898
 From: STEVE PALINCSAR                 Read: NO
 Subj: SELECTION SORT                Status: PUBLIC MESSAGE

 To quote from Knuth (The Art of Computer Programming, vol. 3,
 Sorting and Searching), p. 139:
 .
 "5.2.3  Sorting by Selection
 .
 An important famly of sorting techniques is based on the idea
 of repeated selection.  The simplest selection method is
 perhaps the following:
     i) Find the smallest key; transfer the corresponding
 record to the output area; then replace the key by the value
 "" (which is assumed to be higher than any actual key).
     ii) Repeat step (i).  This time the second smallest key
 will be selected, since the smallest key has been replaced by
 .
     iii) Continue repeating step (i) until N records have
 been selected.
 .
 The above method involves N - 1 comparisons...  There is an
 obvious way to improve upon the situation, avoiding the use
 of : we can take the selcted value and move it into its
 proper position by exchanging it with the record currently
 occupying that position.  Then we need not consider that
 position again in future selections.  This idea yields our
 first selection sorting algorithm."
 .
 [He goes on to describe Algorithm S (Straight selection sort)
 which I won't quote.]  He goes on, on p. 141, to  say:
 .
 "It is interesting to compare Algorithm S to the bubble
 sort...  Bubble sorting does less comparisons than straight
 selection and it may seem to be preferable; but in fact
 Program 5.2.2B [the bubble sort] is more than twice as slow
 as Progam S!  Bubble sorting is handicapped by the fact that
 it does so many exchanges, while straight selection involves
 very little data movement."
 .
 To quote from Robert Sedgewick's _Algorithms_ (p. 95):  
 "Despite its simplicity, selection sort has quite an
 important application: it is the method of choice for sorting
 files with very large recoords and small keys."  Here'ss his
 code (inn Pascal):
    procedure selection;
       var i,j,min,t: integer;
       begin
       for i:=1 to N do
          begin
          min:=i;
          for j:=1+1 to N do
             if a[j]<a[min] then min:=j;
          t:=a[min]; a[min]:=a[i]; a[i]:=t
          end;
       end;
 .
 Sedgewick describes the algorithm as follows:  "find the smallest
 element in the array and exchange it with the element in the first
 position, then find the second smallest element and exchange it with the
 element in the second position, continuing in this way until the entire
 array is sorted. ...This is among the simplest of sorting methods, and
 it will work very well for small files.  Its running time is
 proportional  to N2.  ...Still selection sort is quite attractive for
 sorting (say) a thousand 1000-word records on one-word keys."
 (p. 95 for all the Sedgewick quotes & code)
-----
This message came from GEnie via willett through a semi-automated process.



Tue, 04 Aug 1992 10:35:51 GMT  
 
 [ 12 post ] 

 Relevant Pages 

1. Sort (List sorts...)

2. Binary Sort/Merge Sort in awk

3. Sort of Sorts

4. All Sorts of Sorts

5. All Sorts of Sorts

6. All Sorts of Sorts

7. Bubble sort/Selection sort help neded.

8. Qsort, Merge sort, Radix sort in prolog Help!

9. UNIX sort exponents (bc|sort)

10. ANN: New release of S.C.A. Micro Templates (Browse Header Sort and Reverse Sort Template)

11. Straight Insertion Sort and Binary Insertion Sort

12. Sorting a file w/o SORT verb

 

 
Powered by phpBB® Forum Software