Critique my first program? 
Author Message
 Critique my first program?

This is my first useful program I've written with Dylan.  It's intended
to be a Scrabble "aid."  It searches a list for words that contain the
letters specified on the command line, sorts them, and prints them out.

I'd really appreciate it if anyone would point out places where the code
could be improved, even stylistic nitpicks where I defy Dylan
convention.  I'm also very interested in ways to improve performance,
since it currently runs about 25 times slower than my equivalent C
program when compiled with d2c, despite my optimizing it as best I can.

Thanks in advance.  Here goes.

module: Papaya
synopsis: Searches a list of words for words containing the specified
letters, sorts them by length, and prints them

/* What is the difference between define function and define method? */

define variable FILE-PATH = "/usr/share/dict/extended_words";
//define variable FILE-PATH = "/Users/peter/testfile.txt";

define method useage() => ();
    puts("Useage: lsearch string\n");
end method useage;

/* convert a-z to 0-25 */
define method char-to-num(c :: <character>) => (result :: <integer>)
    as (<integer>, c) - 97 //as(<integer>, 'a')
end method char-to-num;

/* takes in a string, returns a vector of length 26 where each
    index stores the count of the corresponding letter
    e.g. string-to-nums("aac") is [2, 0, 1, 0, 0, ...]
  */
define method string-to-nums(string :: <string>) => (nums :: <vector>)
    let result = make(<vector>, size: 26, fill: 0);
    do ( method (a)
           let val = char-to-num(a);
           result[val] := result[val] + 1; //can I write this as
result[val]++ or similar?
         end,
         string);
    result
end method string-to-nums;

/* converts a string to an array of numbers.  string-to-array("aac") is
[0, 0, 2] */
define method string-to-array(string :: <string>) => (nums :: <vector>)
    map-as (<vector>, char-to-num, string)
end method string-to-array;

define method main-search(string :: <string>) => ()
    let base-array :: <vector> = string-to-nums(string);
    let matching-strings :: <stretchy-vector> = make(<stretchy-vector>);
    let matching-array :: <vector> = string-to-array(string);
    let handler <file-does-not-exist-error>
       = method(condition, next)
          puts(concatenate("Unable to open file ", FILE-PATH, "\n"));
          exit();
        end method;

    let stream :: <file-stream> = make(<file-stream>,
                                              locator: FILE-PATH);

    /* Read in the lines and save the ones that contain the letters
specified in string.
       A naive implementation would take n^2 time, so instead we convert
each string to a
       hash-table like construct using string-to-nums and operate on that */
    block()

       /* a string is good if it contains all of the characters in our
base-array */
       // 116.48 real        65.74 user         3.59 sys
       // This function, though simple, is also slow, and we do not use
it any more
       local method goodOld? (string :: <string>) => (good :: <boolean>)
          ~ any? ( negative?, map(\-, string-to-nums(string), base-array))
       end method goodOld?;

       // 43.83 real        24.56 user         0.53 sys
       // This function is equivalent to the above function but much faster
       local method good? (string :: <string>) => (good :: <boolean>)
          let new-sequence :: <vector> = string-to-nums(string);
          block (done)
             for (i from 0 below matching-array.size)
                let index :: <integer> = matching-array[i];
                new-sequence[index] := new-sequence[index] - 1;
                if (negative?(new-sequence[index])) done(#f); end if;
             finally
               #t
             end for;
          end block
       end method good?;

       while (#t)
          let line :: <string> = as-lowercase(read-line(stream));
          if (good?(line)) matching-strings := add!(matching-strings,
line) end if;
       end while;

    exception (<end-of-stream-error>)
      #f; //do we need to have some dummy value if we just want
execution to continue?
    end block;

    /* we are done with the stream */
    stream.close;

    /* sort the lines we saved */
    local method size-compare (a :: <string>, b :: <string>) =>
(size-compare :: <boolean>)
       (size(a) < size(b)) |
       (size(a) == size(b) & (a < b))
    end method size-compare;

    matching-strings := sort! (matching-strings, test: size-compare);

    /* output the result */
    do ( method (s) puts(s); puts("\n") end method, matching-strings);

end method main-search;

define method main (argv0 :: <byte-string>, #rest string)
   if (empty?(string)) useage();
   else main-search(as-lowercase(string[0]));
   end if;
end method main;



Mon, 12 Sep 2005 18:15:50 GMT  
 Critique my first program?



Quote:
> /* convert a-z to 0-25 */
> define method char-to-num(c :: <character>) => (result :: <integer>)
>     as (<integer>, c) - 97 //as(<integer>, 'a')
> end method char-to-num;

So I guess you've never heard of EBCDIC?

And how do you handle non-alpha?



Tue, 13 Sep 2005 03:00:14 GMT  
 Critique my first program?

Quote:



>>/* convert a-z to 0-25 */
>>define method char-to-num(c :: <character>) => (result :: <integer>)
>>    as (<integer>, c) - 97 //as(<integer>, 'a')
>>end method char-to-num;

> So I guess you've never heard of EBCDIC?

Thanks for the heads-up.  I'm familiar with differences in character
sets.  I rewrote it like so:

/* convert a-z to 0-25 */
define method char-to-num(c :: <character>) => (result :: <integer>)
    case c == 'a' => 0;
         c == 'b' => 1;
         c == 'c' => 2;
         c == 'd' => 3;
         c == 'e' => 4;
         c == 'f' => 5;
         c == 'g' => 6;
         c == 'h' => 7;
         c == 'i' => 8;
         c == 'j' => 9;
         c == 'k' => 10;
         c == 'l' => 11;
         c == 'm' => 12;
         c == 'n' => 13;
         c == 'o' => 14;
         c == 'p' => 15;
         c == 'q' => 16;
         c == 'r' => 17;
         c == 's' => 18;
         c == 't' => 19;
         c == 'u' => 20;
         c == 'v' => 21;
         c == 'w' => 22;
         c == 'x' => 23;
         c == 'y' => 24;
         c == 'z' => 25;
         otherwise => error("Bad input string");
      end case;
end method char-to-num;

I expected this to be much slower, but it was actually if anything
somewhat faster, and more portable.

Quote:

> And how do you handle non-alpha?

With the new function, it prints an error message and quits, which is
fine behavior.

Thanks for your thoughts.

-Peter



Tue, 13 Sep 2005 03:39:27 GMT  
 Critique my first program?

Quote:

> /* What is the difference between define function and define method? */

define function is the equivalent of this:

  // define function() /* do something */ end
  define constant my-function = method() /* do something */ end;

Functions don't have the capibility to do method dispatch. Methods
do. So if you want to optimize for speed and don't use any form of
dynamic dispatch you can use define function.

Chris.
--
http://www.double.co.nz/dylan



Tue, 13 Sep 2005 05:51:30 GMT  
 Critique my first program?

Quote:

> define variable FILE-PATH = "/usr/share/dict/extended_words";

Top-level variables should be written between asterisks: *file-path*.
This makes it easier to spot them.

Quote:
>     puts("Useage: lsearch string\n");

puts() is deprecated. Use one of:

write(*standard-output*, "foo");
format(*standard-output*, "foo: %=", 23);
print-object(*standard-output*, "foo");
format-out("foo: %=", 23);

The latter is defined as:

define constant format-out = curry(format, *standard-output*);

and just a convenience function.

print-object is a generic function that is called for the %= format
dispatch character on format.  write is the low-level method that
actually writes to the stream.

So in order to be portable, you could define

define constant puts = curry(write, *standard-output*);

in your code.

Quote:
> define method string-to-nums(string :: <string>) => (nums :: <vector>)

In most cases, if you deal with vectors of a specific type, it pays to
use a limited vector:

define constant <integer-vector> = limited(<vector>, of: <integer>);

and use <integer-vector> instead of <vector> throughout your code.

Quote:
>     let stream :: <file-stream> = make(<file-stream>,
>                                          locator: FILE-PATH);
[...]
>     /* we are done with the stream */
>     stream.close;

It's much more convenient to use the with-open-file macro here:

with-open-file(stream = *file-path*)
  ... your code here
end;

It will automatically care about closing your file, even if a
non-local exit is taken as the result of a condition.

Quote:
>           let line :: <string> = as-lowercase(read-line(stream));

Now here is the reason the code is slower than your C version: GD
generates full generic function dispatch for as-lowercase and
read-line, and a type check for the assignment.  I'm not sure at the
moment what to do about it, but the streams library desperately needs
an overhaul anyways.

Quote:
> define method main (argv0 :: <byte-string>, #rest string)
>    if (empty?(string)) useage();
>    else main-search(as-lowercase(string[0]));
>    end if;
> end method main;

Using an entry function is also a deprecated feature in Common
Dylan. Do something like:

begin
  let args = application-arguments();
  if(empty?(args)
    usage();
  else
    main-search(as-lowercase(args[0]));
  end;
end;

In other words: just use a side effect of a top-level statement.

Andreas

--
"In my eyes it is never a crime to steal knowledge. It is a good
theft. The pirate of knowledge is a good pirate."
                                                       (Michel Serres)



Tue, 13 Sep 2005 21:13:59 GMT  
 Critique my first program?


Quote:
> >           let line :: <string> = as-lowercase(read-line(stream));

> Now here is the reason the code is slower than your C version: GD
> generates full generic function dispatch for as-lowercase and
> read-line, and a type check for the assignment.  I'm not sure at the
> moment what to do about it, but the streams library desperately needs
> an overhaul anyways.

True.  I've done a little, and the code currently in CVS should run much
faster than the latest official release (2.3.9).

If "stream" is declared as an <fd-stream> then you should see at least a
couple of MB per second using read-line.  One program of mine is using
read-line to input a 300 MB text file with seven million lines in 55
seconds on an Athlon 1800+.  That's over 5 MB per second.  "wc -l" and
"perl -e 'while(<>){}'" both take 11 - 12 seconds so there's still room
for improvement, but certainly not a factor of 25.

NB that as soon as I added the code to parse the useful stuff out of the
input lines the Dylan program became at least an order of magnitude
faster than the Perl version and I abondoned Perl...   The program takes
2m45s in total to process that 300 MB file (which includes sorting those
7m records), so even if the I/O was as optimal as wc and perl it would
only speed it up by 25%.

-- Bruce



Tue, 13 Sep 2005 22:08:41 GMT  
 Critique my first program?


Quote:
> define method char-to-num(c :: <character>) => (result :: <integer>)
>     case c == 'a' => 0;
>          c == 'b' => 1;
>          c == 'c' => 2;

What are peoples thoughts on using '==' here instead of '=' ? I know
they're semantically equivalent for characters, and that == could
theoretically produce better code, but I'd expect a decent compiler to
generate the same code for each and it just seems to me that using == is
implying something misleading, as though the programmer is saying =
wouldn't produce the desired results.


Wed, 21 Sep 2005 19:22:33 GMT  
 Critique my first program?


Quote:


> > define method char-to-num(c :: <character>) => (result :: <integer>)
> >     case c == 'a' => 0;
> >          c == 'b' => 1;
> >          c == 'c' => 2;

> What are peoples thoughts on using '==' here instead of '=' ? I know
> they're semantically equivalent for characters, and that == could
> theoretically produce better code, but I'd expect a decent compiler to
> generate the same code for each and it just seems to me that using == is
> implying something misleading, as though the programmer is saying =
> wouldn't produce the desired results.

Use = except when object identity is really what you want.
If you add type declarations, the right thing will happen.

I could easily see a subclass of character where equality
and identity are not the same.



Wed, 21 Sep 2005 22:59:05 GMT  
 Critique my first program?

Quote:

>> define method char-to-num(c :: <character>) => (result :: <integer>)
>>     case c == 'a' => 0;
>>          c == 'b' => 1;
>>          c == 'c' => 2;
> What are peoples thoughts on using '==' here instead of '=' ? I know

I don't think it really matters here.  But I would prefer to use
select instead of case here:

select(c)
  'a' => 0;
  'b' => 1;
  ...
end;

Andreas

--
"In my eyes it is never a crime to steal knowledge. It is a good
theft. The pirate of knowledge is a good pirate."
                                                       (Michel Serres)



Wed, 21 Sep 2005 22:46:03 GMT  
 Critique my first program?

Quote:

> Use = except when object identity is really what you want.
> If you add type declarations, the right thing will happen.

> I could easily see a subclass of character where equality
> and identity are not the same.

After thinking about it, I must agree.  One instance where characters
are equal, but not identical, is with compatibility characters in
Unicode.

For instance, the character can be represented both with the Latin1
compatibility mapping:

U+00DC LATIN CAPITAL LETTER U WITH DIAERESIS

or as a combining character:

U+0038 LATIN CAPITAL LETTER U, followed by
U+0308 COMBINING DIAERESIS

Not that Dylan supports Unicode at the moment, but it should at some
time.  Support for characters is tricky, and the Unicode people spent
quite some time getting the abstractions right, and covering almost
all relevant scripts and languages.

Andreas

--
"In my eyes it is never a crime to steal knowledge. It is a good
theft. The pirate of knowledge is a good pirate."
                                                       (Michel Serres)



Sat, 01 Oct 2005 18:42:19 GMT  
 
 [ 10 post ] 

 Relevant Pages 

1. Critique a first prog?

2. structured programming: critiques?

3. Programming Language Critiques WWW page

4. critique this program pleaase

5. critique my 1st program ever

6. My first awk program

7. Smalltalk as first programming language

8. My first venture at programming

9. Gitano Software affiliate program - Product Scope 32 PRO bonus to first 25 signees*

10. Formula One 6.0 and First Impression 6.0 Beta Programs Announced

11. Eiffel as the first programming language

12. First Annual Conference on Patterns of Programming Languages

 

 
Powered by phpBB® Forum Software