string or int hashtables 
Author Message
 string or int hashtables

I need to search in a big word dictionary (similar to nick levin's fast-index)

Now I have two options:

A string hash-table with :test equal
or a numified hash-table with :test eql, where the number is a condensed but
unique integer representation of the string. so sxhash is not enough.

The problem with lisp is the different and rather small number of bytes for
immediate ints.
with 32 bit I get about 15-16 chars per word, with 24 much less.
So on any word longer than this upper limit the hash has to be upgraded to the
new type.
I think this will be slower than an initial string hash and :test equal.
I believe that the longest word will have 34 chars.
It should run on clisp, cmucl, corman lisp and scsh.
It should run fast, size does not matter.

So I'm stuck to equal, or?
I could also make two hashes seperated on the word length.
an immediate eq hash with ints, and an equal for the few long words.
is this common practice or does everybody just use strings?
--
Reini Urban
http://www.*-*-*.com/



Wed, 18 Feb 2004 07:45:35 GMT  
 string or int hashtables

Quote:

> I need to search in a big word dictionary (similar to nick levin's fast-index)

> Now I have two options:

> A string hash-table with :test equal
> or a numified hash-table with :test eql, where the number is a condensed but
> unique integer representation of the string. so sxhash is not enough.

As a relative Lisp beginner I may be missing something, but the # of
buckets in the hash table and the distribution across the buckets should
be the performance drivers in this application. The :test time is only a
factor as the bucket length grows.

joe



Wed, 18 Feb 2004 09:52:42 GMT  
 string or int hashtables

Quote:

> I need to search in a big word dictionary (similar to nick levin's fast-index)
> Now I have two options:

> A string hash-table with :test equal
> or a numified hash-table with :test eql, where the number is a condensed but
> unique integer representation of the string. so sxhash is not enough.

> The problem with lisp is the different and rather small number of bytes for
> immediate ints.
> with 32 bit I get about 15-16 chars per word, with 24 much less.
> So on any word longer than this upper limit the hash has to be upgraded to the
> new type.
> I think this will be slower than an initial string hash and :test equal.
> I believe that the longest word will have 34 chars.
> It should run on clisp, cmucl, corman lisp and scsh.
> It should run fast, size does not matter.

> So I'm stuck to equal, or?
> I could also make two hashes seperated on the word length.
> an immediate eq hash with ints, and an equal for the few long words.
> is this common practice or does everybody just use strings?

Maybe I'm the only one but I just don't understand the problem description
well enough to even guess how to begin looking for a solution.

I'm always weirded out by an "initial" problem description that begins
with "I have two options".  e.g., is it clear a priori that simple binary
search in a packed table would not be in the running?

And, btw, is this a "big" "word dictionary" or a "big word" "dictionary"?



Wed, 18 Feb 2004 12:27:48 GMT  
 string or int hashtables

Quote:

> I need to search in a big word dictionary (similar to nick levin's fast-index)

> Now I have two options:

> A string hash-table with :test equal
> or a numified hash-table with :test eql, where the number is a condensed but
> unique integer representation of the string. so sxhash is not enough.

As someone else said, the performance of a hashtable - if the
collisions are low, should be dominated by the time-to-hash, not the
equality test.  So if you make a suitably big hashtable you should be
doing pretty well.

But you have at *least* one more option, especially if size does not
matter (which I think you said).

(defun make-dictionary (name)
  (make-package name :use '()))

(defun intern-in-dictionary (word dict)
  (intern word dict))

(defun lookup-in-dictionary (word dict)
  (let ((f (find-symbol word dict)))
    (if f (symbol-name f) nil)))

This is typed in without access to a Lisp to test it, but something
like this should work, and probably be quick but waste a lot of space
for all the symbol slots.

There may well be other options as well.

--tim



Fri, 20 Feb 2004 18:12:33 GMT  
 string or int hashtables


Quote:
>But you have at *least* one more option, especially if size does not
>matter (which I think you said).

>(defun make-dictionary (name)
>  (make-package name :use '()))

This seems like a pretty silly solution, since a package will usually be
implemented internally as a string hashtable.

--

Genuity, Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.



Sun, 22 Feb 2004 00:57:37 GMT  
 string or int hashtables

Quote:

> This seems like a pretty silly solution, since a package will usually be
> implemented internally as a string hashtable.

I agree.  However I once did some tests which had packages
outperforming any hashtable I could make (that wasn't a package).  I
forget which implementation it was unfortunately.  Really my point was
that there are hashtable-like objects which are specialised to strings
- packages.

--tim



Sun, 22 Feb 2004 17:51:12 GMT  
 
 [ 6 post ] 

 Relevant Pages 

1. String HashTables

2. another problem with hashtables

3. Hashtables in Lisp

4. weak hashtables in LW 4.2

5. string to int or float

6. How to convert int to string

7. String to Int in GHC?

8. Conversion from string to int

9. Question ml: int to string

10. Conversion Int <-> String in Haskell

11. int->string

12. String-to-Int

 

 
Powered by phpBB® Forum Software