Need a FAST String Comparison! 
Author Message
 Need a FAST String Comparison!

HELP!  I've got a program that has to look for string patterns from a
dictionary, and it is SOOOO slow!  I need a faster way to check string
equivalency than :

For I = 0 to DictCount
   If Dict(I) = CheckString Then Exit For
Next I

If anybody has any ideas for a quicker comparison, or knows it's just not
possible, any help would be greatly appreciated.
-Jeff Mitchell



Sat, 23 Dec 2000 03:00:00 GMT  
 Need a FAST String Comparison!
How about storing them in a database table
(with an appropriate index), then doing a FindFirst
or a Seek)? Depending on the size of the dictionary,
it will probably be faster overall.
Quote:

>HELP!  I've got a program that has to look for string patterns from a
>dictionary, and it is SOOOO slow!  I need a faster way to check string
>equivalency than :

>For I = 0 to DictCount
>   If Dict(I) = CheckString Then Exit For
>Next I

>If anybody has any ideas for a quicker comparison, or knows it's just not
>possible, any help would be greatly appreciated.
>-Jeff Mitchell




Sat, 23 Dec 2000 03:00:00 GMT  
 Need a FAST String Comparison!
If they're sorted (alphabetically) a binary search is the fastest there is.

If you need me to explain how its done (there are plenty of examples on the
Net) I'll watch for your postings...



Sat, 23 Dec 2000 03:00:00 GMT  
 Need a FAST String Comparison!

Quote:

> HELP!  I've got a program that has to look for string patterns from a
> dictionary, and it is SOOOO slow!  I need a faster way to check string
> equivalency than :

> For I = 0 to DictCount
>    If Dict(I) = CheckString Then Exit For
> Next I

> If anybody has any ideas for a quicker comparison, or knows it's just not
> possible, any help would be greatly appreciated.

It all depends on the dictionary and the amount of control you have over
it.

- If it's sorted, you can do a binary search. Easy to implement and
fairly quick.

- If the dictionary is fairly static, you can build up a sorted index on
the side with pointers into the dictionary. Every once in a while you
rebuild the index. If the item isn't in the index, you perform the
standard search. This approach is VERY bad if you expect a lot of misses
- for every miss you have to perform TWO searches...

- If you have total control/knowledge over how/when items are added and
removed, you can maintain a hash table with indices into the dictionary
'on the side'. No need to update the index periodically, and you'll know
if it's a hit or miss without needing to perform the secondary linear
search.

- If you have full control over the dictionary, you have an almost
unlimited number of choices. You can implement it as a a hash table, a
binary tree, a treap etc. etc. It all depends on how often you'll add,
remove and search for items from it, the expected number of items, how
much time you can to spend on the code and so on. In case of hurry,
perhaps you could even look into using a third-party component.

Etc.

If you make clear what kind of influence and limitations you have over
the dictionary, it's a lot easier to name a possible solution.

Quote:
> -Jeff Mitchell


Regards
/J

--
Lente Impelle.



Sat, 23 Dec 2000 03:00:00 GMT  
 Need a FAST String Comparison!
Your problem is not the speed of your string comparison.  Your problem is
that your algorithm is crappy.  If all you are doing is searching to see if
a word is in a dictionary, you need to use some sort of fast lookup.  You
can do this sort of thing several ways.  There are a number of n*log(n)
algorithms available which use trees and comparisons, and there is also
hashing.  

The simplest method, which will probably yield passable results, is to use
VBs collection object in a slightly offbeat way.  For every word in your
dictionary, set all the characters to uppercase and insert the number 1
into the collection using your word as the key.  

Now, when you want to see if the word is in the dictionary, try to get the
word (upper-cased) from the collection.  If it doesn't give you an error,
it's in there.  If it gives you an error, it was not in there.

VBs collection using hashing to retrieve values from its collection, so it
is reasonably fast.  It also stores all values as variants, and you don't
really need any value to be associated with the words in the dictionary.
That is just the price you pay for not having to do it the hard way.



Quote:
> HELP!  I've got a program that has to look for string patterns from a
> dictionary, and it is SOOOO slow!  I need a faster way to check string
> equivalency than :

> For I = 0 to DictCount
>    If Dict(I) = CheckString Then Exit For
> Next I

> If anybody has any ideas for a quicker comparison, or knows it's just not
> possible, any help would be greatly appreciated.
> -Jeff Mitchell




Sat, 23 Dec 2000 03:00:00 GMT  
 Need a FAST String Comparison!

Quote:
> Aieee! FindFirst blows! Seek or doing a SQL query's much, much
> better. A real database would also facilitate finding correct
> words that are "close to" a misspelled word, much like the
> "alternates" that most spelling checkers can pull up. Try using
> Soundex. <shameless_plug> The full-text indexing code at
> <http://members.ricochet.net/~jfoster/> might be worth looking at
> for ideas. </shameless_plug>

This is assuming quite a literal definition of "dictionary"; for
some reason I have a hunch that the original poster was looking for
a more generic sort of solution.  Unfortunately that original
poster didn't really provide enough information for a strong
optimization (but there are a number of things better than the
original solution, even without making too many assumptions.)

Quote:
> The comparisons don't need to be faster, really. If the words in Dict
> are in sorted order, you can use a binary search:

That's a pretty big if.  Sorting the 'dictionary' could outweigh
any advantages from the binary search, especially if the dictionary
is especially volatile or dynamic.

It also assumes that the dictionary CAN be sorted, in the sense
that it consists of tokenized delimited atoms.  It may
(for example) consist of compressed encrypted text.  Again, there
is not really enough data in the original question to decide.




Sat, 23 Dec 2000 03:00:00 GMT  
 
 [ 7 post ] 

 Relevant Pages 

1. I need a FAST string comparison in VB

2. : Need Help with Now() and String comparison

3. Comparison strings with strings?

4. fast string search algo needed

5. q: Need super fast string search method...

6. Need Fast DLL/OCX for String Processing/Parsing

7. Need help which way is fastest to pick out a segment from a string

8. Speed Comparison Post #1 of 2 (Was Fast Drawing)

9. Speed Comparison Post #2 of 2 - quite large (Was Fast Drawing)

10. String comparison does not work

11. Null comparison with string

12. Single quote and dash act strangely in string comparison

 

 
Powered by phpBB® Forum Software