Bit-string matching algorithm wanted
Author Message
Bit-string matching algorithm wanted

The following problem is simple in concept, awkward to describe,
and hard (for me) to solve.

Define D as the number of bit positions in which two bit strings
differ.  Another way to say the same thing is that it's the number
of "1" bits in (string1 XOR string2).  For example,
D(011010, 111000) = 2.

I want an algorithm that will allow me to quickly determine which
bit strings in a permanently stored satisfy
D(stored_stringN, test_string) <= K, where K is some arbitrary constant.

I *don't* just want an efficient way to scan every string
in the stored set and see if it matches, for reasons that will
become obvious.  I want a way to be able to jump right to the
strings which match the criteria, or to at least examine
only a subset of the total data set.

Sample problem follows.

Stored bit strings of about 512 bits each.  Somewhere in the
neighborhood of 16384 permanent strings.  The bit strings will
never change once the problem is set up, so time-consuming pre-
processing makes sense if it makes future access faster.
The stored bit strings will then be compared with tens of thousands
of test bit strings (also 512 bits each), and for each test string I
need to be able to quickly determine which stored bit strings differ
by no more than, say, 100 bits.  The 100 bit limit also will be
fixed and unchanging during the course of the problem.
All the numbers listed above may be altered somewhat if it
simplifies the problem.

The stored bit strings can be either random or carefully chosen to
make searching easier, as long as they are somewhat uniformly
distributed throughout the space of all possible strings.  In fact,
the bit strings don't even have to be stored at all!  If the "stored"
strings are generated by some predictable formula, the index (0-16383)
of the appropriate strings will be sufficient.  I don't even need to be
able to reconstruct the original bit strings.  I only need the
mathematical property that similar test strings will return
similar subsets of the indices in a way consistent with the
bit-string matching model.

For the curious, I'm trying to efficiently simulate a sparse
distributed associative computer memory.  Please feel free to
suggest other appropriate newsgroups or to forward this problem
to people who might be helpful.  Thank you.

Tue, 31 Oct 1995 05:05:31 GMT

 Page 1 of 1 [ 1 post ]

Relevant Pages