Regular Expressions/Pattern Matching/Unordered pattern 
Author Message
 Regular Expressions/Pattern Matching/Unordered pattern

I've been trying to find a regular expression or something to do some
pattern matching.  I'm currently using a RegEx library.

Basically I have a group of letters say "dtseupx".  Now I want to
apply words to this group of letters and see if it's possible to spell
that word with that group of letters.

So first I would check like "setup"  which is valid but "setups" is in
valid because there's only 1 "s" in the group of letters.  Also, "sex"
is valid but "axe" is invalid.

Is there some sort of regular expression to do this or would I have to
order the list of letters alphabetically first?  Has anybody come
across this before?

Cristov



Mon, 13 Dec 2004 23:22:58 GMT  
 Regular Expressions/Pattern Matching/Unordered pattern

Quote:

> I've been trying to find a regular expression or something to do some
> pattern matching.  I'm currently using a RegEx library.

> Basically I have a group of letters say "dtseupx".  Now I want to
> apply words to this group of letters and see if it's possible to spell
> that word with that group of letters.

> So first I would check like "setup"  which is valid but "setups" is in
> valid because there's only 1 "s" in the group of letters.  Also, "sex"
> is valid but "axe" is invalid.

I think you can use strspn to get a rough estimate and then something a
little more complicated to do a final comparison (e.g., to make sure a
letter is not used twice). For example, you might sort both "words", do
the strspn test, and then go character-by-character through your source
(dtseupx) skipping characters that don't match until you've matched all
the characters in the target (setups).

        david

--
If 91 were prime, it would be a counterexample to your conjecture.
    -- Bruce Wheeler



Tue, 14 Dec 2004 00:13:00 GMT  
 Regular Expressions/Pattern Matching/Unordered pattern

Quote:

> I've been trying to find a regular expression or something to do some
> pattern matching.  I'm currently using a RegEx library.

> Basically I have a group of letters say "dtseupx".  Now I want to
> apply words to this group of letters and see if it's possible to spell
> that word with that group of letters.

> So first I would check like "setup"  which is valid but "setups" is in
> valid because there's only 1 "s" in the group of letters.  Also, "sex"
> is valid but "axe" is invalid.

> Is there some sort of regular expression to do this or would I have to
> order the list of letters alphabetically first?  Has anybody come
> across this before?

> Cristov

This may be useful.  At

        www.csd.uwm.edu/~whopkins

there is an extension of grep called rex, including source code, written
in C.
Excerpts from rex.doc:

"   The Interleave Product of two regular expressions, A, B, is the set
of all
the interleavings of the expressions A and B.  For instance:

                 a^b = ab|ba
                 ab^ab = aabb|abab = a(a^b)b
                 a^b^c = abc|acb|bac|bca|cab|cba"

"   An example should illustrate just how powerful this notation,
particularily
the interleave product, is.  The regular expression

                       <(n^a^p^o^l^e^o^n)>

will match any anagram of Napoleon when REX is called with the -i option
(to make matching case insensitive).  This works in reverse as well.
One could
take an anagram (like aponenol) and run REX on a dictionary with it in
order
to find all of its matches."

Regards,
Brian Clausing



Tue, 14 Dec 2004 02:22:19 GMT  
 Regular Expressions/Pattern Matching/Unordered pattern

Quote:

>I've been trying to find a regular expression or something to do some
>pattern matching.  I'm currently using a RegEx library.

>Basically I have a group of letters say "dtseupx".  Now I want to
>apply words to this group of letters and see if it's possible to spell
>that word with that group of letters.

>So first I would check like "setup"  which is valid but "setups" is in
>valid because there's only 1 "s" in the group of letters.  Also, "sex"
>is valid but "axe" is invalid.

>Is there some sort of regular expression to do this or would I have to
>order the list of letters alphabetically first?  Has anybody come
>across this before?

Sounds like homework.

Rather than using regexes, which might be inefficient for this and
which have more flexibility than you need, I would probably write
special-purpose functions to do the job.  Seems to me, what you need
is a datatype that can represent the frequency (number of occurrences)
of each character in the character set of interest.  The obvious
choice would seem to be an array of ints.  If CHAR_BIT == 8, then
you'd want 256 ints.

Then you need one such array for the "target" pattern, and one for
the current candidate which you wish to test against the target for
a match.

And you need a function to scan a string and count the occurrences
of letters, storing the results in one of these arrays.

And you need a function to compare two such arrays: if every integer
in the "candidate" array is <= the corresponding int in the "target"
array, then you have a hit.

Code this up and bring it with you the next time you show up here.  :)

--Ben

--



Tue, 14 Dec 2004 02:24:45 GMT  
 Regular Expressions/Pattern Matching/Unordered pattern

Quote:
>I've been trying to find a regular expression or something to do some
>pattern matching.  I'm currently using a RegEx library.

>Basically I have a group of letters say "dtseupx".  Now I want to
>apply words to this group of letters and see if it's possible to spell
>that word with that group of letters.

>So first I would check like "setup"  which is valid but "setups" is in
>valid because there's only 1 "s" in the group of letters.  Also, "sex"
>is valid but "axe" is invalid.

>Is there some sort of regular expression to do this or would I have to
>order the list of letters alphabetically first?  Has anybody come
>across this before?

This seems to work (not extensively tested):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
can_spell_it(const char *letters,const char *word)
{
   char *dup;
   int match = 1;
   size_t len;

   len = strlen(letters);
   dup = malloc(len + 1);

   if (!dup) {
      match = -1;
   } else {
      memcpy(dup,letters,len+1);
      while (match && *word) {
         char *found = memchr(dup,*word,len);  
         if (found) {
            *found = 0; /* mark as visited. */  
            ++word;
         } else {
            match = 0;
         }
      }
      free(dup);
   }  
   return match;

Quote:
}

int main(int argc,char **argv)
{
   int match;

   if (argc < 2) {
      fprintf(stderr,"usage: %s <letters> <word>\n",argv[0]);
      return EXIT_FAILURE;
   }
   printf("Can \"%s\" be spelled with \"%s\"? %s\n",
          argv[2],argv[1],
          (match = can_spell_it(argv[1],argv[2])) == 1 ?
           "YES" : match == 0 ? "NO" : "OUT OF MEMORY");
   return 0;

Quote:
}



Tue, 14 Dec 2004 04:47:23 GMT  
 Regular Expressions/Pattern Matching/Unordered pattern

Quote:



> >I've been trying to find a regular expression or something to do some
> >pattern matching.  I'm currently using a RegEx library.

> >Basically I have a group of letters say "dtseupx".  Now I want to
> >apply words to this group of letters and see if it's possible to spell
> >that word with that group of letters.

> >So first I would check like "setup"  which is valid but "setups" is in
> >valid because there's only 1 "s" in the group of letters.  Also, "sex"
> >is valid but "axe" is invalid.

> >Is there some sort of regular expression to do this or would I have to
> >order the list of letters alphabetically first?  Has anybody come
> >across this before?

> Sounds like homework.

> Rather than using regexes, which might be inefficient for this and
> which have more flexibility than you need, I would probably write
> special-purpose functions to do the job.  Seems to me, what you need
> is a datatype that can represent the frequency (number of occurrences)
> of each character in the character set of interest.  The obvious
> choice would seem to be an array of ints.  If CHAR_BIT == 8, then
> you'd want 256 ints.

> Then you need one such array for the "target" pattern, and one for
> the current candidate which you wish to test against the target for
> a match.

> And you need a function to scan a string and count the occurrences
> of letters, storing the results in one of these arrays.

> And you need a function to compare two such arrays: if every integer
> in the "candidate" array is <= the corresponding int in the "target"
> array, then you have a hit.

> Code this up and bring it with you the next time you show up here.  :)

> --Ben

> --

I actually do have this working using an array that keeps track of the
number of each letter that occurs in the search group and the actual
word.  It just takes time to go through each word character by
character in order to establish the frequency of letters and then
compare it to the search group, especially when you're searching huge
dictionaries for matching words.  I thought that using RegEx might
help me speed things up a bit.

Thanks,
Chris



Mon, 03 Jan 2005 07:35:14 GMT  
 Regular Expressions/Pattern Matching/Unordered pattern


Quote:



> > >I've been trying to find a regular expression or something to do some
> > >pattern matching.  I'm currently using a RegEx library.

> > >Basically I have a group of letters say "dtseupx".  Now I want to
> > >apply words to this group of letters and see if it's possible to spell
> > >that word with that group of letters.

> > >So first I would check like "setup"  which is valid but "setups" is in
> > >valid because there's only 1 "s" in the group of letters.  Also, "sex"
> > >is valid but "axe" is invalid.

> > >Is there some sort of regular expression to do this or would I have to
> > >order the list of letters alphabetically first?  Has anybody come
> > >across this before?

[...]

> I actually do have this working using an array that keeps track of the
> number of each letter that occurs in the search group and the actual
> word.  It just takes time to go through each word character by
> character in order to establish the frequency of letters and then
> compare it to the search group, especially when you're searching huge

I'm not sure it would be much more efficient, but you could build a valid
Regex. Like if you can have an AND-ed expression for "dts".

([dt]*s[dt]*)&([st]*d[st]*)&([sd]*t[sd]*)

This would insure (if I'm not wrong) that there is a match if and only if
the "dts" letters are found only once. Maybe you should add ^ and $ to
insure that the match is limited (this would complexify the example, but
you can certainly add these by yourself.

The advantage is that the expression, though complex to read is rather
simple to compute from the letters group.

Check that your library can support and-ing expressions. If it does not
consider using one like my own YGrep Search Engine
(http://ygrep.cjb.net/).

Yves Roumazeilles



Wed, 05 Jan 2005 03:17:48 GMT  
 
 [ 7 post ] 

 Relevant Pages 

1. Pattern Matching | Regular Expressions

2. Regular Expression pattern matching classes

3. RegExp - regular expression pattern

4. Regular Expression - Replace Pattern

5. Pattern Definition GREP With Pattern Editing

6. Pattern Matching in C -

7. Pattern Matching Tool

8. Pattern Matching in C

9. Help, Pattern used in Regex.Matches(...)

10. pattern matching and string replacement

11. help with a fast pattern matching utility requested

12. Pattern Matching in execution time

 

 
Powered by phpBB® Forum Software