Regular Expressions/Pattern Matching/Unordered pattern
Author |
Message |
Chri #1 / 13
|
 Regular Expressions/Pattern Matching/Unordered pattern
I've been trying to find a regular expression or something to do some pattern matching. Basically I have a group of letters say "dtseupx". Now I want to apply words to this group of letters and see if it's possible to spell that word with that group of letters. So first I would check like "setup" which is valid but "setups" is in valid because there's only 1 "s" in the group of letters. Also, "sex" is valid but "axe" is invalid. Is there some sort of regular expression to do this or would I have to order the list of letters alphabetically first? Has anybody come across this before? Cristov
|
Sun, 12 Dec 2004 16:55:55 GMT |
|
 |
Steffen Mülle #2 / 13
|
 Regular Expressions/Pattern Matching/Unordered pattern
Quote: > I've been trying to find a regular expression or something to do some > pattern matching. > Basically I have a group of letters say "dtseupx". Now I want to > apply words to this group of letters and see if it's possible to spell > that word with that group of letters. > So first I would check like "setup" which is valid but "setups" is in > valid because there's only 1 "s" in the group of letters. Also, "sex" > is valid but "axe" is invalid. > Is there some sort of regular expression to do this or would I have to > order the list of letters alphabetically first? Has anybody come > across this before?
Here's a quick solution. I am not aware of a pure-regex solution, but I am no re guru. "Quick" as in quickly written, not fast not efficient. #!/usr/bin/perl my $chars = 'sadletjluaskpdjlekjsasd'; my %chars; foreach ( split //, $chars ) { $chars{$_}++; Quote: }
my %copy = %chars;
print "$word can " . ( ( grep { $_ < 0 } values %copy ) ? 'not ' : '' ) . "be made of the characters $chars.\n"; Quote: }
Steffen
|
Sun, 12 Dec 2004 17:12:20 GMT |
|
 |
Bart Lateu #3 / 13
|
 Regular Expressions/Pattern Matching/Unordered pattern
Quote:
>Basically I have a group of letters say "dtseupx". Now I want to >apply words to this group of letters and see if it's possible to spell >that word with that group of letters. >So first I would check like "setup" which is valid but "setups" is in >valid because there's only 1 "s" in the group of letters. Also, "sex" >is valid but "axe" is invalid. >Is there some sort of regular expression to do this or would I have to >order the list of letters alphabetically first? Has anybody come >across this before?
Ooh, there's a thought. Sort your letters of both words alphabetically. Next, try to match the larger word with the smaller word, with .* inserted between the letters. So: $pool = join '', sort split //, 'dtseupx'; $word = 'setup'; $pattern = join '.*', sort split //, $word; if($pool =~ /$pattern/) { print "We've got a winner!\n"; } else { print "Better luck next time.\n"; } Now, that works. I wouldn't think it's impossible that it would occasionally take a long time, because of a lot of backtracking in the search, thanks to the many occurrences of ".*". Now, perhaps the "do not backtrack here" assertion might help, I don't know. Perhaps somebody with a better understanding of those, could jump in. (See /(?>PATTERN)/ in perlre.) Another approach would be to use hashes. You make a pool of available letters, with a count for each. Now, for all letters in the word, subtract one from its hash value. It fails as soon as you find one negative result. Like this: $pool = 'dtseupx'; $word = 'setup'; foreach (split //, $pool) { $present{$_}++; } my $fail; foreach (split //, $word) { if(--$present{$_} < 0) { $fail = 1; last; } } unless($fail) { print "We've got a winner!\n"; } else { print "Better luck next time.\n"; } -- Bart.
|
Sun, 12 Dec 2004 17:15:49 GMT |
|
 |
Steffen Mülle #4 / 13
|
 Regular Expressions/Pattern Matching/Unordered pattern
[...] Quote: > $pool = 'dtseupx'; > $word = 'setup'; > foreach (split //, $pool) { > $present{$_}++; > } > my $fail; > foreach (split //, $word) { > if(--$present{$_} < 0) { > $fail = 1; > last; > }
Maybe add this line here to speed things up: last if $fail; Quote: > } > unless($fail) { > print "We've got a winner!\n"; > } else { > print "Better luck next time.\n"; > }
Steffen
|
Sun, 12 Dec 2004 17:27:16 GMT |
|
 |
Bart Lateu #5 / 13
|
 Regular Expressions/Pattern Matching/Unordered pattern
Quote:
>> if(--$present{$_} < 0) { >> $fail = 1; >> last; >> } >Maybe add this line here to speed things up: >last if $fail;
Won't help. last is executed anyway, as soon as fail is set. You could simplify it to $fail = --$present{$_} < 0 and last; -- Bart.
|
Sun, 12 Dec 2004 17:29:01 GMT |
|
 |
Steffen Mülle #6 / 13
|
 Regular Expressions/Pattern Matching/Unordered pattern
Quote:
> >> if(--$present{$_} < 0) { > >> $fail = 1; > >> last; > >> } > >Maybe add this line here to speed things up: > >last if $fail; > Won't help. last is executed anyway, as soon as fail is set.
Yes, of course. I somehow misread the inner if block to be a loop out of which the last would jump. I'll shut down my brain for an overhaul now and go play squash. :) Steffen
|
Sun, 12 Dec 2004 17:46:33 GMT |
|
 |
Godzilla #7 / 13
|
 Regular Expressions/Pattern Matching/Unordered pattern
Quote:
> Basically I have a group of letters say "dtseupx". Now I want to > apply words to this group of letters and see if it's possible to spell > that word with that group of letters. > So first I would check like "setup" which is valid but "setups" is in > valid because there's only 1 "s" in the group of letters. Also, "sex" > is valid but "axe" is invalid.
This can be easily accomplished with a simple substitution operator and Perl's length () function. Godzilla! -- My primary web site enjoys an average twelve-thousand hits per month. How popular is your web site? TEST SCRIPT: ____________ #!perl print "Content-type: text/plain\n\n";
$input = "setup"; $test = $input; &Compare; $input = "setups"; $test = $input; &Compare; $input = "sex"; $test = $input; &Compare; $input = "axe"; $test = $input; &Compare; sub Compare { local ($") = "";
{ $test =~ s/$_//; if (length ($test) == 0)
} if (length ($test) > 0)
} PRINTED RESULTS: ________________ setup can be spelled with dtseupx setups cannot be spelled with dtseupx sex can be spelled with dtseupx axe cannot be spelled with dtseupx
|
Sun, 12 Dec 2004 18:00:17 GMT |
|
 |
Janek Schleiche #8 / 13
|
 Regular Expressions/Pattern Matching/Unordered pattern
Chris wrote at Wed, 26 Jun 2002 17:55:55 +0200: Quote: > I've been trying to find a regular expression or something to do some pattern matching. > Basically I have a group of letters say "dtseupx". Now I want to apply words to this group of > letters and see if it's possible to spell that word with that group of letters. > So first I would check like "setup" which is valid but "setups" is in valid because there's only > 1 "s" in the group of letters. Also, "sex" is valid but "axe" is invalid. > Is there some sort of regular expression to do this or would I have to order the list of letters > alphabetically first? Has anybody come across this before?
Year, in fact there's a one liner: Let's assume that my $letters = "dtseupx"; my $word = "setups"; Then $word =~ s/$_// for split(//, $letters); removes as much (valid) letters from word. So now $word is empty iff it had been an valid word. E.g. print $word ? "In-Valid" : "Valid"; Greetings, Janek
|
Sun, 12 Dec 2004 19:13:36 GMT |
|
 |
Benjamin Goldber #9 / 13
|
 Regular Expressions/Pattern Matching/Unordered pattern
Quote:
> I've been trying to find a regular expression or something to do some > pattern matching. > Basically I have a group of letters say "dtseupx". Now I want to > apply words to this group of letters and see if it's possible to spell > that word with that group of letters. > So first I would check like "setup" which is valid but "setups" is > invalid because there's only 1 "s" in the group of letters. Also, > "sex" is valid but "axe" is invalid.
for my $str ( qw( setup setups sex axe ) ) { print "$str is "; print "in" if $str =~ tr/dtseupx//c; print "valid\n"; Quote: } > Is there some sort of regular expression to do this or would I have to > order the list of letters alphabetically first? Has anybody come > across this before? > Cristov
--
pack 'u', pack 'H*', 'ab5cf4021bafd28972030972b00a218eb9720000';
|
Sun, 12 Dec 2004 22:10:58 GMT |
|
 |
Godzilla #10 / 13
|
 Regular Expressions/Pattern Matching/Unordered pattern
(snipped) Quote: > for my $str ( qw( setup setups sex axe ) ) { > print "$str is "; > print "in" if $str =~ tr/dtseupx//c; > print "valid\n"; > }
You would be prudent to test your code before posting. Godzilla! -- My primary web site enjoys an average twelve-thousand hits per month. How popular is your web site?
|
Sun, 12 Dec 2004 21:32:17 GMT |
|
 |
Bart Lateu #11 / 13
|
 Regular Expressions/Pattern Matching/Unordered pattern
Quote:
>> So first I would check like "setup" which is valid but "setups" is >> invalid because there's only 1 "s" in the group of letters. Also, >> "sex" is valid but "axe" is invalid. >for my $str ( qw( setup setups sex axe ) ) { > print "$str is "; > print "in" if $str =~ tr/dtseupx//c; > print "valid\n"; >}
First of all, on 5.6.1, this prints: Modification of a read-only value attempted in the tr///, of course. I thought this got fixed. Second, you're doing something wrong: setup is valid setups is valid sex is valid axe is invalid The second one is wrong. You're not counting instances of the same letter. -- Bart.
|
Sun, 12 Dec 2004 21:32:40 GMT |
|
 |
Benjamin Goldber #12 / 13
|
 Regular Expressions/Pattern Matching/Unordered pattern
Quote:
> >> So first I would check like "setup" which is valid but "setups" is > >> invalid because there's only 1 "s" in the group of letters. Also, > >> "sex" is valid but "axe" is invalid. > >for my $str ( qw( setup setups sex axe ) ) { > > print "$str is "; > > print "in" if $str =~ tr/dtseupx//c; > > print "valid\n"; > >} > First of all, on 5.6.1, this prints: > Modification of a read-only value attempted > in the tr///, of course. I thought this got fixed.
It did, in 5.8.0. Quote: > Second, you're doing something wrong: > setup is valid > setups is valid > sex is valid > axe is invalid > The second one is wrong. You're not counting instances of the same > letter.
Oops, you're right. Hmm. for my $string ( qw( setup setups sex axe ) ) { my $scopy = $string; # First, check with tr///, since that's fast. print("$string is invalid\n"), next if $scopy =~ tr/dtseupx//c; $scopy =~ s/\Q$_// for qw[d t s e u p x]; print("$string is invalid\n"), next if length $scopy; print("$string is valid\n"); Quote: }
[still untested, though] The second part might be done better with a hash, as others have suggested. --
pack 'u', pack 'H*', 'ab5cf4021bafd28972030972b00a218eb9720000';
|
Sun, 12 Dec 2004 23:22:40 GMT |
|
 |
Ian Boreh #13 / 13
|
 Regular Expressions/Pattern Matching/Unordered pattern
Quote:
> Ooh, there's a thought. Sort your letters of both words alphabetically.
An idea along similar lines: #!/usr/bin/perl -w use strict; my $letters_pattern = '^' . join('?', sort split //, 'dtseupx') . "?\$"; foreach my $word (<DATA>) { chomp($word); my $sorted_word = join '', sort split //, $word; print "$word: ", ($sorted_word =~ /$letters_pattern/)? "yes\n": "no\n"; Quote: }
__DATA__ setup setups axe sex miscellaneous This version sorts the allowed letters, and makes each optional in the regex, i.e. the matching string must consist of some subset of the allowed set. Of course the input word must be sorted also. Quote: > Now, that works. I wouldn't think it's impossible that it would > occasionally take a long time, because of a lot of backtracking in the > search, thanks to the many occurrences of ".*".
There's still a bit of backtracking possible in this version, but it's greatly restricted since there are only "?"s, and no ".*"s. It would be possible to generate a regex for an allowed set that did not require sorting of the input, but the regex would still need to be generated beforehand, so it would be of marginal usefulness. Two approaches to this are: . recursively make a regex that is the union of regexes starting with each letter and continuing with a regex that takes the same form for the remaining letters (optionally). Basically it would describe all languages consisting of all permutations of any number of the allowed letters . Using lookahead assertions that each allowed letter appears in the string no more times than in the allowed list. #!/usr/bin/perl -w use strict; my $allowed = 'dtseuspx'; my $letters_pattern = "^(?!.*[^$allowed])"; foreach my $letter (split //, $allowed) { my $count = ($allowed =~ s/$letter//g) + 1; $letters_pattern .= "(?!(?:.*$letter){$count})" if ($count > 1); Quote: }
print "PATTERN: $letters_pattern\n"; foreach my $word (<DATA>) { chomp($word); print "$word: ", ($word =~ /$letters_pattern/)? "yes\n": "no\n"; Quote: }
__DATA__ setup setups setupss axe sex miscellaneous This constructs a regex that forbids disallowed letters, and adds an assertion for each allowed letter that it does not appear too many times. I guess this might be useful, in that any string can then simply be matched against the (compiled) pattern at a later stage. Note that these techniques will work with letters, but any regex metachars would need special handling. Regards, Ian
|
Fri, 17 Dec 2004 10:22:52 GMT |
|
|
|