searching for arbitrary literal string as opposed to regexp
Jonathan Rochki #1 / 6
I am writing a cgi script in perl, which will basically take a search string and search a file to see if that string exists within it, and tell you yes or no. [It actually does something a bit more complex, but this is the crucial part for my question]. This search string _isn't_ a regexp, however. It's just straight text that may or may not exist within a certain document. If it were a regexp, I'd just do a while(<FILE>) {/$searchstring/ && $found=1;} kind of thing, but that won't work, because the user might unwittingly write a regexp when he doesn't mean to. For instance, he might be searching for a _literal_ period, not using a period as a wildcard as the above construction would. So what's the solution? What do I do? I have this nagging feeling that there's some really obvious answer, but for some reason I'm just stumped. Thanks in advance; cc's to email appreciated.
Sat, 22 Nov 1997 03:00:00 GMT

Ian Phillip #2 / 6
>This search string _isn't_ a regexp, however. It's just straight text
>that may or may not exist within a certain document. If it were a regexp,
>I'd just do a while(<FILE>) {/$searchstring/ && $found=1;} kind of thing,
>but that won't work, because the user might unwittingly write a regexp
>when he means not to. For instance, he might be searching for a _literal_
>period, not using a period as a wildcard as the above construction would.
In Perl REs, all magic characters match /(\W|\\\w)/ . In perl 5, there's a new function, quotemeta, and a new RE character, \Q, specifically for this - i.e. you can match on /\Q$searchstring/. This allows the user to put \E in the string to countermand the quoting. Construe this as a feature :-)

In earlier versions of perl, you'll need to do

    $searchstring =~ s/\W/\\$&/g

Ian
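A minimal sketch of the three escaping approaches Ian mentions, assuming a perl 5 interpreter. The variable names are illustrative only. For plain ASCII input such as a.b, quotemeta, the \Q...\E string escape and the older s/\W/\\$&/g substitution all yield the same escaped pattern, and the escaped dot then matches only a literal period.

```perl
use strict;
use warnings;

# Three ways to make user input match literally (illustrative variable names).
my $searchstring = 'a.b';

my $with_func = quotemeta($searchstring);        # perl 5 built-in
my $with_q    = "\Q$searchstring\E";             # same escaping, applied inside a string
(my $old_way  = $searchstring) =~ s/\W/\\$&/g;   # pre-perl-5 substitution from the post

print "$with_func $with_q $old_way\n";           # a\.b a\.b a\.b

# With \Q, the dot is an ordinary character, so only the literal "a.b" matches.
for my $line ('xa.by', 'xaXby') {
    my $hit = $line =~ /\Q$searchstring\E/ ? 'match' : 'no match';
    print "$line: $hit\n";
}
```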
Sun, 23 Nov 1997 03:00:00 GMT

Hans Muld #3 / 6
>In Perl REs, all magic characters match /(\W|\\\w)/ .
>In perl 5, there's a new function, quotemeta, and a new RE character,
>\Q, specifically for this - i.e. you can match on /\Q$searchstring/.
>This allows the user to put \E in the string to countermand the quoting.
Errhm, no. You can countermand \Q with \E in the RE itself, but if the variable $searchstring happens to contain the combination \E, perl will happily put an extra \ in front of it. For example, if you do:

    $searchstring = 'foo\E(*+)bar';
    print "\Q$searchstring\E\n";

perl prints:

    foo\\E\(\*\+\)bar

as it should.

-- 
Hope this helps,
HansM
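A short sketch of HansM's point, using his own example string: a \E that arrives inside the interpolated variable is treated as data, so the whole value stays quoted and the pattern matches the text literally, parentheses and all. The $text value is a made-up sample line.

```perl
use strict;
use warnings;

# A \E that arrives inside the interpolated variable is data, not syntax:
# \Q escapes its backslash, so the rest of the string stays quoted.
my $searchstring = 'foo\E(*+)bar';
my $quoted = "\Q$searchstring\E";
print "$quoted\n";                     # foo\\E\(\*\+\)bar

# The escaped pattern matches the text literally, parentheses and all.
my $text = 'xx foo\E(*+)bar yy';       # hypothetical sample line
print(($text =~ /\Q$searchstring\E/) ? "found\n" : "not found\n");   # found
```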
Sun, 23 Nov 1997 03:00:00 GMT

Alberto Accomaz #4 / 6
: I am writing a cgi script in perl, which will basically take a search
: string and search a file to see if that string exists within it, and tell
: you yes or no. [It actually does something a bit more complex, but this
: is the crucial part for my question].

I wrote some Perl code a while ago along these lines. In it, I let the user choose whether the input pattern should be treated as a regex, whether it must match the whole line exactly, and whether the matched pattern should be shown in bold in the output records. Here is the relevant code:

    # quote metachars if this is not a regular expression
    $patt =~ s/(\W)/\\$1/g if (!$in{'regex'});
    # anchor search item if exact match
    $patt = '^' . "$patt" . '$' if ($in{'exact'});
    # check regular expression syntax
    eval "/\$patt/";
    [...]
    Make sure to specify a correct <A HREF=\"http://www.cis.ohio-state.edu/htbin/info/info/perl.info,Regular%20Exp..."> PERL regular expression</A><BR>"); # exits
    }
    while(<FILE>) {
        if (/($patt)/) {
            # bold match if requested; reuse $patt, since the matched
            # text in $1 may itself contain regex metacharacters
            s/($patt)/<B>$1<\/B>/g if ($in{'bold'});
            # write out matched line
            &PrintHTML($_);
        }
    }

Hope this helps.
- Alberto

============
Alberto Accomazzi
Smithsonian Astrophysical Observatory
http://cfa-www.harvard.edu/~alberto
Cambridge, MA 02138 USA
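A minimal sketch of the same three options (regex or literal, exact or substring, bold or plain), assuming a perl recent enough to have qr//. The helpers build_pattern and bold_matches are hypothetical names, not part of Alberto's script. quotemeta stands in for the s/(\W)/\\$1/g line, and an eval block around qr// stands in for the eval-string syntax check; the (?:...) group is an added assumption so that an alternation such as a|b stays inside the ^...$ anchors.

```perl
use strict;
use warnings;

# Hypothetical helper: build a compiled pattern from user input.
#   $is_regex - treat the input as a Perl regex; otherwise match it literally
#   $exact    - anchor the pattern so the whole line must match
# Returns undef (with the error in $@) if the pattern does not compile.
sub build_pattern {
    my ($input, $is_regex, $exact) = @_;
    my $patt = $is_regex ? $input : quotemeta($input);
    $patt = "^(?:$patt)\$" if $exact;   # (?:...) keeps alternations inside the anchors
    my $re = eval { qr/$patt/ };        # a bad regex dies here and is caught
    return $re;
}

# Hypothetical helper: wrap each match in <B>...</B>, reusing the compiled
# pattern instead of re-reading the matched text as a new regex.
sub bold_matches {
    my ($line, $re) = @_;
    $line =~ s/($re)/<B>$1<\/B>/g;
    return $line;
}

my $re = build_pattern('a.b', 0, 0);
print bold_matches("a.b and aXb\n", $re);          # <B>a.b</B> and aXb

my $bad = build_pattern('foo(', 1, 0);
print defined $bad ? "compiled\n" : "invalid regex: $@";
```

Compiling once with qr// also means the syntax check and the search use the very same pattern object, so there is no second interpolation step to get out of sync.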
Sun, 23 Nov 1997 03:00:00 GMT

Jeffrey Frie #5 / 6
|> If it were a regexp, I'd just do a
|> while(<FILE>) {/$searchstring/ && $found=1;}
|> kind of thing, but that won't work, because the user might
|> unwittingly write a regexp when he means not to.

    while(<FILE>) {
        $found=1, last if index($_, $searchstring) >= 0;
    }
Jeffrey
---------------------------------------------------------------------------
See my Jap/Eng dictionary at http://www.wg.omron.co.jp/cgi-bin/j-e
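A sketch of the index() approach wrapped in a hypothetical file_contains helper. index() is a plain substring search, so no character in the search string is special and no escaping is needed. An in-memory filehandle (perl 5.8 or later) stands in for FILE. Like the original loop, it reads line by line, so a search string that spans a newline will not be found.

```perl
use strict;
use warnings;

# Return 1 as soon as $needle appears literally in a line read from $fh.
# index() is a plain substring search, so no character in $needle is special.
sub file_contains {
    my ($fh, $needle) = @_;
    while (my $line = <$fh>) {
        return 1 if index($line, $needle) >= 0;
    }
    return 0;
}

# An in-memory filehandle stands in for FILE here (needs perl 5.8 or later).
my $text = "first line\nprice: 3.50 (approx)\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
print file_contains($fh, '3.50 (approx)'), "\n";   # 1
```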
Fri, 28 Nov 1997 03:00:00 GMT

Hans Muld #6 / 6
>I am writing a cgi script in perl, which will basically take a search
>string and search a file to see if that string exists within it, and tell
>you yes or no. [It actually does something a bit more complex, but this
>is the crucial part for my question].
>This search string _isn't_ a regexp, however. It's just straight text
>that may or may not exist within a certain document. If it were a regexp,
>I'd just do a while(<FILE>) {/$searchstring/ && $found=1;} kind of thing,
>but that won't work, because the user might unwittingly write a regexp
>when he means not to. For instance, he might be searching for a _literal_
>period, not using a period as a wildcard as the above construction would.
>So what's the solution? What do I do?
    $searchstring =~ s/\W/\\$&/g;       # defuse any magic characters
    while(<FILE>) {
        if(/$searchstring/o) {
            $found = 1;
            last;
        }
    }

Or, if you insist on cramming everything on a single line:

    /$searchstring/o && ($found = 1) && close FILE while <FILE>;

You can't "last" out of a postfix-while loop, but closing the file has pretty much the same effect.

-- 
Hope this helps,
HansM
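A sketch of the loop above as a hypothetical find_literal helper, with two swaps named plainly. qr// replaces the /o modifier, because /o keeps the first pattern it compiled for the life of the process, which is only safe if the search string never changes (fine for a one-shot CGI run, not for a long-lived process). $1 replaces $&, because using $& anywhere slowed down every regex match in perls of that era. The usage lines show that reading stops right after the matching line, just as "last" does in the original.

```perl
use strict;
use warnings;

# Escape the search string, compile it once, and stop at the first hit.
# qr// replaces the /o modifier: /o keeps the first pattern it compiled
# for the life of the process, which breaks if the search string changes.
# $1 replaces $& because $& slowed down every match in perls of that era.
sub find_literal {
    my ($fh, $searchstring) = @_;
    (my $escaped = $searchstring) =~ s/(\W)/\\$1/g;   # defuse magic characters
    my $re = qr/$escaped/;
    my $found = 0;
    while (my $line = <$fh>) {
        if ($line =~ $re) { $found = 1; last; }
    }
    return $found;
}

my $text = "aXb\na.b\nafter\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
print find_literal($fh, 'a.b'), "\n";   # 1
print scalar <$fh>;                      # after
```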
Sat, 29 Nov 1997 03:00:00 GMT