searching for arbitrary literal string as opposed to regexp 

I am writing a cgi script in perl, which will basically take a search
string and search a file to see if that string exists within it, and tell
you yes or no.  [It actually does something a bit more complex, but this
is the crucial part for my question].

This search string _isn't_ a regexp, however.  It's just straight text
that may or may not exist within a certain document.  If it were a regexp,
I'd just do a while(<FILE>) {/$searchstring/ && $found=1;} kind of thing,
but that won't work, because the user might unwittingly write a regexp
when he means not to.  For instance, he might be searching for a _literal_
period, not using a period as a wildcard as the above construction would.
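To make the problem concrete, here's a minimal sketch of the naive approach going wrong (the sample search string and text are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical user input: the user means a literal period.
my $searchstring = '3.14';

# Interpolated straight into a pattern, '.' is a wildcard, so a line
# that does not contain "3.14" at all is still reported as a match.
my $line  = 'release 3x14';
my $found = ($line =~ /$searchstring/) ? 1 : 0;
print "found: $found\n";    # prints "found: 1"
```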

So what's the solution?  What do I do?  I have this nagging feeling that
there's some real obvious answer, but for some reason I'm just stumped.

Thanks in advance, cc's to email appreciated.



Sat, 22 Nov 1997 03:00:00 GMT  

>This search string _isn't_ a regexp, however.  It's just straight text
>that may or may not exist within a certain document.  If it were a regexp,
>I'd just do a while(<FILE>) {/$searchstring/ && $found=1;} kind of thing,
>but that won't work, because the user might unwittingly write a regexp
>when he means not to.  For instance, he might be searching for a _literal_
>period, not using a period as a wildcard as the above construction would.

In Perl REs, every magic character matches /(\W|\\\w)/: it is either a
non-word character or a backslash followed by a word character.

In perl 5, there's a new function, quotemeta, and a new RE character,
\Q, specifically for this - i.e. you can match on /\Q$searchstring/.
This allows the user to put \E in the string to countermand the quoting.
Construe this as a feature :-)
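A minimal sketch of both perl 5 tools (the sample strings are made up): quotemeta returns the escaped string, while \Q...\E applies the same escaping inline when the variable is interpolated into the pattern.

```perl
use strict;
use warnings;

my $searchstring = 'a.b*c';               # hypothetical user input
my $escaped      = quotemeta($searchstring);
print "$escaped\n";                       # prints a\.b\*c

# Both forms treat every character of the input literally.
my $m1 = ('xx a.b*c yy'  =~ /\Q$searchstring\E/) ? 1 : 0;   # literal hit
my $m2 = ('xx aXbbbc yy' =~ /$escaped/)          ? 1 : 0;   # no wildcard
print "m1=$m1 m2=$m2\n";                  # prints m1=1 m2=0
```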

In earlier versions of perl, you'll need to do $searchstring =~ s/\W/\\$&/g
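For plain ASCII input that substitution should give the same string as quotemeta, which this sketch checks (the sample string is made up; for non-ASCII text the two may differ):

```perl
use strict;
use warnings;

my $searchstring = 'C++ (v1.0)?';         # hypothetical user input

# Pre-perl-5 idiom: put a backslash before every non-word character.
(my $manual = $searchstring) =~ s/\W/\\$&/g;

# For ASCII input this matches what quotemeta produces.
my $same = ($manual eq quotemeta($searchstring)) ? 1 : 0;
print "$manual\n";                        # prints C\+\+\ \(v1\.0\)\?
```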

Ian



Sun, 23 Nov 1997 03:00:00 GMT  

>In Perl REs, all magic characters match /(\W|\\\w)/ .
>In perl 5, there's a new function, quotemeta, and a new RE character,
>\Q, specifically for this - i.e. you can match on /\Q$searchstring/.
>This allows the user to put \E in the string to countermand the quoting.

Errhm, no.  You can countermand \Q with \E in the RE itself, but if the
variable $searchstring happens to contain the combination \E, perl will
happily put an extra \ in front of it.  For example, if you do:

        $searchstring = 'foo\E(*+)bar';
        print "\Q$searchstring\E\n";

perl prints, as it should:

        foo\\E\(\*\+\)bar
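A quick check of that point with the same string (the surrounding text is made up): because \Q escapes the interpolated value, the \E inside the variable is just two literal characters and the whole string is still matched literally.

```perl
use strict;
use warnings;

# The search string itself contains the characters '\' and 'E'.
my $searchstring = 'foo\E(*+)bar';

# \Q escapes the variable's contents, including its backslash,
# so the embedded \E does not end the quoting.
my $text  = 'x foo\E(*+)bar y';
my $found = ($text =~ /\Q$searchstring\E/) ? 1 : 0;
print "found: $found\n";                  # prints "found: 1"
```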

--
Hope this helps,

HansM



Sun, 23 Nov 1997 03:00:00 GMT  
: I am writing a cgi script in perl, which will basically take a search
: string and search a file to see if that string exists within it, and tell
: you yes or no.  [It actually does something a bit more complex, but this
: is the crucial part for my question].

I wrote some Perl code a while ago along these lines.
In it, I let the user choose whether the input pattern
was to be treated as a regex, whether it had to match the whole
line exactly, and whether the matched text should be shown in
bold in the output records. Here is the relevant code:

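        # quote metachars if this is not a regular expression
        $patt =~ s/(\W)/\\$1/g if (!$in{'regex'});
        # anchor search item if exact match
        $patt = '^' . $patt . '$' if ($in{'exact'});
        # check regular expression syntax
        eval "/\$patt/";
        if ($@) {
            # NOTE: the start of this block was lost from the archived post;
            # this error call is a reconstruction, not the original code
            &PrintHTML("Make sure to specify a correct
                <A HREF=\"http://www.cis.ohio-state.edu/htbin/info/info/perl.info,Regular%20Exp...\">
                PERL regular expression</A><BR>");
            exit;    # the original error routine exited here
        }
        while(<FILE>) {
            if (/($patt)/) {
                # bold every match if requested ($1 in the replacement,
                # and $patt rather than the matched text as the pattern)
                s/($patt)/<B>$1<\/B>/g if ($in{'bold'});
                # write out matched line
                &PrintHTML($_);
            }
        }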
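A self-contained sketch of the same idea for testing outside a CGI script. The sub name search_lines and its options hash are hypothetical (the keys mirror the form fields regex/exact/bold); it swaps in quotemeta for the s/(\W)/\\$1/g quoting and returns matching lines instead of calling PrintHTML.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version: returns (\@hits, undef) on success
# or (undef, $error) if the user's regex does not compile.
sub search_lines {
    my ($lines, $patt, $opts) = @_;

    # literal search unless the user asked for a regex
    $patt = quotemeta($patt) unless $opts->{regex};
    # whole-line match if requested
    $patt = '^' . $patt . '$' if $opts->{exact};

    # compile once; a bad user regex is reported instead of dying
    my $re = eval { qr/$patt/ };
    return (undef, "bad pattern: $@") unless defined $re;

    my @hits;
    for my $line (@$lines) {
        next unless $line =~ $re;
        my $out = $line;
        # wrap every match in <B>...</B> if requested
        $out =~ s/($re)/<B>$1<\/B>/g if $opts->{bold};
        push @hits, $out;
    }
    return (\@hits, undef);
}

my ($hits, $err) = search_lines(['a.b', 'axb', 'xa.by'], 'a.b', {});
print join(',', @$hits), "\n";            # prints "a.b,xa.by"
```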

Hope this helps.

- Alberto

============
Alberto Accomazzi                       Smithsonian Astrophysical Observatory

http://cfa-www.harvard.edu/~alberto     Cambridge, MA  02138  USA



Sun, 23 Nov 1997 03:00:00 GMT  

|> If it were a regexp, I'd just do a
|>   while(<FILE>) {/$searchstring/ && $found=1;}
|> kind of thing, but that won't work, because the user might
|> unwittingly write a regexp when he means not to.

while(<FILE>) {
    $found = 1, last if index($_, $searchstring) >= 0;
}
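The same index() approach wrapped in a sub for testing. The sub name contains_literal and the in-memory filehandle (perl 5.8 or later) are my own additions, not part of the post:

```perl
use strict;
use warnings;

# Returns 1 if $needle occurs literally in any line read from $fh.
# index() is a plain substring search, so no character is special.
sub contains_literal {
    my ($fh, $needle) = @_;
    while (my $line = <$fh>) {
        return 1 if index($line, $needle) >= 0;
    }
    return 0;
}

# Usage with an in-memory filehandle (sample text is made up).
my $text = "first line\nprice: 3.14\n";
open(my $fh, '<', \$text) or die "open: $!";
print contains_literal($fh, '3.14') ? "yes\n" : "no\n";   # prints "yes"
```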

        Jeffrey
----------------------------------------------------------------------------

See my Jap/Eng dictionary at http://www.wg.omron.co.jp/cgi-bin/j-e


Fri, 28 Nov 1997 03:00:00 GMT  

>I am writing a cgi script in perl, which will basically take a search
>string and search a file to see if that string exists within it, and tell
>you yes or no.  [It actually does something a bit more complex, but this
>is the crucial part for my question].
>This search string _isn't_ a regexp, however.  It's just straight text
>that may or may not exist within a certain document.  If it were a regexp,
>I'd just do a while(<FILE>) {/$searchstring/ && $found=1;} kind of thing,
>but that won't work, because the user might unwittingly write a regexp
>when he means not to.  For instance, he might be searching for a _literal_
>period, not using a period as a wildcard as the above construction would.
>So what's the solution?  What do I do?

        $searchstring =~ s/\W/\\$&/g;       # defuse any magic characters

        while(<FILE>) {
                if(/$searchstring/o) {
                        $found = 1;
                        last;
                }
        }

Or, if you insist on cramming everything on a single line:

        /$searchstring/o && ($found = 1) && close FILE while <FILE>;

You can't "last" out of a postfix-while loop, but closing the file has
pretty much the same effect.
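A small sketch checking that closing the handle really ends the postfix-while loop early. Assumptions: an in-memory filehandle with made-up text, quotemeta in place of s/\W/\\$&/g, no /o, a line counter added to show where reading stopped, and `no warnings 'closed'` because the final read on the closed handle would otherwise warn under `use warnings`:

```perl
use strict;
use warnings;

my $text = "alpha\nbeta 1.5\ngamma\ndelta\n";      # made-up input
open(my $fh, '<', \$text) or die "open: $!";

my $searchstring = quotemeta('1.5');   # same effect as s/\W/\\$&/g here
my ($found, $read) = (0, 0);
{
    # reading from the handle after close() would otherwise warn
    no warnings 'closed';
    $read++, /$searchstring/ && ($found = 1) && close $fh while <$fh>;
}
print "found=$found after reading $read lines\n";  # found=1 after reading 2 lines
```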

--
Hope this helps,

HansM



Sat, 29 Nov 1997 03:00:00 GMT  
 