negating a string in a regexp 

A potentially stupid question, but I have tried to find out for myself
and not met with any success, ie the faq, asking people etc. Maybe I
just missed it...  but how does one negate a string in a regexp?

For example,

while (<>) {
        s/([^DT]+)/{$1}/g;
        print;
}

will put curly brackets around the longest string that doesn't contain
a D or a T.

so 'foo D bar DT bar T foo'

becomes

'{foo }D{ bar }DT{ bar }T{ foo}'

But what I want is a string where 'DT' is negated in the match,
resulting in:

'{foo D bar }DT{ bar T foo}'

The substitution should look something like

        s/([^?DT?]+)/{$1}/g;    

where the '?' are some delimiters that tell the regexp matcher that I am
wanting the string "DT" negated, not "D" or "T".

I humble myself in your collective wisdom.

breck          



Tue, 26 Dec 1995 07:23:38 GMT  

Quote:

>A potentially stupid question, but I have tried to find out for myself
>and not met with any success, ie the faq, asking people etc. Maybe I
>just missed it...  but how does one negate a string in a regexp?

sounds like a good question to me.

Quote:

>For example,

[code deleted]

Quote:

>will put curly brackets around the longest string that doesn't contain
>a D or a T.

>so 'foo D bar DT bar T foo'

>becomes

>'{foo }D{ bar }DT{ bar }T{ foo}'

>But what I want is a string where 'DT' is negated in the match,
>resulting in:

>'{foo D bar }DT{ bar T foo}'

>The substitution should look something like

>    s/([^?DT?]+)/{$1}/g;    

>where the '?' are some delimiters that tell the regexp matcher that I am
>wanting the string "DT" negated, not "D" or "T".

This is not exactly what you were looking for, but it's a one-liner
so is almost as good  :)  the following code:

        $test = 'foo D bar DT bar T foo';
        ($test = "{$test}") =~ s/DT/}DT{/g;  # <= the one-liner
        print "result: $test\n";

gives me:

        result: {foo D bar }DT{ bar T foo}

basically the one-liner puts the outermost {} around the string and then
replaces every DT in the string with }DT{...  sort of the inverse approach
to what you were looking for, but i hope this helps...  odd things will happen
if the string starts or ends with DT...  if you think that might happen,
and don't want strings coming out like '{}DT{ foo bar}', you might want
to throw in an extra substitute or two to chop off the offending {}.
i could give another one-liner involving split and join but i think
that has the same problem.
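
One way the extra substitute could look, as a minimal sketch only. The sub name wrap_around_dt is made up for illustration, and the cleanup step assumes the input never contains a literal '{}' of its own:

```perl
use strict;
use warnings;

# Hypothetical helper: wrap the whole string, cut it at every DT,
# then drop the empty {} pairs left behind by a leading, trailing,
# or doubled DT. Assumes the input has no literal "{}" of its own.
sub wrap_around_dt {
    my ($text) = @_;
    (my $out = "{$text}") =~ s/DT/}DT{/g;   # the one-liner from above
    $out =~ s/\{\}//g;                      # remove the offending empty pairs
    return $out;
}

print wrap_around_dt('DT foo bar'), "\n";
print wrap_around_dt('foo D bar DT bar T foo'), "\n";
```

With this, 'DT foo bar' should come out as 'DT{ foo bar}' instead of '{}DT{ foo bar}'.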

Tom Wylie       | What is the difference between apathy and ignorance?



Tue, 26 Dec 1995 08:50:55 GMT  

Quote:
(Breck Baldwin) writes:

|
| while (<>) {
|      
|       s/([^DT]+)/{$1}/g;                      
|       print;
|       }
|
|
| will put curly brackets around the longest string that doesn't contain
| a D or a T.
|
| so 'foo D bar DT bar T foo'
|
| becomes
|
| '{foo }D{ bar }DT{ bar }T{ foo}'
|
| But what I want is a string where 'DT' is negated in the match,
| resulting in:
|
| '{foo D bar }DT{ bar T foo}'

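while (<>) {
    chomp;                  # strip the trailing newline, if there is one
    s/DT/#/g;               # hide every DT behind a placeholder
    s/([^#]+)/{$1}/g;       # wrap each run between placeholders
    s/#/DT/g;               # put the DTs back
    print $_, "\n";
}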

will do the job, provided that '#' does not occur in your input.
(Otherwise use some other character.  The safest one is "\n",
which is guaranteed not to occur in the *middle* of a line.)
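
The "\n" variant could be sketched like this. The sub name wrap_with_placeholder is hypothetical, and the code assumes it is handed one line at a time, with at most a trailing newline:

```perl
use strict;
use warnings;

# Hypothetical helper: the same three substitutions, with "\n" as the
# placeholder. Assumes $line holds a single line whose only possible
# newline is the trailing one.
sub wrap_with_placeholder {
    my ($line) = @_;
    chomp $line;                    # after this, no "\n" is left in $line
    $line =~ s/DT/\n/g;             # hide every DT behind the placeholder
    $line =~ s/([^\n]+)/{$1}/g;     # wrap each run between placeholders
    $line =~ s/\n/DT/g;             # put the DTs back
    return $line;
}

print wrap_with_placeholder("foo # bar DT bar T foo\n"), "\n";
```

Because the placeholder is "\n", a '#' in the input is left alone; the line above should print '{foo # bar }DT{ bar T foo}'.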

--
Uwe Waldmann, Max-Planck-Institut fuer Informatik
Im Stadtwald, D-66123 Saarbruecken, Germany



Tue, 26 Dec 1995 19:58:04 GMT  

Quote:

>A potentially stupid question, but I have tried to find out for myself
>and not met with any success, ie the faq, asking people etc. Maybe I
>just missed it...  but
>how does one negate a string in a regexp?

That's a very difficult exercise, in general.  Here's a s///g statement that
works for your particular example:

    s/([^DT]T?|D[^T])+/{$&}/g;

But that approach is not very intuitive.  A better (more maintainable)
approach might be to do this:

    $a = '';
    $a .= "{$`}$&", $_ = $' while /DT/;   # braces around the text before each DT
    $_ = $a . "{$_}";                     # "$a{$_}" would be parsed as a hash lookup

As you can see, you could pick a different /DT/ pattern easily.  (Note: the
repeated /DT/ match is probably not very expensive because after each
iteration we remove the beginning of $_.)
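
To make the "different pattern" point concrete, here is a rough sketch of the same loop with the separator passed in as a compiled regex. The sub name wrap_between is made up for illustration, and the code assumes the separator never matches the empty string (otherwise the loop would not terminate):

```perl
use strict;
use warnings;

# Hypothetical helper: same loop as above, with the separator as a parameter.
# Assumes $sep never matches the empty string.
sub wrap_between {
    my ($text, $sep) = @_;
    my $out = '';
    while ($text =~ /$sep/) {
        $out .= "{$`}$&";        # text before the match in braces, then the match
        $text = $';              # carry on with whatever follows the match
    }
    return $out . "{$text}";     # the tail gets its own pair of braces
}

print wrap_between('foo D bar DT bar T foo', qr/DT/), "\n";
print wrap_between('one, two,three', qr/,\s*/), "\n";
```

The second call should print '{one}, {two},{three}', since each separator is copied through unchanged.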

Here are some sample runs:

    foo D bar DT bar T foo -> {foo D bar }DT{ bar T foo}
    aDTbDTcDT -> {a}DT{b}DT{c}DT{}
    foo DDDDDDTTTTTT bar -> {foo DDDDD}DT{TTTTT bar}
    DT -> {}DT{}
    DDT -> {D}DT{}
    DTDT -> {}DT{}DT{}
    DDDDDD -> {DDDDDD}
    DTD -> {}DT{D}
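
For comparison, a single substitution using Perl 5's negative lookahead, (?!...), comes close to the "negated string" the original question asked for. This is only a sketch, the sub name wrap_not_dt is hypothetical, and unlike the runs above it emits no empty {} pairs:

```perl
use strict;
use warnings;

# Hypothetical helper built on negative lookahead: (?!DT). matches one
# character only at a position where a DT does not start. The first
# alternative consumes each DT itself, so the next attempt cannot begin
# in the middle of it (at its "T").
sub wrap_not_dt {
    my ($text) = @_;
    $text =~ s/(DT)|((?:(?!DT).)+)/defined($1) ? $1 : "{$2}"/gse;
    return $text;
}

print wrap_not_dt('foo D bar DT bar T foo'), "\n";
print wrap_not_dt('DDT'), "\n";
```

The plainer s/((?:(?!DT).)+)/{$1}/g is not enough by itself: after stopping in front of a DT, the next match attempt would start at the T and wrap it together with the following text.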

Michael.
--
"We are floating in a medium of vast extent, always drifting uncertainly,
 blown to and fro; whenever we think we have a fixed point to which we can
 cling and make fast, it shifts and leaves us behind; if we follow it, it
 eludes our grasp, slips away, and flees eternally before us.  Nothing stands
 still for us.  This is our natural state and yet the state most contrary to
 our inclinations.  We burn with desire to find a firm footing, an ultimate,
 lasting base on which to build a tower rising up to infinity, but our whole
 foundation cracks and the earth opens..."  -- Virginia Woolf



Tue, 26 Dec 1995 22:45:28 GMT  

   | But what I want is a string where 'DT' is negated in the match,
   | resulting in:
   |
   | '{foo D bar }DT{ bar T foo}'

A different approach would be to split the line on 'DT', wrap each piece
in braces, and join the pieces with 'DT' again. As a one-liner:

# map, not grep: grep would drop any piece that has no D or T in it
$_ = join('DT', map("{$_}", split(/DT/, $_)));
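
Spelled out over several statements, the split and join idea might look like the sketch below. The sub name wrap_pieces and the -1 limit passed to split are additions made here for illustration, not part of the one-liner:

```perl
use strict;
use warnings;

# Hypothetical helper: split on DT, wrap every piece, join with DT again.
# The -1 limit makes split keep trailing empty pieces, so a trailing DT
# still gets its (empty) pair of braces.
sub wrap_pieces {
    my ($text) = @_;
    my @pieces = split /DT/, $text, -1;
    return join 'DT', map { "{$_}" } @pieces;
}

print wrap_pieces('foo D bar DT bar T foo'), "\n";
print wrap_pieces('aDTbDTcDT'), "\n";
```

Without the -1, split drops trailing empty fields, and 'aDTbDTcDT' would lose its final DT in the join.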
--

"When the only tool you have is Perl, the whole | "Hooray for snakes!"
 world begins to look like your oyster." -- Me  |  -- The Simpsons (29 Apr 93)



Sat, 30 Dec 1995 00:08:15 GMT  
 