Match "ab" in "abc", but not in "abd" 

Is it possible to write a regex that will match the "ab" in "abc", but
not match "abd"?  /ab/ won't work because it matches the "ab" in
"abd", and /abc/ won't work because it matches the whole string of
"abc".  I know that if I had a space instead of a "c", I could use
something like /ab\>/ (in gawk).  Is there any general case of this?
TIA.

--David

Ignore this next line:
abbcccdddd



Mon, 29 Dec 2003 05:50:37 GMT  

Quote:

>Is it possible to write a regex that will match the "ab" in "abc", but
>not match "abd"?  /ab/ won't work because it matches the "ab" in
>"abd", and /abc/ won't work because it matches the whole string of
>"abc".  I know that if I had a space instead of a "c", I could use
>something like /ab\>/ (in gawk).  Is there any general case of this?
>TIA.

The short answer is "No, (g)awk doesn't have context reg exps, like lex does."

However, you can usually work around this in AWK; what problem are we
trying to solve?



Mon, 29 Dec 2003 06:02:42 GMT  

Quote:

> Is it possible to write a regex that will match the "ab" in "abc", but
> not match "abd"?

Yes, but it requires support of lookahead assertions. Perl has
them, awk doesn't, so you'll have to use Perl to get them:

    /ab(?!d)/ # match ab only when NOT immediately followed by d

In awk, you'll have to use a negated character class and alternation:

    /ab([^d]|$)/ # match ab only when immediately followed by
                 # something other than d or nothing at all
                 # (i.e., the end of the string)

There's an important functional distinction between these two
regular expression patterns: the lookahead assertion does not
consume the text it matches, whereas the negated character class
does. This matters particularly when the text you want to match (in
this case, ab) is being captured using subexpression memory, or
when globally matching possibly overlapping substrings.
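
To make the "consumes the text" point concrete, here is a minimal sketch in portable awk, wrapped in a shell function. The name show_match is made up for illustration and is not from anyone's post. It prints RSTART, RLENGTH and the matched text for the negated-class pattern, so you can see that the character after "ab" becomes part of the match whenever there is one.

```shell
# show_match (hypothetical helper): report what /ab([^d]|$)/ actually matched
show_match() {
  printf '%s\n' "$1" | awk '{
    if (match($0, /ab([^d]|$)/)) {
      # RLENGTH is 3 when a non-d character follows, 2 at end of string
      print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
    } else {
      print "no match"
    }
  }'
}
```

For example, show_match abc reports a three-character match, so if you only want the "ab" you still have to take substr($0, RSTART, 2) yourself.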

Quote:
> /ab/ won't work because it matches the "ab" in
> "abd", and /abc/ won't work because it matches the whole string of
> "abc". I know that if I had a space instead of a "c", I could use
> something like /ab\>/ (in gawk).

And also in MKS awk. But you're right: word boundary anchors are
not the same as lookaround assertions. They address two different
problem domains.

Quote:
> Is there any general case of this?

Yes. Lookahead assertions are more general than the ...([^x]|$)
trick as they can use arbitrary regular expression patterns, not
just single characters, and they can be used to match (or not match)
text anywhere in the string, not just at the end of the string.

Quote:
> Ignore this next line:
> abbcccdddd

What next line? ;-)

--
Jim Monty

Tempe, Arizona USA



Mon, 29 Dec 2003 07:01:06 GMT  

Quote:

>Is it possible to write a regex that will match the "ab" in "abc", but
>not match "abd"?  /ab/ won't work because it matches the "ab" in
>"abd", and /abc/ won't work because it matches the whole string of
>"abc".  I know that if I had a space instead of a "c", I could use
>something like /ab\>/ (in gawk).  Is there any general case of this?

Not general, but (substr($0, match($0, /ab/) + RLENGTH, 1) != "d" && RSTART)
would identify such substrings, and RSTART and RLENGTH would be set so you
could use substr if your goal were replacing the substring.
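
A minimal sketch of that test as a complete program, assuming a POSIX awk. The function name not_followed_by_d is hypothetical, and the condition is rearranged slightly so that match() is called before RSTART and RLENGTH are read. It only looks at the first "ab" in the record, which is one reason the approach is "not general".

```shell
# not_followed_by_d (hypothetical helper): check the first "ab" in a line
not_followed_by_d() {
  printf '%s\n' "$1" | awk '{
    # match() sets RSTART and RLENGTH as a side effect; && evaluates left to right
    if (match($0, /ab/) && substr($0, RSTART + RLENGTH, 1) != "d") {
      print "ab at " RSTART
    } else {
      print "none"
    }
  }'
}
```

Note that a record such as abdabc reports "none", because the first "ab" is followed by "d" and the later one is never examined.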


Mon, 29 Dec 2003 08:00:19 GMT  


Quote:

>> Is it possible to write a regex that will match the "ab" in "abc", but
>> not match "abd"?

>Yes, but it requires support of lookahead assertions. Perl has
>them, awk doesn't, so you'll have to use Perl to get them:

And is thus completely O/T for this newsgroup, but then again, that never
bothered you, did it?

Maybe I *should* post the lex solution - and then I'll post one in PDP11
assembler, as well...



Mon, 29 Dec 2003 08:08:22 GMT  
Thanks a lot for all of your help, guys (I think you were all guys,
but I'm too lazy to go back and check your names).  I was trying to
write a regex that I could put in RS so that every record would be "",
but RT would contain a single C token (so that I could add
highlighting escapes for printing).  However, now that I've seen how
difficult it would be and I've learned that RT is not POSIX compliant,
I think I'll just use one-line records and parse the lines manually.
Maybe \> will come in handy, or maybe I'll write it portably.  If any
of you would like a copy of the finished script (it will be meant to
prep C source for printing by enscript(1)), just drop me an email (my
real address is in the header. I've got to fix that.)  Finally, as for
the last line in my original post: I was afraid that after a few days,
it would take me hours to find my thread, so I intended to use abbccc
(or whatever it was) as a search string.


Tue, 30 Dec 2003 03:12:22 GMT  



Quote:
> Is it possible to write a regex that will match the "ab" in "abc", but
> not match "abd"?  /ab/ won't work because it matches the "ab" in
> "abd", and /abc/ won't work because it matches the whole string of
> "abc".  I know that if I had a space instead of a "c", I could use
> something like /ab\>/ (in gawk).  Is there any general case of this?
> TIA.

> --David

> Ignore this next line:
> abbcccdddd

This works on GNU awk 3.0.3 in a DOS box under Win95:
awk "/ab/ && !/abd/{ print }"

with an inputfile containing:
ab
abc
abd
abe
it outputs:
ab
abc
abe

Greetings,
Luuk



Tue, 30 Dec 2003 03:59:05 GMT  

Quote:

>Is it possible to write a regex that will match the "ab" in "abc", but
>not match "abd"?  /ab/ won't work because it matches the "ab" in
>"abd", and /abc/ won't work because it matches the whole string of
>"abc".  I know that if I had a space instead of a "c", I could use
>something like /ab\>/ (in gawk).  Is there any general case of this?

With a 'gawk' extension like 'gensub'?

awk '{$0=gensub(/ab([^d])/,"xy\\1","g",$0);print}' sourcefile

Read the fine manual (rtfm) about 'gensub'.  :-)
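
For anyone who wants the same substitution without gawk extensions, here is a rough sketch using only index() and substr(), which should run in any POSIX awk. The function name replace_ab is made up for illustration. As written, the gensub() pattern above needs some character after "ab", so an "ab" at the very end of a line is left alone; this loop replaces that one too, which is my assumption about what is wanted.

```shell
# replace_ab (hypothetical helper): turn "ab" into "xy" unless "d" follows
replace_ab() {
  printf '%s\n' "$1" | awk '{
    out = ""; rest = $0
    while ((i = index(rest, "ab")) > 0) {
      # keep "ab" when the next character is "d", otherwise substitute "xy"
      repl = (substr(rest, i + 2, 1) == "d") ? "ab" : "xy"
      out = out substr(rest, 1, i - 1) repl
      rest = substr(rest, i + 2)
    }
    print out rest
  }'
}
```

Because the character after "ab" is only inspected and never copied into the replacement, there is no need for a \\1 backreference.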

 -Falk



Tue, 30 Dec 2003 20:14:00 GMT  
I don't intend to use gensub(), since I like to write things portably.
 The perl solution was what I was really looking for, but part of the
reason I'm doing this program is to get some practice with AWK.  I
think I'll use some combination of Harlan and Luuk's solutions.  C
syntax is pretty tricky, anyway, so it will probably be easier to use
some extra code than a magnificent 300 character regex ;-).  I'm still
accepting requests for the final program via e-mail (haven't gotten
any yet :-).  Perhaps I could tempt you by telling you that my next
exercise will be to adapt the script to print AWK :>.  Thanks a lot
for your suggestions.


Sat, 03 Jan 2004 03:56:58 GMT  


Quote:
>I don't intend to use gensub(), since I like to write things portably.
> The perl solution was what I was really looking for, but part of the
>reason I'm doing this program is to get some practice with AWK.  I
>think I'll use some combination of Harlan and Luuk's solutions.  C
>syntax is pretty tricky, anyway, so it will probably easier to use
>some extra code than a magnificent 300 character regex ;-).  I'm still
>accepting requests for the final program via e-mail (haven't gotten
>any yet :-).  Perhaps I could tempt you by telling you that my next
>excercise will be to adapt the script to print AWK :>.  Thanks a lot
>for you're suggestions.

Yet another variant:-

!/abd/ && match($0,/ab/) {found=substr($0,RSTART,RLENGTH); print found}

or more generally:-

BEGIN {
  ilike="ab"
  butnot="d"
}

$0!~(ilike butnot) && match($0,ilike) {
  found=substr($0,RSTART,RLENGTH)
  print "found=\"" found "\""
}

$0~(ilike butnot) {
  print "discarded \"" $0 "\""
}

Here's a test run.

sh-2.04$ awk -f butnot.awk
ab
found="ab"
aaabbb
found="ab"
aaabcdef
found="ab"
aaabdefg
discarded "aaabdefg"

Hope this helps
--
Alan Linton



Sat, 03 Jan 2004 05:07:31 GMT  

Quote:
> !/abd/ && match($0,/ab/) {found=substr($0,RSTART,RLENGTH); print found}

Well, it doesn't help much because I've totally redesigned my program,
but thanks for the suggestion.  However, even though the question is
completely academic (isn't that the motto of Usenet?), what if you ran
this script with the input record 'abceabd'?  I think it would discard
it.  Whether or not this poses a problem depends on the specific
application of this code (i.e.: no longer my problem).  I hope I
haven't come across as rude in this posting; it certainly wasn't my
intention.  Thank you all for your help.
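
For the record, a sketch of one way around the 'abceabd' case in portable awk: instead of rejecting the whole record when "abd" appears anywhere, walk along the record and test each "ab" separately. The function name find_ab and the output wording are made up for illustration.

```shell
# find_ab (hypothetical helper): report every "ab" not followed by "d"
find_ab() {
  printf '%s\n' "$1" | awk '{
    pos = 0; rest = $0; n = 0
    while ((i = index(rest, "ab")) > 0) {
      if (substr(rest, i + 2, 1) != "d") {
        print "found ab at " (pos + i)
        n++
      }
      pos += i + 1                  # characters of the record consumed so far
      rest = substr(rest, i + 2)    # continue just past this "ab"
    }
    if (n == 0) print "discarded"
  }'
}
```

With the input abceabd this reports the "ab" at position 1 rather than discarding the record.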


Mon, 05 Jan 2004 03:22:00 GMT  
 