matching word boundaries with regexps 

It seems that matching word boundaries with \b in regexps doesn't work
properly.
I get random results. The following script should illustrate the problem.
Feed it with lines of the form: vote yes (or: vote no)
Or am I doing something wrong???

----------------------------------------------------------------
#! /usr/bin/perl

$yes = 0;
$no = 0;
$vote = 0;

while (<STDIN>) {
        chop;
        print "=$_=\n";
        if (/vote/i || /comp.text.tex/i) {
            $vote ++;
            $yes ++ if /\byes\b/i;
            $no ++ if /\bno\b/i;
        }
        print "vote = $vote, yes = $yes, no = $no\n";
}

----------------------------------------------------------------
Piet van Oostrum, Dept of Computer Science, Utrecht University,
Padualaan 14, P.O. Box 80.089, 3508 TB Utrecht, The Netherlands.
Telephone: +31-30-531806   Uucp:   uunet!mcsun!hp4nl!ruuinf!piet



Fri, 03 Jul 1992 18:57:37 GMT  

: It seems that matching word boundaries with \b in regexps doesn't work
: properly.
: I get random results. The following script should illustrate the problem.
: Feed it with lines of the form: vote yes (or: vote no)
: Or am I doing something wrong???

It's a bug.  I fixed it a week or two ago here, and it will be fixed in
patch 9.  There was a bad interaction between the code that handles
case insensitivity and the code that checks for \b-ness at the beginning
of a string.  I tried your test program and it works under the new version.
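
For anyone who wants to check their own perl, here is a minimal sketch of a test for the case described above: a \b at the very beginning of the string combined with the i modifier. It is not the fix from patch 9, only a way to observe the behaviour. The sub name boundary_matches and the sample lines are made up for illustration, and the syntax assumes Perl 5.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if "yes" occurs as a whole word,
# ignoring case, and 0 otherwise.
sub boundary_matches {
    my ($line) = @_;
    return ($line =~ /\byes\b/i) ? 1 : 0;
}

# The first two samples put the word at the start of the string,
# which is where the reported problem shows up.
for my $line ("yes", "Yes please", "vote yes", "yesterday", "eyes") {
    printf "%-12s => %d\n", $line, boundary_matches($line);
}
```

A perl that handles \b correctly should report 1 for the first three lines and 0 for "yesterday" and "eyes".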

Soon.

Larry



Sat, 04 Jul 1992 02:57:12 GMT  

Quote:
> It seems that matching word boundaries with \b in regexps doesn't work
> properly.

[example deleted]

Knowing Piet is running perl on HP-UX, I tried a little, and found
out that:
 - on VAX/Ultrix it behaves as expected
 - on HP-UX it fails.
 - it runs fine on HP-UX if the 'ignore case' spec is removed from the
   matches:
        $yes++ if /\byes\b/;
   So I think it has something to do with HP's NLS system (a wild
   guess, but - who knows?)
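
Based on the observation above, one possible workaround is to fold the case of the line first and then match without the i modifier. This is only a sketch under the assumption that the failure needs the i modifier to trigger. The sub name tally and the sample lines are invented for illustration, the dots in the newsgroup name are escaped so they match literally, and lc assumes Perl 5 (on older perls, tr/A-Z/a-z/ does the same job for ASCII).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: counts votes in a list of lines without
# relying on case-insensitive matching.
sub tally {
    my (@lines) = @_;
    my %count = (vote => 0, yes => 0, no => 0);
    for my $line (@lines) {
        my $lc = lc $line;    # fold case up front instead of using /i
        next unless $lc =~ /vote/ || $lc =~ /comp\.text\.tex/;
        $count{vote}++;
        $count{yes}++ if $lc =~ /\byes\b/;
        $count{no}++  if $lc =~ /\bno\b/;
    }
    return %count;
}

my %c = tally("vote yes", "Vote NO", "VOTE Yes", "nothing here");
print "vote = $c{vote}, yes = $c{yes}, no = $c{no}\n";
# prints: vote = 3, yes = 2, no = 1
```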

Johan
--

Multihouse Automatisering bv                   uucp: ..!{uunet,hp4nl}!mh.nl!jv
Doesburgweg 7, 2803 PL Gouda, The Netherlands  phone/fax: +31 1820 62944/62500
------------------------ "Arms are made for hugging" -------------------------



Sat, 04 Jul 1992 13:05:01 GMT  

:> It seems that matching word boundaries with \b in regexps doesn't work
:> properly.
:[example deleted]
:
:Knowing Piet is running perl on HP-UX, I tried a little, and found
:out that:
: - on VAX/Ultrix it behaves as expected
: - on HP-UX it fails.
: - it runs fine on HP-UX if the 'ignore case' spec is removed from the
:   matches:
:       $yes++ if /\byes\b/;
:   So I think it has something to do with HP's NLS system (a wild
:   guess, but - who knows?)
:
:Johan

Well, I tried the example on our HP 835 running HP-UX 7.0 and it worked
just fine. (N.B. This is not to be taken as a defence of HP-UX: 7.0 was
installed last Friday, and the bugs keep coming in :-( )

Probably there is something more subtle to the error.
I agree that Johan's guess is a wild one. Did you forget to put
a :-) somewhere, Johan?

        Jan D.



Mon, 06 Jul 1992 15:32:11 GMT  
 