regular expression: matching ( ) 

Hello!

I am trying to separate expressions in an argument-string, so that I can
calculate them one by one. Arguments can look like

VSUM({1;2;3},{sin(1.57);5.3;~v.z})

with nested () and {}. The string contains no spaces; the separators
are "," and ";".

What I need is the "{1;2;3},{sin(1.57);5.3;~v.z}",
and later the "{1;2;3}", etc.

How can I get a part within _matching_ () or {} using regular
expressions?
Or do I have to scan the string char by char, counting ( and )?

Karsten

--
Karsten Busch



Tue, 23 Mar 1999 03:00:00 GMT  

A regular expression cannot represent balanced parentheses.
Regular expressions are recognized by finite state machines.
While scanning a string of mixed ( and ) that might be balanced,
more than finitely many states can be reached:

       0 : currently balanced
       1 : 1 unmatched (
       2 : 2 unmatched (
       ...
       n : n unmatched (
       ...

In practice, your problem probably doesn't have arbitrarily deep
nesting, and for any fixed maximum depth it could be recognized by an RE.
However, such REs get messy enough that it is probably easier to scan
with a counter.
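
A minimal sketch of the counter approach, in POSIX awk wrapped in a shell function. The names split_top, inner and pieces are made up for this illustration and are not from the thread. It assumes the input is one argument string per line, and it treats ( and { as interchangeable openers, so it does not check that a "(" is closed by ")" rather than "}".

```sh
# split_top: print the text between the first opening bracket and its
# matching closer, then each top-level piece of that text on its own line.
split_top() {
    printf '%s\n' "$1" | awk '
    # Return the text inside the first bracket pair of s ("" if there is none).
    function inner(s,    i, c, depth, start) {
        depth = 0
        for (i = 1; i <= length(s); i++) {
            c = substr(s, i, 1)
            if (c == "(" || c == "{") {
                if (depth++ == 0) start = i + 1      # remember where the body begins
            } else if (c == ")" || c == "}") {
                if (--depth == 0) return substr(s, start, i - start)
            }
        }
        return ""
    }
    # Print the pieces of s that are separated by "," or ";" at depth 0.
    function pieces(s,    i, c, depth, from) {
        depth = 0; from = 1
        for (i = 1; i <= length(s); i++) {
            c = substr(s, i, 1)
            if (c == "(" || c == "{") depth++
            else if (c == ")" || c == "}") depth--
            else if ((c == "," || c == ";") && depth == 0) {
                print substr(s, from, i - from)
                from = i + 1
            }
        }
        print substr(s, from)
    }
    { body = inner($0); print body; pieces(body) }'
}

split_top 'VSUM({1;2;3},{sin(1.57);5.3;~v.z})'
```

For the example string from the question this prints the full argument list {1;2;3},{sin(1.57);5.3;~v.z} followed by {1;2;3} and {sin(1.57);5.3;~v.z}, one per line. Calling it again on a piece such as {1;2;3} descends one level further.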

--
Mike Brennan



Wed, 24 Mar 1999 03:00:00 GMT  

Recently I experienced that I have to write "\(" to match a "(". This
is inconsistent in itself, because "\(" is used for grouping regexps.

My awk was supplied with HP-UX 10.10...

--



Sat, 27 Mar 1999 03:00:00 GMT  

   Recently I experienced that I have to write "\(" to match a "(". This
   is inconsistent in itself, because "\(" is used for grouping regexps.

   My awk was supplied with HP-UX 10.10...

   --

I don't think that you are right. The parentheses are used for
grouping, whereas you have to 'escape' them when you want to match
them literally (cf. Aho/Kernighan/Weinberger: 'The AWK Programming
Language', Addison-Wesley, p. 28).

Best,

Max

--
----------------------------------------------------------------------

Multilingual Theory and Technology
Rank Xerox Research Centre                   Phone:    +33 76 61 50 86
6 chemin de Maupertuis                       Fax:      +33 76 61 50 99
F-38240 Meylan France                        URL:  http://www.xerox.fr
----------------------------------------------------------------------



Sat, 27 Mar 1999 03:00:00 GMT  


Quote:


>>   Recently I experienced that I have to write "\(" to match a "(". This
>>   is inconsistent in itself, because "\(" is used for grouping regexps.

>I don't think that you are right. The parentheses are used for
>grouping, whereas you have to 'escape' them when you want to match
>them literally (cf. Aho/Kernighan/Weinberger: 'The AWK Programming
>Language', Addison-Wesley, p. 28).

Some regular expression syntaxes are more regular than others.

What it boils down to is that sed and ed use the \(...\) construct for
grouping. That seems utterly perverse, because if nothing else in this world
is true, it ought to be the case that \c is a literal 'c' for all values of c.
Unfortunately, it all goes back to C-style strings, where \n means a newline,
etc.  Worse, System V style echo uses the C-style notation for special
characters (e.g., echo "Enter your name: \c").
AWK actually does things sensibly, such that () by themselves are special,
and you backslash them to get the literal meanings.  Unfortunately, if you
learned sed first, you're in trouble.
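
The difference can be seen side by side in a small sketch. The function names sed_demo, awk_demo and awk_group are invented for this illustration; the sed call uses basic regular expressions, as described above, and the awk calls use extended ones.

```sh
# sed: \( \) group, while a bare ( or ) matches itself.
sed_demo() { printf '%s\n' "$1" | sed 's/\(sin\)(\([^)]*\))/\1[\2]/'; }

# awk: a backslashed \( or \) matches the literal character.
awk_demo() { printf '%s\n' "$1" | awk '{ sub(/\(/, "["); sub(/\)/, "]"); print }'; }

# awk: bare ( ) group, here so that + applies to the whole "ab".
awk_group() {
    printf '%s\n' "$1" |
    awk '{ print (($0 ~ /^(ab)+$/) ? "match" : "no match") }'
}

sed_demo 'sin(1.57)'
awk_demo 'sin(1.57)'
awk_group 'ababab'
```

Both sed_demo and awk_demo turn sin(1.57) into sin[1.57], and awk_group reports "match" for ababab.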

BTW, Thompson AWK (DOS, OS/2, NT, and soon, Solaris) allows you to
backreference things found within parens (a la sed/ed, via the \1 notation).
Does anyone know of a Unix-based AWK that does likewise?

(I'm already hunkering down for the inevitable flames from people pointing
out that I'm a communist for wanting this in AWK - that it's not in the spec
for AWK - that I should use Perl, etc., etc., etc.  [like what happened when I
suggested that array sorting was a good idea...])

************************************************************************
Smiley captioned version available for the humor-impaired.
Email for price and availability ;-)
************************************************************************



Sat, 27 Mar 1999 03:00:00 GMT  

Quote:
> Recently I experienced that I have to write "\(" to match a "(".

Yep.

Quote:
> [...]   "\(" is used for grouping regexps.

Nope.  "(" groups regexps.  awk uses egrep-style regexps.

[For the "\1" sidetrack:  Yes, it would be a nice thing to have, and
it would not run counter to awk's philosophy / implementation strategy
to provide it.  We already have the special case "&", so let's have the rest,
too.  I just did some blacksmithing on free-format dates today and
am tired of "match()...substr(...RSTART,RLENGTH)" combos.]
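
For readers who have not met the combo, here is a small sketch of it in POSIX awk. The function name innermost and the choice of pattern are assumptions made for this illustration. The pattern matches a "(" and ")" with no other parenthesis between them, so it finds an innermost pair without any counting.

```sh
# innermost: print the contents of the first parenthesis pair that has
# no further parentheses inside it.
innermost() {
    printf '%s\n' "$1" | awk '{
        # match() sets RSTART and RLENGTH to the position and length of the hit
        if (match($0, /\([^()]*\)/))
            print substr($0, RSTART + 1, RLENGTH - 2)   # drop the two parens
    }'
}

innermost 'VSUM({1;2;3},{sin(1.57);5.3;~v.z})'
```

On the string from the original question this prints 1.57, the argument of sin.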

                                                        Martin Neitzel
--

Unix, Networking, Internet-Services / Xlink-POP           "Alles wird gut."



Mon, 29 Mar 1999 03:00:00 GMT  

Quote:
>BTW, Thompson AWK (DOS, OS/2, NT, and soon, Solaris) allows you to
>backreference things found within parens (a la sed/ed, via the \1 notation).
>Does anyone know of a Unix-based AWK that does likewise?

Check out the gensub function in gawk 3.0.0, which allows getting at
the matches for subexpressions. Note that

        /(abc)(edf)\1/

in a regex makes no sense in gawk as a way to write /abcedfabc/, since \1
is the octal escape for control-A.
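
A minimal sketch of what gensub offers, assuming gawk is installed under the name gawk (gensub is a gawk extension and is not available in other awks). The function name swap_groups is invented for this illustration. Inside the replacement string, "\\1" and "\\2" stand for the text matched by the first and second parenthesized subexpressions.

```sh
# swap_groups: exchange the two parenthesized parts wherever they occur.
swap_groups() {
    printf '%s\n' "$1" | gawk '{ print gensub(/(abc)(edf)/, "\\2\\1", "g") }'
}
```

Called as swap_groups 'abcedf', this prints edfabc. The third argument "g" asks for every match to be replaced; gensub returns the new string and leaves $0 unchanged.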
--
Arnold Robbins -- guest account at Emory Math/CS        | Laundry increases

                                                        | number of children.
                                                        | -- Miriam A. Robbins



Sat, 03 Apr 1999 03:00:00 GMT  