Regexp to match a C-style string 

Hi all

I need to create a text file format which holds strings.  The strings can
legally contain any ASCII character, so they must be encoded in some way.
C-style would be most convenient.  Some example records might be:

  ITEM foo="a string" bar="a string containing \"quotes\""
  ITEM baz="a string ending with a slosh: \\"
  ITEM qux="a string ending with a slosh and a quote: \\\""

Anyone know of a regexp that can find the limits of a string, and won't be
tripped up by combinations of escaped quotes and escaped sloshes?

My current effort is:
    /".*?(?<!\\)(\\\\)*"/
The idea is to find the nearest quote which is preceded by an even number
of sloshes (including 0).  Nice in theory, but unfortunately, it doesn't
seem to work.

Cheers

- rog



Mon, 18 Apr 2005 09:28:33 GMT  
[posted & mailed]


Quote:
>I need to create a text file format which holds strings.  The strings can
>legally contain any ASCII character, so they must be encoded in some way.
>C-style would be most convenient.  Some example records might be:

You can find the regex in the FAQ:

  perldoc -q comment

shows a regex with this chunk in it:

  "(\\.|[^"\\])*"

You can also write that in "unrolled" form, which matches the same
strings without the alternation:

  "[^"\\]*(?:\\.[^"\\]*)*"
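
For anyone who wants to try the two patterns, here is a minimal sketch that runs them over the three example records from the question. The helper name quoted_strings is made up for illustration, and two details are my own assumptions rather than part of the FAQ chunk: the group is non-capturing, and /s is added so that a backslash followed by a newline also counts as an escape. The attempt from the question is included for contrast on a made-up test line.

```perl
use strict;
use warnings;

# The three example records from the question. The heredoc is single-quoted,
# so the backslashes are taken literally.
my @records = split /\n/, <<'END';
ITEM foo="a string" bar="a string containing \"quotes\""
ITEM baz="a string ending with a slosh: \\"
ITEM qux="a string ending with a slosh and a quote: \\\""
END

# Alternation form and unrolled form of the same pattern.
# (Assumptions: non-capturing group and the /s flag.)
my $alt      = qr/"(?:\\.|[^"\\])*"/s;
my $unrolled = qr/"[^"\\]*(?:\\.[^"\\]*)*"/s;

# Hypothetical helper: return every quoted string found in a line,
# delimiters included.
sub quoted_strings {
    my ($re, $line) = @_;
    return $line =~ /($re)/g;
}

for my $line (@records) {
    my @a = quoted_strings($alt, $line);
    my @b = quoted_strings($unrolled, $line);
    print "$_\n" for @a;
    warn "patterns disagree on: $line\n" unless "@a" eq "@b";
}

# The attempt from the question, for contrast. Its '.' can also match an
# unescaped quote, so once the pattern is anchored at both ends it can run
# past the closing quote of the first string.
my $attempt = qr/".*?(?<!\\)(?:\\\\)*"/;
my $two     = '"a" b "c"';   # made-up test line: two strings, not one
print "attempt accepts the whole line\n"  if     $two =~ /\A$attempt\z/;
print "alternation form rejects it\n"     unless $two =~ /\A$alt\z/;
```

The point of the character class [^"\\] is that the interior of the string can never contain a bare quote or a lone backslash, so the match cannot slide past the real closing quote however the pattern is embedded in a larger one.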

--
Jeff "japhy" Pinyan      RPI Acacia Brother #734      2002 Acacia Senior Dean
"And I vos head of Gestapo for ten     | Michael Palin (as Heinrich Bimmler)
 years.  Ah!  Five years!  Nein!  No!  | in: The North Minehead Bye-Election
 Oh.  Was NOT head of Gestapo AT ALL!" | (Monty Python's Flying Circus)



Mon, 18 Apr 2005 15:07:16 GMT  
On 31 Oct 2002 00:28:33 -0800, Roger Sh{*filter*}

Quote:

> I need to create a text file format which holds strings.  The strings can
> legally contain any ASCII character, so they must be encoded in some way.
> C-style would be most convenient.  Some example records might be:

>   ITEM foo="a string" bar="a string containing \"quotes\""
>   ITEM baz="a string ending with a slosh: \\"
>   ITEM qux="a string ending with a slosh and a quote: \\\""

> Anyone know of a regexp that can find the limits of a string, and won't be
> tripped up by combinations of escaped quotes and escaped sloshes?

This is a FAQ.  

See perldoc -q delimit

    "How can I split a [character] delimited string except when inside
    [character]? (Comma-separated files)"
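
Not quoted from the FAQ, but if a module is acceptable, the core Text::ParseWords module handles this kind of quoting. A minimal sketch, assuming the fields in a record are separated by whitespace:

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# First example record from the question (single-quoted, so the
# backslashes are literal).
my $record = 'ITEM foo="a string" bar="a string containing \"quotes\""';

# Split on whitespace, but not inside quotes. A false second argument asks
# quotewords to strip the quotes and the escaping backslashes.
my @fields = quotewords('\s+', 0, $record);

print "[$_]\n" for @fields;
```

This should come back as three fields, with the quotes around each value removed. Note that the unescaping only drops the backslash, so a sequence such as \n would become a plain "n" rather than a newline; a format that needs full C escapes would have to decode those separately.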

--
Garry Williams



Mon, 18 Apr 2005 15:20:37 GMT  