substituting for matches not inside quotes? 

I am stuck trying to figure out some way in Perl to substitute for a
pattern in a string only when that pattern is not inside quotes.  One
can always resort to doing it with single-character parsing, but it
seems to me there should be an easier way.  I mainly want this to
remove the parenthetical comments in mail headers.  For instance, I
would like to transform

        Content-Type: message/external-body; access-type=local-file;
                (a comment) name="/tmp/somefile(withparens)"

to

        Content-Type: message/external-body; access-type=local-file;
                name="/tmp/somefile(withparens)"

--
_________________________________________________________________________

Stanford Linear Accelerator    End Station A           E143 Collaboration
http://www.*-*-*.com/ ~raines/index.html  PGP public key by finger



Tue, 11 Feb 1997 02:00:14 GMT  

|> I am stuck trying to figure out some way in Perl to substitute for a
|> pattern in a string only when that pattern is not inside quotes.

|> For instance, I would like to transform
|>                 (a comment) name="/tmp/somefile(withparens)"
|> to
|>                 name="/tmp/somefile(withparens)"

I can give some pointers to get you started. I'll take the easy road and
assume that everything is self-contained on a line, i.e., that I don't
have to worry about quoted strings being broken across lines.

With these kinds of things, one really has little option but to parse
the line from start to end (here, using a regex to do the parsing).
Otherwise, you'll never know whether you're inside a quoted string or not.

But it can be done more or less the way we naturally parse ourselves.

Since we're going to parse from the beginning, our regex would conceptually
be something like:

  1) skip stuff we're not interested in.
  2) pinpoint what we are interested in.

Now, breaking these down:

  1 ) skip stuff we're not interested in:
  1a     skip strings in double quotes
  1b     skip "other" text that's not what we're interested in.
  2 ) pinpoint what we are interested in:
  2a     stuff between parens.

---------

1a) skip strings in double quotes:
        "[^"]*"
    (note that the quotes above *are* part of the regex)

1b) skip other text we're not interested in. Well, don't count a double
    quote, since that's taken care of in 1a and hence not "other".
    Also, we won't count an open paren, since that's what we're interested
    in for 2a. Therefore, we'll want not-a-quote-and-not-an-open-paren:
        [^"(]

Let's put these into strings for easy use later:
    $double = qq/"[^"]*"/;
    $other  = qq/[^"(]/;

So, we'll want to skip text so long as it's either of the above:
    $skip = qq/($double|$other)*/;
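To see what $skip actually consumes, here's a small check (a sketch, not from the original post; the sample line is made up). It anchors at ^ and prints the prefix that was skipped:

```perl
use strict;
use warnings;

my $double = qq/"[^"]*"/;          # 1a: a double-quoted string
my $other  = qq/[^"(]/;            # 1b: not a quote, not an open paren
my $skip   = qq/($double|$other)*/;

my $line = 'x="a(b)" (c) y';       # made-up sample
my ($skipped) = $line =~ /^($skip)/;   # first capture is the whole skip
print "[$skipped]\n";              # prints [x="a(b)" ]
```

The skipped part stops right at the unquoted open paren; the paren inside the quotes was swallowed along with its string.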

What about what we'll want to pinpoint ("2a" above):
        \([^)]*\)
that's an open paren, followed by stuff that's not a close paren,
followed by a close paren.

Let's put this into a variable too:
    $interest  = q/\\([^)]*\\)/;

(the double \ is the safe choice because it's going into a string and not
 into the regex directly; in a double-quoted qq// string, a lone \( would
 lose its backslash).
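A quick check of what ends up in $interest, comparing it with a single-backslash q// string and a qr// regex (qr// needs Perl 5.005 or later). This is just an illustration, not part of the original recipe:

```perl
use strict;
use warnings;

my $interest = q/\\([^)]*\\)/;   # as written in the post
my $lone     = q/\([^)]*\)/;     # single backslashes, also single-quoted
my $compiled = qr/\([^)]*\)/;    # compiled regex, no string step at all

print "$interest\n";                                     # prints \([^)]*\)
print(($interest eq $lone) ? "same\n" : "different\n");  # prints same
print "(a comment)" =~ $compiled ? "match\n" : "no match\n";
```

In a single-quoted string only \\ and the delimiter are special, so both spellings give the same pattern; with qr// there is no string step to worry about.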

Now, let's make a leap and throw this all together:

    $double = qq/"[^"]*"/;
    $other  = qq/[^"(]/;
    $interest  = q/\\([^)]*\\)/;
    $skip = qq/($double|$other)*/;

    s/^($skip)$interest/$1/o;     ### do it!

Here, we'll skip some stuff (remembering it in $1), then pinpoint what was
of interest. Since our goal is to eliminate the stuff of interest, we'll
replace both with $1, thereby removing what was matched but not in the first
set of parens [the stuff of interest].
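Here's that single substitution run on the header line from the question (a minimal sketch; the header's indentation is dropped to keep it short):

```perl
use strict;
use warnings;

my $double   = qq/"[^"]*"/;
my $other    = qq/[^"(]/;
my $interest = q/\\([^)]*\\)/;
my $skip     = qq/($double|$other)*/;

$_ = '(a comment) name="/tmp/somefile(withparens)"';
s/^($skip)$interest/$1/o;   # keep the skipped prefix ($1), drop the comment
print "$_\n";               # the comment is gone; the quoted parens stay
```

The space that followed the comment is kept; adding \s* right after $interest in the pattern would remove it as well.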

Since the line may have more than one thing to be zapped, we'll keep doing
this until nothing is eliminated:

    1 while s/^($skip)$interest/$1/o;
    ^^^^^^^
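And the looping version on a made-up line with two comments and a quoted paren (a sketch, same pieces as above):

```perl
use strict;
use warnings;

my $double   = qq/"[^"]*"/;
my $other    = qq/[^"(]/;
my $interest = q/\\([^)]*\\)/;
my $skip     = qq/($double|$other)*/;

$_ = '(one) a="x(y)" (two) b';        # made-up sample
1 while s/^($skip)$interest/$1/o;     # rescans from ^ after each removal
print "[$_]\n";                       # prints [ a="x(y)"  b]
```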

Now, *each* time it goes to do the s///, it'll have to start parsing from
the beginning of the line. It'd be nice if we could pick up from where we
left off (sort of like /g might do, but that won't help here since we
need to anchor the regex with the leading ^). I think perl5 has something
(the \G assertion) that would be of use here.

But we can't do that here. If the string is short, it likely doesn't
matter all that much. But if it's long, we'd probably want to make some
creative use of $'.
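For long strings, one way to avoid rescanning from ^ is to let a single /g pass match either a quoted string or a comment and put back only the quoted strings. This is a swapped-in technique, not what's described above; strip_comments_one_pass is a hypothetical name and, like the first version, it ignores escapes:

```perl
use strict;
use warnings;

# Swapped-in technique: one left-to-right pass with /g.  Each match is
# either a double-quoted string (captured in $1 and put back unchanged)
# or a parenthesized comment (replaced by the empty string).  Text that
# matches neither is left alone.  /g resumes after the previous match,
# so nothing is rescanned from ^.
sub strip_comments_one_pass {
    my ($line) = @_;
    $line =~ s/("[^"]*")|\([^)]*\)/defined $1 ? $1 : ''/ge;
    return $line;
}

print "[", strip_comments_one_pass('(one) a="x(y)" (two) b'), "]\n";
# prints [ a="x(y)"  b]
```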

Now that that's all done, let's have some fun and deal with:

   *) double-quoted strings
   *) single-quoted strings
   *) stuff that might be \escaped

One thing to remember. If we want to match a literal ``\'' in a regex,
we need to escape it: ``\\''. But if we're putting this into a string,
we need to escape each escape: ``\\\\''. That makes the following
a bit difficult to read.
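To see the escaping levels concretely, this sketch (not from the post) shows that the string '\\\\.' holds three characters and matches the same thing as the regex qr/\\./:

```perl
use strict;
use warnings;

my $escape_str = '\\\\.';   # string form, as in the post: holds \\.
my $escape_re  = qr/\\./;   # the same pattern written straight as a regex

print length($escape_str), "\n";   # prints 3

my $sample   = 'say \"hi\"';                 # two backslash-quote pairs
my @hits_str = $sample =~ /($escape_str)/g;
my @hits_re  = $sample =~ /($escape_re)/g;
print scalar(@hits_str), " ", scalar(@hits_re), "\n";   # prints 2 2
```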

   $escape = '\\\\.';                    ## an escaped character
   $double = qq/"($escape|[^"\\\\])*"/;  ## a double-quoted string
   $single = qq/'($escape|[^'\\\\])*'/;  ## a single-quoted string
   $other  = qq/[^\\\\"'(]/;             ## other stuff

   $skip = "($escape|$double|$other|$single)*";

   $eat  = qq/\\(($escape|[^)\\\\])*\\)/; ## stuff between parens

   1 while s/^($skip)$eat/$1/o;
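The same escape-aware pattern can be built from qr// pieces with the /x modifier, so each backslash is written only once and the parts can be spaced out. This is a restatement, not the poster's code; it assumes Perl 5.005+ for qr// and wraps the loop in a hypothetical strip_comments function (the /o is dropped since the pieces are already compiled):

```perl
use strict;
use warnings;

# The escape-aware pieces restated with qr// and /x, so each backslash is
# written once.  (Under /x, spaces outside character classes are ignored.)
my $escape = qr/ \\ . /x;                          # an escaped character
my $double = qr/ " (?: $escape | [^"\\] )* " /x;   # a double-quoted string
my $single = qr/ ' (?: $escape | [^'\\] )* ' /x;   # a single-quoted string
my $other  = qr/ [^\\"'(] /x;                      # other stuff
my $skip   = qr/ (?: $escape | $double | $other | $single )* /x;
my $eat    = qr/ \( (?: $escape | [^)\\] )* \) /x; # stuff between parens

# Hypothetical wrapper: remove every unquoted, unescaped (...) comment.
sub strip_comments {
    my ($line) = @_;
    1 while $line =~ s/^($skip)$eat/$1/;
    return $line;
}

print strip_comments(q{(c1) a="x\"(y)" b='(z)' (c2) end}), "\n";
```

Since $skip uses only non-capturing (?:...) groups, $1 is exactly the skipped prefix.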

Have fun.
        *jeffrey*
-------------------------------------------------------------------------

See my Jap/Eng dictionary at http://www.omron.co.jp/cgi-bin/j-e
                          or http://www.cs.cmu.edu:8001/cgi-bin/j-e



Tue, 11 Feb 1997 15:48:09 GMT  


> I am stuck trying to figure out some way in Perl to substitute for a
> pattern in a string only when that pattern is not inside quotes.  One
> can always result to doing it with single character parsing, but it
> seem to me there should be an easier way.  I mainly want this to
> remove the parenthetical comments in mail headers.  For instance, I
> would like to transform

>    Content-Type: message/external-body; access-type=local-file;
>            (a comment) name="/tmp/somefile(withparens)"

> to

>    Content-Type: message/external-body; access-type=local-file;
>            name="/tmp/somefile(withparens)"

sub transl {
    # Remove parenthesized substrings.
    my ($s) = @_;
    $s =~ s/\([^()]*\)//g;
    return $s;
}

# Temporarily add a double quote at the beginning and end of the string.
$_ = '"' . $_ . '"';
# Everything that was inside quotes is now outside quotes (and vice versa).
# Now apply "transl" to every quoted substring.
s/("[^"]*")/transl($1)/ge;
# Remove the additional quotes at the beginning and end.
s/^"//;
s/"$//;
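To try the quote-flipping trick on the header line from the question, here it is as a self-contained function. A sketch only: remove_parens and strip_unquoted_comments are hypothetical names, and like the code above it doesn't handle escaped quotes:

```perl
use strict;
use warnings;

# Hypothetical helper: drop (...) groups that contain no parens themselves.
sub remove_parens {
    my ($s) = @_;
    $s =~ s/\([^()]*\)//g;
    return $s;
}

# Hypothetical wrapper around the quote-flipping trick.
sub strip_unquoted_comments {
    my ($line) = @_;
    $line = '"' . $line . '"';                     # flip inside/outside
    $line =~ s/("[^"]*")/remove_parens($1)/ge;     # clean former outside text
    $line =~ s/^"//;                               # undo the added quotes
    $line =~ s/"$//;
    return $line;
}

print strip_unquoted_comments('(a comment) name="/tmp/somefile(withparens)"'), "\n";
```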

--
Uwe Waldmann, Max-Planck-Institut fuer Informatik
Im Stadtwald, D-66123 Saarbruecken, Germany



Tue, 11 Feb 1997 19:49:07 GMT  
 