multi-line substitution 

I am a relatively inexperienced Perl user who needs to parse a series
of files and remove multiple lines in the middle of the files.

I need to search for a particular start string and remove everything
between that and a particular end string.

I have tried everything I know and everything I can find in the book, and
nothing seems to make it past the first line where I find the start string.

thanks,

craig



Sat, 19 Sep 1998 03:00:00 GMT  

comp.lang.perl.misc:

Quote:
> I am a relatively inexperienced Perl user who needs to parse a series
> of files and remove multiple lines in the middle of the files.
> I need to search for a particular start string and remove everything
> between that and a particular end string.
> I have tried everything I know, and can find in the book, and nothing
> seems to make it past the first line where I find the start string.

[ You didn't mention whether you want to write out new files or change the
originals.  I've written this as a simple parse into a new file; you can use
in-place editing if changing the same file is what you're after. ]

When you're confused, start simple.  We can break this task up into
looping through the file, identifying a start line, ignoring some
number of lines, and finding an ending line.  Each of these is a simple
task in Perl; the only challenge here is to put them together in
the right way.

We'll want to loop through the file:

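  open(FILE, '<', $file)          || die "Can't open $file: $!\n";
  open(NEWFILE, '>', "$file.new") || die "Can't create $file.new: $!\n";
  while(<FILE>) {
    print NEWFILE;    # with no list, print writes the current line, $_
  }
  close(FILE);
  close(NEWFILE) || die "Can't close $file.new: $!\n";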

To identify the starting line for the part we should ignore, we can
use a regular expression.  I'll assume you've got the regexp in $start
(this could be a simple string if you aren't familiar with regular
expressions).

  while(<FILE>) {
    print "Found start line!\n" if /$start/o;
    print NEWFILE;
  }
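If $start holds a plain string rather than a regexp, any '.', '(' or '*' in it still carries its regexp meaning. Here is a small sketch, assuming a hypothetical marker string, of how \Q...\E (the same escaping quotemeta does) makes such a string match literally:

```perl
use strict;
use warnings;

# Hypothetical start marker containing regexp metacharacters: ( ) and .
my $start = 'BEGIN (v1.0)';

my @lines = ("keep me\n", "BEGIN (v1.0)\n", "BEGIN v1x0\n");

# Used as a pattern, the parens only group and '.' matches any character,
# so the real marker line does NOT match, but the decoy "BEGIN v1x0" does.
my @as_regex = grep { /$start/ } @lines;

# \Q...\E escapes the metacharacters, so the marker is matched literally.
my @as_literal = grep { /\Q$start\E/ } @lines;

print "as regex:   @as_regex";
print "as literal: @as_literal";
```

The same \Q$start\E form can be dropped into any of the loops in this thread wherever /$start/o appears.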

Now that we know how to locate the starting string, we'll need to tell
Perl to ignore the text it sees once we find that string.  First, we
can work things out the long way:  If we see the line that is the
start of the section to ignore, we'll want to set a variable so that
we know we are in the block to miss.  Otherwise, we want to print out
the line to the new file:

  if (/$start/o) {
    $ignore = 1;
  } else {
    print NEWFILE;
  }

While we are ignoring output, we'll need to search for the ending
pattern, and stop ignoring text when we see it:

  if (/$end/o) {
    $ignore = 0;
  }

Now all we need to do is to put it all together, with another if
statement controlling which one of these two statements is evaluated:

  while(<FILE>) {
    if ($ignore) {
      if (/$end/o) {
        $ignore = 0;
      }
    } else {
      if (/$start/o) {
        $ignore = 1;
      } else {
        print NEWFILE;
      }
    }
  }

This will do what you want, but it's a little cumbersome to read.  By
tightening up the style a little, we can get:

  while(<FILE>) {
    if ($ignore) {
      $ignore = 0 if /$end/o;
    } else {
      print NEWFILE unless /$start/o && ($ignore = 1);
    }
  }

The first form is easier to read for someone who isn't used to Perl,
though.  You'll have to choose what fits your purpose and style best.
In-place editing is a more elegant way to change the same file; take a
look at the -p and -i command-line switches (as in -pi.new).
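For in-place editing from inside a script rather than a one-liner, Perl lets you set $^I, the variable behind -i. This is a sketch, not a definitive implementation, that wraps the state-flag loop above; the sub name strip_in_place and the START/END patterns are hypothetical:

```perl
use strict;
use warnings;

# Edit each named file in place, dropping every block from a line that
# matches $start through the next line that matches $end (both inclusive).
# strip_in_place and the START/END patterns are hypothetical.
sub strip_in_place {
    my ($start, $end, @files) = @_;

    # Setting $^I does inside a script what -i does on the command line:
    # each file read through <> is kept as a file.bak backup, and whatever
    # is printed to the default output handle becomes its new contents.
    local ($^I, @ARGV) = ('.bak', @files);

    my $ignore = 0;
    while (<>) {
        if ($ignore) {
            $ignore = 0 if /$end/;
        } elsif (/$start/) {
            $ignore = 1;
        } else {
            print;
        }
        $ignore = 0 if eof;    # don't carry an open block into the next file
    }
}

strip_in_place(qr/START/, qr/END/, @ARGV) if @ARGV;
```

Run it with the files to edit as arguments; each original is left behind with a .bak suffix.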

Regards, Robert

--

                            <URL:http://rseymour.com/>



Sat, 19 Sep 1998 03:00:00 GMT  

Quote:

>   > I am a relatively inexperienced Perl user who needs to parse a series
>   > of files and remove multiple lines in the middle of the files.

>   > I need to search for a particular start string and remove everything
>   > between that and a particular end string.

>   This will do what you want, but its a little cumbersome to read.  By

>     while(<FILE>) {
>       if ($ignore) {
>         $ignore = 0 if /$end/o;
>       } else {
>         print NEWFILE unless /$start/o && ($ignore = 1);
>       }
>     }

HOWEVER, you may be completely entranced by Perl's ability to read a
whole file into a string (I'm assuming rseymour's initial opening of
files etc. AND that you have Perl 5):

  # $/ is the input record separator.  Undefining it makes the next read
  # return the whole file as one "line" of input; doing it with local
  # inside a do block keeps the change from leaking into later reads.
  my $contents = do { local $/; <FILE> };

  # now get rid of everything between the markers stored in $start and $end
  $contents =~ s/$start.*?$end//gs;

  # The ? is necessary to make .* non-greedy.  Otherwise it will match
  # everything between the first $start marker and the last $end marker,
  # which is not what you want if there can be more than one set in a
  # file (try it without the ? and see what happens).
  # The 's' modifier lets '.' match newlines as well, so a match can run
  # across several lines (see perlre for more details).

  # print out the new file
  print NEWFILE $contents;

This approach is probably not appropriate if your input files are
very large, since the whole file is held in memory at once.
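The suggestion to try the substitution without the ? is easy to check on a string in memory. In this sketch the sample text and the START/END markers are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical text with two marked blocks.
my $text = "a\nSTART\nx\nEND\nb\nSTART\ny\nEND\nc\n";

(my $lazy   = $text) =~ s/START.*?END//gs;   # each block removed on its own
(my $greedy = $text) =~ s/START.*END//gs;    # first START through last END

print "non-greedy:\n$lazy";
print "greedy:\n$greedy";
```

The greedy version also swallows the "b" line between the two blocks. Both results keep an empty line where each block was, because the newline after END is not part of the match; if the markers sit on lines of their own, putting \n after END in the pattern removes that too.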

Quote:
>   though.  You'll have to choose what fits your purpose and style best.

What he said.

You may also be interested in
http://www.perl.com/CPAN/doc/FMTEYEWTK/regexps.html

which is Tom Christiansen's treatise on regular expressions, worthy of
many hours of study and meditation.

-j



Sat, 19 Sep 1998 03:00:00 GMT  

: I am a relatively inexperienced Perl user who needs to parse a series
: of files and remove multiple lines in the middle of the files.

: I need to search for a particular start string and remove everything
                                    ^^^^^^^^^^^^
On a line by itself? Anywhere on a line? Bad specs get bad implementations.

: between that and a particular end string.

Your requirement specification above will leave the start/end strings
there. Did you want that?

Or did you mean to say:

...search for a particular start string and remove everything
between that and a particular end string (inclusive of the start/end
strings)
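Once the file has been read into a single string, both readings are one substitution each. A sketch, assuming (hypothetically) that the markers are the literal lines START and END:

```perl
use strict;
use warnings;

# Hypothetical sample with the markers on lines of their own.
my $text = "keep\nSTART\ndrop\nEND\nkeep too\n";

# Reading 1: remove the marker lines along with everything between them.
(my $inclusive = $text) =~ s/START\n.*?END\n//gs;

# Reading 2: keep the marker lines; the captures put them back.
(my $exclusive = $text) =~ s/(START\n).*?(END\n)/$1$2/gs;

print "inclusive:\n$inclusive";
print "exclusive:\n$exclusive";
```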

: I have tried everything I know, and can find in the book, and nothing
: seems to make it past the first line where I find the start string.

This oughtta be about 10-15 lines of code. Post it.

--
  Tad McClellan,      Logistics Specialist (IETMs and SGML guy)

  Interesting trivia: If you took all the sand in North Africa and spread
     it out... it would cover the Sahara desert.



Sun, 20 Sep 1998 03:00:00 GMT  


 J> I am a relatively inexperienced Perl user who needs to parse a series
 J> of files and remove multiple lines in the middle of the files.
 J>
 J> I need to search for a particular start string and remove everything
 J> between that and a particular end string.
 J>
 J> I have tried everything I know, and can find in the book, and nothing
 J> seems to make it past the first line where I find the start string.
 J>

Presuming you want to remove entire lines and there's nothing
bizarre such as the end string occurring before the start string:

To keep the start and end lines and toss everything between them:

  perl -ni.bak -e '/end/ && ($d=0);print if !$d;/start/ && ($d=1)' file(s)

If you can get away with tossing the start and end lines as well:

  perl -ni.bak -e 'print unless /start/../end/' file(s)
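The second one-liner relies on .. in scalar context acting as a flip-flop. Its true value is a sequence number: 1 on the line where the left pattern matches, and a number ending in "E0" on the line where the right one matches, so it can also keep the start and end lines, like the $d version. A sketch on hypothetical sample lines:

```perl
use strict;
use warnings;

my @lines = ("one\n", "start\n", "x\n", "y\n", "end\n", "two\n");

# Flip-flop: false until /start/ matches, then true up to and including
# the line where /end/ matches.  Dropping those lines tosses start and end.
my @tossed_all = grep { !(/start/ .. /end/) } @lines;

# Keep lines outside the range, plus the first (sequence number 1)
# and the last (sequence number ending in "E0") lines of the range.
my @kept_markers = grep {
    my $seq = /start/ .. /end/;
    !$seq || $seq == 1 || $seq =~ /E0$/;
} @lines;

print "start/end tossed:\n", @tossed_all;
print "start/end kept:\n", @kept_markers;
```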

HTH,

-
Charles DeRykus



Sun, 20 Sep 1998 03:00:00 GMT  
 