read in a file and delete first 8 lines 

I need to read in an HTML file, then delete the first 8 lines (HTML tags)
of the file.  I also want to delete the last two HTML tags.  The files
that I will be reading in were previously saved by another script I
wrote, so I know exactly what the files will look like.

Here is what the file will look like:

<html>
<head>
<title>ISSUE</title>
</head>
<body bgcolor=#efefef text=#000000 link=#000000 vlink=#000000 alink=#000000>
<base href="http://nodename/">
<form action="/usr/local/script" method="post">
<input type="submit" value="UPDATE">
<pre>

body of html file

</pre>
</body>
</html>

Here is the code that I wrote that works.  Is there a better/easier way?

  if ( -e "$pathname" ) {                   # if the file exists, open it
    open(READ_FILE,$pathname);
    $g=0;                                   # init file line counter scalar
    while (<READ_FILE>) {
      $$ifile[$g] .= $_;                    # save each line to an associative array
      $$ifile[$g] =~ s/<\/body>//g;         # delete the </body> tag
      $$ifile[$g] =~ s/<\/html>//g;         # delete the </html> tag
      $g++ }                                # counter for length (number of lines) of file
    close(READ_FILE); }
  else {                                    # file does not exist
    print ("FILE CANNOT BE OPENED \n"); }
  for ($p = 8; $p <= $g; $p++) {
    print $$ifile[$p]; }                    # print out the file based on the number of lines
                                            # do not print the first 8 lines

David Clemmons
/* remove 'SPAM' from address to email me */




Quote:
>> On Mon, 22 May 2000 11:41:52 -0500,

> I need to read in a html file, then delete the first 8
> lines (html tags) of the file.  I also want to delete
> the last two html tags.  The files that I will be
> reading in, were previously saved by another script I
> wrote... so I know exactly what the files will look
> like.

Deleting the first 8 lines is probably best done something
like this:

    my $file = '/etc/passwd';

    open P, $file or die "Couldn't open $file: $!\n";

    while (<P>) {
      next if $. <= 8;
      print;
    }

    close P;

(perldoc perlvar to see what $. is).

As for removing the final HTML closing tags, if you know
they are really the last 2 lines of the file, then a
similar strategy applies as above, otherwise you could
match on the closing body tag and exit the read/print
loop.
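
The "match on the closing body tag and exit the loop" variant could look
something like the sketch below.  The sub name copy_body and its
arguments are illustrative assumptions, and it presumes </body> sits on
a line of its own, as in the sample file.

```perl
use strict;
use warnings;

# Copy lines from $in to $out, skipping the first $skip lines and
# stopping at the first line that contains a closing </body> tag.
# Assumes $in is a freshly opened handle, so $. counts from line 1.
sub copy_body {
    my ($in, $out, $skip) = @_;
    while (my $line = <$in>) {
        next if $. <= $skip;            # $. is the current input line number
        last if $line =~ m{</body>}i;   # everything from here on is dropped
        print {$out} $line;
    }
    return;
}
```

Called as copy_body($fh, \*STDOUT, 8) on a freshly opened handle, this
prints everything from the <pre> line up to, but not including, the
</body> line.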

TIMTOWTDI of course.

hth
t




Quote:

> I need to read in a html file, then delete the first 8 lines (html tags)
> of the file.  I also want to delete the
> last two html tags.  The files that I will be reading in, were
> previously saved by another script I wrote... so
> I know exactly what the files will look like.

> Here is what the file will look like:

> <html>
> <head>
> <title>ISSUE</title>
> </head>
> <body bgcolor=#efefef text=#000000 link=#000000 vlink=#000000 alink=#000000>
> <base href="http://nodename/">
> <form action="/usr/local/script" method="post">
> <input type="submit" value="UPDATE">
> <pre>

> body of html file

> </pre>
> </body>
> </html>

Okay, based on this it looks like you want to extract the <PRE>...</PRE>
segment, and include the enclosing <PRE> tags.  How 'bout this:

## EFFICIENTLY READ THE FILE INTO SCALAR $File
open INF, "whatever.html";  ## YOUR FILENAME HERE
binmode INF;  ## IF YOU ARE RUNNING ON WINDOWS
read INF, $File, -s INF;
close INF;

## REGEX TO LOSE ANYTHING AROUND THE <PRE></PRE> TAGS
$File =~ s/.+<PRE>.+?</PRE>.+/$1/ie;

Hope this helps,
JH

----------------------------------------------------------------
Jeff Helman                 Product Manager -- Internet Services

----------------------------------------------------------------

99 little bugs in the code, 99 bugs in the code.
Fix one bug, compile again, 100 little bugs in the code.
100 little bugs in the code, 100 bugs in the code.
Fix one bug, compile again, 101 little bugs in the code...

----------------------------------------------------------------





Quote:
>>> On Mon, 22 May 2000 11:41:52 -0500,

>> I need to read in a html file, then delete the first 8
>> lines (html tags) of the file.  I also want to delete
>> the last two html tags.  The files that I will be
>> reading in, were previously saved by another script I
>> wrote... so I know exactly what the files will look
>> like.

>Deleting the first 8 lines is probably best done something
>like this:

>    my $file = '/etc/passwd';

>    open P, $file or die "Couldn't open $file: $!\n";

>    while (<P>) {
>      next if $. <= 8;
>      print;
>    }

>    close P;

Or maybe

open P, $file or die "Couldn't open $file: $!\n";

while (<P>) {
  print unless 1 .. 8;
}

hth,

Dave...

--

yapc::Europe - London, 22 - 24 Sep <http://www.yapc.org/Europe/>

"There ain't half been some clever bastards" - Ian Dury [RIP]





...

Quote:
> Okay, based on this it looks like you want to extract the <PRE>...</PRE>
> segment, and include the enclosing <PRE> tags.  How 'bout this:

> ## EFFICIENTLY READ THE FILE INTO SCALAR $File
> open INF, "whatever.html";  ## YOUR FILENAME HERE

You omitted the test and diagnostic for failure to open the file!

Quote:
> binmode INF;  ## IF YOU ARE RUNNING ON WINDOWS

No.  The file should be handled as a text file.

Quote:
> read INF, $File, -s INF;

Well, that certainly seems to be the most efficient way.  :-)

Quote:
> close INF;

> ## REGEX TO LOSE ANYTHING AROUND THE <PRE></PRE> TAGS
> $File =~ s/.+<PRE>.+?</PRE>.+/$1/ie;

> Hope this helps,

No, it won't help, for several reasons.

1.  It won't compile, because of the unescaped slash in the second tag.  
Use alternate delimiters.

2.  It won't match, because '.' doesn't match a newline.  Use the /s
modifier.

3.  There are no parentheses in the regex to capture what is matched.

4.  It will produce chaos, because you have specified that the captured
string be evaluated as a Perl expression.  Drop the /e modifier.

Untested (because I wouldn't recommend this approach in any case):

  $File =~ s%.+(<PRE>.+?</PRE>).+%$1%is;

It would be a good idea to enhance the value of your intended
helpfulness by posting only tested code, unless explicitly stated
otherwise.
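
For anyone who wants to try the corrected substitution, here is a small
sketch that wraps it in a sub.  The name extract_pre is an illustrative
assumption, and the sub works on a copy so the caller's string is left
alone.  If the pattern does not match, the input comes back unchanged.

```perl
use strict;
use warnings;

# Return the <PRE>...</PRE> segment of $html, tags included.
sub extract_pre {
    my ($html) = @_;
    # % delimiters avoid escaping the slash in </PRE>,
    # /i matches <pre> as well as <PRE>,
    # /s lets '.' match newlines so the pattern can span lines.
    (my $copy = $html) =~ s%.+(<PRE>.+?</PRE>).+%$1%is;
    return $copy;
}
```

Note that the pattern needs at least one character before <PRE> and
after </PRE>, which holds for the file layout shown in the question.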

--
(Just Another Larry) Rosler
Hewlett-Packard Laboratories
http://www.hpl.hp.com/personal/Larry_Rosler/




Quote:
: I need to read in a html file, then delete the first 8 lines (html tags)
: of the file.  I also want to delete the last two html tags.  The files
: that I will be reading in, were previously saved by another script I
: wrote... so I know exactly what the files will look like.
:
: Here is what the file will look like:
[snip]
: Here is the code that I wrote that works.  Is there a better/easier way?
:
:   if ( -e "$pathname" ) {                   # if the file exists, open it
:     open(READ_FILE,$pathname);

This would usually be done in one step, by checking the result of open.
Among other problems, your way involves a potential race condition.  Also,
the double quotes around $pathname in the conditional are redundant.

:     $g=0;                                   # init file line counter scalar
:     while (<READ_FILE>) {
:       $$ifile[$g] .= $_;                    # save each line to an associative array
:       $$ifile[$g] =~ s/<\/body>//g;         # delete the </body> tag
:       $$ifile[$g] =~ s/<\/html>//g;         # delete the </html> tag
:       $g++ }                                # counter for length (number of lines) of file
:     close(READ_FILE); }
:   else {                                    # file does not exist
:     print ("FILE CANNOT BE OPENED \n"); }
:   for ($p = 8; $p <= $g; $p++) {
:     print $$ifile[$p]; }                    # print out the file based on the number of lines
:                                             # do not print the first 8 lines

That seems like an awful lot of work!  I'm also unclear on why you used
references to access lines.  Here's how I'd do it, if I knew the file were
small enough to get into memory all at once without a problem (which will
be true for any reasonable html page).

  open READ_FILE, "< $pathname" or die $!;

  close READ_FILE;




Alternatively, replacing the last three lines with


does the same thing even more concisely, though I find it less pleasing.
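
A minimal sketch of the read-everything-then-slice idea described above,
assuming (as the post does) that the file fits in memory.  The sub name
strip_wrapper, the array @lines and the slice bounds are illustrative
assumptions, not the poster's own code.

```perl
use strict;
use warnings;

# Return the lines of $pathname without the first 8 and the last 2.
sub strip_wrapper {
    my ($pathname) = @_;
    # Checking the result of open replaces the separate -e test.
    open my $fh, '<', $pathname or die "Couldn't open $pathname: $!\n";
    my @lines = <$fh>;                  # slurp the whole file, one element per line
    close $fh;
    # An empty range (fewer than 11 lines) yields an empty list.
    return @lines[ 8 .. $#lines - 2 ];
}
```

With the sample file from the question, print strip_wrapper($pathname)
would emit the <pre> line through the </pre> line.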

--

 --*--  http://www.cinenet.net/users/cberry/home.html
   |   "The road of Excess leads to the Palace
      of Wisdom" - William Blake






Quote:
> I need to read in a html file, then delete the first 8 lines (html tags)
> of the file.  I also want to delete the
> last two html tags.  The files that I will be reading in, were
> previously saved by another script I wrote... so
> I know exactly what the files will look like.

> Here is what the file will look like:

> <html>
> <head>
> <title>ISSUE</title>
> </head>
> <body bgcolor=#efefef text=#000000 link=#000000 vlink=#000000 alink=#000000>
> <base href="http://nodename/">
> <form action="/usr/local/script" method="post">
> <input type="submit" value="UPDATE">
> <pre>

> body of html file

> </pre>
> </body>
> </html>

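What about:-
use HTML::Parser ();
# one possible reading of the outline; $pathname is the file name from your script
my $in_pre = 0;
my $p = HTML::Parser->new(
    api_version => 3,
    start_h => [ sub { $in_pre = 1 if $_[0] eq 'pre' }, 'tagname' ],
    end_h   => [ sub { $in_pre = 0 if $_[0] eq 'pre' }, 'tagname' ],
    text_h  => [ sub { print $_[0] if $in_pre }, 'text' ],
);
print "<pre>";
$p->parse_file($pathname) or die "Couldn't parse $pathname: $!\n";
print "</pre>\n";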

JohnShep
(Assuming that you haven't got another pair of <pre> tags in the body)




Quote:



> ...

> > Okay, based on this it looks like you want to extract the <PRE>...</PRE>
> > segment, and include the enclosing <PRE> tags.  How 'bout this:

> > ## EFFICIENTLY READ THE FILE INTO SCALAR $File
> > open INF, "whatever.html";  ## YOUR FILENAME HERE

> You omitted the test and diagnostic for failure to open the file!

Yes, I did.  I also omitted the check to make sure that the read call
below didn't fail.  I also didn't check to see whether the file even had
<PRE></PRE> tags.  I was leaving bullet-proofing as an exercise for the
reader.

Quote:
> > binmode INF;  ## IF YOU ARE RUNNING ON WINDOWS

> No.  The file should be handled as a text file.

The only problem with this is that on Windows, land of the \r\n
newlines, the -s file test operator reports the size of the file in
bytes.  In text mode, however, read counts each \r\n pair as a single
character.  This makes error-checking painful, since a successful read
reports that it read fewer bytes than -s indicates.  And while a
successive read will report a 0 (end-of-file condition), I'll never feel
totally secure that I got it all.  Sorry, but I'm paranoid.
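
To make that concrete, here is a rough sketch of the kind of check
described: with binmode set, the byte count returned by read can be
compared directly against -s.  The sub name slurp_binary and the error
messages are illustrative assumptions.

```perl
use strict;
use warnings;

# Read a whole file as raw bytes and verify that nothing was lost.
sub slurp_binary {
    my ($path) = @_;
    open my $fh, '<', $path or die "Couldn't open $path: $!\n";
    binmode $fh;                        # no CRLF translation on Windows
    my $size = -s $fh;                  # file size in bytes
    my $got  = read $fh, my $data, $size;
    defined $got   or die "Read failed on $path: $!\n";
    $got == $size  or die "Short read on $path: got $got of $size bytes\n";
    close $fh;
    return $data;
}
```

The data returned this way still contains any \r characters, so a later
regex or line split has to allow for them.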

Quote:
> > read INF, $File, -s INF;

> Well, that certainly seems to be the most efficient way.  :-)

Actually, sysread is even better, but I was trying to conserve typing.

Quote:
> > close INF;

> > ## REGEX TO LOSE ANYTHING AROUND THE <PRE></PRE> TAGS
> > $File =~ s/.+<PRE>.+?</PRE>.+/$1/ie;

> > Hope this helps,

> No, it won't help, for several reasons.

> 1.  It won't compile, because of the unescaped slash in the second tag.
> Use alternate delimiters.

Oops.  Totally and utterly correct.  I can only plead sleep deprivation.
:)

Quote:
> 2.  It won't match, because '.' doesn't match a newline.  Use the /s
> modifier.

How the hell did I get /e when I meant /s?  See above...

Quote:
> 3.  There are no parentheses in the regex to capture what is matched.

Geez.  Was I drinking...?

Quote:
> 4.  It will produce chaos, because you have specified that the captured
> string be evaluated as a Perl expression.  Drop the /e modifier.

> Untested (because I wouldn't recommend this approach in any case):

>   $File =~ s%.+(<PRE>.+?</PRE>).+%$1%is;

> It would be a good idea to enhance the value of your intended
> helpfulness by posting only tested code, unless explicitly stated
> otherwise.

> --
> (Just Another Larry) Rosler
> Hewlett-Packard Laboratories
> http://www.*-*-*.com/


--
----------------------------------------------------------------
Jeff Helman                 Product Manager -- Internet Services

----------------------------------------------------------------



