Quickly read the last 10 lines of a very large logfile 

Hello,

Is there a way to quickly read the last 10 lines of a logfile, i.e. a way
to open a file and set the read position without reading the whole file?

thanks



Mon, 21 Feb 2005 10:09:15 GMT  

Quote:

> Is there a way to quickly read the last 10 lines of a logfile, i.e. a way
> to open a file and set the read position without reading the whole file?

seek(FILE, -10, 2)

this should do the trick.

-- sad.
Those who educate children well are more to be honored than parents, for
these only gave life, those the art of living well.
                -- Aristotle



Mon, 21 Feb 2005 10:51:44 GMT  

Quote:

> > Is there a way to quickly read the last 10 lines of a logfile, i.e. a way
> > to open a file and set the read position without reading the whole file?

> seek(FILE, -10, 2)

> this should do the trick.

No.  Seek counts bytes, not lines.  I'd use File::ReadBackwards from
CPAN. (Hi Uri, beat you again :)
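A minimal sketch of that approach (the filename and the count of 10 are illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::ReadBackwards;   # from CPAN

# 'big.log' is illustrative.
my $bw = File::ReadBackwards->new('big.log')
    or die "can't open big.log: $!";

my @last;
while (defined(my $line = $bw->readline)) {
    unshift @last, $line;        # readline hands back lines last-first
    last if @last == 10;
}
print @last;                     # the last 10 lines, in file order
```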

Anno



Mon, 21 Feb 2005 11:03:29 GMT  

Quote:

> Is there a way to quickly read the last 10 lines of a logfile, i.e. a way
> to open a file and set the read position without reading the whole file?

$foo = `tail -n 10 $myfile`;

----- stephan
Registered Linux User #71917 http://counter.li.org
I speak for myself, not my employer. Contents may
be hot. Slippery when wet. Reading disclaimers makes
you go blind. Writing them is worse. You have been Warned.



Mon, 21 Feb 2005 11:07:26 GMT  


Quote:
> Hello,

> Is there a way to quickly read the last 10 lines of a logfile, i.e. a way
> to open a file and set the read position without reading the whole file?

Use either the File::ReadBackwards module or the Tie::File module.
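For example, a Tie::File sketch (Tie::File ships with perl 5.8+; the filename is illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;
use Fcntl 'O_RDONLY';

# 'big.log' is illustrative.
tie my @lines, 'Tie::File', 'big.log', mode => O_RDONLY
    or die "can't tie big.log: $!";

# Negative indices count back from the end of the file.
my @last10 = @lines[ -10 .. -1 ];
print "$_\n" for @last10;        # Tie::File strips the record separator
```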

--
Eric
print scalar reverse sort qw p ekca lre reh
ts uJ p, $/.r, map $_.$", qw e p h tona e;




Mon, 21 Feb 2005 12:04:07 GMT  

Quote:
> > Is there a way to quickly read the last 10 lines of a logfile, i.e. a way
> > to open a file and set the read position without reading the whole file?

$lastTenLines = `tail -10 $file`;

...assuming you work in a Unix environment.
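If shelling out, a slightly more defensive sketch uses the list form of a pipe open, which avoids the shell entirely (the filename is illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# 'big.log' is illustrative.  The list form of open skips the shell,
# so odd characters in the filename can't be misinterpreted.
open my $tail, '-|', 'tail', '-n', '10', 'big.log'
    or die "can't run tail: $!";
my @last = <$tail>;
close $tail or warn "tail exited with status $?";
print @last;
```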

-rl

--

I am *not* speaking for Infineon Technologies.



Mon, 21 Feb 2005 12:36:29 GMT  

Eric> Use either the File::ReadBackwards module or the Tie::File module.

"quickly" and "Tie::File" do not belong in the same world.

Unless you mean "programmer time quickly".  Certainly not "runtime
quickly".

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095

Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!



Mon, 21 Feb 2005 19:47:44 GMT  

  Eric> Use either the File::ReadBackwards module or the Tie::File module.
  >>
  >> "quickly" and "Tie::File" do not belong in the same world.
  >>
  >> Unless you mean "programmer time quickly".  Certainly not "runtime
  >> quickly".

  z> Here's something in pure perl to tail.

and tie::file and File::ReadBackwards are what? chopped liver?

  z> #####################################################
  z> #!/usr/bin/perl -w
  z> # Simple program to read the last n line(s) of a file.

add slow to this.

  z> # Reads from the end of the file for efficiency
  z> # linux only, doesn't handle \r\n

both of the modules handle that as well.

  z> binmode FILE;

why binmode the file if it is linux only and doesn't handle \r\n?

  z> # Rewind from the end of the file until counted eol's
  z> my $count=0;
  z>  while (1){
  z>    seek FILE,-1,1;
  z>    read FILE,$byte,1;

oh wow. reading a file one byte at a time. how sweet. i rewrote a
program that used getc and it got a 60x speedup. seek and read for each
byte will be almost as slow as getc if not slower.

  z>    if(ord($byte) == 10 ){$count++;if($count == $numlines){last}}

this is not nearly as useful as the tail program tom c wrote for the ppt
project. and of course it is not useful as a sub in another program
which is what the OP wanted.

uri

--

----- Stem and Perl Development, Systems Architecture, Design and Coding ----
Search or Offer Perl Jobs  ----------------------------  http://jobs.perl.org



Tue, 22 Feb 2005 21:37:14 GMT  

Quote:



> >  z> Heres something in pure perl to tail.

> >and tie::file and File::ReadBackwards are what? chopped liver?

> >  z> #####################################################
> >  z> #!/usr/bin/perl -w
> >  z> # Simple program to read the last n line(s) of a file.

> >add slow to this.

> I definitely bow to your superior perl knowledge URI, and I realize
> that your module is totally portable, but I don't think it is faster.
> I have an 85 meg file I want to tail the last ten lines of.
> Of course, all things considered File::ReadBackwards is better,
> but not in speed.

Wrong- the File::ReadBackwards implementation will blow the doors
off your code.  The major difference between your code snippets
is that you've traded "use File::ReadBackwards" for "use strict".
If I normalize both scripts by having them both "use strict" and
"use File::ReadBackwards", here's what I get with your method:

  real    0m0.434s
  user    0m0.400s
  sys     0m0.040s

Here's what I get with File::ReadBackwards:

  real    0m0.337s
  user    0m0.310s
  sys     0m0.030s

Now here's what I get for a script with just

  #!/usr/bin/perl
  use strict;
  use File::ReadBackwards;

in it:

  real    0m0.324s
  user    0m0.290s
  sys     0m0.030s

Since this is the common element to both, let's
naively [*] subtract it out from both-

        your method    File::ReadBackwards
real       110 ms            13 ms
user       110 ms            10 ms  
sys         10 ms             0 ms

IOW, it appears the File::ReadBackwards implementation
is about 10x faster than your code at grabbing the last
10 lines of the file.  If you want the last 100 or so,
the gap will be significantly wider.

[*] This is somewhat bogus; it's better to use Benchmark.pm
here to measure the difference over a few hundred repetitions.
Nevertheless, the results are about the same:

  % perl5.8.0 ptail.pl 10 /var/log/messages

  [...]

          Rate  stail  slurp  ntail   tail    frb  ptail
  stail 1.96/s     --   -41%   -81%   -97%   -99%   -99%
  slurp 3.29/s    68%     --   -68%   -94%   -99%   -99%
  ntail 10.3/s   425%   212%     --   -83%   -96%   -97%
  tail  58.8/s  2907%  1686%   472%     --   -79%   -84%
  frb    286/s 14506%  8574%  2680%   386%     --   -20%
  ptail  357/s 18157% 10743%  3375%   507%    25%     --

I've posted this Benchmark code in clp.misc before;
so I won't repeat it.  The new "entry" here is listed
above under ntail:

sub ntail {
    my $fh = shift;
    my $numlines  = shift;
    my $byte;

    # Rewind from the end of the file until counted eol's
    seek $fh,-1, 2;  #get past last eol
    my $count=0;
    while (1){
        seek $fh,-1,1;
        read $fh,$byte,1;
        if(ord($byte) == 10 ){$count++;if($count == $numlines){last}}
        seek $fh,-1,1;
        if (tell $fh == 0){last}  
    }
    local $/ unless wantarray;
    return <$fh>;

}

--
Joe Schaefer   "I'm all in favor of keeping dangerous weapons out of the hands
                           of fools. Let's start with typewriters."
                                               -- Frank Lloyd Wright


Wed, 23 Feb 2005 15:54:18 GMT  


Quote:
> Of course, all things considered File::ReadBackwards is better,
> but not in speed.

> Here is a method using File::ReadBackwards:
...
> and here is my rudimentary method
...

> Here are times using the system's time command:

The system's time command is a poor way to compare two pieces of
Perl code.  Compile times skew things.  Use the Benchmark module;
that's what it's there for.

#!/usr/bin/perl

use strict;
use Benchmark;
use File::ReadBackwards;

sub frb
    {
    my $filename = shift;
    my $numlines  = shift;

    my $bw = File::ReadBackwards->new($filename) or
        die "can't read $filename $!" ;

    my $line;
    my $count = 0;
    while(defined($line = $bw->readline))
        {
        # stop once the requested number of lines has been seen
        last if ++$count == $numlines;
        }

    }

sub one
    {
    my $filename = shift;
    my $numlines  = shift;
    my $byte;

    # Open the file in read mode
    open FILE, "<$filename" or die "Couldn't open $filename: $!";

    # Rewind from the end of the file until count eol's
    seek FILE,-1, 2;  #get past last eol
    my $count=0;
    while (1)
        {
        seek FILE,-1,1;
        read FILE,$byte,1;
        if(ord($byte) == 10 ){$count++;if($count == $numlines){last}}
        seek FILE,-1,1;
        if (tell FILE == 0){last}
        }
    local $/=undef;
    my $tail = <FILE>;
    }

timethese(1000, {
              Readbackwards   => sub { frb('file', 10); },
              OneAtATime      => sub { one('file', 10); },
             });

Results:
Benchmark: timing 1000 iterations of OneAtATime, Readbackwards...


--
Eric
print scalar reverse sort qw p ekca lre reh
ts uJ p, $/.r, map $_.$", qw e p h tona e;




Thu, 24 Feb 2005 14:36:58 GMT  

Quote:

> On 07 Sep 2002 10:54:18 -0400, Joe Schaefer


[...]

Quote:
> >Wrong- the File::ReadBackwards implementation will blow the doors
> >off your code.  The major difference between your code snippets
> >is that you've traded "use File::ReadBackwards" for "use strict".
> >If I normalize both scripts by having them both "use strict" and
> >"use File::ReadBackwards", here's what I get with your method:

> What???  Why should I have to "normalize my script" by including
> "use File::ReadBackwards" ?

So you can compare the implementations without varying the
setup times.

Quote:
> The whole reason for trying this method is to NOT need to use a
> module.

If I thought that was your point, I wouldn't have responded
to your post.  I thought you were trying to understand Uri's
criticisms of the code you posted.  It appeared to me
that the runtimes you got were leading you in the wrong
direction.

I didn't realize you were only trying to win an argument.

--
Joe Schaefer          "Verbosity leads to unclear, inarticulate things."
                                               -- Dan Quayle



Thu, 24 Feb 2005 16:23:47 GMT  

  z> OK guys, just to show that I'm not a total neanderthal, I set up
  z> some fair benchmarking code, instead of running time on
  z> commandline examples with ARGV's, which forces repetitive
  z> loading of a module.

  z> The File::ReadBackwards method is about 10 times faster.

which is what the other poster's benchmarks also showed. disregarding
startup time, reading one char at a time vs a whole block (which is what
file::readbackwards does) can't be faster. and look at this code:

  z>     my $bw = File::ReadBackwards->new('ARCHIVES') or
  z>                die "can't read filename $!" ;

  z>          while(defined($line = $bw->readline)){

  z>           $count++;
  z>           if ($count == 10){last}
  z>          }


when doing benchmarks it is wise to not do any extra work that is not
directly part of the test. why do you push the lines, count them, and
write them to a dummy file? just do the tight loop:

        while(defined($line = $bw->readline)){
                last if ++$count >= 10 ;
        }

  z>      while (1){
  z>        seek FILE,-1,1;
  z>        read FILE,$byte,1;
  z>        if(ord($byte) == 10 ){$count++;if($count == 10){last}}
  z>        seek FILE,-1,1;
  z>        if (tell FILE == 0){last}  
  z>      }
  z>      $/=undef;
  z>      my $tail = <FILE>;
  z>      print BLACKHOLE "$tail\n";

and drop the print there.

that will be a more realistic benchmark as file writing is not what you
are testing.

uri

--

----- Stem and Perl Development, Systems Architecture, Design and Coding ----
Search or Offer Perl Jobs  ----------------------------  http://jobs.perl.org



Thu, 24 Feb 2005 21:58:06 GMT  

[...]

Quote:
>      while (1){
>        seek FILE,-1,1;
>        read FILE,$byte,1;
>        if(ord($byte) == 10 ){$count++;if($count == 10){last}}
>        seek FILE,-1,1;
>        if (tell FILE == 0){last}  
>      }

As Uri said, the reason this isn't fast is that you're
making 3 IO calls (two seeks + 1 read) per *character*. It's
not horrendous because the OS will buffer reads from the
disk, so you're not actually making a disk access on each
character.

To make it fast, you need to unroll all those IO-per-character
calls into a few block reads, and then scan the characters
*within each block*.  Here's a very mechanical attempt,
using rindex() to do the scanning:

sub tailz {
    my $fh = shift;
    my $numlines  = shift;
    my $block_size = 4096;
    my $buffer = '';
    # Rewind from the end of the file until counted eol's
    seek $fh,-1, 2;  #get past last eol
    my $count=0;

 LOOP: while (1) {
        seek $fh,-$block_size,1;
        read $fh,$buffer,$block_size;

        my $match = length $buffer;
        while ( ($match = rindex($buffer,"\n",$match-1)) >= 0 )  {
            next unless ++$count == $numlines;
            seek $fh, 1 + $match - length($buffer), 1;
            last LOOP;
        }

        seek $fh,-$block_size,1;
        if (tell $fh == 0) {last}
    }
    local $/ unless wantarray;
    return <$fh>;

}

Light testing indicates this is even faster than File::ReadBackwards.
(It may have some off-by-one bugs, though.  I didn't test it very much).
So it turns out you were on the right track after all :-)

--
Joe Schaefer    "If you were plowing a field, which would you rather use? Two
                                strong oxen or 1024 chickens?"
                                               -- Seymour Cray



Thu, 24 Feb 2005 23:20:09 GMT  

  JS> To make it fast, you need to unroll all those IO-per-character
  JS> calls into a few block reads, and then scan the characters
  JS> *within each block*.  Here's a very mechanical attempt,
  JS> using rindex() to do the scanning:

  JS>  LOOP: while (1) {
  JS>         seek $fh,-$block_size,1;
  JS>         read $fh,$buffer,$block_size;

  JS>         my $match = length $buffer;
  JS>         while ( ($match = rindex($buffer,"\n",$match-1)) >= 0 )  {
  JS>             next unless ++$count == $numlines;
  JS>             seek $fh, 1 + $match - length($buffer), 1;
  JS>             last LOOP;
  JS>         }

  JS>         seek $fh,-$block_size,1;
  JS>         if (tell $fh == 0) {last}
  JS>     }
  JS>     local $/ unless wantarray;
  JS>     return <$fh>;
  JS> }

the problem here is that it returns the whole tailed block as a single
string. most users of file::readbackwards are scanning the file line by
line from the end. so there are semantic differences to deal with as
well as speed issues. the solution in the perl cookbook would even be
faster if you factor out the initial index scan. there are many issues
involved when designing a module. i decided to make file::readbackwards
very easy to use and reasonably fast. it also handles different line
endings and you can even set the one you want. so it can be slower than
some specific solution such as the above code. try adding support to
return the lines one at a time in reverse order. it will slow the above
code down greatly as it needs to buffer things and track them.
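For instance, the separator can be handed to the constructor (a sketch, per the module's documented new() arguments; the filename is illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::ReadBackwards;

# 'dos.log' is illustrative; the second argument to new() is the
# record separator, here set to a DOS-style line ending.
my $bw = File::ReadBackwards->new('dos.log', "\r\n")
    or die "can't open dos.log: $!";

while (defined(my $line = $bw->readline)) {
    print $line;
}
```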

uri

--

----- Stem and Perl Development, Systems Architecture, Design and Coding ----
Search or Offer Perl Jobs  ----------------------------  http://jobs.perl.org



Thu, 24 Feb 2005 23:33:25 GMT  

Quote:


>   JS> To make it fast, you need to unroll all those IO-per-character
>   JS> calls into a few block reads, and then scan the characters
>   JS> *within each block*.  Here's a very mechanical attempt,
>   JS> using rindex() to do the scanning:

>   JS>  LOOP: while (1) {
>   JS>         seek $fh,-$block_size,1;
>   JS>         read $fh,$buffer,$block_size;

>   JS>         my $match = length $buffer;
>   JS>         while ( ($match = rindex($buffer,"\n",$match-1)) >= 0 )  {
>   JS>             next unless ++$count == $numlines;
>   JS>             seek $fh, 1 + $match - length($buffer), 1;
>   JS>             last LOOP;
>   JS>         }

>   JS>         seek $fh,-$block_size,1;
>   JS>         if (tell $fh == 0) {last}
>   JS>     }
>   JS>     local $/ unless wantarray;
>   JS>     return <$fh>;
>   JS> }

> the problem here is that it returns the whole tailed block as a single
> string.

There's also a serious bug (the seeks can/will fail on smaller files,
so it needs some simple error-handling to catch that).  I was just
trying to show how to reduce the I/O by restructuring the original
IO-per-character loop.  Using rindex is also a big win over doing
a test-each-character perl loop, so there are really two speedups
here, not just one.

Too bad it's still borked, which is another good reason to use a
well-tested module.

--
Joe Schaefer     "Documentation is like term insurance: It satisfies because
                 almost no one who subscribes to it depends on its benefits."
                                               -- Alan J. Perlis



Sat, 26 Feb 2005 02:09:44 GMT  
 