Reading logfiles backwards? 

I'm trying to read (line by line) a very large logfile that contains the
data of interest near the bottom of the file. Essentially, I would like to
be able to read one line at a time, in reverse, starting at the end of the
file. Is there any good way of doing this? fseek() and fsetpos() didn't seem
to do much for me.

Thanks,
Moshe



Tue, 29 Apr 2003 03:00:00 GMT  

Quote:
> I'm trying to read (line by line) a very large logfile that contains the
> data of interest near the bottom of the file. Essentially, I would like to
> be able to read one line at a time, in reverse, starting at the end of the
> file. Is there any good way of doing this? fseek() and fsetpos() didn't seem
> to do much for me.

Haven't seen any answer in a reasonable time, so:

fseek() or fsetpos() are still the way to go. However, for text files the
offsets do not directly have any significance. A reasonable implementation
will set the current file position to a specific byte, but a text file is a
sequence of characters, not bytes: on an MS Windows or DOS system a '\n'
character occupies TWO bytes in the file ...

The basic algorithm to do what you want is something like:

Step 0: first allocate a buffer which is large enough to contain at least
the largest possible line. Or be prepared to check whether an entire line
has been read. Let's call the buffer's size BUFFER_SIZE

Step 1: fseek() or fsetpos() to the end of the file, then use ftell() or
fgetpos() to determine where that is and store the result.

Step 2: fseek() or fsetpos() to a position BUFFER_SIZE bytes before the
position found in Step 1.

Step 3: fgets() a partial line into the buffer. If the buffer is too small
the last character of the string read will not be a '\n' -- if so, repeat
this step.

Step 4: determine where we are now in the file and maybe store the result.
Similar to step 1.  This is a valid position of a start of a line.
If we are now already at (or even past -- another process might have
appended data) the position found in step 1 the buffer was too small. Too
bad.

Step 5: fgets() a line into the buffer. This is a valid and complete line
if it ends with a '\n'. Again determine where we are now in the file and
maybe store the result. Similar to step 4.  This is also a valid position
of a start of a line. Now if we are at (or past) the position found in
Step 1 we have found the last line in the file. If not, we are not there
yet and have to repeat this step. (Optionally after storing the content of
the buffer elsewhere, as it is after all a valid line near the end of the
file.)

This finds the last line. For the next to last line: repeat steps 2 to 5,
using the value found in step 4 as the value for "step 1".

Now implement this and show the complete results.
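
A minimal sketch of steps 1 to 5, under stated assumptions rather than a definitive implementation: the file is opened in binary mode ("rb") so that offsets are plain byte counts, and no line is longer than BUFFER_SIZE. The helper name line_before() and the value of BUFFER_SIZE are made up for illustration.

```c
#include <stdio.h>

#define BUFFER_SIZE 4096   /* assumption: longer than any line in the file */

/* Hypothetical helper: copy the line that ends at offset `end` into `line`
   (which must hold BUFFER_SIZE + 1 bytes) and return the offset at which
   that line starts, or -1L on failure. `fp` must be opened in "rb" mode. */
long line_before(FILE *fp, long end, char *line)
{
    long start, pos, line_start;

    if (end <= 0)
        return -1L;                       /* nothing before this point */

    /* Step 2: back up BUFFER_SIZE bytes, but not past the start of the file. */
    start = end > BUFFER_SIZE ? end - BUFFER_SIZE : 0L;
    if (fseek(fp, start, SEEK_SET) != 0)
        return -1L;

    /* Step 3: we probably landed mid-line, so discard the partial line. */
    if (start > 0 && fgets(line, BUFFER_SIZE + 1, fp) == NULL)
        return -1L;

    /* Step 4: this is now the start of a line. */
    pos = ftell(fp);
    if (pos < 0 || pos >= end)
        return -1L;                       /* buffer too small, or ftell failed */

    /* Step 5: read whole lines until we reach `end`. */
    do {
        line_start = pos;
        if (fgets(line, BUFFER_SIZE + 1, fp) == NULL)
            return -1L;
        pos = ftell(fp);
    } while (pos >= 0 && pos < end);

    return pos < 0 ? -1L : line_start;
}
```

To walk the file in reverse, call it first with the offset of the end of the file (step 1) and then feed each returned offset back in as the new `end`, until it returns 0 or -1L.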

--
Greetings from
 _____
 /_|__| Auke Reitsma, Delft, The Netherlands.
/  | \  -------------------------------------
        Remove SPAMBLOCK from my address ...
--



Mon, 05 May 2003 08:44:33 GMT  

Quote:

> Haven't seen any answer in a reasonable time, so:
> fseek() or fsetpos() still are the way to go. However, for textfiles these
> do not directly have any significance. A reasonable implementation will
> set the current file position to a specific byte  -- and text files do not
> have bytes but characters. On a MS Windows or DOS system a '\n' character
> occupies TWO bytes ...
> The basic algorithm to do what you want is something like:

<snip great explanation>

Thanks so much for your response. That was better than anyone else I'd
asked so far. You definitely took away a lot of the fear I had of
approaching this problem.

Quote:
> Now implement this and show the complete results.

Heh... As I'm leaving town in 2 days, I won't have time for this until
I return. But I'll gladly post my result when I finish it!

Thanks again,
Moshe




Mon, 05 May 2003 11:46:53 GMT  

Quote:


>> I'm trying to read (line by line) a very large logfile that contains the
>> data of interest near the bottom of the file. Essentially, I would like to
>> be able to read one line at a time, in reverse, starting at the end of the
>> file. Is there any good way of doing this? fseek() and fsetpos() didn't seem
>> to do much for me.

>Haven't seen any answer in a reasonable time, so:

>fseek() or fsetpos() still are the way to go. However, for textfiles these
>do not directly have any significance. A reasonable implementation will
>set the current file position to a specific byte  -- and text files do not
>have bytes but characters. On a MS Windows or DOS system a '\n' character
>occupies TWO bytes ...

>The basic algorithm to do what you want is something like:

>Step 0: first allocate a buffer which is large enough to contain at least
>the largest possible line. Or be prepared to check whether an entire line
>has been read. Let's call the buffer's size BUFFER_SIZE

>Step 1: fseek() or fsetpos() to the end of the file, then use ftell() or
>fgetpos() to determine where that is and store the result.

Functions fseek() and ftell() may not work on files > 2GB on a 32
bit machine, as the offset is a long, which is typically a signed
32 bit integer there.
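
As a rough illustration of that limit: ftell() returns a long, so the largest offset it can report is LONG_MAX, and it returns -1L when it cannot represent the position. The sketch below assumes a stream opened in binary mode; the helper names are made up.

```c
#include <limits.h>
#include <stdio.h>

/* Largest byte offset that fseek()/ftell() can express on this platform. */
long max_stdio_offset(void)
{
    return LONG_MAX;   /* 2147483647 where long is 32 bits */
}

/* Hypothetical helper: size of a binary stream in bytes, or -1L if the
   seek fails or the size does not fit in a long. */
long stream_size(FILE *fp)
{
    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1L;
    return ftell(fp);  /* ftell() itself returns -1L on failure */
}
```

Always checking for -1L before using the result keeps an oversized file from being silently misread.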

Quote:
>Step 2: fseek() or fsetpos() to a position BUFFER_SIZE bytes before the
>position found in Step 1.

You cannot fsetpos() to an arbitrary position in a file, as
fsetpos() only accepts a magic cookie returned by fgetpos().
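
For illustration, a minimal sketch of what fsetpos() is meant for: going back to a position that fgetpos() recorded earlier, without doing any arithmetic on the cookie. The helper name peek_line() is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: read the next line into buf without consuming it.
   The position is saved with fgetpos() and restored with fsetpos(); the
   fpos_t value is treated as opaque and never modified. */
char *peek_line(FILE *fp, char *buf, int size)
{
    fpos_t here;
    char *result;

    if (fgetpos(fp, &here) != 0)
        return NULL;
    result = fgets(buf, size, fp);
    if (fsetpos(fp, &here) != 0)
        return NULL;
    return result;
}
```

This works on text streams too, precisely because the cookie is only ever handed back unchanged.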

Quote:
>Step 3: fgets() a partial line into the buffer. If the buffer is too small
>the last character of the read string will not be an '\n' -- if so, repeat
>this step.

Text I/O may not work as fseek() on text files is limited to
moving to the beginning or end of file, current position, or some
position previously returned by ftell(), as that value is also a
magic cookie in the case of text files.


Quote:
>Step 4: determine where we are now in the file and maybe store the result.
>Similar to step 1.  This is a valid position of a start of a line.
>If we are now already at (or even past -- another process might have
>appended data) the position found in step 1 the buffer was too small. Too
>bad.

>Step 5: fgets() a line into the buffer. This is a valid and complete line
>if it ends with a '\n'. Again determine where we are now in the file and
>maybe store the result. Similar to step 4.  This is also a valid position
>of a start of a line. Now if we are at (or past) the position found in
>Step 1 we have found the last line in the file. If not we are not there
>and we have to repeat this step. (Optionally after storing the content of
>the buffer elsewhere as it is after all a valid line near the end of the
>file.)

>This finds the last line. For the next to last line: repeat steps 2 to 5
>using the value found in step 4 as the value for "step 1"

>Now implement this and show the complete results.

You may be able to use this technique on files opened in binary
mode and less than 2GB in size, reading backwards a buffer at a
time, and recording the locations of end of line markers,
relative to the file, by adding the block and buffer offsets.
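
A minimal sketch of that idea, assuming the file is opened in binary mode ("rb") and is small enough for its size to fit in a long. The function name, BLOCK_SIZE and the output array are made up for illustration.

```c
#include <stdio.h>

#define BLOCK_SIZE 4096   /* assumption: any convenient block size works */

/* Hypothetical helper: scan fp backwards one block at a time and store in
   starts[] the file offsets at which lines begin, last line first.
   Returns the number of offsets stored (at most max). */
size_t line_starts_from_end(FILE *fp, long *starts, size_t max)
{
    char block[BLOCK_SIZE];
    size_t found = 0;
    long size, block_off;

    if (fseek(fp, 0L, SEEK_END) != 0 || (size = ftell(fp)) <= 0)
        return 0;

    block_off = size;
    while (block_off > 0 && found < max) {
        long want = block_off < BLOCK_SIZE ? block_off : BLOCK_SIZE;
        size_t i;

        block_off -= want;
        if (fseek(fp, block_off, SEEK_SET) != 0 ||
            fread(block, 1, (size_t)want, fp) != (size_t)want)
            return found;                 /* I/O error: report what we have */

        for (i = (size_t)want; i-- > 0 && found < max; ) {
            /* file offset = block offset + offset within the buffer */
            long nl = block_off + (long)i;

            /* A '\n' that is not the file's final byte ends one line,
               so the next line starts on the byte after it. */
            if (block[i] == '\n' && nl + 1 < size)
                starts[found++] = nl + 1;
        }
    }
    if (block_off == 0 && found < max)
        starts[found++] = 0L;             /* the first line starts at 0 */
    return found;
}
```

Each stored offset can then be passed to fseek() with SEEK_SET, followed by fgets(), to read the lines in reverse order.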

Thanks. Take care, Brian Inglis         Calgary, Alberta, Canada
--

                                use address above to reply
--



Mon, 05 May 2003 03:00:00 GMT  
On 16 Nov 2000 18:19:57 GMT, Brian Inglis

Quote:



> >Step 1: fseek() or fsetpos() to the end of the file, then use ftell() or
> >fgetpos() to determine where that is and store the result.

> Functions fseek() and ftell() may not work on files > 2GB on a 32
> bit machine as the largest allowed offset is a signed integer.

Yeah. So there's that. Now what <censored> lets a text file grow over 2 GB
and then expects to handle it with C?

Quote:
> >Step 2: fseek() or fsetpos() to a position BUFFER_SIZE bytes before the
> >position found in Step 1.

> You can not fsetpos() elsewhere in a file, as fsetpos() only
> accepts a magic cookie returned by fgetpos().

Theoretically and formally that's correct. Practically you can usually
ignore it. The cookie tends to be a plain long for common systems ...
Yes, that's against the standard. But if using the standard means it can't
be done I tend to throw it out ;-)
The better answer is to use fseek() ...

Quote:
> >Step 3: fgets() a partial line into the buffer. If the buffer is too small
> >the last character of the read string will not be an '\n' -- if so, repeat
> >this step.

> Text I/O may not work as fseek() on text files is limited to
> moving to the beginning or end of file, current position, or some
> position previously returned by ftell(), as that value is also a
> magic cookie in the case of text files.

See previous answer.

Quote:
> You may be able to use this technique on files opened in binary
> mode and less than 2GB in size, reading backwards a buffer at a
> time, and recording the locations of end of line markers,
> relative to the file, by adding the block and buffer offsets.

Yup. Conceptually the same as what I did. Now if an fseek() or ftell() on a
text file does not work more or less as if it were used on a binary file, I
would NOT call that implementation of fseek() or ftell() reasonable.

Then add in the likely level of C-competence of the original poster ...

--
Greetings from
 _____
 /_|__| Auke Reitsma, Delft, The Netherlands.
/  | \  -------------------------------------
        Remove SPAMBLOCK from my address ...
--



Tue, 06 May 2003 03:00:00 GMT  

Quote:

>On 16 Nov 2000 18:19:57 GMT, Brian Inglis



>> >Step 1: fseek() or fsetpos() to the end of the file, then use ftell() or
>> >fgetpos() to determine where that is and store the result.

>> Functions fseek() and ftell() may not work on files > 2GB on a 32
>> bit machine as the largest allowed offset is a signed integer.

>Yeah. So there's that. Now what <censored> lets a text file grow over 2 GB
>then expects to handle it with C.

The original poster has a "very large logfile" that he can't
handle using normal tools. Personally, I'd start with head
-10000 and tail -10000 and work from there; I'm currently dealing
with a looping daemon whose log file is filling a filesystem.

[snip rest]

Thanks. Take care, Brian Inglis         Calgary, Alberta, Canada
--

                                use address above to reply
--



Fri, 09 May 2003 14:29:24 GMT  