Help - Insertion in large text files 

Hi all,

Recently at work I was given the task of processing a rather large
text file. The task involved reading the file, searching for a
particular string, and replacing each match with some other
calculated string. Sounds simple enough.

My inelegant solution is to open the first file for reading and a
second file for writing; on EOF, I rename the new file over the old one.

This works fine for smaller files.

Here is my question:

Is there a way to accomplish this using only the one file? The reason is
that some of my files are in excess of 1.5 GB! I cannot afford the disk
space needed for two copies of the files.

I should also note that the replacement value is sometimes longer than
the original.



Fri, 13 Nov 1998 03:00:00 GMT  


: Recently at work I was given the task of processing a rather large
: text file. The task involved reading the file, searching for a
: particular string, and replacing each match with some other
: calculated string. Sounds simple enough.
: Is there a way to accomplish this using only the one file? The reason is
: that some of my files are in excess of 1.5 GB! I cannot afford the disk
: space needed for two copies of the files.
: I should also note that the replacement value is sometimes longer than
: the original.

In principle you cannot modify a file 'in-place' when replacement
strings are larger than the originals.  However, if you can change
the application programs (those that eventually use the file) you
can add a 'patch' layer, where you indicate changes to specific
portions of the file.  As you read the file, you pass through the
patch layer, which re-applies the changes and so provides the
calling program with the 'modified' text.
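
A minimal sketch of such a patch layer in Python, for illustration only.
The function name `read_patched` and the patch format, a sorted list of
non-overlapping (offset, old_length, replacement) tuples, are assumptions
and not something specified in this thread. The big file is only ever
read, never rewritten.

```python
import io


def read_patched(src, patches, chunk_size=64 * 1024):
    """Yield the bytes of `src` with `patches` applied on the fly.

    `src` is a seekable binary file object positioned at offset 0.
    `patches` is a list of (offset, old_length, replacement) tuples,
    sorted by offset and non-overlapping (an assumed, hypothetical format).
    """
    pos = 0
    for offset, old_len, replacement in patches:
        # Pass through the untouched bytes in front of this patch.
        remaining = offset - pos
        while remaining > 0:
            chunk = src.read(min(chunk_size, remaining))
            if not chunk:
                return  # file ended before the patch offset
            remaining -= len(chunk)
            yield chunk
        # Hand out the replacement and skip the original bytes it covers.
        yield replacement
        src.seek(old_len, io.SEEK_CUR)
        pos = offset + old_len
    # Pass through everything after the last patch.
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        yield chunk
```

The patch list itself stays small (one entry per change), so it can live
in a side file next to the original data.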

--
Pieter A. Hintjens



Sat, 14 Nov 1998 03:00:00 GMT  

: Is there a way to accomplish this using only the one file? The reason is
: that some of my files are in excess of 1.5 GB! I cannot afford the disk
: space needed for two copies of the files.

Depends. If your insertions add n bytes to the file in total, you can set
up a queue of about that size between the write and read positions:

        begin....... <-write<-queue size n<-read<-  ...........end

So you read a piece, append it to the buffer queue, and then write an
equal-sized piece pulled from the front of the queue over what you
just read.

If the queue size exceeds available memory, you can put it in a
second file. Note that the combined sizes of the two files will be
approximately the same as the final file.

Also, this kind of constant seeking will probably drive your OS nuts.
It should run, but possibly much slower than two sequential passes.
And if your program dies in mid-update, your file is destroyed.
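
The queue scheme above could be sketched in Python roughly as follows.
This is an illustration under stated assumptions, not a tested tool: the
name `replace_in_place` is made up, the search string is assumed never to
contain a newline (so holding back the last partial line is enough to
catch matches that straddle a chunk boundary), and the queue is held in
memory, so it grows by the total number of bytes inserted so far.

```python
def replace_in_place(path, old, new, chunk_size=1 << 16):
    """Replace every `old` with `new` inside the file at `path`, in place.

    Assumes `old` contains no newline. Not crash-safe: if the process
    dies part-way through, the file is left half-updated.
    """
    pending = bytearray()  # the queue: output bytes not yet written back
    tail = b""             # incomplete last line, held until it is complete
    read_pos = write_pos = 0
    with open(path, "r+b") as f:
        while True:
            f.seek(read_pos)
            chunk = f.read(chunk_size)
            read_pos += len(chunk)
            if chunk:
                data = tail + chunk
                cut = data.rfind(b"\n") + 1  # end of the last complete line
                tail = data[cut:]
                pending += data[:cut].replace(old, new)
                # Only bytes that were already read may be overwritten.
                n = min(len(pending), read_pos - write_pos)
            else:
                # End of input: process the leftover line, flush everything.
                pending += tail.replace(old, new)
                tail = b""
                n = len(pending)
            f.seek(write_pos)
            f.write(pending[:n])
            write_pos += n
            del pending[:n]
            if not chunk:
                break
        # If the replacements shrank the data, cut off the stale end.
        f.truncate(write_pos)
```

Calling `replace_in_place("big.txt", b"foo", b"foobar")` would rewrite the
hypothetical file `big.txt` without a second copy on disk. Take a backup
first if you can, for the reason given above.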

There are more sophisticated file organisations than treating the file
as a byte vector. They have more overhead, but it might be worth it.
--

the Queen who straits, the Queen of strife;|          Cupertino, California
with gasp of death or gift of breath       | (xxx)xxx-xxxx            95015
she brings the choice of birth or knife.   |         I don't use no smileys



Sat, 14 Nov 1998 03:00:00 GMT  


: Is there a way to accomplish this using only the one file? The reason is
: that some of my files are in excess of 1.5 GB! I cannot afford the disk
: space needed for two copies of the files.

I did something similar to this once: the answer is to back one file off
to tape!  It is impossible to create "holes" in a file to add new data
when you're dealing with sequential files. You have to make a copy
somehow and rewrite the file.

For efficiency, you've got to do something to make the file
non-sequential, especially if this processing is the #1 overhead in
the problem. Use smaller files holding different parts of the data, or
some type of tree or chain where you can simply rearrange pointers to
parts of the file.

If it has to be sequential, make it record sequential instead of line
sequential and build in enough slack to make changes easily.  Give
yourself a padding per record of M bytes, then recreate the file only
if that is full. With line sequential files, it is very hard to do
this! With record sequential files, you just read and overwrite fixed
length records.
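
A rough Python sketch of the record-sequential idea, for illustration
only. The record size, the space padding, and the helper names are all
assumptions. Each record is a fixed number of bytes ending in a newline,
so the file still opens in a text editor, and a record that outgrows its
slack raises an error as the signal to recreate the file.

```python
RECORD_SIZE = 64  # hypothetical: payload plus M bytes of slack, plus newline
PAD = b" "


def pack(text):
    """Pad `text` with spaces to one fixed-length, newline-terminated record."""
    if len(text) > RECORD_SIZE - 1:
        raise ValueError("record full: recreate the file with larger records")
    return text.ljust(RECORD_SIZE - 1, PAD) + b"\n"


def write_record(f, index, text):
    """Overwrite record number `index` in place; nothing else moves."""
    f.seek(index * RECORD_SIZE)
    f.write(pack(text))


def read_record(f, index):
    """Return record number `index` with newline and padding stripped.

    Assumes real data never ends in spaces, since those would be
    indistinguishable from padding.
    """
    f.seek(index * RECORD_SIZE)
    return f.read(RECORD_SIZE).rstrip(b"\n").rstrip(PAD)
```

With the file opened as `open(path, "r+b")`, replacing a value with a
longer one is then a single `write_record` call, as long as the new value
still fits in the record.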

Or buy another 1.5GB disk :)

Scott



Sat, 14 Nov 1998 03:00:00 GMT  
 