How to access files with variable length records? 
Author Message
 How to access files with variable length records?

Is there an efficient way to organize a file containing records of
different lengths without padding them to the maximum size?  

What I'm thinking about here is an editor, where the lines are all
of different lengths...  and there must be a way to not use up 256
bytes of space for empty lines, which should use only a few bytes!

It needs to allow changing, inserting, and deleting records from a
file around 2Meg without visible delay.

------------------------------------------------------------------------

 FREELIB:  ftp://ftp.simtel.net/pub/simtelnet/msdos/a{*filter*}l/freeli22.zip
 Snippets: ftp://ftp.simtel.net/pub/simtelnet/msdos/a{*filter*}l/asnip11.zip
------------------------------------------------------------------------



Sat, 24 Apr 1999 03:00:00 GMT  
 How to access files with variable length records?

Quote:

>Is there an efficient way to organize a file containing records of
>different lengths without padding them to the maximum size?  
>What I'm thinking about here is an editor, where the lines are all
>of different lengths...  and there must be a way to not use up 256
>bytes of space for empty lines, which should use only a few bytes!
>It needs to allow changing, inserting, and deleting records from a
>file around 2Meg without visible delay.

  Can you say "index"?

Regards.

---



Sat, 24 Apr 1999 03:00:00 GMT  
 How to access files with variable length records?

Quote:

>Is there an efficient way to organize a file containing records of
>different lengths without padding them to the maximum size?  

>What I'm thinking about here is an editor, where the lines are all
>of different lengths...  and there must be a way to not use up 256
>bytes of space for empty lines, which should use only a few bytes!

>It needs to allow changing, inserting, and deleting records from a
>file around 2Meg without visible delay.

Several years I had thought about doing an editor and I had decided to use a
double-linked list. Each line would have a pointer to the next and previous
lines. This makes scrolling and cut/paste operations relatively painless, though
insertions with word wrap would still be 'expensive'.

HTH

Joe



Sat, 24 Apr 1999 03:00:00 GMT  
 How to access files with variable length records?


says...

Quote:
> Is there an efficient way to organize a file containing records of
> different lengths without padding them to the maximum size?  

> What I'm thinking about here is an editor, where the lines are all
> of different lengths...  and there must be a way to not use up 256
> bytes of space for empty lines, which should use only a few bytes!

> It needs to allow changing, inserting, and deleting records from a
> file around 2Meg without visible delay.

Probably the most obvious approach would be to use an array of the
offsets of the beginnings of the lines as an index into the file.  
When you need to insert a new line, you'd open up a spot in the index
by copying all the sucessive line's index entries down a spot, then
insert the offset of the end of the file, and write the new text
there.  To delete a line, simply delete its entry from the index.  To
change a line you have two possibilities, simply delete the old line
and insert a new one if the new line is longer than the old.  Simply
change characters of the old line is shorter or the same length as the
old one.  To rearrange lines simply swap their index entries around.

If you need a normal text file when you're done, you'll have to copy
the lines into a new file in the order specified in the index.  In
this case, the index simply allows you to delay this until all edits
are done, instead of doing it each time a line is edited.  

If you're doing something like a word processor, you can store the
index at the very end of the file, with the last word in the file
specifying the size of the index.

To read in the file, you open it, seek to the end, read a word, then
seek back as far as it specifies, and read in the index.

If you do this, you probably also want to keep a second index of
deleted lines, so if a new line is to be inserted that's no longer
than a currently deleted line, you can reuse the space.  As with most
databases with variable length records, you'll probably also want to
provide some means for cleaning things up by creating a new file with
all records in order and no delete records in the file.  You may also
want to consider rounding all lines to a multiple of, say, 4 bytes in
length.  This will typically help you avoid deleting records for minor
editing in a line, and make it easie to reuse lines that have been
deleted.

--
    Later,
    Jerry.



Sat, 24 Apr 1999 03:00:00 GMT  
 How to access files with variable length records?

Quote:

> Is there an efficient way to organize a file containing records of
> different lengths without padding them to the maximum size?

> What I'm thinking about here is an editor, where the lines are all
> of different lengths...  and there must be a way to not use up 256
> bytes of space for empty lines, which should use only a few bytes!

> It needs to allow changing, inserting, and deleting records from a
> file around 2Meg without visible delay.

> ------------------------------------------------------------------------

>  FREELIB:  ftp://ftp.simtel.net/pub/simtelnet/msdos/a{*filter*}l/freeli22.zip
>  Snippets: ftp://ftp.simtel.net/pub/simtelnet/msdos/a{*filter*}l/asnip11.zip
> ------------------------------------------------------------------------

Just a suggestion here, but the most obvious way would be to define a
single byte that would appear nowhere else as the delimiter.  If the
file is ASCII text, for example, you won't find 0FFH in it.  You could
then read the file a byte at a time, storing a contiguos string of
characters and counting the bytes until you run into the delimiter. You
can then manipulate the string just read to whatever you want to do with
it and pick up where you left off, overwriting the string storage area
with the new string.

Secondary suggestion:
If you knew that all of the records you wanted to manipulate were of
even character length, then you could read two bytes at a time.  If all
records were a length that was some multiple of 16 then you could read
16 bytes at a time, and so on.  Writing some particular signature into
one of the records of defined length could mark end of the record.  Say
with 8 byte checking:

           1          2          3          4           5
012345678|90123456|78901234|56789012|34567890|12345678|90123456|
This is a| test of| the eme|rgency b|roadcast| system |!!<eof> |

Now all you gotta do is read 8 byte strings and keep them contiguous
until you find the text "<eof>" in the record.  No matter how long the
record actually is, your slack space PER RECORD would only amount to 8
bytes.
--
                                      __
           _\\|//_                   |_/\
          (` o-o ')                ,--,;\)
 -------oooO-(_)-Oooo-------     ,-'-..._\
 Amateur Radio Station N4CRO     \_...._(_)
 ---------------------------      |c c  ) )  You mess with me,
                              ___ /`._ / /   you mess with my
                          -==[___]\/;  \/     Alludium  Q-36
                                `3-'| ` )    Explosive  Space
         Bill Stewart             <'/||\`>    Modulator ....
        2324 Dover Ave            __|::|
  Fort Myers, FL  33907  USA     (__.';|



Sat, 24 Apr 1999 03:00:00 GMT  
 How to access files with variable length records?


Quote:
Power) writes:


>>Is there an efficient way to organize a file containing records of
>>different lengths without padding them to the maximum size?  

[...]
>Several years I had thought about doing an editor and I had decided to
>use a double-linked list. Each line would have a pointer to the next
>and previous lines. This makes scrolling and cut/paste operations
>relatively painless, though insertions with word wrap would still be
>'expensive'.

If you want word wrap it makes more sense to have each element in the
list represent a "paragraph" instead of a line. In fact, you don't even
store line ends, you break and wrap them on the fly when they're
displayed.-Wm


Sun, 25 Apr 1999 03:00:00 GMT  
 How to access files with variable length records?


Quote:

>> Is there an efficient way to organize a file containing records of
>> different lengths without padding them to the maximum size?

>> What I'm thinking about here is an editor, where the lines are all
>> of different lengths...  and there must be a way to not use up 256
>> bytes of space for empty lines, which should use only a few bytes!

>> It needs to allow changing, inserting, and deleting records from a
>> file around 2Meg without visible delay.

<Bill's suggestions clipped>

Its not entirely clear what you need; I noticed, tho',  that you said
editor... if you mean a text editor, then editing individual lines
isn't usually the best approach. Most editors use the 'gap buffer'
approach : the file (or some substantial part thereof) is loaded into
a buffer large enough to hold the file section plus whatever might be
added to it in a short time. At the 'focus' (the area to be edited),
the buffer is split: the section past the editing focus is shifted up
about 32 characters (depending on the design), and new characters are
inserted into the gap between the sections. The display routine the
displays the buffer, eliminating the end of the gap and using CR
and/or LF as the line delimiter. Its surprisingly fast, and flexible -
the main limitation is the memory needed for the buffer, but creative
file handling can get around that easily with some tradoff in speed.
See _The Craft of Text Editing_ by Criag Finseth for details.

Of course, if this isn't what you wanted, please ignore...

#define KINSEY rand() % 7      ______   http://www.*-*-*.com/
BigTimeHardLineBadLuckFist{*filter*} \ bi / Schol-R-LEA;2 LCF ELF JAM BiWM
Want some catsup for your menu?  \/ "Like marmalade on burnt toast"
Trouble rather the tiger in his lair than the scholar among his books



Sun, 25 Apr 1999 03:00:00 GMT  
 How to access files with variable length records?

Quote:

> Is there an efficient way to organize a file containing records of
> different lengths without padding them to the maximum size?  

> What I'm thinking about here is an editor, where the lines are all
> of different lengths...  and there must be a way to not use up 256
> bytes of space for empty lines, which should use only a few bytes!

> It needs to allow changing, inserting, and deleting records from a
> file around 2Meg without visible delay.

Well, I think the typical solution is to load the entire file into
a memory block that is larger than your file.  Then you have a floating
"gap" (forget what it's really called) that moves as you move the
cursor around.  Something like:

This is a lin_e of text.
             ^ this is your cursor

This is what your buffer would look like:

+----------------------------------------------------------------+
!This is a lin...................................................!
!................................................................!
!................................................................!
!......................................................e of text.!
+----------------------------------------------------------------+
 [ if the cursor is moved to the right you would move the 'e'
   from the end of the buffer to the beginning of the buffer.]

So, basically you can insert any amount of text without ever having to
move more than a few hundred bytes (and that only when you are moving the
cursor around, not when you are typing.)

Of course you have a little more management headaches with this scheme.
Plus the situation where you've got 2 MB of RAM and 40 MB of document.

Admittedly that's not precisely what you asked for, but you know how it
is.
--

<a href="http://www.mcs.usu.edu/~kurto/">Me & my Atari Lynx</a> archive.



Sun, 25 Apr 1999 03:00:00 GMT  
 How to access files with variable length records?

: Is there an efficient way to organize a file containing records of
: different lengths without padding them to the maximum size?  

Doubly linked list... learn it, love it, live it.

Of course that's the simplest solution and for more performance
oriented disk storage and retrieval you'll want to use more
elaborate indexing schemes that don't have to traverse a linked
list searching for a certain record.

Try the book:

"Mixed Language Programming" by John Levine (c) 1994 M&T Books
ISBN 1-55851-332-9

Which gives you sample database applications and a lib (including
source) for ISAM (Indexed Sequential Access Method) database management
system.

David Springer

--

********* WEB GAMES ONLINE ENTERTAINMENT SYSTEM ***********
*                                                         *
*  A client/server system designed from the ground up to  *
*  support real time multi-player games on the Internet.  *
*  Game developers, players, and ISP's check it out at:   *
*                                                         *
************** http://www.eden.com/~springer **************



Tue, 27 Apr 1999 03:00:00 GMT  
 How to access files with variable length records?


: > Is there an efficient way to organize a file containing records of
: > different lengths without padding them to the maximum size?

: Probably the most obvious approach would be to use an array of the
: offsets of the beginnings of the lines as an index into the file.
: When you need to insert a new line, you'd open up a spot in the index
: by copying all the sucessive line's index entries down a spot, then
: insert the offset of the end of the file, and write the new text
: there.  To delete a line, simply delete its entry from the index.

  A very poor technique in general.

  The person asking the question, should consider organizing the text as
a series of linked lists or arbitrary size.  Each list should contain at
least a pointer to the next line, the length of the current line and the
body of the stored text.

  If he needs to navigate backwards, the list should be doubly linked.

  By using this kind of structure, the block moves Jerry Coffin
describes are avoided.  Insertion of lines becomes a simple matter of
appending a line to the end of the buffer and linking it with the
text file at the appropriate spot.  Deleting a line simply means
disconnecting one of the lines from the linked chain.

  Deleted lines should actually be added to a secondary linked list
structure so that their memory can be regained if new lines need to be
added.

--
<---->



Tue, 27 Apr 1999 03:00:00 GMT  
 How to access files with variable length records?



[ ... ]

Quote:
>   The person asking the question, should consider organizing the
> text as a series of linked lists or arbitrary size.  Each list
> should contain at least a pointer to the next line, the length of
> the current line and the body of the stored text.

This is unusually poor advice, even for you Scott.  First of all, it
mandates that all the text fit into memory.  Second, it makes the
indexing code considerably larger - typically two to three times the
size of a simple array.  Third, it optimizes what's actually a fairliy
unusual operation (inserting or deleting a line) while _dramatically_
hurting what's a much more common operation: navigating in the file.  
For instance, if you have a linked-list of lines and the user says "go
to line 1000", you have to start at the beginning of the linked list,
and navigate through 1000 links to get to the correct line.

Quote:
>   If he needs to navigate backwards, the list should be doubly
> linked.

Oh that's good, now the index is three times as big as it should be
instead of twice as big.  What happened to your love of efficiency?

Quote:
>   By using this kind of structure, the block moves Jerry Coffin
> describes are avoided.  Insertion of lines becomes a simple matter
> of appending a line to the end of the buffer and linking it with the
> text file at the appropriate spot.  Deleting a line simply means
> disconnecting one of the lines from the linked chain.

It's true.  Now, lets see some of your vaunted benchmarks showing how
much faster it is overall.  I've done them in the past, and concluded
that the linked list idea isn't just bad, it's truly awful.  I'm not
the only one either - for instance, if you look through the book
_Software Tools in Pascal_, you'll find a discussion of how poorly
things worked when the authors attempted to do exactly what you
suggest.

--
    Later,
    Jerry.



Wed, 28 Apr 1999 03:00:00 GMT  
 How to access files with variable length records?

Quote:



> [ ... ]

> >   The person asking the question, should consider organizing the
> > text as a series of linked lists or arbitrary size.  Each list
> > should contain at least a pointer to the next line, the length of
> > the current line and the body of the stored text.

> This is unusually poor advice, even for you Scott.  First of all, it
> mandates that all the text fit into memory.  Second, it makes the
> indexing code considerably larger - typically two to three times the
> size of a simple array.  Third, it optimizes what's actually a fairliy
> unusual operation (inserting or deleting a line) while _dramatically_
> hurting what's a much more common operation: navigating in the file.  
> For instance, if you have a linked-list of lines and the user says "go
> to line 1000", you have to start at the beginning of the linked list,
> and navigate through 1000 links to get to the correct line.

> >   If he needs to navigate backwards, the list should be doubly
> > linked.

> Oh that's good, now the index is three times as big as it should be
> instead of twice as big.  What happened to your love of efficiency?

> >   By using this kind of structure, the block moves Jerry Coffin
> > describes are avoided.  Insertion of lines becomes a simple matter
> > of appending a line to the end of the buffer and linking it with the
> > text file at the appropriate spot.  Deleting a line simply means
> > disconnecting one of the lines from the linked chain.

> It's true.  Now, lets see some of your vaunted benchmarks showing how
> much faster it is overall.  I've done them in the past, and concluded
> that the linked list idea isn't just bad, it's truly awful.  I'm not
> the only one either - for instance, if you look through the book
> _Software Tools in Pascal_, you'll find a discussion of how poorly
> things worked when the authors attempted to do exactly what you
> suggest.

> --
>     Later,
>     Jerry.

Lots of criticizing and no solutions offered.....Not useful...

Build an index at the start of the file. Small, file doesn't need to be
in memory..



Thu, 29 Apr 1999 03:00:00 GMT  
 How to access files with variable length records?

Quote:

>This is unusually poor advice, even for you Scott.  First of all, it
>mandates that all the text fit into memory.  Second, it makes the

 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here you are assuming that a pointer must point to a memory location,
which is definitely not the case for what Scott was talking about.
Sometimes I get the feeling people are just flaming Scott because he's
Scott, rather than the ideas he is passing are flawed.

--

Quote:
>:-P




Sat, 01 May 1999 03:00:00 GMT  
 How to access files with variable length records?

In article <Pine.SOL.3.91.961110163043.8006C-

[ ... ]

Quote:
> Lots of criticizing and no solutions offered.....Not useful...

I already made a suggestion about a reasoanble layout of an index, and
how to put it to use.  Scott criticized it based on what he perceived
as a shortcoming, and suggested a drastically inferior alternative.  I
then simply pointed out the inferiority of the alternative he
suggested.  If you want the useful suggestions, you have only to go
back to the post he was criticizing to start with.

Quote:
> Build an index at the start of the file. Small, file doesn't need to
> be in memory..

I prefer to see the index at the end of the file, as I orignally
outlined, but either will work.  There are a couple of advantages to
putting the index at the end of the file.  First of all, you can
typically hold the entire index in memory while you're editing the
file - if you write the index at the beginning, you often have to
completely rewrite the file when you add or delete lines, since the
size of the index has changed.  Second, if you do rewrite the file
each time, you can put a control-Z between the data and the index.  
This allows many programs that operate on text files to see only the
text part of the file, and ignore the index.  If the index is at the
beginning, you get a bunch of garbage before the data you want to look
at.

--
    Later,
    Jerry.



Sat, 01 May 1999 03:00:00 GMT  
 How to access files with variable length records?


says...

Quote:

> >This is unusually poor advice, even for you Scott.  First of all,
> > it mandates that all the text fit into memory.  Second, it makes
>      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Here you are assuming that a pointer must point to a memory
> location, which is definitely not the case for what Scott was
> talking about.

Please re-read his post - he specifically talked about the location of
the text.

Quote:
> Sometimes I get the feeling people are just flaming Scott because
> he's Scott, rather than the ideas he is passing are flawed.

It could be that sometimes, some people are.  It's inapplicable in
this case.

--
    Later,
    Jerry.



Sat, 01 May 1999 03:00:00 GMT  
 
 [ 18 post ]  Go to page: [1] [2]

 Relevant Pages 

1. How to access files with variable length records?

2. Accessing variable-length records files in C-ISAM standard

3. Getting true length of a variable length record - IBM Mainframe

4. Finding Variable-Length Record Length

5. direct access file record length

6. record length for direct access files

7. processing variable length records in text file

8. Import dos file variable record length

9. Reading variable-length records from a file

10. Sorting a file with variable length records.

11. How to read a binary file with variable length record

12. Variable length file access

 

 
Powered by phpBB® Forum Software