I need to read a binary file 

I need to read a binary file which contains data produced by a corpus
processor.
The file contains the words encountered in a text and the number of
occurrences of each word.
I need to write this information to an ASCII file (which I can do).
The binary file is constructed in the following way:
-the word
-a nul terminator '\0'
-one byte
-the number of occurrences coded in 3 bytes
-4 bytes
and so on.

So if you could help me in any way ...

Bernard



Wed, 18 Jun 1902 08:00:00 GMT  

Quote:

>I need to read a binary file which contains data produced by a corpus
>processor.
>The file contains the words encountered in a text and the number of
>occurrences of each word.
>I need to write this information to an ASCII file (which I can do).
>The binary file is constructed in the following way:
>-the word
>-a nul terminator '\0'
>-one byte
>-the number of occurrences coded in 3 bytes
>-4 bytes
>and so on.

>So if you could help me in any way ...

You don't mention which version of Turbo/Borland Pascal you plan to
use.  If it is 6.0, 7.0, or a version of TPW/BPW, then download
BLOCKIO.ZIP and supporting units UTIL and FILES that it needs from
http://users.fdn.com/~rdonais/tpascal.htm

The BLOCKIO unit implements a tBlockedFile object that provides
buffered i/o of typed and untyped files.  Since you will most likely
be reading the file in a sequential manner, you don't really need
any file buffering.  However, the advantage the unit will have over
a simple BlockRead is that you won't have to worry about a variable
being split across a block boundary.  You arrange this when you
create the file object, by specifying a "record" size equal to the
largest word you expect to encounter.  If you aren't sure, don't be
afraid to choose too large a number. The penalty for specifying too
large a size isn't that great.  The record size parameter only
causes each block to read that many extra bytes into memory.

Once you have created the object you can access any part of the file
simply by specifying the byte offset.  Probably the easiest way to
process the file would be to define a global longint that holds
the current byte position being processed.  Start the offset at
zero and increment it as you recover each element.  You could
then process the entire file in 8k chunks with something like --

VAR DatF  : tpBlockedFile;
    DatPos: Longint;

...
    DatPos := 0;  { start at the first byte of the file }
    New(DatF, Init('FILENAME', 8192, 256, 1, ReadOnly+DenyWrite));
    { the loop bound is meant to be the file length; confirm the field name in BLOCKIO }
    While (DatPos < DatF^.fPos) Do Begin
       GetNextRecord;
       ...
    End;
    Dispose(DatF, Done);

A function designed to read the next nul delimited "word" could be
something like ---

...

FUNCTION NextWord: String;
VAR s: String;
BEGIN
   S := StrPas(pChar(DatF^.At(DatPos)));
   Inc(DatPos, Succ(Length(s))); { step past string & nul-delimiter }
   NextWord := s;
END;

To read the three-byte count stored in four bytes in little-endian
fashion --

FUNCTION Occurrences: Longint;
BEGIN
    Occurrences := Longint(DatF^.At(DatPos)^) and $00FFFFFF;
    Inc(DatPos, 4);
END;

Or, to read the three-byte count stored in four bytes in big-endian
fashion --

FUNCTION Occurrences: Longint;
VAR i: Longint;
BEGIN
    i := Longint(DatF^.At(DatPos)^);
    Inc(DatPos, 4);
    UTIL.Reverse(i, Sizeof(Longint));
    Occurrences := i and $00FFFFFF;
END;
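
If you would rather not rely on the longint typecast or on
UTIL.Reverse, the three count bytes can also be combined by hand.
What follows is only a sketch: Count3LE and Count3BE are made-up
names (they are not in BLOCKIO or UTIL), and which byte order your
file really uses is something to check against a word whose count
you already know.

```pascal
{ Sketch only: Count3LE and Count3BE are hypothetical helper names. }

{ b0 is the first byte in the file. Least significant byte first. }
function Count3LE(b0, b1, b2: Byte): Longint;
begin
  Count3LE := Longint(b0)
           or (Longint(b1) shl 8)
           or (Longint(b2) shl 16);
end;

{ b0 is the first byte in the file. Most significant byte first. }
function Count3BE(b0, b1, b2: Byte): Longint;
begin
  Count3BE := (Longint(b0) shl 16)
           or (Longint(b1) shl 8)
           or Longint(b2);
end;
```

For the bytes 01 02 03 (in file order), Count3LE gives 197121 and
Count3BE gives 66051, so a single known count is usually enough to
tell the two layouts apart.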

That should be enough to get you started.  If you have any questions
about any of the units, or need help implementing something for an
earlier version of TP, don't hesitate to ask.
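
As a starting point without BLOCKIO, the same sequential walk can be
sketched with a plain "file of Byte" and standard I/O only.  Treat
this as a rough outline under assumptions, not a finished converter:
every name in it is made up, the record layout is taken literally
from the question (word, nul, one byte, three-byte count, four
bytes), the count is ASSUMED to be stored least significant byte
first, and there is no I/O error checking.  Reading byte by byte is
slow but sidesteps the block-boundary problem entirely.

```pascal
{ Sketch only. Every name below is hypothetical; none comes from BLOCKIO or
  UTIL. Assumed layout per record, as described in the question:
  word, nul byte, 1 byte, 3-byte count, 4 bytes. }
type
  TByteFile = file of Byte;

{ Returns the next byte, or 0 once the end of the file is reached. }
function NextByte(var f: TByteFile): Byte;
var
  b: Byte;
begin
  b := 0;
  if not Eof(f) then Read(f, b);
  NextByte := b;
end;

{ Reads one nul-terminated word; characters past 255 are dropped. }
function ReadWord(var f: TByteFile): String;
var
  s: String;
  b: Byte;
begin
  s := '';
  b := NextByte(f);
  while b <> 0 do
  begin
    if Length(s) < 255 then s := s + Chr(b);
    b := NextByte(f);
  end;
  ReadWord := s;
end;

{ Writes one "word count" line per record; returns the number of records. }
function ConvertFile(inName, outName: String): Longint;
var
  src: TByteFile;
  dst: Text;
  w: String;
  b: Byte;
  i: Integer;
  c, n: Longint;
begin
  n := 0;
  Assign(src, inName);
  Reset(src);
  Assign(dst, outName);
  Rewrite(dst);
  while not Eof(src) do
  begin
    w := ReadWord(src);
    b := NextByte(src);                     { single byte after the nul, ignored }
    c := NextByte(src);                     { count, ASSUMED low byte first }
    c := c or (Longint(NextByte(src)) shl 8);
    c := c or (Longint(NextByte(src)) shl 16);
    for i := 1 to 4 do b := NextByte(src);  { four trailing bytes, ignored }
    WriteLn(dst, w, ' ', c);
    Inc(n);
  end;
  Close(dst);
  Close(src);
  ConvertFile := n;
end;

{ Writes a two-record sample ('cat' 3, 'dog' 258) for trying ConvertFile. }
procedure WriteSample(name: String);
const
  Data: array[0..23] of Byte = (
    99, 97, 116, 0,   0,  3, 0, 0,  0, 0, 0, 0,
    100, 111, 103, 0, 0,  2, 1, 0,  0, 0, 0, 0);
var
  f: TByteFile;
  i: Integer;
  b: Byte;
begin
  Assign(f, name);
  Rewrite(f);
  for i := 0 to 23 do
  begin
    b := Data[i];
    Write(f, b);
  end;
  Close(f);
end;
```

Calling WriteSample('sample.bin') and then ConvertFile('sample.bin',
'sample.txt') should leave the lines "cat 3" and "dog 258" in
sample.txt.  If the counts from your real file look wrong, the byte
order or the position of the single extra byte is the first thing to
suspect.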

    ...red



Wed, 18 Jun 1902 08:00:00 GMT  
 