Processing variable length/variable data files 
 Processing variable length/variable data files

Inquiring on the best way to handle storage and parsing of this type of
file and data.  For example

xxxx    record identifier
     xxx record length
ED01058Field....1Field........................2Field..3
FD02050Field......................1Field...2Field.....3

Each variable definition record is then parsed into its component fields
and used as needed for data analysis or printing.

Multiple datasets of each group are maintained.

I would be interested in any thoughts on handling this type of data, to
see how they compare with my own.

Cliff Wiernik.



Mon, 21 Feb 2005 08:00:34 GMT  
 Processing variable length/variable data files
Hi,

On Wed, 04 Sep 2002 19:00:34 -0500, Clifford Wiernik

Quote:

>Inquiring on the best way to handle storage and parsing of this type of
>file and data.  For example

>xxxx    record identifier
>     xxx record length
>ED01058Field....1Field........................2Field..3
>FD02050Field......................1Field...2Field.....3

>Each variable definition record is then parsed into its component fields
>and used as needed for data analysis or printing.

>Multiple datasets of each group are maintained.

>I would be interested in any thoughts on handling this type of data, to
>see how they compare with my own.

You will get some better replies (than mine) on the
comp.databases.revelation and comp.databases.pick newsgroups, where
variable-length fields are common. Of course, if the fields are
variable length, so are the records/rows.

However, for what it is worth, here are my two cents. Rather than
specifying how long each record is, simply add a 'record marker' to
separate records; then you simply search for the marker and you have
the record/row. Similarly, each field will need a marker, because
potentially every field/column could be a different length.

A common convention for this is:

ASCII 255 (hex FF) as a record marker
ASCII 254 (hex FE) as a field marker
ASCII 253 (hex FD) as a value marker
ASCII 252 (hex FC) as a subvalue marker

This is what RevSoft uses; I don't know if doing it this way is
proprietary. I can't see how it could be: people can structure their
data the way they see fit, can't they?
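
For what it is worth, splitting on such a marker in Clipper might look
something like this (only a sketch; the function name is made up):

function SplitOn( cData, cMark )

local aParts := {}
local nPos

do while ( nPos := At( cMark, cData ) ) > 0
    aadd( aParts, Left( cData, nPos - 1 ) )    // piece before the marker
    cData := SubStr( cData, nPos + 1 )         // rest after the marker
enddo
aadd( aParts, cData )                          // last (or only) piece

return aParts

Call it once with Chr(255) to split out the records, then on each record
with Chr(254) to get the fields, and so on down to subvalues.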

Peter



Mon, 21 Feb 2005 08:34:48 GMT  
 Processing variable length/variable data files

Quote:
> Inquiring on the best way to handle storage and parsing of this type of
> file and data.  For example

> xxxx    record identifier
>      xxx record length
> ED01058Field....1Field........................2Field..3
> FD02050Field......................1Field...2Field.....3

> Each variable definition record is then parsed into its component fields
> and used as needed for data analysis or printing.

> Multiple datasets of each group are maintained.

I had to write a parser for HL7 data (which is pretty complex) a few
years ago, and perhaps it might relate to your problem. Using OOP is a
nice way to handle it - create a base class for all of the fields and
then derive a class for each field type from that base class. Each of
those classes "know" how to handle the particular lengths, data
conversions, reporting, etc. that are required for that particular field
type, and the base class contributes functions to handle the tasks common
to all of the fields.

As you parse the data, instantiate appropriate objects for each field
encountered, and build the data model using an appropriate container
class (probably an array). I was working in C++, but I think that this
should be pretty straightforward in Clipper using Classy (which is
available on the Oasis), especially since Clipper will take care of
memory allocation and release.
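
Clipper 5 has no native classes, so short of using Class(y) itself you
could fake the per-field-type conversion with codeblocks. A rough sketch
of the idea (the type codes and names here are only illustrative):

function ConvertField( cType, cRaw )

// one entry per field type; with Class(y) these would be derived
// classes, each overriding a Convert() method
local aConv := { { 'C', {| c | AllTrim( c ) } }, ;  // character: trim padding
                 { 'N', {| c | Val( c ) } }, ;      // numeric: text to number
                 { 'D', {| c | CToD( c ) } } }      // date, per SET DATE
local n := aScan( aConv, {| a | a[1] == cType } )

return iif( n == 0, cRaw, Eval( aConv[ n ][ 2 ], cRaw ) )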

Neil



Mon, 21 Feb 2005 10:24:59 GMT  
 Processing variable length/variable data files
How would I know where each field starts and stops if you had not told me?
On line one, is "ED01058" some kind of code saying that field 1 is 10
characters and field 2 is about 25 characters?

If you are saying that you have an app that can write field1, then field2
and field3, then EOL, try to turn it into a comma-delimited file.

Mike


Quote:
> Inquiring on the best way to handle storage and parsing of this type of
> file and data.  For example

> xxxx    record identifier
>      xxx record length
> ED01058Field....1Field........................2Field..3
> FD02050Field......................1Field...2Field.....3

> Each variable definition record is then parsed into its component fields
> and used as needed for data analysis or printing.

> Multiple datasets of each group are maintained.

> I would be interested in any thoughts on handling this type of data, to
> see how they compare with my own.

> Cliff Wiernik.



Mon, 21 Feb 2005 22:40:08 GMT  
 Processing variable length/variable data files
The format is fixed by the vendor creating it.  The first four
characters are the line code; the next three are the length of the data
string.  The line code determines how the remaining string of data is
laid out, in a fixed, undelimited format.  I have no control over the
format, but I have definitions for each line description.  The lines are
output in a given type of sequence, but the number of each line varies.
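
In Clipper terms, something like this is what I picture for splitting
the stream into its lines (just a sketch; I am assuming the three-digit
length counts only the data after the seven-character header, and the
names are made up):

function SplitStream( cStream )

local aLines := {}
local cCode, nLen

do while Len( cStream ) >= 7
    cCode := Left( cStream, 4 )                // record identifier
    nLen  := Val( SubStr( cStream, 5, 3 ) )    // data length from header
    aadd( aLines, { cCode, SubStr( cStream, 8, nLen ) } )
    cStream := SubStr( cStream, 8 + nLen )     // step past this line
enddo

return aLines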

I am looking for thoughts on how best to handle this data, given that
DBF files are fixed-format for every record.  I need random access to a
given group of data (it is actually output from a credit bureau), and I
need to know how best to parse it.  Based on the line descriptor I have
the format of the line, but the format is not defined in the data
itself; a manual describes the exact format of each line.

What is the best programming approach to this type of data?

Cliff.

Quote:

> How would I know where each field starts and stops if you had not told me?
> On line one, is "ED01058" some kind of code saying that field 1 is 10
> characters and field 2 is about 25 characters?

> If you are saying that you have an app that can write field1, then field2
> and field3, then EOL, try to turn it into a comma-delimited file.

> Mike



>>Inquiring on the best way to handle storage and parsing of this type of
>>file and data.  For example

>>xxxx    record identifier
>>     xxx record length
>>ED01058Field....1Field........................2Field..3
>>FD02050Field......................1Field...2Field.....3

>>Each variable definition record is then parsed into its component fields
>>and used as needed for data analysis or printing.

>>Multiple datasets of each group are maintained.

>>I would be interested in any thoughts on handling this type of data, to
>>see how they compare with my own.

>>Cliff Wiernik.



Tue, 22 Feb 2005 09:28:18 GMT  
 Processing variable length/variable data files
Thanks.  Cliff
Quote:


>>Inquiring on the best way to handle storage and parsing of this type of
>>file and data.  For example

>>xxxx    record identifier
>>     xxx record length
>>ED01058Field....1Field........................2Field..3
>>FD02050Field......................1Field...2Field.....3

>>Each variable definition record is then parsed into its component fields
>>and used as needed for data analysis or printing.

>>Multiple datasets of each group are maintained.

> I had to write a parser for HL7 data (which is pretty complex) a few
> years ago, and perhaps it might relate to your problem. Using OOP is a
> nice way to handle it - create a base class for all of the fields and
> then derive a class for each field type from that base class. Each of
> those classes "know" how to handle the particular lengths, data
> conversions, reporting, etc. that are required for that particular field
> type, and the base class contributes functions to handle the tasks common
> to all of the fields.

> As you parse the data, instantiate appropriate objects for each field
> encountered, and build the data model using an appropriate container
> class (probably an array). I was working in C++, but I think that this
> should be pretty straightforward in Clipper using Classy (which is
> available on the Oasis), especially since Clipper will take care of
> memory allocation and release.

> Neil



Tue, 22 Feb 2005 09:29:27 GMT  
 Processing variable length/variable data files

Quote:
>>I need random access to a given group of data (it is actually output
>>from a credit bureau), and I need to know how best to parse it.

Read it all in one go and then store it in a DBF or an array.

If it changes frequently, create an index DBF with the record starts in it.
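
A sketch of the index pass using the low-level file functions (it
assumes the three-digit length excludes the seven-byte header, and all
the names are made up):

function BuildIdx( cFile )

local nH   := FOpen( cFile )
local nPos := 0
local cHdr := Space( 7 )
local nLen

dbCreate( 'LINEIDX', { { 'CODE', 'C', 4, 0 }, { 'START', 'N', 10, 0 } } )
use LINEIDX new
do while FRead( nH, @cHdr, 7 ) == 7
    nLen := Val( SubStr( cHdr, 5, 3 ) )    // data length from header
    append blank
    replace CODE with Left( cHdr, 4 ), START with nPos
    nPos += 7 + nLen
    FSeek( nH, nPos, 0 )                   // jump to the next line header
enddo
FClose( nH )

return nil

FSeek()/FRead() on the raw file then gets you any record directly.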



Tue, 22 Feb 2005 18:18:09 GMT  
 Processing variable length/variable data files
Clifford,

If each line contains a line definition, I would use a program to convert
it to a fixed-length record.  Then I would use APPEND ... SDF to get it
into a DBF.

The programs I first used for PC "flat files" were written in Turbo
Pascal (1985).  If it were me, I would bring one of those back and
modify it to do the job.  C would be just as effective, more so if you
have example files already built.  In Pascal, I can read bytes at a time
into fields, as opposed to entire-record reads.

These TP programs are very fast, and the only problem could be the size
of the output file.  When you go to a fixed record, you have to define
the largest size a field could be; if the largest is 254 bytes, that is
what it has to be even if most are 3 characters.

I have seen similar work done in pure Clipper using the low-level file
I/O functions.
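
In Clipper itself, the conversion step might look roughly like this
(only a sketch; it assumes the lines are already split out as
{ code, data } pairs, and the names are illustrative):

procedure ToFixed( aLines, nWidest )

local nOut := FCreate( 'FIXED.TXT' )
local i

for i := 1 to Len( aLines )
    // pad each line to the widest layout so columns line up for SDF
    FWrite( nOut, PadR( aLines[ i ][ 1 ] + aLines[ i ][ 2 ], nWidest ) ;
            + Chr( 13 ) + Chr( 10 ) )
next
FClose( nOut )

return

After that, APPEND FROM FIXED.TXT SDF pulls it straight into the DBF.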



Wed, 23 Feb 2005 00:45:02 GMT  
 Processing variable length/variable data files
That is essentially what we are doing with the text report version of
it: load it into 78-character records.  However, since it is a text
file, we currently parse the current format.  If they change the layout
somewhat, our parsing, even though it tries to adopt their approach,
could require revision.  Some records are blank.

In the actual data stream, the maximum record is currently 288
characters, and some are 10.  We could devise a scheme to use, say,
80-100 character records and work with continuations.

The best I can determine to use for parsing is SubStr() for extraction.

The data comes in one big string, say 4,000-20,000 characters, which is
easy to parse into individual variable-length records.  Then we need to
parse the variable-length records into their component fields.

Thanks for the suggestions.

Quote:

> Clifford,

> If each line contains a line definition, I would use a program to convert
> it to a fixed-length record.  Then I would use APPEND ... SDF to get it
> into a DBF.

> The programs I first used for PC "flat files" were written in Turbo Pascal
> (1985).  If it were me, I would bring one of those back and modify it to do
> the job.  C would be just as effective, more so if you have example files
> already built.  In Pascal, I can read bytes at a time into fields, as
> opposed to entire-record reads.

> These TP programs are very fast, and the only problem could be the size of
> the output file.  When you go to a fixed record, you have to define the
> largest size a field could be; if the largest is 254 bytes, that is what it
> has to be even if most are 3 characters.

> I have seen similar work done in pure Clipper using the low-level file I/O
> functions.



Wed, 23 Feb 2005 08:45:37 GMT  
 Processing variable length/variable data files
Clifford

Are there the same no. of fields in each record??
Do you actually need to break the record up for storage or only for
reporting??
Will you be creating indexes on the field data at all??

An approach might be to have a look up table of the record types that
returns the field lengths in an array.
Eg
aTypes := { { 'ED01', 58, { 1, 5, 10, 12, 3, 23 } }, ;
            { 'FD02', 50, { 10, 5, 10, 11, 4, 29 } } }

Then pass the extracted line into a function like below

function ConvertIt( cData )

// assumes aTypes (above) is visible, e.g. a PRIVATE or file-wide STATIC
local cCode, nIndex, aSizes, anArray, i

cCode  := Left( cData, 4 )
nIndex := aScan( aTypes, {| x | x[1] == cCode } )
aSizes := aTypes[ nIndex ][ 3 ]

anArray := { cCode }

cData := SubStr( cData, 8 )    // strip code & length
for i := 1 to Len( aSizes )
    aadd( anArray, Left( cData, aSizes[ i ] ) )
    cData := SubStr( cData, aSizes[ i ] + 1 )
next

return anArray

Then do whatever with the returned array
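
For instance (with cLine holding one extracted 'ED01' record, and the
field widths coming from the lookup table above):

aFields := ConvertIt( cLine )
? aFields[ 1 ]        // 'ED01'
? aFields[ 2 ]        // first field, 1 character wide per aTypes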

--
HTH
Steve Quinn
http://www.tuxedo.org/~esr/faqs/smart-questions.html
'I want to move to Theory...Everything works in Theory'



Wed, 23 Feb 2005 09:56:57 GMT  
 Processing variable length/variable data files

Quote:

> Clifford

> Are there the same no. of fields in each record??

No, the number of fields in each record depends on the record type;
there are 60-70 different record types, with anywhere from 1 to 60-70
fields in each.

Quote:
> Do you actually need to break the record up for storage or only for
> reporting??

I need to break them up for analysis and reporting, and potentially for
storage (so that the data is easier to use for analysis and reporting,
and to control wasted space, since the data strings vary from short to
long).

Quote:
> Will you be creating indexes on the field data at all??

The field data is indexed only by data string (i.e. one report for one
prospective borrower, with one code for all data within the string),
plus a code to keep the individual data records in order when parsed
out, as the order of data records within the total string has
significance.

The strings could be saved to memo files, but I am concerned about
reliability, as I have had bad experiences in the past with Windows
aborting and killing the application, which can lead to loss of data
because the memo file is no longer accessible correctly.

Quote:

> An approach might be to have a look up table of the record types that
> returns the field lengths in an array.
> Eg

This is essentially what my thinking has been; there seems to be no
other way.

Thanks a lot.

Cliff.



Wed, 23 Feb 2005 20:49:22 GMT  
 