Need help on record separator 

I need to parse data from files that have multi line records.

1. I know that each line will only be 80 characters long.
2. Fields are comma separated.
3. A record starts with a 2 digit number such as 14
4. The continuation line always begins with 88 but could end with
anything
5. The only way I know the record is complete is if the first two chars
of the next line are anything other than 88.

e.g.
14,12,mhoward,123456, ...
88,overdue,....
59,... (this is a new record)

So I want to find the full record beginning with 14 and format its
output.

Can anyone help me on setting up the record separator?

TIA




Mon, 12 Nov 2001 03:00:00 GMT  
Try the following script (UNTESTED):

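BEGIN { FS = "," }

# Continuation line: append its fields (minus the leading 88) to the
# record being collected.
/^88,/ {
        for ( i = 2; i <= NF; i++ )
                field[ ++fields ] = $i

        next
}

# Any other line starting with a number begins a new record: print the
# record collected so far, then start over with this line.
/^[0-9]+,/ {
        if ( fields ) print_record()

        fields = 0

        for ( i = 1; i <= NF; i++ )
                field[ ++fields ] = $i
}

# The last record has no following line to trigger it, so print it here.
END { if ( fields ) print_record() }

function print_record(    i ) {

        for ( i = 1; i <= fields; i++ )
                printf( "%s%s", ( i == 1 ? "" : ", " ), field[ i ] )

        print ""
}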

--
Best regards,
 _ __                      _    ,   _ _ _
' )  )     /         _/_  ' )  /   ' ) ) )
 /--' ____/___/> __  /     /--/     / / / __,_  __  o _   ______
/  \_(_) /_) (__/ (_<__   /  ( o   / ' (_(_) (_/ (_<_/_)_(_) / <_

Robert H. Morrison                      Tel:   +49 721 9628 167
Software Development, Basis Team        FAX:   +49 721 9628 149






Mon, 12 Nov 2001 03:00:00 GMT  


Quote:
> Can anyone help me on setting up the record separator?

If your awk supports regular expressions in RS (like gawk) AND you
don't need the two digits at the beginning of each line, try this:

BEGIN { RS = "\n([0-79][0-9]|[0-9][0-79]),"; FS = "(\n88)?," }
{
    print
    for (i = 1; i <= NF; ++i) print i, $i
    print "----------"
}

If you do need the first two-digit field on each line, RS and FS won't
help. You'll need to build your own fields and records from sequential
lines. The script below mimics the one above.

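BEGIN { FS = "," }

# Hold the first line; the "+" keeps save non-empty even for a blank line.
NR == 1 { save = "+" $0; next }

# Build one complete record per pass through this rule.
{
  # The line saved on the previous pass starts the new record.
  if ( save ) {
    newrecord(substr(save, 2), field)
    save = ""
  }

  # Append continuation lines until a line that is not one turns up.
  do {
    if ($1 == 88) {
      addfields($0, field)
    } else {
      save = "+" $0
      break
    }
  } while ((getline) > 0)
}

# Other pattern-actions can follow; this one just dumps the record.
{
  print field["count"], field[0]
  for (i = 1; i <= field["count"]; ++i) print i, field[i]
  print "----------"
}

# Reset array so that it holds only the fields of string.
function newrecord(string, array  , n) {
  for (n in array) delete array[n]
  n = split(string, array, FS)
  array["count"] = n
  array[0] = string
}

# Append the fields of string to those already in array.
function addfields(string, array  , last, n) {
  last = array["count"]
  n = split(string, addfields_temp_array, FS)
  array["count"] += n
  array[0] = array[0] "\n" string
  while (n > 0) {
    array[last + n] = addfields_temp_array[n]
    delete addfields_temp_array[n--]
  }
}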

NOTE: if you invoke this as  awk -f script datafile  the last record
will not be processed unless it has an /^88,/ continuation line. In
that case the final line is the whole final record. It gets stored in
the save variable and breaks the do-loop, so getline is never called
again to report end of file. Awk processes the penultimate record, then
goes to read the next line. It reaches end of file instead, so it
exits, casting the final record into oblivion.

Work-around: add an extra newline to the end of the data file. Like so,
echo >> datafile (or if you're using a Microsoft command interpreter on
DOS/Windows systems, make that  echo. >> datafile).
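
If you would rather not modify the data file, the blank line can be
appended on the fly instead. A minimal sketch for a POSIX shell; the
function name with_trailing_blank is made up for illustration, and
"script" and "datafile" are the names used in the note above:

```shell
# Hypothetical helper: emit the given files (or stdin, if none are given)
# followed by one empty line, leaving the files themselves untouched.
with_trailing_blank() {
    cat "$@"
    echo
}

# Usage (assumes the awk program above is saved as "script"):
#   with_trailing_blank datafile | awk -f script
```

The empty line plays the same role as the one added by echo >> datafile:
it gives awk one more line to read after the final record.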

If anyone else can figure a code solution for handling the final record
that doesn't involve writing a function to process each record that
could be called from the END pattern-action (I'm trying to preserve the
ability to use other pattern-actions after each record has been patched
together), please fix my code.
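
One possible way around this, offered as a sketch rather than something
tested against real data, is to split the job in two. A first awk pass
glues each 88 continuation line onto the line it continues, so every
record comes out as a single line. A second, ordinary awk program then
sees one complete record per line and can use whatever pattern-actions
it likes. This does flush from END, but only inside the small joining
pass. The function name join_records is made up, and dropping the
leading "88," from each continuation line is an assumption about what
is wanted:

```shell
# Hypothetical helper: join each "88," continuation line onto the record
# it continues, printing one complete record per output line.
join_records() {
    awk '
        # Continuation line: drop the leading "88," and append the rest.
        /^88,/ { buf = buf "," substr($0, 4); next }

        # Any other line starts a new record: flush the previous one first.
        have   { print buf }
               { buf = $0; have = 1 }

        # Flush the final record, which has no following line to trigger it.
        END    { if (have) print buf }
    ' "$@"
}

# Usage: the second stage is a normal awk program, for example printing
# the field count and third field of every record of type 14:
#   join_records datafile | awk -F, '$1 == 14 { print NF, $3 }'
```

Continuation lines that appear before any record has started are
discarded by this sketch.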
