Parsing very long Strings 

Hello,

I have a question about parsing a very long string into about 30 fields
delimited by semicolons. This UNIX flat file lists each record on a
separate line, with all 28 fields concatenated one after another. I have
the format specifications and know which character positions belong to
which field. I need to produce a file that prints each field separated
by a semicolon delimiter. I attempted to use the code shown below;
however, only the first six fields are displayed, along with the first
three characters of the seventh field. I have also tried using the
'length > 72' option before the BEGIN statement, but I get bizarre
output.

Any suggestions would be appreciated.

Also, if there is an easier method using sed or another scripting tool,
then that would be fine also. Thanks.

awk 'BEGIN { OFS = ";" }
     { if (substr($2,1,1) == "A") {
           print substr($2,1,1), substr($2,2,6), substr($2,8,10),
                 substr($2,18,8), substr($2,26,1), substr($2,27,15),
                 substr($2,42,8), substr($2,50,3), substr($2,53,50),
                 substr($2,103,120), substr($2,223,16), substr($2,239,16),
                 substr($2,255,5), substr($2,260,25), substr($2,285,3),
                 substr($2,288,5), substr($2,293,16), substr($2,309,16),
                 substr($2,325,16), substr($2,341,10), substr($2,351,15),
                 substr($2,366,5), substr($2,371,3), substr($2,374,10),
                 substr($2,384,16), substr($2,400,16), substr($2,416,16)
       }
     }' text

Terry Dunn



Sun, 15 Oct 2000 03:00:00 GMT  

Terry,

I created a single long string using the offsets indicated by your
substr function calls and passed it to your exact program/script (cut
and pasted from the posting). It failed completely with my "awk"
(because it is a link to oawk, the old awk) but ran fine with both
"nawk" and "gawk". The input line I made had a field 1 of "abc",
followed by a space and a single string of lower-case letter o's, with
an upper-case alpha or numeric character at each field's starting
position (short sample):

  abc ABoooooCoooooooooD....

I did it this way because your substr calls specify $2 as the string to
work on. The result is as expected, with each output string starting
with the upper-case alpha or numeric character. But the script outputs
only 27 fields (?), so is $1 being counted too?

My question is this: you indicated a source input with no delimiters
and 28 "fields" concatenated, but in the script you are using substr on
$2. If you have only two strings separated by white space, this should
work, but if there are ANY other spaces or tabs in the input lines, awk
will break them into more fields, so maybe your $2 is not the entire
line? Perhaps using $0, which is awk's name for the entire input line,
would work?
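
As a minimal sketch of the $0 suggestion, assuming each record really is
one unbroken line: the field widths can be kept in a list and walked
with a running offset, instead of hand-writing every substr call. The
function name split_fixed and the sample record are made up for
illustration; the widths 1, 6, 10 and 8 are the first four taken from
the substr calls in the original post.

```shell
# Hypothetical helper: cut each input line ($0) at fixed column widths
# and join the pieces with semicolons.
# $1 is a space-separated list of widths (illustrative values below).
split_fixed() {
    awk -v widths="$1" '
    BEGIN { n = split(widths, w, " "); OFS = ";" }
    {
        pos = 1          # 1-based start column of the current field
        out = ""
        for (i = 1; i <= n; i++) {
            # substr on $0 sees the whole line, including any blanks
            out = out (i > 1 ? OFS : "") substr($0, pos, w[i])
            pos += w[i]
        }
        print out
    }'
}

# Made-up sample record: 1 + 6 + 10 + 8 = 25 characters
printf 'A123456abcdefghij20001015\n' | split_fixed "1 6 10 8"
```

To keep only the records that start with "A", as the original script
does, a pattern such as substr($0,1,1) == "A" could be placed in front
of the main action.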

---
Bob McGowan
i'm:  bob dot mcgowan at artecon dot com

Quote:
-----Original Message-----

Posted At: Wednesday, April 29, 1998 10:13 AM
Posted To: awk
Conversation: Parsing very long Strings
Subject: Parsing very long Strings




Sun, 15 Oct 2000 03:00:00 GMT  

: Hello,

: I have a question about parsing a very long string into about 30 fields
: delimited by a semicolon.  This UNIX flat file lists each record on

Have you tried using gawk rather than awk? It may help.
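
One possible reason gawk may help is its FIELDWIDTHS variable, which
tells gawk to split each record at fixed column widths rather than on
white space. A minimal sketch, assuming gawk is installed; the function
name split_fieldwidths and the sample record are hypothetical, and the
widths are the first four from the substr calls in the original post:

```shell
# Hypothetical helper; requires gawk (FIELDWIDTHS is a gawk extension).
# $1 is a space-separated list of column widths.
split_fieldwidths() {
    gawk -v widths="$1" '
    BEGIN { FIELDWIDTHS = widths; OFS = ";" }
    { $1 = $1; print }   # assigning a field rebuilds $0 using OFS
    '
}

# Example call with a made-up 25-character record:
#   printf "A123456abcdefghij20001015\n" | split_fieldwidths "1 6 10 8"
```

With this approach the full list of widths from the format
specification would be passed as a single string, and no substr calls
are needed.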

--
Ian Stirling.   Designing a linux PDA, see  http://www.mauve.demon.co.uk/
----- ******* If replying by email, check notices in header ******* -----
He who lives in a glass house should not invite he who is without sin.



Mon, 16 Oct 2000 03:00:00 GMT  
 