file parsing challenge 

Greetings all,
I have an interesting problem. I have a file with some irregular
output, and I must change this data into a simple comma-delimited list.
Below are a sample of the input and an example of the results I am
looking for. I have a feeling that this challenge might require both
sed and awk, or perhaps another language like Perl. I am at a bit of a
loss here. Does anyone have any ideas?
Data:
1,10-IDFdk1/01 23
2,5-IDFdk1/04 20 3-IDFdk1/02a 11 /04 11 2-IDFdk1/04 14
3,5-IDFdk1/03 18 /04 18 5-IDFdk1/02a 14
4,6-IDFdk1/01 14 /02 14 4-IDFdk1/03 12 /04 12
5,10-IDFdk1/01 18 /05 18
6,7-IDFdk1/01 20 3-IDFdk1/01 17
Explanation:
- Each result has a zone value, an ss value, and a score. For record 1:
zone=IDFdk1, ss=01, score=23
- Each record can have up to three different valid results, marked by
deciles (record 1 has 1 decile, record 6 has 2, and record 2 has 3).
- The number before each dash is a decile value; the decile values in a
row must add up to 10. In the first row the decile value is 10; in the
last row the decile values are 7 and 3.
- Unfortunately, some of the records contain a tie for one or more of the
deciles (records 2 to 5). I need to separate these tied results and put
each on its own row so I can process them separately. Notice in the
results that the second field (after the first comma) contains either a
1 or a 2. A 2 marks the second result of a tied decile.
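The add-to-10 rule lends itself to a quick sanity check before any reformatting. This is only a minimal sketch, assuming the layout shown under Data (record number, a comma, then space-separated results); check_deciles is a made-up name for illustration, not part of any existing tool.

```shell
# check_deciles: hypothetical helper, not part of the original post.
# Reads records such as "2,5-IDFdk1/04 20 3-IDFdk1/02a 11" from the
# named files (or stdin) and reports every record whose decile values
# do not add up to 10. Prints nothing when all records pass.
check_deciles() {
    awk -F, '
    {
        sum = 0
        n = split($2, tok, " ")
        for (i = 1; i <= n; i++) {
            # A token such as "5-IDFdk1/04" starts a new decile;
            # the number before the dash is the decile value.
            p = index(tok[i], "-")
            if (p > 0)
                sum += substr(tok[i], 1, p - 1)
        }
        if (sum != 10)
            print "record " $1 ": deciles add to " sum
    }' "$@"
}
```

Tied results (tokens such as "/04") carry no dash, so they are skipped and each decile is counted once.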
Desired Results:
1
10,1,IDFdk1,01,23
2
5,1,IDFdk1,04,20
3,1,IDFdk1,02a,11
3,2,IDFdk1,04,11
2,1,IDFdk1,04,14
3
5,1,IDFdk1,03,18
5,2,IDFdk1,04,18
5,1,IDFdk1,02a,14
4
6,1,IDFdk1,01,14
6,2,IDFdk1,02,14
4,1,IDFdk1,03,12
4,2,IDFdk1,04,12
5
10,1,IDFdk1,01,18
10,2,IDFdk1,05,18
6
7,1,IDFdk1,01,20
3,1,IDFdk1,01,17
Thanks for any insight you may have into this challenge,
RZ




Fri, 13 Sep 2002 03:00:00 GMT  

<snip>

Quote:
>Data:
>1,10-IDFdk1/01 23
>2,5-IDFdk1/04 20 3-IDFdk1/02a 11 /04 11 2-IDFdk1/04 14
>3,5-IDFdk1/03 18 /04 18 5-IDFdk1/02a 14
>4,6-IDFdk1/01 14 /02 14 4-IDFdk1/03 12 /04 12
>5,10-IDFdk1/01 18 /05 18
>6,7-IDFdk1/01 20 3-IDFdk1/01 17
>Explanation:

<snip>

Quote:
> Desired Results:
> 1
> 10,1,IDFdk1,01,23
> 2
> 5,1,IDFdk1,04,20
> 3,1,IDFdk1,02a,11
> 3,2,IDFdk1,04,11
> 2,1,IDFdk1,04,14

<snip>

Each input line is a set of hierarchical records, possibly degenerate (a
single record for the line), so parts of each line have to be parsed
more than once: first on the comma, then again on spaces and slashes.
The following awk script worked for me with your sample input data. No
doubt it could be done in a single indecipherable Perl statement with
equally indecipherable command line options.

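BEGIN { FS = OFS = "," }
{
        print $1                        # record number on its own line
        # Flatten field 2 into tokens: "5-IDFdk1", "04", "20", ...
        m = split($2, f, "[ /]+")
        for (i = 1; i <= m; i += 2) {
                if ((p = index(f[i], "-")) > 0) {
                        # "decile-zone" token: start a new decile
                        n = 1
                        x = substr(f[i], 1, p - 1)      # decile
                        y = substr(f[i], p + 1)         # zone
                        ++i                             # step to the ss token
                } else {
                        # no dash: another ss tied within the current decile
                        ++n
                }
                print x, n, y, f[i], f[i + 1]   # decile, tie rank, zone, ss, score
        }
        for (i in f) delete f[i]        # clear the array for the next line
}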
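
To see why stepping through the array two elements at a time works, it may help to look at what split() produces for a single line. The helper below is only an illustrative sketch; show_tokens is a hypothetical name, and it assumes the same input layout as the sample data.

```shell
# show_tokens: hypothetical helper for illustration only.
# Prints each element of the array that split() builds from the second
# comma-separated field, one "index: value" pair per line.
show_tokens() {
    awk -F, '
    {
        # Runs of spaces and slashes all count as one separator,
        # so " /04" yields just the token "04".
        m = split($2, f, "[ /]+")
        for (i = 1; i <= m; i++)
            print i ": " f[i]
    }' "$@"
}
```

Feeding it record 3 of the sample data gives eight elements. Elements 1 and 6 contain a dash and so start a new decile, while element 4 ("04") has no dash and is treated as a tie within the current decile.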



Fri, 13 Sep 2002 03:00:00 GMT  
 