help with a tricky awk problem... 
Author Message
 help with a tricky awk problem...

I need some help consolidating a very large data file with awk. The
original file contains multiple observations for each day of the year.
For example:

DATE,HOUR,DATA_1,DATA_2,DATA_3
MAR-1-1994,100,0.003,0.00,4.325
MAR-1-1994,200,0.456,1.213,5.302
MAR-1-1994,300,0.000,0.00,0.293
MAR-2-1994,100,0.214,0.002,3.499
MAR-2-1994,200,0.968,0.023,0.100
MAR-2-1994,300,0.000,0.321,0.000

etc...

I am fairly comfortable grabbing and printing column data, but I need to
perform three different tasks on these observations that go well beyond
that. My first need is to print the sum of all DATA_1 observations for
each day (so that I have a single observation for each day). The second
goal is to print the largest number for each day from DATA_2 (again,  so
that I have a single observation for each day). The third goal is to
print the average value for each day from DATA_3.

The final product for this example should look something along the lines
of:

DATE,DATA_1,DATA_2,DATA_3
MAR-1-1994,0.459,1.213,3.3067
MAR-2-1994,1.182,0.321,1.1997

Any assistance with these issues is much appreciated,

Thanks,

Kirk

--
Kirk R. Wythers
Department of Forest Resources
University of Minnesota
1530 N. Cleveland Ave.
St. Paul, MN 55108
Tel: 612.625.2261
Fax: 612 625.5212



Sat, 23 Mar 2002 03:00:00 GMT  
 help with a tricky awk problem...


Quote:

>--------------D1EEC69DF67D108161848114
>Content-Type: text/plain; charset=us-ascii
>Content-Transfer-Encoding: 7bit

Please don't post this garbage.
(Helpful stuff is down at the end)

Quote:
>I need some help consolidating a very large data file with awk. The
>original file contains multiple observations for each day of the year.
>For example:

>DATE,HOUR,DATA_1,DATA_2,DATA_3
>MAR-1-1994,100,0.003,0.00,4.325
>MAR-1-1994,200,0.456,1.213,5.302
>MAR-1-1994,300,0.000,0.00,0.293
>MAR-2-1994,100,0.214,0.002,3.499
>MAR-2-1994,200,0.968,0.023,0.100
>MAR-2-1994,300,0.000,0.321,0.000

>etc...

>I am fairly comfortable grabbing and printing column data, but I need to
>perform three different tasks on these observations that go well beyond
>that. My first need is to print the sum of all DATA_1 observations for
>each day (so that I have a single observation for each day). The second
>goal is to print the largest number for each day from DATA_2 (again,  so
>that I have a single observation for each day). The third goal is to
>print the average value for each day from DATA_3.

>The final product for this example should look something along the lines
>of:

>DATE,DATA_1,DATA_2,DATA_3
>MAR-1-1994,0.459,1.213,3.3067
>MAR-2-1994,1.182,0.321,1.1997

Assuming your file is sorted on date, and that there are no commas in the
data (i.e., comma is a "good" field delimiter) the method would be:

BEGIN {OFS=FS=","}
$1 == "DATE" { next } # Is this necessary?
$1 != old1 { p(); tot3 = max4 = tot5 = num = 0 ; old1 = $1 }
{
tot3 += $3
if ($4 > max4) max4 = $4
tot5 += $5
num++

Quote:
}

END {p()}
function p() {
        if (!old1) return
        print old1,tot3,max4,tot5/num
        }


Sat, 23 Mar 2002 03:00:00 GMT  
 help with a tricky awk problem...


% I need some help consolidating a very large data file with awk. The
% original file contains multiple observations for each day of the year.
% For example:
%
% DATE,HOUR,DATA_1,DATA_2,DATA_3
% MAR-1-1994,100,0.003,0.00,4.325
% MAR-1-1994,200,0.456,1.213,5.302
% MAR-1-1994,300,0.000,0.00,0.293
% MAR-2-1994,100,0.214,0.002,3.499
%
% etc...

[...summing data_1, taking the max of data_2, and averaging data_3...]

% The final product for this example should look something along the lines
% of:
%
% DATE,DATA_1,DATA_2,DATA_3
% MAR-1-1994,0.459,1.213,3.3067
% MAR-2-1994,1.182,0.321,1.1997

Assuming the file is sorted by date, it's easy:
 BEGIN {
    OFS = FS = ","
    date = ""
    # take care of the header line
    getline
    print $1,$3,$4,$5
 }

 # for each new date, print out and re-initialise
 $1 != date {
     print_stuff();
     date = $1
     sum = max = avg = count = 0
 }
 # for each row, do the calculations
 {
   sum += $3
   if ($4 > max) max = $4
   avg += $5
   count++
 }
 # at the end, print what's left
 END { print_stuff() }

 function print_stuff() {
    if (date != "") {
       print date,sum,max,avg/count
    }
 }

If the data isn't sorted, you could stick it in an array indexed on date,
but that will be memory intensive and you'll just want to sort the
output anyway.
--

Patrick TJ McPhee
East York  Canada



Sun, 24 Mar 2002 03:00:00 GMT  
 help with a tricky awk problem...


...

Quote:
> BEGIN {
>    OFS = FS = ","
>    date = ""
>    # take care of the header line
>    getline
>    print $1,$3,$4,$5
> }

> # for each new date, print out and re-initialise
> $1 != date {
>     print_stuff();
>     date = $1
>     sum = max = avg = count = 0
> }
> # for each row, do the calculations
> {
>   sum += $3
>   if ($4 > max) max = $4
>   avg += $5
>   count++
> }
> # at the end, print what's left
> END { print_stuff() }

> function print_stuff() {
>    if (date != "") {
>       print date,sum,max,avg/count
>    }
> }

Amazing how much this solution looks like mine - maybe newsserver delay
caused you not to see my post.

I like my variable names better, though...



Sun, 24 Mar 2002 03:00:00 GMT  
 help with a tricky awk problem...

[...]

% Amazing how much this solution looks like mine - maybe newsserver delay
% caused you not to see my post.
%
% I like my variable names better, though...

Still haven't. There are some problems with a new news server at this end...
--

Patrick TJ McPhee
East York  Canada



Mon, 25 Mar 2002 03:00:00 GMT  
 
 [ 5 post ] 

 Relevant Pages 

1. A simple but tricky problem

2. tricky problem

3. A tricky resizing problem

4. Tricky createVRMLfromString problem

5. Tricky backtracking problem.

6. A tricky buttons problem

7. A tricky problem of buttons

8. Two tricky problems in a C application using Tk.

9. tricky problem with "place" command

10. Need help on a tricky thingy.

11. Arrays in awk/awk help please!

12. Help with Awk, totally new to AWK programing

 

 
Powered by phpBB® Forum Software