Summarize Columnar Data 

As a struggling awk novice I have been unable to fully
grasp how to use arrays.  I have a "simple" problem that
hopefully has a simple solution.

I often work with large data files that typically contain
10,000 to 200,000 records consisting of a list of "items"
and their associated properties, such as the length of each
"item".  In these large data files there may be 10 to 100
unique "items" that are sprinkled throughout the file.
I would like to read and process these files to create a
summary list that contains the following information:

(1)  a numerical count of each "unique" item
(2)  the total number of all "unique" items
(3)  the total length of each "unique" item
(4)  the total length of all items

This is my current awk script where the "item" is in
Column 5 and the length is in Column 4:

  {     items[$5]++
   if ($4 >= 0 )
       sum += $4

}

END  {
     print "--------------------------------------"
     print "INPUT FILENAME =      ",    FILENAME
     print "--------------------------------------"
     print " "
     print "  Item           Number          Item "
     print "  Name           Samples        Length"
     print "======================================"
#
    for ( each_unit in items )
#
    printf "%-10s %10.0f %15.2f\n", each_unit, items[each_unit], length | "sort"
    print "======================================"
    printf "%5s %15.0f%16.2f\n","Total", NR, sum

}

My current awk script does everything but tabulate the length
of each unique item.  Right now the variable named length in
my printf statement simply fills in a 0.00 value in my table because I
do not know how to accumulate the "length" of each item.

I would certainly appreciate any help that I can get.

Sincerely,





Sat, 03 Aug 2002 03:00:00 GMT  
% As a struggling awk novice I have been unable to fully
% grasp how to use arrays.  I have a "simple" problem that
% hopefully has a simple solution.

[...]

%   {     items[$5]++
%    if ($4 >= 0 )
%        sum += $4
% }
% END  {
[...]

%     for ( each_unit in items )
% #
%     printf "%-10s %10.0f %15.2f\n", each_unit, items[each_unit], length | "sort"

[aside -- you need to close this sort for it to print where you want it to.
   Put this right after the loop, before you print the closing rule and totals:
   close("sort")
]
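
To illustrate the aside, here is a toy sketch of how close() affects ordering. The function name sorted_then_footer and the sample words are made up for the demonstration and are not from the original script. Lines piped to "sort" only appear once sort has seen the end of its input, so closing the pipe before printing the footer keeps the footer at the bottom.

```sh
# Toy demonstration of close() on an output pipe; all names are illustrative.
sorted_then_footer() {
    awk 'BEGIN {
        n = split("pear apple fig", names, " ")
        for (i = 1; i <= n; i++)
            print names[i] | "sort"   # rows are handed to sort, not printed yet
        close("sort")                 # wait for sort to finish and emit its rows
        print "== end =="             # printed only after the sorted rows
    }'
}
```

Calling sorted_then_footer should list the three words in sorted order followed by the footer line. Without the close("sort") call, sort would normally not write its rows until awk exits, so the footer would be likely to come out first.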

% My current awk script does everything but tabulate the length
% of each unique item.  Right now the variable named length in
% my printf statement simply fills in a 0.00 value in my table because I
% do not know how to accumulate the "length" of each item.

I don't know if I understand the problem, but the simple solution is to
use a second array for the length.
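
A minimal sketch of the second-array idea, assuming (as in the original script) that the item name is in column 5 and the length in column 4. The array name lengths and the shell wrapper summarize are names chosen here for illustration. Note that length is awk's built-in string-length function, so it cannot serve as a variable holding the per-item total.

```sh
# summarize FILE...: per-item count and total length.
# Assumes item name in column 5 and length in column 4, as in the question.
summarize() {
    awk '
    {
        items[$5]++                  # number of records for this item
        if ($4 >= 0) {
            lengths[$5] += $4        # second array: running length per item
            sum += $4                # grand total over all items
        }
    }
    END {
        for (each_unit in items)
            printf "%-10s %10.0f %15.2f\n", each_unit, items[each_unit], lengths[each_unit] | "sort"
        close("sort")                # emit the sorted rows before the totals
        printf "%5s %15.0f%16.2f\n", "Total", NR, sum
    }' "$@"
}
```

Both arrays are indexed by the same key ($5), so inside the for loop lengths[each_unit] is the total that belongs to the count in items[each_unit]. The headings from the original END block can be added back above the loop. Because awk may buffer its own output, calling fflush() after the headings and before the first line is piped to sort is a reasonable precaution to keep them above the sorted rows.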
--

Patrick TJ McPhee
East York  Canada



Sun, 04 Aug 2002 03:00:00 GMT  
 