memory usage 

Hello awk users,
Pasted below is a snippet of a script I want to use to
examine an output file from a packet counter.  I have two
class C networks that I need to get counts for.  The output
from the counter gives source and destination IP addresses
and a byte count.  After a week this file is about 350000
lines long.  I originally wanted to process this file once a
month (1.5 million lines?). My problem is that the script
eventually consumes all the memory in my system (56MB real
and 60MB virtual) and quits with an "unable to allocate
memory" error.  I'm using two arrays of 255 elements each.
Once these are allocated, shouldn't memory consumption
stop?  Can anyone clue me in on how to optimize this?

for(x=1; x<=255; x++)
        {
        ip205="208.31.205."  #reset IP address prefix
        ip208="208.31.208."
        {ip205=ip205 x""}    #increment IP addresses
        {ip208=ip208 x""}
        if($2 == ip205){              #compare to input address and
                {bytes205[x] += $4}   #add to total if match
                }
        if($3 == ip205){
                {bytes205[x] += $4}
                }
        if($2 == ip208){
                {bytes208[x] += $4}
                }
        if($3 == ip208){
                {bytes208[x] += $4}
                }
        }

Sample input format (one record per line):
timefield   sourceIPfield  destIPfield  bytesintransaction
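For anyone who wants to reproduce the slow behavior, here is a minimal sketch that runs the loop above as a complete program. It assumes the loop is the script's main action and adds a hypothetical END block to print the non-zero totals. The sample records are made up, with outside hosts drawn from the documentation ranges 192.0.2.0/24 and 198.51.100.0/24. Apart from dropping the excess braces, the loop logic is unchanged.

```shell
#!/bin/sh
# Hypothetical sample records: time, source IP, destination IP, bytes.
sample='t1 208.31.205.10 192.0.2.7 100
t2 192.0.2.7 208.31.205.10 50
t3 208.31.208.3 208.31.205.10 25
t4 192.0.2.8 198.51.100.9 999'

# The per-record loop from the post (excess braces removed) plus an
# assumed END block that prints every address with a non-zero total.
loop_counts() {
  awk '
  {
      for (x = 1; x <= 255; x++) {
          ip205 = "208.31.205." x      # rebuild both candidate addresses
          ip208 = "208.31.208." x
          if ($2 == ip205) bytes205[x] += $4
          if ($3 == ip205) bytes205[x] += $4
          if ($2 == ip208) bytes208[x] += $4
          if ($3 == ip208) bytes208[x] += $4
      }
  }
  END {
      for (x = 1; x <= 255; x++) {
          if (bytes205[x]) print "208.31.205." x, bytes205[x]
          if (bytes208[x]) print "208.31.208." x, bytes208[x]
      }
  }'
}

printf '%s\n' "$sample" | loop_counts
```

Every record pays for all 255 iterations, whether or not either of its addresses is local.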

Thanks,
Parker



Sat, 31 Mar 2001 03:00:00 GMT  
 memory usage


Don't see any problems with the code snippet you gave, although the
excess braces {} could be confusing to newbies and the code appears
quite inefficient.

I would look for problems elsewhere in the script, or check your version
of awk - IIRC early gawk 3.0 implementations suffered from a memory
leak.



Sat, 31 Mar 2001 03:00:00 GMT  
 memory usage
Thanks for the tip; my awk is 3.0.2.  I'll look for a newer
version. Yes, I know there must be a way to make the
comparison run faster. I tried nesting the comparisons in
if ... else if statements so that the second, third and fourth
ifs would not execute if the first one matched, but on a
200000-line test it actually took 1 minute 20 seconds longer
to finish!



Sat, 31 Mar 2001 03:00:00 GMT  
 memory usage

Quote:

> Thanks for the tip, my awk is 3.02.  I'll look for a newer
> version. Yes, I know there must be a way to make the
> comparison run faster. I tried nesting the comparisons in
> IF..ELSE..IF statements so that the second, third and fourth
> IFs would not execute if the first one matched, but on a
> 200000 line test it actually took 1 minute 20 seconds longer
> to finish!

I suppose I should give you some tips to make your program faster:

Original code snippet:

for(x=1; x<=255; x++)
        {
        ip205="208.31.205."  #reset IP address prefix
        ip208="208.31.208."
        {ip205=ip205 x""}    #increment IP addresses
        {ip208=ip208 x""}
        if($2 == ip205){              #compare to input address and
                {bytes205[x] += $4}   #add to total if match
                }
        if($3 == ip205){
                {bytes205[x] += $4}
                }
        if($2 == ip208){
                {bytes208[x] += $4}
                }
        if($3 == ip208){
                {bytes208[x] += $4}
                }
        }

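This performs 255 loop iterations on every record, each with 2 string
concatenations and 4 comparisons, whether or not the record involves
your networks at all.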
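To put rough numbers on that, at the 1.5 million records per month estimated in the original post, the loop body alone works out to the following (a back-of-the-envelope count that ignores everything outside the loop):

$$1.5\times10^{6}\ \text{records} \times 255\ \text{iterations} \times 4\ \text{comparisons} = 1.53\times10^{9}\ \text{comparisons per month}$$

$$1.5\times10^{6}\ \text{records} \times 255\ \text{iterations} \times 2\ \text{concatenations} = 7.65\times10^{8}\ \text{string builds per month}$$

By contrast, two array updates per record come to $3\times10^{6}$ updates per month.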

You could take advantage of the associative array feature of awk & do
this:

    bytes[$2] += $4
    bytes[$3] += $4

Awk arrays can be indexed by strings as well as numbers.  In this
case, you are incrementing the array elements indexed by the 2nd
& 3rd fields.  You may get many extraneous indices, but (barring a bug
in your awk, which was the subject of your original post) awk can handle an
enormous associative array.
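A minimal sketch of the two-line version run on its own, assuming the field layout from the original post (time, source, destination, bytes) and some made-up sample records. Every address seen, local or not, gets an array entry, which is where the extraneous indices come from.

```shell
#!/bin/sh
# Hypothetical sample records: time, source IP, destination IP, bytes.
sample='t1 208.31.205.10 192.0.2.7 100
t2 192.0.2.7 208.31.205.10 50
t3 208.31.208.3 208.31.205.10 25
t4 192.0.2.8 198.51.100.9 999'

# One array update per address field: the IP string itself is the index,
# so there is no 255-step loop and no string building per record.
per_ip_totals() {
  awk '
  { bytes[$2] += $4; bytes[$3] += $4 }
  END { for (ip in bytes) print ip, bytes[ip] }'
}

# for-in order is unspecified, so sort for readable output.
printf '%s\n' "$sample" | per_ip_totals | sort
```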

To extract the counts of interest in your END clause, you could do
something like this:

for (i in bytes)
        if (match(i,"208.31.205|8"))  print i, bytes[i];

These will come out in an unspecified order, since awk stores an
associative array in a hash table, so you can just pipe the output through
sort to get them in sorted order.
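One way to combine the END filter with the sort step is sketched below. It assumes the same field layout as the original post and uses invented sample records. It also swaps the regular expression for a plain prefix test with index(), which sidesteps regex metacharacters entirely, and it sorts numerically on each dot-separated octet so that .9 comes before .10 (the -t and -k options are standard POSIX sort).

```shell
#!/bin/sh
# Hypothetical sample records: time, source IP, destination IP, bytes.
sample='t1 208.31.205.10 192.0.2.7 100
t2 192.0.2.7 208.31.205.10 50
t3 208.31.208.3 208.31.205.10 25
t4 192.0.2.8 198.51.100.9 999
t5 208.31.205.9 192.0.2.7 5'

# Accumulate per address, then print only the two class C networks.
# index(ip, prefix) == 1 means "ip starts with prefix" (no regex involved).
local_totals() {
  awk '
  { bytes[$2] += $4; bytes[$3] += $4 }
  END {
      for (ip in bytes)
          if (index(ip, "208.31.205.") == 1 || index(ip, "208.31.208.") == 1)
              print ip, bytes[ip]
  }'
}

# Split on dots and sort each octet numerically (so .9 precedes .10).
printf '%s\n' "$sample" | local_totals | sort -t. -k1,1n -k2,2n -k3,3n -k4,4n
```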

Hope this helps.



Sat, 31 Mar 2001 03:00:00 GMT  
 memory usage
Yes, thanks, I'll try that optimization.  The latest version
I could find was 3.0.3, which seems to have solved the memory
leak problem. The script was able to process all 350000 lines of
my test file.

thanks again,
Parker




Sat, 31 Mar 2001 03:00:00 GMT  
 memory usage

Quote:

> To extract the counts of interest in your END clause, you could do
> something like this:

> for (i in bytes)
>    if (match(i,"208.31.205|8"))
>     print i, bytes[i];

A Useless Use Of Match and a flawed regular expression.

It's only necessary to use the match function when one needs the values
that it returns and sets; namely, the position in the string where the
regular expression matches (or 0 if it doesn't), RSTART, and RLENGTH.
For the more usual case of just needing to know "Does this string
match this pattern?", the match operator (~) suffices:

    for (i in bytes)
        if (i ~ /^208\.31\.20[58]/)
            print i, bytes[i]
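To see the difference in practice, here is a small sketch (the sample string is made up) showing what match() returns and sets, next to the plain ~ test:

```shell
#!/bin/sh
# match(s, re) returns the 1-based start of the match (0 if none) and sets
# RSTART and RLENGTH; s ~ re only yields true or false.
demo_match() {
  awk 'BEGIN {
      s = "src=208.31.205.10"
      pos = match(s, /208\.31\.20[58]/)
      print pos, RSTART, RLENGTH
      print substr(s, RSTART, RLENGTH)
      if (s ~ /208\.31\.20[58]/) print "matches"
  }'
}

demo_match
```

This prints `5 5 10`, then the matched text `208.31.205`, then `matches`. When only the last line is needed, ~ is all it takes.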

The regular expression

    208.31.205|8

is implicitly the same as

    (208.31.205|8)

and will match all of these strings:

    8
    2089319205
    foo8bar

You probably intended to express this:

    208\.31\.20(5|8)

But this pattern will still match these strings:

    123.208.31.205
    987.208.31.208

So it's important to anchor the pattern to the beginning of the
string, like this:

    ^208\.31\.20[58]

Note that, here, I've used the character class [58] instead of the
alternate single-character patterns (5|8).
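A quick sketch that runs the three patterns discussed above over the example strings, plus one genuine local address (208.31.205.10) added for contrast. Each column prints 1 for a match and 0 otherwise, in the order: flawed, escaped but unanchored, anchored.

```shell
#!/bin/sh
# Run each pattern from the discussion over the example strings and print
# 1 (match) or 0 (no match) for: flawed, escaped but unanchored, anchored.
regex_check() {
  printf '%s\n' 8 2089319205 foo8bar 123.208.31.205 987.208.31.208 \
      208.31.205.10 |
  awk '{
      flawed   = ($0 ~ /208.31.205|8/)        # alternation: left side OR a lone 8
      unanch   = ($0 ~ /208\.31\.20(5|8)/)    # dots escaped, not anchored
      anchored = ($0 ~ /^208\.31\.20[58]/)    # must start at the beginning
      print $0, flawed, unanch, anchored
  }'
}

regex_check
```

Only the anchored pattern rejects every string except the real local address.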

--
Jim Monty

Tempe, Arizona USA



Sat, 31 Mar 2001 03:00:00 GMT  
 memory usage
Quote:

<snip>

> A Useless Use Of Match and a flawed regular expression.

<snip>

Thank you for the clarification, but please, please, please, no UUO*;
I ran into enough of that in comp.unix.admin.



Sat, 31 Mar 2001 03:00:00 GMT  
 memory usage

Quote:

> [...] Can anyone clue me in on how to optimize this?

    $2 ~ /^208\.31\.20[58]/ { bytes[$2] += $4 }

    $3 ~ /^208\.31\.20[58]/ { bytes[$3] += $4 }

    END { for (ip in bytes) print ip, bytes[ip] }

This is a barebones script intended to suggest a more awk-ish
("match a pattern, do an action") solution to your problem.
It uses the stuff of which most good awk scripts are made, namely
regular expression pattern matching and associative arrays, both
of which Jim Mellander has already suggested to you.
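A minimal sketch of running the script above over a month of logs in one pass, assuming the weekly counter output is kept in separate files (the file names and records here are hypothetical). The END block prints bytes[ip], the per-address total. Memory use grows only with the number of distinct matching addresses, not with the number of records read.

```shell
#!/bin/sh
# Hypothetical weekly logs: time, source IP, destination IP, bytes.
dir=$(mktemp -d)
printf '%s\n' 't1 208.31.205.10 192.0.2.7 100' \
    't2 192.0.2.7 208.31.205.10 50' > "$dir/week1.log"
printf '%s\n' 't3 208.31.208.3 208.31.205.10 25' \
    't4 192.0.2.8 198.51.100.9 999' > "$dir/week2.log"

# The pattern-action script from the post, saved as a program file.
cat > "$dir/count.awk" <<'EOF'
$2 ~ /^208\.31\.20[58]/ { bytes[$2] += $4 }
$3 ~ /^208\.31\.20[58]/ { bytes[$3] += $4 }
END { for (ip in bytes) print ip, bytes[ip] }
EOF

# awk reads the files one after another, so the totals span all weeks.
totals=$(awk -f "$dir/count.awk" "$dir/week1.log" "$dir/week2.log" | sort)
printf '%s\n' "$totals"
rm -rf "$dir"
```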

--
Jim Monty

Tempe, Arizona USA                      AVOID CLICHES LIKE THE PLAGUE!



Sat, 31 Mar 2001 03:00:00 GMT  
 