My first awk program 

Hi

I'm trying to get the hang of awk.  I always seem to find myself having
to do some text manipulation tasks, usually with my favourite tool
(Fortran), which is not really suited to that task.

So I set myself the task of sorting processes by memory use, nicely
tabulating it as I go.  There are probably many better ways of doing
the same thing, but it's a non-trivial task.  I'm supposing I start
with the output from "ps ax -opid,vsz,rssize,comm " on my Digital Unix
box.  I installed GNU awk along the way (i.e. I did read the FAQ!).

% ps ax -opid,vsz,rssize,comm | tail -5
 21860 1.70M 104K man
 22018 2.09M 152K sh
 23202 2.83M 1016 tcsh
 25416 1.80M 248K ps
 25594 1.62M 144K tail

Not really suited to a numeric sort on field 2 is it?  So I

% cat /tmp/awk_script
{i1=gsub("M", "" , $2)}
i1 > 0 {$2 = $2 * 1024 * 1024}
{i1=gsub("K", "" , $2)}
i1 > 0 {$2 = $2 * 1024}
{i1=gsub("M", "" , $3)}
i1 > 0 {$3 = $3 * 1024 * 1024}
{i1=gsub("K", "" , $3)}
i1 > 0 {$3 = $3 * 1024}
{printf ("%25s %12.0f %12.0f %12s\n",$4,$2,$3,$1)}

and now I use:

% ps ax -opid,vsz,rssize,comm | sed 1d | gawk -f ./awk_script \
| sort -n +2 | tail
                  rlogind      1709179       196608        22961
               xdaliclock      3795845       229376        18550
                       ps      1887437       253952        25657
                     more      2338324       319488        20348
                   xbuffy      4708106       368640        14811
                 kloadsrv       942080       458752            3
                     nsrd      3166700       565248          448
                     ctwm      4697620       819200        18052
                   kernel     75078042      4089446            0
                     Xdec     39426458      5976883          482
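
A side note on the sort key: "+2" is the old zero-based way of saying "skip two fields", and some current sort implementations may no longer accept it. A minimal sketch of the same sort with the POSIX -k option, assuming the four-column layout printed above (the function name sort_by_rss is made up for illustration):

```shell
# Sort lines numerically on the third whitespace-separated field (rssize).
sort_by_rss() {
    # -k 3,3n restricts the key to field 3 and compares it as a number
    sort -k 3,3n
}

# Two sample rows taken from the output above, deliberately out of order
printf '%s\n' \
    'ps 1887437 253952 25657' \
    'rlogind 1709179 196608 22961' | sort_by_rss
```

In the pipeline above, this would take the place of the "sort -n +2" stage.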

How did I do?  Basically I just change <xxx>K to xxx*1024 and <yyy>M
to yyy*1024*1024.  I succeeded, but maybe there is a much "neater"
way?  Without the need for the variable i?  Anyone care to better my
effort?

Cheers
Kev



Thu, 01 Mar 2001 03:00:00 GMT  

Not bad for a first effort.

Here's what I might have written for my awk script:

BEGIN{k=1024;m=k*k}
$2 ~ "K" {sub(/K/,"",$2);$2*=k}
$2 ~ "M" {sub(/M/,"",$2);$2*=m}
$3 ~ "K" {sub(/K/,"",$3);$3*=k}
$3 ~ "M" {sub(/M/,"",$3);$3*=m}
{printf "%25s %12d %12i %12s\n",$4,$2,$3,$1}

Figuring out the differences for yourself will be instructive for you.

Chuck Demas
Needham, Mass.

--
  Eat Healthy    |   _ _   | Nothing would be done at all,

  Die Anyway     |    v    | That no one could find fault with it.



Thu, 01 Mar 2001 03:00:00 GMT  

: Not really suited to a numeric sort on field 2 is it?  So I

What about

awk 'BEGIN{RS="[ \n]"}
/M$/{$0=substr($0,1,length-1)*1048960}
/K$/{$0=substr($0,1,length-1)*1024}
//'

Or
gawk '{
# Decimal approximation, assuming two decimal places: 1.70M -> 1700000, 104K -> 104000
$0 = gensub(/([0-9]+)\.([0-9][0-9])M/, "\\1\\20000", "g")
$0 = gensub(/([0-9]+)K/, "\\1000", "g")
print
}'

Note, with gawk at least, with the first, you can use
....
RT=="\n"{eol=1}
RT==" "{eol=0}
eol{print;next}
!eol&&/fish/{printf "Fish!"}
!eol{printf " %s ", $0}

Which can occasionally work out more concisely than endless for loops,
iterating over $0.

--
See http://www.mauve.demon.co.uk/    |Linux PDA, cheap electronics/PC bits sale.
See_header,_for_UCE_policy___________|_____________________________Ian_Stirling.
Among a man's many good possessions, A good command of speech has no equal.



Thu, 01 Mar 2001 03:00:00 GMT  

Quote:

> {i1=gsub("M", "" , $2)}
> i1 > 0 {$2 = $2 * 1024 * 1024}
> ...

> BEGIN{k=1024;m=k*k}
> $2 ~ "K" {sub(/K/,"",$2);$2*=k}
> $2 ~ "M" {sub(/M/,"",$2);$2*=m}
> ...

> awk 'BEGIN{RS="[ \n]"}
> /M$/{$0=substr($0,1,length-1)*1048960 [sic]}
> /K$/{$0=substr($0,1,length-1)*1024}
> ...

There's no need to explicitly remove the "K" and "M" suffixes from
the strings that represent the rssize (rsz) and vsize (vsz) values.
Here's why:

     The numeric value of a string is the value of the longest
     prefix of the string that looks numeric. (p. 45)

     The numeric value of an arbitrary string is the numeric value
     of its numeric prefix. (p. 192)

     (From _The AWK Programming Language_, ISBN 0-201-07981-X)
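
The rule is easy to try out. A small sketch, using string constants in place of the ps fields (the wrapper name prefix_demo is only there for illustration):

```shell
# Show how awk converts strings with a trailing unit letter to numbers.
prefix_demo() {
    awk 'BEGIN {
        print "1.70M" + 0      # numeric prefix is 1.70
        print "104K" * 1024    # 104 * 1024
        print "tcsh" + 0       # no numeric prefix at all
    }'
}
prefix_demo
```

The three lines printed should be 1.7, 106496 and 0: the suffix is simply ignored in arithmetic, and a string with no leading number counts as zero.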

Quote:

> How did I do? ... I succeeded, but maybe there is a much "neater"
> way? Without the need for the variable i? Anyone care to better my
> effort?

I typically reduce problems such as this one to, in essence, a single
print statement. The relationship of input line to output line is
one-to-one, and awk automagically takes care of the input tasks
for you (opening files and streams, reading and parsing lines of
text, etc.) Intuitively, for as simple a data transformation as this,
no explicit variable assignments should be necessary. Here's a simple
solution based on the input and "first awk script" that Kevin posted:

$ cat ps.awk
{ printf("%25s %12d %12d %12s\n", $4, ByteSize($2), ByteSize($3), $1) }

function ByteSize(n) {
    if (n ~ /K$/)
        return int(((n * 1024) + 500) / 1000) * 1000
    if (n ~ /M$/)
        return int(((n * 1048576) + 5000) / 10000) * 10000
    return n + 0
}

$ cat ps.txt
 21860 1.70M 104K man
 22018 2.09M 152K sh
 23202 2.83M 1016 tcsh
 25416 1.80M 248K ps
 25594 1.62M 144K tail
$ awk -f ps.awk ps.txt | sort -k 2n
                     tail      1700000       147000        25594
                      man      1780000       106000        21860
                       ps      1890000       254000        25416
                       sh      2190000       156000        22018
                     tcsh      2970000         1016        23202
$

The user-defined function ByteSize() is full of magic numbers, but,
for such a simple script, that's OK. (Again, no unnecessary variable
assignments.) Also, I round the integers to maintain the proper
precision. Notice that I perform arithmetic with strings such as
"1.70M" and "144K". Ain't awk cool!

I recommend, Kevin, that you peruse the man page ps(1). I'm not a
Unix system administrator and I don't use ps often, but a quick
check of ps(1) on my local system revealed many options for
controlling the content and format of its output, including this
one:

     -m      Sort by memory usage, instead of by process ID.

Good luck.

--
Jim Monty

Tempe, Arizona USA



Mon, 05 Mar 2001 03:00:00 GMT  

Quote:

>> {i1=gsub("M", "" , $2)}
>> i1 > 0 {$2 = $2 * 1024 * 1024}
>> ...


>> BEGIN{k=1024;m=k*k}
>> $2 ~ "K" {sub(/K/,"",$2);$2*=k}
>> $2 ~ "M" {sub(/M/,"",$2);$2*=m}
>> ...


>> awk 'BEGIN{RS="[ \n]"}
>> /M$/{$0=substr($0,1,length-1)*1048960 [sic]}
>> /K$/{$0=substr($0,1,length-1)*1024}
>> ...

>There's no need to explicitly remove the "K" and "M" suffixes from
>the strings that represent the rssize (rsz) and vsize (vsz) values.
>Here's why:

>     The numeric value of a string is the value of the longest
>     prefix of the string that looks numeric. (p. 45)

>     The numeric value of an arbitrary string is the numeric value
>     of its numeric prefix. (p. 192)

>     (From _The AWK Programming Language_, ISBN 0-201-07981-X)

Wow!!! You're right.  Thanks!!!  Maybe I should buy/read the book.

BEGIN{k=1024;m=k*k}
$2 ~ "K" {$2*=k}
$2 ~ "M" {$2*=m}
$3 ~ "K" {$3*=k}
$3 ~ "M" {$3*=m}
{printf "%25s %12d %12i %12s\n",$4,$2,$3,$1}

on

21860 1.70M 104K man
22018 2.09M 152K sh
23202 2.83M 1016 tcsh
25416 1.80M 248K ps
25594 1.62M 144K tail

produced:

                      man      1782579       106496        21860
                       sh      2191523       155648        22018
                     tcsh      2967470         1016        23202
                       ps      1887436       253952        25416
                     tail      1698693       147456        25594

Chuck Demas
Needham, Mass.

--
  Eat Healthy    |   _ _   | Nothing would be done at all,

  Die Anyway     |    v    | That no one could find fault with it.



Mon, 05 Mar 2001 03:00:00 GMT  

Quote:

> Wow!!! You're right. Thanks!!! Maybe I should buy/read the book.

Uh, yeah! ;-)

Here's my "Customer Review" of _The AWK Programming Language_ on Amazon.com:

   <http://www.amazon.com/exec/obidos/ASIN/020107981X/qid%3D906081469/
   002-2148869-6346855>

   Customer Comments

   Average Customer Review: 5 out of 5 stars
   Number of Reviews: 5


   5 out of 5 stars

   The book that defines the language.

   Chapter 2 of The AWK Programming Language serves as the formal
   definition of the language, written by its creators. There is no other
   published standard or official document. So this book is the primary,
   definitive resource for all issues related to the awk programming
   language in general. Other awk resources (e.g., the GNU awk manual)
   should be regarded as secondary and treated as subordinate to The AWK
   Programming Language. It is especially important for contributors to
   the Usenet newsgroup comp.lang.awk to bear this in mind when posting
   general, version-independent information about awk to that newsgroup.

   Owning this book is a prerequisite to becoming an expert awk
   programmer. Though there are other resources available for learning
   awk, both in print and on the Internet, none of them are as succinct
   and as straightforward as The AWK Programming Language.

All comments/criticisms are welcome.

--
Jim Monty

Tempe, Arizona USA



Tue, 06 Mar 2001 03:00:00 GMT  

Quote:

>    Chapter 2 of The AWK Programming Language serves as the formal
>    definition of the language, written by its creators. There is no other
>    published standard or official document. So this book is the primary,

No other published standard? Arnold Robbins worked on the POSIX
standard for AWK. This is a published standard. Have a look at

  http://www.rdg.opengroup.org/unix/online.html

Quote:
>    Owning this book is a prerequisite to becoming an expert awk
>    programmer.

Reading it is also a prerequisite to becoming an expert awk programmer.
Understanding it is an even better way to become an expert awk programmer.

+---------------------------------------------------------------------+
| Juergen Kahrs,       STN Atlas Elektronik GmbH,   D-28305 Bremen    |
| Simulation Division  Sebaldsbruecker Heerstr. 235 +49/421/457-2819  |
+----------- http://home.t-online.de/home/Juergen.Kahrs/ -------------+



Tue, 06 Mar 2001 03:00:00 GMT  

Quote:


> > Chapter 2 of The AWK Programming Language serves as the formal
> > definition of the language, written by its creators. There is no other
> > published standard or official document.

> No other published standard? Arnold Robbins worked on the POSIX
> standard for AWK. This is a published standard. Have a look at

>   http://www.rdg.opengroup.org/unix/online.html

I stand corrected. And thank you for the pointer to The Single UNIX
Specification, Version 2.

I do feel that _The AWK Programming Language_ is still, despite the
existence of the POSIX specification, the _de facto_ standard for the
language. Am I wrong? Given the current state of the language and its
many versions and flavors, is it more practical to regard the book or
the POSIX specification as the definitive guide to what awk is and how
it ought to work? How closely does the POSIX standard conform to what
is described in _The AWK Programming Language_?

Quote:
> > Owning this book is a prerequisite to becoming an expert awk
> > programmer.

> Reading it is also a prerequisite to becoming an expert awk programmer.
> Understanding it is an even better way to become an expert awk programmer.

Good points. By suggesting that one should OWN the book, I was trying
to emphasize the importance of having _The AWK Programming Language_
handy as a reference when writing awk scripts and when answering
inquiries posted to comp.lang.awk. To put it bluntly, I think those
who post expert answers to comp.lang.awk should own TAPL, read TAPL,
and refer to TAPL before posting. But that's just my opinion.

--
Jim Monty

Tempe, Arizona USA



Tue, 06 Mar 2001 03:00:00 GMT  

Quote:

> I do feel that _The AWK Programming Language_ is still, despite the
> existence of the POSIX specification, the _de facto_ standard for the
> language. Am I wrong? Given the current state of the language and its
> many versions and flavors, is it more practical to regard the book or
> the POSIX specification as the definitive guide to what awk is and how
> it ought to work?

I am not so happy with the POSIX spec. Just 3 points I noticed:

  - Arnold Robbins mentions a strange way numeric strings are
    compared in POSIX (see his Ref. Card). I did not understand
    the problem, but he says that POSIX is wrong there and none
    of the three free AWKs obeys POSIX in this case.
  - Interval expressions are an extremely useful extension of the
    regular expressions. POSIX has them, but only GNU AWK seems to
    implement them. With interval expressions, I managed to read
    a given number of bytes from a binary file, which is not trivial.
  - The above mentioned Unix Spec (XPG4) says that the POSIX spec
    of AWK has some problems and will be changed some time.
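
For readers who have not met them: an interval expression is the {n} or {n,m} repeat count in a regular expression. A small sketch for checking whether a given awk honours them (the function name check_intervals is invented here; pass it whichever awk binary you want to test):

```shell
# Report whether the awk named in $1 treats {3} as a repeat count.
check_intervals() {
    "$1" 'BEGIN {
        # With interval support /^a{3}$/ matches exactly three a characters;
        # without it, the braces are taken literally and "aaa" does not match.
        if ("aaa" ~ /^a{3}$/)
            print "intervals supported"
        else
            print "no interval support"
    }'
}
check_intervals awk
```

Older gawk releases needed the --re-interval option to enable this.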

Quote:
> How closely does the POSIX standard conform to what
> is described in _The AWK Programming Language_?

From what I read, there seem to be many subtle differences and
a few significant extensions in the POSIX spec. But I am probably
not the most competent one to be asked.

Quote:
> inquiries posted to comp.lang.awk. To put it bluntly, I think those
> who post expert answers to comp.lang.awk should own TAPL, read TAPL,
> and refer to TAPL before posting. But that's just my opinion.

You are right. But from observing my own behaviour I can tell you that
I am always tempted to underestimate TAPL because it is such a small
grey book.

________________________________________________________________________

Juergen Kahrs                                       Tel.  0421  249 666
Millstaetter Strasse 15                             Tel.  0421  457 2819
D 28359 Bremen                                      Fax   0421  457 3578
____________ http://home.t-online.de/home/Juergen.Kahrs/ _______________



Wed, 07 Mar 2001 03:00:00 GMT  
 