Can AWK 

If I have an input file with various numbers in it, eg.

abc,def,1234.56
eee,ggg,6789.9

I'd like to be able to format the data so that it looks like:

abc,def,1234.56
eee,ggg,6789.90

In other words, I need to search all the numbers in a document and
then add a 0 to all the numbers that don't have two decimal positions.

Can I do that with AWK?

Thanks

--
Luis
(remove * to reply by mail)



Sun, 17 Aug 2003 21:40:12 GMT  
 Can AWK

Quote:

>If I have an input file with various numbers in it, eg.

>abc,def,1234.56
>eee,ggg,6789.9

>I'd like to be able to format the data so that it looks like:

>abc,def,1234.56
>eee,ggg,6789.90

>In other words, I need to search all the numbers in a document and
>then add a 0 to all the numbers that don't have two decimal positions.

/\.[0-9]$/ { $0 = $0 "0" }
77
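
For readers unfamiliar with the idiom: the bare 77 is just a non-zero pattern with no action, so every record is printed. A longhand sketch of the same program follows. The function name pad_last is made up for illustration, and like the original it only touches a number at the very end of the line.

```shell
# pad_last is a hypothetical name, not something from the thread.
pad_last() {
  awk '
    /\.[0-9]$/ { $0 = $0 "0" }   # record ends in a dot plus one digit: append a zero
    { print }                    # same effect as the bare pattern 77
  '
}

printf 'abc,def,1234.56\neee,ggg,6789.9\n' | pad_last
# abc,def,1234.56
# eee,ggg,6789.90
```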


Sun, 17 Aug 2003 21:55:48 GMT  
 Can AWK

Quote:

> If I have an input file with various numbers in it, eg.

> abc,def,1234.56
> eee,ggg,6789.9

> I'd like to be able to format the data so that it looks like:

> abc,def,1234.56
> eee,ggg,6789.90

> In other words, I need to search all the numbers in a document and
> then add a 0 to all the numbers that don't have two decimal positions.

> Can I do that with AWK?

Yes, but you'll have to be more specific.

If every line looks like the above, i.e., two words and a number
separated by commas, you can do it easily like this:

awk -F, '{ printf "%s,%s,%.2f\n",$1,$2,$3 }'

That is easily modified to handle any known and fixed number of
fields. A variable number of comma-separated fields with simple
numbers in various positions is somewhat harder, e.g.,

awk -F, 'BEGIN{ f[0]="%s"; f[1]="%.2f"; c[1]=","; c[0]="\n" }
   { for (i=0; i++<NF;) printf f[$i ~ /[0-9.]+/] c[i<NF], $i }'  

If the numbers can appear in completely arbitrary positions it's much
harder to do: you'll have to consider what you want to do with numbers
that already have more than two decimal digits, with numbers that don't
have a decimal point at all, what should happen with "1.2.3" or "12AB",
whether there can be exponents and how they should be treated, &c.

Any restriction on the data you know will make it easier,
like if it's all comma-separated fields, is the number of fields
known or at least limited, &c.
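
As a rough sketch of the comma-separated case with a stricter test for what counts as a number: the version below assumes a "number" is plain digits with at most one decimal point (no sign, no exponent), so fields like "1.2.3" or "12AB" pass through untouched. The name fmt_fields is hypothetical. Note that numbers with more than two decimals get rounded, which may or may not be what is wanted.

```shell
# fmt_fields is a hypothetical name used only for this sketch.
fmt_fields() {
  awk 'BEGIN { FS = OFS = "," }
  {
    for (i = 1; i <= NF; i++)
      # Assumption: digits with an optional single decimal point, nothing else.
      if ($i ~ /^([0-9]+\.?[0-9]*|\.[0-9]+)$/)
        $i = sprintf("%.2f", $i)
    print
  }'
}

printf 'abc,12AB,6789.9,1.2.3,42\n' | fmt_fields
# abc,12AB,6789.90,1.2.3,42.00
```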

--
Tapani Tarvainen



Sun, 17 Aug 2003 22:44:38 GMT  
 Can AWK

Quote:


>>If I have an input file with various numbers in it, eg.

>>abc,def,1234.56
>>eee,ggg,6789.9

>>I'd like to be able to format the data so that it looks like:

>>abc,def,1234.56
>>eee,ggg,6789.90

>>In other words, I need to search all the numbers in a document and
>>then add a 0 to all the numbers that don't have two decimal positions.

>/\.[0-9]$/ { $0 = $0 "0" }
>77

The OP said _all_ of the numbers, so more thorough/robust to use the opaque

awk 'gsub(/\.[0-9]$/, "&0") || 666' inputfile



Mon, 18 Aug 2003 02:41:24 GMT  
 Can AWK

...

Quote:
>>/\.[0-9]$/ { $0 = $0 "0" }
>>77

>The OP said _all_ of the numbers, so more thorough/robust to use the opaque

>awk 'gsub(/\.[0-9]$/, "&0") || 666' inputfile

I don't see the difference.  Unless I'm missing something - or you are
depending on something implementation-specific (*) - the gsub can match only
once, so I don't see how it could matter.  You're not trying to match
numbers in the middle of the line, are you?

(*) Note that there is some definitional-ambiguity and implementation-variance
about line-rescanning with gsub.  I.e., I'm never quite sure whether:

        gsub(foo,bar)

is exactly equivalent to:

        while (sub(foo,bar));



Mon, 18 Aug 2003 02:52:52 GMT  
 Can AWK

Quote:
> . . . You're not trying to match numbers in the middle of the line, are
> you?

Bingo! I never believe OP's examples are comprehensive, but I do take their
narrative as given.

Quote:
> . . . I'm never quite sure whether:

>gsub(foo,bar)

>is exactly equivalent to:

>while (sub(foo,bar));

They're not the same. Simple test:

echo "x x x x x x x x x x x" | awk '{ gsub(/x +x/, "x"); print }'
echo "x x x x x x x x x x x" | awk '{ while(sub(/x +x/, "x")); print }'



Mon, 18 Aug 2003 03:48:08 GMT  
 Can AWK

Quote:



>> . . . You're not trying to match numbers in the middle of the line, are
>>you?

>Bingo! I never believe OP's examples are comprehensive, but I do take their
>narrative as given.

That's a big "whatever" to me... - I don't think we ever really know what
their real problems are...

But your previous post contained:

    awk 'gsub(/\.[0-9]$/, "&0") || 666' inputfile

which I still think can only match once and only at the end of the line.
How can it ever match something in the middle of the line?

Quote:
>> . . . I'm never quite sure whether:

>>gsub(foo,bar)

>>is exactly equivalent to:

>>while (sub(foo,bar));

>They're not the same. Simple test:

>echo "x x x x x x x x x x x" | awk '{ gsub(/x +x/, "x"); print }'
>echo "x x x x x x x x x x x" | awk '{ while(sub(/x +x/, "x")); print }'

I think it is implementation-specific.  Some do, some don't.
I think it is something of a "dark corner".


Mon, 18 Aug 2003 04:24:08 GMT  
 Can AWK

Quote:
>Yes, but you'll have to be more specific.

>If every line looks like the above, i.e., two words and a number
>separated by commas, you can do it easily like this:

>awk -F, '{ printf "%s,%s,%.2f\n",$1,$2,$3 }'

>That is easily modified to handle any known and fixed number of
>fields. A variable number of comma-separated fields with simple
>numbers in various positions is somewhat harder, e.g.,

>awk -F, 'BEGIN{ f[0]="%s"; f[1]="%.2f"; c[1]=","; c[0]="\n" }
>   { for (i=0; i++<NF;) printf f[$i ~ /[0-9.]+/] c[i<NF], $i }'  

>If the numbers can appear in completely arbitrary positions it's much
>harder to do, you'll have to consider what you want to do with numbers
>that already have more than two digits, with numbers that don't have a
>decimal point at all, what should happen with "1.2.3" or "12AB", can
>there be exponents and how should they be treated, &c.

>Any restriction on the data you know will make it easier,
>like if it's all comma-separated fields, is the number of fields
>known or at least limited, &c.

Originally all the input lines look something like:

Allan Gray Equity 2001/02/28 9876.54 1234.5 Allan Gray Unit Trust Mgmt
2998.64 2019.82

(all on one line)

To format that I use this awk command:

awk '/Allan Gray Equity/{print "AGEF,"$6","$4}' input.txt > output.txt

The output is:

AGEF,1234.5,2001/02/28

The problem I have is that some of the numbers in the input files only
have one decimal position ie 1234.5 in the above example and I need to
reformat these numbers to have two decimal positions, ie 1234.50

On a rare occasion the numbers won't have any decimal numbers, eg.
1234

How do I handle that? The required output would then be: 1234.00

Thanks.

--
Luis



Mon, 18 Aug 2003 05:07:32 GMT  
 Can AWK

Quote:




>>> . . . You're not trying to match numbers in the middle of the line, are
>>>you?

>>Bingo! I never believe OP's examples are comprehensive, but I do take their
>>narrative as given.

>That's a big "whatever" to me... - I don't think we ever really know what
>their real problems are...

>But your previous post contained:

>    awk 'gsub(/\.[0-9]$/, "&0") || 666' inputfile

>which I still think can only match once and only at the end of the line.
>How can it ever match something in the middle of the line?

Sorry. You're right. I was careless/clueless. It's nontrivial.

awk '(gsub(/\.[0-9](,|$)/,"&\b") && gsub(/,\b|\b$/,"0,") && gsub(/,$/,"")) || 666' input

which won't work if there are embedded ASCII backspaces or records ending
with commas indicating empty fields.
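
A sentinel-free alternative, offered only as a sketch: walk the record with match() and pad any number that has exactly one decimal digit, wherever it sits in the line. It assumes numbers look like digits, a dot, digits; integers are left alone, and something like "1.2.3" would come out as "1.20.3". The name pad_anywhere is invented for the example.

```shell
# pad_anywhere is a hypothetical name, not from the thread.
pad_anywhere() {
  awk '{
    out = ""
    rest = $0
    # match() is leftmost-longest, so it grabs every decimal digit of a number.
    while (match(rest, /[0-9]+\.[0-9]+/)) {
      num = substr(rest, RSTART, RLENGTH)
      if (num ~ /\.[0-9]$/)        # exactly one digit after the dot
        num = num "0"
      out = out substr(rest, 1, RSTART - 1) num
      rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest
  }'
}

printf 'abc,1.5,def,6789.9\n' | pad_anywhere
# abc,1.50,def,6789.90
```

Because it does not rely on commas, it leaves empty trailing fields and any embedded control characters as they are.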

...

Quote:
>>echo "x x x x x x x x x x x" | awk '{ gsub(/x +x/, "x"); print }'
>>echo "x x x x x x x x x x x" | awk '{ while(sub(/x +x/, "x")); print }'

>I think it is implementation-specific.  Some do, some don't.
>I think it is something of a "dark corner".

At the moment from my WinNT4 box, gawk 3.0.6, BWK's awk95 and MKS awk all
give 'x x x x x x' and 'x', respectively for these two commands. I doubt
gawk on other platforms would give different results, leaving TAWK, mawk and
generic unix nawk. I'd be surprised if generic unix nawk differed from BWK's
one true awk. So which awk do you use, and what does it give?


Mon, 18 Aug 2003 05:56:45 GMT  
 Can AWK

<snip>

Quote:
>Originally all the input lines look something like:

>Allan Gray Equity 2001/02/28 9876.54 1234.5 Allan Gray Unit Trust Mgmt
>2998.64 2019.82

>(all on one line)

>To format that I use this awk command:

>awk '/Allan Gray Equity/{print "AGEF,"$6","$4}' input.txt > output.txt

...

Change this to

awk '/Allan Gray Equity/{printf("AGEF,%.2f,%s\n", $6, $4)}' input.txt > output.txt



Mon, 18 Aug 2003 05:56:46 GMT  
 Can AWK

Quote:

> At the moment from my WinNT4 box, gawk 3.0.6, BWK's awk95 and MKS awk all
> give 'x x x x x x' and 'x', respectively for these two commands. I doubt
> gawk on other platforms would give different results, leaving TAWK, mawk and
> generic unix nawk. I'd be surprised if generic unix nawk differed from BWK's
> one true awk. So which awk do you use, and what does it give?

Win98 SE:

BWK awk (awk95.exe):

C:\users\csrabak\Work>echo "x x x x x x x x x x x" | awk "{ gsub(/x +x/, "x"); print }"
"     x"

C:\users\csrabak\Work>echo "x x x x x x x x x x x" | awk "{ while(sub(/x +x/, "x")); print }"
"     x"

MAWK for DOS 1.2.2:

C:\users\csrabak\Work>echo "x x x x x x x x x x x" | mawk "{ gsub(/x +x/, "x"); print }"
"     x"

C:\users\csrabak\Work>echo "x x x x x x x x x x x" | mawk "{ while(sub(/x +x/, "x")); print }"
"     x"

gawk version 3.0.6 (DJGPP compiled):

C:\users\csrabak\Work>echo "x x x x x x x x x x x" | gawk "{ gsub(/x +x/, "x"); print }"
"     x"

C:\users\csrabak\Work>echo "x x x x x x x x x x x" | gawk "{ while(sub(/x +x/, "x")); print }"
"     x"

Oddly enough, on Win98 it is the true nawk that shows the different
behaviour!

HTH

Cesar



Mon, 18 Aug 2003 08:03:20 GMT  
 Can AWK

Quote:



> >> . . . I'm never quite sure whether:

> >>gsub(foo,bar)

> >>is exactly equivalent to:

> >>while (sub(foo,bar));

> >They're not the same. Simple test:

> >echo "x x x x x x x x x x x" | awk '{ gsub(/x +x/, "x"); print }'
> >echo "x x x x x x x x x x x" | awk '{ while(sub(/x +x/, "x")); print }'

> I think it is implementation-specific.  Some do, some don't.
> I think it is something of a "dark corner".

In case you worry about standards, POSIX.2 says that gsub will only
replace non-overlapping instances of the ERE, so the former should
print six x's and the latter just one.
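
To make the non-overlapping rule visible, here is a small sketch that also prints how many substitutions each form performs. The variable names are only for illustration. gsub makes one left-to-right pass and never rescans its own output, whereas the loop restarts from the beginning each time.

```shell
line='x x x x x x x x x x x'

# One pass, non-overlapping matches only: five pairs are replaced.
single_pass=$(echo "$line" | awk '{ n = gsub(/x +x/, "x"); print n ": " $0 }')

# Repeated sub() rescans from the start until nothing matches.
repeated=$(echo "$line" | awk '{ n = 0; while (sub(/x +x/, "x")) n++; print n ": " $0 }')

echo "$single_pass"   # 5: x x x x x x
echo "$repeated"      # 10: x
```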

--
Tapani Tarvainen



Mon, 18 Aug 2003 14:11:41 GMT  
 Can AWK

Quote:

> Allan Gray Equity 2001/02/28 9876.54 1234.5 Allan Gray Unit Trust Mgmt
> 2998.64 2019.82

> (all on one line)

> To format that I use this awk command:

> awk '/Allan Gray Equity/{print "AGEF,"$6","$4}' input.txt > output.txt

> The output is:

> AGEF,1234.5,2001/02/28

> The problem I have is that some of the numbers in the input files only
> have one decimal position ie 1234.5 in the above example and I need to
> reformat these numbers to have two decimal positions, ie 1234.50

Well that's easy, since you know the position of the number in question:

awk '/Allan Gray Equity/{printf "AGEF,%.2f,%s\n",$6,$4}' input.txt > output.txt

Look up the awk man page for printf arguments (note: it's essentially the same
as the shell's printf utility and C's printf() function).
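
A quick sketch of what %.2f does with the cases mentioned so far (one decimal, no decimals, already two decimals). The helper name two_dp is made up. Keep in mind that awk converts the field to a number first, so a non-numeric field would print as 0.00.

```shell
# two_dp is a hypothetical name used only for this sketch.
two_dp() {
  awk '{ printf "%.2f\n", $1 }'
}

printf '1234.5\n1234\n9876.54\n' | two_dp
# 1234.50
# 1234.00
# 9876.54
```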

--
Tapani Tarvainen



Mon, 18 Aug 2003 14:19:10 GMT  
 Can AWK

Quote:

...
>Win98 SE:

>BWK awk (awk95.exe):

>C:\users\csrabak\Work>echo "x x x x x x x x x x x" | awk "{ gsub(/x +x/,
>"x"); print }"
>"     x"

> C:\users\csrabak\Work>echo "x x x x x x x x x x x" | awk "{ while(sub(/x
> +x/, "x")); print }"
> "     x"

That's nice.

Now for a tutorial on using awk under COMMAND.COM. The command line script

"{ gsub(/x +x/,"x"); print }"

parses under COMMAND.COM as

{ gsub(/x +x/,
x
); print }

(with newlines just showing the parts inside and outside double quotes),
which would be the equivalent of the script file

{ gsub(/x +x/,x); print }

which is NOT the equivalent of

{ gsub(/x +x/,"x"); print }

Lest you don't grasp the problem, my script was intended for a shell command
line (hence the _single_ quotes in my commands - reproduced below).

echo "x x x x x x x x x x x" | awk '{ gsub(/x +x/, "x"); print }'
echo "x x x x x x x x x x x" | awk '{ while(sub(/x +x/, "x")); print }'

Thus in my scripts, "x" was a literal text string. In your scripts, x was an
uninitialized variable, thus equivalent to "". Rewrite your command lines as

echo x x x x x x x x x x x | awk "{ gsub(/x +x/, \"x\"); print }"

and

echo x x x x x x x x x x x | awk "{ while(sub(/x +x/, \"x\")); print }"

and try again.
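
Another way around the quoting differences, not something proposed in the thread, is to keep the program in a file and run it with awk -f, so the command interpreter never sees the awk quotes at all. The sketch below is written for a POSIX shell; the file name squeeze.awk is hypothetical, and under COMMAND.COM the file would simply be created with an editor.

```shell
# squeeze.awk is a hypothetical file name chosen for this sketch.
prog="${TMPDIR:-/tmp}/squeeze.awk"

# The awk program lives in a file, so no shell quoting applies to it.
cat > "$prog" <<'EOF'
{ gsub(/x +x/, "x"); print }
EOF

result=$(echo "x x x x x x x x x x x" | awk -f "$prog")
echo "$result"   # x x x x x x
rm -f "$prog"
```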

Quote:
> MAWK for DOS 1.2.2:
...
> gawk version 3.0.6 (DJGPP compiled):

...

Same comments.

If you're going to participate intelligently in an awk newsgroup, you're
going to have to learn the differences between shell and non-shell command
line syntax.



Tue, 19 Aug 2003 02:55:06 GMT  
 Can AWK


Quote:
>Change this to

>awk '/Allan Gray Equity/{printf("AGEF,%.2f,%s\n", $6, $4)}' input.txt > output.txt

Thanks.

When I use the code you provided:

awk '/Allan Gray Equity/{printf("AGEF,%.2f,%s\n", $6, $4)}' input.txt > output.txt

the output I get is:

AGEF,s

When I use the code suggested by Tapani Tarvainen in this thread:

awk '/Allan Gray Equity/{printf "AGEF,%.2f,%s\n",$6,$4}' input.txt > output.txt

I also get:

AGEF,s

Where have I gone wrong?

Is it because I'm running Windows 98SE?



Tue, 19 Aug 2003 17:21:27 GMT  
 