Output from AWK script contains original input lines *AND* post-processed lines!! HELP!! 

I have an AWK script that works correctly, *except* for the fact that the
output generated to "stdout" includes the original input text as well as the
post-processed text!

So, for an input file "myfile.txt":
red
white
and
blue

If I have a script that is supposed to replace 'e' with 'x', then the output
generated by my AWK script is as follows, "out.txt":
red
rxd
white
whitx
and
and
blue
blux

Of course, what I want in "out.txt" is only:
rxd
whitx
and
blux

I use GAWK and the command line looks like this:
gawk -f myscript.awk myfile.txt > out.txt

What am I doing wrong?!?!?  I've read and looked through all the documentation
I could find!  HELP!! ;-)

I have attached the actual script for completeness, but I believe I've done
everything correctly inside the script.

Thank you.

Attached Script:

#usage: gawk -f clean_dsp.awk input.dsp > output.dsp

IGNORECASE = 1
{
   # substitute project output path to working directory
   gsub(/out:\"\.\.\\\.\.\\\.\.\\\.\.\\output\\bin/,"out:\"\.");

   # if line matches regular expressions below, then remove from output file
   if (($0 !~ /# PROP Output_Dir/) && ($0 !~ /# PROP Intermediate_Dir/))
      print $0;

}



Tue, 28 Oct 2003 04:24:14 GMT  

Hello,

sed -e 's/e/x/g' myfile.txt

or (to stay on topic)

awk '{sub ("e","x",$1);print $0}' myfile.txt
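One caveat on the awk one-liner: sub() replaces only the first match in its target, whereas the sed command uses the g flag and replaces every match. The two agree on the sample file only because no word in it contains more than one 'e'. A minimal sketch of the difference, using a made-up input string that is not from the original post:

```shell
# Hypothetical input with several 'e' characters (not from the original post).
line='green eggs'

# sub() changes only the first match in the record.
first_only=$(printf '%s\n' "$line" | awk '{ sub(/e/, "x"); print }')

# gsub() changes every match, like the g flag in sed.
all_matches=$(printf '%s\n' "$line" | awk '{ gsub(/e/, "x"); print }')

printf '%s\n%s\n' "$first_only" "$all_matches"
```

If every occurrence should be replaced, gsub() is probably the closer equivalent of the sed command.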

Michael Heiming



Tue, 28 Oct 2003 07:10:31 GMT  


% I have an AWK script that works correctly, *except* for the fact that the
% output generated to "stdout" includes the original input text as well as the
% post-processed text!

[...]

% #usage: gawk -f clean_dsp.awk input.dsp > output.dsp
%
% IGNORECASE = 1

This might just look like an assignment, but it's also an expression,
and it evaluates to 1. Since there's no associated action, it takes
the default action, which is to print $0.

You probably want something like

 BEGIN { IGNORECASE = 1 }

but if there's some reason to reset IGNORECASE on each input record,
you could make it part of an action without a pattern:

 { IGNORECASE = 1 }

You could also keep it as a pattern and give it either an empty
action or some action you would like to have executed
on each line. That would be bad unless your goal was to be confusing,
which is not a bad goal, just not something that everyone aspires to.
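To make the effect concrete, here is a rough sketch that reproduces the symptom with the toy red/white/and/blue input from the original post. The two-rule programs are stand-ins for the real script, not the poster's code. In awks other than gawk, IGNORECASE is just an ordinary variable, but the rule is parsed the same way.

```shell
# Toy input from the original post.
input=$(printf 'red\nwhite\nand\nblue')

# Buggy layout: the assignment sits alone on its line, so awk treats it as a
# pattern with no action and prints each record before the main block runs.
buggy=$(printf '%s\n' "$input" | awk '
IGNORECASE = 1
{ gsub(/e/, "x"); print }
')

# Fixed layout: the assignment runs once, before any input is read.
fixed=$(printf '%s\n' "$input" | awk '
BEGIN { IGNORECASE = 1 }
{ gsub(/e/, "x"); print }
')

printf 'buggy:\n%s\n\nfixed:\n%s\n' "$buggy" "$fixed"
```

The buggy variant should print each line twice (original, then substituted), which matches the out.txt shown in the question.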

--

Patrick TJ McPhee
East York  Canada



Tue, 28 Oct 2003 11:49:35 GMT  

Quote:
> I have an AWK script that works correctly, *except* for the fact that the
> output generated to "stdout" includes the original input text as well as the
> post-processed text!
...
> What am I doing wrong?!?!?  I've read and looked through all the documentation
> I could find!  HELP!! ;-)
...
> I have attached the actual script for completeness, but I believe I've done
> everything correctly inside the script.
...
> #usage: gawk -f clean_dsp.awk input.dsp > output.dsp

> IGNORECASE = 1
> {
>    # substitute project output path to working directory
>    gsub(/out:\"\.\.\\\.\.\\\.\.\\\.\.\\output\\bin/,"out:\"\.");

>    # if line matches regular expressions below, then remove from output file
>    if (($0 !~ /# PROP Output_Dir/) && ($0 !~ /# PROP Intermediate_Dir/))
>       print $0;
> }

Compare this (untested):

# Usage: gawk --re-interval -f clean_dsp.awk input.dsp >output.dsp

BEGIN {
    IGNORECASE = 1
}

/# PROP (Output|Intermediate)_Dir/ {
    next
}

{
    sub(/out:"(\.\.\\){4}output\\bin/, "out:\"\.")
    print
}

To me, this version is more readable and immediately understandable,
even without the comments. And it produces correct output (i.e., the
assignment-operation-as-actionless-pattern bug is fixed).

(N.B. The switch from gsub() to sub() is intentional. Just a guess.
Maybe you DO want to substitute globally.)

--
Jim Monty

Tempe, Arizona USA



Tue, 28 Oct 2003 14:23:35 GMT  
You neglected the braces around the 'if' clause so the 'print $0'
executes every line. As best as I can tell you meant to say:

 #usage: gawk -f clean_dsp.awk input.dsp > output.dsp

 IGNORECASE = 1
 {
    # substitute project output path to working directory
    gsub(/out:\"\.\.\\\.\.\\\.\.\\\.\.\\output\\bin/,"out:\"\.");

    # if line matches regular expressions below, then remove from output file
    if (($0 !~ /# PROP Output_Dir/) && ($0 !~ /# PROP Intermediate_Dir/)) {
       print $0;
    }
 }




Tue, 28 Oct 2003 21:36:21 GMT  

Quote:

> You neglected the braces around the 'if' clause so the 'print $0'
> executes every line. As best as I can tell you meant to say:

>  #usage: gawk -f clean_dsp.awk input.dsp > output.dsp

>  IGNORECASE = 1
>  {
>     # substitute project output path to working directory
>     gsub(/out:\"\.\.\\\.\.\\\.\.\\\.\.\\output\\bin/,"out:\"\.");

>     # if line matches regular expressions below, then remove from output file
>     if (($0 !~ /# PROP Output_Dir/) && ($0 !~ /# PROP Intermediate_Dir/)) {
>        print $0;
>     }
>  }

Sorry,  you missed the obvious problem.  Your braces are redundant
(as are the $0 and semicolon on the print statement) and
don't fix the problem.  Awk consists of

      pattern { action }

statements.  If the pattern statement is true,  the action is performed.
If a pattern statement is true and no action statement is encountered,
the default action "print" is performed.  The simplest pattern statement
is "1":

      $ cat abc
      abc
      def
      ghi
      $ awk 1 abc
      abc
      def
      ghi

The offending line in the above script is:

      IGNORECASE = 1

This constitutes a pattern that is always true:

      $ awk 'IGNORECASE = 1' abc
      abc
      def
      ghi
      $ awk 'IGNORECASE = 0' abc
      $ awk 'IGNORECASE = !IGNORECASE' abc
      abc
      ghi

You need to place variable settings in a block where they will be
executed only once - the BEGIN block:

      BEGIN { IGNORECASE = 1}
      {
      ...
      }

--
Dan Mercer


Opinions expressed herein are my own and may not represent those of my employer.


Tue, 28 Oct 2003 23:27:09 GMT  
Thank you all!!!

After seeing your responses about the fact that the IGNORECASE = 1 is parsed
as an expression requiring an action, it hit me right between the eyes!!!  I
kept thinking of IGNORECASE as some sort of special internal AWK variable or
keyword that did not have to comply with the language "expr { action }"
rule.

Of course now, my problem makes total sense given the simple and elegant AWK
language definition... ;-)

Again, thank you all.  Especially to Jim Monty for his elegant AWK script to
do exactly the same thing as my script but much cleaner and simpler to
understand.  As a "newbie" I am still rough around the edges! ;-)



Wed, 29 Oct 2003 02:49:04 GMT  
Quote:


> ...
>>#usage: gawk -f clean_dsp.awk input.dsp > output.dsp

>>IGNORECASE = 1
>>{
>>   # substitute project output path to working directory
>>   gsub(/out:\"\.\.\\\.\.\\\.\.\\\.\.\\output\\bin/,"out:\"\.");

>>   # if line matches regular expressions below, then remove from output file
>>   if (($0 !~ /# PROP Output_Dir/) && ($0 !~ /# PROP Intermediate_Dir/))
>>      print $0;
>>}

>Compare this (untested):

># Usage: gawk --re-interval -f clean_dsp.awk input.dsp >output.dsp

>BEGIN {
>    IGNORECASE = 1
>}

>/# PROP (Output|Intermediate)_Dir/ {
>    next
>}

>{
>    sub(/out:"(\.\.\\){4}output\\bin/, "out:\"\.")
>    print
>}

...

The --re-interval could be replaced with --posix (less typing is good).
Also, there's no need to escape the period in the replacement pattern.
Finally, '/x/ { next } { ...; print }' could be compressed to '!/x/ { ...;
print }' - depends on whether you want your code to read 'ignore these lines
and do this with the others' or 'on all lines except these do this'.

#usage: gawk --posix -f clean_dsp.awk input.dsp > output.dsp
BEGIN { IGNORECASE = 1 }
! /# PROP (Output|Intermediate)_Dir/ {
  sub(/out:"((\.){2}\\){4}output\\bin/, "out:\".")
  print

}



Wed, 29 Oct 2003 06:02:29 GMT  

Quote:



> > ...
> > > #usage: gawk -f clean_dsp.awk input.dsp > output.dsp

> > > IGNORECASE = 1
> > > {
> > >    # substitute project output path to working directory
> > >    gsub(/out:\"\.\.\\\.\.\\\.\.\\\.\.\\output\\bin/,"out:\"\.");

> > >    # if line matches regular expressions below, then remove from output file
> > >    if (($0 !~ /# PROP Output_Dir/) && ($0 !~ /# PROP Intermediate_Dir/))
> > >       print $0;
> > > }

> > Compare this (untested):

> > # Usage: gawk --re-interval -f clean_dsp.awk input.dsp >output.dsp

> > BEGIN {
> >     IGNORECASE = 1
> > }

> > /# PROP (Output|Intermediate)_Dir/ {
> >     next
> > }

> > {
> >     sub(/out:"(\.\.\\){4}output\\bin/, "out:\"\.")
> >     print
> > }
> ...

> The --re-interval could be replaced with --posix ...

Not without breaking the script. Setting --posix renders the
IGNORECASE variable meaningless:

 $ echo FOO | gawk '/foo/' IGNORECASE=1
 FOO
 $ echo FOO | gawk --re-interval '/fo{2}/' IGNORECASE=1
 FOO
 $ echo FOO | gawk --posix '/fo{2}/' IGNORECASE=1
 gawk: cmd. line:2: warning: IGNORECASE not supported in compatibility mode
 $

Even if setting --posix didn't break Mario's script immediately,
it would still be a bad idea to turn off all GNU Awk extensions
and turn on a bunch of POSIX-only features just to enable a single
feature that is, unfortunately, disabled by default; namely, regular
expression intervals.

Quote:
> ... (less typing is good).

> Also, there's no need to escape the period in the replacement pattern.

Thanks for catching this egregious bug in Mario's script. Hopefully,
he can fix it without excessive typing.

Quote:
> Finally, '/x/ { next } { ...; print }' could be compressed to '!/x/ { ...;
> print }' - depends on whether you want your code to read 'ignore these lines
> and do this with the others' or 'on all lines except these do this'.

I was purposely demonstrating how to avoid negation. Mario had
written

  if NOT condition AND NOT condition

and I had to stop and think about it (and read the accompanying
comment) before I understood what it was doing. As I said (and you
failed to quote in your follow-up): "To me, this version is more
readable and immediately understandable, even without the comments."

Quote:
> #usage: gawk --posix -f clean_dsp.awk input.dsp > output.dsp
> BEGIN { IGNORECASE = 1 }
> ! /# PROP (Output|Intermediate)_Dir/ {
>   sub(/out:"((\.){2}\\){4}output\\bin/, "out:\".")
>   print
> }

You didn't specify whether this was tested or untested.

Quote:

> > > Again, thank you all. Especially to Jim Monty for his elegant AWK
> > > script to do exactly the same thing as my script but much cleaner
> > > and simpler to understand.

Abj gung Tbbtyr unf erfgberq gur napvrag nepuvirf bs HFRARG arjf
negvpyrf qngvat onpx gb 1995, abfgnytvn ohssf pna ernq sbe gurzfryirf
jung pbzc.ynat.njx jnf yvxr orsber gur gbcvp bs gur arjftebhc orpnzr
lbh ohfgvat zl onyyf rirel gvzr V cbfg na negvpyr. V gubhtug vg
unq raqrq, ohg vg frrzf Znevb'f rkcerffvba bs tengvghqr gb zr jnf
zber guna lbhe rttfuryy-yvxr rtb pbhyq gnxr.

--
Jim Monty

Tempe, Arizona USA



Wed, 29 Oct 2003 10:20:49 GMT  

Quote:


...
>>The --re-interval could be replaced with --posix ...

>Not without breaking the script. Setting --posix renders the
>IGNORECASE variable meaningless:

You're right. Sorry.

...

Quote:
>> Also, there's no need to escape the period in the replacement pattern.

>Thanks for catching this egregious bug in Mario's script. Hopefully,
>he can fix it without excessive typing.

Minor: backslashes can and do cause frequent headaches when switching
between command line and file-based scripts. Using as few backslashes as
possible is good because it'll lead to fewer bugs in the long haul.

...

Quote:
>I was purposely demonstrating how to avoid negation. Mario had
>written

>   if NOT condition AND NOT condition

>and I had to stop and think about it (and read the accompanying
>comment) before I understood what it was doing. As I said (and you
>failed to quote in your follow-up): "To me, this version is more
>readable and immediately understandable, even without the comments."

...

I didn't question the readability, just the length.

if NOT x AND NOT y then z

can be rewritten as (yours)

if x or y then SKIP
z

or (mine)

if NOT (x OR y) then z

It's a matter of taste for small scripts. When scripts get larger, fewer
pattern-action pairs are usually better.
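As a quick check that the two forms really are interchangeable, the sketch below runs both against a few made-up lines. Only the "# PROP" prefixes come from the thread; the rest of the sample input is an assumption for illustration.

```shell
# Hypothetical sample input; only the PROP prefixes come from the thread.
input=$(printf '%s\n' \
    '# PROP Output_Dir "Debug"' \
    '# PROP Intermediate_Dir "Debug"' \
    '# ADD CPP /nologo /W3')

# Form 1: skip the unwanted records first, then handle the rest.
skip_form=$(printf '%s\n' "$input" |
    awk '/# PROP Output_Dir/ || /# PROP Intermediate_Dir/ { next } { print }')

# Form 2: a single rule guarded by the negated condition.
negated_form=$(printf '%s\n' "$input" |
    awk '!(/# PROP Output_Dir/ || /# PROP Intermediate_Dir/) { print }')

printf '%s\n' "$skip_form"
[ "$skip_form" = "$negated_form" ] && echo "both forms agree"
```

Both variants should keep only the ADD CPP line, so the choice between them comes down to readability versus rule count.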



Sat, 01 Nov 2003 06:19:37 GMT  
 