Replacing digits in first field 
 Replacing digits in first field

Hello,

I have a file containing:
  43575        NS      C1DW : 004000 --FF
  53580        NS      C1DW : 004000 FD--
  63685        NS      C1DW : 004001 --FE
  74090        NS      C1DW : 004001 FD--
  84085        NS      C1DW : 004002 --FD
  95100        NS      C1DW : 004002 FD--
 106105        NS      C1DW : 004003 --FC
 120910        NS      C1DW : 004003 FD--

Now, I want to replace the digits in the first field (or every digit
before the "NS") with an 'X', but I want to keep the spacing the same:
  XXXXX        NS      C1DW : 004000 --FF
  XXXXX        NS      C1DW : 004000 FD--
  XXXXX        NS      C1DW : 004001 --FE
  XXXXX        NS      C1DW : 004001 FD--
  XXXXX        NS      C1DW : 004002 --FD
  XXXXX        NS      C1DW : 004002 FD--
 XXXXXX        NS      C1DW : 004003 --FC
 XXXXXX        NS      C1DW : 004003 FD--

I have tried it with the following AWK-script:
awk '{
  gsub(/[0-9]/, "X", $1)
  print
}'

The problem with this script is that by modifying $1 the spacing between
fields is replaced by one space:
XXXXX NS C1DW : 004000 --FF
XXXXX NS C1DW : 004000 FD--
XXXXX NS C1DW : 004001 --FE
XXXXX NS C1DW : 004001 FD--
XXXXX NS C1DW : 004002 --FD
XXXXX NS C1DW : 004002 FD--
XXXXXX NS C1DW : 004003 --FC
XXXXXX NS C1DW : 004003 FD--

Three questions from me:
1) Why is the spacing changed?

2) What would be the shortest AWK-solution? I was thinking of splitting
$0 into an array with FS="" (thus every character separated into an
array), then parsing this array and printing. This looks like a complex
procedure for what looks like a simple problem.

3) Is there another simple solution with sed or tr? (preferably not
perl!)
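
For reference, the character-by-character idea from question 2 does not strictly need FS="" (not every awk splits on an empty FS); walking $0 with substr() does the same job. A minimal sketch, assuming the first field is delimited by spaces or tabs; the function name mask_first_field_chars is made up for illustration:

```shell
# Hypothetical helper: scan each line one character at a time and
# turn the digits of the first blank-delimited field into X.
mask_first_field_chars() {
  awk '{
    out = ""; seen = 0; past = 0
    n = length($0)
    for (i = 1; i <= n; i++) {
      c = substr($0, i, 1)
      if (!past) {
        if (c == " " || c == "\t") {
          if (seen) past = 1        # blank after the field: stop masking
        } else {
          seen = 1                  # inside the first field
          if (c ~ /[0-9]/) c = "X"
        }
      }
      out = out c                   # every character is copied, so spacing survives
    }
    print out
  }' "$@"
}

printf ' 106105        NS      C1DW : 004003 --FC\n' | mask_first_field_chars
```

It works, but it is indeed a lot of machinery for a simple problem, which is the point of the question.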

Thanks in advance,
--
    Matthijs van Aalten



Sat, 25 May 2002 03:00:00 GMT  
 Replacing digits in first field


Quote:
> I have a file containing:
>   43575        NS      C1DW : 004000 --FF
>   53580        NS      C1DW : 004000 FD--
>   63685        NS      C1DW : 004001 --FE
>   74090        NS      C1DW : 004001 FD--
>   84085        NS      C1DW : 004002 --FD
>   95100        NS      C1DW : 004002 FD--
>  106105        NS      C1DW : 004003 --FC
>  120910        NS      C1DW : 004003 FD--

> Now, I want to replace the digits in the first field (or every digit
> before the "NS") with an 'X', but I want to keep the spacing the same:
>   XXXXX        NS      C1DW : 004000 --FF
>   XXXXX        NS      C1DW : 004000 FD--
>   XXXXX        NS      C1DW : 004001 --FE
>   XXXXX        NS      C1DW : 004001 FD--
>   XXXXX        NS      C1DW : 004002 --FD
>   XXXXX        NS      C1DW : 004002 FD--
>  XXXXXX        NS      C1DW : 004003 --FC
>  XXXXXX        NS      C1DW : 004003 FD--

> I have tried it with the following AWK-script:
> awk '{
>   gsub(/[0-9]/, "X", $1)
>   print
> }'

> The problem with this script is that by modifying $1 the spacing between
> fields is replaced by one space:
> Three questions from me:
> 1) Why is the spacing changed?

Because you change $1. Assigning to a field makes awk rebuild $0 from
its fields, joined by the output field separator, OFS (a single space
by default), so the original runs of blanks are lost.
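
The effect is easy to see in isolation. A small sketch using one line of the sample data; the variable names are only for illustration:

```shell
line='  43575        NS      C1DW : 004000 --FF'

# Assigning to any field (even $1 = $1) makes awk rebuild $0 from the
# fields joined by OFS, so the original runs of blanks disappear.
rebuilt=$(printf '%s\n' "$line" | awk '{ $1 = $1; print }')

# Leaving the fields alone prints the record exactly as it was read.
untouched=$(printf '%s\n' "$line" | awk '{ print }')

printf '[%s]\n[%s]\n' "$rebuilt" "$untouched"
```

The first bracketed line comes out with single spaces between fields; the second keeps the original layout.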

Quote:
> 2) What would be the shortest AWK-solution?

Work on $0 instead. It's not trivial as far as I can see. If you know
where NS is, you can use substr to copy the part before NS, run gsub on
that copy, and then print the copy followed by the rest of $0 (using
substr again).
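
A minimal sketch of that substr idea, assuming "NS" first occurs right after the field to be masked and that lines without "NS" should pass through unchanged; the function name mask_before_ns is hypothetical:

```shell
# Hypothetical helper: mask digits only in the part of the line
# that precedes the first "NS".
mask_before_ns() {
  awk '{
    p = index($0, "NS")              # 1-based position of "NS", 0 if absent
    if (p > 0) {
      head = substr($0, 1, p - 1)    # copy of everything before "NS"
      gsub(/[0-9]/, "X", head)       # the copy is changed, not a field
      $0 = head substr($0, p)        # glue the untouched rest back on
    }
    print
  }' "$@"
}

printf '  43575        NS      C1DW : 004000 --FF\n' | mask_before_ns
```

Because no field is assigned, awk never rebuilds the record with OFS, and the spacing stays as it was.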

Quote:
> 3) Is there another simple solution with sed or tr? (preferably not
> perl!)

I think tr would risk changing things after NS. sed might be better at
restricting the change to the part before NS, but I can't really see
how; I don't know much about sed.

/Peter
--
-= Spam safe(?) e-mail address: pez68 at netscape.net =-




Sat, 25 May 2002 03:00:00 GMT  
 Replacing digits in first field


Quote:
>Hello,

>I have a file containing:
>  43575        NS      C1DW : 004000 --FF
>  53580        NS      C1DW : 004000 FD--
>  63685        NS      C1DW : 004001 --FE
>  74090        NS      C1DW : 004001 FD--
>  84085        NS      C1DW : 004002 --FD
>  95100        NS      C1DW : 004002 FD--
> 106105        NS      C1DW : 004003 --FC
> 120910        NS      C1DW : 004003 FD--

>Now, I want to replace the digits in the first field (or every digit
>before the "NS") with an 'X', but I want to keep the spacing the same:
>  XXXXX        NS      C1DW : 004000 --FF
>  XXXXX        NS      C1DW : 004000 FD--
>  XXXXX        NS      C1DW : 004001 --FE
>  XXXXX        NS      C1DW : 004001 FD--
>  XXXXX        NS      C1DW : 004002 --FD
>  XXXXX        NS      C1DW : 004002 FD--
> XXXXXX        NS      C1DW : 004003 --FC
> XXXXXX        NS      C1DW : 004003 FD--

>I have tried it with the following AWK-script:
>awk '{
>  gsub(/[0-9]/, "X", $1)
>  print
>}'

>The problem with this script is that by modifying $1 the spacing between
>fields is replaced by one space:
>XXXXX NS C1DW : 004000 --FF
>XXXXX NS C1DW : 004000 FD--
>XXXXX NS C1DW : 004001 --FE
>XXXXX NS C1DW : 004001 FD--
>XXXXX NS C1DW : 004002 --FD
>XXXXX NS C1DW : 004002 FD--
>XXXXXX NS C1DW : 004003 --FC
>XXXXXX NS C1DW : 004003 FD--

>Three questions from me:
>1) Why is the spacing changed?

Assigning a value to a field causes awk to rebuild the record ($0)
from its fields, joined by OFS (a single space by default).

Quote:
>2) What would be the shortest AWK-solution? I was thinking of splitting
>$0 into an array with FS="" (thus every character separated into an
>array), then parsing this array and printing. This looks like a complex
>procedure for what looks like a simple problem.

gawk '{sub(/^  [0-9][0-9][0-9][0-9][0-9]/, "  XXXXX");
       sub(/^ [0-9][0-9][0-9][0-9][0-9][0-9]/, " XXXXXX");
       print}' infile

Note this doesn't use a field, it uses $0  :-)

Or, one could do it this way:

gawk '{a=substr($0,1,7);b=substr($0,8);
       gsub(/[0-9]/,"X",a);print a b}' infile

Quote:
>3) Is there another simple solution with sed or tr? (preferably not
>perl!)

sed -e 's/^  [0-9]\{5\}/  XXXXX/;s/^ [0-9]\{6\}/ XXXXXX/' infile

One could probably write a loop thing for a sed solution, but I'll
let someone else do it.
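
One possible shape for such a loop, as a sketch only: it assumes the leading blanks are spaces and that the first field contains nothing but digits. The function name mask_leading_digits is made up for illustration.

```shell
# Hypothetical helper: replace one leading digit per pass and branch
# back to the label until no further substitution happens.
mask_leading_digits() {
  # ^\([ X]*\) matches the spaces and X's already at the start of the
  # line; the next digit becomes X. "ta" jumps back to ":a" after a hit.
  sed -e ':a' -e 's/^\([ X]*\)[0-9]/\1X/' -e 'ta' "$@"
}

printf ' 106105        NS      C1DW : 004003 --FC\n' | mask_leading_digits
```

The loop stops at the "N" of NS, because the anchored run of spaces and X's cannot extend past it, so the digits later in the line are left alone.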

Chuck Demas
Needham, Mass.

--
  Eat Healthy    |   _ _   | Nothing would be done at all,

  Die Anyway     |    v    | That no one could find fault with it.



Sat, 25 May 2002 03:00:00 GMT  
 Replacing digits in first field

Quote:
> I have a file containing:
>   43575        NS      C1DW : 004000 --FF
>   53580        NS      C1DW : 004000 FD--
>   63685        NS      C1DW : 004001 --FE
>   74090        NS      C1DW : 004001 FD--
>   84085        NS      C1DW : 004002 --FD
>   95100        NS      C1DW : 004002 FD--
>  106105        NS      C1DW : 004003 --FC
>  120910        NS      C1DW : 004003 FD--

> Now, I want to replace the digits in the first field (or every digit
> before the "NS") with an 'X', but I want to keep the spacing the same:
>   XXXXX        NS      C1DW : 004000 --FF
>   XXXXX        NS      C1DW : 004000 FD--
>   XXXXX        NS      C1DW : 004001 --FE
>   XXXXX        NS      C1DW : 004001 FD--
>   XXXXX        NS      C1DW : 004002 --FD
>   XXXXX        NS      C1DW : 004002 FD--
>  XXXXXX        NS      C1DW : 004003 --FC
>  XXXXXX        NS      C1DW : 004003 FD--

> [...]

> 3) Is there another simple solution with sed or tr? (preferably not
> perl!)

$ cat datafile
  43575        NS      C1DW : 004000 --FF
  53580        NS      C1DW : 004000 FD--
  63685        NS      C1DW : 004001 --FE
  74090        NS      C1DW : 004001 FD--
  84085        NS      C1DW : 004002 --FD
  95100        NS      C1DW : 004002 FD--
 106105        NS      C1DW : 004003 --FC
 120910        NS      C1DW : 004003 FD--
$ perl -pe 's/\d+/"X" x length $&/e' datafile
  XXXXX        NS      C1DW : 004000 --FF
  XXXXX        NS      C1DW : 004000 FD--
  XXXXX        NS      C1DW : 004001 --FE
  XXXXX        NS      C1DW : 004001 FD--
  XXXXX        NS      C1DW : 004002 --FD
  XXXXX        NS      C1DW : 004002 FD--
 XXXXXX        NS      C1DW : 004003 --FC
 XXXXXX        NS      C1DW : 004003 FD--
$ perl -pe 's/^(\s*)(\d+)/$1 . ("X" x length $2)/e' datafile
  XXXXX        NS      C1DW : 004000 --FF
  XXXXX        NS      C1DW : 004000 FD--
  XXXXX        NS      C1DW : 004001 --FE
  XXXXX        NS      C1DW : 004001 FD--
  XXXXX        NS      C1DW : 004002 --FD
  XXXXX        NS      C1DW : 004002 FD--
 XXXXXX        NS      C1DW : 004003 --FC
 XXXXXX        NS      C1DW : 004003 FD--
$

It's just TOO easy in Perl!

--
Jim Monty

Tempe, Arizona USA



Sat, 25 May 2002 03:00:00 GMT  
 Replacing digits in first field

Quote:

> > 3) Is there another simple solution with sed or tr? (preferably not
> > perl!)

> $ cat datafile
>   43575        NS      C1DW : 004000 --FF
>   53580        NS      C1DW : 004000 FD--
>   63685        NS      C1DW : 004001 --FE
>   74090        NS      C1DW : 004001 FD--
>   84085        NS      C1DW : 004002 --FD
>   95100        NS      C1DW : 004002 FD--
>  106105        NS      C1DW : 004003 --FC
>  120910        NS      C1DW : 004003 FD--
> $ perl -pe 's/\d+/"X" x length $&/e' datafile
> $ perl -pe 's/^(\s*)(\d+)/$1 . ("X" x length $2)/e' datafile
> It's just TOO easy in Perl!

GRRRR!!!

I know that just about everything I want is possible in Perl, but I
don't want to learn another scripting language when just about
everything I want is also possible in AWK... but sometimes, like with
this problem, I don't know how.

And the AWK-solution I'm using now is just as easy and short as the
Perl-way...

Regards,
--
    Matthijs van Aalten



Sun, 26 May 2002 03:00:00 GMT  
 Replacing digits in first field

Quote:



> ...
> > I have a file containing:
> >   43575        NS      C1DW : 004000 --FF
> >   53580        NS      C1DW : 004000 FD--
> ...
> > Now, I want to replace the digits in the first field (or every digit
> > before the "NS") with an 'X', but I want to keep the spacing the same:
> >   XXXXX        NS      C1DW : 004000 --FF
> >   XXXXX        NS      C1DW : 004000 FD--
> ...
> > 2) What would be the shortest AWK-solution? I was thinking of splitting
> > $0 into an array with FS="" (thus every character separated into an
> > array), then parsing this array and printing. This looks like a complex
> > procedure for what looks like a simple problem.

> Use "NS" as input and output field separator. You wanted short, so minimal
> whitespace.

> gawk 'BEGIN{FS=OFS="NS"} gsub(/[0-9]/,"X",$1)+1' infile

Thanks! Exactly what I was looking for! Someone else mailed a similar
solution but this one is even shorter. The '+1' after gsub is to always
have a positive result so the inputline is always printed? Smart, real
smart...
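
A quick way to check that reading of the '+1', using two made-up input lines (one whose first field has digits, one whose first field has none):

```shell
sample='12 NS 7
ab NS 7'

# gsub() returns the number of substitutions. Used alone as a pattern,
# a result of 0 is false, so the line without digits is not printed.
without_plus=$(printf '%s\n' "$sample" | awk 'BEGIN{FS=OFS="NS"} gsub(/[0-9]/,"X",$1)')

# Adding 1 makes the pattern non-zero for every line, so all lines print.
with_plus=$(printf '%s\n' "$sample" | awk 'BEGIN{FS=OFS="NS"} gsub(/[0-9]/,"X",$1)+1')

printf 'without +1:\n%s\nwith +1:\n%s\n' "$without_plus" "$with_plus"
```

Without the '+1' only the masked first line appears; with it, both lines are printed.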

Regards,
--
    Matthijs van Aalten



Sun, 26 May 2002 03:00:00 GMT  
 Replacing digits in first field


OT post by someone else:
...

Quote:
>> $ perl -pe 's/\d+/"X" x length $&/e' datafile
>> $ perl -pe 's/^(\s*)(\d+)/$1 . ("X" x length $2)/e' datafile
>> It's just TOO easy in Perl!

>GRRRR!!!

>I know that about everything I want is possible in Perl, but I don't
>want to learn another scriptlanguage when about everything I want is
>also possible in AWK... but sometimes, like with this problem, I don't
>know how.

In my day, they always said that it was impossible to tell a TECO macro
apart from line noise.  The same is true of Perl.


Sun, 26 May 2002 03:00:00 GMT  
 Replacing digits in first field

Quote:

> > Use "NS" as input and output field separator. You wanted short, so minimal
> > whitespace.

> > gawk 'BEGIN{FS=OFS="NS"} gsub(/[0-9]/,"X",$1)+1' infile

> Thanks! Exactly what I was looking for! Someone else mailed a similar
> solution but this one is even shorter. The '+1' after gsub is to always
> have a positive result so the inputline is always printed? Smart, real
> smart...

It is most certainly NOT smart; it is cleverness of the worst kind.
Why purposefully obfuscate such a rudimentary operation in awk?
The correct code is obvious and natural:

    { gsub(/[0-9]/, "X", $1) }

No purpose is served by artificially forcing to "true" (using
floating point arithmetic!) a substitution command in an actionless
pattern rather than simply putting the substitution command in
a patternless action, as God^H^H^HMessrs. Aho, Kernighan, and
Weinberger intended.

--
Jim Monty

Tempe, Arizona USA



Mon, 27 May 2002 03:00:00 GMT  
 Replacing digits in first field



% OT post by someone else:
% ...
% >> $ perl -pe 's/\d+/"X" x length $&/e' datafile
% >> $ perl -pe 's/^(\s*)(\d+)/$1 . ("X" x length $2)/e' datafile
% >> It's just TOO easy in Perl!
% >
% >GRRRR!!!
% >
% >I know that about everything I want is possible in Perl, but I don't
% >want to learn another scriptlanguage when about everything I want is
% >also possible in AWK... but sometimes, like with this problem, I don't
% >know how.
%
% In my day, they always said that it was impossible to tell a TECO macro
% apart from line noise.  The same is true of Perl.

TECO was very powerful, though.
--

Patrick TJ McPhee
East York  Canada



Mon, 27 May 2002 03:00:00 GMT  
 Replacing digits in first field

Quote:



> > > Use "NS" as input and output field separator. You wanted short, so minimal
> > > whitespace.

> > > gawk 'BEGIN{FS=OFS="NS"} gsub(/[0-9]/,"X",$1)+1' infile

> > Thanks! Exactly what I was looking for! Someone else mailed a similar
> > solution but this one is even shorter. The '+1' after gsub is to always
> > have a positive result so the inputline is always printed? Smart, real
> > smart...

> It is most certainly NOT smart; it is cleverness of the worst kind.
> Why purposefully obfuscate such a rudimentary operation in awk?
> The correct code is obvious and natural:

>     { gsub(/[0-9]/, "X", $1) }

> No purpose is served by artificially forcing to "true" (using
> floating point arithmetic!) a substitution command in an actionless
> pattern rather than simply putting the substitution command in
> a patternless action, as God^H^H^HMessrs. Aho, Kernighan, and
> Weinberger intended.

Uh, *here* is the correct code:

    { gsub(/[0-9]/, "X", $1); print }

As embarrassing as that mistake is to me, I stand by my argument
that

    gsub(/[0-9]/, "X", $1) + 1

is bad practice.
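
For comparison, here is the explicit-action form combined with the FS=OFS="NS" setting from the one-liner quoted above; without that BEGIN block the action by itself would still collapse the spacing. A sketch only, with a hypothetical function name:

```shell
# Hypothetical helper: same technique as the quoted one-liner, but with
# the substitution and the print written out as an ordinary action.
mask_ns() {
  awk 'BEGIN { FS = OFS = "NS" }
       { gsub(/[0-9]/, "X", $1); print }' "$@"
}

printf '  43575        NS      C1DW : 004000 --FF\n' | mask_ns
```

Since the record is split on "NS" and rejoined with "NS", rebuilding $0 reproduces the blanks exactly; they are simply part of $1 and $2.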

--
Jim Monty

Tempe, Arizona USA



Mon, 27 May 2002 03:00:00 GMT  
 Replacing digits in first field

Quote:



>>>Use "NS" as input and output field separator. You wanted
>>>short, so minimal whitespace.

>>> gawk 'BEGIN{FS=OFS="NS"} gsub(/[0-9]/,"X",$1)+1' infile

>>Thanks! Exactly what I was looking for! Someone else
>>mailed a similar solution but this one is even shorter.
>>The '+1' after gsub is to always have a positive result
>>so the inputline is always printed? Smart, real smart...

>It is most certainly NOT smart; it is cleverness of the
>worst kind. Why purposefully obfuscate such a rudimentary
>operation in awk?

>The correct code is obvious and natural:
>    { gsub(/[0-9]/, "X", $1) }

This 'correct' code produces no output. Add a print
statement after the gsub().

Matthijs asked specifically for the _shortest_ awk solution.
Adding the 1 to the gsub() result in the pattern gives the
_shortest_ solution. I agree that actionless patterns are
obscure and should be avoided in general.

Quote:
>No purpose is served by artificially forcing to "true"
>(using floating point arithmetic!) a substitution command
>in an actionless pattern rather than simply putting the
>substitution command in a patternless action, as
>God^H^H^HMessrs. Aho, Kernighan, and Weinberger intended.

Aha! I'd guess Jim's giving us another attempt at sarcasm.
Quite right. Change the '+1' to '||1'.




Mon, 27 May 2002 03:00:00 GMT  
 Replacing digits in first field

Quote:


> > No purpose is served by artificially forcing to "true"
> > (using floating point arithmetic!) a substitution command
> > in an actionless pattern rather than simply putting the
> > substitution command in a patternless action, as
> > God^H^H^HMessrs. Aho, Kernighan, and Weinberger intended.

> Aha! I'd guess Jim's giving us another attempt at sarcasm.

No sarcasm intended.

Quote:
> Quite right. Change the '+1' to '||1'.

You're kidding, right? You don't seriously believe that trading a
Useless Use Of Floating Point Arithmetic for a Useless Use Of A
Comparison Expression somehow improves matters, do you? Besides,
it makes the script no longer the "shortest solution" as required.

--
Jim Monty

Tempe, Arizona USA



Mon, 27 May 2002 03:00:00 GMT  
 Replacing digits in first field


Quote:

>> Aha! I'd guess Jim's giving us another attempt at sarcasm.

>No sarcasm intended.

>> Quite right. Change the '+1' to '||1'.

>You're kidding, right?

Well, duh!!

(Now who's being irony impaired?)



Tue, 28 May 2002 03:00:00 GMT  
 Replacing digits in first field

Quote:



> >The correct code is obvious and natural:
> >    { gsub(/[0-9]/, "X", $1) }

> This 'correct' code produces no output. Add a print
> statement after the gsub().

> Matthijs asked specifically for _shortest_ awk solution.

Correct! And I'm still thankful for the solution.

Quote:
> Adding the 1 to the gsub() result in the pattern gives the
> _shortest_ solution. I agree that actionless patterns are
> obscure and should be avoided in general.

Could someone explain this to me? I didn't understand at first how the
'shortest' solution worked ("Where is the print-statement? The braces
are missing!") so I looked into the man-page of AWK:
+ A pattern-action statement has the form:
+    pattern { action }
+ A missing { action } means print the line; a missing
+ pattern always matches.  Pattern-action statements are
+ separated by new-lines or semicolons.

So you can question readability, but 'obscure'? Looks like actionless
patterns are a documented feature.
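
The two defaults from the man page excerpt can be tried on a two-line input; the variable names are only for illustration:

```shell
# Pattern without an action: matching lines are printed.
pattern_only=$(printf 'a\nb\n' | awk '/b/')

# Action without a pattern: the action runs for every line.
action_only=$(printf 'a\nb\n' | awk '{ print }')

# A constant non-zero pattern matches every line, so everything prints.
always_true=$(printf 'a\nb\n' | awk '1')

printf '%s\n---\n%s\n---\n%s\n' "$pattern_only" "$action_only" "$always_true"
```

The gsub(...)+1 one-liner is the third case in disguise: an expression that is always non-zero, with the substitution happening as a side effect.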

Obscure or not, at least I learned a bit more about AWK. I still have
too much a programmer's view on AWK and not enough pattern-matching
experience...

--
    Matthijs van Aalten
    Philips Consumer Electronics IC-lab



Tue, 28 May 2002 03:00:00 GMT  
 Replacing digits in first field


Quote:



> > >The correct code is obvious and natural:
> > >    { gsub(/[0-9]/, "X", $1) }

> > This 'correct' code produces no output. Add a print
> > statement after the gsub().

> > Matthijs asked specifically for _shortest_ awk solution.

> Correct! And I'm still thankfull for the solution.

> > Adding the 1 to the gsub() result in the pattern gives the
> > _shortest_ solution. I agree that actionless patterns are
> > obscure and should be avoided in general.

> So you can question readability, but 'obscure'? Looks like actionless
> patterns is a documented feature.

Well, a matter of definition. I can't say that I find actionless
patterns as such hard to read or maintain. But cleverness like using a
gsub and adding 1 as a pattern just to avoid the print statement? Nah! I
think it was just a way to show that awk could do it with fewer bytes
than the perl suggestions that were posted. But all it really showed
was that you can obfuscate awk code if you want to. It also hid the
really good thing about the suggested awk solution: using NS as the
field separator for both input and output. Doing that meant that
rebuilding the record from its fields didn't mangle the line.

Quote:
> Obscure or not, at least I learned a bit more about AWK. I still have
> too much a programmer's view on AWK and not enough pattern-matching
> experience...

Even more reason not to start out with "clever" pattern/action
constructs.

/Peter
--
-= Spam safe(?) e-mail address: pez68 at netscape.net =-




Tue, 28 May 2002 03:00:00 GMT  
 