I thought I was an AWK wiz until... 

I've got a record whose fields are separated by a variable number of spaces.
It looks like:

AA   35  0    7    9 2  5  "TEXT     OUT     HERE"

The quotes are shown for clarity; they do not exist in the data stream.
There are 7 fixed alphanumeric fields, and the 8th field is a variable-length
string with embedded spaces. So of course AWK reports NF = 10 when this record
is encountered.

What I need from the record is the quoted part of the string, with the same
number of spaces as the original.

So I can't just use NF to stick $8 $9 $10 back together with the FS
character. I guess I could match the first 7, find the length and then
SUBSTR the rest? Anybody have the magic (expression?) bullet for this?
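
To make the problem concrete, here is a small sketch. The `line` variable is just the sample record above without the quotes; nothing else is assumed. Default field splitting reports ten fields, and gluing the last three back together with the output separator loses the original spacing:

```shell
# Sample record from the post; the quotes are not part of the data.
line='AA   35  0    7    9 2  5  TEXT     OUT     HERE'

# Default splitting treats every run of blanks as one separator.
nf=$(printf '%s\n' "$line" | awk '{ print NF }')

# Rejoining the trailing fields with OFS collapses each run to one space.
joined=$(printf '%s\n' "$line" | awk '{ print $8, $9, $10 }')

printf 'NF=%s\n' "$nf"
printf '<%s>\n' "$joined"
```

The second line of output shows single spaces between the words, which is exactly what the question is trying to avoid.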



Sun, 22 Apr 2001 03:00:00 GMT  

This seems to work (but I counted 7 prior space delimited fields):

gawk '{sub(/^[^ ]+ *[^ ]+ *[^ ]+ *[^ ]+ *[^ ]+ *[^ ]+ *[^ ]+ */,"");
       print}' infile

You could also work on a copy of the record, leaving $0 itself unchanged:

gawk '{a=$0;
       sub(/^[^ ]+ *[^ ]+ *[^ ]+ *[^ ]+ *[^ ]+ *[^ ]+ *[^ ]+ */,"",a);
       print a}' infile

On this infile:

AA   35  0    7    9 2  5  "TEXT     OUT     HERE"

It produced this output:

"TEXT     OUT     HERE"

If the fields are fixed-width and you know the column where the field
you're looking for starts, then you could use substr() to take everything
from that column to the end of the line.
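
A minimal sketch of that substr() idea follows. The starting column of 28 is an assumption, counted by hand on the sample line (the seven leading fields and their padding fill columns 1 to 27); real data would need its own count.

```shell
# Sample record without the illustrative quotes.
line='AA   35  0    7    9 2  5  TEXT     OUT     HERE'

# Assumption: the first seven fields always occupy columns 1-27,
# so the free text starts in column 28.
text=$(printf '%s\n' "$line" | awk '{ print substr($0, 28) }')

printf '<%s>\n' "$text"
```

With only two arguments, substr() returns everything from the given position to the end of the record, so no length has to be known in advance.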

Chuck Demas
Needham, Mass.

--
  Eat Healthy    |   _ _   | Nothing would be done at all,

  Die Anyway     |    v    | That no one could find fault with it.



Mon, 23 Apr 2001 03:00:00 GMT  
I would try the cut command. Something like
cut -d' ' -f8-
might work for you.
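
One thing worth checking before relying on this: cut treats every single delimiter character as a field boundary, so a run of spaces produces empty fields. The sketch below (sample line without quotes; the variable names are only for illustration) compares the field-based cut with a variant that squeezes the runs first using tr:

```shell
line='AA   35  0    7    9 2  5  TEXT     OUT     HERE'
want='TEXT     OUT     HERE'

# cut counts an empty field between every pair of adjacent spaces,
# so field 8 is reached long before the text begins.
by_field=$(printf '%s\n' "$line" | cut -d' ' -f8-)

# Squeezing the runs first fixes the field count but destroys the spacing.
squeezed=$(printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f8-)

printf 'by field: <%s>\n' "$by_field"
printf 'squeezed: <%s>\n' "$squeezed"
```

Neither result equals the wanted text with its original spacing, which suggests that field-based cut is not a good fit for this data.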





Mon, 23 Apr 2001 03:00:00 GMT  
How about '{ print substr($0, index($0, $8)) }'?  This will not work,
though, if the text of $8 also occurs anywhere earlier in the line (for
example, if any of $1 .. $7 is identical to $8); I don't know whether
this is the case in your input files.
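
The caveat can be seen with a made-up record. The line below is hypothetical (it is not from the original data): its eighth field is "5", which also appears inside the second field, "35". Since index() returns the first occurrence, the output starts too early:

```shell
# Hypothetical record whose eighth field ("5") also occurs earlier in the line.
line='AA   35  0    7    9 2  5  5  OUT     HERE'
want='5  OUT     HERE'

# index() finds the first "5", which sits inside "35" at column 7.
by_index=$(printf '%s\n' "$line" | awk '{ print substr($0, index($0, $8)) }')

printf '<%s>\n' "$by_index"
```

So the approach is only safe when the first word of the text cannot appear anywhere in the leading fields, even as part of a longer field.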

        Gert


--

============================================================================
Gert Durieux

Departement Germaanse Taal- en Letterkunde
Universiteitsplein 1 - A 1.27                   Phone   + 32 3 820 27 66
B-2610 Wilrijk                                  Fax     + 32 3 820 27 61
============================================================================



Mon, 23 Apr 2001 03:00:00 GMT  
How about the following script:

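/"/ {

    start = 0

    for ( i = 1; i <= NF; i++ )
    {
        # A field that begins with a quote opens a quoted string.
        if ( $i ~ /^"/ )
            start = index( $0, $i )

        # A field that ends with a quote closes it; this may be the
        # same field, as in "HERE". index() finds the first occurrence,
        # so a word repeated earlier on the line can throw it off.
        if ( start && $i ~ /"$/ )
        {
            end = index( $0, $i ) + length( $i ) - 1
            print substr( $0, start, end - start + 1 )
            start = 0
        }
    }
}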

It has the advantage that it supports multiple sets of quoted
strings, and it doesn't matter where the quoted strings occur
on the line!
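
A variation on the same idea, offered only as a sketch: instead of walking the fields, let match() locate each quoted span directly in the record. This assumes the quotes really are present in the data (the original question says they are not) and that quoted strings contain no embedded quote characters. The sample line is hypothetical.

```shell
# Hypothetical record with two quoted strings, quotes present in the data.
line='AA 35 "TEXT     OUT" 7 "HERE"'

quoted=$(printf '%s\n' "$line" | awk '{
    rest = $0
    # Repeatedly find the next "..." pair and print it unchanged.
    while (match(rest, /"[^"]*"/)) {
        print substr(rest, RSTART, RLENGTH)
        rest = substr(rest, RSTART + RLENGTH)
    }
}')

printf '%s\n' "$quoted"
```

Because the search works on the raw record rather than on split fields, the internal spacing of each quoted string is preserved and repeated words do not confuse it.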

--
Best regards,
 _ __                      _    ,   _ _ _
' )  )     /         _/_  ' )  /   ' ) ) )
 /--' ____/___/> __  /     /--/     / / / __,_  __  o _   ______
/  \_(_) /_) (__/ (_<__   /  ( o   / ' (_(_) (_/ (_<_/_)_(_) / <_

Robert H. Morrison                      Tel:   +49 721 9628 167
Software Development, Basis Team        FAX:   +49 721 9628 149






Mon, 23 Apr 2001 03:00:00 GMT  

How's this?

$ echo "AA   35  0    7    9 2  5  \"TEXT     OUT     HERE\"" |
>    gawk 'BEGIN { FIELDWIDTHS = "2 5 3 5 5 2 3 2 256" }
>                { for (i=1;i<10;i++) printf "%d: <%s>\n",i, $i }'

1: <AA>
2: <   35>
3: <  0>
4: <    7>
5: <    9>
6: < 2>
7: <  5>
8: <  >
9: <"TEXT     OUT     HERE">

Opinions expressed herein are my own and may not represent those of my employer.



Mon, 23 Apr 2001 03:00:00 GMT  
This ought to work for you (tested OK with your one sample line):

{
 # The ^ anchor makes RLENGTH the length of the seven leading fields.
 match($0,/^[a-zA-Z0-9]+ +[a-zA-Z0-9]+ +[a-zA-Z0-9]+ +[a-zA-Z0-9]+ +[a-zA-Z0-9]+ +[a-zA-Z0-9]+ +[a-zA-Z0-9]+ +/)
 print substr($0,RLENGTH+1)
}

or, if field 1 is alphabetic only and fields 2-7 are numeric only in the
remainder of your data, you can do

{
 match($0,/^[a-zA-Z]+ +[0-9]+ +[0-9]+ +[0-9]+ +[0-9]+ +[0-9]+ +[0-9]+ +/)
 print substr($0,RLENGTH+1)
}
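
If typing the same group seven times feels error-prone, the pattern can be built as a string and used as a dynamic regular expression. This is only a sketch: `n` is a hypothetical parameter for the number of leading fields to skip, and a field is assumed to be any run of non-space characters.

```shell
line='AA   35  0    7    9 2  5  TEXT     OUT     HERE'

# n is a hypothetical parameter: how many leading fields to skip.
text=$(printf '%s\n' "$line" | awk -v n=7 '
    BEGIN {
        # Build "^[^ ]+ +[^ ]+ + ..." with n repetitions.
        re = "^"
        for (i = 1; i <= n; i++) re = re "[^ ]+ +"
    }
    # RLENGTH is the length of the matched prefix, since the match is anchored.
    match($0, re) { print substr($0, RLENGTH + 1) }
')

printf '<%s>\n' "$text"
```

Lines with fewer than n fields followed by a space do not match and are skipped silently, which may or may not be what you want.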

Cesar
--
Please remove the UPPERCASE characters from my e-mail address for the real
thing



Mon, 23 Apr 2001 03:00:00 GMT  
    If the first seven fields are fixed-length and only the last
field is variable, I think you'd be better off with cut instead
of awk, since you know where the data starts. Being lazy, I'd use...

cut -c28-1000 funk.dat   ## ...and hope that none of the lines
                          # are over 1000 characters in length.

    If you're especially cautious, or you need it to work the first
time, find the maximum line length before you start the cut and
then use that number for the end of the range.
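
Finding the maximum line length can itself be done with a one-line awk program. The three input lines below are hypothetical stand-ins for funk.dat:

```shell
# Hypothetical three-line input standing in for funk.dat.
max=$(printf '%s\n' 'short' 'a much longer line' 'mid' |
      awk 'length($0) > max { max = length($0) } END { print max + 0 }')

printf 'longest line: %s characters\n' "$max"
```

The `+ 0` in the END block only ensures that an empty input prints 0 rather than an empty string.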

    Matt




Mon, 23 Apr 2001 03:00:00 GMT  
If you want to remove the first 27 characters of each line (so that the
output starts at column 28) and do not know the length of the lines, then

sed 's/.\{27\}//' infile > outfile

should also do the trick.

Not tested.
 LMS
free sed/awk book:
      ftp://ftp.u-aizu.ac.jp/u-aizu/doc/Tech-Report/1997/97-2-007.ps.gz
      ftp://ftp.u-aizu.ac.jp/u-aizu/doc/Tech-Report/1997/97-2-007.tar.gz



Tue, 24 Apr 2001 03:00:00 GMT  
Or just

cut -c28- infile >outfile

Cesar

--
Please remove the UPPERCASE characters from my e-mail address for the real
thing




Tue, 24 Apr 2001 03:00:00 GMT  

I ran into a similar situation.  What I ended up doing to solve it was
defining a new variable that held the free-format text.  Something like:

{ random = substr($0, 29, 21)
  print $4, $5, random }

I had the advantage of having fixed record length files.  I'm not sure how a
variable record length would be handled.  But in the worst case you could
write the file to a fixed record length, OR even better, change the format to
comma-separated values.  With the latter you can use:

nawk -F, '{print $4,$5,$7}'  file

That would handle it very neatly.  I've done all three; it just depends on how
big the file is (if you have to edit it) or how it's generated (can you change
the output format).
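
As a rough sketch of the comma-separated route applied to the sample record from the original question: the conversion below assumes the free text starts in column 28 and contains no commas of its own. With seven leading fields, the text ends up as field 8 of the converted line.

```shell
line='AA   35  0    7    9 2  5  TEXT     OUT     HERE'

# Convert one record: seven leading fields, then the untouched text.
csv=$(printf '%s\n' "$line" | awk '{
    text = substr($0, 28)          # assumes the text starts in column 28
    out = $1
    for (i = 2; i <= 7; i++) out = out "," $i
    print out "," text
}')

# Once converted, a comma field separator keeps the spacing intact.
picked=$(printf '%s\n' "$csv" | awk -F, '{ print $4, $5, $8 }')

printf '%s\n' "$csv"
printf '%s\n' "$picked"
```

If the text can contain commas, a different separator that never appears in the data (a tab, for instance) would be the safer choice.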

Good luck.
Erwin



Sat, 28 Apr 2001 03:00:00 GMT  
 