last field of array/case insensitive pattern matching query 

I'm trying to write a script that will strip out unnecessary cookies
from the end of apache log lines.  At the moment, apache is churning out
lines of 10 fields separated by a "|".  The 10th field is the one that
can contain any number of cookies, separated by ";".  I want field 10 to
contain only "mycookie".

In the code below, how do I represent the last index in the cookies array for
the end value in the for loop?

Also, is there a shortcut to the string matching pattern in the if
statement that would make it match "MyCookie", "mycookie" or "MYCOOKIE"?

{
        split($0,fields,"|")
        split(fields[10],cookies,";")
        fields[10] = ""
        for (i = 0; i <= ??? ; i++)
        {
                if ( cookies[i] ~ /[Mm][Yy][Cc][Oo][Oo][Kk][Ii][Ee].*/ )
                {
                        fields[10] = cookies[i]
                }
        }
        print fields[1] fields[2] fields[3] fields[4] fields[5] fields[6] fields[7] fields[8] fields[9] fields[10]
}

Regards,

Richard Cross
Unix Systems Engineer
Freeserve.com PLC



Sun, 07 Dec 2003 18:56:45 GMT  
Okay, so I've managed to get round these problems by using:

for (n in cookies)
{
        for (i = 0; i <= n ; i++)
        {
                if ( tolower(cookies[i]) ~ /mycookie.*/ )
                {
                        fields[10] = cookies[i]
                }
        }
}

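However, cookies[i] is not quoted consistently: sometimes it contains
"mycookie" (both quotes), sometimes "mycookie (leading quote only), and
sometimes mycookie" (trailing quote only).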

Is there any way I can strip out both quotes where they exist and then
reapply quotes on either side again for consistency?

I guess in Perl this would be:

$fields[10] =~ s/\"//g;
$fields[10] = "\"$fields[10]\"";

or something.  What's the awk version of this?


Sun, 07 Dec 2003 21:27:56 GMT  

Quote:

> Okay, so I've managed to get round these problems by using:

> for (n in cookies)
> {
>    for (i = 0; i <= n ; i++)
>    {
>            if ( tolower(cookies[i]) ~ /mycookie.*/ )
>            {
>                    fields[10] = cookies[i]
>            }
>    }
> }

You don't want to do that. You're unnecessarily iterating slices
of the cookies array multiple times. The value of n (i.e., the
number of elements in the cookies array) is given by the split()
function used to create the array:

    n = split(fields[10], cookies, /;/)

Array indexes start at 1 in awk, not 0 as in some other languages.
Also, you don't need to continue iterating the array once you've
matched /mycookie/, right? Finally, tacking .* onto the end of the
regular expression pattern serves no purpose but to consume time.
So your for loop should look more like this:

    for (i = 1; i <= n; i++) {
        if (tolower(cookies[i]) ~ /mycookie/) {
            fields[10] = "\"" cookies[i] "\""
            break
        }
    }

(Some versions of awk might require two backslashes, thus: "\\"".)
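
To see those pieces working together, here is a minimal sketch run from the shell. The function name pick_cookie and the sample cookie string are made up for illustration; only split(), tolower() and the loop shape come from the advice above.

```sh
# pick_cookie: hypothetical helper; prints the first cookie whose
# lower-cased text contains "mycookie", or an empty line if none does.
pick_cookie() {
    printf '%s\n' "$1" | awk '{
        n = split($0, cookies, /;/)      # split() returns the element count
        found = ""
        for (i = 1; i <= n; i++) {       # arrays built by split() start at 1
            if (tolower(cookies[i]) ~ /mycookie/) {
                found = cookies[i]
                break                    # stop at the first match
            }
        }
        print found
    }'
}

pick_cookie 'session=abc;MyCookie=42;lang=en'
```

With that sample input the call prints MyCookie=42. The match is done on a lower-cased copy, so the original capitalisation of the cookie survives in the output.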

Quote:
> However, sometimes cookies[i] contains "mycookie", "mycookie or even
> mycookie".

> Is there any way I can strip out both quotes where they exist and then
> reapply quotes on either side again for consistency?

> I guess in Perl this would be:

> $fields[10] =~ s/\"//g;
> $fields[10] = "\"$fields[10]\"";

> or something.

Or something.

    $fields[9] =~ s/^"//;
    $fields[9] =~ s/"$//;
    $fields[9] = "\"$fields[9]\"";

Quote:
> What's the awk version of this?

Don't globally strip double quotes. Remove only the ones that occur
at the beginning and end of the string:

    sub(/^"/, "", fields[10])
    sub(/"$/, "", fields[10])

Then, later:

    fields[10] = "\"" fields[10] "\""
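
As a rough check of the two sub() calls and the re-quoting step, the sketch below feeds in each of the quoting variants. The function name requote and the sample values are assumptions for illustration only.

```sh
# requote: hypothetical helper; strips one leading and one trailing
# double quote if present, then wraps the value in a fresh pair.
requote() {
    printf '%s\n' "$1" | awk '{
        value = $0
        sub(/^"/, "", value)     # drop a leading quote, if any
        sub(/"$/, "", value)     # drop a trailing quote, if any
        print "\"" value "\""
    }'
}

requote '"mycookie=1"'
requote '"mycookie=1'
requote 'mycookie=1"'
requote 'mycookie=1'
```

All four calls print "mycookie=1" with exactly one quote on each side. A quote in the middle of the value would be left alone, since both patterns are anchored.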

--
Jim Monty

Tempe, Arizona USA



Mon, 08 Dec 2003 01:25:04 GMT  

Quote:

> Is there any way I can strip out both quotes where they exist and then
> reapply quotes on either side again for consistency?

> I guess in Perl this would be:

> $fields[10] =~ s/\"//g;
> $fields[10] = "\"$fields[10]\"";

> or something.  What's the awk version of this?

gsub(/\"/,"",fields[10])
fields[10] = "\"" fields[10] "\""
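
A quick sketch of what the gsub() version does to a value that also has a quote in the middle. The function name strip_all_quotes and the sample value are invented for the example.

```sh
# strip_all_quotes: hypothetical helper; removes every double quote from
# its argument with gsub(), then wraps the result in one fresh pair.
strip_all_quotes() {
    printf '%s\n' "$1" | awk '{
        value = $0
        gsub(/"/, "", value)     # removes all quotes, not just the outer ones
        print "\"" value "\""
    }'
}

strip_all_quotes '"mycookie=a"b"'
```

This prints "mycookie=ab": the embedded quote is gone as well, which is fine only if cookie values never legitimately contain one.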

Regards...
                Michael



Mon, 08 Dec 2003 01:25:19 GMT  

Quote:


> > Okay, so I've managed to get round these problems by using:

> > for (n in cookies)
> > {
> >       for (i = 0; i <= n ; i++)
> >       {
> >               if ( tolower(cookies[i]) ~ /mycookie.*/ )
> >               {
> >                       fields[10] = cookies[i]
> >               }
> >       }
> > }

> You don't want to do that. You're unnecessarily iterating slices
> of the cookies array multiple times. The value of n (i.e., the
> number of elements in the cookies array) is given by the split()
> function used to create the array:

>     n = split(fields[10], cookies, /;/)

> Array indexes start at 1 in awk, not 0 as in some other languages.
> Also, you don't need to continue iterating the array once you've
> matched /mycookie/, right? Finally, tacking .* onto the end of the
> regular expression pattern serves no purpose but to consume time.
> So your for loop should look more like this:

>     for (i = 1; i <= n; i++) {
>         if (tolower(cookies[i]) ~ /mycookie/) {
>             fields[10] = "\"" cookies[i] "\""
>             break
>         }
>     }

> (Some versions of awk might require two backslashes, thus: "\\"".)

Actually, you don't really care about the order in which you
traverse the array, so this works just as well:

    for (i in cookies) {
        if (tolower(cookies[i]) ~ /mycookie/) {
            fields[10] = "\"" cookies[i] "\""
            break
        }
    }

Why are you explicitly splitting the record? Why not just let awk
do the field splitting for you?

    BEGIN {
        FS = OFS = "|"
    }

    {
        # manipulate $10 (or $NF) instead of fields[10]
        # ...

        # then just print the reconstructed record
        print
    }
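
Filled in, that skeleton might look something like the sketch below. It is only one way to put the thread's suggestions together: the function name filter_log and the sample log line are made up, and it assumes the cookie field is always the last one ($NF).

```sh
# filter_log: hypothetical helper; reads "|"-separated log lines on stdin
# and rewrites the last field to hold only the quoted mycookie entry.
filter_log() {
    awk '
    BEGIN { FS = OFS = "|" }
    {
        n = split($NF, cookies, /;/)
        $NF = ""                          # default when no cookie matches
        for (i = 1; i <= n; i++) {
            if (tolower(cookies[i]) ~ /mycookie/) {
                value = cookies[i]
                sub(/^"/, "", value)      # normalise the surrounding quotes
                sub(/"$/, "", value)
                $NF = "\"" value "\""
                break
            }
        }
        print                             # $0 was rebuilt using OFS
    }'
}

printf '%s\n' 'f1|f2|f3|f4|f5|f6|f7|f8|f9|"sid=9;MyCookie=42;lang=en"' | filter_log
```

For the sample line this prints f1|f2|f3|f4|f5|f6|f7|f8|f9|"MyCookie=42". Because $NF is assigned on every record, awk rebuilds the line with OFS, so the "|" separators are kept without listing the fields by hand.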

--
Jim Monty

Tempe, Arizona USA



Mon, 08 Dec 2003 02:07:50 GMT  
 