Selecting Specific Records from a File 

Hi, everybody. This is my first post to the group. Forgive me if I ask
a common question. I'm trying to write an awk script that "selects"
certain records from a file based on a set of keys stored in another
file. For example,

database file contains:
1 21 33
2 20 38
3 23 34
5 23 21
6 78 23
7 87 89

key file contains:
2
3
7

In other words, select records 2, 3, and 7 such that the output file
contains:

2 20 38
3 23 34
7 87 89

I'm completely stumbling on how to do this. I write lots of one-line
awk programs but I wouldn't call myself an awk programmer (that's
because I'm an astronomer!). I could write some awful thing but I am
trying for something elegant. The whole program I believe will be very
short once written decently -- perhaps a line or two. I've been
attempting to do it using the getline command but to no avail.

My real database contains thousands of entries and so speed is an
issue. I have already written programs that do what I need but they
are all slow. I want to use awk and not a compiled program like C. I
prefer to avoid binary files because I'm going to distribute the
scripts I am writing and cross-platform compatibility is an issue. And
there's no reason for an extra file if the same thing can be
accomplished as a line in a script.

Thanks in advance for your expertise,
Jason Quinn



Mon, 25 Apr 2005 04:19:13 GMT  

Quote:

> Hi, everybody. This is my first post to the group. Forgive me if I ask
> a common question. I'm trying to write an awk script that "selects"
> certain records from a file based on a set of keys stored in another
> file. For example,

> database file contains:
> 1 21 33
> 2 20 38
> 3 23 34
> 5 23 21
> 6 78 23
> 7 87 89

> key file contains:
> 2
> 3
> 7

> In other words, select records 2, 3, and 7 such that the output file
> contains:

> 2 20 38
> 3 23 34
> 7 87 89

         sed 's/./NR == &/' keyfile | awk -f - datafile

    Or:

         awk "`sed 's/./NR == &/' keyfile`" datafile

    Or:

         sed 's/$/&p/' keyfile | sed -nf - datafile


Quote:
> I'm completely stumbling on how to do this. I write lots of one-line
> awk programs but I wouldn't call myself an awk programmer (that's
> because I'm an astronomer!). I could write some awful thing but I am
> trying for something elegant. The whole program I believe will be very
> short once written decently -- perhaps a line or two. I've been
> attempting to do it using the getline command but to no avail.

> My real database contains thousands of entries and so speed is an
> issue. I have already written programs that do what I need but they
> are all slow. I want to use awk and not a compiled program like C. I
> prefer to avoid binary files because I'm going to distribute the
> scripts I am writing and cross-platform compatibility is an issue. And
> there's no reason for an extra file if the same thing can be
> accomplished as a line in a script.

> Thanks in advance for your expertise,
> Jason Quinn

--
    Chris F.A. Johnson                        http://cfaj.freeshell.org
    ===================================================================
    My code (if any) in this post is copyright 2002, Chris F.A. Johnson
    and may be copied under the terms of the GNU General Public License


Mon, 25 Apr 2005 04:39:30 GMT  

% a common question. I'm trying to write an awk script that "selects"
% certain records from a file based on a set of keys stored in another
% file. For example,
%
% database file contains:
% 1 21 33
% 2 20 38
% 3 23 34
% 5 23 21
% 6 78 23
% 7 87 89
%
% key file contains:
% 2
% 3
% 7

It wasn't clear to me whether you wanted to select records where $1
matches the key file contents, or where the record number matches
it. The record number is stored in FNR by awk.
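
To make the distinction concrete, here is a small sketch using the sample data from the question. The file name datafile is the one used elsewhere in this thread; the variable names are only for illustration.

```shell
# Recreate the sample database from the question.
printf '1 21 33\n2 20 38\n3 23 34\n5 23 21\n6 78 23\n7 87 89\n' > datafile

# Select by the value of the first field.
by_field=$(awk '$1 == 5' datafile)

# Select by record (line) number within the file.
by_recno=$(awk 'FNR == 5' datafile)

printf 'by $1:  %s\n' "$by_field"    # the record whose first field is 5
printf 'by FNR: %s\n' "$by_recno"    # the fifth line of the file
```

Because the sample data has no record with key 4, the two tests pick different lines: the first yields "5 23 21", the second "6 78 23".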

[..]

% short once written decently -- perhaps a line or two. I've been
% attempting to do it using the getline command but to no avail.

You probably want to load the keyfile into an array using getline:

 BEGIN {
    while ((getline < "keyfile") > 0)
       records[$1]
    close("keyfile")
 }

From that point it's just a matter of testing whether $1 (or FNR)
is an index in the array:

  $1 in records { do_whatever() }

% My real database contains thousands of entries and so speed is an
% issue. I have already written programs that do what I need but they

If you only want to print out the matching lines, you might
want to use join. Assuming you're matching on $1 and both files are
sorted on that field (join requires sorted input), the above script
is just

   join keyfile datafile

compared to
   awk 'BEGIN { while ((getline< "keyfile") > 0) r[$1]; close("keyfile") }
        $1 in r' datafile

Having said that, if you process a lot of data, you should install mawk,
because it's faster than gawk or standard awks.
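
As a side note, the same lookup is often written without getline by passing both files on the command line and using the NR == FNR test, which is true only while awk is reading the first file. This is a sketch of that common idiom, not something from the post above; it assumes keyfile is not empty (with an empty key file the test would also be true for datafile). The array name wanted is made up for the example.

```shell
# Sample files from the question.
printf '2\n3\n7\n' > keyfile
printf '1 21 33\n2 20 38\n3 23 34\n5 23 21\n6 78 23\n7 87 89\n' > datafile

# First file: remember each key and skip to the next record.
# Second file: print the record when its first field is a remembered key.
selected=$(awk 'NR == FNR { wanted[$1]; next }
                $1 in wanted' keyfile datafile)

printf '%s\n' "$selected"
```

The output is in datafile order, the same as the getline version.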
--

Patrick TJ McPhee
East York  Canada



Mon, 25 Apr 2005 04:35:16 GMT  

Quote:

>          sed 's/./NR == &/' keyfile | awk -f - datafile

>     Or:

>          awk "`sed 's/./NR == &/' keyfile`" datafile

>     Or:

>          sed 's/$/&p/' keyfile | sed -nf - datafile

    Thank you! Thank you! Thank you!
    This is what I've wanted for so long. (Well, almost. I had to
change the NR to $1.) The command I'm going to use is:

sed 's/./$1 == &/' keyfile | awk -f - datafile

Still happy,
Jason Quinn

PS I really need to get better with sed. It's one of those utilities
that I don't often think about and haven't gotten the hang of quite
yet.



Mon, 25 Apr 2005 12:24:43 GMT  

Quote:

>          sed 's/./NR == &/' keyfile | awk -f - datafile

>     Or:

>          awk "`sed 's/./NR == &/' keyfile`" datafile

>     Or:

>          sed 's/$/&p/' keyfile | sed -nf - datafile

     Oops. I jumped the gun a little bit. The order of the keyfile is
the important thing. Right now awk is running on the datafile and if
the order of the keys differs from the order that they appear in the
datafile, things get messed up. So I guess I have to run gawk on the
keyfile? Confused again.

Jason Quinn



Mon, 25 Apr 2005 13:24:24 GMT  

Quote:


>>          sed 's/./NR == &/' keyfile | awk -f - datafile

>>     Or:

>>          awk "`sed 's/./NR == &/' keyfile`" datafile

>>     Or:

>>          sed 's/$/&p/' keyfile | sed -nf - datafile

>      Oops. I jumped the gun a little bit. The order of the keyfile is
> the important thing. Right now awk is running on the datafile and if
> the order of the keys differs from the order that they appear in the
> datafile, things get messed up. So I guess I have to run gawk on the
> keyfile? Confused again.

       awk 'BEGIN { fmt = "$1 == %d {printf %c%d\t%%s%cn%c, $0}\n" }
                  { printf fmt, $1, 34, NR, 92, 34 }' keyfile |
              awk -f - datafile | sort -n | cut -f2-
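
To see what the pipeline above is doing, it may help to save the generated program in a file and look at it before running it. The sketch below does that with the sample data and the keys in the order 7, 2, 3; the file name generated.awk is made up for the example.

```shell
# Keys in the order the output should follow, plus the sample database.
printf '7\n2\n3\n' > keyfile
printf '1 21 33\n2 20 38\n3 23 34\n5 23 21\n6 78 23\n7 87 89\n' > datafile

# Stage 1: write one awk rule per key. Each rule prints the matching
# record prefixed with the key's line number in keyfile and a tab.
awk 'BEGIN { fmt = "$1 == %d {printf %c%d\t%%s%cn%c, $0}\n" }
           { printf fmt, $1, 34, NR, 92, 34 }' keyfile > generated.awk

cat generated.awk    # inspect the generated rules

# Stage 2: run the generated rules, sort on the numeric prefix to
# restore keyfile order, then cut the prefix off again.
ordered=$(awk -f generated.awk datafile | sort -n | cut -f2-)

printf '%s\n' "$ordered"
```

The numeric prefix is what carries the key order through: awk emits matches in datafile order, sort -n rearranges them, and cut -f2- drops the tag.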

--
    Chris F.A. Johnson                        http://cfaj.freeshell.org
    ===================================================================
    My code (if any) in this post is copyright 2002, Chris F.A. Johnson
    and may be copied under the terms of the GNU General Public License



Mon, 25 Apr 2005 14:29:22 GMT  

Quote:

> sed 's/./$1 == &/' keyfile | awk -f - datafile
> PS I really need to get better with sed. It's one of those utilities
> that I don't often think about and haven't gotten the hang of quite
> yet.

    I wouldn't worry about it. I don't use sed for much other than
    search and replace, as in the above example.

    For [almost] anything more complex, use awk or a shell script;
    it's far easier to understand when you come back to it later.

    But I do recommend the O'Reilly book, "sed and awk" for getting to
    know both utilities.

--
    Chris F.A. Johnson                        http://cfaj.freeshell.org
    ===================================================================
    My code (if any) in this post is copyright 2002, Chris F.A. Johnson
    and may be copied under the terms of the GNU General Public License



Mon, 25 Apr 2005 14:35:43 GMT  
Hello,


Quote:
>      Oops. I jumped the gun a little bit. The order of the keyfile is
> the important thing.  ...

what about something like this:
============= program.awk =========
BEGIN {
        keyfile = "keyfile"
        # "> 0" stops the loop on end of file (0) and on a read error (-1)
        while ((getline < keyfile) > 0)
                keys[$0]
        close(keyfile)
}

$1 in keys { keys[$1]=$0 }

END {
        while ((getline < keyfile) > 0)
                print keys[$0]
        close(keyfile)
}

=========

Call this program as this:

        awk -f program.awk datafile

Isn't this quick enough?

The program has not been tested but I hope it's easily understandable.

Have a nice day,
        Stepan Kasal



Mon, 25 Apr 2005 19:15:35 GMT  

%      Oops. I jumped the gun a little bit. The order of the keyfile is
% the important thing. Right now awk is running on the datafile and if
% the order of the keys differs from the order that they appear in the
% datafile, things get messed up. So I guess I have to run gawk on the
% keyfile? Confused again.

wrt my earlier answer, join won't help you here.

I would make up two arrays from the keyfile, one to test whether a line
should be printed, and one to give the output order. You can do what
one of the other answers suggests, but with a single pass through the
key file:

  BEGIN {
    while ((getline < "keyfile") > 0) {
      keys[$1]              # indices are the needed keys
      key[++nkey] = $1      # array lists the keys in needed order
    }
    close("keyfile")
  }

  END {
    for (i = 1; i<= nkey; i++)
      if (key[i] in val)    # this prevents printing a blank line if key
                            # was not in the data set
         print val[key[i]]
  }

  $1 in keys { val[$1] = $0 }
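
For anyone who wants to try the script above, here is a sketch of a test run. It saves the program under the made-up name ordered.awk and uses a key file in the order 7, 9, 2, 3, where 9 is deliberately a key that does not occur in the data.

```shell
# Keys in the wanted output order; 9 has no matching record.
printf '7\n9\n2\n3\n' > keyfile
printf '1 21 33\n2 20 38\n3 23 34\n5 23 21\n6 78 23\n7 87 89\n' > datafile

# The two-array program, written to a file with a quoted here-document
# so the shell leaves $1 and $0 alone.
cat > ordered.awk <<'EOF'
BEGIN {
  while ((getline < "keyfile") > 0) {
    keys[$1]              # membership test
    key[++nkey] = $1      # output order
  }
  close("keyfile")
}
$1 in keys { val[$1] = $0 }
END {
  for (i = 1; i <= nkey; i++)
    if (key[i] in val)    # skip keys that never appeared in the data
      print val[key[i]]
}
EOF

ordered=$(awk -f ordered.awk datafile)
printf '%s\n' "$ordered"
```

The records come out in key file order (7, 2, 3), and the missing key 9 produces no blank line.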
--

Patrick TJ McPhee
East York  Canada



Tue, 26 Apr 2005 00:26:04 GMT  

Quote:

>        awk 'BEGIN { fmt = "$1 == %d {printf %c%d\t%%s%cn%c, $0}\n" }
>                   { printf fmt, $1, 34, NR, 92, 34 }' keyfile |
>               awk -f - datafile | sort -n | cut -f2-

    This works.

Jason Quinn



Tue, 26 Apr 2005 02:20:28 GMT  

Quote:

> what about something like this:
> ============= program.awk =========
> BEGIN {
>    keyfile = "keyfile"
>    while (getline <keyfile)
>            keys[$0]
>    close(keyfile)
> }

> $1 in keys { keys[$1]=$0 }

> END {
>    while (getline <keyfile)
>            print keys[$0]
>    close(keyfile)
> }
> =========

> Call this program as this:

>    awk -f program.awk datafile

  This works too.

Jason



Tue, 26 Apr 2005 02:21:16 GMT  

Quote:

>>          sed 's/./NR == &/' keyfile | awk -f - datafile
>>     Or:
>     Oops. I jumped

This is my solution; I think it is correct. Any comments are appreciated.
It works on the given files, also when they are a bit enlarged. I don't know
how it behaves with thousands of lines in terms of memory usage or speed.
I would also like to ask a question.
When awk builds an array of $0 values from the input file, is that an array
of pointers?

# file wkey.awk #
#
# Reads one key per data record, so the key file must not be longer than the data file.
{ if ((getline aa[NR] < "key") > 0) k++; bb[$1]=$0 }
END { close("key"); for(i=1;i<=k;i++) print bb[aa[i]] }

 awk -f wkey.awk inputfile

cor



Tue, 26 Apr 2005 05:49:29 GMT  

Quote:

> Hi, everybody. This is my first post to the group. Forgive me if I ask

I can't recall ever seeing a better-formulated question.
If only everyone asked questions the way you just have. Good job!

I don't have an answer to your question (which clearly asks for an awk solution).
But if you can live with a minor modification of your key file, you can
get away with an easy-to-type command line using grep.

        bash$ cat key
        ^2
        ^3
        ^7
        bash$ cat data
        1 21 33
        2 20 38
        3 23 34
        5 23 21
        6 78 23
        7 87 89
        bash$ grep -f key data
        2 20 38
        3 23 34
        7 87 89
        bash$

( Credits to Bill Marcum in comp.unix.shell )
//Mats

--
My code (if any) in this message are Copyright (C) 2002 Mats Blomstrand
and licensed under GNU GPL, http://www.gnu.org/licenses/gpl.html



Tue, 26 Apr 2005 20:09:28 GMT  

Clarification. To avoid this:

        bash$ grep -f key data
        2 20 38
        3 23 34
        7 87 89
        7xx should not be here
        bash$

You need to edit the key-file to have '^' in front of the key and a
'space' after it (or whatever separator you have in data-file).
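
If editing the key file by hand is a nuisance, the anchored patterns can be generated from the plain key file instead. This is only a sketch: it assumes the keys contain no regular-expression metacharacters, that the separator in the data file is a single space, and that every record has something after the key. The file name patterns is made up for the example.

```shell
# Plain key file and sample data, including the problem line from above.
printf '2\n3\n7\n' > keyfile
printf '1 21 33\n2 20 38\n3 23 34\n5 23 21\n6 78 23\n7 87 89\n7xx should not be here\n' > datafile

# Turn each key K into the pattern "^K " (anchored at line start,
# followed by the separator).
sed 's/.*/^& /' keyfile > patterns

matches=$(grep -f patterns datafile)
printf '%s\n' "$matches"
```

With the trailing space in each pattern, the "7xx" line is no longer selected.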
//Mats

--
My code (if any) in this message are Copyright (C) 2002 Mats Blomstrand
and licensed under GNU GPL, http://www.gnu.org/licenses/gpl.html



Tue, 26 Apr 2005 20:39:16 GMT  
gawk '$1 ~ /^(2|3|7)$/' file


Tue, 26 Apr 2005 22:30:55 GMT  
 