Comparing value in an input field to any value from another file 

Here's my problem in brief, and I think that gawk can help solve it:

I have two files. File A is a data file, laid out as
99,89765,Smith,John,,,,,,
99,6754,Anderson,Rick,,,,,
00,124588,Barney,Joan,,,,,,
etc
etc (about 22,000 records)

In this file, $2 is a unique record ID.

File B has a list of a few hundred record IDs, one after the other:
255476
3457
204367
etc
etc

I need to create two files from file A:

     1. If the value of $2 in file A exactly matches any value from
file B, $0 should be written to a new file, i.e.,

     if $2 == [value from file B] print $0 >> nocf.lst

     2. Likewise, if the value of $2 does not equal any value from file
B, $0 should be written to another file, i.e.,

     if $2 != [value from file B] print $0 >> cf.lst

I think the best way to handle this problem would be to declare a
variable that would read its value from the list in file B, and then
compare that value to $2 in the input data file.

I am stuck though, b/c I am not sure quite how to proceed with this,
ie which function in gawk to call to accomplish this task.

I also apologize for the lack of example code; I am at work with no
access to gawk.....I'll be home later (after 10pm EST), and will try
to get some code down then, but in the meantime I would appreciate any
help or pointers that anyone would be willing to share with me.....

I imagine I could also cobble something together with 'join', i.e.,
printing a join of records from file A that match some value in file
B, and then the 'anti-join' (my own word) of those records in file A
not matching anything in B.

Would that be the simplest method (using join)? Or would gawk be more
parsimonious and simpler?

Thanks in advance for any help or pointers.

Chris Leslie
chrisleslie at mindspring dot com



Tue, 24 Feb 2004 05:05:46 GMT  
Christopher Leslie wrote at Thursday 06 September 2001 23:05 like
only he can:

Quote:
> I need to create two files from file A:

>      1. If the value of $2 in file A exactly matches any value from
> file B, $0 should be written to a new file, i.e.,

>      if $2 == [value from file B] print $0 >> nocf.lst

>      2. Likewise, if the value of $2 does not equal any value from
> file B, $0 should be written to another file, i.e.,

>      if $2 != [value from file B] print $0 >> cf.lst

for i in `cat B`; do grep $i A;done >> newfile

May work or not, depending on your data.

for i in `cat A`; do awk -F, -v s=$i '{if ($2~s)print}' B;done >>  \
newfile

Would be better, using GNU awk and bash.

Michael Heiming



Tue, 24 Feb 2004 05:38:48 GMT  
Michael Heiming wrote at Thursday 06 September 2001 23:38 like only
he can:

Quote:
> for i in `cat B`; do grep $i A;done >> newfile

> May work or not, depending on your data.

> for i in `cat A`; do awk -F, -v s=$i '{if ($2~s)print}' B;done >>  \
> newfile

> Would be better, using GNU awk and bash.

> Michael Heiming

Should be like this, got confused with your A B...somehow reminds me
of microsoft....

for i in `cat B`; do awk -F, -v s=$i '{if ($2~s)print}' A;done >>  \
newfile

Michael Heiming
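
A note on the loop above: `$2~s` is a regular-expression match, so an ID such as 3457 would also select a record whose $2 is 134579. What follows is a minimal sketch of an exact-match variant, not a definitive fix. The sample records, the scratch directory and the output names `regex.out` and `exact.out` are invented for illustration, and reading B with `while read` instead of `cat` in backquotes is a substitution made for this sketch, not part of the original post.

```shell
# Work in a scratch directory so nothing in the current one is touched.
work=$(mktemp -d) && cd "$work"

# Hypothetical sample data: 3457 is a substring of 134579.
printf '%s\n' '99,134579,Smith,John' '99,3457,Anderson,Rick' > A
printf '%s\n' '3457' > B

# Regex match, as in the post: selects both records.
while IFS= read -r id; do
    awk -F, -v s="$id" '$2 ~ s' A
done < B > regex.out

# String equality: (s "") forces a string comparison, so only 3457 is selected.
while IFS= read -r id; do
    awk -F, -v s="$id" '$2 == (s "")' A
done < B > exact.out

cat exact.out
```

Either loop still starts one awk process per ID in B, so with a few hundred IDs a single-pass lookup is likely to be faster.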



Tue, 24 Feb 2004 05:43:55 GMT  
...

Quote:
>I need to create two files from file A:

>     1.If the value of $2 in file A exactly matches any value from
>file B, $0 should be written to a new file, ie,

>     if $2 == [value from file B] print $0 >> nocf.lst

>     2. Likewise, if the value of $2 does not equal a value from file
>B, $0 should be written to another file, ie,

>     if $2 != [value from file B] print $0 >> cf.lst

If you're on a unix or unix-like system, use join. If you're on Win32, the
GNU textutils package contains join.
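
For reference, a rough sketch of the join route. It assumes both files are first sorted on the key under the same locale, since join requires sorted input. The sample data and the `.sorted` file names are invented here. Note that join prints the key first, so the ID moves to the front of each output record.

```shell
# Work in a scratch directory so nothing in the current one is touched.
work=$(mktemp -d) && cd "$work"
export LC_ALL=C    # same collation order for sort and join

# Hypothetical sample data in the layout described in the question.
printf '%s\n' '99,89765,Smith,John' '99,6754,Anderson,Rick' '00,124588,Barney,Joan' > A
printf '%s\n' '255476' '6754' > B

# join needs both inputs sorted on the join field.
sort -t, -k2,2 A > A.sorted
sort B > B.sorted

# Records of A whose $2 appears in B.
join -t, -1 2 -2 1 A.sorted B.sorted > nocf.lst
# The "anti-join": records of A with no partner in B.
join -t, -1 2 -2 1 -v 1 A.sorted B.sorted > cf.lst
```

If the original field order matters, the records would need to be rearranged afterwards (or selected with join's `-o` option), which is one reason the awk route may be simpler for this data.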

If you really want to use awk,

# splitter.awk - invoke as: awk -f splitter.awk FileA
# Load the IDs from FileB as array keys, then route each FileA record by $2.
BEGIN { FS = ","; f = "FileB"; while ((getline s < f) > 0) a[s]; close(f) }
{ print >> (($2 in a) ? "nocf.lst" : "cf.lst") }
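
The same lookup can also be written without getline by passing both files on the command line. This is a sketch of the common `NR == FNR` idiom rather than anything posted in the thread. The sample data is invented, and the sketch assumes FileB is not empty (with an empty FileB, FileA would be mistaken for the ID list).

```shell
# Work in a scratch directory so nothing in the current one is touched.
work=$(mktemp -d) && cd "$work"

# Hypothetical sample data in the layout described in the question.
printf '%s\n' '99,89765,Smith,John' '99,6754,Anderson,Rick' '00,124588,Barney,Joan' > FileA
printf '%s\n' '255476' '6754' > FileB

awk -F, '
    NR == FNR { ids[$1]; next }                         # first file: remember each ID as an array key
    { print > (($2 in ids) ? "nocf.lst" : "cf.lst") }   # second file: route each record by $2
' FileB FileA
```

Here `>` truncates each output file once per run and then keeps writing to it, whereas `>>` appends to whatever an earlier run left behind.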



Tue, 24 Feb 2004 06:37:25 GMT  
Another way (rather crude, but it works) that I have used for this kind of thing before:

num_lines_B=`wc -l < B`
cat B A | gawk -F, -v n="$num_lines_B" '{
                     if (NR <= n) {
                           # the first n lines are the IDs from B; store each as an array key
                           ids[$1]
                     }
                     else if ($2 in ids) {
                        print $0 > "nocf.lst"
                     }
                     else print $0 > "cf.lst"
                }'

<above code is not tested .. just wanted to give the method>
hope this helps ...

--mouli




Thu, 26 Feb 2004 05:50:40 GMT  

Quote:
>  for i in `cat B`; do awk -F, -v s=$i '{if ($2~s)print}' A;done >>  \
>  newfile

i had to try it to believe it!  the first thing that caught my attention was
"for i in `cat some-large-file`; do ... done", because i expected some sort
of line buffer overflow, but it seems trusty ol' bash() buffers everything
in some file.

does anybody know if this is true?  does sh() do the same?

--
clemens



Mon, 15 Mar 2004 23:13:27 GMT  

Quote:


>>  for i in `cat B`; do awk -F, -v s=$i '{if ($2~s)print}' A;done >>  \
>>  newfile

>i had to try it to believe it!  the first thing caught my attention was
>"for i in `cat some-large-file`; do ... done", because i expected some sort
>of line buffer overflow, but it seems trusty ol' bash() buffers everything
>in some file.

>does anybody know if this true?  does sh() do the same?

bash has no fixed size limit for the expanded command line (apart from
available system resources; it uses virtual memory, though, not actually
a file). A lot of modern shells probably do the same, but I don't think
you can rely on it outside of bash. There is another limit, on the total
size of the arguments (and environment) passed to an external command
(ARG_MAX). Because 'for' is a builtin it is not subject to that limit,
but the limit is finite, so it's something to watch out for when dealing
with a lot of data (like passing thousands of filenames to 'rm').
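
To make the distinction concrete, here is a small sketch, assuming a system with `getconf` and an `xargs` that supports `-0` (the GNU and BSD versions do). The scratch file names are made up. `printf` is a shell builtin, so the long glob expansion is never handed to an external command in one piece; `xargs` then splits the list into batches that fit within the limit.

```shell
# Work in a scratch directory so nothing in the current one is touched.
work=$(mktemp -d) && cd "$work"

# Size limit, in bytes, on the arguments plus environment handed to an external command.
getconf ARG_MAX

# Create a few hundred scratch files using builtins only.
i=0
while [ "$i" -lt 500 ]; do
    : > "scratch_$i.tmp"
    i=$((i + 1))
done

# Instead of "rm scratch_*.tmp", which could exceed the limit with enough files:
printf '%s\0' scratch_*.tmp | xargs -0 rm -f

remaining=$(ls | wc -l)
echo "files left: $remaining"
```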

seeya,

--
: ${L:-aura} # http://lf.8k.com:80
:            # http://lf.1accesshost.com




Thu, 18 Mar 2004 00:45:46 GMT  
 