Update a file using another file 
 Update a file using another file

Please Ignore other posts on this subject
Howzit

I have been trying to write a script that compares two similar files and
applies the changes. I have been trying for two days and cannot get it
right. FileA is the table that needs to be updated and FileB is the table
with the latest records. The third field in FileA (cifa) needs to be
compared with the first field in FileB (cifb), and the fifth field in FileA
(account number) compared with the third field in FileB (account number), to
see whether an existing customer has opened another account, e.g. had a
savings account and then opened a cheque account later.
You will notice that a portion of the account number matches the CIF number.

FileA

002233791|10001|7|Romiko van de dronker|021/00007/13
002233792|10001|7|Romiko van de dronker|022/00007/14
002333745|10002|14|Bob Harrison|021/00014/12

FileB

7|Romiko van de dronker|021/00007/13
7|Romiko van de dronker|022/00007/14
8|Michael Bake|021/00008/12
14|Bob Harrison|021/00014/12
16|Harry Ford|429/00016/11
16|Harry Ford|430/00016/12
200|Joe Blogs|021/00200/13
23333|Gary Player|022/23333/45

OUTPUT

002233791|10001|7|Romiko van de dronker|021/00007/13
002233792|10001|7|Romiko van de dronker|022/00007/14
8|Michael Bake|021/00008/12
002333745|10002|14|Bob Harrison|021/00014/12
002543578|10003|16|Harry Ford|429/00016/11
16|Harry Ford|430/00016/12
200|Joe Blogs|021/00200/13
23333|Gary Player|022/23333/45

OUTPUT2 (shows changes only)

8|Michael Bake|021/00008/12
16|Harry Ford|430/00016/12
200|Joe Blogs|021/00200/13
23333|Gary Player|022/23333/45

Notice that Harry has opened another
account and his customer number stays the same (16), so the record with the
new account needs to be taken into account. Guys, I have tried using this
script that I made but it does not work, why??

Cut here -------------------

BEGIN { FS = OFS = "|"
while (getline <"A") {

cifa = $3; acca = substr($5, 5, 5)
while (getline <"B") {
cifb = $1; accb = substr($3, 5, 5); name = $2; accfull = $3
}
}

{if (cifb != cifa ||  accb != acca ) {
print cifb, name, accfull
}
}
}

CUT---------------------------------


Wed, 30 Jan 2002 03:00:00 GMT  
 Update a file using another file

[...]

% new account needs to be taken into account.  Guys I have tried using this
% script that I made but is does not work, why??

% BEGIN { FS = OFS = "|"
% while (getline <"A") {

Some indentation would help make what this is doing a bit clearer.
This while loop is going over all the lines of A. It's executed
once per script invocation.

% cifa = $3; acca = substr($5, 5, 5)
% while (getline <"B") {

This while loop is going over all the lines of B. It's executed
once per line of A.

% cifb = $1; accb = substr($3, 5, 5); name = $2; accfull = $3
% }

This is the end of the B loop. The next time you execute the B
loop, getline will fail, because it's already at the end of
B. You need to put this here:
 close("B")

% }

This is the end of the A loop.

% {if (cifb != cifa ||  accb != acca ) {
% print cifb, name, accfull
% }
% }
This block is being executed outside both loops. You're comparing only
the last line of A to the last line of B. You need to do this inside
the B loop.

% }

All that aside, if you have large files, and especially if file B
is large, this will be quite inefficient. What I would do is read
the file that's being updated into an array, then loop on the other
file comparing values, eg
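 BEGIN {
    FS = OFS = "|"
    ARGC = 2
    ARGV[1] = "B"        # the main loop reads B

    # remember the (cif, account number) pair of every record in A
    while ((getline < "A") > 0) {
       fileA[$3,$5] = ""
    }
    close("A")
 }
 # B record whose pair is not in A; print it, or do whatever else you need
 { if (!(($1 SUBSEP $3) in fileA)) print }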

This will be memory-intensive, but flat-file databases _are_ memory
intensive (or they're slow).
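
As a rough end-to-end sketch of this approach (not a definitive implementation), the shell snippet below writes the sample tables from the first post to temporary files and prints the B records whose (cif, account number) pair does not occur in A. The file names A and B, the temporary directory, and the variable name `changes` are assumptions. The Harry Ford row in A is taken from the expected OUTPUT listing rather than from the FileA listing.

```shell
# Sketch only: sample rows copied from the thread, file names A and B assumed.
tmp=$(mktemp -d)

cat > "$tmp/A" <<'EOF'
002233791|10001|7|Romiko van de dronker|021/00007/13
002233792|10001|7|Romiko van de dronker|022/00007/14
002333745|10002|14|Bob Harrison|021/00014/12
002543578|10003|16|Harry Ford|429/00016/11
EOF

cat > "$tmp/B" <<'EOF'
7|Romiko van de dronker|021/00007/13
7|Romiko van de dronker|022/00007/14
8|Michael Bake|021/00008/12
14|Bob Harrison|021/00014/12
16|Harry Ford|429/00016/11
16|Harry Ford|430/00016/12
200|Joe Blogs|021/00200/13
23333|Gary Player|022/23333/45
EOF

changes=$(cd "$tmp" && awk '
BEGIN {
    FS = OFS = "|"
    # Remember every (cif, account number) pair already present in A.
    while ((getline < "A") > 0)
        seen[$3, $5] = 1
    close("A")
}
# The main loop reads B: keep only records whose pair is not in A.
!(($1, $3) in seen)
' B)

printf '%s\n' "$changes"
rm -rf "$tmp"
```

With this data the snippet should print the four records listed under OUTPUT2 in the first post.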
--

Patrick TJ McPhee
East York  Canada



Wed, 30 Jan 2002 03:00:00 GMT  
 Update a file using another file
I would like to use arrays, but I have been avoiding them as I do not
know how to use them, and I am not familiar with the ARGC and ARGV functions,
as my books do not explain them well. How would I put the file to be updated
into an array and then use these functions?




Wed, 30 Jan 2002 03:00:00 GMT  
 Update a file using another file
Howzit, I tried the tips you gave me and this is what my script looks like,
but it gives me weird output??

---------Script--------------
BEGIN { FS = OFS = "|"
 while (getline <"A") {
cifa = $3; acca = substr($5, 5, 5)
 while (getline <"B") {
cifb = $1; accb = substr($3, 5, 5); name = $2; accfull = $3
if (cifb != cifa ||  accb != acca ) {
 print cifb, name, accfull
}
}
close("B")
}
                            }

----output---------
8|Michael Bake|021/00008/12
14|Bob Harrison|021/00014/12
16|Harry Ford|429/00016/11
16|Harry Ford|430/00016/12
200|Joe Blogs|021/00200/13
23333|Gary Player|022/23333/45
8|Michael Bake|021/00008/12
14|Bob Harrison|021/00014/12
16|Harry Ford|429/00016/11
16|Harry Ford|430/00016/12
200|Joe Blogs|021/00200/13
23333|Gary Player|022/23333/45
7|Romiko van de dronker|021/00007/13
7|Romiko van de dronker|022/00007/14
8|Michael Bake|021/00008/12
16|Harry Ford|429/00016/11
16|Harry Ford|430/00016/12
200|Joe Blogs|021/00200/13
23333|Gary Player|022/23333/45
7|Romiko van de dronker|021/00007/13
7|Romiko van de dronker|022/00007/14
8|Michael Bake|021/00008/12
14|Bob Harrison|021/00014/12
200|Joe Blogs|021/00200/13
23333|Gary Player|022/23333/45





Wed, 30 Jan 2002 03:00:00 GMT  
 Update a file using another file

writes:

Quote:
>I would like to use arrays, but I have been avoiding them as I do not
>know how to use them, and I am not familiar with the ARGC and ARGV functions,
>as my books do not explain them well. How would I put the file to be updated
>into an array and then use these functions?

Example of reading a file into an array then using it to match against a second
file.

FILENAME == "FileA" { recA[$1] = $0; next }
{ if ($1 in keyA) print recA[$1]; else print }

Explanation: the first field from FileA is stored as the index into the array
recA, and the entire record in FileA is the corresponding value. I'm assuming
here that the first field is a key field - IMO updating individual records in
unkeyed database tables is one of the more futile exercises in programming. For
the remaining input files, if the first field matches one from FileA, replace
the record with the record from FileA; otherwise, keep the current record.

Trying to use awk for nontrivial tasks, especially database applications,
without using arrays is not an intelligent course of action. Arrays provide
rather critical functionality for this sort of thing. Better you should learn
arrays before writing many more of these database scripts. Experiment with
small scripts.



Thu, 31 Jan 2002 03:00:00 GMT  
 Update a file using another file

Quote:

>FILENAME == "FileA" { recA[$1] = $0; next }
>{ if ($1 in keyA) print recA[$1]; else print }

Finding my own bugs. Make that second action

{ if ($1 in recA) print recA[$1]; else print }
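
Applied to the tables from the first post, this pattern probably needs a composite key, since the cif alone cannot tell two accounts of the same customer apart. The sketch below is one hedged adaptation, not the poster's own script: it keys recA on (cif, account number), which are fields 3 and 5 in FileA and fields 1 and 3 in FileB. The temporary directory, the variable name `merged`, and the Harry Ford row in FileA (taken from the expected OUTPUT listing) are assumptions.

```shell
# Sketch only: sample rows copied from the thread.
tmp=$(mktemp -d)

cat > "$tmp/FileA" <<'EOF'
002233791|10001|7|Romiko van de dronker|021/00007/13
002233792|10001|7|Romiko van de dronker|022/00007/14
002333745|10002|14|Bob Harrison|021/00014/12
002543578|10003|16|Harry Ford|429/00016/11
EOF

cat > "$tmp/FileB" <<'EOF'
7|Romiko van de dronker|021/00007/13
7|Romiko van de dronker|022/00007/14
8|Michael Bake|021/00008/12
14|Bob Harrison|021/00014/12
16|Harry Ford|429/00016/11
16|Harry Ford|430/00016/12
200|Joe Blogs|021/00200/13
23333|Gary Player|022/23333/45
EOF

merged=$(cd "$tmp" && awk -F'|' '
# First file: store each full record under its (cif, account number) key.
FILENAME == "FileA" { recA[$3, $5] = $0; next }
# Later files: print the stored FileA record when the key is known,
# otherwise pass the incoming record through unchanged.
{ if (($1, $3) in recA) print recA[$1, $3]; else print }
' FileA FileB)

printf '%s\n' "$merged"
rm -rf "$tmp"
```

With this data the result should match the OUTPUT listing from the first post: known accounts keep their five-field FileA record, and new accounts appear as three-field FileB records.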



Thu, 31 Jan 2002 03:00:00 GMT  
 Update a file using another file

Quote:

> Howzit,

Howzit.

Quote:
> I tried the tips you gave me and this is what my script looks like
> but ...

> ---------Script--------------
> BEGIN { FS = OFS = "|"
>  while (getline <"A") {
> cifa = $3; acca = substr($5, 5, 5)
>  while (getline <"B") {
> cifb = $1; accb = substr($3, 5, 5); name = $2; accfull = $3
> if (cifb != cifa ||  accb != acca ) {
>  print cifb, name, accfull
> }
> }
> close("B")
> }
>                             }

BEGIN {
    FS = OFS = "|"
    while (getline <"A") {
        cifa = $3
        acca = substr($5, 5, 5)
        while (getline <"B") {
            cifb = $1
            accb = substr($3, 5, 5)
            name = $2
            accfull = $3
            if (cifb != cifa || accb != acca) {
                print cifb, name, accfull
            }
        }
        close("B")
    }
}

Now, isn't that easier to read and understand? (Same exact
code, just reformatted.)
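
Laid out this way, a likely cause of the odd output is easier to spot: the print sits inside the inner loop, so each B record is printed once for every A record it differs from. Comparing only the middle digits of the account number (which repeat the cif) also means a second account for the same customer is never flagged. The sketch below is one hedged way to repair the nested-loop version: B becomes the outer loop, A the inner one, full account numbers are compared, and a hypothetical `found` flag defers the print until A has been scanned. The three sample rows come from the thread; the temporary directory and the variable name `newrecs` are assumptions.

```shell
# Sketch only: a reduced sample, rows copied from the thread.
tmp=$(mktemp -d)

cat > "$tmp/A" <<'EOF'
002333745|10002|14|Bob Harrison|021/00014/12
002543578|10003|16|Harry Ford|429/00016/11
EOF

cat > "$tmp/B" <<'EOF'
14|Bob Harrison|021/00014/12
16|Harry Ford|429/00016/11
16|Harry Ford|430/00016/12
EOF

newrecs=$(cd "$tmp" && awk '
BEGIN {
    FS = OFS = "|"
    # Outer loop: each record of B, the newer table.
    while ((getline < "B") > 0) {
        cifb = $1; accb = $3; recb = $0
        found = 0
        # Inner loop: look for the same cif and full account number in A.
        while ((getline < "A") > 0) {
            if ($3 == cifb && $5 == accb) { found = 1; break }
        }
        close("A")              # rewind A for the next B record
        if (!found) print recb  # decide only after A has been scanned
    }
    close("B")
}')

printf '%s\n' "$newrecs"
rm -rf "$tmp"
```

This still rereads A once per B record, so it only suits small files; the array-based approach avoids that cost.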

More sage advice from _The AWK Programming Language_ by Aho,
Kernighan, and Weinberger (ISBN 0-201-07981-X), p. 62:

    In all cases involving getline, you should be aware of the
    possibility of an error return if the file can't be accessed.
    Although it's appealing to write

        while (getline <"file") ...     # Dangerous

    that's an infinite loop if file doesn't exist, because with a
    nonexistent file getline returns -1, a nonzero value that
    represents true. The preferred way is

        while (getline <"file" > 0) ... # Safe

    Here the loop will be executed only when getline returns 1.
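
The difference can be checked from the shell. This is a small sketch, assuming the path below does not exist on your system; the variable names are made up for the illustration.

```shell
# Sketch only: the path is assumed not to exist.
missing="/nonexistent/getline-demo-file"

# Return value of getline when the file cannot be opened.
status=$(awk -v f="$missing" 'BEGIN { r = (getline line < f); print r }')

# The guarded loop never runs its body for such a file.
loops=$(awk -v f="$missing" 'BEGIN {
    n = 0
    while ((getline line < f) > 0) n++
    print n
}')

printf 'getline returned %s; guarded loop ran %s times\n' "$status" "$loops"
```

The extra parentheses around the getline expression are not strictly required here, but they make the intended grouping explicit.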

Quote:
> ... it gives me a wierd output??

For non-huge data sets, use associative arrays to solve this simple
match/merge problem. Read the smaller file first, store the key
values as subscripts to an associative array and the required data
as their corresponding elements, then process the larger file and
use awk's "var in arr" construct to look up matching records. (This
method has been demonstrated in other articles within this thread.)

For data sets that are too big to be stored in memory, or if you
just can't be bothered to learn to program with aggregate variables,
order Monty's Magical Match/Merge awk script from Ronco today. It's
on sale now for not $399.95, not $299.95, but ONLY $199.95!

--
Jim Monty

Tempe, Arizona USA



Thu, 31 Jan 2002 03:00:00 GMT  
 Update a file using another file

Quote:


> > I would like to use arrays but I have been trying to avoid them as I do not
> > know how to use them, and I am not familar with the ARGC and ARGV functions
> > as my books do not explain them well. How would I put the file be updated
> > into an array and then use these functions!

> Example of reading a file into an array then using it to match against a
> second file.

> FILENAME == "FileA" { recA[$1] = $0; next }
> { if ($1 in keyA) print recA[$1]; else print }

Using this trick

    FILENAME == "firstfile.txt" { ... ; next }
    { ... }

to process the first file separately from the second and subsequent
files is handy for one-liners (well, two-liners ;-), but it cannot
be recommended for use in nontrivial awk programs. It is inefficient
to test what file you're currently processing at each input record.
This is what BEGIN is for.

Here's the idiom I use:

    BEGIN {
        while ((getline < ARGV[1]) > 0) {
            # process the first file specified on the command line
        }
        close(ARGV[1])
        delete ARGV[1]
    }

    {
        # process the second, third, ..., nth files specified on
        # the command line
    }
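
A small sketch of that idiom in use, under some assumptions: the file names `known` and `latest`, the temporary directory, and the variable name `unmatched` are invented for the illustration, and the two rows come from the thread's sample data. Because ARGV[1] is deleted in BEGIN, the main loop only ever sees the second file.

```shell
# Sketch only: hypothetical file names, rows copied from the thread.
tmp=$(mktemp -d)
printf '%s\n' '14|Bob Harrison|021/00014/12' > "$tmp/known"
printf '%s\n' '14|Bob Harrison|021/00014/12' \
              '8|Michael Bake|021/00008/12' > "$tmp/latest"

unmatched=$(awk '
BEGIN {
    # Load the first operand by hand, then drop it from ARGV so that
    # the main loop starts with the second operand.
    while ((getline line < ARGV[1]) > 0)
        known[line] = 1
    close(ARGV[1])
    delete ARGV[1]
}
# Main loop: print records that were not in the first file.
!($0 in known)
' "$tmp/known" "$tmp/latest")

printf '%s\n' "$unmatched"
rm -rf "$tmp"
```

No file name is hard-coded in the awk program, so the same script works for any pair of files given on the command line.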

And here are the results of my benchmark tests:

    Jim's Idiom:
    2.7 microseconds

    Harlan's Trick:
    5 hrs., 12 mins., 66.3 secs.

Quote:
> Trying to use awk for nontrivial tasks, especially database applications,
> without using arrays is not an intelligent course of action. Arrays provide
> rather critical functionality for this sort of thing. Better you should learn
> arrays before writing many more of these database scripts. Experiment with
> small scripts.

Learn to program? Learn awk? What the hell are you talking about,
Man? This is comp.lang.awk.programs.for.free! Your advice is
off-topic.

--
Jim Monty

Tempe, Arizona USA



Thu, 31 Jan 2002 03:00:00 GMT  
 Update a file using another file

writes:

<snip>

Quote:
>Using this trick

>    FILENAME == "firstfile.txt" { ... ; next }
>    { ... }

>to process the first file separately from the second and subsequent
>files is handy for one-liners (well, two-liners ;-), but it cannot
>be recommended for use in nontrivial awk programs. It is inefficient
>to test what file you're currently processing at each input record.
>This is what BEGIN is for.

<snip>

Touche.

Quote:
>And here are the results of my benchmark tests:

>    Jim's Idiom:
>    2.7 microseconds

>    Harlan's Trick:
>    5 hrs., 12 mins., 66.3 secs.

Are you sure you haven't worked for Microsoft?

<snip>

Quote:
>Learn to program? Learn awk? What the hell are you talking about,
>Man? This is comp.lang.awk.programs.for.free! Your advice is
>off-topic.

Given what Romiko is doing, I'm at a loss to know why he's not using one of the
SQL programs available for linux (at least I think I remember he mentioned
using red hat). The database module in StarOffice can write tables to text
files (and thus have I exhausted my knowledge of unix DBMS offerings). I like
awk, but I don't think I'd like my bank migrating account records using awk
scripts.

On the other hand, maybe he can tell us who he's working for. I wonder if
they'd like a mail merge program written in APL? Or maybe a natural language
query system written in Forth?



Thu, 31 Jan 2002 03:00:00 GMT  
 Update a file using another file
Howzit dudes, I got the program working, but it was by mistake, he he! I
do not understand why it works. Here is the program, the two files and the
output. I use the command
awk -f report.awk A B > output.

--------cut report.awk------------------

BEGIN {
OFS = FS ="|"}
FILENAME == "B" {cusrega[$5] = $0; next}
FILENAME == "A" {if ($3 in cusrega) print curega[$3] ; else print
}

-----cut here------------

-----file A----------
7|Romikogogo van de dronker|021/00007/13
14|Bob Harrison|021/00014/12
16|Harry Ford|429/00016/11
16|Harry Ford|403/00016/12
8|Michael Bake|021/00008/12
7|Romiko van de dronker|022/00007/14
200|Joe Blogs|021/00200/13
23333|Gary Player|022/23333/45
555|Pepe Van de dronker|
|Vic van de dronker|021/00556/42

------end file A ------------

------file B---------
002233791|10001|7|Romiko van de dronker|021/00007/13
002543578|10003|16|Harry Ford|429/00016/11
002233792|10001|7|Romiko van de dronker|022/00007/14
002333745|10002|14|Bob Harrison|021/00014/12

-----end file B

-----output-----------

16|Harry Ford|403/00016/12
8|Michael Bake|021/00008/12

200|Joe Blogs|021/00200/13
23333|Gary Player|022/23333/45
555|Pepe Van de dronker|
|Vic van de dronker|021/00556/42

------------cut here end of output----------

PLEASE can someone tell me why it works, because when I wrote this script I
thought it would print the records that are the same, but it prints the
differences (which is what I want!!!!). Also, how can I get rid of those empty
lines in the output file?
Thanks DUDES!





Thu, 31 Jan 2002 03:00:00 GMT  
 Update a file using another file
Harlan wrote:
> I like awk, but I don't think I'd like my bank migrating account records
> using awk scripts.
----------
I do not understand why not use awk. I am a Microsoft engineer and have
used Access and SQL 6.5, and yes, they can do the job; however, I feel that
awk can automate most of the tasks, while SQL and Access need a lot
of manual intervention. Secondly, I am sick of Microsoft and therefore
decided to try something different, which is why I decided to use Linux
with awk. I did not use the other utilities available on Red Hat Linux
because I do not know how to use them and there is just not enough time for
me to learn StarOffice etc. I am working for Nedcor International
(NedBank).





Thu, 31 Jan 2002 03:00:00 GMT  
 Update a file using another file

% do not understand why it works here is the program and the two files and the

It's not clear at this point what you're trying to achieve. Anyway, it cannot
possibly be giving the output you claim, based on the input you claim and
the invocation you claim.

Given this:
% awk -f report.awk A B > output.

this script:
% BEGIN {
% OFS = FS ="|"}
% FILENAME == "B" {cusrega[$5] = $0; next}
% FILENAME == "A" {if ($3 in cusrega) print curega[$3] ; else print
% }

Is just a slow version of
 cat A

Your actual invocation was probably
 awk -f report.awk B A

Now, why does this work? Stop and take a few deep breaths. Meditate. Here's
a mantra that I found helpful when I was a consultant:
 in late, out early, two-hour lunch.

Now that you've calmed down, try and think about what the hell it is you're
trying to achieve, and then look at the very, very short program above
and try to figure out why that program does whatever that thing is. It
is only through this kind of mental exercise that you will attain
enlightenment. Spend a week on it, if you have to. If you don't have a week,
then don't worry about why it works. If you _have_ to know why it works,
then spend a week on it, if you have to.
--

Patrick TJ McPhee
East York  Canada



Thu, 31 Jan 2002 03:00:00 GMT  
 Update a file using another file


Quote:
% writes:

% >And here are the results of my benchmark tests:
% >
% >    Jim's Idiom:
% >    2.7 microseconds
% >
% >    Harlan's Trick:
% >    5 hrs., 12 mins., 66.3 secs.
%
% Are you sure you haven't worked for Microsoft?

Pay attention. His version was _faster_. Obviously he's never worked
for Microsoft.

Or perhaps you're referring to the sly way in which he referred to his
trick as an idiom and your idiom as a trick. Microsoft doesn't need
to do stuff like that.
--

Patrick TJ McPhee
East York  Canada



Thu, 31 Jan 2002 03:00:00 GMT  
 Update a file using another file
Yes, sorry, my typo: it was awk -f report.awk B A > output
I will meditate on this tonight.





Thu, 31 Jan 2002 03:00:00 GMT  
 Update a file using another file
HEY DUDES! I figured out why it works

BEGIN {
OFS = FS ="|"}
FILENAME == "B" {cusrega[$5] = $0; next}
FILENAME == "A" {if ($3 in cusrega) print curega[$3] ; else print
}

--------------
In this line of code
FILENAME == "A" {if ($3 in cusrega) print curega[$3] ; else print
I made a typing mistake: curega[$3] should be cusrega[$3]. So what
actually happened was that only the else branch printed real records, which
gave me the diffs between the two files. Weird that I made this mistake, but
hey, Patrick's advice to meditate is cool. That mantra, though, naaa.
Cheers dudes.
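
On the earlier question about the empty lines: since curega is never filled, `print curega[$3]` emits an empty line for every record of A whose account number is already in B. One hedged way to get the differences without the blanks is to drop that branch and print only the unmatched records, as sketched below with the files from the earlier post. The temporary directory and the variable name `diffs` are assumptions, and B is still given first on the command line.

```shell
# Sketch only: data copied from the post with the two files.
tmp=$(mktemp -d)

cat > "$tmp/A" <<'EOF'
7|Romikogogo van de dronker|021/00007/13
14|Bob Harrison|021/00014/12
16|Harry Ford|429/00016/11
16|Harry Ford|403/00016/12
8|Michael Bake|021/00008/12
7|Romiko van de dronker|022/00007/14
200|Joe Blogs|021/00200/13
23333|Gary Player|022/23333/45
555|Pepe Van de dronker|
|Vic van de dronker|021/00556/42
EOF

cat > "$tmp/B" <<'EOF'
002233791|10001|7|Romiko van de dronker|021/00007/13
002543578|10003|16|Harry Ford|429/00016/11
002233792|10001|7|Romiko van de dronker|022/00007/14
002333745|10002|14|Bob Harrison|021/00014/12
EOF

diffs=$(cd "$tmp" && awk '
BEGIN { OFS = FS = "|" }
# B (five fields) is read first: index its records by account number.
FILENAME == "B" { cusrega[$5] = $0; next }
# A (three fields): print only records whose account number is not in B.
FILENAME == "A" && !($3 in cusrega)
' B A)

printf '%s\n' "$diffs"
rm -rf "$tmp"
```

With this data the six unmatched records of A should be printed in their original order, with no blank lines between them.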






Thu, 31 Jan 2002 03:00:00 GMT  
 