Incrementing 

Howzit Guys

I have nearly finished my script that changes bank account numbers to the
new system. However, I have one problem. Suppose I have the following accounts:
023/23232
023/23231
021/00007
021/00007
429/00016
023/23232
022/00007
021/00014
022/00007

What I need to do is append a serial number that is incremented each time
the same account number appears again.
Proposed solution:

023/23232/1
023/23231/1
021/00007/1
021/00007/2
429/00016/1
023/23232/2
022/00007/1
021/00014/1
022/00007/2

Notice that "021/00007" is different from "022/00007/1"; they actually
represent one customer with different accounts, and the increment I need is
in fact a serial number.
When I have finalised the entire script I will definitely post it to the
group, but I need to get around this tough one first. I am not used to
comparing fields in a vertical manner.

Thanks for your help guys
Romiko



Fri, 08 Feb 2002 03:00:00 GMT  

You will find the processing much easier to visualize if you sort the
input data (numerically) by the second field and then the first before
trying to add the serial number.

man sort
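
A minimal sketch of the sorted approach, assuming each input line holds only
the account number and that a scratch file from mktemp is acceptable (the
variable names infile and sorted_out are made up for illustration). Once
identical accounts are adjacent, awk only has to compare each line with the
previous one:

```shell
# Sample accounts from the original post, written to a scratch file.
infile=$(mktemp)
printf '%s\n' 023/23232 023/23231 021/00007 021/00007 429/00016 \
              023/23232 022/00007 021/00014 022/00007 > "$infile"

# Sort numerically on field 2, then field 1, so identical accounts are adjacent.
# awk restarts the serial at 1 whenever the line differs from the previous one.
sorted_out=$(sort -t/ -k2,2n -k1,1n "$infile" |
  awk '{ n = ($0 == prev) ? n + 1 : 1; prev = $0; print $0 "/" n }')

printf '%s\n' "$sorted_out"
rm -f "$infile"
```

Note that the output comes back in sorted order, not in the original input
order, which may or may not matter for your job.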

If that's not allowed, then you'll have to use an array indexed by the
account number to keep track of the number added on the end.

This uses more memory than the sorted-file approach.

gawk 'BEGIN{FS=OFS="/"}{a[$2]++;print $0,a[$2]}' infile

For this input:

023/23232
023/23231
021/00007
021/00007
429/00016
023/23232
022/00007
021/00014
022/00007

It creates this output:

023/23232/1
023/23231/1
021/00007/1
021/00007/2
429/00016/1
023/23232/2
022/00007/3
021/00014/1
022/00007/4

If you need the first field or the entire line to be relevant too,
as it seems you do, change the indexing value(s) used.

gawk 'BEGIN{FS=OFS="/"}{a[$1,$2]++;print $0,a[$1,$2]}' infile

or

gawk 'BEGIN{FS=OFS="/"}{a[$0]++;print $0,a[$0]}' infile

both of which produce this output:

023/23232/1
023/23231/1
021/00007/1
021/00007/2
429/00016/1
023/23232/2
022/00007/1
021/00014/1
022/00007/2

It's not too hard to do this once you recognize how to use
the awk/gawk array capability.

Chuck Demas
Needham, Mass.




Fri, 08 Feb 2002 03:00:00 GMT  

As they say in some parts of northern New England (w/o the accent): you
can't get there from here.

Meaning, you can't reliably increment account numbers in the format
shown above. You have two 021/00007 entries in your input list. You
can't just make the second 021/00008 unless you know there's no such
account number.

Alternative 1: FASTEST BUT PROBABLY LEAST ACCEPTABLE - change ALL
account numbers so they're all unique, though the /^[0-9]{3}[/]/
pattern (branch code?) beginning each account number would remain as-
is. This is equivalent to a wholesale replacement of account numbers as
primary keys. Rekeying tables is not without problems, but it does
ensure that new keys are unique.

Alternative 2: read through all account numbers, storing the largest
account serial number for each branch (?). Immediately print the record
corresponding to the first occurrence of each account number, and store
the records corresponding to second and subsequent occurrences of each
account number. After reading all account records, print the stored
records using new account serial numbers based on the (incremented)
maximum account serial number for each branch.

BEGIN {
  FS = OFS = "|"      # field separator, as I recall
  acctnumfield = 1    # ASSUMPTION: position of the account number field
}
{ # read through ALL account records first
  acct = $acctnumfield
  if (acct in acctnum) {
    # second or later occurrence: hold the record back for the END block
    dupacctnum[acct, ++acctnum[acct]] = NR
    rec[NR] = $0
  } else {
    acctnum[acct] = 1
    print
  }
  branch = substr(acct, 1, 3)
  serial = substr(acct, 5) + 0
  if (maxserial[branch] < serial) maxserial[branch] = serial
}

END {
  for (a in acctnum) {
    branch = substr(a, 1, 3)
    for (j = 2; j <= acctnum[a]; ++j) { # only duplicates have j >= 2
      # keep a unchanged: it is still needed as the lookup key
      # ASSUMPTION: serials are zero-padded to five digits, as in the sample
      newacct = sprintf("%s/%05d", branch, ++maxserial[branch])
      $0 = rec[dupacctnum[a, j]]
      $acctnumfield = newacct
      print
    }
  }
} # UNTESTED!!!

This will change the order of your records. If record order is
important, start each output record with NR, or nr, as appropriate,
sort the result on the first field, then delete the first field.
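
A rough sketch of that reordering trick, assuming tab-separated decoration
and a scratch file from mktemp. The " (dup)" marker is only a hypothetical
stand-in for whatever renumbering you apply to the held records:

```shell
infile=$(mktemp)
printf '%s\n' 021/00007 021/00007 429/00016 021/00014 > "$infile"

restored=$(awk '
  # First occurrences are printed immediately, duplicates are held until END,
  # so the raw awk output is no longer in input order.
  !($0 in seen) { seen[$0] = 1; print NR "\t" $0; next }
  { held[NR] = $0 }
  END { for (nr in held) print nr "\t" held[nr] " (dup)" }
' "$infile" |
  sort -n -k1,1 |   # put the records back in input order using the NR prefix
  cut -f2-)         # then drop the NR prefix again

printf '%s\n' "$restored"
rm -f "$infile"
```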

Alternative 3: MOST MEMORY-INTENSIVE AND SLOWEST - similar to above,
but instead of storing the maximum account serial number by branch,
record which account serial numbers are used by branch. Then when
processing the duplicate records, use the next unused account serial
number by branch for each duplicate. This would generate a VERY LARGE
array to track account numbers used.
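
A small sketch of Alternative 3, assuming each line is just branch/serial,
that serials start at 1, and that they are zero-padded to five digits (all
assumptions taken from the sample data, not confirmed by the original
poster). The array names used, dup and next_free are invented for the
example:

```shell
infile=$(mktemp)
printf '%s\n' 021/00007 021/00007 021/00008 429/00016 021/00007 > "$infile"

alt3_out=$(awk -F/ '
  { used[$1, $2 + 0] = 1 }                 # record every serial in use, by branch
  ($0 in seen) { dup[++ndup] = $1; next }  # hold duplicates until all are read
  { seen[$0] = 1; print }
  END {
    for (i = 1; i <= ndup; i++) {
      b = dup[i]
      s = next_free[b] + 0
      do { s++ } while ((b, s) in used)    # next unused serial for this branch
      used[b, s] = 1; next_free[b] = s
      printf "%s/%05d\n", b, s             # ASSUMPTION: 5-digit zero-padded serials
    }
  }
' "$infile")

printf '%s\n' "$alt3_out"
rm -f "$infile"
```

As with Alternative 2, the renumbered duplicates come out after all the
first occurrences, so the same NR decoration trick applies if record order
matters.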




Fri, 08 Feb 2002 03:00:00 GMT  