Grabbing Wrong Field, Wrong File 
 Grabbing Wrong Field, Wrong File

Still bashing away at my semantic identifiers for the error messages in my
firm's applications.

I have come a long way since I first posed questions about the indexing of
the above in this forum. Among other things, the promising model I was
working on was scrapped and my supervisors sort of sent me back to square
one. However, I gained some techniques in the process which are still
proving useful.

Now I'm stuck again. This is where I'm at:

I have a file called sujetUnN.txt which contains a list of multiword terms
that have been sorted, rendered unique and numbered. The first field is the
number of the item and it is separated from the item entry by a tab. This is
so that the tab can be evoked as FS as the words composing the item entries
are separated by spaces. Here's an example, using >> to represent the tab:

0 >>
1 >> Accès
2 >> Accusé
3 >> Accusé de reception
4 >> Acompte
5 >> Actualisation automatique
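With FS set to a tab, awk keeps a multiword entry together in $2. With the default FS, every run of blanks starts a new field, so the entry gets split up. Here is a minimal sketch that uses one sample line from the post:

```shell
#!/bin/sh
# Sketch: compare tab-separated and default (whitespace) field splitting.
# The sample line mirrors sujetUnN.txt: number, tab, multiword entry.
line=$(printf '5\tActualisation automatique')

# FS = "\t": $1 is the number, $2 is the complete entry.
tab_split=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "\t" } { print NF ":" $2 }')

# Default FS: runs of blanks/tabs separate fields, so the entry is broken up.
default_split=$(printf '%s\n' "$line" | awk '{ print NF ":" $2 }')

echo "tab FS     -> $tab_split"      # tab FS     -> 2:Actualisation automatique
echo "default FS -> $default_split"  # default FS -> 3:Actualisation
```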

I have another file called sujet.txt which contains the unsorted, non-unique
entries. I have given this a dummy first field, separated from the rest by
a tab again, consisting of the character "@":

@ >> Risque
@ >> Risque
@ >> Type d'édition
@ >> Edition
@ >> Erreur paramétrage
@ >> Actualisation automatique

Now, what I want is an awk program that will read through sujet.txt above.
If it finds a match between the entry of sujetUnN.txt and the entry of
sujet.txt (as in this example it would at the line "@ >> Actualisation
automatique"), then it should concatenate the $1 number from sujetUnN.txt
onto the "@" of sujet.txt, thus:
@-5 >> Actualisation automatique
And so on, replacing the dummy field with the assigned number of the
entry for each of the entries.
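Below is a minimal sketch of the lookup described above. It assumes both files are tab-separated as shown. The sample lines come from the post, and the temporary directory is only there to make the example self-contained. The script indexes sujetUnN.txt by entry text, then appends "-<number>" to the "@" field of every matching line in sujet.txt:

```shell
#!/bin/sh
# Sketch of the requested lookup: turn "@" into "@-<number>" when the
# entry of sujet.txt is found in sujetUnN.txt. Sample data from the post.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '4\tAcompte\n5\tActualisation automatique\n' > "$tmp/sujetUnN.txt"
printf '@\tRisque\n@\tActualisation automatique\n' > "$tmp/sujet.txt"

out=$(awk '
BEGIN { FS = OFS = "\t" }                   # tab separates number/dummy from entry
FILENAME == ARGV[1] { num[$2] = $1; next }  # first file: entry text -> number
$2 in num { $1 = $1 "-" num[$2] }           # "@" becomes "@-5" on a match
{ print }
' "$tmp/sujetUnN.txt" "$tmp/sujet.txt")

printf '%s\n' "$out"
```

Setting OFS as well as FS keeps the tab in the rewritten lines. Lines with no match are printed unchanged.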

The program I have produced thus far looks like this:

awk '
FILENAME == "sujetUnN.txt" {
        split($0, entry, "\t")
        obj[entry[1]] = entry[2]
        next
}
{
        for (i = 1; i <= NF; i++)
                if ($i in obj) {
                        $1 = $1 "-" obj[$i]
                }
}
{print $0}' sujetUnN.txt $*

(It is closely modeled on the program "awkro" on pp. 193-195 of my
Dougherty-Robbins.) But this is taking the wrong bits out of the wrong bobs.
When I run it with "pgNumAss sujet.txt > outputfile", the outputfile looks
exactly like the input file:

@ >> Risque
@ >> Risque
@ >> Type d'édition
@ >> Edition
@ >> Erreur paramétrage
@ >> Actualisation automatique
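The reason nothing changes is that awk's "in" operator tests whether a value is a subscript of an array. It never looks at the stored values. Because obj[entry[1]] = entry[2] uses the numbers as subscripts, "$i in obj" is false for every word of sujet.txt, so $1 is never modified. Here is a tiny sketch with illustrative values:

```shell
#!/bin/sh
# Sketch: awk's "in" tests array subscripts, never the stored values.
result=$(awk 'BEGIN {
    # Array keyed by number, as in obj[entry[1]] = entry[2]
    obj["5"] = "Actualisation automatique"
    word_found = ("Actualisation" in obj)   # 0: a word is not a subscript
    num_found  = ("5" in obj)               # 1: the number is
    # Array keyed by entry text, as in obj[entry[2]] = entry[1]
    rev["Actualisation automatique"] = "5"
    entry_found = ("Actualisation automatique" in rev)   # 1
    print word_found, num_found, entry_found
}')
printf '%s\n' "$result"   # 0 1 1
```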

I know there is something simple escaping me here, but I am too slow-witted to
pinpoint it. As my supervisors are already drumming their fingers, I would
be most grateful for a correction.

Many thanks
Elisa Roselli



Sun, 26 May 2002 03:00:00 GMT  
 Grabbing Wrong Field, Wrong File


Quote:
> I have a file called sujetUnN.txt which contains a list of multiword terms
> that have been sorted, rendered unique and numbered. The first field is the
> number of the item and it is separated from the item entry by a tab. This is
> so that the tab can be evoked as FS as the words composing the item entries
> are separated by spaces. Here's an example, using >> to represent the tab:

> 0 >>
> 1 >> Accès
> 2 >> Accusé
> 3 >> Accusé de reception
> 4 >> Acompte
> 5 >> Actualisation automatique

> I have another file called sujet.txt which contains the unsorted, non-unique
> entries. I have given this a dummy first field, separated from the rest by
> a tab again, consisting of the character "@":

> @ >> Risque
> @ >> Risque
> @ >> Type d'édition
> @ >> Edition
> @ >> Erreur paramétrage
> @ >> Actualisation automatique

> Now, what I want is an awk program that will read through sujet.txt above.
> If it finds a match between the entry of sujetUnN.txt and the entry of
> sujet.txt (as in this example it would at the line "@ >> Actualisation
> automatique"), then it should concatenate the $1 number from sujetUnN.txt
> onto the "@" of sujet.txt, thus:
> @-5 >> Actualisation automatique
> And so on, replacing the dummy field with the assigned number of the
> entry for each of the entries.

Either I am more stupid than some people would accuse me of or your
posting has been obfuscated on the way to Usenet. It's a VERY high
probability I have misunderstood your question completely, but maybe,
maybe something like the following is what you need?:

awk '
BEGIN { FS="\t" }
FILENAME == ARGV[1] { keys[$2]=$1; next }
$0 in keys { print keys[$0], $0; next }
{ print "Key not found in (" FILENAME ") row " FNR ": " $0 > "log.txt" }
' sujet_n.txt sujet.txt

HTH,
/Peter
--
-= Spam safe(?) e-mail address: pez68 at netscape.net =-

Sent via Deja.com http://www.deja.com/
Before you buy.
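If sujet.txt keeps the "@" dummy field described in the question, then $0 in the second file is "@<tab>entry". The test "$0 in keys" above would therefore never succeed, and every line would go to log.txt. Below is a sketch of the same approach that matches on $2 instead. It rests on that assumption, uses sample data from the question, and also sets OFS so the output stays tab-separated:

```shell
#!/bin/sh
# Sketch: Peter's lookup, matching on $2 (the entry) rather than $0,
# for sujet.txt lines of the form "@<TAB>entry".
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '4\tAcompte\n5\tActualisation automatique\n' > "$tmp/sujet_n.txt"
printf '@\tRisque\n@\tActualisation automatique\n' > "$tmp/sujet.txt"

out=$(cd "$tmp" && awk '
BEGIN { OFS = FS = "\t" }
FILENAME == ARGV[1] { keys[$2] = $1; next }
$2 in keys { print keys[$2], $2; next }
{ print "Key not found in (" FILENAME ") row " FNR ": " $0 > "log.txt" }
' sujet_n.txt sujet.txt)

logged=$(cat "$tmp/log.txt")
printf '%s\n' "$out"
printf '%s\n' "$logged"
```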



Mon, 27 May 2002 03:00:00 GMT  
 Grabbing Wrong Field, Wrong File


Quote:


> > so that the tab can be evoked as FS as the words composing the item
> > entries are separated by spaces.
> awk '
> BEGIN { FS="\t" }
> FILENAME == ARGV[1] { keys[$2]=$1; next }
> $0 in keys { print keys[$0], $0; next }
> { print "Key not found in (" FILENAME ") row " FNR ": " $0 > "log.txt" }
> ' sujet_n.txt sujet.txt

Hmm, I think maybe you'd like to change the FS part to OFS=FS="\t".
/Peter
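The OFS point matters because "print a, b" separates its arguments with OFS. OFS is a single space by default, whatever FS is set to. Here is a small comparison on one sample line:

```shell
#!/bin/sh
# Sketch: "print a, b" joins its arguments with OFS, which is a single
# space by default even when FS has been set to a tab.
line=$(printf '5\tActualisation automatique')

fs_only=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "\t" } { print $1, $2 }')
fs_and_ofs=$(printf '%s\n' "$line" | awk 'BEGIN { OFS = FS = "\t" } { print $1, $2 }')

printf '[%s]\n[%s]\n' "$fs_only" "$fs_and_ofs"
```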



Mon, 27 May 2002 03:00:00 GMT  
 Grabbing Wrong Field, Wrong File
Please don't bother answering this - I found the solution myself in the
course of the morning. The main problem was in the line "obj[entry[1]] =
entry[2]", which should have been "obj[entry[2]] = entry[1]". Also, I
eliminated the for loop in favor of "item = substr($0, index($0, $2))".
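The substr/index line works with awk's default FS. $2 is only the first word of the entry, but index($0, $2) gives the position where that word starts in the raw line, and substr() returns everything from there to the end. index() finds the first occurrence, which is safe here because the dummy field is just "@". Here is a minimal sketch:

```shell
#!/bin/sh
# Sketch: item = substr($0, index($0, $2)) with the default FS.
# $2 is only the first word of the entry; index() locates it in the raw
# line and substr() returns the rest of the line from that point.
line=$(printf '@\tActualisation automatique')
item=$(printf '%s\n' "$line" | awk '{ print substr($0, index($0, $2)) }')
printf '%s\n' "$item"   # Actualisation automatique
```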

Thanks for bearing with me
Elisa Roselli

awk '
# load the objects into an array obj
FILENAME == "sujetUnN.txt" {
        split($0, entry, "\t")
        obj[entry[2]] = entry[1]
        next
}
{
        item = substr($0, index($0, $2))
        if (item in obj)
                $1 = obj[item] "\t"
        else
                $1 = "0000\t"
}
{print $0}' sujetUnN.txt $*
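Below, the script above runs on a few lines from the thread. It works in a throwaway directory so that FILENAME matches the hard-coded "sujetUnN.txt". The run shows one side effect worth knowing. FS is left at its default, so assigning to $1 rebuilds $0 with OFS, which is a space. As a result, a space follows the inserted tab:

```shell
#!/bin/sh
# Sketch: the final script from the thread, run on sample lines.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '4\tAcompte\n5\tActualisation automatique\n' > "$tmp/sujetUnN.txt"
printf '@\tRisque\n@\tActualisation automatique\n' > "$tmp/sujet.txt"

out=$(cd "$tmp" && awk '
FILENAME == "sujetUnN.txt" {
    split($0, entry, "\t")
    obj[entry[2]] = entry[1]        # entry text -> number
    next
}
{
    item = substr($0, index($0, $2))
    if (item in obj)
        $1 = obj[item] "\t"
    else
        $1 = "0000\t"
}
{ print $0 }' sujetUnN.txt sujet.txt)

# Assigning to $1 rebuilds $0 with OFS (a space by default), giving
# "0000<TAB> Risque" and "5<TAB> Actualisation automatique".
printf '%s\n' "$out"
```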



Mon, 27 May 2002 03:00:00 GMT  
 Grabbing Wrong Field, Wrong File


Quote:

>Either I am more stupid than some people would accuse me of or your
>posting has been obfuscated on the way to Usenet.

Very much the latter. If you're using Outlook Express, try selecting one of
the "universal alphabets" under the Language parameter of the View menu. I
don't know what's been going wrong at my end, but a double "more than" sign
(>>), which I used to represent a tab, has been coming out as an
incomprehensible string of gobbledygook (+AD4APg-), and the accented French
characters have been seriously deformed as well.

Quote:
>It's a VERY high
>probability I have misunderstood your question completely, but maybe,
>maybe something like the following is what you need?:

>awk '
>BEGIN { FS="\t" }
>FILENAME == ARGV[1] { keys[$2]=$1; next }
>$0 in keys { print keys[$0], $0; next }
>{ print "Key not found in (" FILENAME ") row " FNR ": " $0 > "log.txt" }
>' sujet_n.txt sujet.txt

You're impossibly sweet to keep at it, especially under the circumstances.
You rightly caught my error of inverting the array subscript and content in
the assignment "keys[$2]=$1". I'll try your suggestion of using ARGV instead
of the name of the file in the FILENAME assignment, as this might be more
supple.

Thanks ever,
Elisa



Mon, 27 May 2002 03:00:00 GMT  
 Grabbing Wrong Field, Wrong File


SNIP

Quote:
>>BEGIN { FS="\t" }
>>FILENAME == ARGV[1] { keys[$2]=$1; next }
>>$0 in keys { print keys[$0], $0; next }
>>{ print "Key not found in (" FILENAME ") row " FNR ": " $0 > "log.txt" }

A 'trick' I often use for detecting the 1st of multiple files used as input is

if (NR == FNR) ....

I find it looks simpler ;-)
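The NR == FNR test works because FNR restarts at 1 for every input file, while NR keeps counting. The two are therefore equal only while the first file is being read. One caveat: if the first file is empty, NR == FNR is also true throughout the second file. Here is a sketch using the thread's file names and a few sample lines:

```shell
#!/bin/sh
# Sketch of the NR == FNR idiom for spotting the first input file.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '5\tActualisation automatique\n' > "$tmp/sujetUnN.txt"
printf '@\tActualisation automatique\n@\tRisque\n' > "$tmp/sujet.txt"

out=$(awk '
BEGIN { FS = OFS = "\t" }
NR == FNR { keys[$2] = $1; next }    # true only while reading the first file
{
    num = ($2 in keys) ? keys[$2] : "?"
    print num, $2
}
' "$tmp/sujetUnN.txt" "$tmp/sujet.txt")

printf '%s\n' "$out"
```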
Mark
---------
Mark Katz
ISPC, London - Innovation in data-delivery tools
Tel: (44) 208-455 4665/Direct 208-731 7516, Fax: 208-458 9554
** See our website at http://www.efiche.com **



Mon, 27 May 2002 03:00:00 GMT  
 Grabbing Wrong Field, Wrong File


Quote:
> awk '
> # load the objects into an array obj
> FILENAME == "sujetUnN.txt" {
>         split($0, entry, "\t")
>         obj[entry[2]] = entry[1]
>         next
> }
> {
>         item = substr($0, index($0, $2))
>         if (item in obj)
>                 $1 = obj[item] "\t"
>         else
>                 $1 = "0000\t"
> }
> {print $0}' sujetUnN.txt $*

Hehe, I _do_ think you should consider using another news reader.
/Peter



Mon, 27 May 2002 03:00:00 GMT  
 Grabbing Wrong Field, Wrong File


Quote:



> >Either I am more stupid than some people would accuse me of or your
> >posting has been obfuscated on the way to Usenet.

> Very much the latter.

I'm glad to hear that. =)

Quote:
> You're impossibly sweet to keep at it, especially under the circumstances.
> You rightly caught my error of inverting the array subscript and content in
> the assignment "keys[$2]=$1". I'll try your suggestion of using ARGV instead
> of the name of the file in the FILENAME assignment, as this might be more
> supple.

Well, I didn't really catch that error. I couldn't read your script at
all after what Outlook had done to it. But since you described your
inputs and outputs so well, I might have recreated your wheel.

As for that ARGV[1] thing: it relies on my intuitive notion
that awk reads the input files in the order listed on the command line.
I really don't know if I can rely on that, but so far I have never seen
it fail. I'm sure a few people frequenting this newsgroup know, though.
And if one of the inputs came from stdin, what would happen then?
Anyway, hardcoding the filename into the script doesn't solve that
potential problem. And hardcoding... hardcoding is bad, as all the kids in
South Park know. =)
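On the ordering question: as far as I know, POSIX has awk read its file operands in the order given, and an operand of "-" means standard input. The lookup file can therefore come first while the data arrives through a pipe. Comparing FILENAME with ARGV[1] still works in that case, because the pipe is the second operand. Here is a sketch with sample data from the thread:

```shell
#!/bin/sh
# Sketch: the lookup file is the first operand (read from disk);
# "-" as the second operand makes awk read the data from the pipe.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '5\tActualisation automatique\n' > "$tmp/sujetUnN.txt"

out=$(printf '@\tActualisation automatique\n@\tRisque\n' |
    awk '
        BEGIN { FS = OFS = "\t" }
        FILENAME == ARGV[1] { num[$2] = $1; next }  # first operand: lookup table
        $2 in num { $1 = $1 "-" num[$2] }           # stdin: tag matching lines
        { print }
    ' "$tmp/sujetUnN.txt" -)

printf '%s\n' "$out"
```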

/Peter



Mon, 27 May 2002 03:00:00 GMT  
 Grabbing Wrong Field, Wrong File

writes:

Quote:
>Very much the latter. If you're using Outlook Express, try selecting one of
>the "universal alphabets" under the Language parameter of the View menu. . . .

Just make sure your Internet Options are set to plain text for newsgroups.


Wed, 29 May 2002 03:00:00 GMT  
 Grabbing Wrong Field, Wrong File

Harlan Grove wrote in message
<19991211021032.26491.00000155@ngol07.aol.com>...
>Just make sure your Internet Options are set to plain text for newsgroups.

I'm afraid they _are_ set to plain text already.
EFR



Fri, 31 May 2002 03:00:00 GMT  
 

 Relevant Pages 

1. Where am I wrong?(this has to be wrong, it can't be so simple)

2. Stippled text on a canvas drawn in the wrong place using the wrong color

3. Saving a Variable to a field in a record- what am i doing wrong

4. Wrong Field Mapping

5. TPS file system rewrites WRONG record

6. Wrong file format

7. Wrong SecWin WRI file at IceTips

8. File sharing - what can go wrong?

9. saving data: wrong file type on Win98

10. call library function- wrong file path

11. GNATing ada files with the wrong extensions.

12. Delete file (code), What is wrong?

 

 