jfl #1 / 5
|
 "&" and Awk
Hello,

First, excuse my poor English! I'm writing HTML files that contain sentences in French. French has accented characters (e.g. é, à), so to write, for example, the word "été", you have to write "&eacute;t&eacute;". As you can imagine, that makes the HTML source hard to read.

So I'm trying to write an HTML parser in awk. Once I've finished writing my HTML source, the parser will transform every "é" to "&eacute;", every "à" to "&agrave;", and so on. I use the gsub function ( gsub("é","&eacute;"); ) and it doesn't work: if the input is the word "été", the output is "éeacute;téeacute;". Is there a solution?

Thanks,
Jean-Fran?ois (or Jean-Fran&ccedil;ois !!)

PS: I'm on Windows NT with gawk 3.0, patchlevel 0. I also have GnuWin32.
|
Mon, 06 Oct 2003 19:28:43 GMT |
|
 |
t.. #2 / 5
|
 "&" and Awk
Quote:
> I use the gsub function ( gsub("é","&eacute;"); ) and it doesn't work; if
> the input is the word "été" the output is "éeacute;téeacute;".
> Is there a solution?

Yep. "&" is special in the replacement part of (g)sub: it stands for the text that was matched. You need to escape it to produce a literal &:

gsub("é","\\&eacute;")

You need two backslashes because awk's string processing eats the first one. The same problem occurs with sed, although there a single backslash suffices:

sed 's-é-\&eacute;-g'

--
Tapani Tarvainen
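To see the difference side by side, here is a minimal sketch that wraps the one-liners discussed above in shell functions. The function names (`broken`, `fixed`, `fixed_sed`) are made up for illustration, and the sketch assumes the script and its input use the same character encoding.

```sh
# Unescaped "&" in the replacement stands for the matched text,
# so each "é" becomes "é" followed by "eacute;".
broken() { awk '{ gsub("é", "&eacute;"); print }'; }

# "\\&" in the awk source is the two characters \& once the string
# literal has been processed; gsub then treats \& as a literal "&".
fixed() { awk '{ gsub("é", "\\&eacute;"); print }'; }

# sed has no string-literal layer, so one backslash is enough.
fixed_sed() { sed 's-é-\&eacute;-g'; }

printf '%s\n' 'été' | broken      # prints éeacute;téeacute;
printf '%s\n' 'été' | fixed       # prints &eacute;t&eacute;
printf '%s\n' 'été' | fixed_sed   # prints &eacute;t&eacute;
```

The awk programs are kept inside single quotes so that the shell passes the backslashes through untouched.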
|
Mon, 06 Oct 2003 21:37:50 GMT |
|
 |
Bob Stearn #3 / 5
|
 "&" and Awk
Quote:
> Hello
> first, excuse me for my poor english!
> I'm writing files in html and there are sentences in french. In french there
> are characters with accent (eg é, à), so if you want to write, for example,
> the word "été", you have to write "&eacute;t&eacute;". You understand that
> it's difficult to read the html source.
> So, i try to write an html parser with awk . When, i'll finish to write my
> htlm source, the parser will transform all the "é" to "&eacute;" ,the "à"
> to "&agrave;", ....
> I use the gsub function ( gsub("é","&eacute;"); ) and it doesn't work; if
> the input is the word "été" the output is "éeacute;téeacute;".
> Is there a solution?
> thanks
> Jean-Fran?ois (or Jean-Fran&ccedil;ois !!)
> PS: I'm on windows NT with gawk 3.0, patchlevel 0. I have also gnuwin32.

The problem is that "&" in the second parameter is taken to mean "the thing being substituted for", so that it is easy to do

gsub("long complex pattern","long complex pattern plus")

by writing

gsub("long complex pattern","& plus")

This saves a lot of typing and backreferencing. Your solution is

gsub("é","\&eacute;");

which removes the special meaning of the "&" in this context.

You might find it convenient to acquire a copy of "sed & awk" by Dougherty and Robbins, published by O'Reilly and Associates. Their web site/catalog is www.ora.com.

--
Bob Stearns
University of Georgia
(706)542-5110
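As a rough illustration of why the unescaped "&" is worth having, the sketch below reuses the matched text in the replacement. The function names (`append_plus`, `wrap_numbers`) and the sample inputs are invented for this example.

```sh
# "&" re-inserts whatever the pattern matched, so the pattern
# does not have to be typed again in the replacement.
append_plus() { awk '{ gsub(/long complex pattern/, "& plus"); print }'; }

# The same idea with a pattern whose match varies from hit to hit.
wrap_numbers() { awk '{ gsub(/[0-9]+/, "<&>"); print }'; }

printf '%s\n' 'a long complex pattern here' | append_plus
# prints: a long complex pattern plus here
printf '%s\n' 'gawk 3 patch 0' | wrap_numbers
# prints: gawk <3> patch <0>
```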
|
Mon, 06 Oct 2003 21:55:02 GMT |
|
 |
jfl #4 / 5
|
 "&" and Awk
I have the solution (thanks Mark ..): gsub("é","\\&eacute;"), with two '\'. (I tried with one '\', and it doesn't work.)

Quote:
> Hello
> first, excuse me for my poor english!
> I'm writing files in html and there are sentences in french. In french there
> are characters with accent (eg é, à), so if you want to write, for example,
> the word "été", you have to write "&eacute;t&eacute;". You understand that
> it's difficult to read the html source.
> So, i try to write an html parser with awk . When, i'll finish to write my
> htlm source, the parser will transform all the "é" to "&eacute;" ,the "à"
> to "&agrave;", ....
> I use the gsub function ( gsub("é","&eacute;"); ) and it doesn't work; if
> the input is the word "été" the output is "éeacute;téeacute;".
> Is there a solution?
> thanks
> Jean-Fran?ois (or Jean-Fran&ccedil;ois !!)
> PS: I'm on windows NT with gawk 3.0, patchlevel 0. I have also gnuwin32.
|
Mon, 06 Oct 2003 22:48:11 GMT |
|
 |
Perique des Palotte #5 / 5
|
 "&" and Awk
Quote:
> I use the gsub function ( gsub("é","&eacute;"); ) and it doesn't work; if
> the input is the word "été" the output is "éeacute;téeacute;".
> Is there a solution?

Yes indeed, French is not the only language that has to cope with diacritical marks...

Now, in awk the "&" has a special meaning in the 2nd argument to gsub(): it is a placeholder for the matched text. For your needs, you should escape the & with a \, and then escape the \ again so that the string really does contain the \. Use:

gsub("é","\\&eacute;")

And good luck.

--
All true believers shall break their eggs at the convenient end.
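Putting the thread's fix to work on more than one character, the following is only a sketch of the kind of converter the original poster describes. The function name `to_entities`, the `ent` table, and the choice of four characters are assumptions made for illustration; a real table would list every accented character in use.

```sh
# Convert a few accented characters on stdin to HTML entities.
to_entities() {
  awk '
    BEGIN {
      # Hypothetical lookup table: character -> entity name.
      ent["é"] = "eacute"; ent["è"] = "egrave"
      ent["à"] = "agrave"; ent["ç"] = "ccedil"
    }
    {
      # "\\&" yields \& in the replacement string, i.e. a literal "&".
      for (ch in ent)
        gsub(ch, "\\&" ent[ch] ";")
      print
    }'
}

printf '%s\n' 'été à François' | to_entities
# prints: &eacute;t&eacute; &agrave; Fran&ccedil;ois
```

The order in which awk visits the table is unspecified, which is harmless here because no replacement produces text that another entry would match.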
|
Fri, 10 Oct 2003 18:08:20 GMT |
|