"&" and Awk 

Hello

First, excuse my poor English!

I'm writing HTML files that contain sentences in French. French has
accented characters (e.g. é, à), so to write, for example, the word
"été", you have to write "&eacute;t&eacute;". You can see that this makes
the HTML source difficult to read.

So I'm trying to write an HTML parser with awk. Once I've finished writing
my HTML source, the parser will transform every "é" to "&eacute;", every "à"
to "&agrave;", and so on.

I use the gsub function ( gsub("é","&eacute;"); ) and it doesn't work: if
the input is the word "été", the output is "éeacute;téeacute;".
Is there a solution?

thanks

Jean-François (or Jean-Fran&ccedil;ois !!)

PS: I'm on Windows NT with gawk 3.0, patchlevel 0. I also have gnuwin32.



Mon, 06 Oct 2003 19:28:43 GMT

Quote:

> I use the gsub function ( gsub("é","&eacute;"); ) and it doesn't work: if
> the input is the word "été", the output is "éeacute;téeacute;".
> Is there a solution?

Yep. "&" is special in the replacement part of (g)sub: it stands for the
string that was matched.  You need to escape it to produce a literal &:

        gsub("é","\\&eacute;")

You need two backslashes because awk's string-literal processing
consumes the first one, leaving \& for gsub.
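
To make the difference visible, here is a minimal sketch run from a POSIX shell. The function names naive and escaped are hypothetical helpers, and the input is assumed to be UTF-8 text:

```sh
# Hypothetical helper names; each reads text on stdin.
# Unescaped "&" in the replacement stands for the matched text.
naive()   { awk '{ gsub("é", "&eacute;");   print }'; }
# "\\&" in the string literal reaches gsub as \& and yields a literal "&".
escaped() { awk '{ gsub("é", "\\&eacute;"); print }'; }

printf 'été\n' | naive     # prints: éeacute;téeacute;
printf 'été\n' | escaped   # prints: &eacute;t&eacute;
```

The first output is what the original post reports; the second is the intended result.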

Same problem occurs with sed, although there a single
backslash suffices:

sed 's-é-\&eacute;-g'
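
A quick way to check the sed form, again as a sketch (sed_entities is a hypothetical helper name, and the input is assumed to be UTF-8). Because the sed script sits inside shell single quotes, the single backslash reaches sed unchanged:

```sh
# Hypothetical helper name; reads text on stdin.
# "-" is used as the s-command delimiter, so "&eacute;" needs no extra quoting.
sed_entities() { sed 's-é-\&eacute;-g'; }

printf 'été\n' | sed_entities   # prints: &eacute;t&eacute;
```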

--
Tapani Tarvainen



Mon, 06 Oct 2003 21:37:50 GMT

The problem is that "&" in the second parameter is taken to mean "the
text that was matched", so that it is easy to do
   gsub("long complex pattern","long complex pattern plus")
by writing
   gsub("long complex pattern","& plus")
This saves a lot of typing and backreferencing. Your solution is
   gsub("é","\\&eacute;");
which removes the special meaning of the "&" in this context. The backslash
is doubled because the string literal consumes one of them.
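
As an illustration of the "&" shorthand with a real pattern, here is a small sketch. The function name mark_numbers and the sample input are made up for the example:

```sh
# Hypothetical helper name; wraps every run of digits in angle brackets.
# "&" in the replacement is the text matched by /[0-9]+/.
mark_numbers() { awk '{ gsub(/[0-9]+/, "<&>"); print }'; }

printf 'a 12 b 345\n' | mark_numbers   # prints: a <12> b <345>
```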

You might find it convenient to acquire a copy of "sed & awk" by
Dougherty and Robbins, published by O'Reilly and Associates. Their
web site/catalog is www.ora.com.
--
Bob Stearns
University of Georgia

(706)542-5110



Mon, 06 Oct 2003 21:55:02 GMT

I have the solution (thanks Mark ..):
gsub("é","\\&eacute;"), with 2 '\'. (I tried with 1 '\', and it
isn't OK.)
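
For converting several characters in one pass, the same gsub call can be driven from a table. This is only a sketch under assumptions: the function name to_entities and the four-entry mapping are illustrative, not a complete list, and the input is assumed to be UTF-8:

```sh
# Hypothetical helper name; reads text on stdin.
to_entities() {
    awk '
    BEGIN {
        # Hypothetical mapping table; extend as needed.
        ent["é"] = "eacute"
        ent["è"] = "egrave"
        ent["à"] = "agrave"
        ent["ç"] = "ccedil"
    }
    {
        # "\\&" becomes \& in the string, which gsub turns into a literal "&".
        for (ch in ent)
            gsub(ch, "\\&" ent[ch] ";")
        print
    }'
}

printf 'été à François\n' | to_entities
# prints: &eacute;t&eacute; &agrave; Fran&ccedil;ois
```

The loop order does not matter here, because the entity names inserted by one substitution contain only ASCII characters and cannot be matched by a later one.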



Mon, 06 Oct 2003 22:48:11 GMT

Quote:

> I use the gsub function ( gsub("é","&eacute;"); ) and it doesn't work: if
> the input is the word "été", the output is "éeacute;téeacute;".
> Is there a solution?

But of course; French is not the only language coping with diacritical
marks...
Now, in awk the "&" has a special meaning in the 2nd argument to gsub(),
as a placeholder for the matched text.
For your needs, you should escape the & with a \, and then escape
the \ again so that the string really does contain the \.
Use:
  gsub("é","\\&eacute;")
and good luck.
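
The two stages of escaping can be observed separately by printing the replacement string before using it. A small sketch, where two_stages is a hypothetical name and the variables r and s exist only for the demonstration:

```sh
# Hypothetical helper name; takes no input.
two_stages() {
    awk 'BEGIN {
        r = "\\&eacute;"    # stage 1: the string literal holds \&eacute;
        print r
        s = "été"
        gsub("é", r, s)     # stage 2: gsub reads \& as a literal "&"
        print s
    }'
}

two_stages
# prints:
# \&eacute;
# &eacute;t&eacute;
```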

--
  All true believers shall break their eggs at the convenient end.



Fri, 10 Oct 2003 18:08:20 GMT  
 