Is AWK sub/gsub broken? 

I'm converting some text files to XML.  One of the things I need to do
is convert "special" characters to XML entities.  For example, '<' must
be translated to "&lt;".  I tried the following:

gsub(/</, "\&lt;", string)             # sub also doesn't work

But the resulting output is "<lt;" instead of "&lt;".  It appears that
'&' in the replacement string is being interpreted as the matched text.
In other words, AWK appears to be ignoring the fact that I've escaped
the '&' indicating that I want the literal character '&' and NOT the
matched text.  I'm not an AWK expert, but this is my understanding
according to what I've read.  Am I doing something wrong, or is AWK
broken in this case?

BTW, I'm using a Win32 version of AWK that I'm pretty sure came from the
Bell Labs website.  I also tried a version of AWK on a Linux machine we
have here, and got the same results.  If this is indeed a bug, where
should it be reported?

Thanks!
Chris



Sun, 03 Aug 2003 04:03:07 GMT  

Quote:

>gsub(/</, "\&lt;", string) # sub also doesn't work

>But the resulting output is "<lt;" instead of "&lt;".  It appears that
>'&' in the replacement string is being interpreted as the matched text.

That's right, but it is not a bug.

There is a subtle problem in POSIX, see section
"More About `\' and `&' with sub, gsub and gensub"
in "Effective AWK Programming" by Arnold D. Robbins:
http://www.gnu.org/manual/gawk/html_chapter/gawk_13.html#SEC126

Try two backslashes instead ("\\&lt;") - you should get a literal "&".
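
A minimal sketch of the two-backslash form, assuming a POSIX shell and a made-up sample line. Inside the awk program, "\\&" becomes the two characters \& after string processing, and gsub then reads that as a literal ampersand:

```sh
# The awk program is single-quoted so the shell passes the backslashes through untouched.
# The input text 'a < b' is an illustrative assumption, not from the original post.
out=$(printf '%s\n' 'a < b' | awk '{ gsub(/</, "\\&lt;"); print }')
printf '%s\n' "$out"    # prints: a &lt; b
```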

--
HQ



Sun, 03 Aug 2003 04:36:45 GMT  

...
Quote:
>gsub(/</, "\&lt;", string) # sub also doesn't work

>But the resulting output is "<lt;" instead of "&lt;".  It appears that
>'&' in the replacement string is being interpreted as the matched text.
>In other words, AWK appears to be ignoring the fact that I've escaped
>the '&' indicating that I want the literal character '&' and NOT the
>matched text.  I'm not an AWK expert, but this is my understanding
> according to what I've read.  Am I doing something wrong, or is AWK
> broken in this case?

...

This is just another time when you need to remember that unix-like tools
tend to process backslashes in double-quoted strings before they do ANYTHING
else with them. gsub needs to see '\&' in the replacement string, meaning
there needs to be a backslash remaining after standard double-quoted string
processing, meaning your replacement string needs to be "\\&lt;".
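
For the wider job of converting special characters to XML entities, the same rule applies to every replacement string. The following is only a sketch under stated assumptions: xml_escape is a hypothetical helper name, the sample input is invented, and it covers just '&', '<' and '>'.

```sh
# xml_escape is a hypothetical helper, not something from the original posts.
xml_escape() {
    awk '{
        gsub(/&/, "\\&amp;")   # run this first so later entities are not escaped again
        gsub(/</, "\\&lt;")    # "\\&" reaches gsub as \& and yields a literal ampersand
        gsub(/>/, "\\&gt;")
        print
    }'
}

printf '%s\n' 'x < y && y > z' | xml_escape    # prints: x &lt; y &amp;&amp; y &gt; z
```

Handling '&' before the other characters matters: done last, it would rewrite the ampersands that the '<' and '>' substitutions had just inserted.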



Sun, 03 Aug 2003 04:39:05 GMT  

Quote:



> ...
> >gsub(/</, "\&lt;", string) # sub also doesn't work

> ...

> This is just another time when you need to remember that unix-like tools
> tend to process backslashes in double-quoted strings before they do ANYTHING
> else with them. gsub needs to see '\&' in the replacement string, meaning
> there needs to be a backslash remaining after standard double-quoted string
> processing, meaning your replacement string needs to be "\\&lt;".

That makes sense, and it solves my problem.

Thanks!
Chris



Sun, 03 Aug 2003 05:26:26 GMT  
 