Help - Code decodes encoded Html 

Hello

I'm building HTML-encoded XML that has to be transformed to HTML via XSLT.
The problem is that the XSLT processor seems to decode my encoded HTML again
during the transformation, which screws up my HTML report.

// get encoded HTML (ö -> &#246;)
string xml = GenerateXml(sales);

// load the XML into a document...
StringWriter writer = new StringWriter();
XPathDocument doc = new XPathDocument(new System.IO.StringReader(xml));

// ...and do the transformation
XPathNavigator nav = doc.CreateNavigator();
transformer.Transform(nav, null, writer);

// what I return here is decoded HTML (&#246; is ö again)
return writer.GetStringBuilder().ToString();

There must be a bug somewhere, but I can't find a way around the problem.
Thank you very much for your advice.

Philipp



Tue, 22 Feb 2005 20:03:16 GMT  
I've just seen that my XML characters are being escaped with the numeric
rather than the named reference (ö is escaped as &#246;, not &ouml;).
Inserting named references results in parser errors anyway (invalid
entities). The problem is that while the XML input is being read, these
characters are decoded again (&#246; becomes ö again) and then written
to the HTML file :-(
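
What is described here is ordinary XML parsing rather than something specific to XSLT: the parser resolves a character reference such as &#246; while reading, so the in-memory document holds the character ö and keeps no record of how it was written. A minimal sketch, assuming an inline XML string in place of the GenerateXml output (the class and method names are made up for illustration):

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class CharRefDemo
{
    // Parses the XML and returns the string value of /sales/comments.
    public static string ReadComment(string xml)
    {
        XPathDocument doc = new XPathDocument(new StringReader(xml));
        XPathNavigator nav = doc.CreateNavigator();
        return nav.SelectSingleNode("/sales/comments").Value;
    }

    static void Main()
    {
        string value = ReadComment("<sales><comments>L&#246;sung</comments></sales>");

        // The reference is gone after parsing; the value holds U+00F6 itself.
        Console.WriteLine(value == "L\u00F6sung"); // prints True
    }
}
```

Whether ö then appears in the output as a raw character or as a reference is decided by the serializer and its encoding, not by the input.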

Thanks

Philipp



Tue, 22 Feb 2005 22:12:21 GMT  
Philipp,

    As they should be.  Just because you are writing to HTML doesn't mean
that it will automatically convert characters to the appropriate entity
references.  You will have to do this yourself in your code somehow.  What
you might want to do is manually translate the element content so that the
characters are escaped properly for HTML.  For example, in your XML,
you have ö, change it to &ouml; before the translation.

    Hope this helps.

--
               - Nicholas Paldino [.NET/C# MVP]



Wed, 23 Feb 2005 00:08:00 GMT  
Hello Nicholas

I was hoping to hear from you (been trying for hours now) ;-)

Quote:
> For example, in your XML, you have ö, change it to &ouml; before the
> translation.

My problem is that the XML does not contain any "ö" characters: I escaped
them with the HttpUtility.HtmlEncode() method, which emits "&#246;".

Furthermore, even encoded umlauts within the stylesheet are being decoded:
<xsl:text>this is an &#246;</xsl:text> becomes "this is an ö" within the
HTML report's source code.

...I just can't imagine that all I can do is parse the whole document to
re-escape all these characters (umlauts, French characters, etc.). Any ideas?

Thank you very much

Philipp



Wed, 23 Feb 2005 00:33:57 GMT  
Why don't you put it into a CDATA section instead of an XmlText
element?

Jonathan Schafer



Wed, 23 Feb 2005 01:20:49 GMT  
Hello Jonathan

That was close but also strange. If I use CDATA, the parser does not
unescape the characters, but the ampersand itself gets escaped as &amp;.
The result is that in the HTML source I have &amp;#246; instead of &#246;,
and the displayed report shows "&#246;" rather than "ö". This drives me
crazy (and I think it should be so easy)...
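
The &amp;#246; follows from two separate steps: inside a CDATA section the text &#246; is six literal characters rather than a reference, and the serializer then escapes the literal ampersand when it writes the text node. A sketch of both steps, assuming the XSLT output stage treats text the way XmlWriter does (class and method names are illustrative only):

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class CDataDemo
{
    // Parses the XML and returns the string value of /sales/comments.
    public static string ReadComment(string xml)
    {
        XPathDocument doc = new XPathDocument(new StringReader(xml));
        return doc.CreateNavigator().SelectSingleNode("/sales/comments").Value;
    }

    // Serializes the text as element content, the way a text node is written.
    public static string WriteAsText(string text)
    {
        StringWriter output = new StringWriter();
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true;
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            writer.WriteElementString("body", text);
        }
        return output.ToString();
    }

    static void Main()
    {
        string value = ReadComment(
            "<sales><comments><![CDATA[L&#246;sung]]></comments></sales>");

        Console.WriteLine(value);              // L&#246;sung (literal ampersand kept)
        Console.WriteLine(WriteAsText(value)); // <body>L&amp;#246;sung</body>
    }
}
```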

Thanks for your help

Philipp



Wed, 23 Feb 2005 01:58:21 GMT  
If HtmlEncode is returning &#246;, then that is what should be in your
CDATA section.  It is not supposed to escape characters in that
section.

Maybe you could post some sample code to look at?

Jonathan Schafer



Wed, 23 Feb 2005 02:21:46 GMT  
Ok, I've written a little test class and included the things that hurt me
;-)

Thanks again

Philipp

**********************************
C# code

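// needs: using System.IO; using System.Xml.XPath; using System.Xml.Xsl;
static void Main(string[] args)
{
    XPathDocument doc = new XPathDocument("test.xml");
    XslTransform transform = new XslTransform();
    transform.Load("style.xslt");

    // create navigator
    XPathNavigator nav = doc.CreateNavigator();

    // fs was not declared in the posted snippet; the file name here is assumed
    using (FileStream fs = new FileStream("report.html", FileMode.Create))
    {
        transform.Transform(nav, null, fs);
    }
}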

************************************
Here is the XML file:

<?xml version="1.0"?>
<sales>
 <comments>
  <![CDATA[L&#246;sung soll NCR Galaxy abl&#246;sen.]]>
 </comments>
</sales>

************************************************************************
Here is the stylesheet

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:template match="/">
  <html><head></head><body>

  <!-- Test with directly written output -->
  <xsl:text>This is an &#246; </xsl:text>

  <!-- Get XML -->
  <xsl:value-of select="sales/comments"/>

  </body></html>
 </xsl:template>
</xsl:stylesheet>

************************************************
This is the resulting HTML source code:

<html>
  <head>
    <META http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
  </head>
  <body>
        This is an ö
        L&amp;#246;sung soll NCR Galaxy abl&amp;#246;sen.</body>
</html>



Wed, 23 Feb 2005 02:53:11 GMT  
I found this help document in VS.NET

ms-help://MS.VSCC/MS.MSDNVS/dnmag00/html/xslt.htm

There are some attributes you can apply in your XSL that might affect
how it is output.

One is the disable-output-escaping, which you can apply on a field
like this..

<xsl:text disable-output-escaping='yes' >Wo&gt;rld</xsl:text>

Another thing you can try would be to use the "output" directive...
as you can see below, you can specify to emit in Xml, html, or text.

<xsl:output
  method = "xml" | "html" | "text" | qname-but-not-ncname
  version = nmtoken
  encoding = string
  omit-xml-declaration = "yes" | "no"
  standalone = "yes" | "no"
  doctype-public = string
  doctype-system = string
  cdata-section-elements = qnames
  indent = "yes" | "no"
  media-type = string  
/>
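
Of the xsl:output attributes listed above, encoding is the one that bears on this problem. The following is a sketch of an alternative to try, not something confirmed in this thread. It assumes XslCompiledTransform (the .NET 2.0 successor to the XslTransform class used here) and output to a Stream, so that the declared encoding is the one actually used. With us-ascii the serializer cannot write ö directly and is expected to fall back to a numeric character reference:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

public static class AsciiOutputDemo
{
    // Inline stand-ins for test.xml and style.xslt (assumed for illustration).
    const string Xml = "<sales><comments>L&#246;sung</comments></sales>";

    const string Xslt =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:output method='html' encoding='us-ascii'/>" +
        "<xsl:template match='/'>" +
        "<html><body><xsl:value-of select='sales/comments'/></body></html>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    public static string Transform()
    {
        XslCompiledTransform transform = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(Xslt)))
        {
            transform.Load(styleReader);
        }

        XPathDocument doc = new XPathDocument(new StringReader(Xml));
        using (MemoryStream stream = new MemoryStream())
        {
            // Writing to a Stream lets the xsl:output encoding take effect;
            // a TextWriter would impose its own encoding instead.
            transform.Transform(doc, null, stream);
            return Encoding.ASCII.GetString(stream.ToArray());
        }
    }

    static void Main()
    {
        Console.WriteLine(Transform());
    }
}
```

If this behaves as expected, neither CDATA nor HtmlEncode is needed in the input, because the reference is produced at serialization time.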

Jonathan Schafer



Wed, 23 Feb 2005 04:10:52 GMT  
Hello Jonathan

That was it! Thank you very much! :-))

I've looked at disable-output-escaping before, but I never tried it along
with CDATA.

Working solution:
XML: <![CDATA[&#246;]]>
XSL: <xsl:value-of select="elementname" disable-output-escaping="yes"/>

However, CDATA is needed; otherwise the parser decodes the input while
reading the content. I'm pretty sure it keeps no record of which characters
it holds in memory were escaped and which were not, which smells a bit
buggy...
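
Put together, the working combination can be sketched as below. This is a reconstruction rather than the actual code from the project: it assumes XslCompiledTransform in place of the XslTransform class from the test program, and inline strings in place of test.xml and style.xslt.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

public static class DisableEscapingDemo
{
    // CDATA keeps "&#246;" as literal text, so the parser leaves it alone.
    const string Xml =
        "<sales><comments>" +
        "<![CDATA[L&#246;sung soll NCR Galaxy abl&#246;sen.]]>" +
        "</comments></sales>";

    // disable-output-escaping writes that text without turning & into &amp;.
    const string Xslt =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:output method='html'/>" +
        "<xsl:template match='/'>" +
        "<html><head></head><body>" +
        "<xsl:value-of select='sales/comments' disable-output-escaping='yes'/>" +
        "</body></html>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    public static string Transform()
    {
        XslCompiledTransform transform = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(Xslt)))
        {
            transform.Load(styleReader);
        }

        XPathDocument doc = new XPathDocument(new StringReader(Xml));
        StringWriter writer = new StringWriter();
        transform.Transform(doc, null, writer);
        return writer.ToString();
    }

    static void Main()
    {
        // The body should now carry the references unchanged, e.g. L&#246;sung.
        Console.WriteLine(Transform());
    }
}
```

Two caveats apply. The XSLT 1.0 specification does not require processors to support disable-output-escaping, so the result is worth re-checking if the stylesheet is moved to another processor. And because the text is written raw, any markup or stray ampersands in the comment would also reach the HTML unescaped.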

Have a nice weekend

Philipp



Wed, 23 Feb 2005 17:36:30 GMT  
 