XML::Simple character set problem

I'm extracting data from an XML file using XML::Simple. Unfortunately, local
language characters are being translated. For example, the string "??" is
converted to "??? ????? ?".

I use the following command to get the data in a hash:

$elements    = XMLin($filename,noattr=>1,suppressempty=>1);

The XML declaration at the top of the file is: <?xml version="1.0"
encoding="ISO-8859-1"?>

Does anyone have an idea what's going on and how to fix this problem?

Best regards



Tue, 20 Apr 2004 20:32:06 GMT  

 > I'm extracting data from an XML file using XML::Simple. Unfortunately, local
 > language characters are being translated. For example, the string "??" is
 > converted to "??? ????? ?".
 >
 > I use the following command to get the data in a hash:
 >
 > $elements    = XMLin($filename,noattr=>1,suppressempty=>1);
 >
 > In the XML file heading, I have -> <?xml version="1.0"
 > encoding="ISO-8859-1"?>
 >
 > Has someone an idea what's going on and how to fix this problem?

I had the same phenomenon this morning using XML::Parser. Strangely
enough, it was fixed by setting the ProtocolEncoding to UTF-8 even
though the document itself used latin1.

So try:

$elements = XMLin($filename, noattr => 1,
                             suppressempty => 1,
                             parseropts => [ProtocolEncoding => 'UTF-8']
                  );

Tassilo

PS: I thought I had already sent this message, but it did not appear
even a few hours after posting. I hope it won't show up twice.




Wed, 21 Apr 2004 05:22:42 GMT  

Quote:
> I had the same phenomenon this morning using XML::Parser. Strangely
> enough, it was fixed by setting the ProtocolEncoding to UTF-8 even
> though the document itself used latin1.

> So try:

> $elements = XMLin($filename, noattr => 1,
>      suppressempty => 1,
>      parseropts => [ProtocolEncoding => 'UTF-8']
>   );

Hi Tassilo,

Thanks for your reply. I tried a few things with the ProtocolEncoding option.
Unfortunately, in my case, as soon as I use ISO Latin-1 characters in the XML
file, I have to declare that in the XML declaration (encoding="ISO-8859-1").

If I then set parseropts => [ProtocolEncoding => 'UTF-8'], I get a parse
error.

To summarize, I tried several combinations, and the results are of two kinds:
in one case the characters are wrongly translated (ISO-8859-1 in both the XML
declaration and the script's parseropts), and in all other cases there is a
parse error.

So I'm still looking for a solution.



Fri, 23 Apr 2004 16:22:51 GMT  
OK, I found the cause of my problem. XML::Parser converts everything to
UTF-8, but I expected ISO Latin-1 output.

So to fix this, I had to add an additional conversion step.

Say the elements are in the hash:

$elements    = XMLin($filename,noattr=>1,suppressempty=>1);

an element would be: $elements->{'subject'}; #in UTF-8

and to get it in iso-8859-1:

use Unicode::String qw(utf8 latin1);
...
$u = utf8($elements->{'subject'});   # wrap the UTF-8 string from XMLin
print $u->latin1;                    # print it as ISO-8859-1

Maybe this is related to your problem as well?
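
For anyone who would rather not install Unicode::String, the same conversion can probably be done with the core Encode module (bundled with Perl 5.8 and later). The following is only a sketch: to_latin1 is a hypothetical helper name, and the hand-built hash stands in for whatever XMLin returns for your file. It walks the whole structure, so every value is converted, not just one element.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return a copy of a nested structure in which every
# string value has been encoded from Perl characters to ISO-8859-1 bytes.
sub to_latin1 {
    my ($node) = @_;
    if (ref $node eq 'HASH') {
        my %copy;
        $copy{$_} = to_latin1($node->{$_}) for keys %$node;
        return \%copy;
    }
    if (ref $node eq 'ARRAY') {
        return [ map { to_latin1($_) } @$node ];
    }
    # Leaf value: encode it (undef stays undef).
    return defined $node ? encode('ISO-8859-1', $node) : undef;
}

# Made-up stand-in for:
#   my $elements = XMLin($filename, noattr => 1, suppressempty => 1);
my $elements = {
    subject => "caf\x{e9} \x{fc}ber",
    items   => { item => ["na\x{ef}ve"] },
};

my $latin1 = to_latin1($elements);

binmode STDOUT;                    # write the bytes unchanged
print $latin1->{subject}, "\n";
```

Characters that have no ISO-8859-1 equivalent come out as "?" with this default call. If you only need Latin-1 when printing, binmode(STDOUT, ':encoding(ISO-8859-1)') on the output handle is another option that leaves the data itself untouched.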



Fri, 23 Apr 2004 22:16:17 GMT  
 