Problem with UnicodeError and xml.dom.minidom 

When I scan previous posts on this, it seems like a lot of people are
having trouble with UnicodeError. I don't use any non-ASCII characters
(yet), but still I got this uninterpretable error:

E:\test>python count.py
Traceback (most recent call last):
  File "count.py", line 37, in ?
    message = domext.createElementEmpty(doc, root, u'message')
  File "e:\python\gustaf\domext.py", line 12, in createElementEmpty
    parent.appendChild(e)
  File "E:\python\_xmlplus\dom\minidom.py", line 144, in appendChild
    if node.nodeType not in self.childNodeTypes:
UnicodeError: ASCII decoding error: ordinal not in range(128)

I'm converting mails to XML, using xml.dom.minidom. Works fine in
interactive mode, but not when I run it as a script! Here's the function in
my domext.py module that appears to raise the error:

# Create an element-only or EMPTY element
def createElementEmpty(doc, parent, name):
  e = doc.createElement(name)
  parent.appendChild(e)
  return e

It takes the DOM object, a parent node and a name for the element to
create.
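
For anyone who wants to try the helper in isolation, here is a minimal, self-contained sketch of the same function, written for Python 3. The root element name "mailbox" is a hypothetical placeholder and is not taken from count.py.

```python
from xml.dom import minidom


def create_element_empty(doc, parent, name):
    """Create an element-only or EMPTY element and attach it to parent."""
    element = doc.createElement(name)
    parent.appendChild(element)
    return element


doc = minidom.Document()
# "mailbox" is a hypothetical root element name, not taken from count.py.
root = create_element_empty(doc, doc, "mailbox")
message = create_element_empty(doc, root, "message")

xml_text = root.toxml()
print(xml_text)
```

With pure ASCII names like these, nothing in the helper itself needs any decoding, which suggests the non-ASCII data enters somewhere else in the program.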

The call that fails looks like this:

message = domext.createElementEmpty(doc, root, 'message')

This is line 37 in count.py. No non-ASCII characters here, and it works
fine in Idle. I'm paralysed. Can someone break the spell?

Gustaf Liljegren



Fri, 28 Nov 2003 02:36:39 GMT  

Quote:

>...

> This is line 37 in count.py. No non-ASCII characters here, and it works
> fine in Idle. I'm paralysed. Can someone break the spell?

Something is weird with your traceback. I don't see how this line could
generate this error:

  File "E:\python\_xmlplus\dom\minidom.py", line 144, in appendChild
    if node.nodeType not in self.childNodeTypes:
UnicodeError: ASCII decoding error: ordinal not in range(128)

It doesn't seem to be doing string manipulation at all. There is a
"unicode" builtin that converts text to Unicode and raises this error on
non-ASCII 8-bit input, so you can use it to find where the bad data enters:

assert unicode(somestring)

Sprinkle some calls to that around your code and even around the library
code if you have to.

Also, as a strategy, consider unicode-ing all of your input data as soon
as you read it in:

somestring = unicode(somestring)
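
As a rough illustration of converting input at the boundary, here is a sketch in Python 3 terms, where 8-bit strings are bytes and Unicode strings are str. The helper name to_text, the Latin-1 default and the sample header value are all assumptions made for the example.

```python
def to_text(value, encoding="latin-1"):
    """Return value as text, decoding bytes with an explicit encoding."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Hypothetical header value as it might arrive from a mailbox file.
raw_from = b"K\xe4llstr\xf6m <kalle@example.com>"
sender = to_text(raw_from)
print(sender)
```

Naming the encoding explicitly is the important part. Relying on the default codec only works as long as the data happens to be ASCII.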

--
Take a recipe. Leave a recipe.  
Python Cookbook!  http://www.ActiveState.com/pythoncookbook



Fri, 28 Nov 2003 11:16:17 GMT  

Quote:

>Sprinkle some calls to that around your code and even around the library
>code if you have to.

Thanks Paul, I'm closer to the solution now. I found a specific message in
the mailbox that gives the program trouble. Well, it isn't so specific --
it merely has some Swedish characters in the "From:" field...

The idea is to check whether the "From:" field contains one of my friend's
e-mail addresses, so I can sort those messages out. It looks like this:

for e in myFriend.getEmails():       # Friends often change e-mails...
  if string.find(m.get('from'), e):  # Here's the line that fails!

When I found this message, I could cut it out, paste it at the
beginning of the mailbox file, and then provoke the same error in Idle. I
found that the string returned from m.get('from') has some kind of
hexadecimal escaping for non-ASCII characters ('ä' is represented as
'\xe4'), while the 'e' string containing the e-mail has a 'u' (for Unicode)
in front. So that explains why there is a problem when comparing them.

So, I wrapped a unicode() function around m.get('from'), but I still get
the same error: "UnicodeError: ASCII decoding error: ordinal not in
range(128)". Actually, all I need to do to get this error is to write:

print unicode(m.get('from'))  # Works fine without unicode() though
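
The likely reason is that unicode() called without an encoding argument decodes with the default ASCII codec, which rejects any byte above 127. A small Python 3 sketch of the same failure follows, with bytes standing in for the 8-bit string. The single-byte sample is an assumption chosen to match the '\xe4' mentioned above.

```python
raw = b"\xe4"  # the byte for 'ä' in Latin-1

try:
    raw.decode("ascii")
    error_text = None
except UnicodeDecodeError as exc:
    # ASCII only covers 0..127, so 0xe4 cannot be decoded.
    error_text = str(exc)

# Decoding with the encoding the data is actually in succeeds.
decoded = raw.decode("latin-1")

print(error_text)
print(decoded == "\u00e4")
```

Printing the raw 8-bit string works because no decoding takes place at all. The error only appears once something tries to turn the bytes into Unicode.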

Hope this is enough info for someone to solve it. Otherwise, I'd gladly
share the complete code if you need. But it may be too much for a
newsgroup.

Regards,

Gustaf Liljegren



Sat, 29 Nov 2003 05:08:45 GMT  

Quote:

>...

> Hope this is enough info for someone to solve it. Otherwise, I'd gladly
> share the complete code if you need. But it may be too much for a
> newsgroup.

The easiest thing is to do the comparison as 8-bit strings rather than
Unicode strings:

if string.find(m.get('from'), e.encode("Latin-1")) != -1:

or both as Unicode strings:

if string.find(unicode(m.get('from'), "Latin-1"), e) != -1:

Some of us have fought a losing war to make Python do that for you
automatically but the opposition is fierce, well organized and well
funded. ;)
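
To make the "both as Unicode strings" variant concrete, here is a hedged sketch in Python 3, where the header arrives as bytes and the addresses are str. The function name sender_matches, the sample header and the addresses are hypothetical, and Latin-1 is assumed to be the mailbox encoding.

```python
def sender_matches(raw_from, addresses, encoding="latin-1"):
    """Return True if any address occurs in the decoded From: header."""
    text = raw_from.decode(encoding)
    for address in addresses:
        # find() returns -1 when the substring is absent and 0 for a
        # match at the start, so compare against -1 explicitly.
        if text.find(address) != -1:
            return True
    return False


# Hypothetical data for illustration only.
raw_from = b"K\xe4llstr\xf6m <kalle@example.com>"
friend_addresses = ["kalle@example.com", "kalle@example.org"]

print(sender_matches(raw_from, friend_addresses))
print(sender_matches(raw_from, ["someone@example.net"]))
```

Decoding once and comparing text with text keeps the implicit ASCII conversion out of the picture entirely.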

--
Take a recipe. Leave a recipe.  
Python Cookbook!  http://www.ActiveState.com/pythoncookbook



Sat, 29 Nov 2003 07:39:19 GMT  
 