You may use the ==
operator to compare unicode objects for equality.
>>> s1 = u'Hello'
>>> s2 = unicode("Hello")
>>> type(s1), type(s2)
(<type 'unicode'>, <type 'unicode'>)
>>> s1==s2
True
>>>
>>> s3='Hello'.decode('utf-8')
>>> type(s3)
<type 'unicode'>
>>> s1==s3
True
>>>
But, your error message indicates that you aren't comparing unicode objects. You are probably comparing a unicode
object to a str
object, like so:
>>> u'Hello' == 'Hello'
True
>>> u'Hello' == 'x81x01'
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False
See how I have attempted to compare a unicode object against a string which does not represent a valid UTF8 encoding.
Your program, I suppose, is comparing unicode objects with str objects, and the contents of a str object is not a valid UTF8 encoding. This seems likely the result of you (the programmer) not knowing which variable holds unicide, which variable holds UTF8 and which variable holds the bytes read in from a file.
I recommend http://nedbatchelder.com/text/unipain.html, especially the advice to create a "Unicode Sandwich."
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…