Here are some unit tests that fail with HTMLParser.

    def testValidButBogusDeclarationFAILS(self):
        self.assertSoupEquals('a', 'a')

    def testIncompleteDeclarationAtEndFAILS(self):
        self.assertSoupEquals('a, baz""", ', baz')

        # SGMLParser generates bogus parse events when attribute values
        # contain embedded brackets, but at least Beautiful Soup fixes
        # it up a little.
        self.assertSoupEquals('', '">')
        self.assertSoupEquals(' and blah and blah""")

        invalidEntity = "foo&#bar;baz"
        soup = BeautifulStoneSoup\
               (invalidEntity, convertEntities=htmlEnt)
        self.assertEquals(str(soup), invalidEntity)

Tag names that contain Unicode characters crash the parser:

    def testUnicodeTagNamesFAILS(self):
        self.assertSoupEquals("<デダ芻デダtext>2PM")

Here's the implementation of NavigableString.__unicode__:

    def __unicode__(self):
        return unicode(str(self))

It converts the Unicode to a string, and then back to Unicode. I can't find any other way of turning an element of a Unicode subclass into a normal Unicode object. This is pretty bad and a better technique is welcome.
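
One possible alternative to the round trip described above is to call the base class's conversion method directly, so the subclass's own override is never involved. The sketch below is an illustration only and rests on assumptions: it is written for Python 3, where `str` plays the role that `unicode` plays in the code above, and it relies on CPython returning an exact `str` from `str.__str__` when given a subclass instance. `PlainableText` and `to_plain_text` are hypothetical names, not part of Beautiful Soup, and the equivalent behaviour on Python 2 has not been checked here.

```python
class PlainableText(str):
    """Hypothetical str subclass standing in for a class like NavigableString."""

    def __str__(self):
        # Writing `return str(self)` here would call this method again and
        # recurse without end, so delegate to the base class directly.
        return str.__str__(self)


def to_plain_text(value):
    """Return an exact `str` with the same characters as `value`."""
    # Calling the base-class method bypasses any __str__ override on the
    # subclass (assumption: CPython copies subclass instances to exact str).
    return str.__str__(value)


text = PlainableText("2PM")
plain = to_plain_text(text)
print(type(text).__name__, type(plain).__name__)
```

The conversion involves no encoding or decoding step, so characters outside ASCII are not at risk of being mangled on the way through.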