1 files changed, 6 insertions, 53 deletions
diff --git a/TODO b/TODO
index ea32bbb..a799bbb 100644
--- a/TODO
+++ b/TODO
@@ -1,11 +1,11 @@
-html5lib has its own Unicode, Dammit-like system. Converting the input
-to Unicode should be up to the builder. The lxml builder would use
-Unicode, Dammit, and the html5lib builder would be a no-op.
-
 Bare ampersands should be converted to HTML entities upon output.
 
-It should also be possible to convert certain Unicode characters to
-HTML entities upon output.
+It should also be possible, on output, to convert any Unicode
+character found in htmlentitydefs.codepoint2name to its HTML entity.
+(This would let me simplify Unicode, Dammit: convert everything to
+Unicode, then convert to entities on output, treating smart quotes
+no differently from any other Unicode character that can be
+represented as an entity.)
 
 XML handling:
 
@@ -21,50 +21,3 @@ as-yet-unreleased version of html5lib changes the parser's handling of
 CDATA sections to allow CDATA sections in tags like <svg> and
 <math>. The HTML5TreeBuilder will need to be updated to create CData
 objects instead of Comment objects in this situation.
-
-
-
----
-
-Here are some unit tests that fail with HTMLParser.
-
-    def testValidButBogusDeclarationFAILS(self):
-        self.assertSoupEquals('<! Foo >a', '<!Foo >a')
-
-    def testIncompleteDeclarationAtEndFAILS(self):
-        self.assertSoupEquals('a<!b')
-
-    def testIncompleteEntityAtEndFAILS(self):
-        self.assertSoupEquals('&lt;Hello&gt')
-
-        # This is not what the original author had in mind, but it's
-        # a legitimate interpretation of what they wrote.
-        self.assertSoupEquals("""<a href="foo</a>, </a><a href="bar">baz</a>""",
-        '<a href="foo&lt;/a&gt;, &lt;/a&gt;&lt;a href="></a>, <a href="bar">baz</a>')
-        # SGMLParser generates bogus parse events when attribute values
-        # contain embedded brackets, but at least Beautiful Soup fixes
-        # it up a little.
-        self.assertSoupEquals('<a b="<a>">', '<a b="&lt;a&gt;"></a><a>"></a>')
-        self.assertSoupEquals('<a href="http://foo.com/<a> and blah and blah',
-                              """<a href='"http://foo.com/'></a><a> and blah and blah</a>""")
-
-        invalidEntity = "foo&#bar;baz"
-        soup = BeautifulStoneSoup\
-               (invalidEntity,
-                convertEntities=htmlEnt)
-        self.assertEquals(str(soup), invalidEntity)
-
-
-Tag names that contain Unicode characters crash the parser:
-    def testUnicodeTagNamesFAILS(self):
-	self.assertSoupEquals("<デダ芻デダtext>2PM</デダ芻デダtext>")
-
-Here's the implementation of NavigableString.__unicode__:
-
-    def __unicode__(self):
-        return unicode(str(self))
-
-It converts the Unicode to a string, and then back to Unicode. I can't
-find any other way of turning an element of a Unicode subclass into a
-normal Unicode object. This is pretty bad and a better technique is
-welcome.
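
Note on the new TODO item: the output-time entity conversion it describes could look roughly like the sketch below. This is only an illustration, not Beautiful Soup code. It assumes Python 3, where the htmlentitydefs module is named html.entities, and the function name substitute_entities is hypothetical. It is meant for text that has already been decoded to Unicode, and it escapes every ampersand it sees, not only bare ones, so it should not be run over text that already contains entities.

```python
from html.entities import codepoint2name  # Python 2: htmlentitydefs.codepoint2name


def substitute_entities(text):
    """Replace each character that has a named HTML entity with that entity.

    Hypothetical helper: every character is looked up by code point in
    codepoint2name, so smart quotes, accented letters, and '&' all go
    through the same path with no special cases.
    """
    pieces = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        # Characters without a named entity pass through unchanged.
        pieces.append("&%s;" % name if name is not None else ch)
    return "".join(pieces)


print(substitute_entities("\u201cHi\u201d, caf\u00e9 & tea"))
# prints: &ldquo;Hi&rdquo;, caf&eacute; &amp; tea
```

Because codepoint2name also contains entries for <, >, and ", the same lookup escapes markup characters, which is appropriate for text nodes but not for serialized tags.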