Figured out the deal with CDATA sections in lxml and html5lib, and added comments and tests.

author: Leonard Richardson <leonard.richardson@canonical.com> 2011-02-13 10:37:38 -0500
committer: Leonard Richardson <leonard.richardson@canonical.com> 2011-02-13 10:37:38 -0500
commit: 87a55b145f0a73e6fc9ede9a762d81d2527161b6 (patch)
tree: b265fc282c99140d1371962b2339bc32cde1beff /TODO
parent: d0531c4204a67a4289025bf7108a922f680fa057 (diff)
parent: 84d7f8dd319039d385b9afe1da751006be2c9859 (diff)
1 files changed, 17 insertions, 0 deletions
diff --git a/TODO b/TODO
index 9792743..ea32bbb 100644
--- a/TODO
+++ b/TODO
@@ -7,6 +7,23 @@ Bare ampersands should be converted to HTML entities upon output.
 It should also be possible to convert certain Unicode characters to
 HTML entities upon output.
 
+XML handling:
+
+The lxml etree XMLParser has a strip_cdata argument that, when set to
+False, should allow Beautiful Soup to preserve CDATA sections instead
+of treating them as text. (This argument is also present for lxml's
+HTMLParser, but does nothing.)
+
+Later:
+
+Currently, html5lib converts CDATA sections into comments. An
+as-yet-unreleased version of html5lib changes the parser's handling of
+CDATA sections to allow CDATA sections in tags like <svg> and
+<math>. The HTML5TreeBuilder will need to be updated to create CData
+objects instead of Comment objects in this situation.
+
+
+
 ---
 
 Here are some unit tests that fail with HTMLParser.
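
To illustrate the distinction the "XML handling" note draws, between preserving a CDATA section and treating it as text, here is a minimal sketch using only the Python standard library. It is not the lxml or Beautiful Soup code path: xml.etree.ElementTree and xml.dom.minidom are stand-ins chosen for illustration, and the DOC string and variable names are made up for this example.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom, Node

# Hypothetical sample document containing one CDATA section.
DOC = "<root><![CDATA[a < b]]></root>"

# ElementTree folds the CDATA section into ordinary element text,
# so the CDATA markup is gone when the tree is serialized again.
et_root = ET.fromstring(DOC)
et_text = et_root.text
et_out = ET.tostring(et_root, encoding="unicode")

# minidom keeps the section as a distinct CDATASection node,
# so it survives a parse/serialize round trip.
dom = minidom.parseString(DOC)
node = dom.documentElement.firstChild
is_cdata = node.nodeType == Node.CDATA_SECTION_NODE
dom_out = dom.documentElement.toxml()

print(et_out)
print(is_cdata, dom_out)
```

The second behavior is the one the note hopes to get for Beautiful Soup: a tree builder that can see a CDATA section as its own node can map it to a CData object rather than plain text.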