author Leonard Richardson <leonard.richardson@canonical.com> 2011-02-13 10:37:38 -0500
committer Leonard Richardson <leonard.richardson@canonical.com> 2011-02-13 10:37:38 -0500
commit 87a55b145f0a73e6fc9ede9a762d81d2527161b6
tree b265fc282c99140d1371962b2339bc32cde1beff /TODO
parent d0531c4204a67a4289025bf7108a922f680fa057
parent 84d7f8dd319039d385b9afe1da751006be2c9859
Figured out the deal with CDATA sections in lxml and html5lib, and added comments and tests.
Diffstat (limited to 'TODO')
-rw-r--r-- TODO 17
1 files changed, 17 insertions, 0 deletions
diff --git a/TODO b/TODO
index 9792743..ea32bbb 100644
--- a/TODO
+++ b/TODO
@@ -7,6 +7,23 @@ Bare ampersands should be converted to HTML entities upon output.
It should also be possible to convert certain Unicode characters to
HTML entities upon output.
+XML handling:
+
+The elementtree XMLParser has a strip_cdata argument that, when set to
+False, should allow Beautiful Soup to preserve CDATA sections instead
+of treating them as text. (This argument is also present for
+HTMLParser, but does nothing.)
+
+Later:
+
+Currently, html5lib converts CDATA sections into comments. An
+as-yet-unreleased version of html5lib changes the parser's handling of
+CDATA sections to allow CDATA sections in tags like <svg> and
+<math>. The HTML5TreeBuilder will need to be updated to create CData
+objects instead of Comment objects in this situation.
+
+
+
---
Here are some unit tests that fail with HTMLParser.
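
The strip_cdata behaviour described in the "XML handling" note above can be
sketched as follows. This is only an illustration, not code from this commit:
it assumes lxml is installed and uses lxml's etree.XMLParser directly rather
than going through Beautiful Soup. The sample document and the serialize()
helper are made up for the example.

```python
from lxml import etree

# Hypothetical sample document containing one CDATA section.
XML = b"<root><![CDATA[a < b]]></root>"


def serialize(strip_cdata):
    """Parse XML with the given strip_cdata setting and serialize it back."""
    parser = etree.XMLParser(strip_cdata=strip_cdata)
    root = etree.fromstring(XML, parser)
    return etree.tostring(root)


# With strip_cdata=True (lxml's default) the CDATA section becomes plain
# text, so the "<" is escaped when the tree is serialized again.
stripped = serialize(True)

# With strip_cdata=False the CDATA section survives the round trip.
preserved = serialize(False)

print(stripped)
print(preserved)
```

In both cases root.text holds the same string; the difference only shows up
on output, which is why a tree builder would need its own CData object to
carry the distinction through.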