author | Leonard Richardson <leonard.richardson@canonical.com> | 2011-02-13 10:37:38 -0500 |
---|---|---|
committer | Leonard Richardson <leonard.richardson@canonical.com> | 2011-02-13 10:37:38 -0500 |
commit | 87a55b145f0a73e6fc9ede9a762d81d2527161b6 (patch) | |
tree | b265fc282c99140d1371962b2339bc32cde1beff /TODO | |
parent | d0531c4204a67a4289025bf7108a922f680fa057 (diff) | |
parent | 84d7f8dd319039d385b9afe1da751006be2c9859 (diff) |
Figured out the deal with CDATA sections in lxml and html5lib, and added comments and tests.
Diffstat (limited to 'TODO')
-rw-r--r-- | TODO | 17 |
1 files changed, 17 insertions, 0 deletions
@@ -7,6 +7,23 @@ Bare ampersands should be converted to HTML entities upon output.
 It should also be possible to convert certain Unicode characters to HTML entities upon output.
+XML handling:
+
+The elementtree XMLParser has a strip_cdata argument that, when set to
+False, should allow Beautiful Soup to preserve CDATA sections instead
+of treating them as text. (This argument is also present for
+HTMLParser, but does nothing.)
+
+Later:
+
+Currently, htm5lib converts CDATA sections into comments. An
+as-yet-unreleased version of html5lib changes the parser's handling of
+CDATA sections to allow CDATA sections in tags like <svg> and
+<math>. The HTML5TreeBuilder will need to be updated to create CData
+objects instead of Comment objects in this situation.
+
+
+
 ---
 Here are some unit tests that fail with HTMLParser.
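The strip_cdata behaviour described in the added TODO text can be checked with a short sketch like the one below. It is not part of this commit. It assumes the third-party lxml package is installed, and the sample document and the `serialize` helper are made up for illustration. With `strip_cdata=False` the serialized output should keep the CDATA section; with the default setting the section should come back as escaped text.

```python
# Illustrative sketch only; not code from this commit.
# Assumes lxml is installed. The sample XML and serialize() are hypothetical.
try:
    from lxml import etree
except ImportError:  # lxml is third-party and may be missing
    etree = None

XML = "<root><![CDATA[a < b]]></root>"


def serialize(strip_cdata):
    """Parse XML with the given strip_cdata setting and serialize it back."""
    parser = etree.XMLParser(strip_cdata=strip_cdata)
    root = etree.fromstring(XML, parser)
    return etree.tostring(root, encoding="unicode")


if etree is not None:
    # Default behaviour: the CDATA section is folded into ordinary text.
    print(serialize(True))
    # strip_cdata=False: the CDATA section survives the round trip.
    print(serialize(False))
```

In both cases `root.text` holds the same string; the difference only shows up when the tree is serialized, which is the point at which a tree builder would need to decide between a CData object and plain text.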