author | Leonard Richardson <leonard.richardson@canonical.com> | 2011-02-18 12:53:33 -0500 |
---|---|---|
committer | Leonard Richardson <leonard.richardson@canonical.com> | 2011-02-18 12:53:33 -0500 |
commit | b5fa9d7f5579f22f5fe0f7c9dc63e0aa7d29262f (patch) | |
tree | f089e9dee8109e0fdfae2589cd8228d4ddee5939 /TODO | |
parent | 5962a409b04b8a78d78e9186da97bedbb67df8e6 (diff) |
By default, Unicode Dammit converts smart quotes to Unicode characters, not XML entities.
Diffstat (limited to 'TODO')
-rw-r--r-- | TODO | 12 |
1 files changed, 6 insertions, 6 deletions
@@ -1,11 +1,11 @@
-html5lib has its own Unicode, Dammit-like system. Converting the input
-to Unicode should be up to the builder. The lxml builder would use
-Unicode, Dammit, and the html5lib builder would be a no-op.
-
 Bare ampersands should be converted to HTML entities upon output.
-It should also be possible to convert certain Unicode characters to
-HTML entities upon output.
+It should also be possible to, on output, convert to HTML entities any
+Unicode characters found in htmlentitydefs.codepoint2name. (This
+algorithm would allow me to simplify Unicode, Dammit--convert
+everything to Unicode, and then convert to entities upon output, not
+treating smart quotes differently from any other Unicode character
+that can be represented as an entity.)
 
 XML handling:
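
The output-side conversion proposed in the new TODO text can be sketched roughly as follows. This is an illustration only, not code from this commit: `substitute_entities` is a hypothetical name, and the sketch uses `html.entities.codepoint2name`, the Python 3 location of the `htmlentitydefs.codepoint2name` table the TODO mentions. It assumes the input has already been converted to Unicode.

```python
# Python 3 home of the table the TODO calls htmlentitydefs.codepoint2name.
from html.entities import codepoint2name


def substitute_entities(text):
    """Replace every character that has a named HTML entity with &name;.

    Hypothetical helper for illustration. Smart quotes get no special
    treatment: they are converted only because they appear in the table.
    """
    pieces = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        if name is not None:
            pieces.append("&" + name + ";")
        else:
            pieces.append(ch)
    return "".join(pieces)


# Smart quotes and accented letters go through the same table lookup.
print(substitute_entities("\u201cCaf\u00e9\u201d"))
```

Note that the table also contains `&`, `<`, `>` and `"`, so this naive version escapes every ampersand, including ones that already begin an entity. Telling bare ampersands apart from existing entities, as the TODO asks, would need additional logic.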