Bare ampersands should be converted to HTML entities upon output.
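
A minimal sketch of the bare-ampersand rule, assuming "bare" means an "&" that does not already begin a named entity (&amp;) or a numeric character reference (&#38;, &#x26;). The regex and the function name escape_bare_ampersands are hypothetical, and the check is deliberately simple: it does not verify that an entity name is actually defined in HTML.

```python
import re

# Matches an "&" that is NOT the start of &name;, &#123; or &#x7B;.
# Simplifying assumption: any letter-led name followed by ";" counts as
# an entity, whether or not HTML defines it.
BARE_AMPERSAND = re.compile(
    r"&(?!#\d+;|#[xX][0-9a-fA-F]+;|[a-zA-Z][a-zA-Z0-9]*;)"
)


def escape_bare_ampersands(text: str) -> str:
    """Replace every bare '&' with '&amp;', leaving existing references intact."""
    return BARE_AMPERSAND.sub("&amp;", text)


print(escape_bare_ampersands("AT&T &amp; Johnson & Johnson &#169; &#xA9;"))
# AT&amp;T &amp; Johnson &amp; Johnson &#169; &#xA9;
```

Using a negative lookahead keeps existing references untouched, so running the function twice gives the same result as running it once.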
It should also be possible, on output, to convert any Unicode
characters found in htmlentitydefs.codepoint2name to HTML entities.
(This algorithm would allow me to simplify Unicode, Dammit: convert
everything to Unicode, then convert to entities upon output, without
treating smart quotes differently from any other Unicode character
that can be represented as an entity.)
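
The output-side conversion described above could look roughly like this. In Python 3 the htmlentitydefs module is named html.entities, and codepoint2name maps an integer code point to an entity name without the surrounding "&" and ";". The function name unicode_to_entities is hypothetical. Smart quotes get no special case: they are simply characters that happen to have an entry in the table.

```python
from html.entities import codepoint2name  # Python 3 name for htmlentitydefs


def unicode_to_entities(text: str) -> str:
    """Replace each character that has a named HTML entity with &name;.

    Characters with no entry in codepoint2name (for example ordinary
    ASCII letters and spaces) are passed through unchanged.
    """
    out = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        out.append(f"&{name};" if name else ch)
    return "".join(out)


print(unicode_to_entities("\u201cCaf\u00e9\u201d & <b>"))
# &ldquo;Caf&eacute;&rdquo; &amp; &lt;b&gt;
```

Because codepoint2name also contains "&", "<" and ">", this assumes the input is plain Unicode text with no entities already in it; applying it to text that already contains "&amp;" would escape that ampersand a second time.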
XML handling:
The lxml etree.XMLParser has a strip_cdata argument that, when set to
False, should allow Beautiful Soup to preserve CDATA sections instead
of merging them into ordinary text. (This argument is also present for
HTMLParser, but does nothing.)
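
A minimal sketch of what strip_cdata changes, assuming lxml is installed. It shows lxml's own behavior only, not how Beautiful Soup would pass the argument through its tree builder. With the default parser the CDATA section is merged into the element's text and re-escaped on serialization; with strip_cdata=False the section survives a round trip.

```python
from lxml import etree

DOC = b"<root><![CDATA[a < b]]></root>"

# Default XMLParser (strip_cdata=True): CDATA becomes ordinary text.
default_root = etree.fromstring(DOC)
print(etree.tostring(default_root))
# b'<root>a &lt; b</root>'

# strip_cdata=False keeps the CDATA section when serializing.
keep_parser = etree.XMLParser(strip_cdata=False)
kept_root = etree.fromstring(DOC, keep_parser)
print(etree.tostring(kept_root))
# b'<root><![CDATA[a < b]]></root>'

# The .text value is identical either way; only serialization differs.
print(default_root.text == kept_root.text)  # True
```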
Later:
Currently, html5lib converts CDATA sections into comments. An
as-yet-unreleased version of html5lib changes the parser's handling of
CDATA sections to allow them inside tags like <svg> and
<math>. The HTML5TreeBuilder will need to be updated to create CData
objects instead of Comment objects in this situation.
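
A rough sketch of the tree-builder decision, under an assumption that has not been checked against html5lib's actual interface: that the builder receives the section as comment data of the form "[CDATA[...]]" (which is how an HTML5 parser reports "<![CDATA[...]]>" in ordinary HTML content), together with a flag saying whether it sits inside foreign content such as <svg> or <math>. The classes below are hypothetical stand-ins for Beautiful Soup's Comment and CData, and node_for_comment_data is a hypothetical helper.

```python
# Hypothetical stand-ins for Beautiful Soup's Comment and CData strings.
class Comment(str):
    pass


class CData(str):
    pass


CDATA_PREFIX = "[CDATA["
CDATA_SUFFIX = "]]"


def node_for_comment_data(data: str, in_foreign_content: bool) -> str:
    """Pick the node type for comment data handed over by the parser.

    Assumption: "<![CDATA[x]]>" arrives as comment data "[CDATA[x]]".
    Inside foreign content it is treated as a real CDATA section and
    unwrapped to "x"; everywhere else it stays a Comment, as today.
    """
    if (
        in_foreign_content
        and data.startswith(CDATA_PREFIX)
        and data.endswith(CDATA_SUFFIX)
    ):
        return CData(data[len(CDATA_PREFIX):-len(CDATA_SUFFIX)])
    return Comment(data)


node = node_for_comment_data("[CDATA[x < y]]", in_foreign_content=True)
print(type(node).__name__, repr(str(node)))  # CData 'x < y'
```

Keeping the decision in one small function means the existing Comment path is untouched for ordinary HTML, and only the foreign-content case changes.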