Tag.insert_before() and Tag.insert_after()
Also, I think you can avoid the variable altogether by having repr
return the version without substituting the HTML entities. This seems
fine because a truly canonical representation of the object itself
would not be all that useful compared to the unicode
representation. Of course, this breaks anything

---------------------

Bugs
----
* I think whitespace may not be processed correctly.
* Characters like & < > should always be converted to HTML entities on
output, even if substitute_html_entities is False.
* html5lib doesn't support SoupStrainers, which is OK, but there
should be a warning about it.
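The output rule in the second item can be sketched as below. This is
only an illustration, not Beautiful Soup's code: the function name
format_text is hypothetical, and its flag merely mirrors
substitute_html_entities. The three characters are always escaped, and
the flag only controls whether other characters also become named
entities.

```python
from html.entities import codepoint2name


def format_text(text, substitute_html_entities=False):
    """Hypothetical output formatter; not Beautiful Soup's actual code."""
    # Escape the ampersand first so the entities added below are not
    # themselves escaped a second time.
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    if substitute_html_entities:
        # Only when asked: turn non-ASCII characters that have a named
        # HTML entity into that entity.
        text = "".join(
            "&%s;" % codepoint2name[ord(ch)]
            if ord(ch) > 127 and ord(ch) in codepoint2name
            else ch
            for ch in text
        )
    return text


print(format_text("AT&T <b>"))
print(format_text("caf\u00e9 & bar", substitute_html_entities=True))
```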

Big features
------------
* Add namespace support.
* soup.new_tag("<br>") should create an empty-element tag if the soup
was created with an HTML-aware builder, but not otherwise. This
requires keeping around information about the builder.
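One way the second item might work, as a rough sketch: the soup keeps
a reference to its builder, and new tags ask the builder which names
are empty-element tags. All of the classes below are hypothetical
stand-ins, the list of HTML tag names is deliberately incomplete, and
the sketch passes the bare tag name rather than markup.

```python
class Builder:
    """Hypothetical generic builder: knows of no empty-element tags."""
    empty_element_tags = frozenset()


class HTMLBuilder(Builder):
    """Hypothetical HTML-aware builder (partial list of void elements)."""
    empty_element_tags = frozenset(["br", "hr", "img", "input", "link", "meta"])


class Tag:
    def __init__(self, name, builder):
        self.name = name
        # The builder decides whether this tag may have contents.
        self.is_empty_element = name in builder.empty_element_tags

    def __str__(self):
        if self.is_empty_element:
            return "<%s/>" % self.name
        return "<%s></%s>" % (self.name, self.name)


class Soup:
    def __init__(self, builder):
        # Keeping the builder around is the extra state the item mentions.
        self.builder = builder

    def new_tag(self, name):
        return Tag(name, self.builder)


print(Soup(HTMLBuilder()).new_tag("br"))
print(Soup(Builder()).new_tag("br"))
```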

Optimizations
-------------
markup_attr_map can be optimized since it's always a map now.

BS3 features not yet ported
---------------------------
* In BS3, "soup.aTag" is the same as 'soup.find("a")'. This lets you
locate a tag called (let's say) "find" with attribute
access. "soup.find" won't do what you want, but "soup.findTag" will.
This still works in BS4 but it's deprecated. I could make
"soup.find_tag" work the same way as "soup.find('find')", but I don't
think it's worth it.
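The attribute-access behavior described here can be illustrated with a
toy class. This is a sketch under simplifying assumptions, not the BS3
or BS4 implementation: the Soup class is hypothetical, a "document" is
just a list of tag names, and find() returns the name itself.

```python
class Soup:
    """Hypothetical toy document holding a flat list of tag names."""

    def __init__(self, tag_names):
        self._tag_names = list(tag_names)

    def find(self, name):
        # Return the first matching "tag" (here, just its name) or None.
        return name if name in self._tag_names else None

    def __getattr__(self, attr):
        # Python calls this only when normal lookup fails, which is why
        # "soup.find" yields the method and never a tag called "find".
        if attr.startswith("_"):
            raise AttributeError(attr)
        if attr.endswith("Tag") and len(attr) > 3:
            # "soup.findTag" -> soup.find("find")
            return self.find(attr[:-3])
        # "soup.a" -> soup.find("a")
        return self.find(attr)


soup = Soup(["a", "find"])
print(soup.aTag)           # same result as soup.find("a")
print(soup.findTag)        # reaches the tag called "find"
print(callable(soup.find)) # the method, not the tag
```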

CDATA
-----
The elementtree XMLParser has a strip_cdata argument that, when set to
False, should allow Beautiful Soup to preserve CDATA sections instead
of treating them as text. Except it doesn't. (This argument is also
present for HTMLParser, and also does nothing there.)
Currently, html5lib converts CDATA sections into comments. An
as-yet-unreleased version of html5lib changes the parser's handling of
CDATA sections to allow CDATA sections in tags like <svg> and
<math>. The HTML5TreeBuilder will need to be updated to create CData
objects instead of Comment objects in this situation.
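The kind of change needed might look roughly like the sketch below.
Everything in it is an assumption made for illustration: the Comment
and CData classes are simplified stand-ins, the function name
make_cdata_node is hypothetical, and how html5lib will actually signal
a CDATA section to the tree builder is not known here.

```python
class Comment(str):
    """Stand-in string type that is output as an HTML comment."""

    def output(self):
        return "<!--%s-->" % self


class CData(str):
    """Stand-in string type that is output as a CDATA section."""

    def output(self):
        return "<![CDATA[%s]]>" % self


# Assumption: only these parents may hold real CDATA sections.
CDATA_PARENTS = frozenset(["svg", "math"])


def make_cdata_node(text, parent_name):
    """Hypothetical hook: pick the node type for a CDATA section."""
    if parent_name in CDATA_PARENTS:
        return CData(text)
    # Elsewhere, keep the current behavior of producing a comment.
    return Comment(text)


print(make_cdata_node("x < y", "svg").output())
print(make_cdata_node("x < y", "p").output())
```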