Diffstat (limited to 'TODO')
-rw-r--r-- | TODO | 41 |
1 files changed, 29 insertions, 12 deletions
@@ -1,26 +1,43 @@
-soup.new_tar("<br>") should create an empty-element tag if the soup
+Bugs
+----
+
+* I think whitespace may not be processed correctly.
+
+* Characters like & < > should always be converted to HTML entities on
+  output, even if substitute_html_entities is False.
+
+Big features
+------------
+
+* Add namespace support.
+
+* soup.new_tag("<br>") should create an empty-element tag if the soup
 was created with an HTML-aware builder, but not otherwise. This
 requires keeping around information about the builder.
 
-Is whitespace being processed correctly?
+Optimizations
+-------------
 
-if len(tag) > 3 and tag.endswith('Tag'): -> endswith('_tag')
 markup_attr_map can be optimized since it's always a map now.
 
-Can we get rid of isList?
-Split self.assertRaises(ValueError, tree.index, 1) into a separate test
-Bare ampersands should be converted to HTML entities upon output.
+BS3 features not yet ported
+---------------------------
+
+* In BS3, "soup.aTag" is the same as 'soup.find("a")'. This lets you
+locate a tag called (let's say) "find" with attribute
+access. "soup.find" won't do what you want, but "soup.findTag" will.
 
-Add namespace support.
+This still works In BS4 but it's deprecated. I could make
+"soup.find_tag" work the same way as "soup.find('find')", but I don't
+think it's worth it.
 
-XML handling:
+CDATA
+-----
 
 The elementtree XMLParser has a strip_cdata argument that, when set to
 False, should allow Beautiful Soup to preserve CDATA sections instead
-of treating them as text. (This argument is also present for
-HTMLParser, but does nothing.)
-
-Later:
+of treating them as text. Except it doesn't. (This argument is also
+present for HTMLParser, and also does nothing there.)
 
 Currently, htm5lib converts CDATA sections into comments. An
 as-yet-unreleased version of html5lib changes the parser's handling of