author | Leonard Richardson <leonardr@segfault.org> | 2019-07-16 20:31:39 -0400 |
---|---|---|
committer | Leonard Richardson <leonardr@segfault.org> | 2019-07-16 20:31:39 -0400 |
commit | fc71bc1c04e0495c34a5b78ec21895e32848b344 | |
tree | 3a80bb499deecda8be343d8ad3935c30d2158e26 | |
parent | 0bd336741b26269108e8b345b92d8904c6092980 | |
Added documentation for Tag.smooth().
-rw-r--r-- | CHANGELOG | 30 |
-rw-r--r-- | doc/source/index.rst | 34 |
2 files changed, 55 insertions, 9 deletions
@@ -1,16 +1,28 @@
-= Unreleased
+= 4.8.0 (Unreleased)
 
 * It's now possible to customize the TreeBuilder object by passing
-  keyword arguments into the BeautifulSoup constructor. The main
-  reason to do this right now is to change how multi-valued
-  attributes are treated -- you can do this with the
-  `multi_valued_attributes` argument. [bug=1832978]
+  keyword arguments into the BeautifulSoup constructor. The main
+  reason to do this right now is to change how multi-valued
+  attributes are treated -- you can do this with the
+  `multi_valued_attributes` argument. [bug=1832978]
 
-* A Formatter can now decide how (or whether) to order the attributes
-  inside a tag. [bug=1812422]
+* The role of Formatter objects has been greatly expanded. It now contains
+  consolidated code for controlling the following:
 
-* &apos; (which is valid in XML and XHTML, but not HTML 4) is now
-  recognized as a named entity and converted to a single quote. [bug=1818721]
+  - The function to call to perform entity substitution. (This was
+    previously Formatter's only job.)
+  - Which tags should be treated as containing CDATA and have their
+    contents exempt from entity substitution.
+  - The order in which a tag's attributes are output. [bug=1812422]
+  - Whether or not to put a '/' inside a void element, e.g. '<br/>' vs '<br>'
+
+  All preexisting code should work as before.
+
+* Added a new method to the API, Tag.smooth(), which consolidates
+  multiple adjacent NavigableString elements.
+
+* &apos; (which is valid in XML, XHTML, and HTML 5, but not HTML 4) is now
+  recognized as a named entity and converted to a single quote. [bug=1818721]
 
 = 4.7.1 (20190106)
 
diff --git a/doc/source/index.rst b/doc/source/index.rst
index 4bca0ae..9ef8ef4 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -2112,6 +2112,40 @@ whatever's inside that tag. It's good for stripping out markup::
 Like ``replace_with()``, ``unwrap()`` returns the tag that was replaced.
+``smooth()``
+---------------------------
+
+After calling a bunch of methods that modify the parse tree, you may end up with two or more ``NavigableString`` objects next to each other. Beautiful Soup doesn't have any problems with this, but since it can't happen in a freshly parsed document, you might not expect behavior like the following::
+
+ soup = BeautifulSoup("<p>A one</p>")
+ soup.p.append(", a two")
+
+ soup.p.contents
+ # [u'A one', u', a two']
+
+ print(soup.p.encode())
+ # <p>A one, a two</p>
+
+ print(soup.p.prettify())
+ # <p>
+ # A one
+ # , a two
+ # </p>
+
+You can call ``Tag.smooth()`` to clean up the parse tree by consolidating adjacent strings::
+
+ soup.smooth()
+
+ soup.p.contents
+ # [u'A one, a two']
+
+ print(soup.p.prettify())
+ # <p>
+ # A one, a two
+ # </p>
+
+The ``smooth()`` method is new in Beautiful Soup 4.8.0.
+
 Output
 ======
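
The consolidation that the new ``smooth()`` section describes can be illustrated without Beautiful Soup at all. The sketch below is not the library's implementation. It assumes a simplified model in which a tag's children are a plain list, ``str`` items stand in for ``NavigableString`` objects, and any other item stands in for a child tag. The name ``smooth_contents`` is hypothetical and was chosen for this illustration only.

```python
# Hypothetical illustration only: this is not Beautiful Soup's implementation.
# A tag's children are modeled as a plain list in which str items stand in
# for NavigableString objects and any other item stands in for a child Tag.

def smooth_contents(contents):
    """Return a new list with each run of adjacent strings merged into one."""
    smoothed = []
    for child in contents:
        if isinstance(child, str) and smoothed and isinstance(smoothed[-1], str):
            # The previous item is also a string, so extend it in place
            # instead of appending a second adjacent string.
            smoothed[-1] += child
        else:
            smoothed.append(child)
    return smoothed


# The contents of the <p> tag from the documentation example,
# as they look after append(", a two").
print(smooth_contents(["A one", ", a two"]))
# ['A one, a two']
```

Strings separated by a non-string child stay separate in this model, which matches the documented wording that only adjacent strings are consolidated.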