-rw-r--r--  CHANGELOG             30
-rw-r--r--  doc/source/index.rst  34
2 files changed, 55 insertions, 9 deletions
@@ -1,16 +1,28 @@
-= Unreleased
+= 4.8.0 (Unreleased)

 * It's now possible to customize the TreeBuilder object by passing
-  keyword arguments into the BeautifulSoup constructor. The main
-  reason to do this right now is to change how multi-valued
-  attributes are treated -- you can do this with the
-  `multi_valued_attributes` argument. [bug=1832978]
+  keyword arguments into the BeautifulSoup constructor. The main
+  reason to do this right now is to change how multi-valued
+  attributes are treated -- you can do this with the
+  `multi_valued_attributes` argument. [bug=1832978]

-* A Formatter can now decide how (or whether) to order the attributes
-  inside a tag. [bug=1812422]
+* The role of Formatter objects has been greatly expanded. It now contains
+  consolidated code for controlling the following:

-* &apos; (which is valid in XML and XHTML, but not HTML 4) is now
-  recognized as a named entity and converted to a single quote. [bug=1818721]
+  - The function to call to perform entity substitution. (This was
+    previously Formatter's only job.)
+  - Which tags should be treated as containing CDATA and have their
+    contents exempt from entity substitution.
+  - The order in which a tag's attributes are output. [bug=1812422]
+  - Whether or not to put a '/' inside a void element, e.g. '<br/>' vs '<br>'
+
+  All preexisting code should work as before.
+
+* Added a new method to the API, Tag.smooth(), which consolidates
+  multiple adjacent NavigableString elements.
+
+* &apos; (which is valid in XML, XHTML, and HTML 5, but not HTML 4) is now
+  recognized as a named entity and converted to a single quote. [bug=1818721]

 = 4.7.1 (20190106)

diff --git a/doc/source/index.rst b/doc/source/index.rst
index 4bca0ae..9ef8ef4 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -2112,6 +2112,40 @@ whatever's inside that tag. It's good for stripping out markup::
 Like ``replace_with()``, ``unwrap()`` returns the tag that was replaced.
+``smooth()``
+---------------------------
+
+After calling a bunch of methods that modify the parse tree, you may end up with two or more ``NavigableString`` objects next to each other. Beautiful Soup doesn't have any problems with this, but since it can't happen in a freshly parsed document, you might not expect behavior like the following::
+
+ soup = BeautifulSoup("<p>A one</p>")
+ soup.p.append(", a two")
+
+ soup.p.contents
+ # [u'A one', u', a two']
+
+ print(soup.p.encode())
+ # <p>A one, a two</p>
+
+ print(soup.p.prettify())
+ # <p>
+ #  A one
+ #  , a two
+ # </p>
+
+You can call ``Tag.smooth()`` to clean up the parse tree by consolidating adjacent strings::
+
+ soup.smooth()
+
+ soup.p.contents
+ # [u'A one, a two']
+
+ print(soup.p.prettify())
+ # <p>
+ #  A one, a two
+ # </p>
+
+The ``smooth()`` method is new in Beautiful Soup 4.8.0.
+
 Output
 ======
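
The consolidation that the new ``smooth()`` documentation describes can be pictured with a small plain-Python model. This is only a sketch of the idea, not Beautiful Soup's implementation: the ``Node`` class and the ``smooth`` function below are hypothetical stand-ins, with plain ``str`` values playing the role of ``NavigableString`` objects.

```python
from typing import List, Union


class Node:
    """Hypothetical stand-in for a tag: a name plus a list of children."""

    def __init__(self, name: str, children: "List[Union[str, Node]]") -> None:
        self.name = name
        self.children = children


def smooth(node: Node) -> None:
    """Merge runs of adjacent strings in place, recursing into child nodes."""
    merged: List[Union[str, Node]] = []
    for child in node.children:
        if isinstance(child, str) and merged and isinstance(merged[-1], str):
            # Two strings side by side: fold the new one into the previous one.
            merged[-1] = merged[-1] + child
        else:
            if isinstance(child, Node):
                smooth(child)
            merged.append(child)
    node.children = merged


p = Node("p", ["A one"])
p.children.append(", a two")  # mimics soup.p.append(", a two")
root = Node("[document]", [p])

smooth(root)  # called on the root, like soup.smooth() in the diff
print(p.children)  # ['A one, a two']
```

Calling the function on the root mirrors the documented example, where ``soup.smooth()`` is invoked on the whole tree and the strings inside ``<p>`` end up merged.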