path: root/bs4/element.py
Age | Commit message | Author
2021-02-13 | The behavior of methods like .get_text() and .strings now differs depending on the type of tag. The change is visible with HTML tags like <script>, <style>, and <template>. Starting in 4.9.0, methods like get_text() returned no results on such tags, because the contents of those tags are not considered 'text' within the document as a whole. But a user who calls script.get_text() is working from a different definition of 'text' than a user who calls div.get_text()--otherwise there would be no need to call script.get_text() at all. In 4.10.0, the contents of (e.g.) a <script> tag are considered 'text' during a get_text() call on the tag itself, but not considered 'text' during a get_text() call on the tag's parent. Because of this change, calling get_text() on each child of a tag may now return a different result than calling get_text() on the tag itself. That's because different tags now have different understandings of what counts as 'text'. [bug=1906226] [bug=1868861] | Leonard Richardson
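The per-tag definition of 'text' described in the entry above can be modeled in a few lines. This is a minimal stdlib-only sketch, not Beautiful Soup's code: the `Text`, `Script` and `Tag` classes and the `interesting` attribute are hypothetical names invented for illustration.

```python
class Text(str):
    """Ordinary document text (hypothetical stand-in for a string node)."""


class Script(str):
    """Text found inside a <script> tag (hypothetical stand-in)."""


class Tag:
    def __init__(self, name, children, interesting=(Text,)):
        self.name = name
        self.children = children
        # String classes this tag counts as 'text' when it is asked directly.
        self.interesting = interesting

    def strings(self, types=None):
        # The tag being asked decides what counts as text; descendants
        # are filtered by that same definition, not by their own.
        types = self.interesting if types is None else types
        for child in self.children:
            if isinstance(child, Tag):
                yield from child.strings(types)
            elif type(child) in types:
                yield child

    def get_text(self):
        return "".join(self.strings())


script = Tag("script", [Script("alert(1);")], interesting=(Script,))
div = Tag("div", [Text("Hello "), script, Text("world")])

# Asking each child separately uses each child's own definition of text.
per_child = "".join(
    child.get_text() if isinstance(child, Tag) else child
    for child in div.children
)

print(div.get_text())     # Hello world
print(script.get_text())  # alert(1);
print(per_child)          # Hello alert(1);world
```

The last line shows the consequence the entry warns about: joining the per-child results is not the same as calling get_text() on the parent.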
2021-02-13 | Corrected output when the namespace prefix associated with a namespaced attribute is the empty string, as opposed to None. [bug=1915583] | Leonard Richardson
2020-10-02 | Implemented a significant performance optimization to the process of searching the parse tree. Patch by Morotti. [bug=1898212] | Leonard Richardson
2020-09-26 | Fixed a bug that inconsistently moved elements over when passing a Tag, rather than a list, into Tag.extend(). [bug=1885710] | Leonard Richardson
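The log does not spell out the cause of this bug. One plausible mechanism, shown here as an assumption with plain lists rather than Beautiful Soup objects, is iterating over a live child list while moving its elements elsewhere. The function names are hypothetical.

```python
def move_all_buggy(source, dest):
    """Move every item from source to dest while iterating the live list."""
    for item in source:
        source.remove(item)  # shrinking the list makes the loop skip items
        dest.append(item)


def move_all_fixed(source, dest):
    """Same move, but iterate over a snapshot of the source list."""
    for item in list(source):
        source.remove(item)
        dest.append(item)


src, dst = list("abcd"), []
move_all_buggy(src, dst)
print(dst, src)  # ['a', 'c'] ['b', 'd']

src, dst = list("abcd"), []
move_all_fixed(src, dst)
print(dst, src)  # ['a', 'b', 'c', 'd'] []
```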
2020-05-17 | Switch entirely to Python 3-style print statements, even in Python 2. | Leonard Richardson
2020-05-17 | Added a keyword argument on_duplicate_attribute to the BeautifulSoupHTMLParser constructor (used by the html.parser tree builder) which lets you customize the handling of markup that contains the same attribute more than once, as in: <a href="url1" href="url2"> [bug=1878209] | Leonard Richardson
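To illustrate what customizing duplicate-attribute handling can look like, here is a rough sketch built directly on the standard library's `html.parser.HTMLParser`, which reports repeated attributes as repeated `(name, value)` pairs. The class `DuplicateAwareParser`, the policy names `"replace"` and `"ignore"`, and the callback signature are assumptions made for this example, not a statement of Beautiful Soup's API.

```python
from html.parser import HTMLParser


class DuplicateAwareParser(HTMLParser):
    """Collects start tags, resolving repeated attributes by a policy."""

    def __init__(self, on_duplicate="replace"):
        super().__init__()
        self.on_duplicate = on_duplicate
        self.tags = []

    def handle_starttag(self, tag, attrs):
        merged = {}
        for key, value in attrs:
            if key not in merged:
                merged[key] = value
            elif self.on_duplicate == "replace":
                merged[key] = value          # last value wins
            elif self.on_duplicate == "ignore":
                pass                         # first value wins
            elif callable(self.on_duplicate):
                self.on_duplicate(merged, key, value)
        self.tags.append((tag, merged))


def parse(markup, on_duplicate="replace"):
    parser = DuplicateAwareParser(on_duplicate)
    parser.feed(markup)
    parser.close()
    return parser.tags


def collect(attrs, key, value):
    """Custom policy: keep every value of a repeated attribute in a list."""
    if not isinstance(attrs[key], list):
        attrs[key] = [attrs[key]]
    attrs[key].append(value)


markup = '<a href="url1" href="url2">'
print(parse(markup))             # [('a', {'href': 'url2'})]
print(parse(markup, "ignore"))   # [('a', {'href': 'url1'})]
print(parse(markup, collect))    # [('a', {'href': ['url1', 'url2']})]
```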
2020-04-24 | If you encode a document with a Python-specific encoding like 'unicode_escape', that encoding is no longer mentioned in the final XML or HTML document. Instead, encoding information is omitted or left blank. [bug=1874955] | Leonard Richardson
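The idea of leaving a Python-only codec out of the declared encoding can be sketched as follows. The set `PYTHON_ONLY_CODECS` and both helper functions are hypothetical and deliberately incomplete; they only show the shape of the decision, under the assumption that the declaration is built from the requested encoding name.

```python
# Hypothetical, incomplete list of codecs that only make sense inside Python.
PYTHON_ONLY_CODECS = {"unicode_escape", "raw_unicode_escape", "idna", "punycode"}


def xml_declaration(encoding):
    """Build an XML declaration, omitting encodings no XML parser would know."""
    if encoding is None:
        return '<?xml version="1.0"?>'
    normalized = encoding.lower().replace("-", "_")
    if normalized in PYTHON_ONLY_CODECS:
        return '<?xml version="1.0"?>'
    return '<?xml version="1.0" encoding="%s"?>' % encoding


def encode_document(body, encoding):
    """Prefix the declaration and encode the whole document."""
    return (xml_declaration(encoding) + "\n" + body).encode(encoding)


print(xml_declaration("utf-8"))
print(xml_declaration("unicode_escape"))
```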
2020-04-05 | Embedded CSS and Javascript is now stored in distinct Stylesheet and Script tags, which are ignored by methods like get_text(). This feature is not supported by the html5lib treebuilder. [bug=1868861] | Leonard Richardson
2020-04-04 | Use an :rtype: reported to work in pycharm. | Leonard Richardson
2020-04-04 | select() always returns a Tag, so be more specific about its return type. | Leonard Richardson
2020-03-09 | Make find() methods return a union type of the two most common PageElements, rather than PageElement itself. | Leonard Richardson
2020-03-05 | Added a performance optimization to PageElement.extract(). Patch by Arthur Darcet. | Leonard Richardson
2020-01-01 | API CHANGE - Added PageElement.decomposed, a new property which lets you check whether you've already called decompose() on a Tag or NavigableString. | Leonard Richardson
2019-12-24 | Added :rtype: to the find method docstrings. | Leonard Richardson
2019-12-24 | Added docstrings to diagnose.py. | Leonard Richardson
2019-12-18 | Added Python docstrings to all public methods in element.py. | Leonard Richardson
2019-11-10 | Fix deprecation warning with Python >= 3.7. Python >= 3.7 issues a deprecation warning when using collections.Callable rather than collections.abc.Callable. Most of Beautiful Soup deals with this by using a conditional import, but the automatic Python 3 conversion apparently translates `callable(obj)` to `isinstance(obj, collections.Callable)` which trips this deprecation warning. `isinstance(obj, Callable)` works fine in Python 2 as well as 3, so just use it directly. | Colin Watson
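The conditional import mentioned in this entry typically looks like the sketch below. The helper `describe` is a hypothetical name added only to exercise the check.

```python
try:
    # Python 3.3+: the abstract base classes live in collections.abc.
    from collections.abc import Callable
except ImportError:
    # Python 2 fallback.
    from collections import Callable


def describe(obj):
    """Report whether obj can be called, without touching collections.Callable."""
    return "callable" if isinstance(obj, Callable) else "not callable"


print(describe(len))  # callable
print(describe(42))   # not callable
```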
2019-10-05 | Avoid a crash when unpickling certain parse trees generated using html5lib on Python 3. [bug=1843545] | Leonard Richardson
2019-08-26 | Fixed the definition of the default XML namespace when using lxml 4.4. Patch by Isaac Muse. [bug=1840141] | Leonard Richardson
2019-08-21 | Copying a Tag preserves information that was originally obtained from the TreeBuilder used to build the original Tag. [bug=1838903] | Leonard Richardson
2019-08-21 | Explicitly set preserve_whitespace_tags to None if there is no TreeBuilder. | Leonard Richardson
2019-08-21 | Fixed a crash when pretty-printing tags that were not created during initial parsing. [bug=1838903] | Leonard Richardson
2019-07-22 | Added a section about project support to the README. | Leonard Richardson
2019-07-21 | Implemented line number tracking for html5lib. | Leonard Richardson
2019-07-21 | Adapt Chris Mayo's code to track line number and position when using html.parser. | Leonard Richardson
2019-07-15 | Implemented Tag.smooth. | Leonard Richardson
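Tag.smooth() consolidates adjacent strings in the tree. The core step can be sketched on a flat list of children, where plain `str` values stand in for string nodes and anything else stands in for a tag. This is a simplified illustration with a hypothetical free function, not the method's actual implementation, which works on the tree in place.

```python
def smooth(children):
    """Merge runs of adjacent strings; leave non-string children untouched."""
    merged = []
    for child in children:
        if isinstance(child, str) and merged and isinstance(merged[-1], str):
            merged[-1] += child   # extend the previous string
        else:
            merged.append(child)
    return merged


# ("br",) is a stand-in for a tag sitting between two runs of strings.
print(smooth(["a", "b", ("br",), "c", "d"]))  # ['ab', ('br',), 'cd']
```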
2019-07-15 | Moved the formatter to its own class and updated its documentation. | Leonard Richardson
2019-07-14 | Give the Formatter class more control over formatting decisions. | Leonard Richardson
2019-07-07 | A Formatter can now decide how (or whether) to order the attributes inside a tag. [bug=1812422] | Leonard Richardson
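A formatter that controls attribute order can be pictured as an object with one overridable hook. The sketch below is a stand-alone model: the classes, the `attributes` hook taking a plain dict, and `render_start_tag` are simplified assumptions and do not reproduce Beautiful Soup's Formatter signatures.

```python
class Formatter:
    """Default policy: emit attributes in sorted order."""

    def attributes(self, attrs):
        return sorted(attrs.items())


class DocumentOrderFormatter(Formatter):
    """Alternative policy: keep attributes in the order they were stored."""

    def attributes(self, attrs):
        return list(attrs.items())


def render_start_tag(name, attrs, formatter):
    """Render a start tag, asking the formatter how to order attributes."""
    parts = ['%s="%s"' % (key, value) for key, value in formatter.attributes(attrs)]
    return "<%s>" % " ".join([name] + parts)


attrs = {"href": "x", "class": "y"}
print(render_start_tag("a", attrs, Formatter()))               # <a class="y" href="x">
print(render_start_tag("a", attrs, DocumentOrderFormatter()))  # <a href="x" class="y">
```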
2019-07-07 | It's now possible to override a TreeBuilder's cdata_list_attributes dictionary by passing in a replacement. None will disable the feature altogether. [bug=1832978] | Leonard Richardson
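The effect of replacing or disabling such a dictionary can be sketched with a small lookup function. The default mapping below is a shortened, hypothetical example (with `"*"` meaning any tag), and `attribute_value` is not a Beautiful Soup function.

```python
# Shortened, illustrative mapping: tag name -> attributes treated as lists.
DEFAULT_LIST_ATTRIBUTES = {"*": {"class"}, "td": {"headers"}}


def attribute_value(tag, name, value, list_attributes=DEFAULT_LIST_ATTRIBUTES):
    """Return value split into a list if the mapping marks it multi-valued."""
    if list_attributes is None:
        return value  # feature disabled: every attribute stays a string
    multi = list_attributes.get("*", set()) | list_attributes.get(tag, set())
    return value.split() if name in multi else value


print(attribute_value("p", "class", "a b"))                 # ['a', 'b']
print(attribute_value("p", "class", "a b", None))           # a b
print(attribute_value("p", "id", "a b", {"*": {"id"}}))     # ['a', 'b']
```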
2019-01-06 | Tried even harder to avoid the deprecation warning originally fixed in 4.6.1. [bug=1778909] | Leonard Richardson
2019-01-06 | Fixed an incorrectly raised exception when inserting a tag before or after an identical tag. [bug=1810692] | Leonard Richardson
2018-12-31 | Improved and tested error checking for insert_before and insert_after. | Leonard Richardson
2018-12-30 | Add conveniences for inserting multiple tags. Add extend method to append a list of tags. Make insert_before and insert_after accept multiple arguments. | Isaac Muse
2018-12-30 | Fixed a problem with multi-valued attributes where the value contained whitespace. Thanks to Jens Svalgaard for the fix. [bug=1787453] | Leonard Richardson
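The log does not describe the exact failure. A common pitfall with whitespace-separated attribute values, shown here as an assumption about the kind of problem involved, is that splitting on a whitespace pattern yields empty strings when the value has leading or trailing whitespace, while collecting the non-whitespace runs does not. Both function names are hypothetical.

```python
import re


def split_naive(value):
    """Split on runs of whitespace; edge whitespace produces empty strings."""
    return re.split(r"\s+", value)


def split_tokens(value):
    """Collect runs of non-whitespace; edge whitespace is simply ignored."""
    return re.findall(r"\S+", value)


value = " foo\tbar  baz "
print(split_naive(value))   # ['', 'foo', 'bar', 'baz', '']
print(split_tokens(value))  # ['foo', 'bar', 'baz']
```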
2018-12-24 | Clarified the software license. | Leonard Richardson
2018-12-24 | Issue a warning and raise a more useful exception if someone tries to call Tag.select() without SoupSieve installed. | Leonard Richardson
2018-12-24 | Keep track of the namespace abbreviations found while parsing the document. This makes select() work most of the time without requiring a value for 'namespaces'. | Leonard Richardson
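Recording prefix-to-URI mappings during parsing, so that later queries need no explicit namespace dictionary, can be illustrated with the standard library's ElementTree. This sketch shows the general technique only; `collect_namespaces` is a hypothetical helper, and the choice to keep the first URI seen for a prefix is an assumption.

```python
import io
import xml.etree.ElementTree as ET


def collect_namespaces(xml_text):
    """Return {prefix: uri} for every namespace declared in the document."""
    namespaces = {}
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _event, (prefix, uri) in ET.iterparse(source, events=["start-ns"]):
        # Keep the first URI seen for each prefix.
        namespaces.setdefault(prefix, uri)
    return namespaces


doc = (
    '<root xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>T</dc:title></root>"
)
ns = collect_namespaces(doc)
title = ET.fromstring(doc).find("dc:title", ns)
print(ns)          # {'dc': 'http://purl.org/dc/elements/1.1/'}
print(title.text)  # T
```

A prefix that is redeclared with a different URI deeper in the document is one case where remembered mappings can mislead a query.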
2018-12-23 | Merging Isaac Muse's Soup Sieve branch as-is before making some modifications. | Leonard Richardson
2018-12-22 | Fix next and previous linkage issues. Fixes issues #1806598 and #1782928. | Isaac Muse
2018-12-20 | Pass flags to soupsieve. | Isaac Muse
2018-12-19 | Add Soup Sieve support | Isaac Muse
2018-07-30 | Fix an exception when a custom formatter was asked to format a void element. [bug=1784408] | Leonard Richardson
2018-07-28 | When markup contains duplicate elements, a select() call that includes multiple match clauses will match all relevant elements. [bug=1770596] | Leonard Richardson
2018-07-21 | Clarified the deprecation warning when accessing tag.fooTag, to cover the possibility that you might really have been looking for a tag called 'fooTag'. | Leonard Richardson
2018-07-18 | Fixed a bug where find_all() was not working when asked to find a tag with a namespaced name in an XML document that was parsed as HTML. [bug=1723783] | Leonard Richardson
2018-07-15 | Introduced the Formatter system. [bug=1716272] | Leonard Richardson
2018-07-15 | It's possible for a TreeBuilder subclass to specify that void elements should be represented as <element> rather than <element/>, by setting TreeBuilder.void_element_close_prefix to the empty string. [bug=1716272] | Leonard Richardson
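The role of a close prefix in rendering void elements is easy to show in isolation. `render_void` below is a hypothetical helper written for this illustration; only the idea of a configurable prefix that defaults to "/" comes from the entry above.

```python
def render_void(name, close_prefix="/"):
    """Render a void element; an empty prefix gives the HTML-style form."""
    return "<%s%s>" % (name, close_prefix)


print(render_void("br"))      # <br/>
print(render_void("br", ""))  # <br>
```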
2018-07-14 | Fixed a disconnected parse tree when one BeautifulSoup object was inserted into another. [bug=1105148] | Leonard Richardson
2018-07-14 | Fixed code that was causing deprecation warnings in recent Python 3 versions. Includes a patch from Ville Skyttä. [bug=1778909] [bug=1689496] | Leonard Richardson