summaryrefslogtreecommitdiff
path: root/bs4/element.py
AgeCommit message (Collapse)Author
2022-04-07Omit untrusted input when issuing warnings.Leonard Richardson
2021-12-21It's now possible to customize the way output is indented byLeonard Richardson
providing a value for the 'indent' argument to the Formatter constructor. The 'indent' argument works very similarly to the argument of the same name in the Python standard library's json.dump() method. [bug=1955497]
2021-11-29Do a better job of keeping track of namespaces as an XML document isLeonard Richardson
parsed, so that CSS selectors that use namespaces will do the right thing more often. [bug=1946243]
2021-10-24Used a warning to formally deprecate the 'text' argument in favor of 'string'.Leonard Richardson
2021-10-23Renamed the 'text' field to 'string' for real. Tests are not changed in this ↵Leonard Richardson
commit to demonstrate that the renaming doesn't break anything. [bug=1947038]
2021-10-11Added special string classes, RubyParenthesisString and RubyTextString,Leonard Richardson
to make it possible to treat ruby text specially in get_text() calls. [bug=1941980]
2021-10-11Broke up some monolithic unit test files.Leonard Richardson
2021-09-07Goodbye, Python 2. [bug=1942919]Leonard Richardson
2021-06-01The 'replace_with()' method now takes a variable number of arguments,Leonard Richardson
and can be used to replace a single element with a sequence of elements. Patch by Bill Chandos.
2021-02-14NavigableString and its subclasses now implement the get_text()Leonard Richardson
method, as well as the properties .strings and .stripped_strings. These methods will either return the string itself, or nothing, so the only reason to use this is when iterating over a list of mixed Tag and NavigableString objects. [bug=1904309]
2021-02-13The behavior of methods like .get_text() and .strings now differsLeonard Richardson
depending on the type of tag. The change is visible with HTML tags like <script>, <style>, and <template>. Starting in 4.9.0, methods like get_text() returned no results on such tags, because the contents of those tags are not considered 'text' within the document as a whole. But a user who calls script.get_text() is working from a different definition of 'text' than a user who calls div.get_text()--otherwise there would be no need to call script.get_text() at all. In 4.10.0, the contents of (e.g.) a <script> tag are considered 'text' during a get_text() call on the tag itself, but not considered 'text' during a get_text() call on the tag's parent. Because of this change, calling get_text() on each child of a tag may now return a different result than calling get_text() on the tag itself. That's because different tags now have different understandings of what counts as 'text'. [bug=1906226] [bug=1868861]
2021-02-13Corrected output when the namespace prefix associated with aLeonard Richardson
namespaced attribute is the empty string, as opposed to None. [bug=1915583]
2020-10-02Implemented a significant performance optimization to the process ofLeonard Richardson
searching the parse tree. Patch by Morotti. [bug=1898212]
2020-09-26Fixed a bug that inconsistently moved elements over when passingLeonard Richardson
a Tag, rather than a list, into Tag.extend(). [bug=1885710]
2020-05-17Switch entirely to Python 3-style print statements, even in Python 2.Leonard Richardson
2020-05-17Added a keyword argument on_duplicate_attribute to theLeonard Richardson
BeautifulSoupHTMLParser constructor (used by the html.parser tree builder) which lets you customize the handling of markup that contains the same attribute more than once, as in: <a href="url1" href="url2"> [bug=1878209]
2020-04-24If you encode a document with a Python-specific encoding likeLeonard Richardson
'unicode_escape', that encoding is no longer mentioned in the final XML or HTML document. Instead, encoding information is omitted or left blank. [bug=1874955]
2020-04-05Embedded CSS and Javascript is now stored in distinct Stylesheet andLeonard Richardson
Script tags, which are ignored by methods like get_text(). This feature is not supported by the html5lib treebuilder. [bug=1868861]
2020-04-04Use an :rtype: reported to work in pycharm.Leonard Richardson
2020-04-04select() always returns a Tag, so be more specific about its return type.Leonard Richardson
2020-03-09Make find() methods return a union type of the two most common PageElements, ↵Leonard Richardson
rather than PageElement itself.
2020-03-05Added a performance optimization to PageElement.extract(). Patch by Arthur ↵Leonard Richardson
Darcet.
2020-01-01API CHANGE - Added PageElement.decomposed, a new property which lets youLeonard Richardson
check whether you've already called decompose() on a Tag or NavigableString.
2019-12-24Added :rtype: to the find method docstrings.Leonard Richardson
2019-12-24Added docstrings to diagnose.py.Leonard Richardson
2019-12-18Added Python docstrings to all public methods in element.py.Leonard Richardson
2019-11-10Fix deprecation warning with Python >= 3.7.Colin Watson
Python >= 3.7 issues a deprecation warning when using collections.Callable rather than collections.abc.Callable. Most of Beautiful Soup deals with this by using a conditional import, but the automatic Python 3 conversion apparently translates `callable(obj)` to `isinstance(obj, collections.Callable)` which trips this deprecation warning. `isinstance(obj, Callable)` works fine in Python 2 as well as 3, so just use it directly.
2019-10-05Avoid a crash when unpickling certain parse trees generated using html5lib ↵Leonard Richardson
on Python 3. [bug=1843545]
2019-08-26Fixed the definition of the default XML namespace when usingLeonard Richardson
lxml 4.4. Patch by Isaac Muse. [bug=1840141]
2019-08-21Copying a Tag preserves information that was originally obtained fromLeonard Richardson
the TreeBuilder used to build the original Tag. [bug=1838903]
2019-08-21Explicitly set preserve_whitespace_tags to None if there is no TreeBuilder.Leonard Richardson
2019-08-21Fixed a crash when pretty-printing tags that were not createdLeonard Richardson
during initial parsing. [bug=1838903]
2019-07-22Added a section about project support to the README.Leonard Richardson
2019-07-21Implemented line number tracking for html5lib.Leonard Richardson
2019-07-21Adapt Chris Mayo's code to track line number and position when using ↵Leonard Richardson
html.parser.
2019-07-15Implemented Tag.smooth.Leonard Richardson
2019-07-15Moved the formatter to its own class and updated its documentation.Leonard Richardson
2019-07-14Give the Formatter class more control over formatting decisions.Leonard Richardson
2019-07-07A Formatter can now decide how (or whether) to order the attributesLeonard Richardson
inside a tag. [bug=1812422]
2019-07-07It's now possible to override a TreeBuilder's cdata_list_attributes ↵Leonard Richardson
dictionary by passing in a replacement. None will disable the feature altogether. [bug=1832978]
2019-01-06Tried even harder to avoid the deprecation warning originally fixed inLeonard Richardson
4.6.1. [bug=1778909]
2019-01-06Fixed an incorrectly raised exception when inserting a tag before orLeonard Richardson
after an identical tag. [bug=1810692]
2018-12-31Improved and tested error checking for insert_before and insert_after.Leonard Richardson
2018-12-30Add convienances for inserting multiple tagsIsaac Muse
Add extend method to append a list of tags. Make insert_before and insert_after accept multiple arguments
2018-12-30Fixed a problem with multi-valued attributes where the valueLeonard Richardson
contained whitespace. Thanks to Jens Svalgaard for the fix. [bug=1787453]
2018-12-24Clarified the software license.Leonard Richardson
2018-12-24Issue a warning and raise a more useful exception if someone tries to call ↵Leonard Richardson
Tag.select() without SoupSieve installed.
2018-12-24Keep track of the namespace abbreviations found while parsing the document. ↵Leonard Richardson
This makes select() work most of the time without requiring a value for 'namespaces'.
2018-12-23Merging Isaac Muse's Soup Sieve branch as-is before making some modifications.Leonard Richardson
2018-12-22Fix next and previous linkage issues. Fixes issues #1806598 and #1782928.Isaac Muse