Age | Commit message | Author | |
---|---|---|---|
2022-04-07 | Omit untrusted input when issuing warnings. | Leonard Richardson | |
2021-12-21 | It's now possible to customize the way output is indented by providing a value for the 'indent' argument to the Formatter constructor. The 'indent' argument works very similarly to the argument of the same name in the Python standard library's json.dump() method. [bug=1955497] | Leonard Richardson | |
2021-11-29 | Do a better job of keeping track of namespaces as an XML document is parsed, so that CSS selectors that use namespaces will do the right thing more often. [bug=1946243] | Leonard Richardson | |
2021-10-24 | Used a warning to formally deprecate the 'text' argument in favor of 'string'. | Leonard Richardson | |
2021-10-23 | Renamed the 'text' field to 'string' for real. Tests are not changed in this commit to demonstrate that the renaming doesn't break anything. [bug=1947038] | Leonard Richardson | |
2021-10-11 | Added special string classes, RubyParenthesisString and RubyTextString, to make it possible to treat ruby text specially in get_text() calls. [bug=1941980] | Leonard Richardson | |
2021-10-11 | Broke up some monolithic unit test files. | Leonard Richardson | |
2021-09-07 | Goodbye, Python 2. [bug=1942919] | Leonard Richardson | |
2021-06-01 | The 'replace_with()' method now takes a variable number of arguments, and can be used to replace a single element with a sequence of elements. Patch by Bill Chandos. | Leonard Richardson | |
2021-02-14 | NavigableString and its subclasses now implement the get_text() method, as well as the properties .strings and .stripped_strings. These methods either return the string itself or nothing, so the only reason to use them is when iterating over a list of mixed Tag and NavigableString objects. [bug=1904309] | Leonard Richardson | |
2021-02-13 | The behavior of methods like .get_text() and .strings now differs depending on the type of tag. The change is visible with HTML tags like `<script>`, `<style>`, and `<template>`. Starting in 4.9.0, methods like get_text() returned no results on such tags, because the contents of those tags are not considered 'text' within the document as a whole. But a user who calls script.get_text() is working from a different definition of 'text' than a user who calls div.get_text(); otherwise there would be no need to call script.get_text() at all. In 4.10.0, the contents of (e.g.) a `<script>` tag are considered 'text' during a get_text() call on the tag itself, but not considered 'text' during a get_text() call on the tag's parent. Because of this change, calling get_text() on each child of a tag may now return a different result than calling get_text() on the tag itself. That's because different tags now have different understandings of what counts as 'text'. [bug=1906226] [bug=1868861] | Leonard Richardson | |
2021-02-13 | Corrected output when the namespace prefix associated with a namespaced attribute is the empty string, as opposed to None. [bug=1915583] | Leonard Richardson | |
2020-10-02 | Implemented a significant performance optimization to the process of searching the parse tree. Patch by Morotti. [bug=1898212] | Leonard Richardson | |
2020-09-26 | Fixed a bug that inconsistently moved elements over when passing a Tag, rather than a list, into Tag.extend(). [bug=1885710] | Leonard Richardson | |
2020-05-17 | Switch entirely to Python 3-style print statements, even in Python 2. | Leonard Richardson | |
2020-05-17 | Added a keyword argument on_duplicate_attribute to the BeautifulSoupHTMLParser constructor (used by the html.parser tree builder) which lets you customize the handling of markup that contains the same attribute more than once, as in: `<a href="url1" href="url2">` [bug=1878209] | Leonard Richardson | |
2020-04-24 | If you encode a document with a Python-specific encoding like 'unicode_escape', that encoding is no longer mentioned in the final XML or HTML document. Instead, encoding information is omitted or left blank. [bug=1874955] | Leonard Richardson | |
2020-04-05 | Embedded CSS and JavaScript are now stored as distinct Stylesheet and Script strings (NavigableString subclasses), which are ignored by methods like get_text(). This feature is not supported by the html5lib treebuilder. [bug=1868861] | Leonard Richardson | |
2020-04-04 | Use an :rtype: reported to work in PyCharm. | Leonard Richardson | |
2020-04-04 | select() only ever returns Tag objects, so be more specific about its return type. | Leonard Richardson | |
2020-03-09 | Make find() methods return a union type of the two most common PageElements, rather than PageElement itself. | Leonard Richardson | |
2020-03-05 | Added a performance optimization to PageElement.extract(). Patch by Arthur Darcet. | Leonard Richardson | |
2020-01-01 | API CHANGE - Added PageElement.decomposed, a new property which lets you check whether you've already called decompose() on a Tag or NavigableString. | Leonard Richardson | |
2019-12-24 | Added :rtype: to the find method docstrings. | Leonard Richardson | |
2019-12-24 | Added docstrings to diagnose.py. | Leonard Richardson | |
2019-12-18 | Added Python docstrings to all public methods in element.py. | Leonard Richardson | |
2019-11-10 | Fix deprecation warning with Python >= 3.7. Python >= 3.7 issues a deprecation warning when using collections.Callable rather than collections.abc.Callable. Most of Beautiful Soup deals with this by using a conditional import, but the automatic Python 3 conversion apparently translates `callable(obj)` to `isinstance(obj, collections.Callable)` which trips this deprecation warning. `isinstance(obj, Callable)` works fine in Python 2 as well as 3, so just use it directly. | Colin Watson | |
2019-10-05 | Avoid a crash when unpickling certain parse trees generated using html5lib on Python 3. [bug=1843545] | Leonard Richardson | |
2019-08-26 | Fixed the definition of the default XML namespace when using lxml 4.4. Patch by Isaac Muse. [bug=1840141] | Leonard Richardson | |
2019-08-21 | Copying a Tag preserves information that was originally obtained from the TreeBuilder used to build the original Tag. [bug=1838903] | Leonard Richardson | |
2019-08-21 | Explicitly set preserve_whitespace_tags to None if there is no TreeBuilder. | Leonard Richardson | |
2019-08-21 | Fixed a crash when pretty-printing tags that were not created during initial parsing. [bug=1838903] | Leonard Richardson | |
2019-07-22 | Added a section about project support to the README. | Leonard Richardson | |
2019-07-21 | Implemented line number tracking for html5lib. | Leonard Richardson | |
2019-07-21 | Adapt Chris Mayo's code to track line number and position when using html.parser. | Leonard Richardson | |
2019-07-15 | Implemented Tag.smooth. | Leonard Richardson | |
2019-07-15 | Moved the formatter to its own class and updated its documentation. | Leonard Richardson | |
2019-07-14 | Give the Formatter class more control over formatting decisions. | Leonard Richardson | |
2019-07-07 | A Formatter can now decide how (or whether) to order the attributes inside a tag. [bug=1812422] | Leonard Richardson | |
2019-07-07 | It's now possible to override a TreeBuilder's cdata_list_attributes dictionary by passing in a replacement. None will disable the feature altogether. [bug=1832978] | Leonard Richardson | |
2019-01-06 | Tried even harder to avoid the deprecation warning originally fixed in 4.6.1. [bug=1778909] | Leonard Richardson | |
2019-01-06 | Fixed an incorrectly raised exception when inserting a tag before or after an identical tag. [bug=1810692] | Leonard Richardson | |
2018-12-31 | Improved and tested error checking for insert_before and insert_after. | Leonard Richardson | |
2018-12-30 | Add conveniences for inserting multiple tags. Add extend method to append a list of tags. Make insert_before and insert_after accept multiple arguments. | Isaac Muse | |
2018-12-30 | Fixed a problem with multi-valued attributes where the value contained whitespace. Thanks to Jens Svalgaard for the fix. [bug=1787453] | Leonard Richardson | |
2018-12-24 | Clarified the software license. | Leonard Richardson | |
2018-12-24 | Issue a warning and raise a more useful exception if someone tries to call Tag.select() without SoupSieve installed. | Leonard Richardson | |
2018-12-24 | Keep track of the namespace abbreviations found while parsing the document. This makes select() work most of the time without requiring a value for 'namespaces'. | Leonard Richardson | |
2018-12-23 | Merging Isaac Muse's Soup Sieve branch as-is before making some modifications. | Leonard Richardson | |
2018-12-22 | Fix next and previous linkage issues. Fixes issues #1806598 and #1782928. | Isaac Muse | |
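
The 2021-06-01 entry lets replace_with() take several arguments. This sketch assumes beautifulsoup4 4.11 or later and replaces one `<b>` tag with a sequence of a string, a new tag, and another string. Plain Python strings are converted to NavigableStrings when they are inserted.

```python
# Sketch: assumes beautifulsoup4 >= 4.11 and the built-in html.parser builder.
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p><b>old</b></p>", "html.parser")
new_tag = soup.new_tag("i")
new_tag.string = "new"

# One element is swapped for three, inserted in argument order.
soup.b.replace_with("before ", new_tag, " after")
result = str(soup)
print(result)
```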
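
The 2021-02-14 entry adds get_text() to NavigableString. This sketch assumes beautifulsoup4 4.11 or later. It shows the use case that entry describes: iterating over a mix of strings and tags without an isinstance() check.

```python
# Sketch: assumes beautifulsoup4 >= 4.11 and the built-in html.parser builder.
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Hi <b>there</b></p>", "html.parser")

# p.children yields a NavigableString ("Hi ") and a Tag (<b>);
# both respond to get_text(), so one expression handles either.
pieces = [child.get_text() for child in soup.p.children]
print(pieces)
```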
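
The 2021-02-13 and 2020-04-05 entries describe how `<script>` contents count as 'text' only from the script tag's own point of view. This sketch assumes beautifulsoup4 4.11 or later with html.parser, because the 2020-04-05 entry notes that html5lib does not support this feature. It checks the stored string class and compares get_text() on the parent and on the script tag.

```python
# Sketch: assumes beautifulsoup4 >= 4.11 and the built-in html.parser builder.
from bs4 import BeautifulSoup
from bs4.element import Script

markup = "<div><p>Hello</p><script>var x = 1;</script></div>"
soup = BeautifulSoup(markup, "html.parser")

# The script body is stored as a Script string, not a plain NavigableString.
script_type = type(soup.script.string)

# From the parent's point of view, script contents are not 'text'...
parent_text = soup.div.get_text()
# ...but from the <script> tag's own point of view, they are.
script_text = soup.script.get_text()
print(script_type.__name__, repr(parent_text), repr(script_text))
```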
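
The 2020-05-17 entry adds on_duplicate_attribute for the html.parser builder. This sketch assumes beautifulsoup4 4.11 or later, where the BeautifulSoup constructor forwards the keyword to the tree builder. It shows the default behavior, the 'ignore' option, and a callable. The callable `accumulate` is a hypothetical example that keeps every value in a list.

```python
# Sketch: assumes beautifulsoup4 >= 4.11 and the built-in html.parser builder.
from bs4 import BeautifulSoup

markup = '<a href="url1" href="url2">link</a>'

# Default: the last value wins.
default_href = BeautifulSoup(markup, "html.parser").a["href"]

# 'ignore': keep the first value seen.
first_href = BeautifulSoup(
    markup, "html.parser", on_duplicate_attribute="ignore"
).a["href"]

# Hypothetical handler: receives the attributes collected so far,
# the duplicated key, and the new value, and mutates the dict in place.
def accumulate(attrs_so_far, key, value):
    if not isinstance(attrs_so_far[key], list):
        attrs_so_far[key] = [attrs_so_far[key]]
    attrs_so_far[key].append(value)

all_hrefs = BeautifulSoup(
    markup, "html.parser", on_duplicate_attribute=accumulate
).a["href"]
print(default_href, first_href, all_hrefs)
```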
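
The 2020-01-01 entry adds the PageElement.decomposed property. This sketch assumes beautifulsoup4 4.11 or later and checks the property before and after calling decompose().

```python
# Sketch: assumes beautifulsoup4 >= 4.11 and the built-in html.parser builder.
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p><b>bold</b></p>", "html.parser")
b = soup.b

before = b.decomposed   # not yet destroyed
b.decompose()           # removes <b> from the tree and destroys it
after = b.decomposed    # the stale reference now reports True
print(before, after, str(soup))
```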
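
The 2019-07-21 entries add line-number tracking. This sketch assumes beautifulsoup4 4.11 or later with html.parser, which records each tag's position in the `sourceline` and `sourcepos` attributes. The values mirror html.parser's getpos(): a 1-based line number and a 0-based column offset.

```python
# Sketch: assumes beautifulsoup4 >= 4.11 and the built-in html.parser builder.
from bs4 import BeautifulSoup

markup = "<p>one</p>\n  <p>two</p>"
soup = BeautifulSoup(markup, "html.parser")

# sourceline is 1-based; sourcepos is the 0-based column where the tag starts.
positions = [(p.sourceline, p.sourcepos) for p in soup.find_all("p")]
print(positions)
```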
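
The 2019-07-15 entry adds Tag.smooth(). This sketch assumes beautifulsoup4 4.11 or later. Appending a string next to an existing string leaves two adjacent NavigableStrings, and smooth() merges them into one.

```python
# Sketch: assumes beautifulsoup4 >= 4.11 and the built-in html.parser builder.
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Hello</p>", "html.parser")
soup.p.append(", world")        # creates a second, adjacent NavigableString
count_before = len(soup.p.contents)

soup.p.smooth()                 # merges adjacent strings into one
count_after = len(soup.p.contents)
print(count_before, count_after, soup.p.string)
```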
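
The 2019-07-07 formatter entry lets a Formatter decide attribute order. This sketch assumes beautifulsoup4 4.11 or later, where the default formatter sorts attributes alphabetically and a subclass can override the `attributes()` hook. `UnsortedAttributes` is a hypothetical subclass that yields attributes in document order instead.

```python
# Sketch: assumes beautifulsoup4 >= 4.11 and the built-in html.parser builder.
from bs4 import BeautifulSoup
from bs4.formatter import HTMLFormatter

soup = BeautifulSoup('<p z="1" a="2"></p>', "html.parser")

class UnsortedAttributes(HTMLFormatter):
    """Hypothetical formatter: emit attributes in their original order."""

    def attributes(self, tag):
        for key, value in tag.attrs.items():
            yield key, value

default_output = soup.p.decode()                               # sorted by name
unsorted_output = soup.p.decode(formatter=UnsortedAttributes())
print(default_output)
print(unsorted_output)
```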
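
The 2019-07-07 cdata_list_attributes entry lets callers replace or disable the multi-valued attribute table. This sketch assumes beautifulsoup4 4.11 or later, where the BeautifulSoup constructor exposes the table as the `multi_valued_attributes` keyword. In that table, the key `"*"` means "any tag". The `data-tags` attribute is purely illustrative.

```python
# Sketch: assumes beautifulsoup4 >= 4.11 and the built-in html.parser builder.
from bs4 import BeautifulSoup

markup = '<p class="big bold"></p>'

# Default table: 'class' is split on whitespace into a list.
default_class = BeautifulSoup(markup, "html.parser").p["class"]

# None disables the feature, so the raw string is kept.
plain_class = BeautifulSoup(
    markup, "html.parser", multi_valued_attributes=None
).p["class"]

# A replacement table: split the illustrative 'data-tags' attribute on any tag.
custom = BeautifulSoup(
    '<p data-tags="x y"></p>', "html.parser",
    multi_valued_attributes={"*": ["data-tags"]},
).p["data-tags"]
print(default_class, plain_class, custom)
```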
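
The 2018-12-30 entry makes insert_before() and insert_after() accept multiple arguments and adds extend(). This sketch assumes beautifulsoup4 4.11 or later. `make_li` is a hypothetical helper that only exists to keep the example short.

```python
# Sketch: assumes beautifulsoup4 >= 4.11 and the built-in html.parser builder.
from bs4 import BeautifulSoup

soup = BeautifulSoup("<ol><li>2</li></ol>", "html.parser")
two = soup.li

def make_li(text):
    """Hypothetical helper: build an <li> containing the given text."""
    tag = soup.new_tag("li")
    tag.string = text
    return tag

two.insert_before(make_li("0"), make_li("1"))     # both land before <li>2</li>, in order
two.insert_after(make_li("3"))                    # lands right after <li>2</li>
soup.ol.extend([make_li("4"), make_li("5")])      # appended to the end of <ol>

items = [li.string for li in soup.find_all("li")]
print(items)
```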