summaryrefslogtreecommitdiff
path: root/CHANGELOG
AgeCommit message (Collapse)Author
2021-02-13The behavior of methods like .get_text() and .strings now differsLeonard Richardson
depending on the type of tag. The change is visible with HTML tags like <script>, <style>, and <template>. Starting in 4.9.0, methods like get_text() returned no results on such tags, because the contents of those tags are not considered 'text' within the document as a whole. But a user who calls script.get_text() is working from a different definition of 'text' than a user who calls div.get_text()--otherwise there would be no need to call script.get_text() at all. In 4.10.0, the contents of (e.g.) a <script> tag are considered 'text' during a get_text() call on the tag itself, but not considered 'text' during a get_text() call on the tag's parent. Because of this change, calling get_text() on each child of a tag may now return a different result than calling get_text() on the tag itself. That's because different tags now have different understandings of what counts as 'text'. [bug=1906226] [bug=1868861]
2021-02-13Corrected the use of special string container classes in cases when aLeonard Richardson
single tag may contain strings with different containers; such as the <template> tag, which may contain both TemplateString objects and Comment objects. [bug=1913406]
2021-02-13Added a second way to pass specify encodings to UnicodeDammit andLeonard Richardson
EncodingDetector, based on the order of precedence defined in the HTML5 spec, starting at: https://html.spec.whatwg.org/multipage/parsing.html#parsing-with-a-known-character-encoding Encodings in 'known_definite_encodings' are tried first, then byte-order-mark sniffing is run, then encodings in 'user_encodings' are tried. The old argument, 'override_encodings', is now a deprecated alias for 'known_definite_encodings'. This changes the default behavior of the html.parser and lxml tree builders, in a way that may slightly improve encoding detection but will probably have no effect. [bug=1889014]
2021-02-13Performance improvement when processing tags that speeds up overallLeonard Richardson
tree construction by 2%. Patch by Morotti. [bug=1899358]
2021-02-13Improve the warning issued when a directory name (as opposed toLeonard Richardson
the name of a regular file) is passed as markup into the BeautifulSoup constructor. [bug=1913628]
2021-02-13Corrected output when the namespace prefix associated with aLeonard Richardson
namespaced attribute is the empty string, as opposed to None. [bug=1915583]
2020-10-03Prepare for release.Leonard Richardson
2020-10-02Implemented a significant performance optimization to the process ofLeonard Richardson
searching the parse tree. Patch by Morotti. [bug=1898212]
2020-09-26Increment version number.Leonard Richardson
2020-09-26Fixed a bug that inconsistently moved elements over when passingLeonard Richardson
a Tag, rather than a list, into Tag.extend(). [bug=1885710]
2020-09-26Change the signatures for BeautifulSoup.insert_before and insert_afterLeonard Richardson
(which are not implemented) to match PageElement.insert_before and insert_after, quieting warnings in some IDEs. [bug=1897120]
2020-08-31Specify the soupsieve dependency in a way that complies withLeonard Richardson
PEP 508. Patch by Mike Nerone. [bug=1893696]
2020-05-30Fixed a bug that caused too many tags to be popped from the tagLeonard Richardson
stack during tree building, when encountering a closing tag that had no matching opening tag. [bug=1880420]
2020-05-17Prep for release.Leonard Richardson
2020-05-17Documented some recently added customization features.Leonard Richardson
2020-05-17Added a keyword argument on_duplicate_attribute to theLeonard Richardson
BeautifulSoupHTMLParser constructor (used by the html.parser tree builder) which lets you customize the handling of markup that contains the same attribute more than once, as in: <a href="url1" href="url2"> [bug=1878209]
2020-04-24If you encode a document with a Python-specific encoding likeLeonard Richardson
'unicode_escape', that encoding is no longer mentioned in the final XML or HTML document. Instead, encoding information is omitted or left blank. [bug=1874955]
2020-04-21Fixed typo.Leonard Richardson
2020-04-21Added two distinct UserWarning subclasses for warnings issued from the ↵Leonard Richardson
BeautifulSoup constructor which a caller may want to filter out. [bug=1873787]
2020-04-12Fixed test failures when run against soupselect 2.0. Patch by TomášLeonard Richardson
Chvátal. [bug=1872279]
2020-04-07Added a notice about the new behavior of .text to the documentation.Leonard Richardson
2020-04-05Embedded CSS and Javascript is now stored in distinct Stylesheet andLeonard Richardson
Script tags, which are ignored by methods like get_text(). This feature is not supported by the html5lib treebuilder. [bug=1868861]
2020-04-04Added a Russian translation by 'authoress' to the repository.Leonard Richardson
2020-03-10Fixed a bug that happened when passing a Unicode filename containingLeonard Richardson
non-ASCII characters as markup into Beautiful Soup, on a system that allows Unicode filenames. [bug=1866717]
2020-03-05Added a performance optimization to PageElement.extract(). Patch by Arthur ↵Leonard Richardson
Darcet.
2020-01-01API CHANGE - Added PageElement.decomposed, a new property which lets youLeonard Richardson
check whether you've already called decompose() on a Tag or NavigableString.
2019-12-29Fixed an unhandled exception when formatting a Tag that had been ↵Leonard Richardson
decomposed.[bug=1857767]
2019-12-24Added docstrings for some but not all tree buidlers.Leonard Richardson
2019-12-24Wrote docstrings for formatter.py.Leonard Richardson
2019-12-24Fixed deprecation warning. [bug=1855301]Leonard Richardson
2019-12-18Added Python docstrings to all public methods in element.py.Leonard Richardson
2019-11-11The html.parser tree builder now correctly handles DOCTYPEs that areLeonard Richardson
not uppercase. [bug=1848401]
2019-11-11Fixed a deprecation warning on Python 3.7. Patch by ColinLeonard Richardson
Watson. [bug=1847592]
2019-10-06Added section on Python 2 sunsetting.Leonard Richardson
2019-10-05Avoid a crash when unpickling certain parse trees generated using html5lib ↵Leonard Richardson
on Python 3. [bug=1843545]
2019-09-02Avoid a crash when trying to detect the declared encoding of aLeonard Richardson
Unicode document. Raise an explanatory exception when the underlying parser completely rejects the incoming markup. [bug=1838877]
2019-08-26Fixed the definition of the default XML namespace when usingLeonard Richardson
lxml 4.4. Patch by Isaac Muse. [bug=1840141]
2019-08-21When instantiating a BeautifulSoup object, it's now possible toLeonard Richardson
provide replacement classes to be instantiated for every tag ('tag_class') or string ('string_class') encountered during parsing, rather than using the default Tag and NavigableString objects.
2019-08-21Copying a Tag preserves information that was originally obtained fromLeonard Richardson
the TreeBuilder used to build the original Tag. [bug=1838903]
2019-08-21Fixed a crash when pretty-printing tags that were not createdLeonard Richardson
during initial parsing. [bug=1838903]
2019-07-22Added a section about project support to the README.Leonard Richardson
2019-07-21Implemented line number tracking for html5lib.Leonard Richardson
2019-07-21Adapt Chris Mayo's code to track line number and position when using ↵Leonard Richardson
html.parser.
2019-07-20Minor changes to docs and CHANGELOG.Leonard Richardson
2019-07-19Clarified the changelog.Leonard Richardson
2019-07-16Prep for release.Leonard Richardson
2019-07-16Added documentation for Tag.smooth().Leonard Richardson
2019-07-07A Formatter can now decide how (or whether) to order the attributesLeonard Richardson
inside a tag. [bug=1812422]
2019-07-07&apos; (which is valid in XML and XHTML, but not HTML 4) is nowLeonard Richardson
recognized as a named entity and converted to a single quote. [bug=1818721]
2019-07-07Renamed the cdata_list_attributes argument to multi_valued_attributes since ↵Leonard Richardson
it's facing the end-user and that's a more easily understandable name.