Age | Commit message | Author | |
---|---|---|---|
2021-02-14 | The 'html5' formatter now treats attributes whose values are the empty string as HTML boolean attributes. Previously (and in other formatters), an attribute value must be set as None to be treated as a boolean attribute. In a future release, I plan to also give this behavior to the 'html' formatter. Patch by Isaac Muse. [bug=1915424] | Leonard Richardson | |
2021-02-13 | The behavior of methods like .get_text() and .strings now differs depending on the type of tag. The change is visible with HTML tags like <script>, <style>, and <template>. Starting in 4.9.0, methods like get_text() returned no results on such tags, because the contents of those tags are not considered 'text' within the document as a whole. But a user who calls script.get_text() is working from a different definition of 'text' than a user who calls div.get_text()--otherwise there would be no need to call script.get_text() at all. In 4.10.0, the contents of (e.g.) a <script> tag are considered 'text' during a get_text() call on the tag itself, but not considered 'text' during a get_text() call on the tag's parent. Because of this change, calling get_text() on each child of a tag may now return a different result than calling get_text() on the tag itself. That's because different tags now have different understandings of what counts as 'text'. [bug=1906226] [bug=1868861] | Leonard Richardson | |
2021-02-13 | Corrected the use of special string container classes in cases when a single tag may contain strings with different containers; such as the <template> tag, which may contain both TemplateString objects and Comment objects. [bug=1913406] | Leonard Richardson | |
2021-02-13 | Added a second way to specify encodings to UnicodeDammit and EncodingDetector, based on the order of precedence defined in the HTML5 spec (starting at https://html.spec.whatwg.org/multipage/parsing.html#parsing-with-a-known-character-encoding). Encodings in 'known_definite_encodings' are tried first, then byte-order-mark sniffing is run, then encodings in 'user_encodings' are tried. The old argument, 'override_encodings', is now a deprecated alias for 'known_definite_encodings'. This changes the default behavior of the html.parser and lxml tree builders, in a way that may slightly improve encoding detection but will probably have no effect. [bug=1889014] | Leonard Richardson | |
2021-02-13 | Performance improvement when processing tags that speeds up overall tree construction by 2%. Patch by Morotti. [bug=1899358] | Leonard Richardson | |
2021-02-13 | Improve the warning issued when a directory name (as opposed to the name of a regular file) is passed as markup into the BeautifulSoup constructor. [bug=1913628] | Leonard Richardson | |
2021-02-13 | Corrected output when the namespace prefix associated with a namespaced attribute is the empty string, as opposed to None. [bug=1915583] | Leonard Richardson | |
2020-10-24 | Exclude more tests from the package. Patch by Ville Skyttä. | Leonard Richardson | |
2020-10-24 | Fix tests install exclusion | Ville Skyttä | |
2020-10-03 | I always forget to bump the version number in the doc. | Leonard Richardson | |
2020-10-03 | Prepare for release. | Leonard Richardson | |
2020-10-02 | Implemented a significant performance optimization to the process of searching the parse tree. Patch by Morotti. [bug=1898212] | Leonard Richardson | |
2020-09-26 | Changed version number of development Python in use. | Leonard Richardson | |
2020-09-26 | Incremented version number in the documentation. | Leonard Richardson | |
2020-09-26 | Increment version number. | Leonard Richardson | |
2020-09-26 | Fixed a bug that inconsistently moved elements over when passing a Tag, rather than a list, into Tag.extend(). [bug=1885710] | Leonard Richardson | |
2020-09-26 | Change the signatures for BeautifulSoup.insert_before and insert_after (which are not implemented) to match PageElement.insert_before and insert_after, quieting warnings in some IDEs. [bug=1897120] | Leonard Richardson | |
2020-08-31 | Specify the soupsieve dependency in a way that complies with PEP 508. Patch by Mike Nerone. [bug=1893696] | Leonard Richardson | |
2020-08-31 | Correct PyPI dep metadata (PEP 508 env markers instead of a condition in setup.py) | Mike Nerone | |
2020-07-29 | Ran through all of the documentation code examples using Python 3, corrected discrepancies and errors, and updated representations. | Leonard Richardson | |
2020-07-24 | Added a paragraph to the documentation about the fact that bs4 Tag implements __hash__ and bs3 Tag doesn't. | Leonard Richardson | |
2020-06-11 | Converted the sample code in README.md to Python 3. | Leonard Richardson | |
2020-05-31 | Make the doc a little less defensive. | Leonard Richardson | |
2020-05-31 | Added a bit to the troubleshooting section to catch searches for the AttributeError that happens if you treat a string like a tag. | Leonard Richardson | |
2020-05-30 | Fixed a bug that caused too many tags to be popped from the tag stack during tree building, when encountering a closing tag that had no matching opening tag. [bug=1880420] | Leonard Richardson | |
2020-05-30 | Remove explicit reference to the module name within the module, replacing it with __name__. | Leonard Richardson | |
2020-05-17 | Prep for release. | Leonard Richardson | |
2020-05-17 | Switch entirely to Python 3-style print statements, even in Python 2. | Leonard Richardson | |
2020-05-17 | Documented some recently added customization features. | Leonard Richardson | |
2020-05-17 | Added docstring for BeautifulSoup.new_tag. | Leonard Richardson | |
2020-05-17 | Added a keyword argument on_duplicate_attribute to the BeautifulSoupHTMLParser constructor (used by the html.parser tree builder) which lets you customize the handling of markup that contains the same attribute more than once, as in: <a href="url1" href="url2"> [bug=1878209] | Leonard Richardson | |
2020-04-25 | Try to clarify the docs further that get_text now returns human-readable text. | Leonard Richardson | |
2020-04-24 | If you encode a document with a Python-specific encoding like 'unicode_escape', that encoding is no longer mentioned in the final XML or HTML document. Instead, encoding information is omitted or left blank. [bug=1874955] | Leonard Richardson | |
2020-04-21 | Fixed typo. | Leonard Richardson | |
2020-04-21 | Added two distinct UserWarning subclasses for warnings issued from the BeautifulSoup constructor which a caller may want to filter out. [bug=1873787] | Leonard Richardson | |
2020-04-12 | Fixed test failures when run against soupselect 2.0. Patch by Tomáš Chvátal. [bug=1872279] | Leonard Richardson | |
2020-04-07 | Add Script, Stylesheet, and TemplateString to the 'bs4' namespace. | Leonard Richardson | |
2020-04-07 | Added a notice about the new behavior of .text to the documentation. | Leonard Richardson | |
2020-04-05 | Set up a different soupsieve dependency for Python 2. | Leonard Richardson | |
2020-04-05 | Embedded CSS and JavaScript are now stored in distinct Stylesheet and Script objects, which are ignored by methods like get_text(). This feature is not supported by the html5lib treebuilder. [bug=1868861] | Leonard Richardson | |
2020-04-04 | Use an :rtype: reported to work in pycharm. | Leonard Richardson | |
2020-04-04 | select() always returns a Tag, so be more specific about its return type. | Leonard Richardson | |
2020-04-04 | Added a Russian translation by 'authoress' to the repository. | Leonard Richardson | |
2020-04-04 | Corrected error in Chinese translation, found by "One J". | Leonard Richardson | |
2020-03-10 | Fixed a bug that happened when passing a Unicode filename containing non-ASCII characters as markup into Beautiful Soup, on a system that allows Unicode filenames. [bug=1866717] | Leonard Richardson | |
2020-03-09 | Make find() methods return a union type of the two most common PageElements, rather than PageElement itself. | Leonard Richardson | |
2020-03-06 | Added a paragraph about the fact that prettify() adds whitespace to a document. | Leonard Richardson | |
2020-03-05 | Added a performance optimization to PageElement.extract(). Patch by Arthur Darcet. | Leonard Richardson | |
2020-01-22 | Merging in request 377978 | Leonard Richardson | |
2020-01-23 | Fix a confusing typo in the description of formatter="html5". | Colin Watson | |
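
The 2021-02-14 entry (bug=1915424) describes how the 'html5' formatter renders empty-string attribute values. A minimal sketch of that behavior, assuming a Beautiful Soup release that includes the change and the built-in html.parser tree builder; the sample markup is made up for illustration:

```python
from bs4 import BeautifulSoup

# Illustrative markup: an attribute whose value is the empty string.
soup = BeautifulSoup('<option selected=""></option>', "html.parser")

# With the 'html5' formatter the attribute is written as a bare
# boolean attribute (assuming a release that includes this change).
as_html5 = soup.option.decode(formatter="html5")

# The default 'minimal' formatter keeps the explicit empty value.
as_minimal = soup.option.decode(formatter="minimal")

print(as_html5)
print(as_minimal)
```

On older releases both calls would be expected to print the same thing, since only None was treated as a boolean attribute value.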
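
The 2021-02-13 entry about .get_text() (bug=1906226, bug=1868861) is easier to follow with a concrete document. The sketch below assumes version 4.10.0 or later and the html.parser tree builder; the markup is illustrative only:

```python
from bs4 import BeautifulSoup

# Illustrative markup: a <script> tag nested inside a <div>.
markup = "<div><p>Hello</p><script>var x = 1;</script></div>"
soup = BeautifulSoup(markup, "html.parser")

# Called on the parent, the script body does not count as 'text'.
parent_text = soup.div.get_text()

# Called on the <script> tag itself, its contents do count
# (assuming 4.10.0 or later, per the log entry).
script_text = soup.script.get_text()

print(repr(parent_text))
print(repr(script_text))
```

This is why joining get_text() over each child of the div can give a different result than calling get_text() on the div.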
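
The 2021-02-13 entry on UnicodeDammit (bug=1889014) names two keyword arguments, 'known_definite_encodings' and 'user_encodings'. A small sketch of how they might be passed, assuming a release that includes the change; the byte string is an illustrative Latin-1 sample with no byte-order mark:

```python
from bs4 import UnicodeDammit

# Illustrative input: Latin-1 encoded bytes without a byte-order mark.
data = b"Sacr\xe9 bleu!"

# Encodings listed here are tried before byte-order-mark sniffing.
definite = UnicodeDammit(data, known_definite_encodings=["latin-1"])

# Encodings listed here are tried after byte-order-mark sniffing.
fallback = UnicodeDammit(data, user_encodings=["latin-1"])

print(definite.unicode_markup, definite.original_encoding)
print(fallback.unicode_markup, fallback.original_encoding)
```

Because this input has no byte-order mark and decodes cleanly as Latin-1, both calls should arrive at the same text; the two arguments differ only in where they sit in the order of precedence.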
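
The 2020-09-26 entry about Tag.extend() (bug=1885710) concerns passing a Tag rather than a list. A minimal sketch of the fixed behavior, assuming a release that includes the fix; the tag names are arbitrary:

```python
from bs4 import BeautifulSoup

# Illustrative markup: <a> has two children, <d> is empty.
soup = BeautifulSoup("<a><b>1</b><c>2</c></a><d></d>", "html.parser")

# Passing the <a> Tag itself moves all of its children into <d>.
soup.d.extend(soup.a)

result = str(soup)
print(result)
```

After the call, the a tag is left empty and both children sit inside the d tag in their original order.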
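
The 2020-05-17 entry on on_duplicate_attribute (bug=1878209) can be illustrated as follows. This sketch assumes the argument is passed through the BeautifulSoup constructor to the html.parser tree builder, as in the Beautiful Soup documentation; the helper name `accumulate` is a hypothetical callback chosen for this example:

```python
from bs4 import BeautifulSoup

# Markup from the log entry, with a closing tag added.
markup = '<a href="url1" href="url2"></a>'

# Default handling: the last value seen replaces earlier ones.
default = BeautifulSoup(markup, "html.parser")

# 'ignore' keeps the first value and drops later duplicates.
first_wins = BeautifulSoup(
    markup, "html.parser", on_duplicate_attribute="ignore"
)

def accumulate(attributes_so_far, key, value):
    """Hypothetical callback: collect every duplicate value in a list."""
    if not isinstance(attributes_so_far[key], list):
        attributes_so_far[key] = [attributes_so_far[key]]
    attributes_so_far[key].append(value)

collected = BeautifulSoup(
    markup, "html.parser", on_duplicate_attribute=accumulate
)

print(default.a["href"])
print(first_wins.a["href"])
print(collected.a["href"])
```

The callable form receives the attribute dictionary built so far, so it can decide for itself how to combine the old and new values.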
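
The 2020-04-21 entry (bug=1873787) mentions two UserWarning subclasses without naming them. The sketch below assumes they are the GuessedAtParserWarning and MarkupResemblesLocatorWarning classes listed in the Beautiful Soup documentation, and shows one way a caller might filter the first of them:

```python
import warnings

from bs4 import (
    BeautifulSoup,
    GuessedAtParserWarning,
    MarkupResemblesLocatorWarning,
)

# Scope the filter so that other code still sees the warning.
with warnings.catch_warnings():
    # No parser is named below, which is the situation that
    # GuessedAtParserWarning reports; silence just that category.
    warnings.simplefilter("ignore", category=GuessedAtParserWarning)
    soup = BeautifulSoup("<p>no parser named</p>")

print(soup.p.get_text())
print(issubclass(MarkupResemblesLocatorWarning, UserWarning))
```

Filtering by class rather than by message text keeps the filter stable if the wording of the warning changes in a later release.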