beautifulsoup.git

Age	Commit message	Author
2021-10-24	Issue a warning when an HTML parser is used to parse a document that	Leonard Richardson
	looks like XML but not XHTML. [bug=1939121]
2021-10-24	Used a warning to formally deprecate the 'text' argument in favor of 'string'.	Leonard Richardson

2021-10-23	Renamed the 'text' field to 'string' for real. Tests are not changed in this	Leonard Richardson
	commit to demonstrate that the renaming doesn't break anything. [bug=1947038]
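The 'text' to 'string' rename, together with the deprecation warning added on 2021-10-24, follows a common pattern for retiring a keyword argument: keep accepting the old name, warn, and forward the value to the new name. The sketch below shows that pattern with a hypothetical `find()` function; it is not Beautiful Soup's implementation, and the signature is an assumption.

```python
import warnings


def find(name=None, string=None, **kwargs):
    """Hypothetical search function with a deprecated 'text' alias for 'string'."""
    if "text" in kwargs:
        text = kwargs.pop("text")
        # stacklevel=2 attributes the warning to the caller, not to find().
        warnings.warn(
            "The 'text' argument is deprecated; use 'string' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # If both names are given, the new one wins.
        if string is None:
            string = text
    if kwargs:
        raise TypeError(f"unexpected keyword arguments: {sorted(kwargs)}")
    return {"name": name, "string": string}


# DeprecationWarning may be hidden by the default filters, so record it explicitly.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = find("a", text="Click here")

print(result)                       # {'name': 'a', 'string': 'Click here'}
print(caught[0].category.__name__)  # DeprecationWarning
```

Old callers keep working, while the warning tells them which name to switch to before the alias is removed.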
2021-10-23	Added a workaround for an lxml bug	Leonard Richardson
	(https://bugs.launchpad.net/lxml/+bug/1948551) that caused problems when parsing a Unicode string beginning with a byte order mark. [bug=1947768]
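When a document has already been decoded to a Python str, a byte order mark can survive as the character U+FEFF at position 0. A minimal sketch of the kind of workaround this entry describes, assuming it amounts to dropping that single leading character before the string reaches the parser; `strip_leading_bom` is a hypothetical helper, not part of Beautiful Soup or lxml.

```python
BOM = "\ufeff"  # U+FEFF, left at the start of a str when the BOM is not consumed


def strip_leading_bom(markup: str) -> str:
    """Hypothetical helper: drop one leading U+FEFF from already-decoded markup.

    Only the first character is checked; U+FEFF elsewhere is left alone.
    """
    return markup[1:] if markup.startswith(BOM) else markup


# Decoding BOM-prefixed bytes with plain "utf-8" (not "utf-8-sig") keeps the BOM.
text = b"\xef\xbb\xbf<root/>".decode("utf-8")
print(repr(text[0]))            # '\ufeff'
print(strip_leading_bom(text))  # <root/>
```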
2021-10-23	Fixed a crash when overriding multi_valued_attributes and using the	Leonard Richardson
	html5lib parser. [bug=1948488]
2021-10-11	Added special string classes, RubyParenthesisString and RubyTextString,	Leonard Richardson
	to make it possible to treat ruby text specially in get_text() calls. [bug=1941980]
2021-09-12	Ported unit tests to use pytest.	Leonard Richardson

2021-09-07	Goodbye, Python 2. [bug=1942919]	Leonard Richardson

2021-06-01	The 'replace_with()' method now takes a variable number of arguments,	Leonard Richardson
	and can be used to replace a single element with a sequence of elements. Patch by Bill Chandos.
2021-05-31	The html.parser tree builder can now handle named entities	Leonard Richardson
	found in the HTML5 spec in much the same way that the html5lib tree builder does. Note that the lxml tree builder still handles named entities differently. [bug=1924908]
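Python's standard library ships the HTML5 named character reference table as `html.entities.html5`, keyed by entity name including the trailing semicolon. A minimal sketch of resolving a named entity against that table; `resolve_named_entity` is a hypothetical helper, and this is not necessarily how the html.parser tree builder does it internally.

```python
from html.entities import html5


def resolve_named_entity(name):
    """Hypothetical helper: return the text an HTML5 named entity stands for.

    `name` is the bare name, e.g. "hellip" for "&hellip;". Returns None if
    the name is not in the HTML5 table.
    """
    # Keys in html5 include the trailing semicolon, e.g. "hellip;".
    return html5.get(name + ";")


print(resolve_named_entity("eacute"))              # é
print(resolve_named_entity("hellip"))              # …
print(len(resolve_named_entity("NotEqualTilde")))  # 2 (some entities expand to two code points)
print(resolve_named_entity("notanentity"))         # None
```

The two-code-point case is one reason an HTML5-aware table is needed: a mapping from names to single code points cannot represent every HTML5 entity.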
2021-02-14	NavigableString and its subclasses now implement the get_text()	Leonard Richardson
	method, as well as the properties .strings and .stripped_strings. These methods will either return the string itself or nothing, so the only reason to use them is when iterating over a list of mixed Tag and NavigableString objects. [bug=1904309]
2021-02-14	The 'html5' formatter now treats attributes whose values are the	Leonard Richardson
	empty string as HTML boolean attributes. Previously (and in other formatters), an attribute value had to be set to None to be treated as a boolean attribute. In a future release, I plan to also give this behavior to the 'html' formatter. Patch by Isaac Muse. [bug=1915424]
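A minimal sketch of the rule described here, using a hypothetical `format_attributes()` serializer rather than Beautiful Soup's formatter classes: a value of None always produces a bare attribute name, and in the 'html5' style an empty string does too.

```python
from html import escape


def format_attributes(attrs, html5_style=True):
    """Hypothetical serializer illustrating the boolean-attribute rule.

    None always renders as a bare attribute name. With html5_style=True an
    empty-string value does as well; otherwise it renders as name="".
    """
    parts = []
    for name, value in attrs.items():
        if value is None or (html5_style and value == ""):
            parts.append(name)
        else:
            parts.append(f'{name}="{escape(value, quote=True)}"')
    return " ".join(parts)


attrs = {"type": "checkbox", "checked": ""}
print(format_attributes(attrs))                     # type="checkbox" checked
print(format_attributes(attrs, html5_style=False))  # type="checkbox" checked=""
```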
2021-02-13	The behavior of methods like .get_text() and .strings now differs	Leonard Richardson
	depending on the type of tag. The change is visible with HTML tags like <script>, <style>, and <template>. Starting in 4.9.0, methods like get_text() returned no results on such tags, because the contents of those tags are not considered 'text' within the document as a whole. But a user who calls script.get_text() is working from a different definition of 'text' than a user who calls div.get_text()--otherwise there would be no need to call script.get_text() at all. In 4.10.0, the contents of (e.g.) a <script> tag are considered 'text' during a get_text() call on the tag itself, but not considered 'text' during a get_text() call on the tag's parent. Because of this change, calling get_text() on each child of a tag may now return a different result than calling get_text() on the tag itself. That's because different tags now have different understandings of what counts as 'text'. [bug=1906226] [bug=1868861]
2021-02-13	Corrected the use of special string container classes in cases where a	Leonard Richardson
	single tag may contain strings with different container classes, such as the <template> tag, which may contain both TemplateString objects and Comment objects. [bug=1913406]
2021-02-13	Added a second way to specify encodings to UnicodeDammit and	Leonard Richardson
	EncodingDetector, based on the order of precedence defined in the HTML5 spec (starting at https://html.spec.whatwg.org/multipage/parsing.html#parsing-with-a-known-character-encoding). Encodings in 'known_definite_encodings' are tried first, then byte-order-mark sniffing is run, then encodings in 'user_encodings' are tried. The old argument, 'override_encodings', is now a deprecated alias for 'known_definite_encodings'. This changes the default behavior of the html.parser and lxml tree builders in a way that may slightly improve encoding detection but will probably have no effect. [bug=1889014]
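The precedence order in this entry can be sketched as a generator of candidate encodings. `candidate_encodings()` is hypothetical and only mirrors the argument names mentioned here; it is not UnicodeDammit's or EncodingDetector's implementation, and it checks only the three byte order marks (UTF-8, UTF-16BE, UTF-16LE) recognized by the WHATWG "BOM sniff" step.

```python
import codecs

# Byte order marks recognized by the WHATWG "BOM sniff" step, in the order checked.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def candidate_encodings(data, known_definite_encodings=(), user_encodings=()):
    """Hypothetical sketch: yield encodings to try on `data`, in precedence order.

    1. every encoding in known_definite_encodings
    2. the encoding implied by a byte order mark, if there is one
    3. every encoding in user_encodings
    Each encoding is yielded at most once.
    """
    seen = set()

    def first_time(encoding):
        # codecs.lookup() normalizes aliases ("UTF8" -> "utf-8") and raises
        # LookupError for unknown names.
        canonical = codecs.lookup(encoding).name
        if canonical in seen:
            return False
        seen.add(canonical)
        return True

    for encoding in known_definite_encodings:
        if first_time(encoding):
            yield encoding
    for bom, encoding in BOMS:
        if data.startswith(bom):
            if first_time(encoding):
                yield encoding
            break
    for encoding in user_encodings:
        if first_time(encoding):
            yield encoding


data = b"\xef\xbb\xbfcaf\xc3\xa9"
print(list(candidate_encodings(
    data, known_definite_encodings=["windows-1252"], user_encodings=["UTF8", "latin-1"]
)))
# ['windows-1252', 'utf-8', 'latin-1']  ("UTF8" is skipped as a duplicate of "utf-8")
```

A caller would then try each candidate in turn with `bytes.decode()` and keep the first one that succeeds.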
2021-02-13	Performance improvement when processing tags that speeds up overall	Leonard Richardson
	tree construction by 2%. Patch by Morotti. [bug=1899358]
2021-02-13	Improve the warning issued when a directory name (as opposed to	Leonard Richardson
	the name of a regular file) is passed as markup into the BeautifulSoup constructor. [bug=1913628]
2021-02-13	Corrected output when the namespace prefix associated with a	Leonard Richardson
	namespaced attribute is the empty string, as opposed to None. [bug=1915583]
2020-10-03	Prepare for release.	Leonard Richardson

2020-10-02	Implemented a significant performance optimization to the process of	Leonard Richardson
	searching the parse tree. Patch by Morotti. [bug=1898212]
2020-09-26	Increment version number.	Leonard Richardson

2020-09-26	Fixed a bug that inconsistently moved elements over when passing	Leonard Richardson
	a Tag, rather than a list, into Tag.extend(). [bug=1885710]
2020-09-26	Change the signatures for BeautifulSoup.insert_before and insert_after	Leonard Richardson
	(which are not implemented) to match PageElement.insert_before and insert_after, quieting warnings in some IDEs. [bug=1897120]
2020-08-31	Specify the soupsieve dependency in a way that complies with	Leonard Richardson
	PEP 508. Patch by Mike Nerone. [bug=1893696]
2020-05-30	Fixed a bug that caused too many tags to be popped from the tag	Leonard Richardson
	stack during tree building, when encountering a closing tag that had no matching opening tag. [bug=1880420]
2020-05-17	Prep for release.	Leonard Richardson

2020-05-17	Documented some recently added customization features.	Leonard Richardson

2020-05-17	Added a keyword argument on_duplicate_attribute to the	Leonard Richardson
	BeautifulSoupHTMLParser constructor (used by the html.parser tree builder) which lets you customize the handling of markup that contains the same attribute more than once, as in: <a href="url1" href="url2"> [bug=1878209]
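The idea can be sketched with only the standard library's html.parser, which reports every attribute of a start tag, duplicates included, as a list of (name, value) pairs. `DuplicateAttributeParser` and its strategy values ('replace', 'ignore', or a callable) are assumptions made for illustration; check the Beautiful Soup documentation for the values on_duplicate_attribute actually accepts.

```python
from html.parser import HTMLParser


class DuplicateAttributeParser(HTMLParser):
    """Hypothetical parser that records each start tag's attributes as a dict.

    on_duplicate decides what happens when an attribute name repeats:
      "replace"                  - the last value wins
      "ignore"                   - the first value wins
      callable(attrs, name, val) - custom handling; may modify attrs in place
    """

    def __init__(self, on_duplicate="replace"):
        super().__init__()
        self.on_duplicate = on_duplicate
        self.tags = []  # list of (tag_name, attrs_dict)

    def handle_starttag(self, tag, attrs):
        result = {}
        for name, value in attrs:
            if name not in result or self.on_duplicate == "replace":
                result[name] = value
            elif self.on_duplicate == "ignore":
                continue
            else:
                self.on_duplicate(result, name, value)
        self.tags.append((tag, result))


def parse(markup, on_duplicate="replace"):
    """Return the attribute dict of the first start tag in `markup`."""
    parser = DuplicateAttributeParser(on_duplicate)
    parser.feed(markup)
    parser.close()
    return parser.tags[0][1]


def keep_all(attrs, name, value):
    """Example custom strategy: collect every value into a list."""
    if not isinstance(attrs[name], list):
        attrs[name] = [attrs[name]]
    attrs[name].append(value)


markup = '<a href="url1" href="url2">'
print(parse(markup))            # {'href': 'url2'}
print(parse(markup, "ignore"))  # {'href': 'url1'}
print(parse(markup, keep_all))  # {'href': ['url1', 'url2']}
```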
2020-04-24	If you encode a document with a Python-specific encoding like	Leonard Richardson
	'unicode_escape', that encoding is no longer mentioned in the final XML or HTML document. Instead, encoding information is omitted or left blank. [bug=1874955]
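A minimal sketch of the behavior, using a hypothetical `xml_declaration()` helper and a hypothetical, non-exhaustive set of Python-only codec names: the encoding is declared only when it is not one of those codecs, since no non-Python consumer could act on such a declaration.

```python
# Hypothetical, non-exhaustive set of codecs that only Python understands;
# naming one of them in a document would mislead any other consumer.
PYTHON_ONLY_ENCODINGS = {"unicode_escape", "raw_unicode_escape", "idna", "punycode"}


def xml_declaration(encoding=None):
    """Return an XML declaration, leaving out Python-only encodings."""
    if encoding is not None:
        # Normalize spelling so "Unicode-Escape" matches "unicode_escape".
        normalized = encoding.lower().replace("-", "_")
        if normalized not in PYTHON_ONLY_ENCODINGS:
            return f'<?xml version="1.0" encoding="{encoding}"?>'
    return '<?xml version="1.0"?>'


def encode_document(body, encoding):
    """Prefix a declaration and encode the whole document with `encoding`."""
    return (xml_declaration(encoding) + body).encode(encoding)


print(encode_document("<r>caf\u00e9</r>", "utf-8"))
print(encode_document("<r>caf\u00e9</r>", "unicode_escape"))
```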
2020-04-21	Fixed typo.	Leonard Richardson

2020-04-21	Added two distinct UserWarning subclasses for warnings issued from the	Leonard Richardson
	BeautifulSoup constructor which a caller may want to filter out. [bug=1873787]
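Giving each kind of warning its own UserWarning subclass is what makes targeted filtering possible with the standard `warnings` module. In the sketch below, the two subclass names and the `make_soup()` stand-in are hypothetical placeholders, not the names Beautiful Soup uses; the filtering calls are standard library.

```python
import warnings


class ParserGuessWarning(UserWarning):
    """Hypothetical: issued when no parser was specified."""


class MarkupLooksLikeFilenameWarning(UserWarning):
    """Hypothetical: issued when the markup looks like a filename or URL."""


def make_soup(markup, parser=None):
    """Hypothetical constructor stand-in that can issue both warnings."""
    if parser is None:
        warnings.warn("No parser was explicitly specified.", ParserGuessWarning)
    if "<" not in markup and ("/" in markup or "." in markup):
        warnings.warn("Markup looks like a filename or URL.", MarkupLooksLikeFilenameWarning)
    return markup


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Filters inserted later are checked first, so this silences only one subclass.
    warnings.filterwarnings("ignore", category=ParserGuessWarning)
    make_soup("index.html")

names = [w.category.__name__ for w in caught]
print(names)  # ['MarkupLooksLikeFilenameWarning']
```

Because both classes still inherit from UserWarning, a caller who filters UserWarning as a whole continues to silence both.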
2020-04-12	Fixed test failures when run against soupsieve 2.0. Patch by Tomáš	Leonard Richardson
	Chvátal. [bug=1872279]
2020-04-07	Added a notice about the new behavior of .text to the documentation.	Leonard Richardson

2020-04-05	Embedded CSS and JavaScript are now stored in distinct Stylesheet and	Leonard Richardson
	Script string objects (NavigableString subclasses), which are ignored by methods like get_text(). This feature is not supported by the html5lib tree builder. [bug=1868861]
2020-04-04	Added a Russian translation by 'authoress' to the repository.	Leonard Richardson

2020-03-10	Fixed a bug that happened when passing a Unicode filename containing	Leonard Richardson
	non-ASCII characters as markup into Beautiful Soup, on a system that allows Unicode filenames. [bug=1866717]
2020-03-05	Added a performance optimization to PageElement.extract(). Patch by	Leonard Richardson
	Arthur Darcet.
2020-01-01	API CHANGE - Added PageElement.decomposed, a new property which lets you	Leonard Richardson
	check whether you've already called decompose() on a Tag or NavigableString.
2019-12-29	Fixed an unhandled exception when formatting a Tag that had been	Leonard Richardson
	decomposed. [bug=1857767]
2019-12-24	Added docstrings for some but not all tree builders.	Leonard Richardson

2019-12-24	Wrote docstrings for formatter.py.	Leonard Richardson

2019-12-24	Fixed deprecation warning. [bug=1855301]	Leonard Richardson

2019-12-18	Added Python docstrings to all public methods in element.py.	Leonard Richardson

2019-11-11	The html.parser tree builder now correctly handles DOCTYPEs that are	Leonard Richardson
	not uppercase. [bug=1848401]
2019-11-11	Fixed a deprecation warning on Python 3.7. Patch by Colin	Leonard Richardson
	Watson. [bug=1847592]
2019-10-06	Added section on Python 2 sunsetting.	Leonard Richardson

2019-10-05	Avoid a crash when unpickling certain parse trees generated using html5lib	Leonard Richardson
	on Python 3. [bug=1843545]
2019-09-02	Avoid a crash when trying to detect the declared encoding of a	Leonard Richardson
	Unicode document. Raise an explanatory exception when the underlying parser completely rejects the incoming markup. [bug=1838877]
2019-08-26	Fixed the definition of the default XML namespace when using	Leonard Richardson
	lxml 4.4. Patch by Isaac Muse. [bug=1840141]
2019-08-21	When instantiating a BeautifulSoup object, it's now possible to	Leonard Richardson
	provide replacement classes to be instantiated for every tag ('tag_class') or string ('string_class') encountered during parsing, rather than using the default Tag and NavigableString objects.
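Letting the caller choose which classes the builder instantiates can be sketched with the standard library's html.parser. `TinyTreeBuilder`, `SimpleTag`, `SimpleString`, and `CountingTag` are hypothetical stand-ins; only the `tag_class` and `string_class` parameter names come from this entry, and Beautiful Soup's real tree builders do far more (void elements, implicit closes, and so on).

```python
from html.parser import HTMLParser


class SimpleTag:
    """Stand-in tag object: a name, an attribute dict, and a list of children."""

    def __init__(self, name, attrs):
        self.name = name
        self.attrs = dict(attrs)
        self.children = []


class SimpleString(str):
    """Stand-in string object."""


class TinyTreeBuilder(HTMLParser):
    """Hypothetical builder that instantiates caller-supplied node classes.

    Deliberately minimal: no special handling for void elements like <br>.
    """

    def __init__(self, tag_class=SimpleTag, string_class=SimpleString):
        super().__init__()
        self.tag_class = tag_class
        self.string_class = string_class
        self.root = tag_class("[document]", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = self.tag_class(tag, attrs)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest open tag with this name; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].name == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(self.string_class(data))


class CountingTag(SimpleTag):
    """Example custom tag class that counts how many tags were created."""

    created = 0

    def __init__(self, name, attrs):
        super().__init__(name, attrs)
        CountingTag.created += 1


builder = TinyTreeBuilder(tag_class=CountingTag)
builder.feed("<p>Hello <b>world</b></p>")
builder.close()
p = builder.root.children[0]
print(type(p).__name__, CountingTag.created)  # CountingTag 3 (the root, <p>, and <b>)
```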