beautifulsoup.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message	Author
2021-10-09	Moved testing.py into the same package as the tests.	Leonard Richardson

2021-09-12	Ported unit tests to use pytest.	Leonard Richardson

2021-09-07	Goodbye, Python 2. [bug=1942919]	Leonard Richardson

2021-06-01	The 'replace_with()' method now takes a variable number of arguments,	Leonard Richardson
	and can be used to replace a single element with a sequence of elements. Patch by Bill Chandos.
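A minimal sketch of the variadic form described above, assuming Beautiful Soup 4.10.0 or later is installed as `bs4`. The markup and the replacement tags are invented for illustration.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Hello <b>world</b>!</p>", "html.parser")

# Build two replacement tags (illustrative content only).
em = soup.new_tag("em")
em.string = "brave"
i_tag = soup.new_tag("i")
i_tag.string = "world"

# Replace the single <b> element with a sequence: a tag, a string, a tag.
soup.b.replace_with(em, " new ", i_tag)

result = str(soup)
print(result)
```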
2021-05-31	The html.parser tree builder can now handle named entities	Leonard Richardson
	found in the HTML5 spec in much the same way that the html5lib tree builder does. Note that the lxml tree builder still handles named entities differently. [bug=1924908]
2021-04-08	Brought in fuzz tests from the oss-project into Beautiful Soup's unit test	Leonard Richardson
	suite.
2021-02-14	NavigableString and its subclasses now implement the get_text()	Leonard Richardson
	method, as well as the properties .strings and .stripped_strings. These methods will either return the string itself, or nothing, so the only reason to use this is when iterating over a list of mixed Tag and NavigableString objects. [bug=1904309]
2021-02-14	The 'html5' formatter now treats attributes whose values are the	Leonard Richardson
	empty string as HTML boolean attributes. Previously (and in other formatters), an attribute value must be set as None to be treated as a boolean attribute. In a future release, I plan to also give this behavior to the 'html' formatter. Patch by Isaac Muse. [bug=1915424]
2021-02-13	The behavior of methods like .get_text() and .strings now differs	Leonard Richardson
	depending on the type of tag. The change is visible with HTML tags like <script>, <style>, and <template>. Starting in 4.9.0, methods like get_text() returned no results on such tags, because the contents of those tags are not considered 'text' within the document as a whole. But a user who calls script.get_text() is working from a different definition of 'text' than a user who calls div.get_text()--otherwise there would be no need to call script.get_text() at all. In 4.10.0, the contents of (e.g.) a <script> tag are considered 'text' during a get_text() call on the tag itself, but not considered 'text' during a get_text() call on the tag's parent. Because of this change, calling get_text() on each child of a tag may now return a different result than calling get_text() on the tag itself. That's because different tags now have different understandings of what counts as 'text'. [bug=1906226] [bug=1868861]
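The difference between calling get_text() on a tag and on its parent can be sketched as follows. This assumes Beautiful Soup 4.10.0 or later with the html.parser tree builder; the markup is made up for the example.

```python
from bs4 import BeautifulSoup

markup = "<div><p>Hi</p><script>var x = 1;</script></div>"
soup = BeautifulSoup(markup, "html.parser")

# On the parent, the script body is not considered 'text'.
div_text = soup.div.get_text()

# On the <script> tag itself, its contents are considered 'text'.
script_text = soup.script.get_text()

print(repr(div_text), repr(script_text))
```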
2021-02-13	Corrected the use of special string container classes in cases when a	Leonard Richardson
	single tag may contain strings with different containers; such as the <template> tag, which may contain both TemplateString objects and Comment objects. [bug=1913406]
2021-02-13	Added a second way to specify encodings to UnicodeDammit and	Leonard Richardson
	EncodingDetector, based on the order of precedence defined in the HTML5 spec, starting at: https://html.spec.whatwg.org/multipage/parsing.html#parsing-with-a-known-character-encoding Encodings in 'known_definite_encodings' are tried first, then byte-order-mark sniffing is run, then encodings in 'user_encodings' are tried. The old argument, 'override_encodings', is now a deprecated alias for 'known_definite_encodings'. This changes the default behavior of the html.parser and lxml tree builders, in a way that may slightly improve encoding detection but will probably have no effect. [bug=1889014]
2021-02-13	Improve the warning issued when a directory name (as opposed to	Leonard Richardson
	the name of a regular file) is passed as markup into the BeautifulSoup constructor. [bug=1913628]
2021-02-13	Corrected output when the namespace prefix associated with a	Leonard Richardson
	namespaced attribute is the empty string, as opposed to None. [bug=1915583]
2020-09-26	Fixed a bug that inconsistently moved elements over when passing	Leonard Richardson
	a Tag, rather than a list, into Tag.extend(). [bug=1885710]
2020-05-17	Documented some recently added customization features.	Leonard Richardson

2020-05-17	Added a keyword argument on_duplicate_attribute to the	Leonard Richardson
	BeautifulSoupHTMLParser constructor (used by the html.parser tree builder) which lets you customize the handling of markup that contains the same attribute more than once, as in: <a href="url1" href="url2"> [bug=1878209]
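A short sketch of the duplicate-attribute handling described above, assuming a Beautiful Soup release that includes this change and that the keyword is forwarded from the BeautifulSoup constructor to the html.parser tree builder. The URLs are placeholders.

```python
from bs4 import BeautifulSoup

markup = '<a href="http://url1/" href="http://url2/">link</a>'

# Default behavior: the last value for a repeated attribute wins.
default_soup = BeautifulSoup(markup, "html.parser")
default_href = default_soup.a["href"]

# 'ignore' keeps the first value and discards later duplicates.
ignore_soup = BeautifulSoup(markup, "html.parser", on_duplicate_attribute="ignore")
ignored_href = ignore_soup.a["href"]

print(default_href, ignored_href)
```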
2020-04-21	Added two distinct UserWarning subclasses for warnings issued from the	Leonard Richardson
	BeautifulSoup constructor which a caller may want to filter out. [bug=1873787]
2020-04-12	Fixed test failures when run against soupselect 2.0. Patch by Tomáš	Leonard Richardson
	Chvátal. [bug=1872279]
2020-04-05	Embedded CSS and Javascript is now stored in distinct Stylesheet and	Leonard Richardson
	Script tags, which are ignored by methods like get_text(). This feature is not supported by the html5lib treebuilder. [bug=1868861]
2020-01-01	API CHANGE - Added PageElement.decomposed, a new property which lets you	Leonard Richardson
	check whether you've already called decompose() on a Tag or NavigableString.
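A minimal sketch of the new property, assuming a Beautiful Soup release that includes this change. The markup is illustrative.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Keep <b>drop</b></p>", "html.parser")
b_tag = soup.b

before = b_tag.decomposed   # not yet destroyed
b_tag.decompose()           # remove the tag from the tree and destroy it
after = b_tag.decomposed    # now reports that decompose() was called

remaining = str(soup)
print(before, after, remaining)
```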
2019-12-29	Fixed an unhandled exception when formatting a Tag that had been	Leonard Richardson
	decomposed. [bug=1857767]
2019-10-05	Avoid a crash when unpickling certain parse trees generated using html5lib	Leonard Richardson
	on Python 3. [bug=1843545]
2019-09-02	Avoid a crash when trying to detect the declared encoding of a	Leonard Richardson
	Unicode document. Raise an explanatory exception when the underlying parser completely rejects the incoming markup. [bug=1838877]
2019-08-26	It's now possible to override any of the element classes.	Leonard Richardson

2019-08-22	Test the ability to build a tree using objects other than Tag and	Leonard Richardson
	NavigableString.
2019-08-21	Copying a Tag preserves information that was originally obtained from	Leonard Richardson
	the TreeBuilder used to build the original Tag. [bug=1838903]
2019-08-21	Fixed a crash when pretty-printing tags that were not created	Leonard Richardson
	during initial parsing. [bug=1838903]
2019-07-21	Implemented line number tracking for html5lib.	Leonard Richardson

2019-07-21	Adapt Chris Mayo's code to track line number and position when using	Leonard Richardson
	html.parser.
2019-07-16	Suppressed warnings during tests that aren't about the warnings.	Leonard Richardson

2019-07-15	Implemented Tag.smooth.	Leonard Richardson

2019-07-15	Moved the formatter to its own class and updated its documentation.	Leonard Richardson

2019-07-15	Improved comments in tests.	Leonard Richardson

2019-07-14	Give the Formatter class more control over formatting decisions.	Leonard Richardson

2019-07-07	A Formatter can now decide how (or whether) to order the attributes	Leonard Richardson
	inside a tag. [bug=1812422]
2019-07-07	&apos; (which is valid in XML and XHTML, but not HTML 4) is now	Leonard Richardson
	recognized as a named entity and converted to a single quote. [bug=1818721]
2019-07-07	Renamed the cdata_list_attributes argument to multi_valued_attributes since	Leonard Richardson
	it's facing the end-user and that's a more easily understandable name.
2019-07-07	It's now possible to override a TreeBuilder's cdata_list_attributes	Leonard Richardson
	dictionary by passing in a replacement. None will disable the feature altogether. [bug=1832978]
2019-07-07	It's now possible to customize the TreeBuilder object by passing	Leonard Richardson
	keyword arguments into the BeautifulSoup constructor. The main reason to do this right now is to change how multi-valued attributes are treated. [bug=1832978]
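As a sketch of the tree builder customization described above, the following uses the later argument name multi_valued_attributes (see the rename entry in this log) and assumes a Beautiful Soup release that includes both changes. The markup is illustrative.

```python
from bs4 import BeautifulSoup

markup = '<p class="intro lead">text</p>'

# Default: 'class' is treated as a multi-valued attribute and split into a list.
default_soup = BeautifulSoup(markup, "html.parser")
as_list = default_soup.p["class"]

# Passing None through the constructor disables the feature in the tree builder.
plain_soup = BeautifulSoup(markup, "html.parser", multi_valued_attributes=None)
as_string = plain_soup.p["class"]

print(as_list, as_string)
```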
2019-01-06	Fixed an incorrectly raised exception when inserting a tag before or	Leonard Richardson
	after an identical tag. [bug=1810692]
2019-01-06	Don't track un-prefixed namespaces	Isaac Muse

2018-12-31	Improved and tested error checking for insert_before and insert_after.	Leonard Richardson

2018-12-30	Add conveniences for inserting multiple tags	Isaac Muse
	Add extend method to append a list of tags. Make insert_before and insert_after accept multiple arguments.
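A minimal sketch of both conveniences, assuming a Beautiful Soup release that includes this change. The helper make_li is hypothetical and exists only for this example.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<ul><li>b</li></ul>", "html.parser")

def make_li(text):
    """Hypothetical helper: build an <li> tag holding the given text."""
    li = soup.new_tag("li")
    li.string = text
    return li

# extend() appends every tag in the list to the end of <ul>.
soup.ul.extend([make_li("c"), make_li("d")])

# insert_before() accepts several elements and keeps them in order.
soup.ul.li.insert_before(make_li("a1"), make_li("a2"))

items = [li.get_text() for li in soup.find_all("li")]
print(items)
```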
2018-12-23	Merging Isaac Muse's Soup Sieve branch as-is before making some modifications.	Leonard Richardson

2018-12-22	Fix next and previous linkage issues. Fixes issues #1806598 and #1782928.	Isaac Muse

2018-12-19	Add Soup Sieve support	Isaac Muse

2018-07-30	Fix an exception when a custom formatter was asked to format a void	Leonard Richardson
	element. [bug=1784408]
2018-07-28	When markup contains duplicate elements, a select() call that	Leonard Richardson
	includes multiple match clauses will match all relevant elements. [bug=1770596]
2018-07-28	Correctly handle invalid HTML numeric character entities	Leonard Richardson
	that reference code points that are not Unicode code points. Note that this is only fixed when Beautiful Soup is used with the html.parser parser -- html5lib already worked and I couldn't fix it with lxml. [bug=1782933]
2018-07-15	You can pass a dictionary of attributes into	Leonard Richardson
	BeautifulSoup.new_tag. This makes it possible to create a tag with an attribute like 'name' that would otherwise be masked by another argument of new_tag. [bug=1779276]
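A short sketch of why the dictionary form is useful, assuming a Beautiful Soup release that includes this change and that the keyword is called attrs. The tag and attribute values are invented for the example.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("", "html.parser")

# 'name' cannot be passed as a keyword argument because new_tag() already
# uses 'name' for the tag name, so it goes in the attrs dictionary instead.
tag = soup.new_tag("input", attrs={"name": "email", "type": "text"})

tag_name = tag.name
name_attr = tag["name"]
print(tag_name, name_attr)
```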