beautifulsoup.git

Age	Commit message	Author
2023-06-04	Fixed a case found by Mengyuhan where html.parser giving up on	Leonard Richardson
	markup would result in an AssertionError instead of a ParserRejectedMarkup exception.
2023-02-15	When the html.parser parser decides it can't parse a document, Beautiful	Leonard Richardson
	Soup now consistently propagates this fact by raising a ParserRejectedMarkup error. [bug=2007343]
2023-01-27	Got rid of some more warnings by removing code that's not relevant anymore,	Leonard Richardson
	now that the minimum supported Python version is 3.6.
2023-01-27	Warnings now do their best to provide an appropriate stacklevel,	Leonard Richardson
	improving the usefulness of the message. [bug=1978744]
2021-10-24	Issue a warning when an HTML parser is used to parse a document that	Leonard Richardson
	looks like XML but not XHTML. [bug=1939121]
2021-09-07	Goodbye, Python 2. [bug=1942919]	Leonard Richardson

2021-05-31	The html.parser tree builder can now handle named entities	Leonard Richardson
	found in the HTML5 spec in much the same way that the html5lib tree builder does. Note that the lxml tree builder still handles named entities differently. [bug=1924908]
2021-02-13	Added a second way to specify encodings to UnicodeDammit and	Leonard Richardson
	EncodingDetector, based on the order of precedence defined in the HTML5 spec, starting at: https://html.spec.whatwg.org/multipage/parsing.html#parsing-with-a-known-character-encoding Encodings in 'known_definite_encodings' are tried first, then byte-order-mark sniffing is run, then encodings in 'user_encodings' are tried. The old argument, 'override_encodings', is now a deprecated alias for 'known_definite_encodings'. This changes the default behavior of the html.parser and lxml tree builders, in a way that may slightly improve encoding detection but will probably have no effect. [bug=1889014]
2020-05-30	Remove explicit reference to the module name within the module, replacing it	Leonard Richardson
	with __name__.
2020-05-17	Switch entirely to Python 3-style print statements, even in Python 2.	Leonard Richardson

2020-05-17	Documented some recently added customization features.	Leonard Richardson

2020-05-17	Added a keyword argument on_duplicate_attribute to the	Leonard Richardson
	BeautifulSoupHTMLParser constructor (used by the html.parser tree builder) which lets you customize the handling of markup that contains the same attribute more than once, as in: <a href="url1" href="url2"> [bug=1878209]
2019-12-24	Added docstrings for some but not all tree builders.	Leonard Richardson

2019-11-11	Simplified code.	Leonard Richardson

2019-11-11	The html.parser tree builder now correctly handles DOCTYPEs that are	Leonard Richardson
	not uppercase. [bug=1848401]
2019-07-21	Implemented line number tracking for html5lib.	Leonard Richardson

2019-07-21	Adapt Chris Mayo's code to track line number and position when using	Leonard Richardson
	html.parser.
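
The standard-library html.parser module exposes the current line number and column offset through HTMLParser.getpos(). The sketch below shows only that underlying mechanism, not Beautiful Soup's own code; the PositionRecorder class and the sample markup are hypothetical names made up for this illustration.

```python
from html.parser import HTMLParser


class PositionRecorder(HTMLParser):
    """Hypothetical helper: records where each start tag begins."""

    def __init__(self):
        super().__init__()
        self.positions = []

    def handle_starttag(self, tag, attrs):
        # getpos() returns (line, offset): line is 1-based, offset is 0-based.
        line, offset = self.getpos()
        self.positions.append((tag, line, offset))


markup = "<html>\n  <body>\n    <p>hi</p>\n  </body>\n</html>"

recorder = PositionRecorder()
recorder.feed(markup)
recorder.close()

for tag, line, offset in recorder.positions:
    print(f"<{tag}> starts at line {line}, offset {offset}")
```

A tree builder could attach these values to each element as it is created, which is presumably what this commit does for html.parser.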
2019-07-07	It's now possible to override a TreeBuilder's cdata_list_attributes	Leonard Richardson
	dictionary by passing in a replacement. None will disable the feature altogether. [bug=1832978]
2018-12-24	Clarified the software license.	Leonard Richardson

2018-07-28	Correctly handle invalid HTML numeric character entities like	Leonard Richardson
	which reference code points that are not Unicode code points. Note that this is only fixed when Beautiful Soup is used with the html.parser parser -- html5lib already worked and I couldn't fix it with lxml. [bug=1782933]
2018-07-21	Fixed a problem where the html.parser tree builder interpreted	Leonard Richardson
	a string like '&foo ' as the character entity '&foo;' [bug=1728706]
2018-07-15	Stop data loss when encountering an empty numeric entity, and	Leonard Richardson
	possibly in other cases. Thanks to tos.kamiya for the fix. [bug=1698503]
2018-07-14	Stopped HTMLParser from raising an exception in very rare cases of	Leonard Richardson
	bad markup. [bug=1708831]
2017-05-06	Improved the handling of empty-element tags like <br> when using the	Leonard Richardson
	html.parser parser. [bug=1676935]
2016-07-16	Removed imports to pdb, since pdb is not available in some environments.	Leonard Richardson
	[bug=1491700]
2016-07-16	Added a separate class for XML processing instructions, which have a	Leonard Richardson
	slightly different format from SGML processing instructions. [bug=1504383]
2016-07-16	Rename COPYING.txt to LICENSE. Add a reference to LICENSE in every source file.	Leonard Richardson

2015-06-28	It's now possible to pickle a BeautifulSoup object no matter which	Leonard Richardson
	tree builder was used to create it. However, the only tree builder that survives the pickling process is the HTMLParserTreeBuilder ('html.parser'). If you unpickle a BeautifulSoup object created with some other tree builder, soup.builder will be None. [bug=1231545]
2015-06-27	Added an exclude_encodings argument to UnicodeDammit and to the	Leonard Richardson
	Beautiful Soup constructor, which lets you prohibit the detection of an encoding that you know is wrong. [bug=1469408]
2015-06-24	Fixed an import error in Python 3.5 caused by the removal of the	Leonard Richardson

2015-06-24	Made double sure that we don't use the 'strict' constructor argument when	Leonard Richardson
	it's deprecated. [bug=1341055]
2014-12-11	Improved the lxml tree builder's handling of processing	Leonard Richardson
	instructions. [bug=1294645]
2014-12-07	In Python 3.4 and above, set the new convert_charrefs argument to	Leonard Richardson
	the html.parser constructor to avoid a warning and future failures. Patch by Stefano Revera. [bug=1375721]
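
For context, here is a minimal sketch of what the convert_charrefs argument changes in the standard-library HTMLParser. It does not show which value Beautiful Soup passes; the TextCollector class and collect() helper are hypothetical names used only for this illustration.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: logs every text-related callback it receives."""

    def __init__(self, convert_charrefs):
        super().__init__(convert_charrefs=convert_charrefs)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(("data", data))

    def handle_entityref(self, name):
        # Only called when convert_charrefs is False.
        self.chunks.append(("entityref", name))

    def handle_charref(self, name):
        # Only called when convert_charrefs is False.
        self.chunks.append(("charref", name))


def collect(markup, convert_charrefs):
    parser = TextCollector(convert_charrefs)
    parser.feed(markup)
    parser.close()
    return parser.chunks


markup = "<p>caf&eacute; &#38; tea</p>"
converted = collect(markup, convert_charrefs=True)
raw = collect(markup, convert_charrefs=False)

print(converted)  # references arrive already decoded inside the text
print(raw)        # references arrive through separate callbacks
```

With convert_charrefs=False the caller decides how each reference is resolved, which is the kind of control a tree builder needs when it handles entities itself.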
2014-12-07	Issue a warning if the BeautifulSoup constructor arguments do not explicitly	Leonard Richardson
	name a parser.
2013-10-01	Fixed a bug in which short Unicode input was improperly encoded to ASCII	Leonard Richardson
	when checking whether or not it was a file on disk. [bug=1227016]
2013-06-02	Merged in big encoding-detection refactoring branch.	Leonard Richardson

2013-05-31	The html.parser treebuilder can now handle numeric attributes in	Leonard Richardson
	text when the hexadecimal name of the attribute starts with a capital X.
2013-05-31	Create a new lxml parser object for every new parsing strategy.	Leonard Richardson

2013-05-07	Now that lxml's segfault on invalid doctype has been fixed, fix a	Leonard Richardson
	corresponding problem on the Beautiful Soup end that was previously invisible. [bug=984936]
2012-04-18	Changed wording slightly.	Leonard Richardson

2012-04-18	Print a warning on HTMLParseErrors to let people know they should install an	Leonard Richardson
	external parser.
2012-04-18	Fixed a bug that made the HTMLParser treebuilder generate XML definitions	Leonard Richardson
	ending with two question marks instead of one. [bug=984258]
2012-02-21	Added nsprefix argument to the tag class.	Leonard Richardson

2012-02-21	Merged from trunk.	Leonard Richardson

2012-02-20	It's now possible to copy a BeautifulSoup object created with the	Leonard Richardson
	html.parser treebuilder.
2012-02-20	Changed the class structure so that the default parser test class uses	Leonard Richardson
	html.parser.
2012-02-16	It's a start, at least.	Leonard Richardson

2012-02-09	As a last-ditch attempt to turn data into Unicode, use errors=replace	Leonard Richardson
	instead of errors=strict.
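
The difference between the two error handlers can be seen with plain bytes.decode(). This is a sketch of the general technique, not the Unicode, Dammit code itself; the sample bytes and variable names are made up for this illustration.

```python
# Latin-1 encoded bytes; the 0xE9 byte is not valid UTF-8 in this position.
data = b"caf\xe9 au lait"

try:
    data.decode("utf-8", errors="strict")
    strict_ok = True
except UnicodeDecodeError:
    # errors="strict" refuses the whole input on the first bad byte.
    strict_ok = False

# errors="replace" substitutes U+FFFD for the bad byte and keeps the rest.
replaced = data.decode("utf-8", errors="replace")

print(strict_ok)
print(replaced)
```

Replacing is lossy, so it makes sense only as a final fallback once every candidate encoding has failed under strict decoding.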
2012-02-09	Minor Unicode, Dammit cleanup.	Leonard Richardson

2012-02-06	Monkeypatch Python 3.2 versions prior to 3.2.3 to solve some major	Leonard Richardson
	HTMLParser bugs.