beautifulsoup.git

Age	Commit message	Author
2021-12-19	Remove a huge list of HTML entities that was only necessary under Python 2.	Leonard Richardson

2021-12-19	Removed support for the iconv_codec library, which doesn't seem	Leonard Richardson
	to exist anymore and was never put up on PyPI. (The closest replacement on PyPI, iconv_codecs, is GPL-licensed, so we can't use it.)
2021-12-19	If the charset-normalizer Python module	Leonard Richardson
	(https://pypi.org/project/charset-normalizer/) is installed, Beautiful Soup will use it to detect the character sets of incoming documents. This is also the module used by newer versions of the Requests library. For the sake of backwards compatibility, chardet and cchardet both take precedence if installed. [bug=1955346]
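
The precedence described in this commit can be sketched as an import cascade. This is a hypothetical helper (`pick_charset_detector` and `guess_encoding` are not part of Beautiful Soup). It assumes each module exposes a chardet-style `detect(bytes)` function returning a dict with an `"encoding"` key, which cchardet, chardet, and charset-normalizer all provide.

```python
import importlib
from typing import Callable, Optional, Tuple

# Hypothetical ordering taken from the commit messages: cchardet (faster),
# then chardet, then charset-normalizer as the newest fallback.
DETECTOR_MODULES = ("cchardet", "chardet", "charset_normalizer")


def pick_charset_detector() -> Tuple[Optional[str], Optional[Callable[[bytes], dict]]]:
    """Return (module_name, detect_function) for the first importable detector.

    Returns (None, None) when none of the modules is installed.
    """
    for module_name in DETECTOR_MODULES:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # not installed (or unusable); try the next one
        return module_name, module.detect
    return None, None


def guess_encoding(data: bytes) -> Optional[str]:
    """Ask the chosen detector for its best guess, or return None."""
    _, detect = pick_charset_detector()
    if detect is None:
        return None
    return detect(data).get("encoding")
```

Keeping the order in one tuple makes the backwards-compatibility rule explicit: a newly supported detector is appended at the end, so existing installations keep their old behavior.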
2021-09-07	Goodbye, Python 2. [bug=1942919]	Leonard Richardson

2021-05-31	The html.parser tree builder can now handle named entities	Leonard Richardson
	found in the HTML5 spec in much the same way that the html5lib tree builder does. Note that the lxml tree builder still handles named entities differently. [bug=1924908]
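
Python 3's standard library provides both entity tables involved here: `html.entities.html5` (HTML5 names, with keys ending in ';') and `html.entities.name2codepoint` (HTML 4 names). The helpers below are hypothetical and only contrast the two tables; `apos`, which a 2019 commit in this log mentions, is defined in the HTML5 table but not the HTML 4 one.

```python
import html
from html.entities import html5, name2codepoint
from typing import Optional


def resolve_html5(name: str) -> Optional[str]:
    """Look up a named entity in the HTML5 table (its keys include the ';')."""
    return html5.get(name + ";")


def resolve_html4(name: str) -> Optional[str]:
    """Look up a named entity in the HTML 4 table (maps names to code points)."""
    codepoint = name2codepoint.get(name)
    return chr(codepoint) if codepoint is not None else None


for name in ("lt", "apos"):
    print(name, repr(resolve_html5(name)), repr(resolve_html4(name)))

# html.unescape() also follows the HTML5 table.
print(html.unescape("It&apos;s"))
```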
2021-02-13	Added a second way to specify encodings to UnicodeDammit and	Leonard Richardson
	EncodingDetector, based on the order of precedence defined in the HTML5 spec (see https://html.spec.whatwg.org/multipage/parsing.html#parsing-with-a-known-character-encoding). Encodings in 'known_definite_encodings' are tried first, then byte-order-mark sniffing is run, then encodings in 'user_encodings' are tried. The old argument, 'override_encodings', is now a deprecated alias for 'known_definite_encodings'. This changes the default behavior of the html.parser and lxml tree builders, in a way that may slightly improve encoding detection but will probably have no effect. [bug=1889014]
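
The order of precedence described above can be sketched as a generator that yields candidate encodings. This is a simplified illustration, not the library's EncodingDetector: the function names are hypothetical, BOM sniffing covers only UTF-8 and UTF-16, and the `exclude` parameter mimics the exclude_encodings argument added in 2015.

```python
import codecs
from typing import Iterable, Iterator, Optional

# Byte-order marks checked in this sketch (UTF-8 and both UTF-16 byte orders).
BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
)


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading byte-order mark, if any."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding
    return None


def candidate_encodings(
    data: bytes,
    known_definite_encodings: Iterable[str] = (),
    user_encodings: Iterable[str] = (),
    exclude: Iterable[str] = (),
) -> Iterator[str]:
    """Yield each candidate once, in order: definite encodings, BOM, user guesses."""
    excluded = {e.lower() for e in exclude}
    seen = set()

    def fresh(encoding: Optional[str]) -> bool:
        # Skip missing, excluded, or already-yielded encodings (case-insensitive).
        if encoding is None:
            return False
        key = encoding.lower()
        if key in excluded or key in seen:
            return False
        seen.add(key)
        return True

    for encoding in known_definite_encodings:
        if fresh(encoding):
            yield encoding
    bom_encoding = sniff_bom(data)
    if fresh(bom_encoding):
        yield bom_encoding
    for encoding in user_encodings:
        if fresh(encoding):
            yield encoding


data = codecs.BOM_UTF16_LE + "hi".encode("utf-16le")
print(list(candidate_encodings(data, ["latin-1"], ["utf-8"])))
```

A caller would try each candidate in turn and stop at the first one that decodes cleanly. Keeping the guessing step separate from the decoding step also matches the 2013 refactoring recorded further down this log.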
2020-05-17	Switch entirely to Python 3-style print statements, even in Python 2.	Leonard Richardson

2019-12-24	Fixed deprecation warning. [bug=1855301]	Leonard Richardson

2019-12-24	Added docstrings to all public methods in dammit.py.	Leonard Richardson

2019-09-02	Avoid a crash when trying to detect the declared encoding of a	Leonard Richardson
	Unicode document. Raise an explanatory exception when the underlying parser completely rejects the incoming markup. [bug=1838877]
2019-07-07	&apos; (which is valid in XML and XHTML, but not HTML 4) is now	Leonard Richardson
	recognized as a named entity and converted to a single quote. [bug=1818721]
2018-12-24	Clarified the software license.	Leonard Richardson

2018-07-14	Fixed code that was causing deprecation warnings in recent Python 3	Leonard Richardson
	versions. Includes a patch from Ville Skyttä. [bug=1778909] [bug=1689496]
2016-12-19	Indentation change contributed by Pranav Salunke.	Leonard Richardson

2016-07-17	Use a dedicated logger instead of the root logger. [bug=1511661]	Leonard Richardson

2016-07-16	Removed imports to pdb, since pdb is not available in some environments.	Leonard Richardson
	[bug=1491700]
2016-07-16	Rename COPYING.txt to LICENSE. Add a reference to LICENSE in every source file.	Leonard Richardson

2016-04-06	Minor change. Extra indent for character so it looks nicer.	Pranav Salunke

2015-09-28	Add a __license__ statement to all source files.	Leonard Richardson

2015-07-03	Unicode data cannot have a byte-order mark. Returning early stops a warning	Leonard Richardson
	from happening.
2015-06-27	Added an exclude_encodings argument to UnicodeDammit and to the	Leonard Richardson
	Beautiful Soup constructor, which lets you prohibit the detection of an encoding that you know is wrong. [bug=1469408]
2015-06-26	Added a sanity check helper method that makes sure all the elements of a	Leonard Richardson
	tree are properly connected via .next_element and .previous_element.
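
A check of this kind can be sketched on a toy doubly linked list. The `Node` class, `link`, and `check_links` below are hypothetical stand-ins, not Beautiful Soup's tree classes or its helper; they only reuse the .next_element/.previous_element attribute names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """Toy stand-in for a parsed element (hypothetical, not a bs4 class)."""
    name: str
    next_element: Optional["Node"] = field(default=None, repr=False)
    previous_element: Optional["Node"] = field(default=None, repr=False)


def link(nodes: List[Node]) -> None:
    """Chain nodes in document order via next_element/previous_element."""
    for before, after in zip(nodes, nodes[1:]):
        before.next_element = after
        after.previous_element = before


def check_links(first: Node) -> int:
    """Walk forward from `first`, verifying every back-pointer.

    Returns the number of nodes visited; raises ValueError on a broken
    link or a cycle.
    """
    if first.previous_element is not None:
        raise ValueError(f"{first.name} is not the first element")
    visited = set()
    current: Optional[Node] = first
    while current is not None:
        if id(current) in visited:
            raise ValueError(f"cycle detected at {current.name}")
        visited.add(id(current))
        following = current.next_element
        if following is not None and following.previous_element is not current:
            raise ValueError(
                f"{following.name}.previous_element does not point back to {current.name}"
            )
        current = following
    return len(visited)


nodes = [Node("html"), Node("head"), Node("body")]
link(nodes)
print(check_links(nodes[0]))  # 3
```
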
2015-06-25	Fixed a crash in Unicode, Dammit's encoding detector when the name	Leonard Richardson
	of the encoding itself contained invalid bytes. [bug=1360913]
2013-10-02	Fixed a bug that caused Unicode data put into UnicodeDammit to	Leonard Richardson
	return None instead of the original data. [bug=1214983]
2013-06-03	Inlined some commonly called code to save a function call.	Leonard Richardson

2013-06-03	Limit how much of the document is searched via regular expression for a	Leonard Richardson
	declared encoding.
2013-06-02	Turns out we had two bits of code to strip byte-order marks.	Leonard Richardson

2013-06-02	It turns out most of the untested code wasn't doing anything useful.	Leonard Richardson

2013-05-31	Create a new lxml parser object for every new parsing strategy.	Leonard Richardson

2013-05-30	Refactored code a bit.	Leonard Richardson

2013-05-30	Split out the code that guesses at encodings from the code that tries to	Leonard Richardson
	decode a bytestring based on those encodings. This is necessary because lxml wants to do the decoding itself.
2013-05-20	The default XML formatter will now replace ampersands even if they appear to	Leonard Richardson
	be part of entities. That is, "&lt;" will become "&amp;lt;". [bug=1182183]
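
The same behavior can be seen in the standard library's `xml.sax.saxutils.escape`, which also escapes every ampersand without checking whether it already begins an entity. This is an analogy for illustration, not Beautiful Soup's formatter code.

```python
from xml.sax.saxutils import escape

# Every "&" is escaped, including one that already starts an entity reference,
# so text that literally contains "&lt;" survives a round trip through XML.
original_text = "Use &lt; for a less-than sign & friends"
print(escape(original_text))
# Use &amp;lt; for a less-than sign &amp; friends
```
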
2012-11-03	Doc fixes.	Leonard Richardson

2012-08-17	Fixed cchardet import.	Leonard Richardson

2012-07-03	Mentioned cchardet in docs.	Leonard Richardson

2012-07-03	When sniffing encodings, if the cchardet library is installed, use it	Leonard Richardson
	instead of chardet. It's much faster. [bug=1020748]
2012-07-03	Use logging.warning() instead of warnings.warn() to notify the user that	Leonard Richardson
	characters were replaced with REPLACEMENT CHARACTER. [bug=1013862]
2012-05-24	Comments, processing instructions, document type declarations, and markup	Leonard Richardson
	declarations are now treated as preformatted strings, the way CData blocks are. [bug=1001025] Also in this commit: renamed detwingle method to detwingle().
2012-05-03	Fixed the handling of &quot; with the built-in parser. [bug=993871]	Leonard Richardson

2012-04-27	Added experimental support for fixing Windows-1252 characters embedded in	Leonard Richardson
	UTF-8 documents.
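
The idea can be sketched as a byte scanner: keep anything that forms a valid UTF-8 sequence, and reinterpret any other high byte as a single Windows-1252 character. This simplified `detwingle_sketch` is hypothetical and is not the library's implementation; bytes that Windows-1252 leaves undefined become U+FFFD here.

```python
def detwingle_sketch(data: bytes) -> bytes:
    """Return UTF-8 bytes, converting stray Windows-1252 bytes (hypothetical helper)."""
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if byte < 0x80:  # ASCII is identical in both encodings
            out.append(byte)
            i += 1
            continue
        # Length of the UTF-8 sequence this byte would start, if it is a lead byte.
        if 0xC2 <= byte <= 0xDF:
            length = 2
        elif 0xE0 <= byte <= 0xEF:
            length = 3
        elif 0xF0 <= byte <= 0xF4:
            length = 4
        else:
            length = 0
        if length:
            chunk = data[i:i + length]
            try:
                chunk.decode("utf-8")
            except UnicodeDecodeError:
                pass  # not valid UTF-8 here; fall through to Windows-1252
            else:
                out += chunk
                i += length
                continue
        # Treat the lone byte as Windows-1252 and re-encode it as UTF-8.
        out += bytes([byte]).decode("cp1252", errors="replace").encode("utf-8")
        i += 1
    return bytes(out)


mixed = "café ".encode("utf-8") + "“quoted”".encode("cp1252")
print(detwingle_sketch(mixed).decode("utf-8"))  # café “quoted”
```
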
2012-04-26	Fixed a bug in decoding data that contained a byte-order mark, such as data	Leonard Richardson
	encoded in UTF-16LE. [bug=988980]
2012-04-16	Unicode, Dammit now has an option to turn MS smart quotes into ASCII characters.	Leonard Richardson
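
Converting smart quotes to ASCII amounts to substituting the four Windows-1252 curly-quote bytes (0x91 to 0x94) before decoding. The sketch below assumes the input is Windows-1252; `smart_quotes_to_ascii` is a hypothetical name, not the option's actual spelling in Unicode, Dammit.

```python
import re

# Windows-1252 curly quotes mapped to their ASCII look-alikes.
ASCII_QUOTES = {
    b"\x91": b"'",  # LEFT SINGLE QUOTATION MARK
    b"\x92": b"'",  # RIGHT SINGLE QUOTATION MARK
    b"\x93": b'"',  # LEFT DOUBLE QUOTATION MARK
    b"\x94": b'"',  # RIGHT DOUBLE QUOTATION MARK
}


def smart_quotes_to_ascii(data: bytes) -> str:
    """Replace curly-quote bytes, then decode the rest as Windows-1252."""
    replaced = re.sub(rb"[\x91-\x94]", lambda m: ASCII_QUOTES[m.group(0)], data)
    return replaced.decode("cp1252")


print(smart_quotes_to_ascii(b"\x93Hello,\x94 she said. It\x92s late."))
# "Hello," she said. It's late.
```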

2012-04-16	Attribute values are now run through the provided output formatter.	Leonard Richardson
	Previously they were always run through the 'minimal' formatter. [bug=980237]
2012-02-16	Issue a warning if characters were replaced with REPLACEMENT CHARACTER	Leonard Richardson
	during Unicode conversion.
2012-02-09	As a last-ditch attempt to turn data into Unicode, use errors=replace	Leonard Richardson
	instead of errors=strict.
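
The fallback described in these commits can be sketched as: try each candidate encoding strictly, and only if all of them fail, decode lossily with `errors="replace"` and log a warning through a dedicated, non-root logger. The function name, the logger name, and the choice of UTF-8 for the last-ditch pass are assumptions for illustration; `logging` and the codec error handlers are standard library.

```python
import logging
from typing import Iterable, Tuple

# A dedicated logger rather than the root logger (hypothetical name).
logger = logging.getLogger("encoding_sketch")

REPLACEMENT_CHARACTER = "\ufffd"


def decode_with_fallback(data: bytes, candidates: Iterable[str]) -> Tuple[str, str]:
    """Return (text, encoding_used), decoding lossily only as a last resort."""
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            continue  # wrong guess or unknown codec name; try the next one
    # Last ditch (assumed UTF-8): errors="replace" never raises, but may lose data.
    text = data.decode("utf-8", errors="replace")
    if REPLACEMENT_CHARACTER in text:
        logger.warning(
            "Some characters could not be decoded, and were "
            "replaced with REPLACEMENT CHARACTER."
        )
    return text, "utf-8"
```
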
2012-02-09	Unicode, Dammit now detects the encoding in HTML 5-style <meta> tags like	Leonard Richardson
	<meta charset="utf-8" />. [bug=837268]
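
Detecting a declared encoding can be sketched with a bytes regular expression that accepts both the HTML 5 form, <meta charset="utf-8" />, and the older http-equiv/content form, scanning only a bounded prefix of the document. The pattern, the function name, and the 1,024-byte limit are assumptions for illustration, not the library's actual regex or limit.

```python
import re
from typing import Optional

# Matches <meta charset="..."> and <meta http-equiv=... content="...; charset=...">.
META_CHARSET = re.compile(
    rb"""<\s*meta[^>]+charset\s*=\s*["']?\s*([A-Za-z0-9_:.\-]+)""",
    re.IGNORECASE,
)

SEARCH_LIMIT = 1024  # hypothetical cap on how many bytes are scanned


def find_meta_charset(data: bytes, limit: int = SEARCH_LIMIT) -> Optional[str]:
    """Return the lowercased charset declared in a <meta> tag near the start."""
    match = META_CHARSET.search(data[:limit])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()
```

Bounding the search keeps the cost constant on very large documents, at the price of missing a declaration that appears unusually late.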
2012-02-09	Minor Unicode, Dammit cleanup.	Leonard Richardson

2012-02-09	Improved Unicode, Dammit's behavior when you give it Unicode to begin with.	Leonard Richardson

2011-06-29	Various changes so most tests pass on Python 3.	Thomas Kluyver