beautifulsoup.git

Age	Commit message	Author
2024-08-21	* Changes to make tests work whether tests are run under soupsieve 2.6	Leonard Richardson
	or an earlier version. Based on a patch by Stefano Rivera. * Removed the strip_cdata argument to lxml's HTMLParser constructor, which never did anything and is deprecated as of lxml 5.3.0. Patch by Stefano Rivera. [bug=2076897]
2024-02-12	Applied patch from Marc Müller to add a stacklevel to a warning that was	Leonard Richardson
	missing it.
2024-01-17	Added the correct stacklevel to instances of the XMLParsedAsHTMLWarning.	Leonard Richardson
	[bug=2034451]
2023-06-04	Fixed a case found by Mengyuhan where html.parser giving up on	Leonard Richardson
	markup would result in an AssertionError instead of a ParserRejectedMarkup exception.
2023-02-15	When the html.parser parser decides it can't parse a document, Beautiful	Leonard Richardson
	Soup now consistently propagates this fact by raising a ParserRejectedMarkup error. [bug=2007343]
2023-01-27	Got rid of some more warnings by removing code that's not relevant anymore,	Leonard Richardson
	now that the minimum supported Python version is 3.6.
2023-01-27	Warnings now do their best to provide an appropriate stacklevel,	Leonard Richardson
	improving the usefulness of the message. [bug=1978744]
2022-04-10	Fixed another crash when overriding multi_valued_attributes and using the	Leonard Richardson
	html5lib parser. [bug=1948488]
2021-11-29	Do a better job of keeping track of namespaces as an XML document is	Leonard Richardson
	parsed, so that CSS selectors that use namespaces will do the right thing more often. [bug=1946243]
2021-10-24	Issue a warning when an HTML parser is used to parse a document that	Leonard Richardson
	looks like XML but not XHTML. [bug=1939121]
2021-10-23	Added a workaround for an lxml bug	Leonard Richardson
	(https://bugs.launchpad.net/lxml/+bug/1948551) that caused problems when parsing a Unicode string beginning with BYTE ORDER MARK. [bug=1947768]
2021-10-23	Fixed a crash when overriding multi_valued_attributes and using the	Leonard Richardson
	html5lib parser. [bug=1948488]
2021-10-11	Added special string classes, RubyParenthesisString and RubyTextString,	Leonard Richardson
	to make it possible to treat ruby text specially in get_text() calls. [bug=1941980]
2021-09-07	Goodbye, Python 2. [bug=1942919]	Leonard Richardson

2021-05-31	The html.parser tree builder can now handle named entities	Leonard Richardson
	found in the HTML5 spec in much the same way that the html5lib tree builder does. Note that the lxml tree builder still handles named entities differently. [bug=1924908]
2021-02-13	Added a second way to specify encodings to UnicodeDammit and	Leonard Richardson
	EncodingDetector, based on the order of precedence defined in the HTML5 spec, starting at: https://html.spec.whatwg.org/multipage/parsing.html#parsing-with-a-known-character-encoding Encodings in 'known_definite_encodings' are tried first, then byte-order-mark sniffing is run, then encodings in 'user_encodings' are tried. The old argument, 'override_encodings', is now a deprecated alias for 'known_definite_encodings'. This changes the default behavior of the html.parser and lxml tree builders, in a way that may slightly improve encoding detection but will probably have no effect. [bug=1889014]
2020-05-30	Remove explicit reference to the module name within the module, replacing it	Leonard Richardson
	with __name__.
2020-05-17	Switch entirely to Python 3-style print statements, even in Python 2.	Leonard Richardson

2020-05-17	Documented some recently added customization features.	Leonard Richardson

2020-05-17	Added a keyword argument on_duplicate_attribute to the	Leonard Richardson
	BeautifulSoupHTMLParser constructor (used by the html.parser tree builder) which lets you customize the handling of markup that contains the same attribute more than once, as in: <a href="url1" href="url2"> [bug=1878209]
2020-04-05	Embedded CSS and Javascript is now stored in distinct Stylesheet and	Leonard Richardson
	Script tags, which are ignored by methods like get_text(). This feature is not supported by the html5lib treebuilder. [bug=1868861]
2019-12-24	Added docstrings for some but not all tree builders.	Leonard Richardson

2019-11-11	Simplified code.	Leonard Richardson

2019-11-11	The html.parser tree builder now correctly handles DOCTYPEs that are	Leonard Richardson
	not uppercase. [bug=1848401]
2019-11-11	Added a Brazilian Portuguese translation by Cezar Peixeiro.	Leonard Richardson

2019-09-02	Avoid a crash when trying to detect the declared encoding of a	Leonard Richardson
	Unicode document. Raise an explanatory exception when the underlying parser completely rejects the incoming markup. [bug=1838877]
2019-07-21	Implemented line number tracking for html5lib.	Leonard Richardson

2019-07-21	Adapt Chris Mayo's code to track line number and position when using	Leonard Richardson
	html.parser.
2019-07-14	Give the Formatter class more control over formatting decisions.	Leonard Richardson

2019-07-07	Renamed the cdata_list_attributes argument to multi_valued_attributes since	Leonard Richardson
	it's facing the end-user and that's a more easily understandable name.
2019-07-07	It's now possible to override a TreeBuilder's cdata_list_attributes	Leonard Richardson
	dictionary by passing in a replacement. None will disable the feature altogether. [bug=1832978]
2019-01-06	Don't track un-prefixed namespaces	Isaac Muse

2018-12-30	Fixed a problem with multi-valued attributes where the value	Leonard Richardson
	contained whitespace. Thanks to Jens Svalgaard for the fix. [bug=1787453]
2018-12-24	Clarified the software license.	Leonard Richardson

2018-12-24	Keep track of the namespace abbreviations found while parsing the document.	Leonard Richardson
	This makes select() work most of the time without requiring a value for 'namespaces'.
2018-12-22	Fix next and previous linkage issues. Fixes issues #1806598 and #1782928.	Isaac Muse

2018-08-12	Converted README to Markdown format.	Leonard Richardson

2018-07-28	Correctly handle invalid HTML numeric character entities like	Leonard Richardson
	which reference code points that are not Unicode code points. Note that this is only fixed when Beautiful Soup is used with the html.parser parser -- html5lib already worked and I couldn't fix it with lxml. [bug=1782933]
2018-07-21	Fixed a problem where the html.parser tree builder interpreted	Leonard Richardson
	a string like '&foo ' as the character entity '&foo;' [bug=1728706]
2018-07-18	Preserve XML namespaces when they are introduced inside an XML	Leonard Richardson
	document, not just the ones introduced at the top level. [bug=1718787]
2018-07-15	Introduced the Formatter system. [bug=1716272].	Leonard Richardson

2018-07-15	It's possible for a TreeBuilder subclass to specify that void	Leonard Richardson
	elements should be represented as <element> rather than <element/>, by setting TreeBuilder.void_element_close_prefix to the empty string. [bug=1716272]
2018-07-15	Stop data loss when encountering an empty numeric entity, and	Leonard Richardson
	possibly in other cases. Thanks to tos.kamiya for the fix. [bug=1698503]
2018-07-14	Stopped HTMLParser from raising an exception in very rare cases of	Leonard Richardson
	bad markup. [bug=1708831]
2017-05-06	Improved the handling of empty-element tags like <br> when using the	Leonard Richardson
	html.parser parser. [bug=1676935]
2017-05-06	HTML parsers treat all HTML4 and HTML5 empty element tags (aka void element	Leonard Richardson
	tags) correctly. [bug=1656909]
2016-12-19	Fixed foster parenting when html5lib is the tree builder. Thanks to Geoffrey	Leonard Richardson
	Sneddon for a patch and test.
2016-12-19	Fixed yet another problem that caused the html5lib tree builder to	Leonard Richardson

2016-07-30	Explained why we test both unicode and bytestring processing instructions.	Leonard Richardson

2016-07-26	Fixed a reported (but not duplicated) bug involving processing instructions	Leonard Richardson
	fed into the lxml HTML parser.
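
The 2021-02-13 entry above describes an order of precedence for encodings: 'known_definite_encodings' first, then byte-order-mark sniffing, then 'user_encodings'. The following is a minimal standard-library sketch of that ordering only. It is not Beautiful Soup's implementation: the helper names sniff_bom, candidate_encodings and decode are hypothetical, and only the two argument names are taken from the log entry.

```python
import codecs

# Byte-order marks to sniff. UTF-32 comes first because its little-endian
# BOM begins with the same two bytes as the UTF-16 little-endian BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data):
    """Return (encoding, data_without_bom), or (None, data) if no BOM is found."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, data[len(bom):]
    return None, data


def candidate_encodings(data, known_definite_encodings=(), user_encodings=()):
    """Return the encodings to try, in the order the log entry describes."""
    bom_encoding, _ = sniff_bom(data)
    ordered = list(known_definite_encodings)   # 1. definite encodings
    if bom_encoding is not None:
        ordered.append(bom_encoding)           # 2. byte-order-mark sniffing
    ordered.extend(user_encodings)             # 3. user-suggested encodings
    seen, result = set(), []
    for name in ordered:
        if name.lower() not in seen:           # skip repeats, keep first position
            seen.add(name.lower())
            result.append(name)
    return result


def decode(data, known_definite_encodings=(), user_encodings=()):
    """Decode with the first candidate that succeeds; return (text, encoding)."""
    bom_encoding, stripped = sniff_bom(data)
    for name in candidate_encodings(data, known_definite_encodings, user_encodings):
        # Only drop the BOM bytes when trying the encoding the BOM announced.
        payload = stripped if name == bom_encoding else data
        try:
            return payload.decode(name), name
        except (UnicodeDecodeError, LookupError):
            continue
    raise ValueError("no candidate encoding could decode the data")
```

With this sketch, decode(b"caf\xe9", user_encodings=["utf-8", "latin-1"]) falls through the failing UTF-8 attempt and decodes the bytes as latin-1. The real library's behavior may differ in details such as how the byte-order mark is stripped.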