Age | Commit message | Author | |
---|---|---|---|
2024-02-12 | Applied patch from Marc Müller to add a stacklevel to a warning that was missing it. | Leonard Richardson | |
2024-01-17 | Added the correct stacklevel to instances of the XMLParsedAsHTMLWarning. [bug=2034451] | Leonard Richardson | |
2022-04-10 | Fixed another crash when overriding multi_valued_attributes and using the html5lib parser. [bug=1948488] | Leonard Richardson | |
2021-10-24 | Issue a warning when an HTML parser is used to parse a document that looks like XML but not XHTML. [bug=1939121] | Leonard Richardson | |
2021-10-11 | Added special string classes, RubyParenthesisString and RubyTextString, to make it possible to treat ruby text specially in get_text() calls. [bug=1941980] | Leonard Richardson | |
2021-09-07 | Goodbye, Python 2. [bug=1942919] | Leonard Richardson | |
2021-02-13 | Added a second way to specify encodings to UnicodeDammit and EncodingDetector, based on the order of precedence defined in the HTML5 spec, starting at: https://html.spec.whatwg.org/multipage/parsing.html#parsing-with-a-known-character-encoding Encodings in 'known_definite_encodings' are tried first, then byte-order-mark sniffing is run, then encodings in 'user_encodings' are tried. The old argument, 'override_encodings', is now a deprecated alias for 'known_definite_encodings'. This changes the default behavior of the html.parser and lxml tree builders, in a way that may slightly improve encoding detection but will probably have no effect. [bug=1889014] | Leonard Richardson | |
2020-05-30 | Remove explicit reference to the module name within the module, replacing it with __name__. | Leonard Richardson | |
2020-05-17 | Switch entirely to Python 3-style print statements, even in Python 2. | Leonard Richardson | |
2020-04-05 | Embedded CSS and Javascript is now stored in distinct Stylesheet and Script tags, which are ignored by methods like get_text(). This feature is not supported by the html5lib treebuilder. [bug=1868861] | Leonard Richardson | |
2019-12-24 | Added docstrings for some but not all tree builders. | Leonard Richardson | |
2019-09-02 | Avoid a crash when trying to detect the declared encoding of a Unicode document. Raise an explanatory exception when the underlying parser completely rejects the incoming markup. [bug=1838877] | Leonard Richardson | |
2019-07-21 | Adapt Chris Mayo's code to track line number and position when using html.parser. | Leonard Richardson | |
2019-07-14 | Give the Formatter class more control over formatting decisions. | Leonard Richardson | |
2019-07-07 | Renamed the cdata_list_attributes argument to multi_valued_attributes since it's facing the end-user and that's a more easily understandable name. | Leonard Richardson | |
2019-07-07 | It's now possible to override a TreeBuilder's cdata_list_attributes dictionary by passing in a replacement. None will disable the feature altogether. [bug=1832978] | Leonard Richardson | |
2018-12-30 | Fixed a problem with multi-valued attributes where the value contained whitespace. Thanks to Jens Svalgaard for the fix. [bug=1787453] | Leonard Richardson | |
2018-12-24 | Clarified the software license. | Leonard Richardson | |
2018-12-24 | Keep track of the namespace abbreviations found while parsing the document. This makes select() work most of the time without requiring a value for 'namespaces'. | Leonard Richardson | |
2018-08-12 | Converted README to Markdown format. | Leonard Richardson | |
2018-07-15 | Introduced the Formatter system. [bug=1716272]. | Leonard Richardson | |
2018-07-15 | It's possible for a TreeBuilder subclass to specify that void elements should be represented as <element> rather than <element/>, by setting TreeBuilder.void_element_close_prefix to the empty string. [bug=1716272] | Leonard Richardson | |
2017-05-06 | HTML parsers treat all HTML4 and HTML5 empty element tags (aka void element tags) correctly. [bug=1656909] | Leonard Richardson | |
2016-07-16 | The contents of <textarea> tags will no longer be modified when the tree is prettified. [bug=1555829] | Leonard Richardson | |
2016-07-16 | Rename COPYING.txt to LICENSE. Add a reference to LICENSE in every source file. | Leonard Richardson | |
2015-06-28 | It's now possible to pickle a BeautifulSoup object no matter which tree builder was used to create it. However, the only tree builder that survives the pickling process is the HTMLParserTreeBuilder ('html.parser'). If you unpickle a BeautifulSoup object created with some other tree builder, soup.builder will be None. [bug=1231545] | Leonard Richardson | |
2014-12-07 | Tweaked the parser warning. | Leonard Richardson | |
2014-12-07 | Issue a warning if the BeautifulSoup constructor arguments do not explicitly name a parser. | Leonard Richardson | |
2013-06-03 | Improved performance of _replace_cdata_list_attribute_values, and greatly reduced the number of times it is called. | Leonard Richardson | |
2013-05-31 | Create a new lxml parser object for every new parsing strategy. | Leonard Richardson | |
2013-05-20 | The default XML formatter will now replace ampersands even if they appear to be part of entities. That is, "&amp;lt;" will become "&amp;amp;lt;". [bug=1182183] | Leonard Richardson | |
2012-06-30 | Fixed an html5lib tree builder crash which happened when html5lib moved a tag with a multivalued attribute from one part of the tree to another. [bug=1019603] | Leonard Richardson | |
2012-04-26 | The test suite now passes when lxml is not installed, whether or not html5lib is installed. [bug=987004] | Leonard Richardson | |
2012-04-18 | Made encoding substitution in <meta> tags completely transparent (no more %SOUP-ENCODING%). | Leonard Richardson | |
2012-03-30 | Fixed a typo that caused some versions of Python 3 to convert the Beautiful Soup codebase incorrectly. | Leonard Richardson | |
2012-03-01 | In HTML5-style <meta charset="foo"> tags, the value of the "charset" attribute is now replaced with the appropriate encoding on output. [bug=942714] | Leonard Richardson | |
2012-02-15 | Some cdata-list attributes are only cdata lists for certain tags. | Leonard Richardson | |
2012-02-09 | As a last-ditch attempt to turn data into Unicode, use errors=replace instead of errors=strict. | Leonard Richardson | |
2012-02-08 | Rationalized the treatment of multi-valued HTML attributes such as 'class' | Leonard Richardson | |
2012-02-07 | Newly created tags use the same empty-element rules as the builder used to originally create the soup. | Leonard Richardson | |
2011-05-21 | More Python 3 compatibility. | Leonard Richardson | |
2011-05-21 | More Python 3 compatibility. | Leonard Richardson | |
2011-02-27 | Got rid of __package__; hopefully this is the only thing holding up 2.5 support. | Leonard Richardson | |
2011-02-27 | Added a tree builder for the built-in HTMLParser, and tests. | Leonard Richardson | |
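
The 2024 entries above concern the `stacklevel` argument of Python's `warnings.warn`. As a rough illustration of what that argument changes, here is a minimal standard-library sketch. The function names `warn_plain`, `warn_with_stacklevel`, and `reported_lines` are hypothetical and are not taken from the Beautiful Soup code; the sketch only shows the general mechanism, not how the library itself issues its warnings.

```python
import warnings


def warn_plain():
    # Hypothetical library function: with the default stacklevel=1 the
    # warning is attributed to this line, inside the library.
    warnings.warn("markup looks like XML", UserWarning)


def warn_with_stacklevel():
    # stacklevel=2 attributes the warning to the line that called this function.
    warnings.warn("markup looks like XML", UserWarning, stacklevel=2)


def reported_lines():
    """Return the line numbers that the two warnings are attributed to."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, no deduplication
        warn_plain()
        warn_with_stacklevel()
    return [w.lineno for w in caught]


if __name__ == "__main__":
    plain_line, caller_line = reported_lines()
    print("default stacklevel points at line", plain_line)
    print("stacklevel=2 points at line", caller_line)
```

With the default stack level, the reported location is inside the function that raised the warning, which is rarely useful to someone calling a library. A stack level of 2 moves the reported location to the caller's own line, which is presumably the motivation for the fixes logged above.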