Age | Commit message | Author | |
---|---|---|---|
2024-08-21 | * Changes to make tests work whether tests are run under soupsieve 2.6 or an earlier version. Based on a patch by Stefano Rivera. * Removed the strip_cdata argument to lxml's HTMLParser constructor, which never did anything and is deprecated as of lxml 5.3.0. Patch by Stefano Rivera. [bug=2076897] | Leonard Richardson | |
2024-02-12 | Applied patch from Marc Müller to add a stacklevel to a warning that was missing it. | Leonard Richardson | |
2024-01-17 | Added the correct stacklevel to instances of the XMLParsedAsHTMLWarning. [bug=2034451] | Leonard Richardson | |
2023-06-04 | Fixed a case found by Mengyuhan where html.parser giving up on markup would result in an AssertionError instead of a ParserRejectedMarkup exception. | Leonard Richardson | |
2023-02-15 | When the html.parser parser decides it can't parse a document, Beautiful Soup now consistently propagates this fact by raising a ParserRejectedMarkup error. [bug=2007343] | Leonard Richardson | |
2023-01-27 | Got rid of some more warnings by removing code that's not relevant anymore, now that the minimum supported Python version is 3.6. | Leonard Richardson | |
2023-01-27 | Warnings now do their best to provide an appropriate stacklevel, improving the usefulness of the message. [bug=1978744] | Leonard Richardson | |
2022-04-10 | Fixed another crash when overriding multi_valued_attributes and using the html5lib parser. [bug=1948488] | Leonard Richardson | |
2021-11-29 | Do a better job of keeping track of namespaces as an XML document is parsed, so that CSS selectors that use namespaces will do the right thing more often. [bug=1946243] | Leonard Richardson | |
2021-10-24 | Issue a warning when an HTML parser is used to parse a document that looks like XML but not XHTML. [bug=1939121] | Leonard Richardson | |
2021-10-23 | Added a workaround for an lxml bug (https://bugs.launchpad.net/lxml/+bug/1948551) that caused problems when parsing a Unicode string beginning with BYTE ORDER MARK. [bug=1947768] | Leonard Richardson | |
2021-10-23 | Fixed a crash when overriding multi_valued_attributes and using the html5lib parser. [bug=1948488] | Leonard Richardson | |
2021-10-11 | Added special string classes, RubyParenthesisString and RubyTextString, to make it possible to treat ruby text specially in get_text() calls. [bug=1941980] | Leonard Richardson | |
2021-09-07 | Goodbye, Python 2. [bug=1942919] | Leonard Richardson | |
2021-05-31 | The html.parser tree builder can now handle named entities found in the HTML5 spec in much the same way that the html5lib tree builder does. Note that the lxml tree builder still handles named entities differently. [bug=1924908] | Leonard Richardson | |
2021-02-13 | Added a second way to specify encodings to UnicodeDammit and EncodingDetector, based on the order of precedence defined in the HTML5 spec, starting at: https://html.spec.whatwg.org/multipage/parsing.html#parsing-with-a-known-character-encoding Encodings in 'known_definite_encodings' are tried first, then byte-order-mark sniffing is run, then encodings in 'user_encodings' are tried. The old argument, 'override_encodings', is now a deprecated alias for 'known_definite_encodings'. This changes the default behavior of the html.parser and lxml tree builders, in a way that may slightly improve encoding detection but will probably have no effect. [bug=1889014] | Leonard Richardson | |
2020-05-30 | Remove explicit reference to the module name within the module, replacing it with __name__. | Leonard Richardson | |
2020-05-17 | Switch entirely to Python 3-style print statements, even in Python 2. | Leonard Richardson | |
2020-05-17 | Documented some recently added customization features. | Leonard Richardson | |
2020-05-17 | Added a keyword argument on_duplicate_attribute to the BeautifulSoupHTMLParser constructor (used by the html.parser tree builder) which lets you customize the handling of markup that contains the same attribute more than once, as in: <a href="url1" href="url2"> [bug=1878209] | Leonard Richardson | |
2020-04-05 | Embedded CSS and Javascript is now stored in distinct Stylesheet and Script tags, which are ignored by methods like get_text(). This feature is not supported by the html5lib treebuilder. [bug=1868861] | Leonard Richardson | |
2019-12-24 | Added docstrings for some but not all tree builders. | Leonard Richardson | |
2019-11-11 | Simplified code. | Leonard Richardson | |
2019-11-11 | The html.parser tree builder now correctly handles DOCTYPEs that are not uppercase. [bug=1848401] | Leonard Richardson | |
2019-11-11 | Added a Brazilian Portuguese translation by Cezar Peixeiro. | Leonard Richardson | |
2019-09-02 | Avoid a crash when trying to detect the declared encoding of a Unicode document. Raise an explanatory exception when the underlying parser completely rejects the incoming markup. [bug=1838877] | Leonard Richardson | |
2019-07-21 | Implemented line number tracking for html5lib. | Leonard Richardson | |
2019-07-21 | Adapt Chris Mayo's code to track line number and position when using html.parser. | Leonard Richardson | |
2019-07-14 | Give the Formatter class more control over formatting decisions. | Leonard Richardson | |
2019-07-07 | Renamed the cdata_list_attributes argument to multi_valued_attributes since it's facing the end-user and that's a more easily understandable name. | Leonard Richardson | |
2019-07-07 | It's now possible to override a TreeBuilder's cdata_list_attributes dictionary by passing in a replacement. None will disable the feature altogether. [bug=1832978] | Leonard Richardson | |
2019-01-06 | Don't track un-prefixed namespaces | Isaac Muse | |
2018-12-30 | Fixed a problem with multi-valued attributes where the value contained whitespace. Thanks to Jens Svalgaard for the fix. [bug=1787453] | Leonard Richardson | |
2018-12-24 | Clarified the software license. | Leonard Richardson | |
2018-12-24 | Keep track of the namespace abbreviations found while parsing the document. This makes select() work most of the time without requiring a value for 'namespaces'. | Leonard Richardson | |
2018-12-22 | Fix next and previous linkage issues. Fixes issues #1806598 and #1782928. | Isaac Muse | |
2018-08-12 | Converted README to Markdown format. | Leonard Richardson | |
2018-07-28 | Correctly handle invalid HTML numeric character entities like &#147; which reference code points that are not Unicode code points. Note that this is only fixed when Beautiful Soup is used with the html.parser parser -- html5lib already worked and I couldn't fix it with lxml. [bug=1782933] | Leonard Richardson | |
2018-07-21 | Fixed a problem where the html.parser tree builder interpreted a string like '&foo ' as the character entity '&foo;' [bug=1728706] | Leonard Richardson | |
2018-07-18 | Preserve XML namespaces when they are introduced inside an XML document, not just the ones introduced at the top level. [bug=1718787] | Leonard Richardson | |
2018-07-15 | Introduced the Formatter system. [bug=1716272]. | Leonard Richardson | |
2018-07-15 | It's possible for a TreeBuilder subclass to specify that void elements should be represented as <element> rather than <element/>, by setting TreeBuilder.void_element_close_prefix to the empty string. [bug=1716272] | Leonard Richardson | |
2018-07-15 | Stop data loss when encountering an empty numeric entity, and possibly in other cases. Thanks to tos.kamiya for the fix. [bug=1698503] | Leonard Richardson | |
2018-07-14 | Stopped HTMLParser from raising an exception in very rare cases of bad markup. [bug=1708831] | Leonard Richardson | |
2017-05-06 | Improved the handling of empty-element tags like <br> when using the html.parser parser. [bug=1676935] | Leonard Richardson | |
2017-05-06 | HTML parsers treat all HTML4 and HTML5 empty element tags (aka void element tags) correctly. [bug=1656909] | Leonard Richardson | |
2016-12-19 | Fixed foster parenting when html5lib is the tree builder. Thanks to Geoffrey Sneddon for a patch and test. | Leonard Richardson | |
2016-12-19 | Fixed yet another problem that caused the html5lib tree builder to | Leonard Richardson | |
2016-07-30 | Explained why we test both unicode and bytestring processing instructions. | Leonard Richardson | |
2016-07-26 | Fixed a reported (but not duplicated) bug involving processing instructions fed into the lxml HTML parser. | Leonard Richardson | |
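
Several entries above (2024-02-12, 2024-01-17, 2023-01-27) concern the stacklevel attached to warnings. The following is a minimal sketch of why that setting matters, using only the standard library `warnings` module. It is not Beautiful Soup's code: `parse_as_html` and `caller` are hypothetical names invented for illustration. With `stacklevel=2`, Python attributes the warning to the line that called the library function, which is usually the line the user needs to look at.

```python
import warnings


def parse_as_html(markup):
    """Hypothetical library function; not part of Beautiful Soup."""
    if markup.lstrip().startswith("<?xml"):
        # stacklevel=2 attributes the warning to whoever called parse_as_html(),
        # instead of to this line inside the library.
        warnings.warn("markup looks like XML", UserWarning, stacklevel=2)
    return markup


def caller():
    return parse_as_html("<?xml version='1.0'?><root/>")


# Record warnings instead of printing them, so the reported location can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    caller()

reported = caught[0]
print(f"{reported.category.__name__} reported at line {reported.lineno}")
```

With the default `stacklevel=1`, the reported line would instead be the `warnings.warn` call inside `parse_as_html`, which tells the user nothing about which of their own calls triggered it.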