path: root/bs4/testing.py
Age | Commit message | Author
2021-10-09 | Moved testing.py into the same package as the tests. | Leonard Richardson
2021-09-12 | Ported unit tests to use pytest. | Leonard Richardson
2021-09-07 | Goodbye, Python 2. [bug=1942919] | Leonard Richardson
2021-05-31 | The html.parser tree builder can now handle named entities found in the HTML5 spec in much the same way that the html5lib tree builder does. Note that the lxml tree builder still handles named entities differently. [bug=1924908] | Leonard Richardson
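The entry above concerns resolving HTML5 named entities. As a rough, standalone sketch (not Beautiful Soup's own code), Python's standard library ships the HTML5 named-reference table as `html.entities.html5`, whose keys include the trailing semicolon. The helper name `resolve_named_entities` is hypothetical, and for simplicity it only recognizes references that end in a semicolon.

```python
import re
from html.entities import html5  # HTML5 named character references, keys like "apos;"

# Hypothetical helper: only "&name;" forms are handled here. HTML5 also accepts
# a few legacy names without a semicolon (e.g. "&amp"), which this sketch skips.
_NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*;)")

def resolve_named_entities(text):
    """Replace known HTML5 named references; leave unknown ones untouched."""
    def replace(match):
        return html5.get(match.group(1), match.group(0))
    return _NAMED_REF.sub(replace, text)

print(resolve_named_entities("It&apos;s &hellip; &foo;"))  # It's … &foo;
```

Leaving unknown names such as `&foo;` as literal text, rather than dropping them, keeps the sketch from losing data on markup it does not understand.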
2021-04-08 | Brought in fuzz tests from the oss-project into Beautiful Soup's unit test suite. | Leonard Richardson
2020-05-30 | Fixed a bug that caused too many tags to be popped from the tag stack during tree building, when encountering a closing tag that had no matching opening tag. [bug=1880420] | Leonard Richardson
2020-04-24 | If you encode a document with a Python-specific encoding like 'unicode_escape', that encoding is no longer mentioned in the final XML or HTML document. Instead, encoding information is omitted or left blank. [bug=1874955] | Leonard Richardson
2020-04-05 | Embedded CSS and JavaScript are now stored in distinct Stylesheet and Script tags, which are ignored by methods like get_text(). This feature is not supported by the html5lib tree builder. [bug=1868861] | Leonard Richardson
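A standalone sketch of the idea in the entry above, using only the standard library's `html.parser`: text inside `<script>` and `<style>` gets its own `str` subclass, and a `get_text()`-style method skips those. The classes here (`Script`, `Stylesheet`, `TextCollector`) are illustrative stand-ins, not Beautiful Soup's implementations. The sketch relies on `html.parser` treating script and style contents as raw text, so no tags can appear inside them.

```python
from html.parser import HTMLParser

class Script(str):
    """Illustrative: text that appeared inside a <script> element."""

class Stylesheet(str):
    """Illustrative: text that appeared inside a <style> element."""

class TextCollector(HTMLParser):
    # Which string class to use for data found directly inside these tags.
    CONTAINERS = {"script": Script, "style": Stylesheet}

    def __init__(self):
        super().__init__()
        self.strings = []
        self._string_class = str

    def handle_starttag(self, tag, attrs):
        # <script>/<style> content is raw text, so the next end tag closes it.
        self._string_class = self.CONTAINERS.get(tag, str)

    def handle_endtag(self, tag):
        self._string_class = str

    def handle_data(self, data):
        self.strings.append(self._string_class(data))

    def get_text(self):
        # Only plain str counts as "text"; Script and Stylesheet are skipped.
        return "".join(s for s in self.strings if type(s) is str)

collector = TextCollector()
collector.feed("<p>Hi <script>var x = 1;</script>there<style>p{}</style>!</p>")
collector.close()
print(collector.get_text())  # Hi there!
```

Because the code is still kept as typed strings rather than discarded, a caller that does want it can filter `collector.strings` by class.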
2019-11-11 | The html.parser tree builder now correctly handles DOCTYPEs that are not uppercase. [bug=1848401] | Leonard Richardson
2019-07-21 | Implemented line number tracking for html5lib. | Leonard Richardson
2019-07-21 | Adapt Chris Mayo's code to track line number and position when using html.parser. | Leonard Richardson
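For the html.parser entry above: the standard library's `HTMLParser.getpos()` returns the current `(line, offset)`, with 1-based lines and 0-based offsets, so a tree builder can record it whenever a start tag is handled. A minimal sketch; `PositionRecorder` is a hypothetical name, not part of Beautiful Soup.

```python
from html.parser import HTMLParser

class PositionRecorder(HTMLParser):
    """Records (tag, line, column) for each start tag as it is parsed."""

    def __init__(self):
        super().__init__()
        self.positions = []

    def handle_starttag(self, tag, attrs):
        # While a start tag is being handled, getpos() points at its "<".
        line, column = self.getpos()
        self.positions.append((tag, line, column))

recorder = PositionRecorder()
recorder.feed("<p\n>Paragraph 1</p>\n    <p>Paragraph 2</p>")
recorder.close()
print(recorder.positions)  # [('p', 1, 0), ('p', 3, 4)]
```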
2019-07-07 | &apos; (which is valid in XML and XHTML, but not HTML 4) is now recognized as a named entity and converted to a single quote. [bug=1818721] | Leonard Richardson
2019-07-07 | It's now possible to override a TreeBuilder's cdata_list_attributes dictionary by passing in a replacement. None will disable the feature altogether. [bug=1832978] | Leonard Richardson
2018-12-30 | Fixed a problem with multi-valued attributes where the value contained whitespace. Thanks to Jens Svalgaard for the fix. [bug=1787453] | Leonard Richardson
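The two entries above concern attributes such as `class` whose values are whitespace-separated lists. A minimal sketch of the idea, assuming a mapping from tag name (with `"*"` meaning any tag) to the set of multi-valued attribute names. The table is an abbreviated illustration, not the tree builder's actual defaults, and `split_multi_valued` is a hypothetical helper.

```python
# Abbreviated, illustrative table: "*" applies to every tag.
EXAMPLE_MULTI_VALUED = {
    "*": {"class", "accesskey", "dropzone"},
    "a": {"rel", "rev"},
}

def split_multi_valued(tag, attrs, table=EXAMPLE_MULTI_VALUED):
    """Return a copy of attrs with multi-valued attribute values split into lists.

    Passing table=None turns the feature off and returns the values unchanged.
    """
    if table is None:
        return dict(attrs)
    names = table.get("*", set()) | table.get(tag, set())
    # str.split() with no argument splits on any run of whitespace
    # (spaces, tabs, newlines) and drops empty pieces.
    return {key: value.split() if key in names else value
            for key, value in attrs.items()}

print(split_multi_valued("a", {"class": " nav\tmain\n", "rel": "nofollow", "id": "x y"}))
# {'class': ['nav', 'main'], 'rel': ['nofollow'], 'id': 'x y'}
```

Using `str.split()` rather than `split(" ")` is what keeps leading, trailing, or repeated whitespace from producing empty list entries.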
2018-12-30 | Merging the linkage checker and html5lib fixes by Isaac Muse found in https://code.launchpad.net/~facelessuser/beautifulsoup/html5lib-fix/+merge/361282. [bug=1809910] | Leonard Richardson
2018-12-26 | Remove dead line of code | Isaac Muse
2018-12-25 | Ensure html5lib always has valid internal linkage. html5lib, with malformed HTML, can end up with detached linkage internally. Improve the current code to ensure html5lib always has proper linkage. | Isaac Muse
2018-12-24 | Clarified the software license. | Leonard Richardson
2018-07-28 | Correctly handle invalid HTML numeric character entities like &#147; which reference code points that are not Unicode code points. Note that this is only fixed when Beautiful Soup is used with the html.parser parser -- html5lib already worked and I couldn't fix it with lxml. [bug=1782933] | Leonard Richardson
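References such as &#147; fall in the 128-159 range, which Unicode assigns to C1 control characters; HTML5's parsing rules map most of these numbers to the character that byte has in windows-1252 (147 is a left double quotation mark). A standalone sketch of that fallback; the function name is hypothetical and this is not a copy of Beautiful Soup's code.

```python
def decode_numeric_reference(codepoint):
    """Turn the number from a reference like &#147; into text.

    Numbers 128-159 are C1 control characters in Unicode; reinterpret them as
    windows-1252 bytes where that code page defines a character.
    """
    if 128 <= codepoint <= 159:
        try:
            return bytes([codepoint]).decode("windows-1252")
        except UnicodeDecodeError:
            pass  # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined in windows-1252
    if 0 < codepoint <= 0x10FFFF and not 0xD800 <= codepoint <= 0xDFFF:
        return chr(codepoint)
    # Zero, surrogates, and out-of-range numbers become U+FFFD.
    return "\ufffd"

print(decode_numeric_reference(147))  # “
```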
2018-07-21 | Fixed a problem where the html.parser tree builder interpreted a string like '&foo ' as the character entity '&foo;' [bug=1728706] | Leonard Richardson
2018-07-18 | Fixed a bug where find_all() was not working when asked to find a tag with a namespaced name in an XML document that was parsed as HTML. [bug=1723783] | Leonard Richardson
2018-07-18 | Preserve XML namespaces when they are introduced inside an XML document, not just the ones introduced at the top level. [bug=1718787] | Leonard Richardson
2018-07-15 | Stop data loss when encountering an empty numeric entity, and possibly in other cases. Thanks to tos.kamiya for the fix. [bug=1698503] | Leonard Richardson
2017-05-07 | Namespace prefix is preserved when an XML tag is copied. Thanks to Vikas for a patch and test. [bug=1685172] | Leonard Richardson
2017-05-06 | Improved the handling of empty-element tags like <br> when using the html.parser parser. [bug=1676935] | Leonard Richardson
2017-05-06 | HTML parsers treat all HTML4 and HTML5 empty element tags (aka void element tags) correctly. [bug=1656909] | Leonard Richardson
2017-05-06 | It's now possible to use a tag's namespace prefix when searching, e.g. soup.find('namespace:tag') [bug=1655332] | Leonard Richardson
2016-07-30 | Explained why we test both unicode and bytestring processing instructions. | Leonard Richardson
2016-07-16 | Beautiful Soup will now work with versions of html5lib greater than 0.99999999. [bug=1603299] | Leonard Richardson
2016-07-16 | The contents of <textarea> tags will no longer be modified when the tree is prettified. [bug=1555829] | Leonard Richardson
2016-07-16 | Added a separate class for XML processing instructions, which have a slightly different format from SGML processing instructions. [bug=1504383] | Leonard Richardson
2016-07-16 | Rename COPYING.txt to LICENSE. Add a reference to LICENSE in every source file. | Leonard Richardson
2015-09-28 | Add a __license__ statement to all source files. | Leonard Richardson
2015-09-28 | Corrected the output of Declaration objects. [bug=1477847] | Leonard Richardson
2015-06-28 | It's now possible to pickle a BeautifulSoup object no matter which tree builder was used to create it. However, the only tree builder that survives the pickling process is the HTMLParserTreeBuilder ('html.parser'). If you unpickle a BeautifulSoup object created with some other tree builder, soup.builder will be None. [bug=1231545] | Leonard Richardson
2015-06-26 | Added a sanity check helper method that makes sure all the elements of a tree are properly connected via .next_element and .previous_element. | Leonard Richardson
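The helper described above checks that following .next_element and then .previous_element always leads back to the same element. A standalone sketch of such a check over a hypothetical `Node` class; only the two attribute names come from the entry, and `link` and `linkage_is_consistent` are illustrative names.

```python
class Node:
    """Hypothetical stand-in for a parsed element, linked in document order."""

    def __init__(self, name):
        self.name = name
        self.next_element = None
        self.previous_element = None

def link(*nodes):
    """Chain nodes in document order, setting both directions of each link."""
    for earlier, later in zip(nodes, nodes[1:]):
        earlier.next_element = later
        later.previous_element = earlier
    return nodes

def linkage_is_consistent(start):
    """Walk .next_element from start; every hop must be mirrored by .previous_element."""
    visited = set()
    node = start
    while node.next_element is not None:
        if id(node) in visited:
            return False  # a cycle: the chain would never end
        visited.add(id(node))
        following = node.next_element
        if following.previous_element is not node:
            return False
        node = following
    return True

html, head, body = link(Node("html"), Node("head"), Node("body"))
print(linkage_is_consistent(html))  # True
body.previous_element = html        # simulate a detached link
print(linkage_is_consistent(html))  # False
```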
2015-06-24 | If the initial <html> tag contains a CDATA list attribute such as 'class', the html5lib tree builder will now turn its value into a list, as it would with any other tag. [bug=1296481] | Leonard Richardson
2015-06-23 | Got a hacky fix for the latest html5lib problem. | Leonard Richardson
2015-06-23 | Force object_was_parsed() to keep the tree intact even when an element from later in the document is moved into place. [bug=1430633] | Leonard Richardson
2014-12-11 | Improved the lxml tree builder's handling of processing instructions. [bug=1294645] | Leonard Richardson
2014-12-07 | Issue a warning if the BeautifulSoup constructor arguments do not explicitly name a parser. | Leonard Richardson
2013-10-18 | Fixed yet another problem that caused the html5lib tree builder to create a disconnected parse tree. [bug=1237763] | Leonard Richardson
2013-06-02 | Merged in big encoding-detection refactoring branch. | Leonard Richardson
2013-05-31 | The html.parser treebuilder can now handle numeric character references in text (such as &#X41;) when the hexadecimal marker is a capital X. | Leonard Richardson
2013-05-31 | Create a new lxml parser object for every new parsing strategy. | Leonard Richardson
2013-05-20 | Fixed another bug by which the html5lib tree builder could create a disconnected tree. [bug=1182089] | Leonard Richardson
2013-05-20 | Fixed test failures when lxml is not installed. | Leonard Richardson
2013-05-07 | Now that lxml's segfault on invalid doctype has been fixed, fix a corresponding problem on the Beautiful Soup end that was previously invisible. [bug=984936] | Leonard Richardson
2013-05-06 | Added failing test. | Leonard Richardson
2012-10-11 | Fix a bug in the lxml treebuilder which crashed when a tag included an attribute from the predefined xml: namespace. [bug=1065617] | Leonard Richardson