path: root/bs4/testing.py
Age | Commit message | Author
2019-07-21 | Implemented line number tracking for html5lib. | Leonard Richardson
2019-07-21 | Adapt Chris Mayo's code to track line number and position when using html.parser. | Leonard Richardson
2019-07-07 | &apos; (which is valid in XML and XHTML, but not HTML 4) is now recognized as a named entity and converted to a single quote. [bug=1818721] | Leonard Richardson
2019-07-07 | It's now possible to override a TreeBuilder's cdata_list_attributes dictionary by passing in a replacement. None will disable the feature altogether. [bug=1832978] | Leonard Richardson
2018-12-30 | Fixed a problem with multi-valued attributes where the value contained whitespace. Thanks to Jens Svalgaard for the fix. [bug=1787453] | Leonard Richardson
2018-12-30 | Merging the linkage checker and html5lib fixes by Isaac Muse found in https://code.launchpad.net/~facelessuser/beautifulsoup/html5lib-fix/+merge/361282. [bug=1809910] | Leonard Richardson
2018-12-26 | Remove dead line of code | Isaac Muse
2018-12-25 | Ensure html5lib always has valid internal linkage. html5lib, with malformed HTML, can end up with detached linkage internally. Improve the current code to ensure html5lib always has proper linkage. | Isaac Muse
2018-12-24 | Clarified the software license. | Leonard Richardson
2018-07-28 | Correctly handle invalid HTML numeric character entities like &#147; which reference code points that are not Unicode code points. Note that this is only fixed when Beautiful Soup is used with the html.parser parser -- html5lib already worked and I couldn't fix it with lxml. [bug=1782933] | Leonard Richardson
2018-07-21 | Fixed a problem where the html.parser tree builder interpreted a string like '&foo ' as the character entity '&foo;' [bug=1728706] | Leonard Richardson
2018-07-18 | Fixed a bug where find_all() was not working when asked to find a tag with a namespaced name in an XML document that was parsed as HTML. [bug=1723783] | Leonard Richardson
2018-07-18 | Preserve XML namespaces when they are introduced inside an XML document, not just the ones introduced at the top level. [bug=1718787] | Leonard Richardson
2018-07-15 | Stop data loss when encountering an empty numeric entity, and possibly in other cases. Thanks to tos.kamiya for the fix. [bug=1698503] | Leonard Richardson
2017-05-07 | Namespace prefix is preserved when an XML tag is copied. Thanks to Vikas for a patch and test. [bug=1685172] | Leonard Richardson
2017-05-06 | Improved the handling of empty-element tags like <br> when using the html.parser parser. [bug=1676935] | Leonard Richardson
2017-05-06 | HTML parsers treat all HTML4 and HTML5 empty element tags (aka void element tags) correctly. [bug=1656909] | Leonard Richardson
2017-05-06 | It's now possible to use a tag's namespace prefix when searching, e.g. soup.find('namespace:tag') [bug=1655332] | Leonard Richardson
2016-07-30 | Explained why we test both unicode and bytestring processing instructions. | Leonard Richardson
2016-07-16 | Beautiful Soup will now work with versions of html5lib greater than 0.99999999. [bug=1603299] | Leonard Richardson
2016-07-16 | The contents of <textarea> tags will no longer be modified when the tree is prettified. [bug=1555829] | Leonard Richardson
2016-07-16 | Added a separate class for XML processing instructions, which have a slightly different format from SGML processing instructions. [bug=1504383] | Leonard Richardson
2016-07-16 | Rename COPYING.txt to LICENSE. Add a reference to LICENSE in every source file. | Leonard Richardson
2015-09-28 | Add a __license__ statement to all source files. | Leonard Richardson
2015-09-28 | Corrected the output of Declaration objects. [bug=1477847] | Leonard Richardson
2015-06-28 | It's now possible to pickle a BeautifulSoup object no matter which tree builder was used to create it. However, the only tree builder that survives the pickling process is the HTMLParserTreeBuilder ('html.parser'). If you unpickle a BeautifulSoup object created with some other tree builder, soup.builder will be None. [bug=1231545] | Leonard Richardson
2015-06-26 | Added a sanity check helper method that makes sure all the elements of a tree are properly connected via .next_element and .previous_element. | Leonard Richardson
2015-06-24 | If the initial <html> tag contains a CDATA list attribute such as 'class', the html5lib tree builder will now turn its value into a list, as it would with any other tag. [bug=1296481] | Leonard Richardson
2015-06-23 | Got a hacky fix for the latest html5lib problem. | Leonard Richardson
2015-06-23 | Force object_was_parsed() to keep the tree intact even when an element from later in the document is moved into place. [bug=1430633] | Leonard Richardson
2014-12-11 | Improved the lxml tree builder's handling of processing instructions. [bug=1294645] | Leonard Richardson
2014-12-07 | Issue a warning if the BeautifulSoup constructor arguments do not explicitly name a parser. | Leonard Richardson
2013-10-18 | Fixed yet another problem that caused the html5lib tree builder to create a disconnected parse tree. [bug=1237763] | Leonard Richardson
2013-06-02 | Merged in big encoding-detection refactoring branch. | Leonard Richardson
2013-05-31 | The html.parser treebuilder can now handle numeric attributes in text when the hexadecimal name of the attribute starts with a capital X. | Leonard Richardson
2013-05-31 | Create a new lxml parser object for every new parsing strategy. | Leonard Richardson
2013-05-20 | Fixed another bug by which the html5lib tree builder could create a disconnected tree. [bug=1182089] | Leonard Richardson
2013-05-20 | Fixed test failures when lxml is not installed. | Leonard Richardson
2013-05-07 | Now that lxml's segfault on invalid doctype has been fixed, fix a corresponding problem on the Beautiful Soup end that was previously invisible. [bug=984936] | Leonard Richardson
2013-05-06 | Added failing test. | Leonard Richardson
2012-10-11 | Fix a bug in the lxml treebuilder which crashed when a tag included an attribute from the predefined xml: namespace. [bug=1065617] | Leonard Richardson
2012-08-21 | Fixed a problem with the html5lib builder not handling comments correctly. | Leonard Richardson
2012-08-16 | Use namespace prefixes for namespaced attribute names, instead of the fully-qualified names given by the lxml parser. [bug=1037597] | Leonard Richardson
2012-07-03 | Added test for bug 1020300. | Leonard Richardson
2012-07-02 | Correctly handle closing tags with an XML namespace declared. Patch by Andreas Kostyrka. [bug=1019635] | Leonard Richardson
2012-06-30 | Fixed an html5lib tree builder crash which happened when html5lib moved a tag with a multivalued attribute from one part of the tree to another. [bug=1019603] | Leonard Richardson
2012-05-24 | Fixed a bug with the lxml treebuilder that prevented the user from adding attributes to a tag that didn't originally have any. [bug=1002378] Thanks to Oliver Beattie for the patch. | Leonard Richardson
2012-05-03 | Fixed the handling of &quot; with the built-in parser. [bug=993871] | Leonard Richardson
2012-04-26 | The test suite now passes when lxml is not installed, whether or not html5lib is installed. [bug=987004] | Leonard Richardson
2012-04-18 | Made encoding substitution in <meta> tags completely transparent (no more %SOUP-ENCODING%). | Leonard Richardson
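
The 2019-07-21 entry above mentions tracking line number and position when using html.parser. The sketch below is not Beautiful Soup's code. It only illustrates the standard-library mechanism such tracking can rest on, HTMLParser.getpos(), which reports the parser's current line number and offset. The class name PositionTrackingParser and the helper tag_positions are hypothetical names chosen for this illustration.

```python
from html.parser import HTMLParser


class PositionTrackingParser(HTMLParser):
    """Hypothetical helper: records (tag, line, column) for each start tag."""

    def __init__(self):
        super().__init__()
        self.positions = []

    def handle_starttag(self, tag, attrs):
        # getpos() returns (lineno, offset): lineno is 1-based, offset is 0-based.
        # Inside this handler it still points at the start of the tag.
        line, column = self.getpos()
        self.positions.append((tag, line, column))


def tag_positions(markup):
    """Parse markup and return the recorded start-tag positions."""
    parser = PositionTrackingParser()
    parser.feed(markup)
    parser.close()
    return parser.positions


if __name__ == "__main__":
    for tag, line, column in tag_positions("<html>\n  <p>one</p>\n<b>two</b>"):
        print(f"<{tag}> starts at line {line}, column {column}")
```

A tree builder could attach these values to each element as it is created. How Beautiful Soup actually stores them is not stated in the log above.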