path: root/bs4/builder
Age | Commit message | Author
2019-09-02 | Avoid a crash when trying to detect the declared encoding of a Unicode document. Raise an explanatory exception when the underlying parser completely rejects the incoming markup. [bug=1838877] | Leonard Richardson
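The 2019-09-02 entry concerns finding a declared encoding in a document that is already Unicode. In Python, a regex compiled from bytes cannot search a str (it raises TypeError), so one way to support both inputs is to keep a str and a bytes version of each pattern and choose by input type. A minimal sketch, assuming a hypothetical `find_declared_encoding` helper and simplified patterns; this is not Beautiful Soup's actual code:

```python
import re

# Simplified, illustrative patterns: an XML declaration and an HTML <meta charset>.
XML_DECL = r'^\s*<\?.*encoding=[\'"](.*?)[\'"].*\?>'
HTML_META = r'<\s*meta[^>]+charset\s*=\s*["\']?([^>]*?)[ /;\'">]'

# Compile each pattern twice so str input never meets a bytes regex.
STR_PATTERNS = [re.compile(p, re.I) for p in (XML_DECL, HTML_META)]
BYTES_PATTERNS = [re.compile(p.encode("ascii"), re.I) for p in (XML_DECL, HTML_META)]


def find_declared_encoding(markup):
    """Return the lower-cased declared encoding of str or bytes markup, or None."""
    patterns = STR_PATTERNS if isinstance(markup, str) else BYTES_PATTERNS
    for pattern in patterns:
        match = pattern.search(markup)
        if match:
            encoding = match.group(1)
            if isinstance(encoding, bytes):
                encoding = encoding.decode("ascii", "replace")
            return encoding.lower()
    return None


print(find_declared_encoding('<?xml version="1.0" encoding="ISO-8859-1"?><a/>'))  # iso-8859-1
print(find_declared_encoding(b'<meta charset="utf-8">'))  # utf-8
```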
2019-07-21 | Implemented line number tracking for html5lib. | Leonard Richardson
2019-07-21 | Adapt Chris Mayo's code to track line number and position when using html.parser. | Leonard Richardson
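Line and position tracking with html.parser can rest on the standard library's `HTMLParser.getpos()`, which reports where the construct currently being handled starts. A minimal sketch; the `PositionRecorder` class is hypothetical. Beautiful Soup exposes the equivalent values as `Tag.sourceline` and `Tag.sourcepos`.

```python
from html.parser import HTMLParser


class PositionRecorder(HTMLParser):
    """Record (tag, line, column) for every start tag."""

    def __init__(self):
        super().__init__()
        self.positions = []

    def handle_starttag(self, tag, attrs):
        # getpos() returns (line, offset): line is 1-based, offset is 0-based.
        line, offset = self.getpos()
        self.positions.append((tag, line, offset))


parser = PositionRecorder()
parser.feed("<html>\n  <body>\n    <p>hi</p>\n  </body>\n</html>")
parser.close()
print(parser.positions)  # [('html', 1, 0), ('body', 2, 2), ('p', 3, 4)]
```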
2019-07-14 | Give the Formatter class more control over formatting decisions. | Leonard Richardson
2019-07-07 | Renamed the cdata_list_attributes argument to multi_valued_attributes since it's facing the end-user and that's a more easily understandable name. | Leonard Richardson
2019-07-07 | It's now possible to override a TreeBuilder's cdata_list_attributes dictionary by passing in a replacement. None will disable the feature altogether. [bug=1832978] | Leonard Richardson
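The multi-valued attribute entries describe a lookup table of attribute names, a replacement table supplied by the caller, and `None` to switch splitting off. A minimal sketch of that behaviour; the table contents and the `split_attribute_values` function are illustrative assumptions, not Beautiful Soup's exact defaults or API (the public constructor argument is `multi_valued_attributes`):

```python
import re

# Illustrative table: "*" applies to every tag, other keys to one tag name.
DEFAULT_MULTI_VALUED = {
    "*": ["class", "accesskey", "dropzone"],
    "a": ["rel", "rev"],
}

NONWHITESPACE = re.compile(r"\S+")


def split_attribute_values(tag, attrs, multi_valued_attributes=DEFAULT_MULTI_VALUED):
    """Turn whitespace-separated values of multi-valued attributes into lists.

    Passing None disables splitting; passing another dict replaces the table.
    """
    if multi_valued_attributes is None:
        return dict(attrs)
    names = set(multi_valued_attributes.get("*", []))
    names.update(multi_valued_attributes.get(tag.lower(), []))
    result = {}
    for key, value in attrs.items():
        if key in names and isinstance(value, str):
            # \S+ splits on any run of whitespace: spaces, tabs or newlines.
            result[key] = NONWHITESPACE.findall(value)
        else:
            result[key] = value
    return result


print(split_attribute_values("p", {"class": "a  b\tc", "id": "x y"}))
print(split_attribute_values("p", {"class": "a b"}, None))
```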
2019-01-06 | Don't track un-prefixed namespaces | Isaac Muse
2018-12-30 | Fixed a problem with multi-valued attributes where the value contained whitespace. Thanks to Jens Svalgaard for the fix. [bug=1787453] | Leonard Richardson
2018-12-24 | Clarified the software license. | Leonard Richardson
2018-12-24 | Keep track of the namespace abbreviations found while parsing the document. This makes select() work most of the time without requiring a value for 'namespaces'. | Leonard Richardson
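Collecting namespace abbreviations amounts to building a prefix-to-URI dictionary, the kind of mapping a 'namespaces' value supplies to select(). The standard library's `xml.etree.ElementTree.iterparse` emits a "start-ns" event for every xmlns declaration, including ones on nested elements. A sketch of the idea using ElementTree rather than the lxml-based builder; `collect_namespace_prefixes` is a hypothetical name, and the default namespace is skipped because it arrives without a prefix:

```python
import io
import xml.etree.ElementTree as ET


def collect_namespace_prefixes(xml_text):
    """Map each namespace prefix declared anywhere in the document to its URI."""
    prefixes = {}
    for _event, (prefix, uri) in ET.iterparse(io.StringIO(xml_text), events=("start-ns",)):
        if prefix:  # skip the un-prefixed (default) namespace
            prefixes.setdefault(prefix, uri)  # keep the first binding seen
    return prefixes


doc = (
    '<root xmlns="http://example.com/default" xmlns:a="http://example.com/a">'
    '<child xmlns:b="http://example.com/b"><b:item/></child></root>'
)
print(collect_namespace_prefixes(doc))
# {'a': 'http://example.com/a', 'b': 'http://example.com/b'}
```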
2018-12-22 | Fix next and previous linkage issues. Fixes issues #1806598 and #1782928. | Isaac Muse
2018-08-12 | Converted README to Markdown format. | Leonard Richardson
2018-07-28 | Correctly handle invalid HTML numeric character entities like &#147; which reference code points that are not Unicode code points. Note that this is only fixed when Beautiful Soup is used with the html.parser parser -- html5lib already worked and I couldn't fix it with lxml. [bug=1782933] | Leonard Richardson
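References in the 128-159 range, such as &#147;, name C1 control characters rather than printable text; browsers instead treat the number as a windows-1252 byte. A minimal sketch of that fallback, assuming a hypothetical `resolve_numeric_reference` helper (not the code Beautiful Soup's html.parser builder uses):

```python
import html


def resolve_numeric_reference(code):
    """Turn the number from a reference like &#147; into a character."""
    if 128 <= code <= 159:
        # C1 control range: assume the author meant the windows-1252 character.
        try:
            return bytes([code]).decode("windows-1252")
        except UnicodeDecodeError:
            pass  # 129, 141, 143, 144 and 157 are unassigned in windows-1252
    if code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
        return "\ufffd"  # not a usable Unicode scalar value
    return chr(code)


print(hex(ord(resolve_numeric_reference(147))))  # 0x201c (LEFT DOUBLE QUOTATION MARK)
# The standard library's html.unescape() applies a similar replacement table.
print(html.unescape("&#147;") == resolve_numeric_reference(147))  # True
```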
2018-07-21 | Fixed a problem where the html.parser tree builder interpreted a string like '&foo ' as the character entity '&foo;' [bug=1728706] | Leonard Richardson
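With `convert_charrefs=False`, the standard library parser reports '&foo ' through `handle_entityref('foo')` even though no semicolon follows, so the handler must decide what an unknown name means. A sketch that keeps unknown names as literal text; `LiteralEntityParser` is hypothetical, and a real builder would also need to handle numeric references, which this sketch ignores:

```python
from html.entities import name2codepoint
from html.parser import HTMLParser


class LiteralEntityParser(HTMLParser):
    """Collect text, resolving known entity names and keeping unknown ones."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.pieces = []

    def handle_data(self, data):
        self.pieces.append(data)

    def handle_entityref(self, name):
        if name in name2codepoint:
            self.pieces.append(chr(name2codepoint[name]))
        else:
            # Unknown name: emit '&foo' as written instead of inventing '&foo;'.
            self.pieces.append("&" + name)


parser = LiteralEntityParser()
parser.feed("a &foo b &amp; c")
parser.close()
print("".join(parser.pieces))  # a &foo b & c
```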
2018-07-18 | Preserve XML namespaces when they are introduced inside an XML document, not just the ones introduced at the top level. [bug=1718787] | Leonard Richardson
2018-07-15 | Introduced the Formatter system. [bug=1716272] | Leonard Richardson
2018-07-15 | It's possible for a TreeBuilder subclass to specify that void elements should be represented as <element> rather than <element/>, by setting TreeBuilder.void_element_close_prefix to the empty string. [bug=1716272] | Leonard Richardson
2018-07-15 | Stop data loss when encountering an empty numeric entity, and possibly in other cases. Thanks to tos.kamiya for the fix. [bug=1698503] | Leonard Richardson
2018-07-14 | Stopped HTMLParser from raising an exception in very rare cases of bad markup. [bug=1708831] | Leonard Richardson
2017-05-06 | Improved the handling of empty-element tags like <br> when using the html.parser parser. [bug=1676935] | Leonard Richardson
2017-05-06 | HTML parsers treat all HTML4 and HTML5 empty element tags (aka void element tags) correctly. [bug=1656909] | Leonard Richardson
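The void-element entries combine two ideas: a fixed set of tag names that never take content or an end tag, and a configurable string written before the closing `>`. A minimal sketch; the element set is illustrative (HTML4 empty elements plus a few HTML5 additions), `render_start_tag` is hypothetical, and only the `void_element_close_prefix` name comes from the log:

```python
import html

VOID_ELEMENTS = frozenset({
    # HTML4 empty elements
    "area", "base", "basefont", "br", "col", "frame", "hr", "img",
    "input", "isindex", "link", "meta", "param",
    # Added in HTML5
    "embed", "source", "track", "wbr",
})


def render_start_tag(name, attrs=None, void_element_close_prefix="/"):
    """Serialize a start tag; void elements get the close prefix before '>'."""
    attrs = attrs or {}
    attr_text = "".join(
        f' {key}="{html.escape(str(value), quote=True)}"' for key, value in attrs.items()
    )
    if name in VOID_ELEMENTS:
        return f"<{name}{attr_text}{void_element_close_prefix}>"
    return f"<{name}{attr_text}>"


print(render_start_tag("br"))                                # <br/>
print(render_start_tag("br", void_element_close_prefix=""))  # <br>
print(render_start_tag("img", {"src": "a.png"}))             # <img src="a.png"/>
```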
2016-12-19 | Fixed foster parenting when html5lib is the tree builder. Thanks to Geoffrey Sneddon for a patch and test. | Leonard Richardson
2016-12-19 | Fixed yet another problem that caused the html5lib tree builder to | Leonard Richardson
2016-07-30 | Explained why we test both unicode and bytestring processing instructions. | Leonard Richardson
2016-07-26 | Fixed a reported (but not duplicated) bug involving processing instructions fed into the lxml HTML parser. | Leonard Richardson
2016-07-16 | Beautiful Soup will now work with versions of html5lib greater than 0.99999999. [bug=1603299] | Leonard Richardson
2016-07-16 | Removed imports to pdb, since pdb is not available in some environments. [bug=1491700] | Leonard Richardson
2016-07-16 | The contents of <textarea> tags will no longer be modified when the tree is prettified. [bug=1555829] | Leonard Richardson
2016-07-16 | Added a separate class for XML processing instructions, which have a slightly different format from SGML processing instructions. [bug=1504383] | Leonard Richardson
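The format difference is in the terminator: an SGML processing instruction ends with `>`, an XML one with `?>`. The standard library's html.parser follows the SGML rule, so for `<?xml ...?>` the trailing `?` stays in the data it reports. A sketch of why a separate serializer is useful; `PICollector` and `format_pi` are hypothetical names:

```python
from html.parser import HTMLParser


class PICollector(HTMLParser):
    """Collect the raw data of every processing instruction."""

    def __init__(self):
        super().__init__()
        self.instructions = []

    def handle_pi(self, data):
        self.instructions.append(data)


def format_pi(data, xml=False):
    """Serialize a PI: SGML style ends with '>', XML style with '?>'."""
    return "<?" + data + ("?>" if xml else ">")


collector = PICollector()
collector.feed('<?xml version="1.0"?><p>x</p>')
collector.close()
print(collector.instructions)  # ['xml version="1.0"?']
# SGML-style data already contains the '?', so '>' round-trips it;
# data without the '?' needs the XML-style '?>' suffix instead.
print(format_pi(collector.instructions[0]))        # <?xml version="1.0"?>
print(format_pi('xml version="1.0"', xml=True))    # <?xml version="1.0"?>
```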
2016-07-16 | Rename COPYING.txt to LICENSE. Add a reference to LICENSE in every source file. | Leonard Richardson
2015-12-08 | Fix foster parenting with html5lib. This makes all of the html5lib tests pass. Yay! | Geoffrey Sneddon
2015-12-08 | Make TreeBuilderForHtml5lib strictly follow the html5lib API. This slightly changes the constructor (to make soup optional), and adds a testSerializer method so the tests can be run against it. | Geoffrey Sneddon
2015-09-28 | Fixed a parse bug with the html5lib tree-builder. Thanks to Roel Kramer for the patch. [bug=1483781] | Leonard Richardson
2015-06-28 | It's now possible to pickle a BeautifulSoup object no matter which tree builder was used to create it. However, the only tree builder that survives the pickling process is the HTMLParserTreeBuilder ('html.parser'). If you unpickle a BeautifulSoup object created with some other tree builder, soup.builder will be None. [bug=1231545] | Leonard Richardson
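Keeping a builder only when it can survive pickling maps onto the standard `__getstate__` hook: copy the instance dictionary and blank out anything unsafe. A sketch of that pattern; `Document`, both builder classes and the `picklable` flag are hypothetical stand-ins, not Beautiful Soup's classes:

```python
import pickle


class PicklableBuilder:
    """Hypothetical builder that is safe to pickle."""
    picklable = True


class UnpicklableBuilder:
    """Hypothetical builder holding an object pickle cannot serialize."""
    picklable = False

    def __init__(self):
        self.native_parser = lambda markup: markup  # lambdas are not picklable


class Document:
    """Hypothetical parsed document that keeps a reference to its builder."""

    def __init__(self, markup, builder):
        self.markup = markup
        self.builder = builder

    def __getstate__(self):
        state = dict(self.__dict__)
        # Keep the builder only if it is marked as able to survive pickling.
        builder = state.get("builder")
        if builder is not None and not getattr(builder, "picklable", False):
            state["builder"] = None
        return state


restored = pickle.loads(pickle.dumps(Document("<p>hi</p>", UnpicklableBuilder())))
print(restored.markup, restored.builder)  # <p>hi</p> None
```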
2015-06-28 | Changed the way soup objects work under copy.copy(). Copying a NavigableString or a Tag will give you a new NavigableString that's equal to the old one but not connected to the parse tree. Patch by Martijn Peters. [bug=1307490] | Leonard Richardson
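The copy.copy() behaviour follows the standard `__copy__` protocol: the copy compares equal to the original but carries no links into the tree. A sketch with a hypothetical `TextNode` str subclass standing in for NavigableString:

```python
import copy


class TextNode(str):
    """Hypothetical string node that remembers its place in a tree."""

    def __new__(cls, value, parent=None):
        obj = super().__new__(cls, value)
        obj.parent = parent
        return obj

    def __copy__(self):
        # Equal text, but no link back into the original tree.
        return type(self)(str(self))


tree_root = object()
original = TextNode("hello", parent=tree_root)
clone = copy.copy(original)
print(clone == original, clone.parent is None)  # True True
```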
2015-06-28 | Fixed a bug where Element.extract() could create an infinite loop in the remaining tree. | Leonard Richardson
2015-06-28 | Accept 'xml' as an unambiguous identifier for the lxml XML parser, since it's the only XML parser supported at the moment. | Leonard Richardson
2015-06-27 | Added an exclude_encodings argument to UnicodeDammit and to the Beautiful Soup constructor, which lets you prohibit the detection of an encoding that you know is wrong. [bug=1469408] | Leonard Richardson
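An exclusion list works by removing guesses from the candidate list before any are tried. A sketch of that idea with a hypothetical `decode_with_exclusions` function and a deliberately short candidate list; UnicodeDammit considers more sources than this:

```python
def decode_with_exclusions(data, candidates=("utf-8", "windows-1252"), exclude_encodings=()):
    """Try candidate encodings in order, skipping any the caller ruled out.

    Returns (text, encoding) for the first candidate that decodes cleanly.
    """
    excluded = {name.lower() for name in exclude_encodings}
    for encoding in candidates:
        if encoding.lower() in excluded:
            continue
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("no acceptable encoding found")


data = "caf\u00e9".encode("utf-8")
print(ascii(decode_with_exclusions(data)))                            # ('caf\xe9', 'utf-8')
print(ascii(decode_with_exclusions(data, exclude_encodings=["UTF-8"])))  # ('caf\xc3\xa9', 'windows-1252')
```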
2015-06-26 | Added a sanity check helper method that makes sure all the elements of a tree are properly connected via .next_element and .previous_element. | Leonard Richardson
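A linkage check like this walks the document-order chain forward and confirms that each back-pointer matches. A sketch over a hypothetical `Node` class; `link_in_order` and `check_linkage` are illustrative names, not Beautiful Soup's helper:

```python
class Node:
    """Hypothetical tree element linked in document order."""

    def __init__(self, name):
        self.name = name
        self.next_element = None
        self.previous_element = None


def link_in_order(nodes):
    """Chain nodes so each one points at its neighbours (example helper)."""
    for before, after in zip(nodes, nodes[1:]):
        before.next_element = after
        after.previous_element = before


def check_linkage(first):
    """Walk .next_element from `first`, verifying every .previous_element.

    Returns the number of elements visited; raises ValueError on a broken
    back-link or a cycle.
    """
    if first.previous_element is not None:
        raise ValueError(f"{first.name} is not the first element")
    seen = set()
    node = first
    while node is not None:
        if node in seen:
            raise ValueError(f"cycle detected at {node.name}")
        seen.add(node)
        following = node.next_element
        if following is not None and following.previous_element is not node:
            raise ValueError(f"{following.name}.previous_element is wrong")
        node = following
    return len(seen)


nodes = [Node(name) for name in ("html", "body", "p", "text")]
link_in_order(nodes)
print(check_linkage(nodes[0]))  # 4
```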
2015-06-24 | Fixed an import error in Python 3.5 caused by the removal of the | Leonard Richardson
2015-06-24 | Made double sure that we don't use the 'strict' constructor argument when it's deprecated. [bug=1341055] | Leonard Richardson
2015-06-24 | If the initial <html> tag contains a CDATA list attribute such as 'class', the html5lib tree builder will now turn its value into a list, as it would with any other tag. [bug=1296481] | Leonard Richardson
2015-06-23 | Got a hacky fix for the latest html5lib problem. | Leonard Richardson
2014-12-11 | Improved the lxml tree builder's handling of processing instructions. [bug=1294645] | Leonard Richardson
2014-12-07 | In Python 3.4 and above, set the new convert_charrefs argument to the html.parser constructor to avoid a warning and future failures. Patch by Stefano Revera. [bug=1375721] | Leonard Richardson
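`convert_charrefs` is a keyword argument of `html.parser.HTMLParser` (added in Python 3.4; its default became True in 3.5). The sketch below shows why a tree builder cares about it: with True, references are decoded before `handle_data()` is called; with False, they arrive through `handle_entityref()` and `handle_charref()`. `EventLog` and `events_for` are hypothetical names:

```python
from html.parser import HTMLParser


class EventLog(HTMLParser):
    """Record the callbacks html.parser makes for text and entity references."""

    def __init__(self, convert_charrefs):
        super().__init__(convert_charrefs=convert_charrefs)
        self.events = []

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_entityref(self, name):
        self.events.append(("entityref", name))


def events_for(markup, convert_charrefs):
    parser = EventLog(convert_charrefs=convert_charrefs)
    parser.feed(markup)
    parser.close()
    return parser.events


print(events_for("<p>a &amp; b</p>", True))   # [('data', 'a & b')]
print(events_for("<p>a &amp; b</p>", False))  # [('data', 'a '), ('entityref', 'amp'), ('data', ' b')]
```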
2014-12-07 | Tweaked the parser warning. | Leonard Richardson
2014-12-07 | Issue a warning if the BeautifulSoup constructor arguments do not explicitly name a parser. | Leonard Richardson
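When no parser is named, Beautiful Soup picks the best one installed (its documentation ranks lxml, then html5lib, then html.parser), so results can differ between machines; hence the warning. A sketch of the pattern with the standard `warnings` module; `choose_parser` and `NoParserSpecifiedWarning` are hypothetical, and the message text is not Beautiful Soup's:

```python
import warnings


class NoParserSpecifiedWarning(UserWarning):
    """Hypothetical warning category for the example."""


def choose_parser(features=None, available=("lxml", "html5lib", "html.parser")):
    """Pick a parser name; warn when the caller relied on the default choice."""
    if features is None:
        chosen = available[0]
        warnings.warn(
            f"No parser was explicitly specified, so {chosen!r} was chosen. "
            "Results may differ on a system with other parsers installed.",
            NoParserSpecifiedWarning,
            stacklevel=2,
        )
        return chosen
    if features not in available:
        raise ValueError(f"parser {features!r} is not available")
    return features


print(choose_parser("html.parser"))  # html.parser, no warning
```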
2013-10-18 | Fixed yet another problem that caused the html5lib tree builder to create a disconnected parse tree. [bug=1237763] | Leonard Richardson
2013-10-01 | Fixed a bug in which short Unicode input was improperly encoded to ASCII when checking whether or not it was a file on disk. [bug=1227016] | Leonard Richardson
2013-08-13 | Fixed yet another problem with the html5lib tree builder, caused by html5lib's tendency to rearrange the tree during parsing. [bug=1189267] | Leonard Richardson