summaryrefslogtreecommitdiff
path: root/bs4/builder
AgeCommit message (Collapse)Author
2015-09-28Fixed a parse bug with the html5lib tree-builder. Thanks to RoelLeonard Richardson
Kramer for the patch. [bug=1483781]
2015-06-28 It's now possible to pickle a BeautifulSoup object no matter whichLeonard Richardson
tree builder was used to create it. However, the only tree builder that survives the pickling process is the HTMLParserTreeBuilder ('html.parser'). If you unpickle a BeautifulSoup object created with some other tree builder, soup.builder will be None. [bug=1231545]
2015-06-28Changed the way soup objects work under copy.copy(). Copying aLeonard Richardson
NavigableString or a Tag will give you a new NavigableString that's equal to the old one but not connected to the parse tree. Patch by Martijn Peters. [bug=1307490]
2015-06-28Fixed a bug where Element.extract() could create an infinite loop inLeonard Richardson
the remaining tree.
2015-06-28Accept 'xml' as an unambiguous identifier for the lxml XML parser, since ↵Leonard Richardson
it's the only XML parser supported at the moment.
2015-06-27Added an exclude_encodings argument to UnicodeDammit and to theLeonard Richardson
Beautiful Soup constructor, which lets you prohibit the detection of an encoding that you know is wrong. [bug=1469408]
2015-06-26Added a sanity check helper method that makes sure all the elements of a ↵Leonard Richardson
tree are properly connected via .next_element and .previous_element.
2015-06-24Fixed an import error in Python 3.5 caused by the removal of theLeonard Richardson
2015-06-24Made double sure that we don't use the 'strict' constructor argument when ↵Leonard Richardson
it's deprecated. [bug=1341055]
2015-06-24If the initial <html> tag contains a CDATA list attribute such asLeonard Richardson
'class', the html5lib tree builder will now turn its value into a list, as it would with any other tag. [bug=1296481]
2015-06-23Got a hacky fix for the latest html5lib problem.Leonard Richardson
2014-12-11Improved the lxml tree builder's handling of processingLeonard Richardson
instructions. [bug=1294645]
2014-12-07In Python 3.4 and above, set the new convert_charrefs argument toLeonard Richardson
the html.parser constructor to avoid a warning and future failures. Patch by Stefano Revera. [bug=1375721]
2014-12-07Tweaked the parser warning.Leonard Richardson
2014-12-07Issue a warning if the BeautifulSoup constructor arguments do not explicitly ↵Leonard Richardson
name a parser.
2013-10-18Fixed yet another problem that caused the html5lib tree builder toLeonard Richardson
create a disconnected parse tree. [bug=1237763]
2013-10-01Fixed a bug in which short Unicode input was improperly encoded to ASCII ↵Leonard Richardson
when checking whether or not it was a file on disk. [bug=1227016]
2013-08-13* Fixed yet another problem with the html5lib tree builder, caused byLeonard Richardson
html5lib's tendency to rearrange the tree during parsing. [bug=1189267]
2013-06-03Save another Element creation.Leonard Richardson
2013-06-03Improved performance for html5lib.Leonard Richardson
2013-06-03Improved performance of _replace_cdata_list_attribute_values, and greatly ↵Leonard Richardson
reduced the number of times it is called.
2013-06-02Merged in big encoding-detection refactoring branch.Leonard Richardson
2013-06-02Turns out we had two bits of code to strip byte-order marks.Leonard Richardson
2013-06-02It turns out most of the untested code wasn't doing anything useful.Leonard Richardson
2013-06-02Treat an lxml ParserError as a ParserRejectedMarkup.Leonard Richardson
2013-05-31The html.parser treebuilder can now handle numeric attributes inLeonard Richardson
text when the hexidecimal name of the attribute starts with a capital X.
2013-05-31Create a new lxml parser object for every new parsing strategy.Leonard Richardson
2013-05-20The default XML formatter will now replace ampersands even if they appear to ↵Leonard Richardson
be part of entities. That is, "&lt;" will become "&amp;lt;".[bug=1182183]
2013-05-20The .next_element attribute used during parsing was confusingly similar to ↵Leonard Richardson
the .next_element navigation attribute. Renamed the former to _most_recent_element.
2013-05-20Fixed another bug by which the html5lib tree builder could create aLeonard Richardson
disconnected tree. [bug=1182089]
2013-05-09Changed lxml.feed() to handle the eventuality that it may be given a bytestring.Leonard Richardson
2013-05-09Added a diagnostic function for randomly generating a simple, invalid HTML ↵Leonard Richardson
document.
2013-05-07Now that lxml's segfault on invalid doctype has been fixed, fix aLeonard Richardson
corresponding problem on the Beautiful Soup end that was previously invisible. [bug=984936]
2012-10-11Fix a bug in the lxml treebuilder which crashed when a tag includedLeonard Richardson
an attribute from the predefined xml: namespace. [bug=1065617]
2012-09-28Fixed package name.Leonard Richardson
2012-08-21We don't need a special insertComment method, we just need to make ↵Leonard Richardson
Element.appendChild call object_was_parsed.
2012-08-21Fixed a problem with the html5lib builder not handling comments correctly.Leonard Richardson
2012-08-16Use namespace prefixes for namespaced attribute names, instead ofLeonard Richardson
the fully-qualified names given by the lxml parser. [bug=1037597]
2012-06-30Fixed an html5lib tree builder crash which happened when html5libLeonard Richardson
moved a tag with a multivalued attribute from one part of the tree to another. [bug=1019603]
2012-05-29Removed breakpoints.Leonard Richardson
2012-05-29Prep for release.Leonard Richardson
2012-05-24Fixed a bug with the lxml treebuilder that prevented the user from adding ↵Leonard Richardson
attributes to a tag that didn't originally have any. [bug=1002378] Thanks to Oliver Beattie for the patch.
2012-04-26The test suite now passes when lxml is not installed, whether or not ↵Leonard Richardson
html5lib is installed. [bug=987004]
2012-04-18Got rid of contains_substitutions.Leonard Richardson
2012-04-18Made encoding substitution in <meta> tags completely transparent (no more ↵Leonard Richardson
%SOUP-ENCODING%).
2012-04-18Changed wording slightly.Leonard Richardson
2012-04-18Print a warning on HTMLParseErrors to let people know they should install an ↵Leonard Richardson
external parser.
2012-04-18Fixed a bug that made the HTMLParser treebuilder generate XML definitions ↵Leonard Richardson
ending with two question marks instead of one. [bug=984258]
2012-04-03Got rid of the 4.0.2 workaround for HTML documents--it was unnecessary and ↵Leonard Richardson
the workaround was triggering a (possibly different, but related) bug in lxml. [bug=972466]
2012-04-03Don't split up the markup into chunks when using the lxml HTML parser, which ↵Leonard Richardson
doesn't have the problems of the XML parser.