Age | Commit message | Author | |
---|---|---|---|
2015-09-28 | Fixed a parse bug with the html5lib tree-builder. Thanks to Roel Kramer for the patch. [bug=1483781] | Leonard Richardson | |
2015-06-28 | It's now possible to pickle a BeautifulSoup object no matter which tree builder was used to create it. However, the only tree builder that survives the pickling process is the HTMLParserTreeBuilder ('html.parser'). If you unpickle a BeautifulSoup object created with some other tree builder, soup.builder will be None. [bug=1231545] | Leonard Richardson | |
2015-06-28 | Changed the way soup objects work under copy.copy(). Copying a NavigableString or a Tag will give you a new NavigableString that's equal to the old one but not connected to the parse tree. Patch by Martijn Peters. [bug=1307490] | Leonard Richardson | |
2015-06-28 | Fixed a bug where Element.extract() could create an infinite loop in the remaining tree. | Leonard Richardson | |
2015-06-28 | Accept 'xml' as an unambiguous identifier for the lxml XML parser, since it's the only XML parser supported at the moment. | Leonard Richardson | |
2015-06-27 | Added an exclude_encodings argument to UnicodeDammit and to the Beautiful Soup constructor, which lets you prohibit the detection of an encoding that you know is wrong. [bug=1469408] | Leonard Richardson | |
2015-06-26 | Added a sanity check helper method that makes sure all the elements of a tree are properly connected via .next_element and .previous_element. | Leonard Richardson | |
2015-06-24 | Fixed an import error in Python 3.5 caused by the removal of the | Leonard Richardson | |
2015-06-24 | Made double sure that we don't use the 'strict' constructor argument when it's deprecated. [bug=1341055] | Leonard Richardson | |
2015-06-24 | If the initial <html> tag contains a CDATA list attribute such as 'class', the html5lib tree builder will now turn its value into a list, as it would with any other tag. [bug=1296481] | Leonard Richardson | |
2015-06-23 | Got a hacky fix for the latest html5lib problem. | Leonard Richardson | |
2014-12-11 | Improved the lxml tree builder's handling of processing instructions. [bug=1294645] | Leonard Richardson | |
2014-12-07 | In Python 3.4 and above, set the new convert_charrefs argument to the html.parser constructor to avoid a warning and future failures. Patch by Stefano Revera. [bug=1375721] | Leonard Richardson | |
2014-12-07 | Tweaked the parser warning. | Leonard Richardson | |
2014-12-07 | Issue a warning if the BeautifulSoup constructor arguments do not explicitly name a parser. | Leonard Richardson | |
2013-10-18 | Fixed yet another problem that caused the html5lib tree builder to create a disconnected parse tree. [bug=1237763] | Leonard Richardson | |
2013-10-01 | Fixed a bug in which short Unicode input was improperly encoded to ASCII when checking whether or not it was a file on disk. [bug=1227016] | Leonard Richardson | |
2013-08-13 | Fixed yet another problem with the html5lib tree builder, caused by html5lib's tendency to rearrange the tree during parsing. [bug=1189267] | Leonard Richardson | |
2013-06-03 | Save another Element creation. | Leonard Richardson | |
2013-06-03 | Improved performance for html5lib. | Leonard Richardson | |
2013-06-03 | Improved performance of _replace_cdata_list_attribute_values, and greatly reduced the number of times it is called. | Leonard Richardson | |
2013-06-02 | Merged in big encoding-detection refactoring branch. | Leonard Richardson | |
2013-06-02 | Turns out we had two bits of code to strip byte-order marks. | Leonard Richardson | |
2013-06-02 | It turns out most of the untested code wasn't doing anything useful. | Leonard Richardson | |
2013-06-02 | Treat an lxml ParserError as a ParserRejectedMarkup. | Leonard Richardson | |
2013-05-31 | The html.parser treebuilder can now handle numeric attributes in text when the hexadecimal name of the attribute starts with a capital X. | Leonard Richardson | |
2013-05-31 | Create a new lxml parser object for every new parsing strategy. | Leonard Richardson | |
2013-05-20 | The default XML formatter will now replace ampersands even if they appear to be part of entities. That is, "&lt;" will become "&amp;lt;". [bug=1182183] | Leonard Richardson | |
2013-05-20 | The .next_element attribute used during parsing was confusingly similar to the .next_element navigation attribute. Renamed the former to _most_recent_element. | Leonard Richardson | |
2013-05-20 | Fixed another bug by which the html5lib tree builder could create a disconnected tree. [bug=1182089] | Leonard Richardson | |
2013-05-09 | Changed lxml.feed() to handle the eventuality that it may be given a bytestring. | Leonard Richardson | |
2013-05-09 | Added a diagnostic function for randomly generating a simple, invalid HTML document. | Leonard Richardson | |
2013-05-07 | Now that lxml's segfault on invalid doctype has been fixed, fix a corresponding problem on the Beautiful Soup end that was previously invisible. [bug=984936] | Leonard Richardson | |
2012-10-11 | Fix a bug in the lxml treebuilder which crashed when a tag included an attribute from the predefined xml: namespace. [bug=1065617] | Leonard Richardson | |
2012-09-28 | Fixed package name. | Leonard Richardson | |
2012-08-21 | We don't need a special insertComment method, we just need to make Element.appendChild call object_was_parsed. | Leonard Richardson | |
2012-08-21 | Fixed a problem with the html5lib builder not handling comments correctly. | Leonard Richardson | |
2012-08-16 | Use namespace prefixes for namespaced attribute names, instead of the fully-qualified names given by the lxml parser. [bug=1037597] | Leonard Richardson | |
2012-06-30 | Fixed an html5lib tree builder crash which happened when html5lib moved a tag with a multivalued attribute from one part of the tree to another. [bug=1019603] | Leonard Richardson | |
2012-05-29 | Removed breakpoints. | Leonard Richardson | |
2012-05-29 | Prep for release. | Leonard Richardson | |
2012-05-24 | Fixed a bug with the lxml treebuilder that prevented the user from adding attributes to a tag that didn't originally have any. [bug=1002378] Thanks to Oliver Beattie for the patch. | Leonard Richardson | |
2012-04-26 | The test suite now passes when lxml is not installed, whether or not html5lib is installed. [bug=987004] | Leonard Richardson | |
2012-04-18 | Got rid of contains_substitutions. | Leonard Richardson | |
2012-04-18 | Made encoding substitution in <meta> tags completely transparent (no more %SOUP-ENCODING%). | Leonard Richardson | |
2012-04-18 | Changed wording slightly. | Leonard Richardson | |
2012-04-18 | Print a warning on HTMLParseErrors to let people know they should install an external parser. | Leonard Richardson | |
2012-04-18 | Fixed a bug that made the HTMLParser treebuilder generate XML definitions ending with two question marks instead of one. [bug=984258] | Leonard Richardson | |
2012-04-03 | Got rid of the 4.0.2 workaround for HTML documents--it was unnecessary and the workaround was triggering a (possibly different, but related) bug in lxml. [bug=972466] | Leonard Richardson | |
2012-04-03 | Don't split up the markup into chunks when using the lxml HTML parser, which doesn't have the problems of the XML parser. | Leonard Richardson | |
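
The 2014-12-07 entry refers to the convert_charrefs argument that Python 3.4 added to the standard library's html.parser.HTMLParser constructor. The sketch below uses only the standard library to show what that argument changes. It is not Beautiful Soup's own code, and EventRecorder and record are hypothetical names introduced for illustration.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Hypothetical helper: records the text-related callbacks it receives."""

    def __init__(self, convert_charrefs):
        # Pass the argument explicitly instead of relying on the default,
        # which changed between Python 3.4 and 3.5.
        super().__init__(convert_charrefs=convert_charrefs)
        self.events = []

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_entityref(self, name):
        # Only called when convert_charrefs is False.
        self.events.append(("entityref", name))

    def handle_charref(self, name):
        # Only called when convert_charrefs is False.
        self.events.append(("charref", name))


def record(markup, convert_charrefs):
    """Parse markup and return the list of recorded (kind, value) events."""
    parser = EventRecorder(convert_charrefs)
    parser.feed(markup)
    parser.close()
    return parser.events


if __name__ == "__main__":
    sample = "<p>a &lt; b &#x58;</p>"
    print(record(sample, convert_charrefs=True))
    print(record(sample, convert_charrefs=False))
```

With convert_charrefs=True the parser decodes character references itself and delivers the text through handle_data. With convert_charrefs=False it reports each reference through handle_entityref or handle_charref and leaves decoding to the caller.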