Age | Commit message | Author | |
---|---|---|---|
2018-12-24 | Keep track of the namespace abbreviations found while parsing the document. This makes select() work most of the time without requiring a value for 'namespaces'. | Leonard Richardson | |
2018-12-22 | Fix next and previous linkage issues. Fixes issues #1806598 and #1782928. | Isaac Muse | |
2018-08-12 | Converted README to Markdown format. | Leonard Richardson | |
2018-07-28 | Correctly handle invalid HTML numeric character entities like `&#147;` which reference code points that are not Unicode code points. Note that this is only fixed when Beautiful Soup is used with the html.parser parser -- html5lib already worked and I couldn't fix it with lxml. [bug=1782933] | Leonard Richardson | |
2018-07-21 | Fixed a problem where the html.parser tree builder interpreted a string like '&foo ' as the character entity '&foo;' [bug=1728706] | Leonard Richardson | |
2018-07-18 | Preserve XML namespaces when they are introduced inside an XML document, not just the ones introduced at the top level. [bug=1718787] | Leonard Richardson | |
2018-07-15 | Introduced the Formatter system. [bug=1716272]. | Leonard Richardson | |
2018-07-15 | It's possible for a TreeBuilder subclass to specify that void elements should be represented as <element> rather than <element/>, by setting TreeBuilder.void_element_close_prefix to the empty string. [bug=1716272] | Leonard Richardson | |
2018-07-15 | Stop data loss when encountering an empty numeric entity, and possibly in other cases. Thanks to tos.kamiya for the fix. [bug=1698503] | Leonard Richardson | |
2018-07-14 | Stopped HTMLParser from raising an exception in very rare cases of bad markup. [bug=1708831] | Leonard Richardson | |
2017-05-06 | Improved the handling of empty-element tags like <br> when using the html.parser parser. [bug=1676935] | Leonard Richardson | |
2017-05-06 | HTML parsers treat all HTML4 and HTML5 empty element tags (aka void element tags) correctly. [bug=1656909] | Leonard Richardson | |
2016-12-19 | Fixed foster parenting when html5lib is the tree builder. Thanks to Geoffrey Sneddon for a patch and test. | Leonard Richardson | |
2016-12-19 | Fixed yet another problem that caused the html5lib tree builder to | Leonard Richardson | |
2016-07-30 | Explained why we test both unicode and bytestring processing instructions. | Leonard Richardson | |
2016-07-26 | Fixed a reported (but not reproduced) bug involving processing instructions fed into the lxml HTML parser. | Leonard Richardson | |
2016-07-16 | Beautiful Soup will now work with versions of html5lib greater than 0.99999999. [bug=1603299] | Leonard Richardson | |
2016-07-16 | Removed imports of pdb, since pdb is not available in some environments. [bug=1491700] | Leonard Richardson | |
2016-07-16 | The contents of <textarea> tags will no longer be modified when the tree is prettified. [bug=1555829] | Leonard Richardson | |
2016-07-16 | Added a separate class for XML processing instructions, which have a slightly different format from SGML processing instructions. [bug=1504383] | Leonard Richardson | |
2016-07-16 | Rename COPYING.txt to LICENSE. Add a reference to LICENSE in every source file. | Leonard Richardson | |
2015-12-08 | Fix foster parenting with html5lib. This makes all of the html5lib tests pass. Yay! | Geoffrey Sneddon | |
2015-12-08 | Make TreeBuilderForHtml5lib strictly follow the html5lib API. This slightly changes the constructor (to make soup optional), and adds a testSerializer method so the tests can be run against it. | Geoffrey Sneddon | |
2015-09-28 | Fixed a parse bug with the html5lib tree-builder. Thanks to Roel Kramer for the patch. [bug=1483781] | Leonard Richardson | |
2015-06-28 | It's now possible to pickle a BeautifulSoup object no matter which tree builder was used to create it. However, the only tree builder that survives the pickling process is the HTMLParserTreeBuilder ('html.parser'). If you unpickle a BeautifulSoup object created with some other tree builder, soup.builder will be None. [bug=1231545] | Leonard Richardson | |
2015-06-28 | Changed the way soup objects work under copy.copy(). Copying a NavigableString or a Tag will give you a new object of the same type that's equal to the old one but not connected to the parse tree. Patch by Martijn Peters. [bug=1307490] | Leonard Richardson | |
2015-06-28 | Fixed a bug where Element.extract() could create an infinite loop in the remaining tree. | Leonard Richardson | |
2015-06-28 | Accept 'xml' as an unambiguous identifier for the lxml XML parser, since it's the only XML parser supported at the moment. | Leonard Richardson | |
2015-06-27 | Added an exclude_encodings argument to UnicodeDammit and to the Beautiful Soup constructor, which lets you prohibit the detection of an encoding that you know is wrong. [bug=1469408] | Leonard Richardson | |
2015-06-26 | Added a sanity check helper method that makes sure all the elements of a tree are properly connected via .next_element and .previous_element. | Leonard Richardson | |
2015-06-24 | Fixed an import error in Python 3.5 caused by the removal of the | Leonard Richardson | |
2015-06-24 | Made double sure that we don't use the 'strict' constructor argument when it's deprecated. [bug=1341055] | Leonard Richardson | |
2015-06-24 | If the initial <html> tag contains a CDATA list attribute such as 'class', the html5lib tree builder will now turn its value into a list, as it would with any other tag. [bug=1296481] | Leonard Richardson | |
2015-06-23 | Got a hacky fix for the latest html5lib problem. | Leonard Richardson | |
2014-12-11 | Improved the lxml tree builder's handling of processing instructions. [bug=1294645] | Leonard Richardson | |
2014-12-07 | In Python 3.4 and above, set the new convert_charrefs argument to the html.parser constructor to avoid a warning and future failures. Patch by Stefano Revera. [bug=1375721] | Leonard Richardson | |
2014-12-07 | Tweaked the parser warning. | Leonard Richardson | |
2014-12-07 | Issue a warning if the BeautifulSoup constructor arguments do not explicitly name a parser. | Leonard Richardson | |
2013-10-18 | Fixed yet another problem that caused the html5lib tree builder to create a disconnected parse tree. [bug=1237763] | Leonard Richardson | |
2013-10-01 | Fixed a bug in which short Unicode input was improperly encoded to ASCII when checking whether or not it was a file on disk. [bug=1227016] | Leonard Richardson | |
2013-08-13 | Fixed yet another problem with the html5lib tree builder, caused by html5lib's tendency to rearrange the tree during parsing. [bug=1189267] | Leonard Richardson | |
2013-06-03 | Save another Element creation. | Leonard Richardson | |
2013-06-03 | Improved performance for html5lib. | Leonard Richardson | |
2013-06-03 | Improved performance of _replace_cdata_list_attribute_values, and greatly reduced the number of times it is called. | Leonard Richardson | |
2013-06-02 | Merged in big encoding-detection refactoring branch. | Leonard Richardson | |
2013-06-02 | Turns out we had two bits of code to strip byte-order marks. | Leonard Richardson | |
2013-06-02 | It turns out most of the untested code wasn't doing anything useful. | Leonard Richardson | |
2013-06-02 | Treat an lxml ParserError as a ParserRejectedMarkup. | Leonard Richardson | |
2013-05-31 | The html.parser treebuilder can now handle hexadecimal numeric character references in text when the x that introduces the hexadecimal value is a capital X. | Leonard Richardson | |
2013-05-31 | Create a new lxml parser object for every new parsing strategy. | Leonard Richardson | |