Age | Commit message | Author | |
---|---|---|---|
2021-10-09 | Moved testing.py into the same package as the tests. | Leonard Richardson | |
2021-09-12 | Ported unit tests to use pytest. | Leonard Richardson | |
2021-09-07 | Goodbye, Python 2. [bug=1942919] | Leonard Richardson | |
2021-05-31 | The html.parser tree builder can now handle named entities found in the HTML5 spec in much the same way that the html5lib tree builder does. Note that the lxml tree builder still handles named entities differently. [bug=1924908] | Leonard Richardson | |
2021-04-08 | Brought in fuzz tests from the oss-project into Beautiful Soup's unit test suite. | Leonard Richardson | |
2020-05-30 | Fixed a bug that caused too many tags to be popped from the tag stack during tree building, when encountering a closing tag that had no matching opening tag. [bug=1880420] | Leonard Richardson | |
2020-04-24 | If you encode a document with a Python-specific encoding like 'unicode_escape', that encoding is no longer mentioned in the final XML or HTML document. Instead, encoding information is omitted or left blank. [bug=1874955] | Leonard Richardson | |
2020-04-05 | Embedded CSS and Javascript is now stored in distinct Stylesheet and Script tags, which are ignored by methods like get_text(). This feature is not supported by the html5lib treebuilder. [bug=1868861] | Leonard Richardson | |
2019-11-11 | The html.parser tree builder now correctly handles DOCTYPEs that are not uppercase. [bug=1848401] | Leonard Richardson | |
2019-07-21 | Implemented line number tracking for html5lib. | Leonard Richardson | |
2019-07-21 | Adapt Chris Mayo's code to track line number and position when using html.parser. | Leonard Richardson | |
2019-07-07 | `&apos;` (which is valid in XML and XHTML, but not HTML 4) is now recognized as a named entity and converted to a single quote. [bug=1818721] | Leonard Richardson | |
2019-07-07 | It's now possible to override a TreeBuilder's cdata_list_attributes dictionary by passing in a replacement. None will disable the feature altogether. [bug=1832978] | Leonard Richardson | |
2018-12-30 | Fixed a problem with multi-valued attributes where the value contained whitespace. Thanks to Jens Svalgaard for the fix. [bug=1787453] | Leonard Richardson | |
2018-12-30 | Merging the linkage checker and html5lib fixes by Isaac Muse found in https://code.launchpad.net/~facelessuser/beautifulsoup/html5lib-fix/+merge/361282. [bug=1809910] | Leonard Richardson | |
2018-12-26 | Remove dead line of code | Isaac Muse | |
2018-12-25 | Ensure html5lib always has valid internal linkage. html5lib, with malformed HTML, can end up with detached linkage internally. Improve the current code to ensure html5lib always has proper linkage. | Isaac Muse | |
2018-12-24 | Clarified the software license. | Leonard Richardson | |
2018-07-28 | Correctly handle invalid HTML numeric character entities like `&#147;` which reference code points that are not Unicode code points. Note that this is only fixed when Beautiful Soup is used with the html.parser parser -- html5lib already worked and I couldn't fix it with lxml. [bug=1782933] | Leonard Richardson | |
2018-07-21 | Fixed a problem where the html.parser tree builder interpreted a string like '&foo ' as the character entity '&foo;' [bug=1728706] | Leonard Richardson | |
2018-07-18 | Fixed a bug where find_all() was not working when asked to find a tag with a namespaced name in an XML document that was parsed as HTML. [bug=1723783] | Leonard Richardson | |
2018-07-18 | Preserve XML namespaces when they are introduced inside an XML document, not just the ones introduced at the top level. [bug=1718787] | Leonard Richardson | |
2018-07-15 | Stop data loss when encountering an empty numeric entity, and possibly in other cases. Thanks to tos.kamiya for the fix. [bug=1698503] | Leonard Richardson | |
2017-05-07 | Namespace prefix is preserved when an XML tag is copied. Thanks to Vikas for a patch and test. [bug=1685172] | Leonard Richardson | |
2017-05-06 | Improved the handling of empty-element tags like <br> when using the html.parser parser. [bug=1676935] | Leonard Richardson | |
2017-05-06 | HTML parsers treat all HTML4 and HTML5 empty element tags (aka void element tags) correctly. [bug=1656909] | Leonard Richardson | |
2017-05-06 | It's now possible to use a tag's namespace prefix when searching, e.g. soup.find('namespace:tag') [bug=1655332] | Leonard Richardson | |
2016-07-30 | Explained why we test both unicode and bytestring processing instructions. | Leonard Richardson | |
2016-07-16 | Beautiful Soup will now work with versions of html5lib greater than 0.99999999. [bug=1603299] | Leonard Richardson | |
2016-07-16 | The contents of <textarea> tags will no longer be modified when the tree is prettified. [bug=1555829] | Leonard Richardson | |
2016-07-16 | Added a separate class for XML processing instructions, which have a slightly different format from SGML processing instructions. [bug=1504383] | Leonard Richardson | |
2016-07-16 | Rename COPYING.txt to LICENSE. Add a reference to LICENSE in every source file. | Leonard Richardson | |
2015-09-28 | Add a __license__ statement to all source files. | Leonard Richardson | |
2015-09-28 | Corrected the output of Declaration objects. [bug=1477847] | Leonard Richardson | |
2015-06-28 | It's now possible to pickle a BeautifulSoup object no matter which tree builder was used to create it. However, the only tree builder that survives the pickling process is the HTMLParserTreeBuilder ('html.parser'). If you unpickle a BeautifulSoup object created with some other tree builder, soup.builder will be None. [bug=1231545] | Leonard Richardson | |
2015-06-26 | Added a sanity check helper method that makes sure all the elements of a tree are properly connected via .next_element and .previous_element. | Leonard Richardson | |
2015-06-24 | If the initial <html> tag contains a CDATA list attribute such as 'class', the html5lib tree builder will now turn its value into a list, as it would with any other tag. [bug=1296481] | Leonard Richardson | |
2015-06-23 | Got a hacky fix for the latest html5lib problem. | Leonard Richardson | |
2015-06-23 | Force object_was_parsed() to keep the tree intact even when an element from later in the document is moved into place. [bug=1430633] | Leonard Richardson | |
2014-12-11 | Improved the lxml tree builder's handling of processing instructions. [bug=1294645] | Leonard Richardson | |
2014-12-07 | Issue a warning if the BeautifulSoup constructor arguments do not explicitly name a parser. | Leonard Richardson | |
2013-10-18 | Fixed yet another problem that caused the html5lib tree builder to create a disconnected parse tree. [bug=1237763] | Leonard Richardson | |
2013-06-02 | Merged in big encoding-detection refactoring branch. | Leonard Richardson | |
2013-05-31 | The html.parser treebuilder can now handle numeric attributes in text when the hexadecimal name of the attribute starts with a capital X. | Leonard Richardson | |
2013-05-31 | Create a new lxml parser object for every new parsing strategy. | Leonard Richardson | |
2013-05-20 | Fixed another bug by which the html5lib tree builder could create a disconnected tree. [bug=1182089] | Leonard Richardson | |
2013-05-20 | Fixed test failures when lxml is not installed. | Leonard Richardson | |
2013-05-07 | Now that lxml's segfault on invalid doctype has been fixed, fix a corresponding problem on the Beautiful Soup end that was previously invisible. [bug=984936] | Leonard Richardson | |
2013-05-06 | Added failing test. | Leonard Richardson | |
2012-10-11 | Fix a bug in the lxml treebuilder which crashed when a tag included an attribute from the predefined xml: namespace. [bug=1065617] | Leonard Richardson | |