Age | Commit message | Author | |
---|---|---|---|
2023-01-27 | Got rid of some more warnings by removing code that's not relevant anymore, now that the minimum supported Python version is 3.6. | Leonard Richardson | |
2023-01-25 | Tag.interesting_string_types is now propagated when a tag is | Leonard Richardson | |
copied. [bug=1990400] | |||
2023-01-25 | Made the ISO-8859 test robust in a less hacky way. | Leonard Richardson | |
2023-01-25 | Made the ISO-8859-1 smoke test more robust. | Leonard Richardson | |
2023-01-25 | The HTMLFormatter and XMLFormatter constructors no longer return a | Leonard Richardson | |
value. [bug=1992693] | |||
2023-01-25 | Passing a Tag's .contents into PageElement.extend() now works the | Leonard Richardson | |
same way as passing the Tag itself. | |||
2022-05-15 | Fixed a test failure when cchardet is not installed but | Leonard Richardson | |
charset_normalizer is. [bug=1973072] | |||
2022-04-10 | Fixed another crash when overriding multi_valued_attributes and using the | Leonard Richardson | |
html5lib parser. [bug=1948488] | |||
2022-04-07 | Omit untrusted input when issuing warnings. | Leonard Richardson | |
2021-12-21 | It's now possible to customize the way output is indented by | Leonard Richardson | |
providing a value for the 'indent' argument to the Formatter constructor. The 'indent' argument works very similarly to the argument of the same name in the Python standard library's json.dump() method. [bug=1955497] | |||
2021-12-17 | Fix a crash when pickling a BeautifulSoup object that has no | Leonard Richardson | |
tree builder. [bug=1934003] | |||
2021-11-29 | Do a better job of keeping track of namespaces as an XML document is | Leonard Richardson | |
parsed, so that CSS selectors that use namespaces will do the right thing more often. [bug=1946243] | |||
2021-10-24 | Added test of warn_if_markup_looks_like_xml. | Leonard Richardson | |
2021-10-24 | Issue a warning when an HTML parser is used to parse a document that | Leonard Richardson | |
looks like XML but not XHTML. [bug=1939121] | |||
2021-10-24 | Used a warning to formally deprecate the 'text' argument in favor of 'string'. | Leonard Richardson | |
2021-10-23 | Changing find* tests to use string instead of text, except for one test that specifically checks that text is an alias for string. | Leonard Richardson | |
2021-10-23 | Added a workaround for an lxml bug (https://bugs.launchpad.net/lxml/+bug/1948551) that caused problems when parsing a Unicode string beginning with BYTE ORDER MARK. [bug=1947768] | Leonard Richardson | |
2021-10-23 | Fixed a crash when overriding multi_valued_attributes and using the | Leonard Richardson | |
html5lib parser. [bug=1948488] | |||
2021-10-11 | Added special string classes, RubyParenthesisString and RubyTextString, | Leonard Richardson | |
to make it possible to treat ruby text specially in get_text() calls. [bug=1941980] | |||
2021-10-11 | More test refactoring. | Leonard Richardson | |
2021-10-11 | Broke up some monolithic unit test files. | Leonard Richardson | |
2021-10-11 | Moved the test classes to tests/__init__.py. | Leonard Richardson | |
2021-10-09 | Moved testing.py into the same package as the tests. | Leonard Richardson | |
2021-09-12 | Ported unit tests to use pytest. | Leonard Richardson | |
2021-09-07 | Goodbye, Python 2. [bug=1942919] | Leonard Richardson | |
2021-06-01 | The 'replace_with()' method now takes a variable number of arguments, | Leonard Richardson | |
and can be used to replace a single element with a sequence of elements. Patch by Bill Chandos. | |||
2021-05-31 | The html.parser tree builder can now handle named entities | Leonard Richardson | |
found in the HTML5 spec in much the same way that the html5lib tree builder does. Note that the lxml tree builder still handles named entities differently. [bug=1924908] | |||
2021-04-08 | Brought in fuzz tests from the oss-project into Beautiful Soup's unit test suite. | Leonard Richardson | |
2021-02-14 | NavigableString and its subclasses now implement the get_text() | Leonard Richardson | |
method, as well as the properties .strings and .stripped_strings. These methods will either return the string itself, or nothing, so the only reason to use this is when iterating over a list of mixed Tag and NavigableString objects. [bug=1904309] | |||
2021-02-14 | The 'html5' formatter now treats attributes whose values are the | Leonard Richardson | |
empty string as HTML boolean attributes. Previously (and in other formatters), an attribute value must be set as None to be treated as a boolean attribute. In a future release, I plan to also give this behavior to the 'html' formatter. Patch by Isaac Muse. [bug=1915424] | |||
2021-02-13 | The behavior of methods like .get_text() and .strings now differs | Leonard Richardson | |
depending on the type of tag. The change is visible with HTML tags like <script>, <style>, and <template>. Starting in 4.9.0, methods like get_text() returned no results on such tags, because the contents of those tags are not considered 'text' within the document as a whole. But a user who calls script.get_text() is working from a different definition of 'text' than a user who calls div.get_text()--otherwise there would be no need to call script.get_text() at all. In 4.10.0, the contents of (e.g.) a <script> tag are considered 'text' during a get_text() call on the tag itself, but not considered 'text' during a get_text() call on the tag's parent. Because of this change, calling get_text() on each child of a tag may now return a different result than calling get_text() on the tag itself. That's because different tags now have different understandings of what counts as 'text'. [bug=1906226] [bug=1868861] | |||
2021-02-13 | Corrected the use of special string container classes in cases when a | Leonard Richardson | |
single tag may contain strings with different containers; such as the <template> tag, which may contain both TemplateString objects and Comment objects. [bug=1913406] | |||
2021-02-13 | Added a second way to specify encodings to UnicodeDammit and | Leonard Richardson | |
EncodingDetector, based on the order of precedence defined in the HTML5 spec, starting at: https://html.spec.whatwg.org/multipage/parsing.html#parsing-with-a-known-character-encoding Encodings in 'known_definite_encodings' are tried first, then byte-order-mark sniffing is run, then encodings in 'user_encodings' are tried. The old argument, 'override_encodings', is now a deprecated alias for 'known_definite_encodings'. This changes the default behavior of the html.parser and lxml tree builders, in a way that may slightly improve encoding detection but will probably have no effect. [bug=1889014] | |||
2021-02-13 | Improve the warning issued when a directory name (as opposed to | Leonard Richardson | |
the name of a regular file) is passed as markup into the BeautifulSoup constructor. [bug=1913628] | |||
2021-02-13 | Corrected output when the namespace prefix associated with a | Leonard Richardson | |
namespaced attribute is the empty string, as opposed to None. [bug=1915583] | |||
2020-09-26 | Fixed a bug that inconsistently moved elements over when passing | Leonard Richardson | |
a Tag, rather than a list, into Tag.extend(). [bug=1885710] | |||
2020-05-17 | Documented some recently added customization features. | Leonard Richardson | |
2020-05-17 | Added a keyword argument on_duplicate_attribute to the | Leonard Richardson | |
BeautifulSoupHTMLParser constructor (used by the html.parser tree builder) which lets you customize the handling of markup that contains the same attribute more than once, as in: <a href="url1" href="url2"> [bug=1878209] | |||
2020-04-21 | Added two distinct UserWarning subclasses for warnings issued from the BeautifulSoup constructor which a caller may want to filter out. [bug=1873787] | Leonard Richardson | |
2020-04-12 | Fixed test failures when run against soupselect 2.0. Patch by Tomáš | Leonard Richardson | |
Chvátal. [bug=1872279] | |||
2020-04-05 | Embedded CSS and Javascript is now stored in distinct Stylesheet and | Leonard Richardson | |
Script tags, which are ignored by methods like get_text(). This feature is not supported by the html5lib treebuilder. [bug=1868861] | |||
2020-01-01 | API CHANGE - Added PageElement.decomposed, a new property which lets you | Leonard Richardson | |
check whether you've already called decompose() on a Tag or NavigableString. | |||
2019-12-29 | Fixed an unhandled exception when formatting a Tag that had been decomposed. [bug=1857767] | Leonard Richardson | |
2019-10-05 | Avoid a crash when unpickling certain parse trees generated using html5lib on Python 3. [bug=1843545] | Leonard Richardson | |
2019-09-02 | Avoid a crash when trying to detect the declared encoding of a | Leonard Richardson | |
Unicode document. Raise an explanatory exception when the underlying parser completely rejects the incoming markup. [bug=1838877] | |||
2019-08-26 | It's now possible to override any of the element classes. | Leonard Richardson | |
2019-08-22 | Test the ability to build a tree using objects other than Tag and NavigableString. | Leonard Richardson | |
2019-08-21 | Copying a Tag preserves information that was originally obtained from | Leonard Richardson | |
the TreeBuilder used to build the original Tag. [bug=1838903] | |||
2019-08-21 | Fixed a crash when pretty-printing tags that were not created | Leonard Richardson | |
during initial parsing. [bug=1838903] | |||
2019-07-21 | Implemented line number tracking for html5lib. | Leonard Richardson | |