path: root/bs4/tests
Age | Commit message | Author
2023-01-27 | Got rid of some more warnings by removing code that's not relevant anymore, now that the minimum supported Python version is 3.6. | Leonard Richardson
2023-01-25 | Tag.interesting_string_types is now propagated when a tag is copied. [bug=1990400] | Leonard Richardson
2023-01-25 | Made the ISO-8859 test robust in a less hacky way. | Leonard Richardson
2023-01-25 | Made the ISO-8859-1 smoke test more robust. | Leonard Richardson
2023-01-25 | The HTMLFormatter and XMLFormatter constructors no longer return a value. [bug=1992693] | Leonard Richardson
2023-01-25 | Passing a Tag's .contents into PageElement.extend() now works the same way as passing the Tag itself. | Leonard Richardson
2022-05-15 | Fixed a test failure when cchardet is not installed but charset_normalizer is. [bug=1973072] | Leonard Richardson
2022-04-10 | Fixed another crash when overriding multi_valued_attributes and using the html5lib parser. [bug=1948488] | Leonard Richardson
2022-04-07 | Omit untrusted input when issuing warnings. | Leonard Richardson
2021-12-21 | It's now possible to customize the way output is indented by providing a value for the 'indent' argument to the Formatter constructor. The 'indent' argument works very similarly to the argument of the same name in the Python standard library's json.dump() function. [bug=1955497] | Leonard Richardson
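The json.dump() comparison can be made concrete with a minimal sketch. It shows how an `indent` value might be normalized into a per-level string and then used by a pretty-printer. `indent_unit()` and `pretty()` are hypothetical helpers that operate on a toy tuple tree, not bs4's Formatter code. The treatment of zero and negative values is assumed to follow json.dump() and has not been confirmed for bs4.

```python
def indent_unit(indent):
    """Hypothetical helper: map an 'indent' argument to the string used for
    one nesting level, following json.dump()'s convention. An int gives that
    many spaces (zero or negative means newlines only); a str is used as-is."""
    if isinstance(indent, int):
        return " " * max(indent, 0)
    if isinstance(indent, str):
        return indent
    raise TypeError("indent must be an int or a str")


def pretty(node, indent=1, level=0):
    """Pretty-print a (tag_name, children) tuple tree, one item per line."""
    unit = indent_unit(indent)
    name, children = node
    lines = [unit * level + f"<{name}>"]
    for child in children:
        if isinstance(child, tuple):
            lines.append(pretty(child, indent, level + 1))
        else:
            lines.append(unit * (level + 1) + child)
    lines.append(unit * level + f"</{name}>")
    return "\n".join(lines)


# A string indent is repeated once per level, here a tab.
print(pretty(("div", [("p", ["hi"])]), indent="\t"))
```

In bs4 itself the option is passed to the Formatter constructor, as the entry says. The toy tree only demonstrates how the indent value is interpreted.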
2021-12-17 | Fix a crash when pickling a BeautifulSoup object that has no tree builder. [bug=1934003] | Leonard Richardson
2021-11-29 | Do a better job of keeping track of namespaces as an XML document is parsed, so that CSS selectors that use namespaces will do the right thing more often. [bug=1946243] | Leonard Richardson
2021-10-24 | Added a test of warn_if_markup_looks_like_xml. | Leonard Richardson
2021-10-24 | Issue a warning when an HTML parser is used to parse a document that looks like XML but not XHTML. [bug=1939121] | Leonard Richardson
2021-10-24 | Used a warning to formally deprecate the 'text' argument in favor of 'string'. | Leonard Richardson
2021-10-23 | Changed find* tests to use string instead of text, except for one test that specifically checks that text is an alias for string. | Leonard Richardson
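The deprecation and the alias test in the entries above follow a common pattern. The old keyword is still accepted, a warning is emitted, and its value is forwarded to the new keyword. The sketch below shows that pattern. `find_all()` here is a hypothetical stand-in for a find*-style method and does not reproduce bs4's signature. Using DeprecationWarning as the warning category is an assumption.

```python
import warnings


def find_all(name=None, string=None, **kwargs):
    """Hypothetical find_all(): 'text' is still accepted, but only as a
    deprecated alias for 'string'."""
    if "text" in kwargs:
        text = kwargs.pop("text")
        warnings.warn(
            "The 'text' argument is deprecated; use 'string' instead.",
            DeprecationWarning,  # assumed category, for illustration
            stacklevel=2,
        )
        if string is None:
            string = text
    # A real implementation would search the tree; this just echoes the
    # normalized arguments.
    return {"name": name, "string": string, **kwargs}


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = find_all("a", text="hello")

print(result["string"], [w.category.__name__ for w in caught])
# hello ['DeprecationWarning']
```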
2021-10-23 | Added a workaround for an lxml bug (https://bugs.launchpad.net/lxml/+bug/1948551) that caused problems when parsing a Unicode string beginning with a BYTE ORDER MARK. [bug=1947768] | Leonard Richardson
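A minimal sketch of the kind of workaround this entry describes, assuming the fix is to remove a leading U+FEFF from an already-decoded string before it reaches lxml. `strip_leading_bom()` is a hypothetical name. The actual change in bs4 may be structured differently.

```python
BOM = "\ufeff"  # U+FEFF ZERO WIDTH NO-BREAK SPACE, used as a BYTE ORDER MARK


def strip_leading_bom(markup):
    """Hypothetical pre-processing step. A decoded str has no further use for
    a BOM, so it is removed; bytes are returned untouched so that a parser
    can still sniff the encoding from the BOM."""
    if isinstance(markup, str) and markup.startswith(BOM):
        return markup[len(BOM):]
    return markup


print(repr(strip_leading_bom("\ufeff<p>hi</p>")))  # '<p>hi</p>'
```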
2021-10-23 | Fixed a crash when overriding multi_valued_attributes and using the html5lib parser. [bug=1948488] | Leonard Richardson
2021-10-11 | Added special string classes, RubyParenthesisString and RubyTextString, to make it possible to treat ruby text specially in get_text() calls. [bug=1941980] | Leonard Richardson
2021-10-11 | More test refactoring. | Leonard Richardson
2021-10-11 | Broke up some monolithic unit test files. | Leonard Richardson
2021-10-11 | Moved the test classes to tests/__init__.py. | Leonard Richardson
2021-10-09 | Moved testing.py into the same package as the tests. | Leonard Richardson
2021-09-12 | Ported unit tests to use pytest. | Leonard Richardson
2021-09-07 | Goodbye, Python 2. [bug=1942919] | Leonard Richardson
2021-06-01 | The 'replace_with()' method now takes a variable number of arguments, and can be used to replace a single element with a sequence of elements. Patch by Bill Chandos. | Leonard Richardson
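In bs4 the new form is called as `old.replace_with(new1, new2, ...)`. The list-based sketch below models only the effect on the parent's children: the replacements take the old element's position, in order. `replace_with()` here is a hypothetical function over a plain list. Returning the removed element matches the return value bs4's documentation gives for `replace_with()`.

```python
def replace_with(children, old, *replacements):
    """Hypothetical model: replace `old` in the `children` list with any
    number of replacement items, keeping their order, and return `old`."""
    # list.index() compares by equality; a real tree would locate the node
    # by identity instead.
    index = children.index(old)
    # Slice assignment swaps one item for a sequence of any length.
    children[index:index + 1] = replacements
    return old


children = ["<b>", "<i>", "<u>"]
removed = replace_with(children, "<i>", "<em>", "text", "<strong>")
print(children)  # ['<b>', '<em>', 'text', '<strong>', '<u>']
```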
2021-05-31 | The html.parser tree builder can now handle named entities found in the HTML5 spec in much the same way that the html5lib tree builder does. Note that the lxml tree builder still handles named entities differently. [bug=1924908] | Leonard Richardson
2021-04-08 | Brought in fuzz tests from the oss-project into Beautiful Soup's unit test suite. | Leonard Richardson
2021-02-14 | NavigableString and its subclasses now implement the get_text() method, as well as the properties .strings and .stripped_strings. These either return the string itself or nothing, so the only reason to use them is when iterating over a list of mixed Tag and NavigableString objects. [bug=1904309] | Leonard Richardson
2021-02-14 | The 'html5' formatter now treats attributes whose values are the empty string as HTML boolean attributes. Previously (and in other formatters), an attribute value had to be set to None to be treated as a boolean attribute. In a future release, I plan to also give this behavior to the 'html' formatter. Patch by Isaac Muse. [bug=1915424] | Leonard Richardson
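The difference between the formatters comes down to a single test applied when each attribute is serialized. The sketch below is a hypothetical serializer, not bs4's formatter code. A value of `None` always produces a bare attribute, and with `html5=True` an empty string does too.

```python
from html import escape


def format_attrs(attrs, html5=False):
    """Hypothetical attribute serializer. None always produces a bare
    (boolean) attribute; with html5=True, so does the empty string."""
    parts = []
    for key, value in attrs.items():
        if value is None or (html5 and value == ""):
            parts.append(key)
        else:
            parts.append(f'{key}="{escape(value, quote=True)}"')
    return " ".join(parts)


attrs = {"type": "checkbox", "checked": ""}
print(format_attrs(attrs))              # type="checkbox" checked=""
print(format_attrs(attrs, html5=True))  # type="checkbox" checked
```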
2021-02-13 | The behavior of methods like .get_text() and .strings now differs depending on the type of tag. The change is visible with HTML tags like <script>, <style>, and <template>. Starting in 4.9.0, methods like get_text() returned no results on such tags, because the contents of those tags are not considered 'text' within the document as a whole. But a user who calls script.get_text() is working from a different definition of 'text' than a user who calls div.get_text(); otherwise there would be no need to call script.get_text() at all. In 4.10.0, the contents of (e.g.) a <script> tag are considered 'text' during a get_text() call on the tag itself, but not during a get_text() call on the tag's parent. Because of this change, calling get_text() on each child of a tag may now return a different result than calling get_text() on the tag itself, since different tags now have different understandings of what counts as 'text'. [bug=1906226] [bug=1868861] | Leonard Richardson
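This rule can be modeled with a toy tree. The sketch below is not bs4 code and keys the decision on tag names. According to the 2020-04-05 and 2021-02-13 entries, bs4 itself tracks this through special string classes such as Script, Stylesheet and TemplateString. `Node`, `get_text()` and `NON_TEXT_CONTAINERS` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical: tag names whose contents are not "text" to their ancestors.
NON_TEXT_CONTAINERS = {"script", "style", "template"}


@dataclass
class Node:
    """Toy element: a tag name plus child nodes and plain strings."""
    name: str
    children: List[Union["Node", str]] = field(default_factory=list)


def get_text(node):
    """Collect strings under `node`. A script/style/template body counts when
    get_text() is called on that tag itself, but is skipped when it is only
    reached as a descendant of some other tag."""
    parts = []

    def walk(current, is_root):
        if not is_root and current.name in NON_TEXT_CONTAINERS:
            return
        for child in current.children:
            if isinstance(child, str):
                parts.append(child)
            else:
                walk(child, is_root=False)

    walk(node, is_root=True)
    return "".join(parts)


script = Node("script", ["var x = 1;"])
div = Node("div", ["Hello ", script, "world"])
print(repr(get_text(div)))     # 'Hello world'
print(repr(get_text(script)))  # 'var x = 1;'

# Calling get_text() child by child differs from calling it on the parent,
# which is the caveat the entry mentions.
per_child = "".join(c if isinstance(c, str) else get_text(c) for c in div.children)
print(repr(per_child))         # 'Hello var x = 1;world'
```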
2021-02-13 | Corrected the use of special string container classes in cases when a single tag may contain strings with different containers, such as the <template> tag, which may contain both TemplateString objects and Comment objects. [bug=1913406] | Leonard Richardson
2021-02-13 | Added a second way to specify encodings to UnicodeDammit and EncodingDetector, based on the order of precedence defined in the HTML5 spec (starting at https://html.spec.whatwg.org/multipage/parsing.html#parsing-with-a-known-character-encoding). Encodings in 'known_definite_encodings' are tried first, then byte-order-mark sniffing is run, then encodings in 'user_encodings' are tried. The old argument, 'override_encodings', is now a deprecated alias for 'known_definite_encodings'. This changes the default behavior of the html.parser and lxml tree builders, in a way that may slightly improve encoding detection but will probably have no effect. [bug=1889014] | Leonard Richardson
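A minimal sketch of that order of precedence, with some simplifications. `candidate_encodings()` and `decode()` are hypothetical helpers, not UnicodeDammit or EncodingDetector methods. The later fallbacks a real detector tries after user encodings are left out.

```python
import codecs

# BOMs to sniff, with UTF-32 before UTF-16 because the UTF-32-LE BOM starts
# with the UTF-16-LE one. The codec names chosen here strip the BOM on decode.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def candidate_encodings(data, known_definite_encodings=(), user_encodings=()):
    """Return encodings in order of precedence: known definite encodings,
    then whatever a byte-order mark implies, then user encodings."""
    order = list(known_definite_encodings)
    for bom, name in BOMS:
        if data.startswith(bom):
            order.append(name)
            break
    order.extend(user_encodings)
    # Remove repeats while keeping the first (highest-precedence) occurrence.
    return list(dict.fromkeys(enc.lower() for enc in order))


def decode(data, known_definite_encodings=(), user_encodings=()):
    """Try each candidate in turn; return (text, encoding) or (None, None)."""
    for enc in candidate_encodings(data, known_definite_encodings, user_encodings):
        try:
            return data.decode(enc), enc
        except (UnicodeDecodeError, LookupError):
            continue
    return None, None


print(decode(codecs.BOM_UTF8 + "café".encode("utf-8")))  # ('café', 'utf-8-sig')
print(decode("café".encode("latin-1"), user_encodings=["utf-8", "latin-1"]))
# ('café', 'latin-1')
```

With this ordering, an entry in `known_definite_encodings` is used even when the data begins with a BOM for a different encoding.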
2021-02-13 | Improve the warning issued when a directory name (as opposed to the name of a regular file) is passed as markup into the BeautifulSoup constructor. [bug=1913628] | Leonard Richardson
2021-02-13 | Corrected output when the namespace prefix associated with a namespaced attribute is the empty string, as opposed to None. [bug=1915583] | Leonard Richardson
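One plausible form of this class of bug is shown below using hypothetical helpers. The log does not show the actual bs4 code path. The problem arises when a check tests only for `None` and so lets an empty prefix through.

```python
def qualified_name(prefix, name):
    """Hypothetical helper: build 'prefix:name', treating an empty-string
    prefix the same as None so no stray leading colon is produced."""
    return f"{prefix}:{name}" if prefix else name


def qualified_name_buggy(prefix, name):
    # Checking only for None lets "" through and yields ":name".
    return name if prefix is None else f"{prefix}:{name}"


print(qualified_name("xlink", "href"))   # xlink:href
print(qualified_name("", "lang"))        # lang
print(qualified_name_buggy("", "lang"))  # :lang
```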
2020-09-26 | Fixed a bug that inconsistently moved elements over when passing a Tag, rather than a list, into Tag.extend(). [bug=1885710] | Leonard Richardson
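One plausible explanation for this kind of inconsistency can be shown with plain lists. Moving an element into another tag also removes it from its original parent's children. Looping over that live sequence while moving its items therefore skips some of them, and iterating over a snapshot avoids the problem. The functions below are hypothetical. That bs4's fix took exactly this form is an assumption, although the 2023-01-25 entry about passing a Tag's .contents into extend() looks like a related case.

```python
def extend_live(target, source):
    # Iterates over the same list it is removing from: each removal shifts
    # the remaining items left, so the loop skips every other one.
    for child in source:
        source.remove(child)
        target.append(child)


def extend_snapshot(target, source):
    # Iterating over a copy keeps the loop stable while `source` shrinks.
    for child in list(source):
        source.remove(child)
        target.append(child)


a, b = [], ["one", "two", "three", "four"]
extend_live(a, b)
print(a, b)  # ['one', 'three'] ['two', 'four']

a, b = [], ["one", "two", "three", "four"]
extend_snapshot(a, b)
print(a, b)  # ['one', 'two', 'three', 'four'] []
```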
2020-05-17 | Documented some recently added customization features. | Leonard Richardson
2020-05-17 | Added a keyword argument on_duplicate_attribute to the BeautifulSoupHTMLParser constructor (used by the html.parser tree builder) which lets you customize the handling of markup that contains the same attribute more than once, as in <a href="url1" href="url2">. [bug=1878209] | Leonard Richardson
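A minimal sketch of a duplicate-attribute policy, built on the standard library's html.parser. Its `handle_starttag()` receives every attribute pair, including duplicates. `DuplicateAttributeParser` and `accumulate()` are hypothetical. The policy values are loosely modeled on bs4's `on_duplicate_attribute` option: 'replace' keeps the last value, 'ignore' keeps the first, and a callable decides for itself. The bs4 documentation is the authority on the exact values it accepts.

```python
from html.parser import HTMLParser


class DuplicateAttributeParser(HTMLParser):
    """Hypothetical parser that records each start tag's attributes,
    resolving repeated attribute names with a configurable policy."""

    def __init__(self, on_duplicate_attribute="replace"):
        super().__init__()
        self.on_duplicate_attribute = on_duplicate_attribute
        self.tags = []

    def handle_starttag(self, tag, attrs):
        attr_dict = {}
        for key, value in attrs:  # attrs keeps duplicates, in source order
            if key not in attr_dict:
                attr_dict[key] = value
            elif self.on_duplicate_attribute == "replace":
                attr_dict[key] = value  # last value wins
            elif self.on_duplicate_attribute == "ignore":
                pass  # first value wins
            else:
                # A callable gets the dict so far plus the repeated pair.
                self.on_duplicate_attribute(attr_dict, key, value)
        self.tags.append((tag, attr_dict))


def accumulate(attr_dict, key, value):
    """Example callable policy: keep every value in a list."""
    if not isinstance(attr_dict[key], list):
        attr_dict[key] = [attr_dict[key]]
    attr_dict[key].append(value)


for policy in ("replace", "ignore", accumulate):
    parser = DuplicateAttributeParser(policy)
    parser.feed('<a href="url1" href="url2">')
    print(parser.tags)
```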
2020-04-21 | Added two distinct UserWarning subclasses for warnings issued from the BeautifulSoup constructor which a caller may want to filter out. [bug=1873787] | Leonard Richardson
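Distinct subclasses let a caller silence one kind of warning while still seeing the other. The sketch defines two local UserWarning subclasses named after the ones bs4 exports, GuessedAtParserWarning and MarkupResemblesLocatorWarning. A hypothetical `make_soup()` stands in for the constructor, and its warning conditions are simplified.

```python
import warnings


class GuessedAtParserWarning(UserWarning):
    """Local stand-in: no parser was named, so one had to be guessed."""


class MarkupResemblesLocatorWarning(UserWarning):
    """Local stand-in: the 'markup' looks like a URL or filename."""


def make_soup(markup, features=None):
    """Hypothetical constructor that emits both kinds of warning."""
    if features is None:
        warnings.warn("No parser was explicitly specified.", GuessedAtParserWarning)
    if markup.startswith(("http://", "https://")):
        warnings.warn("The markup looks like a URL.", MarkupResemblesLocatorWarning)
    return markup


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Filters added later are checked first, so this one takes precedence.
    warnings.filterwarnings("ignore", category=MarkupResemblesLocatorWarning)
    make_soup("https://example.com/")

print([w.category.__name__ for w in caught])  # ['GuessedAtParserWarning']
```

In real code, assuming the class is imported from bs4, the same `warnings.filterwarnings("ignore", category=...)` call applies.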
2020-04-12 | Fixed test failures when run against soupsieve 2.0. Patch by Tomáš Chvátal. [bug=1872279] | Leonard Richardson
2020-04-05 | Embedded CSS and JavaScript are now stored in distinct Stylesheet and Script strings, which are ignored by methods like get_text(). This feature is not supported by the html5lib tree builder. [bug=1868861] | Leonard Richardson
2020-01-01 | API CHANGE - Added PageElement.decomposed, a new property which lets you check whether you've already called decompose() on a Tag or NavigableString. | Leonard Richardson
2019-12-29 | Fixed an unhandled exception when formatting a Tag that had been decomposed. [bug=1857767] | Leonard Richardson
2019-10-05 | Avoid a crash when unpickling certain parse trees generated using html5lib on Python 3. [bug=1843545] | Leonard Richardson
2019-09-02 | Avoid a crash when trying to detect the declared encoding of a Unicode document. Raise an explanatory exception when the underlying parser completely rejects the incoming markup. [bug=1838877] | Leonard Richardson
2019-08-26 | It's now possible to override any of the element classes. | Leonard Richardson
2019-08-22 | Test the ability to build a tree using objects other than Tag and NavigableString. | Leonard Richardson
2019-08-21 | Copying a Tag preserves information that was originally obtained from the TreeBuilder used to build the original Tag. [bug=1838903] | Leonard Richardson
2019-08-21 | Fixed a crash when pretty-printing tags that were not created during initial parsing. [bug=1838903] | Leonard Richardson
2019-07-21 | Implemented line number tracking for html5lib. | Leonard Richardson