Age | Commit message | Author | |
---|---|---|---|
2022-04-10 | Install more dependencies before running the pre-release tests. | Leonard Richardson | |
2022-04-08 | Some cleanup work to get more consistent and complete about what gets packaged with the Beautiful Soup release. | Leonard Richardson | |
2022-04-07 | Incremented version number in documentation. | Leonard Richardson | |
2022-04-07 | Redid the increasingly irrelevant test-all-versions script to use pytest. | Leonard Richardson | |
2022-04-07 | Omit untrusted input when issuing warnings. | Leonard Richardson | |
2021-12-22 | Corrected error in documentation (patch by Frank Dana). | Leonard Richardson | |
2021-12-22 | Correct documentation on parser differences | FeRD (Frank Dana) | |
2021-12-21 | Added a bit about not modifying the .contents list directly. | Leonard Richardson | |
2021-12-21 | Corrected typo. | Leonard Richardson | |
2021-12-21 | Standardized the wording of the MarkupResemblesLocatorWarning warnings to make them less judgemental about what you ought to be doing. [bug=1955450] | Leonard Richardson | |
2021-12-21 | Fixed typo in documentation spotted by a reader. | Leonard Richardson | |
2021-12-21 | I guess that's not a method. | Leonard Richardson | |
2021-12-21 | It's now possible to customize the way output is indented by providing a value for the 'indent' argument to the Formatter constructor. The 'indent' argument works very similarly to the argument of the same name in the Python standard library's json.dump() function. [bug=1955497] | Leonard Richardson | |
2021-12-19 | Remove a huge list of HTML entities that was only necessary under Python 2. | Leonard Richardson | |
2021-12-19 | Removed support for the iconv_codec library, which doesn't seem to exist anymore and was never put up on PyPI. (The closest replacement on PyPI, iconv_codecs, is GPL-licensed, so we can't use it.) | Leonard Richardson | |
2021-12-19 | If the charset-normalizer Python module (https://pypi.org/project/charset-normalizer/) is installed, Beautiful Soup will use it to detect the character sets of incoming documents. This is also the module used by newer versions of the Requests library. For the sake of backwards compatibility, chardet and cchardet both take precedence if installed. [bug=1955346] | Leonard Richardson | |
2021-12-17 | Fix a crash when pickling a BeautifulSoup object that has no tree builder. [bug=1934003] | Leonard Richardson | |
2021-11-29 | Do a better job of keeping track of namespaces as an XML document is parsed, so that CSS selectors that use namespaces will do the right thing more often. [bug=1946243] | Leonard Richardson | |
2021-10-24 | Added test of warn_if_markup_looks_like_xml. | Leonard Richardson | |
2021-10-24 | Issue a warning when an HTML parser is used to parse a document that looks like XML but not XHTML. [bug=1939121] | Leonard Richardson | |
2021-10-24 | Used a warning to formally deprecate the 'text' argument in favor of 'string'. | Leonard Richardson | |
2021-10-23 | Changing find* tests to use string instead of text, except for one test that specifically checks that text is an alias for string. | Leonard Richardson | |
2021-10-23 | Renamed the 'text' field to 'string' for real. Tests are not changed in this commit to demonstrate that the renaming doesn't break anything. [bug=1947038] | Leonard Richardson | |
2021-10-23 | Added a workaround for an lxml bug (https://bugs.launchpad.net/lxml/+bug/1948551) that caused problems when parsing a Unicode string beginning with BYTE ORDER MARK. [bug=1947768] | Leonard Richardson | |
2021-10-23 | Fixed a crash when overriding multi_valued_attributes and using the html5lib parser. [bug=1948488] | Leonard Richardson | |
2021-10-23 | Fix a Python 3-specific problem in diagnose.lxml_trace. | Leonard Richardson | |
2021-10-11 | Removed redundant and nonworking argument from example code. [bug=1946243] | Leonard Richardson | |
2021-10-11 | Added special string classes, RubyParenthesisString and RubyTextString, to make it possible to treat ruby text specially in get_text() calls. [bug=1941980] | Leonard Richardson | |
2021-10-11 | More test refactoring. | Leonard Richardson | |
2021-10-11 | Broke up some monolithic unit test files. | Leonard Richardson | |
2021-10-11 | Moved the test classes to tests/__init__.py. | Leonard Richardson | |
2021-10-09 | Moved testing.py into the same package as the tests. | Leonard Richardson | |
2021-10-09 | Changed HTTP URLs to HTTPS. | Leonard Richardson | |
2021-09-12 | Ported unit tests to use pytest. | Leonard Richardson | |
2021-09-07 | Updated release instructions following the 4.10.0 release. | Leonard Richardson | |
2021-09-07 | Goodbye, Python 2. [bug=1942919] | Leonard Richardson | |
2021-06-01 | The 'replace_with()' method now takes a variable number of arguments, and can be used to replace a single element with a sequence of elements. Patch by Bill Chandos. | Leonard Richardson | |
2021-05-31 | The html.parser tree builder can now handle named entities found in the HTML5 spec in much the same way that the html5lib tree builder does. Note that the lxml tree builder still handles named entities differently. [bug=1924908] | Leonard Richardson | |
2021-04-08 | Brought in fuzz tests from the oss-project into Beautiful Soup's unit test suite. | Leonard Richardson | |
2021-02-14 | NavigableString and its subclasses now implement the get_text() method, as well as the properties .strings and .stripped_strings. These methods will either return the string itself, or nothing, so the only reason to use this is when iterating over a list of mixed Tag and NavigableString objects. [bug=1904309] | Leonard Richardson | |
2021-02-14 | The 'html5' formatter now treats attributes whose values are the empty string as HTML boolean attributes. Previously (and in other formatters), an attribute value must be set as None to be treated as a boolean attribute. In a future release, I plan to also give this behavior to the 'html' formatter. Patch by Isaac Muse. [bug=1915424] | Leonard Richardson | |
2021-02-13 | The behavior of methods like .get_text() and .strings now differs depending on the type of tag. The change is visible with HTML tags like <script>, <style>, and <template>. Starting in 4.9.0, methods like get_text() returned no results on such tags, because the contents of those tags are not considered 'text' within the document as a whole. But a user who calls script.get_text() is working from a different definition of 'text' than a user who calls div.get_text()--otherwise there would be no need to call script.get_text() at all. In 4.10.0, the contents of (e.g.) a <script> tag are considered 'text' during a get_text() call on the tag itself, but not considered 'text' during a get_text() call on the tag's parent. Because of this change, calling get_text() on each child of a tag may now return a different result than calling get_text() on the tag itself. That's because different tags now have different understandings of what counts as 'text'. [bug=1906226] [bug=1868861] | Leonard Richardson | |
2021-02-13 | Corrected the use of special string container classes in cases when a single tag may contain strings with different containers, such as the <template> tag, which may contain both TemplateString objects and Comment objects. [bug=1913406] | Leonard Richardson | |
2021-02-13 | Added a second way to specify encodings to UnicodeDammit and EncodingDetector, based on the order of precedence defined in the HTML5 spec (starting at https://html.spec.whatwg.org/multipage/parsing.html#parsing-with-a-known-character-encoding). Encodings in 'known_definite_encodings' are tried first, then byte-order-mark sniffing is run, then encodings in 'user_encodings' are tried. The old argument, 'override_encodings', is now a deprecated alias for 'known_definite_encodings'. This changes the default behavior of the html.parser and lxml tree builders, in a way that may slightly improve encoding detection but will probably have no effect. [bug=1889014] | Leonard Richardson | |
2021-02-13 | Performance improvement when processing tags that speeds up overall tree construction by 2%. Patch by Morotti. [bug=1899358] | Leonard Richardson | |
2021-02-13 | Improve the warning issued when a directory name (as opposed to the name of a regular file) is passed as markup into the BeautifulSoup constructor. [bug=1913628] | Leonard Richardson | |
2021-02-13 | Corrected output when the namespace prefix associated with a namespaced attribute is the empty string, as opposed to None. [bug=1915583] | Leonard Richardson | |
2020-10-24 | Exclude more tests from the package. Patch by Ville Skyttä. | Leonard Richardson | |
2020-10-24 | Fix tests install exclusion | Ville Skyttä | |
2020-10-03 | I always forget to bump the version number in the doc. | Leonard Richardson | |
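
The 2021-02-13 commit on UnicodeDammit and EncodingDetector describes an order of precedence: encodings in 'known_definite_encodings' first, then byte-order-mark sniffing, then encodings in 'user_encodings'. Below is a minimal plain-Python sketch of that ordering. It is not Beautiful Soup's implementation. The helpers sniff_bom, candidate_encodings and decode are hypothetical names. BOM sniffing is limited, as an assumption, to the UTF-8 and UTF-16 marks the HTML5 spec recognizes.

```python
# Hypothetical sketch of the encoding precedence, not Beautiful Soup code.
import codecs
from typing import Iterable, Iterator, Optional, Tuple

# Byte-order marks checked by the "BOM sniffing" step.
# (A UTF-32LE BOM also begins with FF FE, so it is read as UTF-16LE here.)
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
)


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading byte-order mark, if any."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding
    return None


def candidate_encodings(
    data: bytes,
    known_definite_encodings: Iterable[str] = (),
    user_encodings: Iterable[str] = (),
) -> Iterator[str]:
    """Yield encodings in precedence order, skipping case-insensitive repeats."""
    seen = set()

    def is_new(encoding: str) -> bool:
        key = encoding.lower()
        if key in seen:
            return False
        seen.add(key)
        return True

    # 1. Encodings the caller is certain about.
    for encoding in known_definite_encodings:
        if is_new(encoding):
            yield encoding
    # 2. Whatever a byte-order mark says.
    sniffed = sniff_bom(data)
    if sniffed is not None and is_new(sniffed):
        yield sniffed
    # 3. The caller's weaker guesses.
    for encoding in user_encodings:
        if is_new(encoding):
            yield encoding


def decode(
    data: bytes,
    known_definite_encodings: Iterable[str] = (),
    user_encodings: Iterable[str] = (),
) -> Optional[Tuple[str, str]]:
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for encoding in candidate_encodings(data, known_definite_encodings, user_encodings):
        try:
            text = data.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue
        # Drop a leading BOM so it does not end up in the text.
        if text.startswith("\ufeff"):
            text = text[1:]
        return text, encoding
    return None
```

A known definite encoding wins even over a BOM. The BOM in turn beats the user's guesses. Any later fallbacks, such as statistical detection, are left out of this sketch.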
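
The 2021-12-21 commit says the Formatter constructor's 'indent' argument works much like the argument of the same name in json.dump(). Under json.dump()'s convention, a positive integer means that many spaces per level, and a string such as "\t" is used verbatim. The hypothetical renderer below illustrates that convention on a toy tree. The names render and indent_unit are not part of Beautiful Soup. Treating negative numbers like 0 is an assumption borrowed from json.dump(). The default of one space mirrors the one-space indentation prettify() has traditionally produced.

```python
# Hypothetical sketch of json.dump()-style indentation, not Beautiful Soup code.
from typing import List, Tuple, Union

# A node is either a text string or a (tag_name, children) pair.
Node = Union[str, Tuple[str, list]]


def indent_unit(indent: Union[int, str]) -> str:
    """Per-level indentation, following the json.dump() convention."""
    if isinstance(indent, int):
        return " " * max(indent, 0)  # N spaces per level; negatives act like 0
    return indent  # a string such as "\t" is used verbatim


def render(node: Node, indent: Union[int, str] = 1, level: int = 0) -> List[str]:
    """Render a tiny tree one line per tag or string, indenting by depth."""
    pad = indent_unit(indent) * level
    if isinstance(node, str):
        return [pad + node]
    name, children = node
    lines = [f"{pad}<{name}>"]
    for child in children:
        lines.extend(render(child, indent, level + 1))
    lines.append(f"{pad}</{name}>")
    return lines
```

For example, `"\n".join(render(("div", [("p", ["hi"])]), indent=4))` puts `<p>` four spaces in and `hi` eight spaces in.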
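
The 2021-02-14 commit on the 'html5' formatter distinguishes two ways an attribute can become an HTML boolean attribute. A value of None always works. An empty-string value works only under the 'html5' formatter. The hypothetical render_attrs() below shows the difference. Its empty_is_boolean flag is an illustrative stand-in for choosing the 'html5' formatter, not a Beautiful Soup parameter, and its escaping is deliberately simplified.

```python
# Hypothetical sketch of boolean-attribute output, not Beautiful Soup code.
from typing import Dict, Optional


def render_attrs(attrs: Dict[str, Optional[str]], empty_is_boolean: bool = False) -> str:
    """Serialize attributes; None (and optionally "") becomes a bare boolean attribute."""
    parts = []
    for name, value in attrs.items():
        if value is None or (empty_is_boolean and value == ""):
            parts.append(name)  # e.g. <input checked>
        else:
            # Escape the characters that would break a double-quoted value.
            escaped = value.replace("&", "&amp;").replace('"', "&quot;")
            parts.append(f'{name}="{escaped}"')
    return " ".join(parts)
```

With `{"type": "checkbox", "checked": ""}`, the default output keeps `checked=""`. Setting `empty_is_boolean=True` emits a bare `checked`.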
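
The 2021-10-23 and 2021-10-24 commits rename the 'text' argument to 'string' and then use a warning to deprecate 'text'. Below is a common Python pattern for keeping such a keyword alias working while warning about it. find_all_strings() is a hypothetical stand-in, not Beautiful Soup's find_all(). The choice of DeprecationWarning as the warning category is an assumption.

```python
# Hypothetical sketch of a deprecated keyword alias, not Beautiful Soup code.
import warnings
from typing import Iterable, List, Optional


def find_all_strings(
    haystack: Iterable[str], string: Optional[str] = None, text: Optional[str] = None
) -> List[str]:
    """Return the items equal to `string`; `text` is a deprecated alias."""
    if text is not None:
        warnings.warn(
            "The 'text' argument is deprecated; use 'string' instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this function
        )
        if string is None:
            string = text
    return [item for item in haystack if item == string]
```

Old call sites keep working but emit a warning. New call sites that pass `string` see no warning.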