-rw-r--r-- | CHANGELOG             |  5
-rw-r--r-- | bs4/tests/__init__.py |  2
-rwxr-xr-x | doc/source/index.rst  | 49
3 files changed, 24 insertions, 32 deletions
@@ -1,4 +1,4 @@
-= 4.12.3 (?)
+= 4.12.3 (Unreleased)
 
 * Fixed a regression such that if you set .hidden on a tag, the tag
   becomes invisible but its contents are still visible. User manipulation
@@ -12,6 +12,9 @@
 * Corrected the syntax of the license definition in pyproject.toml. Patch
   by Louis Maddox. [bug=2032848]
 
+* Corrected a typo in a test that was causing test failures when run against
+  libxml2 2.12.1. [bug=2045481]
+
 = 4.12.2 (20230407)
 
 * Fixed an unhandled exception in BeautifulSoup.decode_contents
diff --git a/bs4/tests/__init__.py b/bs4/tests/__init__.py
index dbb1593..325affe 100644
--- a/bs4/tests/__init__.py
+++ b/bs4/tests/__init__.py
@@ -1105,7 +1105,7 @@ class XMLTreeBuilderSmokeTest(TreeBuilderSmokeTest):
         doc = """<?xml version="1.0" encoding="utf-8"?>
 <Document xmlns="http://example.com/ns0"
     xmlns:ns1="http://example.com/ns1"
-    xmlns:ns2="http://example.com/ns2"
+    xmlns:ns2="http://example.com/ns2">
     <ns1:tag>foo</ns1:tag>
     <ns1:tag>bar</ns1:tag>
     <ns2:tag key="value">baz</ns2:tag>
diff --git a/doc/source/index.rst b/doc/source/index.rst
index a733b66..33901db 100755
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -241,10 +241,9 @@ This table summarizes the advantages and disadvantages of each parser library:
 +----------------------+--------------------------------------------+--------------------------------+--------------------------+
 | Python's html.parser | ``BeautifulSoup(markup, "html.parser")``   | * Batteries included           | * Not as fast as lxml,   |
 |                      |                                            | * Decent speed                 |   less lenient than      |
-|                      |                                            | * Lenient (As of Python 3.2)   |   html5lib               |
+|                      |                                            |                                |   html5lib.              |
 +----------------------+--------------------------------------------+--------------------------------+--------------------------+
 | lxml's HTML parser   | ``BeautifulSoup(markup, "lxml")``          | * Very fast                    | * External C dependency  |
-|                      |                                            | * Lenient                      |                          |
 +----------------------+--------------------------------------------+--------------------------------+--------------------------+
 | lxml's XML parser    | ``BeautifulSoup(markup, "lxml-xml")``      | * Very fast                    | * External C dependency  |
 |                      | ``BeautifulSoup(markup, "xml")``           | * The only currently supported |                          |
@@ -1525,9 +1524,10 @@ very useful.
 Calling a tag is like calling ``find_all()``
 --------------------------------------------
 
-For convenience, calling a :py:class:`BeautifulSoup` object or :py:class:`Tag`
-object as a function is equivalent to calling ``find_all()`` (if no built-in
-method has that name). These two lines of code are equivalent::
+For convenience, calling a :py:class:`BeautifulSoup` object or
+:py:class:`Tag` object as a function is equivalent to calling
+``find_all()`` (if no built-in method has the name of the tag you're
+looking for). These two lines of code are equivalent::
 
  soup.find_all("a")
  soup("a")
@@ -1585,8 +1585,8 @@ I spent a lot of time above covering ``find_all()`` and
 ``find()``. The Beautiful Soup API defines ten other methods for
 searching the tree, but don't be afraid. Five of these methods are
 basically the same as ``find_all()``, and the other five are basically
-the same as ``find()``. The only differences are in how their search
-through the tree (the search `axis`).
+the same as ``find()``. The only differences are in how they move from
+one part of the tree to another.
 
 First let's consider ``find_parents()`` and
 ``find_parent()``. Remember that ``find_all()`` and ``find()`` work
@@ -2230,7 +2230,7 @@ in Beautiful Soup 4.10.0.`
 ``wrap()``
 ----------
 
-``PageElement.wrap()`` wraps an element in the tag object you specify. It
+``PageElement.wrap()`` wraps an element in the :py:class`Tag` object you specify. It
 returns the new wrapper::
 
  soup = BeautifulSoup("<p>I wish I was bold.</p>", 'html.parser')
@@ -2483,7 +2483,7 @@ occur in a string object or an attribute value::
  # A LINK
  # </a>
 
-Here's a formatter that increases the indentation width when pretty-printing::
+Here's a formatter that increases the indentation width when pretty-printing::
 
  formatter = HTMLFormatter(indent=8)
  print(link_soup.a.prettify(formatter=formatter))
@@ -3279,25 +3279,14 @@ unexpected behavior, where a Beautiful Soup parse tree looks a
 lot different than the document used to create it. These problems are
 almost never problems with Beautiful Soup itself.
 
-This is not because Beautiful Soup is an amazingly well-written
-piece of software. It's because Beautiful Soup doesn't include any
-parsing code. Instead, it relies on external parsers. If one parser
-isn't working on a certain document, the best solution is to try a
-different parser. See `Installing a parser`_ for details and a parser
-comparison.
-
-The most common parse errors are ``HTMLParser.HTMLParseError:
-malformed start tag`` and ``HTMLParser.HTMLParseError: bad end
-tag``. These are both generated by Python's built-in HTML parser
-library, and the solution is to :ref:`install lxml or
-html5lib. <parser-installation>`
-
-The most common type of unexpected behavior is that you can't find a
-tag that you know is in the document. You saw it going in, but
-``find_all()`` returns ``[]`` or ``find()`` returns ``None``. This is
-another common problem with Python's built-in HTML parser, which
-sometimes skips tags it doesn't understand. Again, the best solution is to
-:ref:`install lxml or html5lib. <parser-installation>`
+This is not because Beautiful Soup is an amazingly well-written piece
+of software. It's because Beautiful Soup doesn't include any parsing
+code. Instead, it relies on external parsers. If one parser isn't
+working on a certain document, the best solution is to try a different
+parser. See `Installing a parser`_ for details and a parser
+comparison. If this doesn't help, you might need to inspect the
+document tree found inside the ``BeautifulSoup`` object, to see where
+the markup you're looking for actually ended up.
 
 Version mismatch problems
 -------------------------
@@ -3313,12 +3302,12 @@ Version mismatch problems
   Python 3 version of Beautiful Soup under Python 2.
 
 * ``ImportError: No module named BeautifulSoup`` - Caused by running
-  Beautiful Soup 3 code on a system that doesn't have BS3
+  Beautiful Soup 3 code in an environment that doesn't have BS3
   installed. Or, by writing Beautiful Soup 4 code without knowing that
   the package name has changed to ``bs4``.
 
 * ``ImportError: No module named bs4`` - Caused by running Beautiful
-  Soup 4 code on a system that doesn't have BS4 installed.
+  Soup 4 code in an environment that doesn't have BS4 installed.
 
 .. _parsing-xml:
 