Diffstat (limited to 'doc/source/index.rst')
-rwxr-xr-x doc/source/index.rst 49
1 files changed, 19 insertions, 30 deletions
diff --git a/doc/source/index.rst b/doc/source/index.rst
index a733b66..33901db 100755
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -241,10 +241,9 @@ This table summarizes the advantages and disadvantages of each parser library:
 +----------------------+--------------------------------------------+--------------------------------+--------------------------+
 | Python's html.parser | ``BeautifulSoup(markup, "html.parser")``   | * Batteries included           | * Not as fast as lxml,   |
 |                      |                                            | * Decent speed                 |   less lenient than      |
-|                      |                                            | * Lenient (As of Python 3.2)   |   html5lib               |
+|                      |                                            |                                |   html5lib.              |
 +----------------------+--------------------------------------------+--------------------------------+--------------------------+
 | lxml's HTML parser   | ``BeautifulSoup(markup, "lxml")``          | * Very fast                    | * External C dependency  |
-|                      |                                            | * Lenient                      |                          |
 +----------------------+--------------------------------------------+--------------------------------+--------------------------+
 | lxml's XML parser    | ``BeautifulSoup(markup, "lxml-xml")``      | * Very fast                    | * External C dependency  |
 |                      | ``BeautifulSoup(markup, "xml")``           | * The only currently supported |                          |
@@ -1525,9 +1524,10 @@ very useful.
Calling a tag is like calling ``find_all()``
--------------------------------------------
-For convenience, calling a :py:class:`BeautifulSoup` object or :py:class:`Tag`
-object as a function is equivalent to calling ``find_all()`` (if no built-in
-method has that name). These two lines of code are equivalent::
+For convenience, calling a :py:class:`BeautifulSoup` object or
+:py:class:`Tag` object as a function is equivalent to calling
+``find_all()`` (if no built-in method has the name of the tag you're
+looking for). These two lines of code are equivalent::
soup.find_all("a")
soup("a")
@@ -1585,8 +1585,8 @@ I spent a lot of time above covering ``find_all()`` and
``find()``. The Beautiful Soup API defines ten other methods for
searching the tree, but don't be afraid. Five of these methods are
basically the same as ``find_all()``, and the other five are basically
-the same as ``find()``. The only differences are in how their search
-through the tree (the search `axis`).
+the same as ``find()``. The only differences are in how they move from
+one part of the tree to another.
First let's consider ``find_parents()`` and
``find_parent()``. Remember that ``find_all()`` and ``find()`` work
@@ -2230,7 +2230,7 @@ in Beautiful Soup 4.10.0.`
``wrap()``
----------
-``PageElement.wrap()`` wraps an element in the tag object you specify. It
+``PageElement.wrap()`` wraps an element in the :py:class:`Tag` object you specify. It
returns the new wrapper::
soup = BeautifulSoup("<p>I wish I was bold.</p>", 'html.parser')
@@ -2483,7 +2483,7 @@ occur in a string object or an attribute value::
# A LINK
# </a>
-Here's a formatter that increases the indentation width when pretty-printing::
+Here's a formatter that increases the indentation width when pretty-printing::
formatter = HTMLFormatter(indent=8)
print(link_soup.a.prettify(formatter=formatter))
@@ -3279,25 +3279,14 @@ unexpected behavior, where a Beautiful Soup parse tree looks a lot
different than the document used to create it.
These problems are almost never problems with Beautiful Soup itself.
-This is not because Beautiful Soup is an amazingly well-written
-piece of software. It's because Beautiful Soup doesn't include any
-parsing code. Instead, it relies on external parsers. If one parser
-isn't working on a certain document, the best solution is to try a
-different parser. See `Installing a parser`_ for details and a parser
-comparison.
-
-The most common parse errors are ``HTMLParser.HTMLParseError:
-malformed start tag`` and ``HTMLParser.HTMLParseError: bad end
-tag``. These are both generated by Python's built-in HTML parser
-library, and the solution is to :ref:`install lxml or
-html5lib. <parser-installation>`
-
-The most common type of unexpected behavior is that you can't find a
-tag that you know is in the document. You saw it going in, but
-``find_all()`` returns ``[]`` or ``find()`` returns ``None``. This is
-another common problem with Python's built-in HTML parser, which
-sometimes skips tags it doesn't understand. Again, the best solution is to
-:ref:`install lxml or html5lib. <parser-installation>`
+This is not because Beautiful Soup is an amazingly well-written piece
+of software. It's because Beautiful Soup doesn't include any parsing
+code. Instead, it relies on external parsers. If one parser isn't
+working on a certain document, the best solution is to try a different
+parser. See `Installing a parser`_ for details and a parser
+comparison. If this doesn't help, you might need to inspect the
+document tree found inside the ``BeautifulSoup`` object, to see where
+the markup you're looking for actually ended up.
Version mismatch problems
-------------------------
@@ -3313,12 +3302,12 @@ Version mismatch problems
Python 3 version of Beautiful Soup under Python 2.
* ``ImportError: No module named BeautifulSoup`` - Caused by running
- Beautiful Soup 3 code on a system that doesn't have BS3
+ Beautiful Soup 3 code in an environment that doesn't have BS3
installed. Or, by writing Beautiful Soup 4 code without knowing that
the package name has changed to ``bs4``.
* ``ImportError: No module named bs4`` - Caused by running Beautiful
- Soup 4 code on a system that doesn't have BS4 installed.
+ Soup 4 code in an environment that doesn't have BS4 installed.
.. _parsing-xml: