author | Leonard Richardson <leonardr@segfault.org> | 2014-12-08 22:02:34 -0500 |
---|---|---|
committer | Leonard Richardson <leonardr@segfault.org> | 2014-12-08 22:02:34 -0500 |
commit | 8b1dd38e165d211d904d7143ea5042f26353bdb5 (patch) | |
tree | 53f22edc5e87e4f507cf197812e79c5304254c34 | |
parent | 5a96d2906fcd21eaf5ef86228edb9647a01e828c (diff) |
Rephrased the 'you need a parser' section to cover today's more common BS3 porting environments. [bug=1370364]
-rw-r--r-- | doc/source/index.rst | 12 |
1 files changed, 6 insertions, 6 deletions
diff --git a/doc/source/index.rst b/doc/source/index.rst
index 775c3e1..5d067ea 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -2899,12 +2899,12 @@ deprecated and removed in Python 3.0. Beautiful Soup 4 uses
 ``html.parser`` by default, but you can plug in lxml or html5lib and
 use that instead. See `Installing a parser`_ for a comparison.
 
-Since ``html.parser`` is not the same parser as ``SGMLParser``, it
-will treat invalid markup differently. Usually the "difference" is
-that ``html.parser`` crashes. In that case, you'll need to install
-another parser. But sometimes ``html.parser`` just creates a different
-parse tree than ``SGMLParser`` would. If this happens, you may need to
-update your BS3 scraping code to deal with the new tree.
+Since ``html.parser`` is not the same parser as ``SGMLParser``, you
+may find that Beautiful Soup 4 gives you a different parse tree than
+Beautiful Soup 3 for the same markup. If you swap out ``html.parser``
+for lxml or html5lib, you may find that the parse tree changes yet
+again. If this happens, you'll need to update your scraping code to
+deal with the new tree.
 
 Method names
 ^^^^^^^^^^^^
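
The reworded paragraph says the parse tree can change from one parser to another. One plausible reason, sketched below with only the standard library, is that ``html.parser`` reports tag events as it sees them and leaves the handling of invalid markup to whatever builds the tree. The names `EventLogger` and `tag_events` and the sample markup are illustrative assumptions, not part of this commit or of Beautiful Soup.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Hypothetical helper: records the tag events html.parser reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def tag_events(markup):
    """Return the (kind, tag) events html.parser emits for the markup."""
    logger = EventLogger()
    logger.feed(markup)
    logger.close()
    return logger.events


# Invalid markup: </p> closes a tag that was never opened.
# html.parser reports the stray end tag as-is; it does not repair it.
print(tag_events("<a></p>"))  # [('start', 'a'), ('end', 'p')]
```

The stray end tag is passed through untouched, so each tree builder has to decide what to do with it. With Beautiful Soup 4 installed, the same markup can be passed to `BeautifulSoup(markup, "html.parser")` and then to `"lxml"` or `"html5lib"` (if those are installed) to compare the resulting trees directly.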