From 3ff7bde5d320fbec4c16e7f245c345e8455ca887 Mon Sep 17 00:00:00 2001
From: Leonard Richardson
Date: Thu, 26 Apr 2012 07:32:53 -0400
Subject: Fixed test failure when lxml is not installed.

---
 doc/source/index.rst | 7 +++++++
 1 file changed, 7 insertions(+)

(limited to 'doc/source/index.rst')

diff --git a/doc/source/index.rst b/doc/source/index.rst
index 734851d..5b65354 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -2670,6 +2670,13 @@ deprecated and removed in Python 3.0. Beautiful Soup 4 uses
 ``html.parser`` by default, but you can plug in lxml or html5lib and
 use that instead. See `Installing a parser`_ for a comparison.
 
+Since ``html.parser`` is not the same parser as ``SGMLParser``, it
+will treat invalid markup differently. Usually the "difference" is
+that ``html.parser`` crashes. In that case, you'll need to install
+another parser. But sometimes ``html.parser`` just creates a different
+parse tree than ``SGMLParser`` would. If this happens, you may need to
+update your BS3 scraping code to deal with the new tree.
+
 Method names
 ^^^^^^^^^^^^
 
-- 
cgit v1.2.3
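
The paragraph added by this patch says that when ``html.parser`` fails on invalid markup, the remedy is to switch to another parser. A minimal sketch of that fallback follows. The helper name ``parse_with_fallback`` and its ``make_soup`` parameter are hypothetical and not part of Beautiful Soup. The parser names are the ones the documentation mentions, and the sketch assumes that a failing parser raises an exception.

```python
def parse_with_fallback(markup, make_soup,
                        parsers=("html.parser", "lxml", "html5lib")):
    """Return (tree, parser_name) from the first parser that succeeds.

    make_soup is any callable taking (markup, parser_name). With
    Beautiful Soup 4 installed, it would be the BeautifulSoup class.
    This helper is hypothetical, not part of the library.
    """
    failures = []
    for name in parsers:
        try:
            return make_soup(markup, name), name
        except Exception as exc:
            # Either the parser is not installed or it could not
            # handle the markup; record the reason and try the next one.
            failures.append("%s: %s" % (name, exc))
    raise RuntimeError("no parser could handle the markup (%s)"
                       % "; ".join(failures))
```

With Beautiful Soup 4 installed, a call might look like ``soup, used = parse_with_fallback(html, BeautifulSoup)``. Reporting which parser was used matters because, as the paragraph notes, different parsers can build different trees from the same invalid markup.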