>>> soup.find(text="bad")
u'bad'
>>> soup.i
<i>HTML</i>
>>> soup = BeautifulSoup("SomebadXML", "xml")
>>> print soup.prettify()
Some
bad
XML
= About Beautiful Soup 4 =
This is a nearly-complete rewrite that removes Beautiful Soup's custom
HTML parser in favor of a system that lets you write a little glue
code and plug in any HTML or XML parser you want.
Beautiful Soup 4.0 comes with glue code for four parsers:
* Python's standard HTMLParser (html.parser in Python 3)
* lxml's HTML and XML parsers
* html5lib's HTML parser
HTMLParser is the default, but I recommend you install one of the
other parsers, or you'll have problems handling real-world markup.
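As a rough sketch of how a script might choose among these parsers at run time, the snippet below (written for Python 3) checks which third-party parsers are installed and falls back to the standard library's parser otherwise. The helper name pick_parser and its preference order are assumptions made for illustration; they are not part of Beautiful Soup. The parser names passed to the BeautifulSoup constructor ("lxml", "html5lib", "html.parser") are the ones Beautiful Soup 4 accepts.

```python
import importlib.util


def pick_parser(preferred=("lxml", "html5lib")):
    """Return the name of the first installed third-party parser.

    Hypothetical helper: falls back to the standard library's
    "html.parser" when none of the preferred parsers is installed.
    """
    for name in preferred:
        if importlib.util.find_spec(name) is not None:
            return name
    return "html.parser"


parser_name = pick_parser()

# Only build a soup if Beautiful Soup 4 itself is installed.
if importlib.util.find_spec("bs4") is not None:
    from bs4 import BeautifulSoup

    # The second argument names the parser Beautiful Soup should use.
    soup = BeautifulSoup("<p>Some<b>bad<i>HTML", parser_name)
    print(soup.i.get_text())
```

Naming the parser explicitly, rather than relying on the default, also makes it obvious which parser produced a given tree when results differ between machines.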
For complete documentation, see the Sphinx documentation in
docs/source. What follows is a summary of the changes from Beautiful
Soup 3.
== The module name has changed ==
Previously you imported the BeautifulSoup class from a module also
called BeautifulSoup. To save keystrokes and make it clear which
version of the API is in use, the module is now called 'bs4':
>>> from bs4 import BeautifulSoup
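Code that has to run against either version can try the new module name first and fall back to the old one. This is only a sketch of a common idiom, written for Python 3; the function name load_beautifulsoup is made up for this example.

```python
def load_beautifulsoup():
    """Return (BeautifulSoup class, major version), or (None, None).

    Hypothetical helper: tries the Beautiful Soup 4 module name first,
    then the Beautiful Soup 3 module name.
    """
    try:
        from bs4 import BeautifulSoup  # Beautiful Soup 4
        return BeautifulSoup, 4
    except ImportError:
        pass
    try:
        from BeautifulSoup import BeautifulSoup  # Beautiful Soup 3
        return BeautifulSoup, 3
    except ImportError:
        return None, None


soup_class, major_version = load_beautifulsoup()
if soup_class is None:
    print("Beautiful Soup is not installed")
else:
    print("Using Beautiful Soup", major_version)
```

Because the two versions live in differently named modules, the import that succeeds tells you which API you are working with.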
== It works with Python 3 ==
Beautiful Soup 3.1.0 worked with Python 3, but the parser it used was
so bad that it barely worked at all. Beautiful Soup 4 works with
Python 3, and since its parser is pluggable, you don't sacrifice
quality.
Special thanks to Thomas Kluyver and Ezio Melotti for getting Python 3
support to the finish line. Ezio Melotti is also to thank for greatly
improving the HTML parser that comes with Python 3.2.
== CDATA sections are normal text, if they're understood at all. ==
Currently, the lxml and html5lib HTML parsers ignore CDATA sections in
markup:
=>
A future version of html5lib will turn CDATA sections into text nodes,
but only within tags like