= About Beautiful Soup 4 =
Earlier versions of Beautiful Soup included a custom HTML
parser. Beautiful Soup 4 uses Python's default HTMLParser, which does
fairly poorly on real-world HTML. By installing lxml or html5lib you
can get more accurate parsing and possibly better performance as well.
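
As a rough sketch of how a script might take advantage of whichever
parser is installed, the helper below probes for lxml and html5lib and
falls back to Python's built-in parser. It assumes Python 3 and a
Beautiful Soup 4 release whose constructor accepts a parser name as its
second argument. The names pick_parser, make_soup and PARSER_PREFERENCE
are hypothetical, and the preference order is only an assumption.

```python
from importlib.util import find_spec

# Assumed preference order for this sketch:
# (name passed to BeautifulSoup, top-level module to probe for).
PARSER_PREFERENCE = [
    ("lxml", "lxml"),
    ("html5lib", "html5lib"),
]


def pick_parser():
    """Return a parser name to pass to BeautifulSoup as its second argument."""
    for feature, module in PARSER_PREFERENCE:
        # find_spec returns None when the top-level module is not installed.
        if find_spec(module) is not None:
            return feature
    # Python's built-in parser ships with the standard library.
    return "html.parser"


def make_soup(markup):
    """Parse markup with whichever parser pick_parser() selects."""
    # Imported lazily so pick_parser() can be used on its own.
    from bs4 import BeautifulSoup
    return BeautifulSoup(markup, pick_parser())
```

Calling make_soup("<p>Some<b>bad<i>HTML") then builds the tree with the
selected parser. Different parsers may repair the same broken markup in
different ways, so a fixed choice is safer when results must be
reproducible.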
= Introduction =
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup("<p>Some<b>bad<i>HTML")
>>> print(soup.prettify())
<html>
 <body>
  <p>
   Some
   <b>
    bad
    <i>
     HTML
    </i>
   </b>
  </p>
 </body>
</html>
>>> soup.find(text="bad")
'bad'
>>> soup.i
<i>HTML</i>