= About Beautiful Soup 4 =

Earlier versions of Beautiful Soup included a custom HTML
parser. Beautiful Soup 4 instead uses Python's built-in HTMLParser by
default, which copes fairly poorly with real-world HTML. Installing
lxml or html5lib gives you more accurate parsing and possibly better
performance as well.
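
One way to take advantage of an optional parser is to pass its name
as the second argument to the BeautifulSoup constructor. The sketch
below is only an illustration: the helper names best_available_parser
and make_soup are hypothetical, and the preference order (lxml, then
html5lib, then the built-in parser) is an assumption, not something
Beautiful Soup prescribes.

```python
import importlib.util


def best_available_parser():
    """Return the name of an installed parser to hand to BeautifulSoup.

    The preference order used here (lxml, then html5lib, then the
    built-in parser) is an assumption made for this sketch.
    """
    for name in ("lxml", "html5lib"):
        # find_spec() returns None when a top-level package is missing.
        if importlib.util.find_spec(name) is not None:
            return name
    return "html.parser"


def make_soup(markup):
    """Parse markup with whichever parser best_available_parser() picks."""
    # Imported here so best_available_parser() stays usable on its own.
    from bs4 import BeautifulSoup

    return BeautifulSoup(markup, best_available_parser())
```

Calling make_soup("<p>Some<b>bad<i>HTML") should then build the soup
with lxml or html5lib when one of them is installed, and fall back to
the built-in parser otherwise. Different parsers can produce different
trees for the same broken markup.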

= Introduction =

  >>> from bs4 import BeautifulSoup
  >>> soup = BeautifulSoup("<p>Some<b>bad<i>HTML")
  >>> print(soup.prettify())
  <html>
   <body>
    <p>
     Some
     <b>
      bad
      <i>
       HTML
      </i>
     </b>
    </p>
   </body>
  </html>
  >>> soup.find(text="bad")
  u'bad'

  >>> soup.i
  <i>HTML</i>
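
The markup in the session above never closes its tags, yet the
prettified tree contains matching closing tags. A rough way to see
that those closing tags are not in the input is to run the same
string through the standard library's html.parser and record the
events it reports. TagRecorder and record_tags are hypothetical names
used only for this sketch; they are not part of Beautiful Soup.

```python
from html.parser import HTMLParser


class TagRecorder(HTMLParser):
    """Hypothetical helper that records start-tag and end-tag events."""

    def __init__(self):
        super().__init__()
        self.opened = []
        self.closed = []

    def handle_starttag(self, tag, attrs):
        self.opened.append(tag)

    def handle_endtag(self, tag):
        self.closed.append(tag)


def record_tags(markup):
    """Return (opened, closed) tag names reported for the markup."""
    recorder = TagRecorder()
    recorder.feed(markup)
    recorder.close()  # flush anything still buffered
    return recorder.opened, recorder.closed


opened, closed = record_tags("<p>Some<b>bad<i>HTML")
print(opened)  # ['p', 'b', 'i']
print(closed)  # []
```

The event stream reports three opening tags and no closing tags, so
the </i>, </b> and </p> shown by prettify() are added while the tree
is being built.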