Diffstat (limited to 'doc/source')
-rw-r--r-- | doc/source/index.rst | 49 |
1 files changed, 33 insertions, 16 deletions
diff --git a/doc/source/index.rst b/doc/source/index.rst
index 8258e97..56aa7fe 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -166,12 +166,16 @@ Installing Beautiful Soup
 If you're using a recent version of Debian or Ubuntu Linux, you can
 install Beautiful Soup with the system package manager:
 
-:kbd:`$ apt-get install python-bs4`
+:kbd:`$ apt-get install python-bs4` (for Python 2)
+
+:kbd:`$ apt-get install python3-bs4` (for Python 3)
 
 Beautiful Soup 4 is published through PyPi, so if you can't install it
 with the system packager, you can install it with ``easy_install`` or
 ``pip``. The package name is ``beautifulsoup4``, and the same package
-works on Python 2 and Python 3.
+works on Python 2 and Python 3. Make sure you use the right version of
+``pip`` or ``easy_install`` for your Python version (these may be named
+``pip3`` and ``easy_install3`` respectively if you're using Python 3).
 
 :kbd:`$ easy_install beautifulsoup4`
 
@@ -298,7 +302,8 @@ constructor. You can pass in a string or an open filehandle::
 
  from bs4 import BeautifulSoup
 
- soup = BeautifulSoup(open("index.html"))
+ with open("index.html") as fp:
+     soup = BeautifulSoup(fp)
 
  soup = BeautifulSoup("<html>data</html>")
 
@@ -355,34 +360,34 @@ Attributes
 ^^^^^^^^^^
 
 A tag may have any number of attributes. The tag ``<b
-class="boldest">`` has an attribute "class" whose value is
+id="boldest">`` has an attribute "id" whose value is
 "boldest". You can access a tag's attributes by treating the tag like
 a dictionary::
 
- tag['class']
+ tag['id']
  # u'boldest'
 
 You can access that dictionary directly as ``.attrs``::
 
  tag.attrs
- # {u'class': u'boldest'}
+ # {u'id': 'boldest'}
 
 You can add, remove, and modify a tag's attributes. Again, this is
 done by treating the tag as a dictionary::
 
- tag['class'] = 'verybold'
- tag['id'] = 1
+ tag['id'] = 'verybold'
+ tag['another-attribute'] = 1
  tag
- # <blockquote class="verybold" id="1">Extremely bold</blockquote>
+ # <b another-attribute="1" id="verybold"></b>
 
- del tag['class']
  del tag['id']
+ del tag['another-attribute']
  tag
- # <blockquote>Extremely bold</blockquote>
+ # <b></b>
 
- tag['class']
- # KeyError: 'class'
- print(tag.get('class'))
+ tag['id']
+ # KeyError: 'id'
+ print(tag.get('id'))
  # None
 
 .. _multivalue:
@@ -1045,7 +1050,7 @@ A regular expression
 ^^^^^^^^^^^^^^^^^^^^
 
 If you pass in a regular expression object, Beautiful Soup will filter
-against that regular expression using its ``match()`` method. This code
+against that regular expression using its ``search()`` method. This code
 finds all the tags whose names start with the letter "b"; in this
 case, the <body> tag and the <b> tag::
 
@@ -1257,6 +1262,17 @@ dictionary and passing the dictionary into ``find_all()`` as the
  data_soup.find_all(attrs={"data-foo": "value"})
  # [<div data-foo="value">foo!</div>]
 
+You can't use a keyword argument to search for HTML's 'name' element,
+because Beautiful Soup uses the ``name`` argument to contain the name
+of the tag itself. Instead, you can give a value to 'name' in the
+``attrs`` argument.
+
+ name_soup = BeautifulSoup('<input name="email"/>')
+ name_soup.find_all(name="email")
+ # []
+ name_soup.find_all(attrs={"name": "email"})
+ # [<input name="email"/>]
+
 .. _attrs:
 
 Searching by CSS class
@@ -2776,7 +2792,8 @@ you how different parsers handle the document, and tell you if you're
 missing a parser that Beautiful Soup could be using::
 
  from bs4.diagnose import diagnose
- data = open("bad.html").read()
+ with open("bad.html") as fp:
+     data = fp.read()
  diagnose(data)
 
  # Diagnostic running on Beautiful Soup 4.2.0
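
The ``match()`` to ``search()`` wording fix in the regular-expression hunk matters for unanchored patterns. The following is a minimal sketch using only the standard ``re`` module, not Beautiful Soup itself; the ``tag_names`` list and the ``names_matching`` helper are hypothetical stand-ins for the tag names of a parsed document.

```python
import re

# Hypothetical tag names standing in for a parsed document.
tag_names = ["html", "head", "title", "body", "p", "b", "a"]


def names_matching(pattern, names, method="search"):
    """Return the names for which pattern.search() (or pattern.match()) succeeds."""
    test = getattr(pattern, method)
    return [name for name in names if test(name)]


anchored = re.compile("^b")   # anchored: only names that start with "b"
unanchored = re.compile("t")  # unanchored: "t" anywhere in the name

# With search(), an anchored pattern still selects only names starting with "b".
starts_with_b = names_matching(anchored, tag_names)
print(starts_with_b)   # ['body', 'b']

# With search(), an unanchored pattern hits anywhere in the name.
contains_t = names_matching(unanchored, tag_names)
print(contains_t)      # ['html', 'title']

# match() would only test the start of each name, giving a narrower result.
starts_with_t = names_matching(unanchored, tag_names, method="match")
print(starts_with_t)   # ['title']
```

Under ``search()`` semantics, a pattern such as ``re.compile("t")`` selects any name containing "t", so an explicit ``^`` anchor is what restricts the filter to names that start with a given letter.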