Diffstat (limited to 'doc/source')
-rw-r--r-- doc/source/index.rst | 49
1 files changed, 33 insertions, 16 deletions
diff --git a/doc/source/index.rst b/doc/source/index.rst
index 8258e97..56aa7fe 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -166,12 +166,16 @@ Installing Beautiful Soup
If you're using a recent version of Debian or Ubuntu Linux, you can
install Beautiful Soup with the system package manager:
-:kbd:`$ apt-get install python-bs4`
+:kbd:`$ apt-get install python-bs4` (for Python 2)
+
+:kbd:`$ apt-get install python3-bs4` (for Python 3)
Beautiful Soup 4 is published through PyPI, so if you can't install it
with the system package manager, you can install it with ``easy_install`` or
``pip``. The package name is ``beautifulsoup4``, and the same package
-works on Python 2 and Python 3.
+works on Python 2 and Python 3. Make sure you use the right version of
+``pip`` or ``easy_install`` for your Python version (these may be named
+``pip3`` and ``easy_install3`` respectively if you're using Python 3).
:kbd:`$ easy_install beautifulsoup4`
@@ -298,7 +302,8 @@ constructor. You can pass in a string or an open filehandle::
from bs4 import BeautifulSoup
- soup = BeautifulSoup(open("index.html"))
+ with open("index.html") as fp:
+     soup = BeautifulSoup(fp)
soup = BeautifulSoup("<html>data</html>")
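The hunk above replaces a bare ``open()`` call with a ``with`` block. A minimal standard-library sketch of what that buys: the file is closed as soon as the block ends. The temporary file stands in for ``index.html``, and ``fp.read()`` stands in for ``BeautifulSoup(fp)``, which reads from the handle it is given; both substitutions are assumptions made to keep the example self-contained.

```python
import os
import tempfile

# Create a throwaway HTML file (a stand-in for "index.html" in the docs).
with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False) as tmp:
    tmp.write("<html>data</html>")
    path = tmp.name

# Read inside a with block: the file object is closed when the block exits,
# even if an exception is raised while reading.
with open(path) as fp:
    markup = fp.read()  # stand-in for BeautifulSoup(fp)

print(markup)     # <html>data</html>
print(fp.closed)  # True

# A bare open(path).read() never closes the file explicitly; when it gets
# closed is left to the interpreter's garbage collection.
os.remove(path)
```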
@@ -355,34 +360,34 @@ Attributes
^^^^^^^^^^
A tag may have any number of attributes. The tag ``<b
-class="boldest">`` has an attribute "class" whose value is
+id="boldest">`` has an attribute "id" whose value is
"boldest". You can access a tag's attributes by treating the tag like
a dictionary::
- tag['class']
+ tag['id']
# u'boldest'
You can access that dictionary directly as ``.attrs``::
tag.attrs
- # {u'class': u'boldest'}
+ # {u'id': u'boldest'}
You can add, remove, and modify a tag's attributes. Again, this is
done by treating the tag as a dictionary::
- tag['class'] = 'verybold'
- tag['id'] = 1
+ tag['id'] = 'verybold'
+ tag['another-attribute'] = 1
tag
- # <blockquote class="verybold" id="1">Extremely bold</blockquote>
+ # <b another-attribute="1" id="verybold"></b>
- del tag['class']
del tag['id']
+ del tag['another-attribute']
tag
- # <blockquote>Extremely bold</blockquote>
+ # <b></b>
- tag['class']
- # KeyError: 'class'
- print(tag.get('class'))
+ tag['id']
+ # KeyError: 'id'
+ print(tag.get('id'))
# None
.. _multivalue:
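The attribute examples in this hunk rely on a tag behaving like a dictionary. As a rough illustration of those semantics only, here is a plain Python ``dict`` standing in for ``tag.attrs``; it is not Beautiful Soup's ``Tag`` class and does not show how the tag renders as HTML.

```python
# A plain dict standing in for tag.attrs (illustration, not bs4 itself).
attrs = {"id": "boldest"}

# Modify and add attributes by key, as with tag['id'] = 'verybold'.
attrs["id"] = "verybold"
attrs["another-attribute"] = 1

# Remove them again, as with del tag['id'].
del attrs["id"]
del attrs["another-attribute"]

# Indexing a missing key raises KeyError...
try:
    attrs["id"]
    missing = False
except KeyError:
    missing = True

# ...while .get() returns None, or a supplied default, instead.
print(missing)                   # True
print(attrs.get("id"))           # None
print(attrs.get("id", "no id"))  # no id
```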
@@ -1045,7 +1050,7 @@ A regular expression
^^^^^^^^^^^^^^^^^^^^
If you pass in a regular expression object, Beautiful Soup will filter
-against that regular expression using its ``match()`` method. This code
+against that regular expression using its ``search()`` method. This code
finds all the tags whose names start with the letter "b"; in this
case, the <body> tag and the <b> tag::
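This hunk changes the documented method from ``match()`` to ``search()``. The difference shows up with unanchored patterns: ``match()`` only tries the pattern at the start of the string, while ``search()`` looks anywhere in it. A short standard-library sketch, using a hypothetical list of tag names rather than a parsed document:

```python
import re

# Hypothetical tag names, roughly those of a small HTML document.
names = ["html", "head", "title", "body", "p", "b", "a"]

# match() anchors at the start of the string; search() scans all of it.
pattern = re.compile("t")
print([n for n in names if pattern.match(n)])   # ['title']
print([n for n in names if pattern.search(n)])  # ['html', 'title']

# With an explicit ^ anchor, search() acts as a "starts with" test,
# so "^b" still picks out only body and b.
anchored = re.compile("^b")
print([n for n in names if anchored.search(n)])  # ['body', 'b']
```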
@@ -1257,6 +1262,17 @@ dictionary and passing the dictionary into ``find_all()`` as the
data_soup.find_all(attrs={"data-foo": "value"})
# [<div data-foo="value">foo!</div>]
+You can't use a keyword argument to search for HTML's 'name' attribute,
+because Beautiful Soup uses the ``name`` argument to hold the name
+of the tag itself. Instead, you can give a value to 'name' in the
+``attrs`` argument::
+
+ name_soup = BeautifulSoup('<input name="email"/>')
+ name_soup.find_all(name="email")
+ # []
+ name_soup.find_all(attrs={"name": "email"})
+ # [<input name="email"/>]
+
.. _attrs:
Searching by CSS class
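The reason ``find_all(name="email")`` returns an empty list in the example added above is ordinary Python argument binding. The sketch below uses a simplified, hypothetical ``find_all()`` stand-in whose first parameter is ``name`` (the real method takes more parameters). Keyword arguments that don't match a declared parameter are collected in ``**kwargs``, which is where attribute filters such as ``id`` end up.

```python
# Hypothetical, simplified stand-in for a search method; NOT bs4's find_all().
# It only reports how Python binds the arguments it receives.
def find_all(name=None, attrs=None, **kwargs):
    # Unrecognized keyword arguments are treated as attribute filters.
    attribute_filters = dict(attrs or {})
    attribute_filters.update(kwargs)
    return {"tag_name": name, "attribute_filters": attribute_filters}

# name="email" binds to the tag-name parameter, not to an attribute filter.
print(find_all(name="email"))
# {'tag_name': 'email', 'attribute_filters': {}}

# Passing the attribute through attrs keeps it an attribute filter.
print(find_all(attrs={"name": "email"}))
# {'tag_name': None, 'attribute_filters': {'name': 'email'}}

# An ordinary keyword such as id= falls through to **kwargs.
print(find_all(id="link2"))
# {'tag_name': None, 'attribute_filters': {'id': 'link2'}}
```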
@@ -2776,7 +2792,8 @@ you how different parsers handle the document, and tell you if you're
missing a parser that Beautiful Soup could be using::
from bs4.diagnose import diagnose
- data = open("bad.html").read()
+ with open("bad.html") as fp:
+     data = fp.read()
diagnose(data)
# Diagnostic running on Beautiful Soup 4.2.0