-rw-r--r-- doc/source/index.rst 213
1 files changed, 105 insertions, 108 deletions
diff --git a/doc/source/index.rst b/doc/source/index.rst
index a916413..4d580a1 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -295,140 +295,137 @@ and :py:class:`Comment`.
.. py:class:: Tag
-A :py:class:`Tag` object corresponds to an XML or HTML tag in the original document.
+ A :py:class:`Tag` object corresponds to an XML or HTML tag in the original document.
-::
+ ::
- soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'html.parser')
- tag = soup.b
- type(tag)
- # <class 'bs4.element.Tag'>
+ soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'html.parser')
+ tag = soup.b
+ type(tag)
+ # <class 'bs4.element.Tag'>
-Tags have a lot of attributes and methods, and I'll cover most of them
-in `Navigating the tree`_ and `Searching the tree`_. For now, the most
-important features of a tag are its name and attributes.
+ Tags have a lot of attributes and methods, and I'll cover most of them
+ in `Navigating the tree`_ and `Searching the tree`_. For now, the most
+ important features of a tag are its name and attributes.
-Name
-----
+ .. py:attribute:: name
-Every tag has a name, accessible as ``.name``::
+ Every tag has a name::
- tag.name
- # 'b'
+ tag.name
+ # 'b'
-If you change a tag's name, the change will be reflected in any HTML
-markup generated by Beautiful Soup::
+ If you change a tag's name, the change will be reflected in any
+ markup generated by Beautiful Soup down the line::
- tag.name = "blockquote"
- tag
- # <blockquote class="boldest">Extremely bold</blockquote>
+ tag.name = "blockquote"
+ tag
+ # <blockquote class="boldest">Extremely bold</blockquote>
-Attributes
-----------
+ .. py:attribute:: attrs
-A tag may have any number of attributes. The tag ``<b
-id="boldest">`` has an attribute "id" whose value is
-"boldest". You can access a tag's attributes by treating the tag like
-a dictionary::
+ An HTML or XML tag may have any number of attributes. The tag ``<b
+ id="boldest">`` has an attribute "id" whose value is
+ "boldest". You can access a tag's attributes by treating the tag like
+ a dictionary::
- tag = BeautifulSoup('<b id="boldest">bold</b>', 'html.parser').b
- tag['id']
- # 'boldest'
+ tag = BeautifulSoup('<b id="boldest">bold</b>', 'html.parser').b
+ tag['id']
+ # 'boldest'
-You can access that dictionary directly as ``.attrs``::
+ You can access the dictionary of attributes directly as ``.attrs``::
- tag.attrs
- # {'id': 'boldest'}
+ tag.attrs
+ # {'id': 'boldest'}
-You can add, remove, and modify a tag's attributes. Again, this is
-done by treating the tag as a dictionary::
+ You can add, remove, and modify a tag's attributes. Again, this is
+ done by treating the tag as a dictionary::
- tag['id'] = 'verybold'
- tag['another-attribute'] = 1
- tag
- # <b another-attribute="1" id="verybold"></b>
+ tag['id'] = 'verybold'
+ tag['another-attribute'] = 1
+ tag
+ # <b another-attribute="1" id="verybold"></b>
- del tag['id']
- del tag['another-attribute']
- tag
- # <b>bold</b>
+ del tag['id']
+ del tag['another-attribute']
+ tag
+ # <b>bold</b>
- tag['id']
- # KeyError: 'id'
- tag.get('id')
- # None
+ tag['id']
+ # KeyError: 'id'
+ tag.get('id')
+ # None
-.. _multivalue:
+ .. _multivalue:
-Multi-valued attributes
------------------------
+ Multi-valued attributes
+ -----------------------
-HTML 4 defines a few attributes that can have multiple values. HTML 5
-removes a couple of them, but defines a few more. The most common
-multi-valued attribute is ``class`` (that is, a tag can have more than
-one CSS class). Others include ``rel``, ``rev``, ``accept-charset``,
-``headers``, and ``accesskey``. By default, Beautiful Soup parses the value(s)
-of a multi-valued attribute into a list::
+ HTML 4 defines a few attributes that can have multiple values. HTML 5
+ removes a couple of them, but defines a few more. The most common
+ multi-valued attribute is ``class`` (that is, a tag can have more than
+ one CSS class). Others include ``rel``, ``rev``, ``accept-charset``,
+ ``headers``, and ``accesskey``. By default, Beautiful Soup parses the value(s)
+ of a multi-valued attribute into a list::
- css_soup = BeautifulSoup('<p class="body"></p>', 'html.parser')
- css_soup.p['class']
- # ['body']
+ css_soup = BeautifulSoup('<p class="body"></p>', 'html.parser')
+ css_soup.p['class']
+ # ['body']
- css_soup = BeautifulSoup('<p class="body strikeout"></p>', 'html.parser')
- css_soup.p['class']
- # ['body', 'strikeout']
+ css_soup = BeautifulSoup('<p class="body strikeout"></p>', 'html.parser')
+ css_soup.p['class']
+ # ['body', 'strikeout']
-If an attribute `looks` like it has more than one value, but it's not
-a multi-valued attribute as defined by any version of the HTML
-standard, Beautiful Soup will leave the attribute alone::
+ If an attribute `looks` like it has more than one value, but it's not
+ a multi-valued attribute as defined by any version of the HTML
+ standard, Beautiful Soup will leave the attribute alone::
- id_soup = BeautifulSoup('<p id="my id"></p>', 'html.parser')
- id_soup.p['id']
- # 'my id'
+ id_soup = BeautifulSoup('<p id="my id"></p>', 'html.parser')
+ id_soup.p['id']
+ # 'my id'
-When you turn a tag back into a string, multiple attribute values are
-consolidated::
+ When you turn a tag back into a string, multiple attribute values are
+ consolidated::
- rel_soup = BeautifulSoup('<p>Back to the <a rel="index first">homepage</a></p>', 'html.parser')
- rel_soup.a['rel']
- # ['index', 'first']
- rel_soup.a['rel'] = ['index', 'contents']
- print(rel_soup.p)
- # <p>Back to the <a rel="index contents">homepage</a></p>
+ rel_soup = BeautifulSoup('<p>Back to the <a rel="index first">homepage</a></p>', 'html.parser')
+ rel_soup.a['rel']
+ # ['index', 'first']
+ rel_soup.a['rel'] = ['index', 'contents']
+ print(rel_soup.p)
+ # <p>Back to the <a rel="index contents">homepage</a></p>
-You can force all attributes to be parsed as strings by passing
-``multi_valued_attributes=None`` as a keyword argument into the
-:py:class:`BeautifulSoup` constructor::
+ You can force all attributes to be parsed as strings by passing
+ ``multi_valued_attributes=None`` as a keyword argument into the
+ :py:class:`BeautifulSoup` constructor::
- no_list_soup = BeautifulSoup('<p class="body strikeout"></p>', 'html.parser', multi_valued_attributes=None)
- no_list_soup.p['class']
- # 'body strikeout'
+ no_list_soup = BeautifulSoup('<p class="body strikeout"></p>', 'html.parser', multi_valued_attributes=None)
+ no_list_soup.p['class']
+ # 'body strikeout'
-You can use ``get_attribute_list`` to get a value that's always a
-list, whether or not it's a multi-valued attribute::
+ You can use ``get_attribute_list`` to get a value that's always a
+ list, whether or not it's a multi-valued attribute::
- id_soup.p.get_attribute_list('id')
- # ["my id"]
+ id_soup.p.get_attribute_list('id')
+ # ["my id"]
-If you parse a document as XML, there are no multi-valued attributes::
-
- xml_soup = BeautifulSoup('<p class="body strikeout"></p>', 'xml')
- xml_soup.p['class']
- # 'body strikeout'
+ If you parse a document as XML, there are no multi-valued attributes::
-Again, you can configure this using the ``multi_valued_attributes`` argument::
+ xml_soup = BeautifulSoup('<p class="body strikeout"></p>', 'xml')
+ xml_soup.p['class']
+ # 'body strikeout'
- class_is_multi= { '*' : 'class'}
- xml_soup = BeautifulSoup('<p class="body strikeout"></p>', 'xml', multi_valued_attributes=class_is_multi)
- xml_soup.p['class']
- # ['body', 'strikeout']
+ Again, you can configure this using the ``multi_valued_attributes`` argument::
-You probably won't need to do this, but if you do, use the defaults as
-a guide. They implement the rules described in the HTML specification::
+ class_is_multi= { '*' : 'class'}
+ xml_soup = BeautifulSoup('<p class="body strikeout"></p>', 'xml', multi_valued_attributes=class_is_multi)
+ xml_soup.p['class']
+ # ['body', 'strikeout']
- from bs4.builder import builder_registry
- builder_registry.lookup('html').DEFAULT_CDATA_LIST_ATTRIBUTES
+ You probably won't need to do this, but if you do, use the defaults as
+ a guide. They implement the rules described in the HTML specification::
+ from bs4.builder import builder_registry
+ builder_registry.lookup('html').DEFAULT_CDATA_LIST_ATTRIBUTES
.. py:class:: NavigableString
@@ -479,12 +476,12 @@ done using Beautiful Soup. This is a big waste of memory.
---------------------------
The :py:class:`BeautifulSoup` object represents the parsed document as a
-whole. For most purposes, you can treat it as a :ref:`Tag`
+whole. For most purposes, you can treat it as a :py:class:`Tag`
object. This means it supports most of the methods described in
`Navigating the tree`_ and `Searching the tree`_.
You can also pass a :py:class:`BeautifulSoup` object into one of the methods
-defined in `Modifying the tree`_, just as you would a :ref:`Tag`. This
+defined in `Modifying the tree`_, just as you would a :py:class:`Tag`. This
lets you do things like combine two parsed documents::
doc = BeautifulSoup("<document><content/>INSERT FOOTER HERE</document>", "xml")
@@ -503,8 +500,8 @@ useful to look at its ``.name``, so it's been given the special
soup.name
# '[document]'
-Comments
---------
+Special strings
+---------------
:py:class:`Tag`, :py:class:`NavigableString`, and
:py:class:`BeautifulSoup` cover almost everything you'll see in an
@@ -534,8 +531,8 @@ displayed with special formatting::
# <!--Hey, buddy. Want to buy a used parser?-->
# </b>
-Special strings for HTML documents
-----------------------------------
+For HTML documents
+^^^^^^^^^^^^^^^^^^
Beautiful Soup defines a few :py:class:`NavigableString` subclasses to
contain strings found inside specific HTML tags. This makes it easier
@@ -562,8 +559,8 @@ A :py:class:`NavigableString` subclass that represents embedded HTML
templates; that is, any strings found inside a ``<template>`` tag during
document parsing.
-Special strings for XML documents
----------------------------------
+For XML documents
+^^^^^^^^^^^^^^^^^
Beautiful Soup defines some :py:class:`NavigableString` classes for
holding special types of strings that can be found in XML
@@ -1937,7 +1934,7 @@ document.
Changing tag names and attributes
---------------------------------
-I covered this earlier, in `Attributes`_, but it bears repeating. You
+I covered this earlier, in :py:attr:`Tag.attrs`, but it bears repeating. You
can rename a tag, change the values of its attributes, add new
attributes, and delete attributes::
@@ -2928,7 +2925,7 @@ these numbers represent the position of the final greater-than sign::
# (2, 0, 'Paragraph 1')
# (3, 6, 'Paragraph 2')
-You can shut off this feature by passing ``store_line_numbers=False`
+You can shut off this feature by passing ``store_line_numbers=False``
into the :py:class:`BeautifulSoup` constructor::
markup = "<p\n>Paragraph 1</p>\n <p>Paragraph 2</p>"