author | Leonard Richardson <leonardr@segfault.org> | 2023-03-15 19:14:04 -0400 |
---|---|---|
committer | Leonard Richardson <leonardr@segfault.org> | 2023-03-15 19:14:04 -0400 |
commit | fbbc002be5e57981a0f2c56679b9ac770abcbc5b | |
tree | 1cec7542153284426762ecba6a9e9a81ae41c261 /doc | |
parent | 305133e16a4fc035f4b2301a5eb9cdc40812e214 | |
Rewrote documentation so that py:class:: directives could be inserted and the text would flow naturally.
Diffstat (limited to 'doc')
-rw-r--r-- | doc/source/index.rst | 213 |
1 files changed, 105 insertions, 108 deletions
diff --git a/doc/source/index.rst b/doc/source/index.rst
index a916413..4d580a1 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -295,140 +295,137 @@ and :py:class:`Comment`.

 .. py:class:: Tag

-A :py:class:`Tag` object corresponds to an XML or HTML tag in the original document.
+    A :py:class:`Tag` object corresponds to an XML or HTML tag in the original document.

-::
+    ::

-    soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'html.parser')
-    tag = soup.b
-    type(tag)
-    # <class 'bs4.element.Tag'>
+        soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'html.parser')
+        tag = soup.b
+        type(tag)
+        # <class 'bs4.element.Tag'>

-Tags have a lot of attributes and methods, and I'll cover most of them
-in `Navigating the tree`_ and `Searching the tree`_. For now, the most
-important features of a tag are its name and attributes.
+    Tags have a lot of attributes and methods, and I'll cover most of them
+    in `Navigating the tree`_ and `Searching the tree`_. For now, the most
+    important features of a tag are its name and attributes.

-Name
-----
+    .. py:attribute:: name

-Every tag has a name, accessible as ``.name``::
+        Every tag has a name::

-    tag.name
-    # 'b'
+            tag.name
+            # 'b'

-If you change a tag's name, the change will be reflected in any HTML
-markup generated by Beautiful Soup::
+        If you change a tag's name, the change will be reflected in any
+        markup generated by Beautiful Soup down the line::

-    tag.name = "blockquote"
-    tag
-    # <blockquote class="boldest">Extremely bold</blockquote>
+            tag.name = "blockquote"
+            tag
+            # <blockquote class="boldest">Extremely bold</blockquote>

-Attributes
-----------
+    .. py:attribute:: attrs

-A tag may have any number of attributes. The tag ``<b
-id="boldest">`` has an attribute "id" whose value is
-"boldest". You can access a tag's attributes by treating the tag like
-a dictionary::
+        An HTML or XML tag may have any number of attributes. The tag ``<b
+        id="boldest">`` has an attribute "id" whose value is
+        "boldest". You can access a tag's attributes by treating the tag like
+        a dictionary::

-    tag = BeautifulSoup('<b id="boldest">bold</b>', 'html.parser').b
-    tag['id']
-    # 'boldest'
+            tag = BeautifulSoup('<b id="boldest">bold</b>', 'html.parser').b
+            tag['id']
+            # 'boldest'

-You can access that dictionary directly as ``.attrs``::
+        You can access the dictionary of attributes directly as ``.attrs``::

-    tag.attrs
-    # {'id': 'boldest'}
+            tag.attrs
+            # {'id': 'boldest'}

-You can add, remove, and modify a tag's attributes. Again, this is
-done by treating the tag as a dictionary::
+        You can add, remove, and modify a tag's attributes. Again, this is
+        done by treating the tag as a dictionary::

-    tag['id'] = 'verybold'
-    tag['another-attribute'] = 1
-    tag
-    # <b another-attribute="1" id="verybold"></b>
+            tag['id'] = 'verybold'
+            tag['another-attribute'] = 1
+            tag
+            # <b another-attribute="1" id="verybold"></b>

-    del tag['id']
-    del tag['another-attribute']
-    tag
-    # <b>bold</b>
+            del tag['id']
+            del tag['another-attribute']
+            tag
+            # <b>bold</b>

-    tag['id']
-    # KeyError: 'id'
-    tag.get('id')
-    # None
+            tag['id']
+            # KeyError: 'id'
+            tag.get('id')
+            # None

-.. _multivalue:
+        .. _multivalue:

-Multi-valued attributes
------------------------
+        Multi-valued attributes
+        -----------------------

-HTML 4 defines a few attributes that can have multiple values. HTML 5
-removes a couple of them, but defines a few more. The most common
-multi-valued attribute is ``class`` (that is, a tag can have more than
-one CSS class). Others include ``rel``, ``rev``, ``accept-charset``,
-``headers``, and ``accesskey``. By default, Beautiful Soup parses the value(s)
-of a multi-valued attribute into a list::
+        HTML 4 defines a few attributes that can have multiple values. HTML 5
+        removes a couple of them, but defines a few more. The most common
+        multi-valued attribute is ``class`` (that is, a tag can have more than
+        one CSS class). Others include ``rel``, ``rev``, ``accept-charset``,
+        ``headers``, and ``accesskey``. By default, Beautiful Soup parses the value(s)
+        of a multi-valued attribute into a list::

-    css_soup = BeautifulSoup('<p class="body"></p>', 'html.parser')
-    css_soup.p['class']
-    # ['body']
+            css_soup = BeautifulSoup('<p class="body"></p>', 'html.parser')
+            css_soup.p['class']
+            # ['body']

-    css_soup = BeautifulSoup('<p class="body strikeout"></p>', 'html.parser')
-    css_soup.p['class']
-    # ['body', 'strikeout']
+            css_soup = BeautifulSoup('<p class="body strikeout"></p>', 'html.parser')
+            css_soup.p['class']
+            # ['body', 'strikeout']

-If an attribute `looks` like it has more than one value, but it's not
-a multi-valued attribute as defined by any version of the HTML
-standard, Beautiful Soup will leave the attribute alone::
+        If an attribute `looks` like it has more than one value, but it's not
+        a multi-valued attribute as defined by any version of the HTML
+        standard, Beautiful Soup will leave the attribute alone::

-    id_soup = BeautifulSoup('<p id="my id"></p>', 'html.parser')
-    id_soup.p['id']
-    # 'my id'
+            id_soup = BeautifulSoup('<p id="my id"></p>', 'html.parser')
+            id_soup.p['id']
+            # 'my id'

-When you turn a tag back into a string, multiple attribute values are
-consolidated::
+        When you turn a tag back into a string, multiple attribute values are
+        consolidated::

-    rel_soup = BeautifulSoup('<p>Back to the <a rel="index first">homepage</a></p>', 'html.parser')
-    rel_soup.a['rel']
-    # ['index', 'first']
-    rel_soup.a['rel'] = ['index', 'contents']
-    print(rel_soup.p)
-    # <p>Back to the <a rel="index contents">homepage</a></p>
+            rel_soup = BeautifulSoup('<p>Back to the <a rel="index first">homepage</a></p>', 'html.parser')
+            rel_soup.a['rel']
+            # ['index', 'first']
+            rel_soup.a['rel'] = ['index', 'contents']
+            print(rel_soup.p)
+            # <p>Back to the <a rel="index contents">homepage</a></p>

-You can force all attributes to be parsed as strings by passing
-``multi_valued_attributes=None`` as a keyword argument into the
-:py:class:`BeautifulSoup` constructor::
+        You can force all attributes to be parsed as strings by passing
+        ``multi_valued_attributes=None`` as a keyword argument into the
+        :py:class:`BeautifulSoup` constructor::

-    no_list_soup = BeautifulSoup('<p class="body strikeout"></p>', 'html.parser', multi_valued_attributes=None)
-    no_list_soup.p['class']
-    # 'body strikeout'
+            no_list_soup = BeautifulSoup('<p class="body strikeout"></p>', 'html.parser', multi_valued_attributes=None)
+            no_list_soup.p['class']
+            # 'body strikeout'

-You can use ``get_attribute_list`` to get a value that's always a
-list, whether or not it's a multi-valued atribute::
+        You can use ``get_attribute_list`` to get a value that's always a
+        list, whether or not it's a multi-valued atribute::

-    id_soup.p.get_attribute_list('id')
-    # ["my id"]
+            id_soup.p.get_attribute_list('id')
+            # ["my id"]

-If you parse a document as XML, there are no multi-valued attributes::
-
-    xml_soup = BeautifulSoup('<p class="body strikeout"></p>', 'xml')
-    xml_soup.p['class']
-    # 'body strikeout'
+        If you parse a document as XML, there are no multi-valued attributes::

-Again, you can configure this using the ``multi_valued_attributes`` argument::
+            xml_soup = BeautifulSoup('<p class="body strikeout"></p>', 'xml')
+            xml_soup.p['class']
+            # 'body strikeout'

-    class_is_multi= { '*' : 'class'}
-    xml_soup = BeautifulSoup('<p class="body strikeout"></p>', 'xml', multi_valued_attributes=class_is_multi)
-    xml_soup.p['class']
-    # ['body', 'strikeout']
+        Again, you can configure this using the ``multi_valued_attributes`` argument::

-You probably won't need to do this, but if you do, use the defaults as
-a guide. They implement the rules described in the HTML specification::
+            class_is_multi= { '*' : 'class'}
+            xml_soup = BeautifulSoup('<p class="body strikeout"></p>', 'xml', multi_valued_attributes=class_is_multi)
+            xml_soup.p['class']
+            # ['body', 'strikeout']

-    from bs4.builder import builder_registry
-    builder_registry.lookup('html').DEFAULT_CDATA_LIST_ATTRIBUTES
+        You probably won't need to do this, but if you do, use the defaults as
+        a guide. They implement the rules described in the HTML specification::

+            from bs4.builder import builder_registry
+            builder_registry.lookup('html').DEFAULT_CDATA_LIST_ATTRIBUTES

 .. py:class:: NavigableString

@@ -479,12 +476,12 @@ done using Beautiful Soup. This is a big waste of memory.
 ---------------------------

 The :py:class:`BeautifulSoup` object represents the parsed document as a
-whole. For most purposes, you can treat it as a :ref:`Tag`
+whole. For most purposes, you can treat it as a :py:class:`Tag`
 object. This means it supports most of the methods described in
 `Navigating the tree`_ and `Searching the tree`_.

 You can also pass a :py:class:`BeautifulSoup` object into one of the methods
-defined in `Modifying the tree`_, just as you would a :ref:`Tag`. This
+defined in `Modifying the tree`_, just as you would a :py:class:`Tag`. This
 lets you do things like combine two parsed documents::

     doc = BeautifulSoup("<document><content/>INSERT FOOTER HERE</document", "xml")
@@ -503,8 +500,8 @@ useful to look at its ``.name``, so it's been given the special
     soup.name
     # '[document]'

-Comments
---------
+Special strings
+---------------

 :py:class:`Tag`, :py:class:`NavigableString`, and
 :py:class:`BeautifulSoup` cover almost everything you'll see in an
@@ -534,8 +531,8 @@ displayed with special formatting::
     # <!--Hey, buddy. Want to buy a used parser?-->
     # </b>

-Special strings for HTML documents
----------------------------------
+For HTML documents
+^^^^^^^^^^^^^^^^^^

 Beautiful Soup defines a few :py:class:`NavigableString` subclasses to
 contain strings found inside specific HTML tags. This makes it easier
@@ -562,8 +559,8 @@ A :py:class:`NavigableString` subclass that represents embedded HTML
 templates; that is, any strings found inside a ``<template>`` tag during
 document parsing.

-Special strings for XML documents
----------------------------------
+For XML documents
+^^^^^^^^^^^^^^^^^

 Beautiful Soup defines some :py:class:`NavigableString` classes for
 holding special types of strings that can be found in XML
@@ -1937,7 +1934,7 @@ document.
 Changing tag names and attributes
 ---------------------------------

-I covered this earlier, in `Attributes`_, but it bears repeating. You
+I covered this earlier, in :py:attr:`Tag.attrs`, but it bears repeating. You
 can rename a tag, change the values of its attributes, add new
 attributes, and delete attributes::

@@ -2928,7 +2925,7 @@ these numbers represent the position of the final greater-than sign::
     # (2, 0, 'Paragraph 1')
     # (3, 6, 'Paragraph 2')

-You can shut off this feature by passing ``store_line_numbers=False`
+You can shut off this feature by passing ``store_line_numbers=False``
 into the :py:class:`BeautifulSoup` constructor::

     markup = "<p\n>Paragraph 1</p>\n <p>Paragraph 2</p>"
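The hunk headers in this diff can be cross-checked against the diffstat. The sketch below is a minimal, stdlib-only illustration (the helper names parse_hunk, net_line_change and offsets_consistent are hypothetical, not part of git or cgit). It reads each "@@ -old_start,old_len +new_start,new_len @@" header, confirms that the per-hunk length changes net out to 105 insertions minus 108 deletions, and checks that each hunk's new start line is shifted by the net change of all earlier hunks. It assumes there are no zero-length hunks, which use a slightly different start-line convention.

```python
import re

# Hunk headers copied from this commit's diff (trailing context text omitted).
HUNK_HEADERS = [
    "@@ -295,140 +295,137 @@",
    "@@ -479,12 +476,12 @@",
    "@@ -503,8 +500,8 @@",
    "@@ -534,8 +531,8 @@",
    "@@ -562,8 +559,8 @@",
    "@@ -1937,7 +1934,7 @@",
    "@@ -2928,7 +2925,7 @@",
]

# Totals reported by the diffstat for doc/source/index.rst.
INSERTIONS, DELETIONS = 105, 108

# Unified diff hunk header: @@ -old_start[,old_len] +new_start[,new_len] @@
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk(header):
    """Return (old_start, old_len, new_start, new_len) for one hunk header."""
    m = HUNK_RE.match(header)
    if m is None:
        raise ValueError(f"not a hunk header: {header!r}")
    old_start, old_len, new_start, new_len = m.groups()
    # An omitted length means the hunk spans exactly one line.
    return int(old_start), int(old_len or 1), int(new_start), int(new_len or 1)


def net_line_change(headers):
    """Sum of (new_len - old_len) over all hunks."""
    total = 0
    for header in headers:
        _, old_len, _, new_len = parse_hunk(header)
        total += new_len - old_len
    return total


def offsets_consistent(headers):
    """Check that each new_start equals old_start plus the net change of earlier hunks.

    Assumes no zero-length hunks (pure insertions or deletions), whose
    start lines follow a different convention.
    """
    shift = 0
    for header in headers:
        old_start, old_len, new_start, new_len = parse_hunk(header)
        if new_start != old_start + shift:
            return False
        shift += new_len - old_len
    return True


print(net_line_change(HUNK_HEADERS))     # -3
print(INSERTIONS - DELETIONS)            # -3
print(INSERTIONS + DELETIONS)            # 213, the diffstat's change count
print(offsets_consistent(HUNK_HEADERS))  # True
```

Only the first hunk changes length (140 lines become 137); the other six rewrite a heading or a cross-reference in place, which is why every later hunk starts exactly three lines earlier in the new file.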