From fbbc002be5e57981a0f2c56679b9ac770abcbc5b Mon Sep 17 00:00:00 2001
From: Leonard Richardson
Date: Wed, 15 Mar 2023 19:14:04 -0400
Subject: Rewrote documentation so that py:class:: directives could be inserted and the text would flow naturally.

---
 doc/source/index.rst | 213 +++++++++++++++++++++++++--------------------------
 1 file changed, 105 insertions(+), 108 deletions(-)

(limited to 'doc/source')

diff --git a/doc/source/index.rst b/doc/source/index.rst
index a916413..4d580a1 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -295,140 +295,137 @@ and :py:class:`Comment`.
 
 .. py:class:: Tag
 
-A :py:class:`Tag` object corresponds to an XML or HTML tag in the original document.
+ A :py:class:`Tag` object corresponds to an XML or HTML tag in the original document.
 
-::
+ ::
 
- soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'html.parser')
- tag = soup.b
- type(tag)
- # <class 'bs4.element.Tag'>
+  soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'html.parser')
+  tag = soup.b
+  type(tag)
+  # <class 'bs4.element.Tag'>
 
-Tags have a lot of attributes and methods, and I'll cover most of them
-in `Navigating the tree`_ and `Searching the tree`_. For now, the most
-important features of a tag are its name and attributes.
+ Tags have a lot of attributes and methods, and I'll cover most of them
+ in `Navigating the tree`_ and `Searching the tree`_. For now, the most
+ important features of a tag are its name and attributes.
 
-Name
-----
+ .. py:attribute:: name
 
-Every tag has a name, accessible as ``.name``::
+  Every tag has a name::
 
- tag.name
- # 'b'
+   tag.name
+   # 'b'
 
-If you change a tag's name, the change will be reflected in any HTML
-markup generated by Beautiful Soup::
+  If you change a tag's name, the change will be reflected in any
+  markup generated by Beautiful Soup down the line::
 
- tag.name = "blockquote"
- tag
- # <blockquote class="boldest">Extremely bold</blockquote>
+   tag.name = "blockquote"
+   tag
+   # <blockquote class="boldest">Extremely bold</blockquote>
 
-Attributes
-----------
+ .. py:attribute:: attrs
 
-A tag may have any number of attributes. The tag ``<b id="boldest">`` has an attribute "id" whose value is
-"boldest". You can access a tag's attributes by treating the tag like
-a dictionary::
+  An HTML or XML tag may have any number of attributes. The tag ``<b id="boldest">`` has an attribute "id" whose value is
+  "boldest". You can access a tag's attributes by treating the tag like
+  a dictionary::
 
- tag = BeautifulSoup('<b id="boldest">bold</b>', 'html.parser').b
- tag['id']
- # 'boldest'
+   tag = BeautifulSoup('<b id="boldest">bold</b>', 'html.parser').b
+   tag['id']
+   # 'boldest'
 
-You can access that dictionary directly as ``.attrs``::
+  You can access the dictionary of attributes directly as ``.attrs``::
 
- tag.attrs
- # {'id': 'boldest'}
+   tag.attrs
+   # {'id': 'boldest'}
 
-You can add, remove, and modify a tag's attributes. Again, this is
-done by treating the tag as a dictionary::
+  You can add, remove, and modify a tag's attributes. Again, this is
+  done by treating the tag as a dictionary::
 
- tag['id'] = 'verybold'
- tag['another-attribute'] = 1
- tag
- # <b another-attribute="1" id="verybold"></b>
+   tag['id'] = 'verybold'
+   tag['another-attribute'] = 1
+   tag
+   # <b another-attribute="1" id="verybold"></b>
 
- del tag['id']
- del tag['another-attribute']
- tag
- # <b>bold</b>
+   del tag['id']
+   del tag['another-attribute']
+   tag
+   # <b>bold</b>
 
- tag['id']
- # KeyError: 'id'
- tag.get('id')
- # None
+   tag['id']
+   # KeyError: 'id'
+   tag.get('id')
+   # None
 
-.. _multivalue:
+ .. _multivalue:
 
-Multi-valued attributes
------------------------
+ Multi-valued attributes
+ -----------------------
 
-HTML 4 defines a few attributes that can have multiple values. HTML 5
-removes a couple of them, but defines a few more. The most common
-multi-valued attribute is ``class`` (that is, a tag can have more than
-one CSS class). Others include ``rel``, ``rev``, ``accept-charset``,
-``headers``, and ``accesskey``. By default, Beautiful Soup parses the value(s)
-of a multi-valued attribute into a list::
+ HTML 4 defines a few attributes that can have multiple values. HTML 5
+ removes a couple of them, but defines a few more. The most common
+ multi-valued attribute is ``class`` (that is, a tag can have more than
+ one CSS class). Others include ``rel``, ``rev``, ``accept-charset``,
+ ``headers``, and ``accesskey``. By default, Beautiful Soup parses the value(s)
+ of a multi-valued attribute into a list::
 
- css_soup = BeautifulSoup('<p class="body"></p>', 'html.parser')
- css_soup.p['class']
- # ['body']
+  css_soup = BeautifulSoup('<p class="body"></p>', 'html.parser')
+  css_soup.p['class']
+  # ['body']
 
- css_soup = BeautifulSoup('<p class="body strikeout"></p>', 'html.parser')
- css_soup.p['class']
- # ['body', 'strikeout']
+  css_soup = BeautifulSoup('<p class="body strikeout"></p>', 'html.parser')
+  css_soup.p['class']
+  # ['body', 'strikeout']
 
-If an attribute `looks` like it has more than one value, but it's not
-a multi-valued attribute as defined by any version of the HTML
-standard, Beautiful Soup will leave the attribute alone::
+ If an attribute `looks` like it has more than one value, but it's not
+ a multi-valued attribute as defined by any version of the HTML
+ standard, Beautiful Soup will leave the attribute alone::
 
- id_soup = BeautifulSoup('<p id="my id"></p>', 'html.parser')
- id_soup.p['id']
- # 'my id'
+  id_soup = BeautifulSoup('<p id="my id"></p>', 'html.parser')
+  id_soup.p['id']
+  # 'my id'
 
-When you turn a tag back into a string, multiple attribute values are
-consolidated::
+ When you turn a tag back into a string, multiple attribute values are
+ consolidated::
 
- rel_soup = BeautifulSoup('<p>Back to the <a rel="index first">homepage</a></p>', 'html.parser')
- rel_soup.a['rel']
- # ['index', 'first']
- rel_soup.a['rel'] = ['index', 'contents']
- print(rel_soup.p)
- # <p>Back to the <a rel="index contents">homepage</a></p>
+  rel_soup = BeautifulSoup('<p>Back to the <a rel="index first">homepage</a></p>', 'html.parser')
+  rel_soup.a['rel']
+  # ['index', 'first']
+  rel_soup.a['rel'] = ['index', 'contents']
+  print(rel_soup.p)
+  # <p>Back to the <a rel="index contents">homepage</a></p>
 
-You can force all attributes to be parsed as strings by passing
-``multi_valued_attributes=None`` as a keyword argument into the
-:py:class:`BeautifulSoup` constructor::
+ You can force all attributes to be parsed as strings by passing
+ ``multi_valued_attributes=None`` as a keyword argument into the
+ :py:class:`BeautifulSoup` constructor::
 
- no_list_soup = BeautifulSoup('<p class="body strikeout"></p>', 'html.parser', multi_valued_attributes=None)
- no_list_soup.p['class']
- # 'body strikeout'
+  no_list_soup = BeautifulSoup('<p class="body strikeout"></p>', 'html.parser', multi_valued_attributes=None)
+  no_list_soup.p['class']
+  # 'body strikeout'
 
-You can use ``get_attribute_list`` to get a value that's always a
-list, whether or not it's a multi-valued atribute::
+ You can use ``get_attribute_list`` to get a value that's always a
+ list, whether or not it's a multi-valued atribute::
 
- id_soup.p.get_attribute_list('id')
- # ["my id"]
+  id_soup.p.get_attribute_list('id')
+  # ["my id"]
 
-If you parse a document as XML, there are no multi-valued attributes::
-
- xml_soup = BeautifulSoup('<p class="body strikeout"></p>', 'xml')
- xml_soup.p['class']
- # 'body strikeout'
+ If you parse a document as XML, there are no multi-valued attributes::
 
-Again, you can configure this using the ``multi_valued_attributes`` argument::
+  xml_soup = BeautifulSoup('<p class="body strikeout"></p>', 'xml')
+  xml_soup.p['class']
+  # 'body strikeout'
 
- class_is_multi= { '*' : 'class'}
- xml_soup = BeautifulSoup('<p class="body strikeout"></p>', 'xml', multi_valued_attributes=class_is_multi)
- xml_soup.p['class']
- # ['body', 'strikeout']
+ Again, you can configure this using the ``multi_valued_attributes`` argument::
 
-You probably won't need to do this, but if you do, use the defaults as
-a guide. They implement the rules described in the HTML specification::
+  class_is_multi= { '*' : 'class'}
+  xml_soup = BeautifulSoup('<p class="body strikeout"></p>', 'xml', multi_valued_attributes=class_is_multi)
+  xml_soup.p['class']
+  # ['body', 'strikeout']
 
- from bs4.builder import builder_registry
- builder_registry.lookup('html').DEFAULT_CDATA_LIST_ATTRIBUTES
+ You probably won't need to do this, but if you do, use the defaults as
+ a guide. They implement the rules described in the HTML specification::
 
+  from bs4.builder import builder_registry
+  builder_registry.lookup('html').DEFAULT_CDATA_LIST_ATTRIBUTES
 
 .. py:class:: NavigableString
 
@@ -479,12 +476,12 @@ done using Beautiful Soup. This is a big waste of memory.
 ---------------------------
 
 The :py:class:`BeautifulSoup` object represents the parsed document as a
-whole. For most purposes, you can treat it as a :ref:`Tag`
+whole. For most purposes, you can treat it as a :py:class:`Tag`
 object. This means it supports most of the methods described in
 `Navigating the tree`_ and `Searching the tree`_.
 
 You can also pass a :py:class:`BeautifulSoup` object into one of the methods
-defined in `Modifying the tree`_, just as you would a :ref:`Tag`. This
+defined in `Modifying the tree`_, just as you would a :py:class:`Tag`. This
 lets you do things like combine two parsed documents::
 
  doc = BeautifulSoup("INSERT FOOTER HERE #
 
-Special strings for HTML documents
-----------------------------------
+For HTML documents
+^^^^^^^^^^^^^^^^^^
 
 Beautiful Soup defines a few :py:class:`NavigableString` subclasses to
 contain strings found inside specific HTML tags. This makes it easier
@@ -562,8 +559,8 @@ A :py:class:`NavigableString` subclass that represents
 embedded HTML templates; that is, any strings found inside a ``