author | Leonard Richardson <leonardr@segfault.org> | 2023-08-27 11:47:13 -0400 |
---|---|---|
committer | Chris Papademetrious <chrispitude@gmail.com> | 2023-11-25 05:28:41 -0500 |
commit | 16e1c6bcd276237aa825f5efdbf1e80ed58fd66e | |
tree | f34905d97e2be2be97c17e9f1bad19f3c6828d63 | |
parent | 418ab58f94ea4a0957bcad8e7520bb3fb62d1394 | |
some doc suggestions from a first-time user
Signed-off-by: Chris Papademetrious <chrispitude@gmail.com>
-rw-r--r-- | CHANGELOG | 3 |
-rwxr-xr-x[-rw-r--r--] | doc/source/index.rst | 479 |
-rw-r--r-- | pyproject.toml | 2 |
3 files changed, 263 insertions, 221 deletions
@@ -9,6 +9,9 @@ markup would result in an AssertionError instead of a ParserRejectedMarkup exception. +* Corrected the syntax of the license definition in pyproject.toml. Patch + by Louis Maddox. [bug=2032848] + = 4.12.2 (20230407) * Fixed an unhandled exception in BeautifulSoup.decode_contents diff --git a/doc/source/index.rst b/doc/source/index.rst index aedfdfc..a733b66 100644..100755 --- a/doc/source/index.rst +++ b/doc/source/index.rst @@ -38,7 +38,7 @@ Beautiful Soup users: * `이 문서는 한국어 번역도 가능합니다. <https://www.crummy.com/software/BeautifulSoup/bs4/doc.ko/>`_ * `Este documento também está disponível em Português do Brasil. <https://www.crummy.com/software/BeautifulSoup/bs4/doc.ptbr>`_ * `Эта документация доступна на русском языке. <https://www.crummy.com/software/BeautifulSoup/bs4/doc.ru/>`_ - + Getting help ------------ @@ -241,7 +241,7 @@ This table summarizes the advantages and disadvantages of each parser library: +----------------------+--------------------------------------------+--------------------------------+--------------------------+ | Python's html.parser | ``BeautifulSoup(markup, "html.parser")`` | * Batteries included | * Not as fast as lxml, | | | | * Decent speed | less lenient than | -| | | * Lenient (As of Python 3.2) | html5lib. | +| | | * Lenient (As of Python 3.2) | html5lib | +----------------------+--------------------------------------------+--------------------------------+--------------------------+ | lxml's HTML parser | ``BeautifulSoup(markup, "lxml")`` | * Very fast | * External C dependency | | | | * Lenient | | @@ -291,7 +291,8 @@ Kinds of objects Beautiful Soup transforms a complex HTML document into a complex tree of Python objects. But you'll only ever have to deal with about four `kinds` of objects: :py:class:`Tag`, :py:class:`NavigableString`, :py:class:`BeautifulSoup`, -and :py:class:`Comment`. +and :py:class:`Comment`. These objects represent the HTML `elements` +that comprise the page. .. 
py:class:: Tag @@ -306,7 +307,7 @@ and :py:class:`Comment`. Tags have a lot of attributes and methods, and I'll cover most of them in `Navigating the tree`_ and `Searching the tree`_. For now, the most - important features of a tag are its name and attributes. + important methods of a tag are for accessing its name and attributes. .. py:attribute:: name @@ -337,6 +338,8 @@ and :py:class:`Comment`. tag.attrs # {'id': 'boldest'} + tag.attrs.keys() + # dict_keys(['id']) You can add, remove, and modify a tag's attributes. Again, this is done by treating the tag as a dictionary:: @@ -365,27 +368,19 @@ and :py:class:`Comment`. removes a couple of them, but defines a few more. The most common multi-valued attribute is ``class`` (that is, a tag can have more than one CSS class). Others include ``rel``, ``rev``, ``accept-charset``, - ``headers``, and ``accesskey``. By default, Beautiful Soup parses the value(s) - of a multi-valued attribute into a list:: + ``headers``, and ``accesskey``. By default, Beautiful Soup stores the value(s) + of a multi-valued attribute as a list:: css_soup = BeautifulSoup('<p class="body"></p>', 'html.parser') css_soup.p['class'] # ['body'] - + css_soup = BeautifulSoup('<p class="body strikeout"></p>', 'html.parser') css_soup.p['class'] # ['body', 'strikeout'] - If an attribute `looks` like it has more than one value, but it's not - a multi-valued attribute as defined by any version of the HTML - standard, Beautiful Soup will leave the attribute alone:: - - id_soup = BeautifulSoup('<p id="my id"></p>', 'html.parser') - id_soup.p['id'] - # 'my id' - - When you turn a tag back into a string, multiple attribute values are - consolidated:: + When you turn a tag back into a string, the values of any multi-valued + attributes are consolidated:: rel_soup = BeautifulSoup('<p>Back to the <a rel="index first">homepage</a></p>', 'html.parser') rel_soup.a['rel'] @@ -394,7 +389,15 @@ and :py:class:`Comment`. 
print(rel_soup.p) # <p>Back to the <a rel="index contents">homepage</a></p> - You can force all attributes to be parsed as strings by passing + If an attribute `looks` like it has more than one value, but it's not + a multi-valued attribute as defined by any version of the HTML + standard, Beautiful Soup stores it as a simple string:: + + id_soup = BeautifulSoup('<p id="my id"></p>', 'html.parser') + id_soup.p['id'] + # 'my id' + + You can force all attributes to be stored as strings by passing ``multi_valued_attributes=None`` as a keyword argument into the :py:class:`BeautifulSoup` constructor:: @@ -402,12 +405,14 @@ and :py:class:`Comment`. no_list_soup.p['class'] # 'body strikeout' - You can use ``get_attribute_list`` to get a value that's always a - list, whether or not it's a multi-valued atribute:: + You can use ``get_attribute_list`` to always return the value in a list + container, whether it's a string or multi-valued attribute value:: + id_soup.p['id'] + # 'my id' id_soup.p.get_attribute_list('id') # ["my id"] - + If you parse a document as XML, there are no multi-valued attributes:: xml_soup = BeautifulSoup('<p class="body strikeout"></p>', 'xml') @@ -426,13 +431,13 @@ and :py:class:`Comment`. from bs4.builder import builder_registry builder_registry.lookup('html').DEFAULT_CDATA_LIST_ATTRIBUTES - + .. py:class:: NavigableString ----------------------------- -A string corresponds to a bit of text within a tag. Beautiful Soup -uses the :py:class:`NavigableString` class to contain these bits of text:: +A tag can contain strings as pieces of text. Beautiful Soup +uses the :py:class:`NavigableString` class to contain these pieces of text:: soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'html.parser') tag = soup.b @@ -494,8 +499,9 @@ lets you do things like combine two parsed documents:: Since the :py:class:`BeautifulSoup` object doesn't correspond to an actual HTML or XML tag, it has no name and no attributes. 
But sometimes it's -useful to look at its ``.name``, so it's been given the special -``.name`` "[document]":: +useful to reference its ``.name`` (such as when writing code that works +with both :py:class:`Tag` and :py:class:`BeautifulSoup` objects), +so it's been given the special ``.name`` "[document]":: soup.name # '[document]' @@ -533,7 +539,7 @@ displayed with special formatting:: For HTML documents ^^^^^^^^^^^^^^^^^^ - + Beautiful Soup defines a few :py:class:`NavigableString` subclasses to contain strings found inside specific HTML tags. This makes it easier to pick out the main body of the page, by ignoring strings that @@ -589,7 +595,6 @@ A :py:class:`NavigableString` subclass that represents a `CData section <https:/ A :py:class:`NavigableString` subclass that represents the contents of an `XML processing instruction <https://www.w3.org/TR/REC-xml/#sec-pi>`_. - Navigating the tree =================== @@ -618,7 +623,7 @@ a document to another. Going down ---------- -Tags may contain strings and other tags. These elements are the tag's +Tags may contain strings and more tags. These elements are the tag's `children`. Beautiful Soup provides a lot of different attributes for navigating and iterating over a tag's children. @@ -628,8 +633,15 @@ attributes, because a string can't have children. Navigating using tag names ^^^^^^^^^^^^^^^^^^^^^^^^^^ -The simplest way to navigate the parse tree is to say the name of the -tag you want. If you want the <head> tag, just say ``soup.head``:: +The simplest way to navigate the parse tree is to find a tag by name. To +do this, you can use the ``find()`` method:: + + soup.find("head") + # <head><title>The Dormouse's story</title></head> + +For convenience, just saying the name of the tag you want is equivalent +to ``find()`` (if no built-in attribute has that name). If you want the +<head> tag, just say ``soup.head``:: soup.head # <head><title>The Dormouse's story</title></head> @@ -637,27 +649,28 @@ tag you want. 
If you want the <head> tag, just say ``soup.head``:: soup.title # <title>The Dormouse's story</title> -You can do use this trick again and again to zoom in on a certain part +You can use this trick again and again to zoom in on a certain part of the parse tree. This code gets the first <b> tag beneath the <body> tag:: soup.body.b # <b>The Dormouse's story</b> -Using a tag name as an attribute will give you only the `first` tag by that -name:: +``find()`` (and its convenience equivalent) gives you only the `first` tag +by that name:: soup.a # <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a> -If you need to get `all` the <a> tags, or anything more complicated -than the first tag with a certain name, you'll need to use one of the -methods described in `Searching the tree`_, such as `find_all()`:: +If you need to get `all` the <a> tags, you can use ``find_all()``:: soup.find_all('a') # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>] +For more complicated tasks, such as pattern-matching and filtering, you can +use the methods described in `Searching the tree`_. + ``.contents`` and ``.children`` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -703,11 +716,11 @@ If you want to modify a tag's children, use the methods described in directly: that can lead to problems that are subtle and difficult to spot. - + ``.descendants`` ^^^^^^^^^^^^^^^^ -The ``.contents`` and ``.children`` attributes only consider a tag's +The ``.contents`` and ``.children`` attributes consider only a tag's `direct` children. For instance, the <head> tag has a single direct child--the <title> tag:: @@ -769,7 +782,8 @@ If a tag contains more than one thing, then it's not clear what ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ If there's more than one thing inside a tag, you can still look at -just the strings. 
Use the ``.strings`` generator:: +just the strings. Use the ``.strings`` generator to see all descendant +strings:: for string in soup.strings: print(repr(string)) @@ -790,8 +804,8 @@ just the strings. Use the ``.strings`` generator:: # '...' # '\n' -These strings tend to have a lot of extra whitespace, which you can -remove by using the ``.stripped_strings`` generator instead:: +Newlines and spaces that separate tags are also strings. You can remove extra +whitespace by using the ``.stripped_strings`` generator instead:: for string in soup.stripped_strings: print(repr(string)) @@ -938,7 +952,7 @@ newline that separate the first <a> tag from the second:: link.next_sibling # ',\n ' -The second <a> tag is actually the ``.next_sibling`` of the comma:: +The second <a> tag is then the ``.next_sibling`` of the comma string:: link.next_sibling.next_sibling # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> @@ -967,6 +981,9 @@ You can iterate over a tag's siblings with ``.next_siblings`` or # <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a> # 'Once upon a time there were three little sisters; and their names were\n' +(If the argument syntax to find tags by their attribute value is unfamiliar, +don't worry; this is covered later in :ref:`The keyword arguments <kwargs>`.) + Going back and forth -------------------- @@ -975,11 +992,14 @@ Take a look at the beginning of the "three sisters" document:: # <html><head><title>The Dormouse's story</title></head> # <p class="title"><b>The Dormouse's story</b></p> +.. _document-order: + An HTML parser takes this string of characters and turns it into a series of events: "open an <html> tag", "open a <head> tag", "open a <title> tag", "add a string", "close the <title> tag", "open a <p> -tag", and so on. Beautiful Soup offers tools for reconstructing the -initial parse of the document. +tag", and so on. 
The order in which the opening tags and strings are +encountered is called `document order`. Beautiful Soup offers tools for +searching a document's elements in document order. .. _element-generators: @@ -987,12 +1007,13 @@ initial parse of the document. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The ``.next_element`` attribute of a string or tag points to whatever -was parsed immediately afterwards. It might be the same as -``.next_sibling``, but it's usually drastically different. +was parsed immediately after the opening of the current tag or after +the current string. It might be the same as ``.next_sibling``, but it's +usually drastically different. Here's the final <a> tag in the "three sisters" document. Its ``.next_sibling`` is a string: the conclusion of the sentence that was -interrupted by the start of the <a> tag.:: +interrupted by the start of the <a> tag:: last_a_tag = soup.find("a", id="link3") last_a_tag @@ -1003,7 +1024,7 @@ interrupted by the start of the <a> tag.:: But the ``.next_element`` of that <a> tag, the thing that was parsed immediately after the <a> tag, is `not` the rest of that sentence: -it's the word "Tillie":: +it's the string "Tillie" inside it:: last_a_tag.next_element # 'Tillie' @@ -1015,8 +1036,8 @@ the sentence. The semicolon is on the same level as the <a> tag, but the word "Tillie" was encountered first. The ``.previous_element`` attribute is the exact opposite of -``.next_element``. It points to whatever element was parsed -immediately before this one:: +``.next_element``. 
It points to the opening tag or string that was +parsed immediately before this one:: last_a_tag.previous_element # ' and\n' @@ -1066,7 +1087,7 @@ Once again, I'll be using the "three sisters" document as an example:: from bs4 import BeautifulSoup soup = BeautifulSoup(html_doc, 'html.parser') -By passing in a filter to an argument like ``find_all()``, you can +By passing in a filter to a method like ``find_all()``, you can zoom in on the parts of the document you're interested in. Kinds of filters @@ -1085,8 +1106,8 @@ A string ^^^^^^^^ The simplest filter is a string. Pass a string to a search method and -Beautiful Soup will perform a match against that exact string. This -code finds all the <b> tags in the document:: +Beautiful Soup will perform a tag-name match against that exact string. +This code finds all the <b> tags in the document:: soup.find_all('b') # [<b>The Dormouse's story</b>] @@ -1117,27 +1138,10 @@ This code finds all the tags whose names contain the letter 't':: # html # title -.. _a list: - -A list -^^^^^^ - -If you pass in a list, Beautiful Soup will allow a string match -against `any` item in that list. This code finds all the <a> tags -`and` all the <b> tags:: - - soup.find_all(["a", "b"]) - # [<b>The Dormouse's story</b>, - # <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, - # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, - # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>] - -.. _the value True: - ``True`` ^^^^^^^^ -The value ``True`` matches everything it can. This code finds `all` +The value ``True`` matches every tag it can. This code finds `all` the tags in the document, but none of the text strings:: for tag in soup.find_all(True): @@ -1177,24 +1181,11 @@ tags:: # <p class="story">Once upon a time there were…bottom of a well.</p>, # <p class="story">...</p>] -This function only picks up the <p> tags. 
It doesn't pick up the <a> +This function picks up only the <p> tags. It doesn't pick up the <a> tags, because those tags define both "class" and "id". It doesn't pick up tags like <html> and <title>, because those tags don't define "class". -If you pass in a function to filter on a specific attribute like -``href``, the argument passed into the function will be the attribute -value, not the whole tag. Here's a function that finds all ``a`` tags -whose ``href`` attribute *does not* match a regular expression:: - - import re - def not_lacie(href): - return href and not re.compile("lacie").search(href) - - soup.find_all(href=not_lacie) - # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, - # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>] - The function can be as complicated as you need it to be. Here's a function that returns ``True`` if a tag is surrounded by string objects:: @@ -1213,6 +1204,23 @@ objects:: # a # p +.. _a list: + +A list +^^^^^^ + +If you pass in a list, Beautiful Soup will look for a match against +`any` string, regular expression, or function in that list. This +code finds all the <a> tags `and` all the <b> tags:: + + soup.find_all(["a", "b"]) + # [<b>The Dormouse's story</b>, + # <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, + # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, + # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>] + +.. _the value True: + Now we're ready to look at the search methods in detail. ``find_all()`` @@ -1271,32 +1279,57 @@ True`_. The keyword arguments ^^^^^^^^^^^^^^^^^^^^^ -Any argument that's not recognized will be turned into a filter on one -of a tag's attributes. If you pass in a value for an argument called ``id``, -Beautiful Soup will filter against each tag's 'id' attribute:: +Any keyword argument that's not recognized will be turned into a filter +that matches tags by their attributes. 
+ +If you pass in a value for an argument called ``id``, Beautiful Soup will +filter against each tag's 'id' attribute value:: soup.find_all(id='link2') # [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>] -If you pass in a value for ``href``, Beautiful Soup will filter -against each tag's 'href' attribute:: +Just as with tags, you can filter an attribute based on `a string`_, +`a regular expression`_, `a list`_, `a function`_, or `the value True`_. + +If you pass in a regular expression object for ``href``, Beautiful Soup will +pattern-match against each tag's 'href' attribute value:: soup.find_all(href=re.compile("elsie")) # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>] -You can filter an attribute based on `a string`_, `a regular -expression`_, `a list`_, `a function`_, or `the value True`_. - -This code finds all tags whose ``id`` attribute has a value, -regardless of what the value is:: +The value ``True`` matches every tag that defines the attribute. This code +finds `all` tags with an ``id`` attribute: soup.find_all(id=True) # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>] -You can filter multiple attributes at once by passing in more than one -keyword argument:: +For more complex matches, you can define a function that takes an attribute +value as its only argument. The function should return ``True`` if the value +matches, and ``False`` otherwise. 
+ +Here's a function that finds all ``a`` tags whose ``href`` attribute *does not* +match a regular expression:: + + import re + def not_lacie(href): + return href and not re.compile("lacie").search(href) + + soup.find_all(href=not_lacie) + # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, + # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>] + +If you pass in a list for an argument, Beautiful Soup will look for an +attribute-value match against `any` string, regular expression, or function in +that list. This code finds the first and last link: + + soup.find_all(id=["link1", re.compile("3$")]) + # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, + # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>] + +You can filter against multiple attributes at once by passing multiple +keyword arguments:: soup.find_all(href=re.compile("elsie"), id='link1') # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>] @@ -1315,7 +1348,7 @@ dictionary and passing the dictionary into ``find_all()`` as the data_soup.find_all(attrs={"data-foo": "value"}) # [<div data-foo="value">foo!</div>] -You can't use a keyword argument to search for HTML's 'name' element, +Similarly, you can't use a keyword argument to search for HTML's 'name' attribute, because Beautiful Soup uses the ``name`` argument to contain the name of the tag itself. Instead, you can give a value to 'name' in the ``attrs`` argument:: @@ -1378,15 +1411,9 @@ But searching for variants of the string value won't work:: css_soup.find_all("p", class_="strikeout body") # [] -If you want to search for tags that match two or more CSS classes, you -should use a CSS selector:: - - css_soup.select("p.strikeout.body") - # [<p class="body strikeout"></p>] - In older versions of Beautiful Soup, which don't have the ``class_`` -shortcut, you can use the ``attrs`` trick mentioned above. 
Create a -dictionary whose value for "class" is the string (or regular +shortcut, you can use the ``attrs`` argument trick mentioned above. +Create a dictionary whose value for "class" is the string (or regular expression, or whatever) you want to search for:: soup.find_all("a", attrs={"class": "sister"}) @@ -1394,14 +1421,20 @@ expression, or whatever) you want to search for:: # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>] +To search for tags that match two or more CSS classes at once, use the +`select()` CSS selector method :ref:`described here <css-selectors>`:: + + css_soup.select("p.strikeout.body") + # [<p class="body strikeout"></p>] + .. _string: The ``string`` argument ^^^^^^^^^^^^^^^^^^^^^^^ -With ``string`` you can search for strings instead of tags. As with -``name`` and the keyword arguments, you can pass in `a string`_, `a -regular expression`_, `a list`_, `a function`_, or `the value True`_. +With the ``string`` argument, you can search for strings instead of tags. As +with ``name`` and attribute keyword arguments, you can pass in `a string`_, `a +regular expression`_, `a function`_, `a list`_, or `the value True`_. Here are some examples:: soup.find_all(string="Elsie") @@ -1420,10 +1453,9 @@ Here are some examples:: soup.find_all(string=is_the_only_string_within_a_tag) # ["The Dormouse's story", "The Dormouse's story", 'Elsie', 'Lacie', 'Tillie', '...'] -Although ``string`` is for finding strings, you can combine it with -arguments that find tags: Beautiful Soup will find all tags whose -``.string`` matches your value for ``string``. This code finds the <a> -tags whose ``.string`` is "Elsie":: +If you use the ``string`` argument in a tag search, Beautiful Soup will find +all tags whose ``.string`` matches your value for ``string``. 
This code finds +the <a> tags whose ``.string`` is "Elsie":: soup.find_all("a", string="Elsie") # [<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>] @@ -1457,10 +1489,9 @@ only finds the first two:: The ``recursive`` argument ^^^^^^^^^^^^^^^^^^^^^^^^^^ -If you call ``mytag.find_all()``, Beautiful Soup will examine all the -descendants of ``mytag``: its children, its children's children, and -so on. If you only want Beautiful Soup to consider direct children, -you can pass in ``recursive=False``. See the difference here:: +By default, ``mytag.find_all()`` will examine all the descendants of ``mytag``: +its children, its children's children, and so on. To consider only direct +children, you can pass in ``recursive=False``. See the difference here:: soup.html.find_all("title") # [<title>The Dormouse's story</title>] @@ -1486,19 +1517,17 @@ the <html> tag, but when ``recursive=False`` restricts it to the Beautiful Soup offers a lot of tree-searching methods (covered below), and they mostly take the same arguments as ``find_all()``: ``name``, -``attrs``, ``string``, ``limit``, and the keyword arguments. But the -``recursive`` argument is different: ``find_all()`` and ``find()`` are -the only methods that support it. Passing ``recursive=False`` into a -method like ``find_parents()`` wouldn't be very useful. +``attrs``, ``string``, ``limit``, and attribute keyword arguments. But the +``recursive`` argument is specific to the ``find_all()`` and ``find()`` methods. +Passing ``recursive=False`` into a method like ``find_parents()`` wouldn't be +very useful. Calling a tag is like calling ``find_all()`` -------------------------------------------- -Because ``find_all()`` is the most popular method in the Beautiful -Soup search API, you can use a shortcut for it. If you treat the -:py:class:`BeautifulSoup` object or a :py:class:`Tag` object as though it were a -function, then it's the same as calling ``find_all()`` on that -object. 
These two lines of code are equivalent:: +For convenience, calling a :py:class:`BeautifulSoup` object or :py:class:`Tag` +object as a function is equivalent to calling ``find_all()`` (if no built-in +method has that name). These two lines of code are equivalent:: soup.find_all("a") soup("a") @@ -1516,7 +1545,7 @@ Method signature: find(:ref:`name <name>`, :ref:`attrs <attrs>`, :ref:`recursive The ``find_all()`` method scans the entire document looking for results, but sometimes you only want to find one result. If you know a -document only has one <body> tag, it's a waste of time to scan the +document has only one <body> tag, it's a waste of time to scan the entire document looking for more. Rather than passing in ``limit=1`` every time you call ``find_all``, you can use the ``find()`` method. These two lines of code are `nearly` equivalent:: @@ -1556,8 +1585,8 @@ I spent a lot of time above covering ``find_all()`` and ``find()``. The Beautiful Soup API defines ten other methods for searching the tree, but don't be afraid. Five of these methods are basically the same as ``find_all()``, and the other five are basically -the same as ``find()``. The only differences are in what parts of the -tree they search. +the same as ``find()``. The only differences are in how their search +through the tree (the search `axis`). First let's consider ``find_parents()`` and ``find_parent()``. Remember that ``find_all()`` and ``find()`` work @@ -1585,16 +1614,16 @@ buried deep in the "three daughters" document:: One of the three <a> tags is the direct parent of the string in question, so our search finds it. One of the three <p> tags is an -indirect parent of the string, and our search finds that as +indirect parent (`ancestor`) of the string, and our search finds that as well. There's a <p> tag with the CSS class "title" `somewhere` in the document, but it's not one of this string's parents, so we can't find it with ``find_parents()``. 
-You may have made the connection between ``find_parent()`` and +You may have noticed a similarity between ``find_parent()`` and ``find_parents()``, and the `.parent`_ and `.parents`_ attributes -mentioned earlier. The connection is very strong. These search methods -actually use ``.parents`` to iterate over all the parents, and check -each one against the provided filter to see if it matches. +mentioned earlier. These search methods actually use the ``.parents`` +attribute to iterate through all parents (unfiltered), checking each one +against the provided filter to see if it matches. ``find_next_siblings()`` and ``find_next_sibling()`` ---------------------------------------------------- @@ -1606,7 +1635,7 @@ Method signature: find_next_sibling(:ref:`name <name>`, :ref:`attrs <attrs>`, :r These methods use :ref:`.next_siblings <sibling-generators>` to iterate over the rest of an element's siblings in the tree. The ``find_next_siblings()`` method returns all the siblings that match, -and ``find_next_sibling()`` only returns the first one:: +and ``find_next_sibling()`` returns only the first one:: first_link = soup.a first_link @@ -1630,7 +1659,7 @@ Method signature: find_previous_sibling(:ref:`name <name>`, :ref:`attrs <attrs>` These methods use :ref:`.previous_siblings <sibling-generators>` to iterate over an element's siblings that precede it in the tree. 
The ``find_previous_siblings()`` method returns all the siblings that match, and -``find_previous_sibling()`` only returns the first one:: +``find_previous_sibling()`` returns only the first one:: last_link = soup.find("a", id="link3") last_link @@ -1644,7 +1673,6 @@ method returns all the siblings that match, and first_story_paragraph.find_previous_sibling("p") # <p class="title"><b>The Dormouse's story</b></p> - ``find_all_next()`` and ``find_next()`` --------------------------------------- @@ -1655,7 +1683,7 @@ Method signature: find_next(:ref:`name <name>`, :ref:`attrs <attrs>`, :ref:`stri These methods use :ref:`.next_elements <element-generators>` to iterate over whatever tags and strings that come after it in the document. The ``find_all_next()`` method returns all matches, and -``find_next()`` only returns the first match:: +``find_next()`` returns only the first match:: first_link = soup.a first_link @@ -1672,8 +1700,8 @@ In the first example, the string "Elsie" showed up, even though it was contained within the <a> tag we started from. In the second example, the last <p> tag in the document showed up, even though it's not in the same part of the tree as the <a> tag we started from. For these -methods, all that matters is that an element match the filter, and -show up later in the document than the starting element. +methods, all that matters is that an element matches the filter and +it shows up later in the document in :ref:`document order <document-order>`. ``find_all_previous()`` and ``find_previous()`` ----------------------------------------------- @@ -1702,13 +1730,17 @@ The call to ``find_all_previous("p")`` found the first paragraph in the document (the one with class="title"), but it also finds the second paragraph, the <p> tag that contains the <a> tag we started with. This shouldn't be too surprising: we're looking at all the tags -that show up earlier in the document than the one we started with. 
A +that show up earlier in the document in :ref:`document order <document-order>` than the one we started with. A <p> tag that contains an <a> tag must have shown up before the <a> tag it contains. +.. _css-selectors: + CSS selectors through the ``.css`` property ------------------------------------------- +.. _select: + :py:class:`BeautifulSoup` and :py:class:`Tag` objects support CSS selectors through their ``.css`` property. The actual selector implementation is handled by the `Soup Sieve <https://facelessuser.github.io/soupsieve/>`_ @@ -1718,7 +1750,7 @@ time, so you don't have to do anything extra. The Soup Sieve documentation lists `all the currently supported CSS selectors <https://facelessuser.github.io/soupsieve/selectors/>`_, but -here are some of the basics. You can find tags:: +here are some of the basics. You can find tags by name:: soup.css.select("title") # [<title>The Dormouse's story</title>] @@ -1726,7 +1758,15 @@ here are some of the basics. You can find tags:: soup.css.select("p:nth-of-type(3)") # [<p class="story">...</p>] -Find tags beneath other tags:: +Find tags by ID:: + + soup.css.select("#link1") + # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>] + + soup.css.select("a#link2") + # [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>] + +Find tags contained anywhere within other tags:: soup.css.select("body a") # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, @@ -1736,7 +1776,7 @@ Find tags beneath other tags:: soup.css.select("html head title") # [<title>The Dormouse's story</title>] -Find tags `directly` beneath other tags:: +Find tags `directly` within other tags:: soup.css.select("head > title") # [<title>The Dormouse's story</title>] @@ -1749,18 +1789,17 @@ Find tags `directly` beneath other tags:: soup.css.select("p > a:nth-of-type(2)") # [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>] - soup.css.select("p > #link1") - # [<a class="sister" 
href="http://example.com/elsie" id="link1">Elsie</a>] - soup.css.select("body > a") # [] -Find the siblings of tags:: +Find all matching next siblings of tags:: soup.css.select("#link1 ~ .sister") # [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>] +Find the next sibling tag (but only if it matches):: + soup.css.select("#link1 + .sister") # [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>] @@ -1776,14 +1815,6 @@ Find tags by CSS class:: # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>] -Find tags by ID:: - - soup.css.select("#link1") - # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>] - - soup.css.select("a#link2") - # [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>] - Find tags that match any selector from a list of selectors:: soup.css.select("#link1,#link2") @@ -1864,9 +1895,9 @@ that matches a CSS selector, similar to Beautiful Soup's # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>; # and they lived at the bottom of a well.</p> -The ``match()`` method returns a boolean depending on whether or not a +The ``match()`` method returns a Boolean depending on whether or not a specific :py:class:`Tag` matches a selector:: - + # elsie.css.match("#link1") True @@ -1875,13 +1906,13 @@ specific :py:class:`Tag` matches a selector:: The ``filter()`` method returns the subset of a tag's direct children that match a selector:: - + [tag.string for tag in soup.find('p', 'story').css.filter('a')] # ['Elsie', 'Lacie', 'Tillie'] The ``escape()`` method escapes CSS identifiers that would otherwise be invalid:: - + soup.css.escape("1-strange-identifier") # '\\31 -strange-identifier' @@ -1922,7 +1953,6 @@ supported. The Soup Sieve integration was added in Beautiful Soup 4.7.0. 
Earlier versions had the ``.select()`` method, but only the most commonly-used CSS selectors were supported. - Modifying the tree ================== @@ -1965,9 +1995,9 @@ replaced with that string:: tag.string = "New link text." tag # <a href="http://example.com/">New link text.</a> - + Be careful: if the tag contained other tags, they and all their -contents will be destroyed. +contents will be destroyed. ``append()`` ------------ @@ -1997,7 +2027,7 @@ in order:: # <a>Soup's on</a> soup.a.contents # ['Soup', ''s', ' ', 'on'] - + ``NavigableString()`` and ``.new_tag()`` ------------------------------------------------- @@ -2029,7 +2059,7 @@ If you want to create a comment or some other subclass of `(This is a new feature in Beautiful Soup 4.4.0.)` -What if you need to create a whole new tag? The best solution is to +What if you need to create a whole new tag? The best solution is to call the factory method ``BeautifulSoup.new_tag()``:: soup = BeautifulSoup("<b></b>", 'html.parser') @@ -2062,7 +2092,7 @@ say. It works just like ``.insert()`` on a Python list:: tag # <a href="http://example.com/">I linked to but did not endorse <i>example.com</i></a> tag.contents - # ['I linked to ', 'but did not endorse', <i>example.com</i>] + # ['I linked to ', 'but did not endorse ', <i>example.com</i>] ``insert_before()`` and ``insert_after()`` ------------------------------------------ @@ -2124,7 +2154,7 @@ returns the tag or string that was extracted:: At this point you effectively have two parse trees: one rooted at the :py:class:`BeautifulSoup` object you used to parse the document, and one rooted -at the tag that was extracted. You can go on to call ``extract`` on +at the tag that was extracted. You can go on to call ``extract()`` on a child of the element you extracted:: my_string = i_tag.string.extract() @@ -2163,14 +2193,13 @@ whether something has been decomposed, you can check its a_tag.decomposed # False - .. 
_replace_with(): ``replace_with()`` ------------------ -``PageElement.replace_with()`` removes a tag or string from the tree, -and replaces it with one or more tags or strings of your choice:: +``PageElement.replace_with()`` extracts a tag or string from the tree, +then replaces it with one or more tags or strings of your choice:: markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>' soup = BeautifulSoup(markup, 'html.parser') @@ -2198,11 +2227,10 @@ that you can examine it or add it back to another part of the tree. `The ability to pass multiple arguments into replace_with() is new in Beautiful Soup 4.10.0.` - ``wrap()`` ---------- -``PageElement.wrap()`` wraps an element in the tag you specify. It +``PageElement.wrap()`` wraps an element in the tag object you specify. It returns the new wrapper:: soup = BeautifulSoup("<p>I wish I was bold.</p>", 'html.parser') @@ -2234,7 +2262,11 @@ that was replaced. ``smooth()`` --------------------------- -After calling a bunch of methods that modify the parse tree, you may end up with two or more :py:class:`NavigableString` objects next to each other. Beautiful Soup doesn't have any problems with this, but since it can't happen in a freshly parsed document, you might not expect behavior like the following:: +After calling a bunch of methods that modify the parse tree, you may end up +with two or more :py:class:`NavigableString` objects next to each other. +Beautiful Soup doesn't have any problems with this, but since it can't happen +in a freshly parsed document, you might not expect behavior like the +following:: soup = BeautifulSoup("<p>A one</p>", 'html.parser') soup.p.append(", a two") @@ -2311,7 +2343,7 @@ Since it adds whitespace (in the form of newlines), ``prettify()`` changes the meaning of an HTML document and should not be used to reformat one. The goal of ``prettify()`` is to help you visually understand the structure of the documents you work with. 
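A minimal sketch of the ``prettify()`` caveat in the hunk above, assuming ``bs4`` is installed. The markup string is made up for illustration. It suggests why pretty-printed output should not be used to reformat a document: the added newlines and indentation become part of the strings when the output is parsed again.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, chosen only for illustration.
markup = "<p>I wish I was <b>bold</b>.</p>"
soup = BeautifulSoup(markup, "html.parser")

compact = str(soup)        # serialized without any added whitespace
pretty = soup.prettify()   # each tag and string on its own indented line

# Parse the pretty-printed output again and compare the visible text.
reparsed = BeautifulSoup(pretty, "html.parser")

print(compact == markup)
# True
print(soup.get_text() == reparsed.get_text())
# False
```

For output that will be parsed or served again, ``str(soup)`` or ``soup.decode()`` is the safer choice; ``prettify()`` is meant for reading.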
- + Non-pretty printing ------------------- @@ -2389,25 +2421,25 @@ If you pass in ``formatter="html5"``, it's similar to omit the closing slash in HTML void tags like "br":: br = BeautifulSoup("<br>", 'html.parser').br - + print(br.encode(formatter="html")) # b'<br/>' - + print(br.encode(formatter="html5")) # b'<br>' In addition, any attributes whose values are the empty string -will become HTML-style boolean attributes:: +will become HTML-style Boolean attributes:: option = BeautifulSoup('<option selected=""></option>').option print(option.encode(formatter="html")) # b'<option selected=""></option>' - + print(option.encode(formatter="html5")) # b'<option selected></option>' *(This behavior is new as of Beautiful Soup 4.10.0.)* - + If you pass in ``formatter=None``, Beautiful Soup will not modify strings at all on output. This is the fastest option, but it may lead to Beautiful Soup generating invalid HTML/XML, as in these examples:: @@ -2433,12 +2465,12 @@ object in as ``formatter``. Used to customize the formatting rules for HTML documents. 
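The named formatters discussed in the hunk above can be compared side by side. This is a rough sketch, assuming ``bs4`` is installed; the markup and the ``results`` dictionary are illustrative and are not part of the patch.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with a named entity, an ampersand, and a void tag.
markup = "<p>Sacr&eacute; bleu &amp; co<br></p>"
soup = BeautifulSoup(markup, "html.parser")

# Serialize the same tree with each formatter and keep the results.
results = {}
for name in ("minimal", "html", "html5", None):
    results[name] = soup.p.decode(formatter=name)
    print(repr(name), results[name])

# "html" and "html5" both emit named entities such as &eacute;,
# "html5" drops the closing slash on void tags such as <br>,
# and None leaves strings untouched, so the bare "&" is not escaped.
```

The ``None`` output contains an unescaped ampersand, which is the kind of invalid HTML the surrounding text warns about.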
Here's a formatter that converts strings to uppercase, whether they -occur in a text node or in an attribute value:: +occur in a string object or an attribute value:: from bs4.formatter import HTMLFormatter def uppercase(str): return str.upper() - + formatter = HTMLFormatter(uppercase) print(soup.prettify(formatter=formatter)) @@ -2451,7 +2483,7 @@ occur in a text node or in an attribute value:: # A LINK # </a> -Here's a formatter that increases the indentation when pretty-printing:: +Here's a formatter that increases the indentation width when pretty-printing:: formatter = HTMLFormatter(indent=8) print(link_soup.a.prettify(formatter=formatter)) @@ -2485,8 +2517,8 @@ whenever it appears:: if k == 'm': continue yield k, v - - print(attr_soup.p.encode(formatter=UnsortedAttributes())) + + print(attr_soup.p.encode(formatter=UnsortedAttributes())) # <p z="1" a="3"></p> One last caveat: if you create a :py:class:`CData` object, the text inside @@ -2548,7 +2580,6 @@ the human-visible content of the page.* either return the object itself, or nothing, so the only reason to do this is when you're iterating over a mixed list.* - Specifying the parser to use ============================ @@ -2559,15 +2590,15 @@ few additional arguments you can pass in to the constructor to change which parser is used. The first argument to the :py:class:`BeautifulSoup` constructor is a string or -an open filehandle--the markup you want parsed. The second argument is -`how` you'd like the markup parsed. +an open filehandle—the source of the markup you want parsed. The second +argument is `how` you'd like the markup parsed. If you don't specify anything, you'll get the best HTML parser that's installed. Beautiful Soup ranks lxml's parser as being the best, then html5lib's, then Python's built-in parser. You can override this by specifying one of the following: -* What type of markup you want to parse. Currently supported are +* What type of markup you want to parse. 
Currently supported values are "html", "xml", and "html5". * The name of the parser library you want to use. Currently supported @@ -2654,7 +2685,7 @@ Encodings ========= Any HTML or XML document is written in a specific encoding like ASCII -or UTF-8. But when you load that document into Beautiful Soup, you'll +or UTF-8. But when you load that document into Beautiful Soup, you'll discover it's been converted to Unicode:: markup = "<h1>Sacr\xc3\xa9 bleu!</h1>" @@ -2727,8 +2758,8 @@ paragraph) and doesn't stand in for missing data. Output encoding --------------- -When you write out a document from Beautiful Soup, you get a UTF-8 -document, even if the document wasn't in UTF-8 to begin with. Here's a +When you write out an output document from Beautiful Soup, you get a UTF-8 +document, even if the input document wasn't in UTF-8 to begin with. Here's a document written in the Latin-1 encoding:: markup = b''' @@ -2903,7 +2934,7 @@ Line numbers ============ The ``html.parser`` and ``html5lib`` parsers can keep track of where in -the original document each Tag was found. You can access this +the original document each :py:class:`Tag` was found. You can access this information as ``Tag.sourceline`` (line number) and ``Tag.sourcepos`` (position of the start tag within a line):: @@ -2918,7 +2949,7 @@ Note that the two parsers mean slightly different things by ``sourceline`` and ``sourcepos``. For html.parser, these numbers represent the position of the initial less-than sign. 
For html5lib, these numbers represent the position of the final greater-than sign:: - + soup = BeautifulSoup(markup, 'html5lib') for tag in soup.find_all('p'): print(repr((tag.sourceline, tag.sourcepos, tag.string))) @@ -2932,7 +2963,7 @@ into the :py:class:`BeautifulSoup` constructor:: soup = BeautifulSoup(markup, 'html.parser', store_line_numbers=False) print(soup.p.sourceline) # None - + `This feature is new in 4.8.1, and the parsers based on lxml don't support it.` @@ -2940,10 +2971,10 @@ Comparing objects for equality ============================== Beautiful Soup says that two :py:class:`NavigableString` or :py:class:`Tag` objects -are equal when they represent the same HTML or XML markup. In this -example, the two <b> tags are treated as equal, even though they live -in different parts of the object tree, because they both look like -"<b>pizza</b>":: +are equal when they represent the same HTML or XML markup, even if their +attributes are in a different order or they live in different parts of the +object tree. In this example, the two <b> tags are treated as equal, because +they both look like "<b>pizza</b>":: markup = "<p>I want <b>pizza</b> and more <b>pizza</b>!</p>" soup = BeautifulSoup(markup, 'html.parser') @@ -3000,7 +3031,7 @@ used customization techniques. Parsing only part of a document ------------------------------- -Let's say you want to use Beautiful Soup look at a document's <a> +Let's say you want to use Beautiful Soup to look at a document's <a> tags. It's a waste of time and memory to parse the entire document and then go over it again looking for <a> tags. It would be much faster to ignore everything that wasn't an <a> tag in the first place. The @@ -3076,6 +3107,13 @@ three :py:class:`SoupStrainer` objects:: # ... # +The `:py:class:`SoupStrainer`` behavior is as follows: + +* When a tag matches, it is kept (including all its contents, whether they also + match or not). 
+* When a tag does not match, the tag itself is not kept, but parsing continues + into its contents to look for other tags that do match. + You can also pass a :py:class:`SoupStrainer` into any of the methods covered in `Searching the tree`_. This probably isn't terribly useful, but I thought I'd mention it:: @@ -3136,7 +3174,7 @@ The default behavior is to use the last value found for the tag:: soup = BeautifulSoup(markup, 'html.parser', on_duplicate_attribute='replace') soup.a['href'] # http://url2/ - + With ``on_duplicate_attribute='ignore'`` you can tell Beautiful Soup to use the `first` value found and ignore the rest:: @@ -3147,7 +3185,8 @@ to use the `first` value found and ignore the rest:: (lxml and html5lib always do it this way; their behavior can't be configured from within Beautiful Soup.) -If you need more, you can pass in a function that's called on each duplicate value:: +If you need more control, you can pass in a function that's called on each +duplicate value:: def accumulate(attributes_so_far, key, value): if not isinstance(attributes_so_far[key], list): @@ -3183,14 +3222,14 @@ tell Beautiful Soup to instantiate `subclasses` of :py:class:`Tag` or isinstance(soup.div, MyTag) # False isinstance(soup.div.string, MyString) - # False + # False my_classes = { Tag: MyTag, NavigableString: MyString } soup = BeautifulSoup(markup, 'html.parser', element_classes=my_classes) isinstance(soup.div, MyTag) # True isinstance(soup.div.string, MyString) - # True + # True This can be useful when incorporating Beautiful Soup into a test framework. @@ -3206,8 +3245,8 @@ Troubleshooting -------------- If you're having trouble understanding what Beautiful Soup does to a -document, pass the document into the ``diagnose()`` function. (New in -Beautiful Soup 4.2.0.) Beautiful Soup will print out a report showing +document, pass the document into the ``diagnose()`` function. (This function is new in +Beautiful Soup 4.2.0.) 
Beautiful Soup will print out a report showing you how different parsers handle the document, and tell you if you're missing a parser that Beautiful Soup could be using:: @@ -3226,7 +3265,7 @@ missing a parser that Beautiful Soup could be using:: # Here's what html.parser did with the document: # ... -Just looking at the output of diagnose() may show you how to solve the +Just looking at the output of diagnose() might show you how to solve the problem. Even if not, you can paste the output of ``diagnose()`` when asking for help. @@ -3235,12 +3274,12 @@ Errors when parsing a document There are two different kinds of parse errors. There are crashes, where you feed a document to Beautiful Soup and it raises an -exception, usually an ``HTMLParser.HTMLParseError``. And there is +exception (usually an ``HTMLParser.HTMLParseError``). And there is unexpected behavior, where a Beautiful Soup parse tree looks a lot different than the document used to create it. -Almost none of these problems turn out to be problems with Beautiful -Soup. This is not because Beautiful Soup is an amazingly well-written +These problems are almost never problems with Beautiful Soup itself. +This is not because Beautiful Soup is an amazingly well-written piece of software. It's because Beautiful Soup doesn't include any parsing code. Instead, it relies on external parsers. If one parser isn't working on a certain document, the best solution is to try a @@ -3257,7 +3296,7 @@ The most common type of unexpected behavior is that you can't find a tag that you know is in the document. You saw it going in, but ``find_all()`` returns ``[]`` or ``find()`` returns ``None``. This is another common problem with Python's built-in HTML parser, which -sometimes skips tags it doesn't understand. Again, the best solution is to +sometimes skips tags it doesn't understand. Again, the best solution is to :ref:`install lxml or html5lib. 
<parser-installation>` Version mismatch problems @@ -3326,7 +3365,7 @@ Miscellaneous your console doesn't know how to display. (See `this page on the Python wiki <http://wiki.python.org/moin/PrintFails>`_ for help.) Second, when you're writing to a file and you pass in a Unicode - character that's not supported by your default encoding. In this + character that's not supported by your default encoding. In this case, the simplest solution is to explicitly encode the Unicode string into UTF-8 with ``u.encode("utf8")``. @@ -3447,10 +3486,10 @@ becomes this:: is that you're trying to run Beautiful Soup 4 code, but you only have Beautiful Soup 3 installed. -Although BS4 is mostly backwards-compatible with BS3, most of its +Although BS4 is mostly backward-compatible with BS3, most of its methods have been deprecated and given new names for `PEP 8 compliance <http://www.python.org/dev/peps/pep-0008/>`_. There are numerous other -renames and changes, and a few of them break backwards compatibility. +renames and changes, and a few of them break backward compatibility. Here's what you'll need to know to convert your BS3 code and habits to BS4: @@ -3467,7 +3506,7 @@ may find that Beautiful Soup 4 gives you a different parse tree than Beautiful Soup 3 for the same markup. If you swap out ``html.parser`` for lxml or html5lib, you may find that the parse tree changes yet again. If this happens, you'll need to update your scraping code to -deal with the new tree. +process the new tree. Method names ^^^^^^^^^^^^ @@ -3507,14 +3546,14 @@ I renamed one attribute to use more accurate terminology: I renamed three attributes to avoid using words that have special meaning to Python. Unlike the others, these changes are *not backwards compatible.* If you used these attributes in BS3, your code will break -on BS4 until you change them. +in BS4 until you change them. 
* ``UnicodeDammit.unicode`` -> ``UnicodeDammit.unicode_markup`` * ``Tag.next`` -> ``Tag.next_element`` * ``Tag.previous`` -> ``Tag.previous_element`` These methods are left over from the Beautiful Soup 2 API. They've -been deprecated since 2006, and should not be used at all: +been deprecated since 2006 and should not be used at all: * ``Tag.fetchNextSiblings`` * ``Tag.fetchPreviousSiblings`` @@ -3602,7 +3641,7 @@ contains a single tag B and nothing else, then A.string is the same as B.string. (Previously, it was None.) `Multi-valued attributes`_ like ``class`` have lists of strings as -their values, not strings. This may affect the way you search by CSS +their values, not simple strings. This may affect the way you search by CSS class. :py:class:`Tag` objects now implement the ``__hash__`` method, such that two @@ -3613,8 +3652,8 @@ objects into a dictionary or set. If you pass one of the ``find*`` methods both :ref:`string <string>` `and` a tag-specific argument like :ref:`name <name>`, Beautiful Soup will search for tags that match your tag-specific criteria and whose -:ref:`Tag.string <.string>` matches your value for :ref:`string -<string>`. It will `not` find the strings themselves. Previously, +:ref:`Tag.string <.string>` matches your :ref:`string <string>` +value. It will `not` find the strings themselves. Previously, Beautiful Soup ignored the tag-specific arguments and looked for strings. diff --git a/pyproject.toml b/pyproject.toml index 31b0b7f..b4c2da9 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -7,7 +7,7 @@ name = "beautifulsoup4" dynamic = ["version"] description = "Screen-scraping library" readme = "README.md" -license = "MIT" +license = { text = "MIT License" } requires-python = ">=3.6.0" authors = [ { name = "Leonard Richardson", email = "leonardr@segfault.org" }, |
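The last two ``index.rst`` hunks above describe multi-valued attributes and the effect of combining ``string`` with a tag-specific argument. A minimal sketch of both behaviors, assuming ``bs4`` is installed and using made-up markup:

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Hypothetical markup, chosen only for illustration.
markup = '<p class="body strikeout">Hi <b>there</b></p><i>there</i>'
soup = BeautifulSoup(markup, "html.parser")

# A multi-valued attribute such as class is a list of strings.
print(soup.p["class"])
# ['body', 'strikeout']

# string plus a tag name: returns tags whose .string matches.
tags = soup.find_all("b", string="there")
print(tags)
# [<b>there</b>]

# string alone: returns the matching strings themselves.
strings = soup.find_all(string="there")
print(strings)
# ['there', 'there']
```

Code ported from BS3 that compares ``tag["class"]`` against a plain string would need to test list membership instead.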