author | Leonard Richardson <leonardr@segfault.org> | 2020-07-29 22:43:48 -0400
committer | Leonard Richardson <leonardr@segfault.org> | 2020-07-29 22:43:48 -0400
commit | bd479f6ba3ed9db76d26cf36f12f1e9744f85ce4 (patch)
tree | 3eaea193cfff6a82ce28eb30f9db2bd47127b003 /doc
parent | 89bbbf3626a783cc15484cedbb4c5a663d95e824 (diff)
Ran through all of the documentation code examples using Python 3, corrected discrepancies and errors, and updated representations.
Diffstat (limited to 'doc')
-rw-r--r-- | doc/source/index.rst | 931
1 file changed, 458 insertions, 473 deletions
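The dominant pattern in the diff below is the addition of an explicit parser
argument ('html.parser') to nearly every ``BeautifulSoup()`` call in the
examples. A minimal sketch of the motivation, not part of the commit itself,
assuming a bs4 4.9.x install where ``GuessedAtParserWarning`` is exported
from the ``bs4`` package::

 import warnings
 from bs4 import BeautifulSoup, GuessedAtParserWarning

 markup = "<p>Hello</p>"

 # With no parser named, Beautiful Soup picks the "best" parser installed
 # on this machine and warns that the choice (and therefore the exact
 # output) can vary from system to system.
 with warnings.catch_warnings(record=True) as caught:
     warnings.simplefilter("always")
     soup = BeautifulSoup(markup)
 assert any(w.category is GuessedAtParserWarning for w in caught)

 # Naming the parser makes every documentation example reproducible.
 soup = BeautifulSoup(markup, "html.parser")
 print(soup.p.string)
 # Hello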
diff --git a/doc/source/index.rst b/doc/source/index.rst
index f655327..76a32e9 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -54,8 +54,7 @@ Quick Start
 Here's an HTML document I'll be using as an example throughout this
 document. It's part of a story from `Alice in Wonderland`::
 
- html_doc = """
- <html><head><title>The Dormouse's story</title></head>
+ html_doc = """<html><head><title>The Dormouse's story</title></head>
 <body>
 <p class="title"><b>The Dormouse's story</b></p>
 
@@ -186,7 +185,7 @@ works on Python 2 and Python 3. Make sure you use the right version of
 :kbd:`$ pip install beautifulsoup4`
 
-(The ``BeautifulSoup`` package is probably `not` what you want. That's
+(The ``BeautifulSoup`` package is `not` what you want. That's
 the previous major release, `Beautiful Soup 3`_. Lots of software uses
 BS3, so it's still available, but if you're writing new code you
 should install ``beautifulsoup4``.)
 
@@ -307,14 +306,14 @@ constructor. You can pass in a string or an open filehandle::
 from bs4 import BeautifulSoup
 
 with open("index.html") as fp:
-    soup = BeautifulSoup(fp)
+    soup = BeautifulSoup(fp, 'html.parser')
 
-soup = BeautifulSoup("<html>a web page</html>")
+soup = BeautifulSoup("<html>a web page</html>", 'html.parser')
 
 First, the document is converted to Unicode, and HTML entities are
 converted to Unicode characters::
 
-print(BeautifulSoup("<html><head></head><body>Sacré bleu!</body></html>"))
+print(BeautifulSoup("<html><head></head><body>Sacré bleu!</body></html>", "html.parser"))
 # <html><head></head><body>Sacré bleu!</body></html>
 
 Beautiful Soup then parses the document using the best available
@@ -336,7 +335,7 @@ and ``Comment``.
 A ``Tag`` object corresponds to an XML or HTML tag in the original
 document::
 
- soup = BeautifulSoup('<b class="boldest">Extremely bold</b>')
+ soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'html.parser')
 tag = soup.b
 type(tag)
 # <class 'bs4.element.Tag'>
 
@@ -351,7 +350,7 @@ Name
 Every tag has a name, accessible as ``.name``::
 
 tag.name
- # u'b'
+ # 'b'
 
 If you change a tag's name, the change will be reflected in any HTML
 markup generated by Beautiful Soup::
 
@@ -368,13 +367,14 @@ id="boldest">`` has an attribute "id" whose value is "boldest". You
 can access a tag's attributes by treating the tag like a
 dictionary::
 
+ tag = BeautifulSoup('<b id="boldest">bold</b>', 'html.parser').b
 tag['id']
- # u'boldest'
+ # 'boldest'
 
 You can access that dictionary directly as ``.attrs``::
 
 tag.attrs
- # {u'id': 'boldest'}
+ # {'id': 'boldest'}
 
 You can add, remove, and modify a tag's attributes. Again, this is
 done by treating the tag as a dictionary::
 
@@ -387,11 +387,11 @@ done by treating the tag as a dictionary::
 del tag['id']
 del tag['another-attribute']
 tag
- # <b></b>
+ # <b>bold</b>
 
 tag['id']
 # KeyError: 'id'
- print(tag.get('id'))
+ tag.get('id')
 # None
 
 .. _multivalue:
 
@@ -406,26 +406,26 @@ one CSS class). Others include ``rel``, ``rev``, ``accept-charset``,
 ``headers``, and ``accesskey``.
 Beautiful Soup presents the value(s) of a multi-valued attribute as
 a list::
 
- css_soup = BeautifulSoup('<p class="body"></p>')
+ css_soup = BeautifulSoup('<p class="body"></p>', 'html.parser')
 css_soup.p['class']
- # ["body"]
+ # ['body']
 
- css_soup = BeautifulSoup('<p class="body strikeout"></p>')
+ css_soup = BeautifulSoup('<p class="body strikeout"></p>', 'html.parser')
 css_soup.p['class']
- # ["body", "strikeout"]
+ # ['body', 'strikeout']
 
 If an attribute `looks` like it has more than one value, but it's not
 a multi-valued attribute as defined by any version of the HTML
 standard, Beautiful Soup will leave the attribute alone::
 
- id_soup = BeautifulSoup('<p id="my id"></p>')
+ id_soup = BeautifulSoup('<p id="my id"></p>', 'html.parser')
 id_soup.p['id']
 # 'my id'
 
 When you turn a tag back into a string, multiple attribute values are
 consolidated::
 
- rel_soup = BeautifulSoup('<p>Back to the <a rel="index">homepage</a></p>')
+ rel_soup = BeautifulSoup('<p>Back to the <a rel="index">homepage</a></p>', 'html.parser')
 rel_soup.a['rel']
 # ['index']
 rel_soup.a['rel'] = ['index', 'contents']
 
@@ -435,34 +435,34 @@ consolidated::
 You can disable this by passing ``multi_valued_attributes=None`` as a
 keyword argument into the ``BeautifulSoup`` constructor::
 
- no_list_soup = BeautifulSoup('<p class="body strikeout"></p>', 'html', multi_valued_attributes=None)
- no_list_soup.p['class']
- # u'body strikeout'
+ no_list_soup = BeautifulSoup('<p class="body strikeout"></p>', 'html.parser', multi_valued_attributes=None)
+ no_list_soup.p['class']
+ # 'body strikeout'
 
 You can use ```get_attribute_list`` to get a value that's always a
 list, whether or not it's a multi-valued atribute::
 
- id_soup.p.get_attribute_list('id')
- # ["my id"]
+ id_soup.p.get_attribute_list('id')
+ # ["my id"]
 
 If you parse a document as XML, there are no multi-valued attributes::
 
 xml_soup = BeautifulSoup('<p class="body strikeout"></p>', 'xml')
 xml_soup.p['class']
- # u'body strikeout'
+ # 'body strikeout'
 
 Again, you can configure this using the ``multi_valued_attributes``
 argument::
 
- class_is_multi= { '*' : 'class'}
- xml_soup = BeautifulSoup('<p class="body strikeout"></p>', 'xml', multi_valued_attributes=class_is_multi)
- xml_soup.p['class']
- # [u'body', u'strikeout']
+ class_is_multi= { '*' : 'class'}
+ xml_soup = BeautifulSoup('<p class="body strikeout"></p>', 'xml', multi_valued_attributes=class_is_multi)
+ xml_soup.p['class']
+ # ['body', 'strikeout']
 
 You probably won't need to do this, but if you do, use the defaults as
 a guide. They implement the rules described in the HTML specification::
 
- from bs4.builder import builder_registry
- builder_registry.lookup('html').DEFAULT_CDATA_LIST_ATTRIBUTES
+ from bs4.builder import builder_registry
+ builder_registry.lookup('html').DEFAULT_CDATA_LIST_ATTRIBUTES
 
 
 ``NavigableString``
@@ -471,28 +471,31 @@ a guide. They implement the rules described in the HTML specification::
 
 A string corresponds to a bit of text within a tag. Beautiful Soup
 uses the ``NavigableString`` class to contain these bits of text::
 
+ soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'html.parser')
+ tag = soup.b
 tag.string
- # u'Extremely bold'
+ # 'Extremely bold'
 type(tag.string)
 # <class 'bs4.element.NavigableString'>
 
 A ``NavigableString`` is just like a Python Unicode string, except
 that it also supports some of the features described in `Navigating
 the tree`_ and `Searching the tree`_.
 You can convert a
-``NavigableString`` to a Unicode string with ``unicode()``::
+``NavigableString`` to a Unicode string with ``unicode()`` (in
+Python 2) or ``str`` (in Python 3)::
 
- unicode_string = unicode(tag.string)
+ unicode_string = str(tag.string)
 unicode_string
- # u'Extremely bold'
+ # 'Extremely bold'
 type(unicode_string)
- # <type 'unicode'>
+ # <type 'str'>
 
 You can't edit a string in place, but you can replace one string with
 another, using :ref:`replace_with()`::
 
 tag.string.replace_with("No longer bold")
 tag
- # <blockquote>No longer bold</blockquote>
+ # <b class="boldest">No longer bold</b>
 
 ``NavigableString`` supports most of the features described in
 `Navigating the tree`_ and `Searching the tree`_, but not all of
 
@@ -518,13 +521,13 @@ You can also pass a ``BeautifulSoup`` object into one of the methods
 defined in `Modifying the tree`_, just as you would a :ref:`Tag`. This
 lets you do things like combine two parsed documents::
 
- doc = BeautifulSoup("<document><content/>INSERT FOOTER HERE</document", "xml")
- footer = BeautifulSoup("<footer>Here's the footer</footer>", "xml")
- doc.find(text="INSERT FOOTER HERE").replace_with(footer)
- # u'INSERT FOOTER HERE'
- print(doc)
- # <?xml version="1.0" encoding="utf-8"?>
- # <document><content/><footer>Here's the footer</footer></document>
+ doc = BeautifulSoup("<document><content/>INSERT FOOTER HERE</document", "xml")
+ footer = BeautifulSoup("<footer>Here's the footer</footer>", "xml")
+ doc.find(text="INSERT FOOTER HERE").replace_with(footer)
+ # 'INSERT FOOTER HERE'
+ print(doc)
+ # <?xml version="1.0" encoding="utf-8"?>
+ # <document><content/><footer>Here's the footer</footer></document>
 
 Since the ``BeautifulSoup`` object doesn't correspond to an actual
 HTML or XML tag, it has no name and no attributes. But sometimes it's
@@ -532,7 +535,7 @@ useful to look at its ``.name``, so it's been given the special
 ``.name`` "[document]"::
 
 soup.name
- # u'[document]'
+ # '[document]'
 
 Comments and other special strings
 ----------------------------------
 
@@ -543,7 +546,7 @@ leftover bits. The main one you'll probably encounter
 is the comment::
 
 markup = "<b><!--Hey, buddy. Want to buy a used parser?--></b>"
- soup = BeautifulSoup(markup)
+ soup = BeautifulSoup(markup, 'html.parser')
 comment = soup.b.string
 type(comment)
 # <class 'bs4.element.Comment'>
 
@@ -551,7 +554,7 @@ is the comment::
 The ``Comment`` object is just a special type of ``NavigableString``::
 
 comment
- # u'Hey, buddy. Want to buy a used parser'
+ # 'Hey, buddy. Want to buy a used parser'
 
 But when it appears as part of an HTML document, a ``Comment`` is
 displayed with special formatting::
 
@@ -666,13 +669,13 @@ A tag's children are available in a list called ``.contents``::
 
 # <head><title>The Dormouse's story</title></head>
 
 head_tag.contents
- [<title>The Dormouse's story</title>]
+ # [<title>The Dormouse's story</title>]
 
 title_tag = head_tag.contents[0]
 title_tag
 # <title>The Dormouse's story</title>
 title_tag.contents
- # [u'The Dormouse's story']
+ # ['The Dormouse's story']
 
 The ``BeautifulSoup`` object itself has children. In this case, the
 <html> tag is the child of the ``BeautifulSoup`` object.::
 
@@ -680,7 +683,7 @@ The ``BeautifulSoup`` object itself has children. In this case, the
 len(soup.contents)
 # 1
 soup.contents[0].name
- # u'html'
+ # 'html'
 
 A string does not have ``.contents``, because it can't contain
 anything::
 
@@ -725,7 +728,7 @@ descendants::
 
 len(list(soup.children))
 # 1
 len(list(soup.descendants))
- # 25
+ # 26
 .. _.string:
 
@@ -736,7 +739,7 @@ If a tag has only one child, and that child is a ``NavigableString``,
 the child is made available as ``.string``::
 
 title_tag.string
- # u'The Dormouse's story'
+ # 'The Dormouse's story'
 
 If a tag's only child is another tag, and `that` tag has a
 ``.string``, then the parent tag is considered to have the same
 
@@ -746,7 +749,7 @@ If a tag's only child is another tag, and `that` tag has a
 # [<title>The Dormouse's story</title>]
 
 head_tag.string
- # u'The Dormouse's story'
+ # 'The Dormouse's story'
 
 If a tag contains more than one thing, then it's not clear what
 ``.string`` should refer to, so ``.string`` is defined to be
 
@@ -765,36 +768,38 @@ just the strings. Use the ``.strings`` generator::
 
 for string in soup.strings:
     print(repr(string))
- # u"The Dormouse's story"
- # u'\n\n'
- # u"The Dormouse's story"
- # u'\n\n'
- # u'Once upon a time there were three little sisters; and their names were\n'
- # u'Elsie'
- # u',\n'
- # u'Lacie'
- # u' and\n'
- # u'Tillie'
- # u';\nand they lived at the bottom of a well.'
- # u'\n\n'
- # u'...'
- # u'\n'
+ '\n'
+ # "The Dormouse's story"
+ # '\n'
+ # '\n'
+ # "The Dormouse's story"
+ # '\n'
+ # 'Once upon a time there were three little sisters; and their names were\n'
+ # 'Elsie'
+ # ',\n'
+ # 'Lacie'
+ # ' and\n'
+ # 'Tillie'
+ # ';\nand they lived at the bottom of a well.'
+ # '\n'
+ # '...'
+ # '\n'
 
 These strings tend to have a lot of extra whitespace, which you can
 remove by using the ``.stripped_strings`` generator instead::
 
 for string in soup.stripped_strings:
     print(repr(string))
- # u"The Dormouse's story"
- # u"The Dormouse's story"
- # u'Once upon a time there were three little sisters; and their names were'
- # u'Elsie'
- # u','
- # u'Lacie'
- # u'and'
- # u'Tillie'
- # u';\nand they lived at the bottom of a well.'
- # u'...'
+ # "The Dormouse's story"
+ # "The Dormouse's story"
+ # 'Once upon a time there were three little sisters; and their names were'
+ # 'Elsie'
+ # ','
+ # 'Lacie'
+ # 'and'
+ # 'Tillie'
+ # ';\n and they lived at the bottom of a well.'
+ # '...'
 
 Here, strings consisting entirely of whitespace are ignored, and
 whitespace at the beginning and end of strings is removed.
 
@@ -851,25 +856,19 @@ buried deep within the document, to the very top of the document::
 
 link
 # <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
 for parent in link.parents:
-    if parent is None:
-        print(parent)
-    else:
-        print(parent.name)
+    print(parent.name)
 # p
 # body
 # html
 # [document]
- # None
 
 Going sideways
 --------------
 
 Consider a simple document like this::
 
- sibling_soup = BeautifulSoup("<a><b>text1</b><c>text2</c></b></a>")
+ sibling_soup = BeautifulSoup("<a><b>text1</b><c>text2</c></b></a>", 'html.parser')
 print(sibling_soup.prettify())
- # <html>
- # <body>
 # <a>
 # <b>
 # text1
@@ -878,8 +877,6 @@ Consider a simple document like this::
 # text2
 # </c>
 # </a>
- # </body>
- # </html>
 
 The <b> tag and the <c> tag are at the same level: they're both direct
 children of the same tag. We call them `siblings`. When a document is
 
@@ -912,7 +909,7 @@ The strings "text1" and "text2" are `not` siblings, because they don't
 have the same parent::
 
 sibling_soup.b.string
- # u'text1'
+ # 'text1'
 
 print(sibling_soup.b.string.next_sibling)
 # None
 
 In real documents, the ``.next_sibling`` or ``.previous_sibling`` of a
 tag will usually be a string containing whitespace.
 Going back to the "three sisters" document::
 
- <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
- <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
- <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
+ # <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
+ # <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
+ # <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
 
 You might think that the ``.next_sibling`` of the first <a> tag would
 be the second <a> tag. But actually, it's a string: the comma and
 newline that separate the first <a> tag from the second::
 
@@ -934,7 +931,7 @@ newline that separate the first <a> tag from the second::
 
 # <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
 
 link.next_sibling
- # u',\n'
+ # ',\n '
 
 The second <a> tag is actually the ``.next_sibling`` of the comma::
 
@@ -951,29 +948,27 @@ You can iterate over a tag's siblings with ``.next_siblings`` or
 
 for sibling in soup.a.next_siblings:
     print(repr(sibling))
- # u',\n'
+ # ',\n'
 # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
- # u' and\n'
+ # ' and\n'
 # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
- # u'; and they lived at the bottom of a well.'
- # None
+ # '; and they lived at the bottom of a well.'
 
 for sibling in soup.find(id="link3").previous_siblings:
     print(repr(sibling))
 # ' and\n'
 # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
- # u',\n'
+ # ',\n'
 # <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
- # u'Once upon a time there were three little sisters; and their names were\n'
- # None
+ # 'Once upon a time there were three little sisters; and their names were\n'
 
 Going back and forth
 --------------------
 
 Take a look at the beginning of the "three sisters" document::
 
- <html><head><title>The Dormouse's story</title></head>
- <p class="title"><b>The Dormouse's story</b></p>
+ # <html><head><title>The Dormouse's story</title></head>
+ # <p class="title"><b>The Dormouse's story</b></p>
 
 An HTML parser takes this string of characters and turns it into a
 series of events: "open an <html> tag", "open a <head> tag", "open a
 
@@ -999,14 +994,14 @@ interrupted by the start of the <a> tag.::
 
 # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
 
 last_a_tag.next_sibling
- # '; and they lived at the bottom of a well.'
+ # ';\nand they lived at the bottom of a well.'
 
 But the ``.next_element`` of that <a> tag, the thing that was parsed
 immediately after the <a> tag, is `not` the rest of that sentence:
 it's the word "Tillie"::
 
 last_a_tag.next_element
- # u'Tillie'
+ # 'Tillie'
 
 That's because in the original markup, the word "Tillie" appeared
 before that semicolon. The parser encountered an <a> tag, then the
 
@@ -1019,7 +1014,7 @@ The ``.previous_element`` attribute is the exact opposite of
 immediately before this one::
 
 last_a_tag.previous_element
- # u' and\n'
+ # ' and\n'
 
 last_a_tag.previous_element.next_element
 # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
 
@@ -1031,13 +1026,12 @@ forward or backward in the document as it was parsed::
 
 for element in last_a_tag.next_elements:
     print(repr(element))
- # u'Tillie'
- # u';\nand they lived at the bottom of a well.'
- # u'\n\n'
+ # 'Tillie'
+ # ';\nand they lived at the bottom of a well.'
+ # '\n'
 # <p class="story">...</p>
- # u'...'
- # u'\n'
- # None
+ # '...'
+ # '\n'
 
 Searching the tree
 ==================
 
@@ -1188,8 +1182,10 @@ If you pass in a function to filter on a specific attribute like
 value, not the whole tag. Here's a function that finds all ``a`` tags
 whose ``href`` attribute *does not* match a regular expression::
 
+ import re
 def not_lacie(href):
     return href and not re.compile("lacie").search(href)
+
 soup.find_all(href=not_lacie)
 # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
 #  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
 
@@ -1204,7 +1200,8 @@ objects::
 
     and isinstance(tag.previous_element, NavigableString))
 
 for tag in soup.find_all(surrounded_by_strings):
-    print tag.name
+    print(tag.name)
+ # body
 # p
 # a
 # a
 
@@ -1216,7 +1213,7 @@ Now we're ready to look at the search methods in detail.
 
 ``find_all()``
 --------------
 
-Signature: find_all(:ref:`name <name>`, :ref:`attrs <attrs>`, :ref:`recursive
+Method signature: find_all(:ref:`name <name>`, :ref:`attrs <attrs>`, :ref:`recursive
 <recursive>`, :ref:`string <string>`, :ref:`limit <limit>`, :ref:`**kwargs <kwargs>`)
 
 The ``find_all()`` method looks through a tag's descendants and
 
@@ -1239,7 +1236,7 @@ examples in `Kinds of filters`_, but here are a few more::
 
 import re
 soup.find(string=re.compile("sisters"))
- # u'Once upon a time there were three little sisters; and their names were\n'
+ # 'Once upon a time there were three little sisters; and their names were\n'
 
 Some of these should look familiar, but others are new. What does it
 mean to pass in a value for ``string``, or ``id``? Why does
 
@@ -1297,12 +1294,12 @@ You can filter multiple attributes at once by passing in more than one
 keyword argument::
 
 soup.find_all(href=re.compile("elsie"), id='link1')
- # [<a class="sister" href="http://example.com/elsie" id="link1">three</a>]
+ # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]
 
 Some attributes, like the data-* attributes in HTML 5, have names that
 can't be used as the names of keyword arguments::
 
- data_soup = BeautifulSoup('<div data-foo="value">foo!</div>')
+ data_soup = BeautifulSoup('<div data-foo="value">foo!</div>', 'html.parser')
 data_soup.find_all(data-foo="value")
 # SyntaxError: keyword can't be an expression
 
@@ -1318,7 +1315,7 @@ because Beautiful Soup uses the ``name`` argument to contain the name
 of the tag itself. Instead, you can give a value to 'name' in the
 ``attrs`` argument::
 
- name_soup = BeautifulSoup('<input name="email"/>')
+ name_soup = BeautifulSoup('<input name="email"/>', 'html.parser')
 name_soup.find_all(name="email")
 # []
 name_soup.find_all(attrs={"name": "email"})
 
@@ -1359,7 +1356,7 @@ values for its "class" attribute. When you search for a tag that
 matches a certain CSS class, you're matching against `any` of its CSS
 classes::
 
- css_soup = BeautifulSoup('<p class="body strikeout"></p>')
+ css_soup = BeautifulSoup('<p class="body strikeout"></p>', 'html.parser')
 css_soup.find_all("p", class_="strikeout")
 # [<p class="body strikeout"></p>]
 
@@ -1403,20 +1400,20 @@ regular expression`_, `a list`_, `a function`_, or `the value True`_.
 Here are some examples::
 
 soup.find_all(string="Elsie")
- # [u'Elsie']
+ # ['Elsie']
 
 soup.find_all(string=["Tillie", "Elsie", "Lacie"])
- # [u'Elsie', u'Lacie', u'Tillie']
+ # ['Elsie', 'Lacie', 'Tillie']
 
 soup.find_all(string=re.compile("Dormouse"))
- [u"The Dormouse's story", u"The Dormouse's story"]
+ # ["The Dormouse's story", "The Dormouse's story"]
 
 def is_the_only_string_within_a_tag(s):
     """Return True if this string is the only child of its parent tag."""
     return (s == s.parent.string)
 
 soup.find_all(string=is_the_only_string_within_a_tag)
- # [u"The Dormouse's story", u"The Dormouse's story", u'Elsie', u'Lacie', u'Tillie', u'...']
+ # ["The Dormouse's story", "The Dormouse's story", 'Elsie', 'Lacie', 'Tillie', '...']
 
 Although ``string`` is for finding strings, you can combine it with
 arguments that find tags: Beautiful Soup will find all tags whose
 
@@ -1509,7 +1506,7 @@ These two lines are also equivalent::
 
 ``find()``
 ----------
 
-Signature: find(:ref:`name <name>`, :ref:`attrs <attrs>`, :ref:`recursive
+Method signature: find(:ref:`name <name>`, :ref:`attrs <attrs>`, :ref:`recursive
 <recursive>`, :ref:`string <string>`, :ref:`**kwargs <kwargs>`)
 
 The ``find_all()`` method scans the entire document looking for
 
@@ -1546,9 +1543,9 @@ names`_? That trick works by repeatedly calling ``find()``::
 
 ``find_parents()`` and ``find_parent()``
 ----------------------------------------
 
-Signature: find_parents(:ref:`name <name>`, :ref:`attrs <attrs>`, :ref:`string <string>`, :ref:`limit <limit>`, :ref:`**kwargs <kwargs>`)
+Method signature: find_parents(:ref:`name <name>`, :ref:`attrs <attrs>`, :ref:`string <string>`, :ref:`limit <limit>`, :ref:`**kwargs <kwargs>`)
 
-Signature: find_parent(:ref:`name <name>`, :ref:`attrs <attrs>`, :ref:`string <string>`, :ref:`**kwargs <kwargs>`)
+Method signature: find_parent(:ref:`name <name>`, :ref:`attrs <attrs>`, :ref:`string <string>`, :ref:`**kwargs <kwargs>`)
 
 I spent a lot of time above covering ``find_all()`` and
 ``find()``. The Beautiful Soup API defines ten other methods for
 
@@ -1564,22 +1561,22 @@ do the opposite: they work their way `up` the tree, looking at a tag's
 (or a string's) parents.
 Let's try them out, starting from a string buried deep in the "three
 daughters" document::
 
- a_string = soup.find(string="Lacie")
- a_string
- # u'Lacie'
+ a_string = soup.find(string="Lacie")
+ a_string
+ # 'Lacie'
 
- a_string.find_parents("a")
- # [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
+ a_string.find_parents("a")
+ # [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
 
- a_string.find_parent("p")
- # <p class="story">Once upon a time there were three little sisters; and their names were
- # <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
- # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
- # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
- # and they lived at the bottom of a well.</p>
+ a_string.find_parent("p")
+ # <p class="story">Once upon a time there were three little sisters; and their names were
+ # <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
+ # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
+ # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
+ # and they lived at the bottom of a well.</p>
 
- a_string.find_parents("p", class="title")
- # []
+ a_string.find_parents("p", class_="title")
+ # []
 
 One of the three <a> tags is the direct parent of the string in
 question, so our search finds it. One of the three <p> tags is an
 
@@ -1597,9 +1594,9 @@ each one against the provided filter to see if it matches.
 
 ``find_next_siblings()`` and ``find_next_sibling()``
 ----------------------------------------------------
 
-Signature: find_next_siblings(:ref:`name <name>`, :ref:`attrs <attrs>`, :ref:`string <string>`, :ref:`limit <limit>`, :ref:`**kwargs <kwargs>`)
+Method signature: find_next_siblings(:ref:`name <name>`, :ref:`attrs <attrs>`, :ref:`string <string>`, :ref:`limit <limit>`, :ref:`**kwargs <kwargs>`)
 
-Signature: find_next_sibling(:ref:`name <name>`, :ref:`attrs <attrs>`, :ref:`string <string>`, :ref:`**kwargs <kwargs>`)
+Method signature: find_next_sibling(:ref:`name <name>`, :ref:`attrs <attrs>`, :ref:`string <string>`, :ref:`**kwargs <kwargs>`)
 
 These methods use :ref:`.next_siblings <sibling-generators>` to
 iterate over the rest of an element's siblings in the tree. The
 
@@ -1621,9 +1618,9 @@ and ``find_next_sibling()`` only returns the first one::
 
 ``find_previous_siblings()`` and ``find_previous_sibling()``
 ------------------------------------------------------------
 
-Signature: find_previous_siblings(:ref:`name <name>`, :ref:`attrs <attrs>`, :ref:`string <string>`, :ref:`limit <limit>`, :ref:`**kwargs <kwargs>`)
+Method signature: find_previous_siblings(:ref:`name <name>`, :ref:`attrs <attrs>`, :ref:`string <string>`, :ref:`limit <limit>`, :ref:`**kwargs <kwargs>`)
 
-Signature: find_previous_sibling(:ref:`name <name>`, :ref:`attrs <attrs>`, :ref:`string <string>`, :ref:`**kwargs <kwargs>`)
+Method signature: find_previous_sibling(:ref:`name <name>`, :ref:`attrs <attrs>`, :ref:`string <string>`, :ref:`**kwargs <kwargs>`)
 
 These methods use :ref:`.previous_siblings <sibling-generators>` to
 iterate over an element's siblings that precede it in the tree.
 The ``find_previous_siblings()``
@@ -1646,9 +1643,9 @@ method returns all the siblings that match, and
 
 ``find_all_next()`` and ``find_next()``
 ---------------------------------------
 
-Signature: find_all_next(:ref:`name <name>`, :ref:`attrs <attrs>`, :ref:`string <string>`, :ref:`limit <limit>`, :ref:`**kwargs <kwargs>`)
+Method signature: find_all_next(:ref:`name <name>`, :ref:`attrs <attrs>`, :ref:`string <string>`, :ref:`limit <limit>`, :ref:`**kwargs <kwargs>`)
 
-Signature: find_next(:ref:`name <name>`, :ref:`attrs <attrs>`, :ref:`string <string>`, :ref:`**kwargs <kwargs>`)
+Method signature: find_next(:ref:`name <name>`, :ref:`attrs <attrs>`, :ref:`string <string>`, :ref:`**kwargs <kwargs>`)
 
 These methods use :ref:`.next_elements <element-generators>` to
 iterate over whatever tags and strings that come after it in the
 
@@ -1660,8 +1657,8 @@ document. The ``find_all_next()`` method returns all matches, and
 
 # <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
 
 first_link.find_all_next(string=True)
- # [u'Elsie', u',\n', u'Lacie', u' and\n', u'Tillie',
- #  u';\nand they lived at the bottom of a well.', u'\n\n', u'...', u'\n']
+ # ['Elsie', ',\n', 'Lacie', ' and\n', 'Tillie',
+ #  ';\nand they lived at the bottom of a well.', '\n', '...', '\n']
 
 first_link.find_next("p")
 # <p class="story">...</p>
 
@@ -1676,9 +1673,9 @@ show up later in the document than the starting element.
 
 ``find_all_previous()`` and ``find_previous()``
 -----------------------------------------------
 
-Signature: find_all_previous(:ref:`name <name>`, :ref:`attrs <attrs>`, :ref:`string <string>`, :ref:`limit <limit>`, :ref:`**kwargs <kwargs>`)
+Method signature: find_all_previous(:ref:`name <name>`, :ref:`attrs <attrs>`, :ref:`string <string>`, :ref:`limit <limit>`, :ref:`**kwargs <kwargs>`)
 
-Signature: find_previous(:ref:`name <name>`, :ref:`attrs <attrs>`, :ref:`string <string>`, :ref:`**kwargs <kwargs>`)
+Method signature: find_previous(:ref:`name <name>`, :ref:`attrs <attrs>`, :ref:`string <string>`, :ref:`**kwargs <kwargs>`)
 
 These methods use :ref:`.previous_elements <element-generators>` to
 iterate over the tags and strings that came before it in the
 
@@ -1837,9 +1834,9 @@ selectors.::
 
 soup.select("child")
 # [<ns1:child>I'm in namespace 1</ns1:child>, <ns2:child>I'm in namespace 2</ns2:child>]
 
- soup.select("ns1|child", namespaces=namespaces)
+ soup.select("ns1|child", namespaces=soup.namespaces)
 # [<ns1:child>I'm in namespace 1</ns1:child>]
-
+
 When handling a CSS selector that uses namespaces, Beautiful Soup
 uses the namespace abbreviations it found when parsing the
 document. You can override this by passing in your own dictionary of
 
@@ -1869,7 +1866,7 @@ I covered this earlier, in `Attributes`_, but it bears repeating. You
 can rename a tag, change the values of its attributes, add new
 attributes, and delete attributes::
 
- soup = BeautifulSoup('<b class="boldest">Extremely bold</b>')
+ soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'html.parser')
 tag = soup.b
 
 tag.name = "blockquote"
 
@@ -1889,13 +1886,13 @@ Modifying ``.string``
 
 If you set a tag's ``.string`` attribute to a new string, the tag's
 contents are replaced with that string::
 
- markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
- soup = BeautifulSoup(markup)
+ markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
+ soup = BeautifulSoup(markup, 'html.parser')
 
- tag = soup.a
- tag.string = "New link text."
- tag
- # <a href="http://example.com/">New link text.</a>
+ tag = soup.a
+ tag.string = "New link text."
+ tag
+ # <a href="http://example.com/">New link text.</a>
 
 Be careful: if the tag contained other tags, they and all their
 contents will be destroyed.
 
@@ -1906,13 +1903,13 @@ contents will be destroyed.
 
 You can add to a tag's contents with ``Tag.append()``. It works just
 like calling ``.append()`` on a Python list::
 
- soup = BeautifulSoup("<a>Foo</a>")
- soup.a.append("Bar")
+ soup = BeautifulSoup("<a>Foo</a>", 'html.parser')
+ soup.a.append("Bar")
 
- soup
- # <html><head></head><body><a>FooBar</a></body></html>
- soup.a.contents
- # [u'Foo', u'Bar']
+ soup
+ # <a>FooBar</a>
+ soup.a.contents
+ # ['Foo', 'Bar']
 
 ``extend()``
 ------------
 
 Starting in Beautiful Soup 4.7.0, ``Tag`` also supports a method
 called ``.extend()``, which works just like calling ``.extend()`` on a
 Python list::
 
- soup = BeautifulSoup("<a>Soup</a>")
- soup.a.extend(["'s", " ", "on"])
+ soup = BeautifulSoup("<a>Soup</a>", 'html.parser')
+ soup.a.extend(["'s", " ", "on"])
 
- soup
- # <html><head></head><body><a>Soup's on</a></body></html>
- soup.a.contents
- # [u'Soup', u''s', u' ', u'on']
+ soup
+ # <a>Soup's on</a>
+ soup.a.contents
+ # ['Soup', ''s', ' ', 'on']
 
 ``NavigableString()`` and ``.new_tag()``
 -------------------------------------------------
 
 If you need to add a string to a document, no problem--you can pass a
 Python string in to ``append()``, or you can call the
 ``NavigableString`` constructor::
 
- soup = BeautifulSoup("<b></b>")
- tag = soup.b
- tag.append("Hello")
- new_string = NavigableString(" there")
- tag.append(new_string)
- tag
- # <b>Hello there.</b>
- tag.contents
- # [u'Hello', u' there']
+ soup = BeautifulSoup("<b></b>", 'html.parser')
+ tag = soup.b
+ tag.append("Hello")
+ new_string = NavigableString(" there")
+ tag.append(new_string)
+ tag
+ # <b>Hello there.</b>
+ tag.contents
+ # ['Hello', ' there']
 
 If you want to create a comment or some other subclass of
 ``NavigableString``, just call the constructor::
 
- from bs4 import Comment
- new_comment = Comment("Nice to see you.")
- tag.append(new_comment)
- tag
- # <b>Hello there<!--Nice to see you.--></b>
- tag.contents
- # [u'Hello', u' there', u'Nice to see you.']
+ from bs4 import Comment
+ new_comment = Comment("Nice to see you.")
+ tag.append(new_comment)
+ tag
+ # <b>Hello there<!--Nice to see you.--></b>
+ tag.contents
+ # ['Hello', ' there', 'Nice to see you.']
 
 `(This is a new feature in Beautiful Soup 4.4.0.)`
 
 What if you need to create a whole new tag? The best solution is to
 call the factory method ``BeautifulSoup.new_tag()``::
 
- soup = BeautifulSoup("<b></b>")
- original_tag = soup.b
+ soup = BeautifulSoup("<b></b>", 'html.parser')
+ original_tag = soup.b
 
- new_tag = soup.new_tag("a", href="http://www.example.com")
- original_tag.append(new_tag)
- original_tag
- # <b><a href="http://www.example.com"></a></b>
+ new_tag = soup.new_tag("a", href="http://www.example.com")
+ original_tag.append(new_tag)
+ original_tag
+ # <b><a href="http://www.example.com"></a></b>
 
- new_tag.string = "Link text."
- original_tag
- # <b><a href="http://www.example.com">Link text.</a></b>
+ new_tag.string = "Link text."
+ original_tag
+ # <b><a href="http://www.example.com">Link text.</a></b>
 
 Only the first argument, the tag name, is required.
 
@@ -1984,15 +1981,15 @@ doesn't necessarily go at the end of its parent's ``.contents``. It'll
 be inserted at whatever numeric position you say.
 It works just like ``.insert()`` on a Python list::
 
- markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
- soup = BeautifulSoup(markup)
- tag = soup.a
+ markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
+ soup = BeautifulSoup(markup, 'html.parser')
+ tag = soup.a
 
- tag.insert(1, "but did not endorse ")
- tag
- # <a href="http://example.com/">I linked to but did not endorse <i>example.com</i></a>
- tag.contents
- # [u'I linked to ', u'but did not endorse', <i>example.com</i>]
+ tag.insert(1, "but did not endorse ")
+ tag
+ # <a href="http://example.com/">I linked to but did not endorse <i>example.com</i></a>
+ tag.contents
+ # ['I linked to ', 'but did not endorse', <i>example.com</i>]
 
 ``insert_before()`` and ``insert_after()``
 ------------------------------------------
@@ -2000,36 +1997,36 @@ The ``insert_before()`` method inserts tags or strings immediately
 before something else in the parse tree::
 
- soup = BeautifulSoup("<b>stop</b>")
- tag = soup.new_tag("i")
- tag.string = "Don't"
- soup.b.string.insert_before(tag)
- soup.b
- # <b><i>Don't</i>stop</b>
+ soup = BeautifulSoup("<b>leave</b>", 'html.parser')
+ tag = soup.new_tag("i")
+ tag.string = "Don't"
+ soup.b.string.insert_before(tag)
+ soup.b
+ # <b><i>Don't</i>leave</b>
 
 The ``insert_after()`` method inserts tags or strings immediately
 following something else in the parse tree::
 
- div = soup.new_tag('div')
- div.string = 'ever'
- soup.b.i.insert_after(" you ", div)
- soup.b
- # <b><i>Don't</i> you <div>ever</div> stop</b>
- soup.b.contents
- # [<i>Don't</i>, u' you', <div>ever</div>, u'stop']
+ div = soup.new_tag('div')
+ div.string = 'ever'
+ soup.b.i.insert_after(" you ", div)
+ soup.b
+ # <b><i>Don't</i> you <div>ever</div> leave</b>
+ soup.b.contents
+ # [<i>Don't</i>, ' you', <div>ever</div>, 'leave']
 
 ``clear()``
 -----------
 
 ``Tag.clear()`` removes the contents of a tag::
 
- markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
- soup = BeautifulSoup(markup)
- tag = soup.a
+ markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
+ soup = BeautifulSoup(markup, 'html.parser')
+ tag = soup.a
 
- tag.clear()
- tag
- # <a href="http://example.com/"></a>
+ tag.clear()
+ tag
+ # <a href="http://example.com/"></a>
 
 ``extract()``
 -------------
@@ -2037,34 +2034,34 @@ following something else in the parse tree::
 
 ``PageElement.extract()`` removes a tag or string from the tree. It
 returns the tag or string that was extracted::
 
- markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
- soup = BeautifulSoup(markup)
- a_tag = soup.a
+ markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
+ soup = BeautifulSoup(markup, 'html.parser')
+ a_tag = soup.a
 
- i_tag = soup.i.extract()
+ i_tag = soup.i.extract()
 
- a_tag
- # <a href="http://example.com/">I linked to</a>
+ a_tag
+ # <a href="http://example.com/">I linked to</a>
 
- i_tag
- # <i>example.com</i>
+ i_tag
+ # <i>example.com</i>
 
- print(i_tag.parent)
- None
+ print(i_tag.parent)
+ # None
 
 At this point you effectively have two parse trees: one rooted at the
 ``BeautifulSoup`` object you used to parse the document, and one rooted
 at the tag that was extracted.
 You can go on to call ``extract`` on
 a child of the element you extracted::
 
- my_string = i_tag.string.extract()
- my_string
- # u'example.com'
+ my_string = i_tag.string.extract()
+ my_string
+ # 'example.com'
 
- print(my_string.parent)
- # None
- i_tag
- # <i></i>
+ print(my_string.parent)
+ # None
+ i_tag
+ # <i></i>
 
 
 ``decompose()``
 ---------------
@@ -2073,25 +2070,25 @@ a child of the element you extracted::
 
 ``Tag.decompose()`` removes a tag from the tree, then `completely
 destroys it and its contents`::
 
- markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
- soup = BeautifulSoup(markup)
- a_tag = soup.a
- i_tag = soup.i
+ markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
+ soup = BeautifulSoup(markup, 'html.parser')
+ a_tag = soup.a
+ i_tag = soup.i
 
- i_tag.decompose()
- a_tag
- # <a href="http://example.com/">I linked to</a>
+ i_tag.decompose()
+ a_tag
+ # <a href="http://example.com/">I linked to</a>
 
 The behavior of a decomposed ``Tag`` or ``NavigableString`` is not
 defined and you should not use it for anything. If you're not sure
 whether something has been decomposed, you can check its
 ``.decomposed`` property `(new in Beautiful Soup 4.9.0)`::
 
- i_tag.decomposed
- # True
+ i_tag.decomposed
+ # True
 
- a_tag.decomposed
- # False
+ a_tag.decomposed
+ # False
 
 .. _replace_with():
 
 ``replace_with()``
 ------------------
@@ -2102,16 +2099,16 @@ whether something has been decomposed, you can check its
 
 ``PageElement.replace_with()`` removes a tag or string from the
 tree, and replaces it with the tag or string of your choice::
 
- markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
- soup = BeautifulSoup(markup)
- a_tag = soup.a
+ markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
+ soup = BeautifulSoup(markup, 'html.parser')
+ a_tag = soup.a
 
- new_tag = soup.new_tag("b")
- new_tag.string = "example.net"
- a_tag.i.replace_with(new_tag)
+ new_tag = soup.new_tag("b")
+ new_tag.string = "example.net"
+ a_tag.i.replace_with(new_tag)
 
- a_tag
- # <a href="http://example.com/">I linked to <b>example.net</b></a>
+ a_tag
+ # <a href="http://example.com/">I linked to <b>example.net</b></a>
 
 ``replace_with()`` returns the tag or string that was replaced, so
 that you can examine it or add it back to another part of the tree.
 
 ``wrap()``
 ----------
@@ -2122,11 +2119,11 @@ that you can examine it or add it back to another part of the tree.
 
 ``PageElement.wrap()`` wraps an element in the tag you specify. It
 returns the new wrapper::
 
- soup = BeautifulSoup("<p>I wish I was bold.</p>")
+ soup = BeautifulSoup("<p>I wish I was bold.</p>", 'html.parser')
 soup.p.string.wrap(soup.new_tag("b"))
 # <b>I wish I was bold.</b>
 
- soup.p.wrap(soup.new_tag("div")
+ soup.p.wrap(soup.new_tag("div"))
 # <div><p><b>I wish I was bold.</b></p></div>
 
 This method is new in Beautiful Soup 4.0.5.
 
 ``unwrap()``
 ------------
@@ -2137,13 +2134,13 @@ This method is new in Beautiful Soup 4.0.5.
 
 ``Tag.unwrap()`` is the opposite of ``wrap()``. It replaces a tag with
 whatever's inside that tag. It's good for stripping out markup::
 
- markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
- soup = BeautifulSoup(markup)
- a_tag = soup.a
+ markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
+ soup = BeautifulSoup(markup, 'html.parser')
+ a_tag = soup.a
 
- a_tag.i.unwrap()
- a_tag
- # <a href="http://example.com/">I linked to example.com</a>
+ a_tag.i.unwrap()
+ a_tag
+ # <a href="http://example.com/">I linked to example.com</a>
 
 Like ``replace_with()``, ``unwrap()`` returns the tag
 that was replaced.
@@ -2153,27 +2150,27 @@ that was replaced.
 
 After calling a bunch of methods that modify the parse tree, you
 may end up with two or more ``NavigableString`` objects next to each
 other. Beautiful Soup doesn't have any problems with this, but since it
 can't happen in a freshly parsed document, you might not expect
 behavior like the following::
 
- soup = BeautifulSoup("<p>A one</p>")
- soup.p.append(", a two")
+ soup = BeautifulSoup("<p>A one</p>", 'html.parser')
+ soup.p.append(", a two")
 
- soup.p.contents
- # [u'A one', u', a two']
+ soup.p.contents
+ # ['A one', ', a two']
 
- print(soup.p.encode())
- # <p>A one, a two</p>
+ print(soup.p.encode())
+ # b'<p>A one, a two</p>'
 
- print(soup.p.prettify())
- # <p>
- #  A one
- #  , a two
- # </p>
+ print(soup.p.prettify())
+ # <p>
+ #  A one
+ #  , a two
+ # </p>
 
 You can call ``Tag.smooth()`` to clean up the parse tree by consolidating
 adjacent strings::
 
 soup.smooth()
 soup.p.contents
- # [u'A one, a two']
+ # ['A one, a two']
 
 print(soup.p.prettify())
 # <p>
 #  A one, a two
 # </p>
 
@@ -2194,35 +2191,35 @@ The ``prettify()`` method will turn a Beautiful Soup parse tree into a
 nicely formatted Unicode string, with a separate line for each
 tag and each string::
 
- markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
- soup = BeautifulSoup(markup)
- soup.prettify()
- # '<html>\n <head>\n </head>\n <body>\n  <a href="http://example.com/">\n...'
 
- print(soup.prettify())
- # <html>
- #  <head>
- #  </head>
- #  <body>
- #   <a href="http://example.com/">
- #    I linked to
- #    <i>
- #     example.com
- #    </i>
- #   </a>
- #  </body>
- # </html>
+ markup = '<html><head><body><a href="http://example.com/">I linked to <i>example.com</i></a>'
+ soup = BeautifulSoup(markup, 'html.parser')
+ soup.prettify()
+ # '<html>\n <head>\n </head>\n <body>\n  <a href="http://example.com/">\n...'
 
+ print(soup.prettify())
+ # <html>
+ #  <head>
+ #  </head>
+ #  <body>
+ #   <a href="http://example.com/">
+ #    I linked to
+ #    <i>
+ #     example.com
+ #    </i>
+ #   </a>
+ #  </body>
+ # </html>
 
 You can call ``prettify()`` on the top-level ``BeautifulSoup`` object,
 or on any of its ``Tag`` objects::
 
- print(soup.a.prettify())
- # <a href="http://example.com/">
- #  I linked to
- #  <i>
- #   example.com
- #  </i>
- # </a>
+ print(soup.a.prettify())
+ # <a href="http://example.com/">
+ #  I linked to
+ #  <i>
+ #   example.com
+ #  </i>
+ # </a>
 
 Since it adds whitespace (in the form of newlines), ``prettify()``
 changes the meaning of an HTML document and should not be used to
 
@@ -2233,14 +2230,14 @@ Non-pretty printing
 -------------------
 
 If you just want a string, with no fancy formatting, you can call
-``unicode()`` or ``str()`` on a ``BeautifulSoup`` object, or a ``Tag``
-within it::
+``str()`` on a ``BeautifulSoup`` object (``unicode()`` in Python 2),
+or on a ``Tag`` within it::
 
 str(soup)
 # '<html><head></head><body><a href="http://example.com/">I linked to <i>example.com</i></a></body></html>'
 
- unicode(soup.a)
- # u'<a href="http://example.com/">I linked to <i>example.com</i></a>'
+ str(soup.a)
+ # '<a href="http://example.com/">I linked to <i>example.com</i></a>'
 
 The ``str()`` function returns a string encoded in UTF-8. See
 `Encodings`_ for other options.
@@ -2256,26 +2253,26 @@ Output formatters
 
 If you give Beautiful Soup a document that contains HTML entities like
 "&lquot;", they'll be converted to Unicode characters::
 
- soup = BeautifulSoup("&ldquo;Dammit!&rdquo; he said.")
- unicode(soup)
- # u'<html><head></head><body>\u201cDammit!\u201d he said.</body></html>'
+ soup = BeautifulSoup("&ldquo;Dammit!&rdquo; he said.", 'html.parser')
+ str(soup)
+ # '“Dammit!” he said.'
 
-If you then convert the document to a string, the Unicode characters
+If you then convert the document to a bytestring, the Unicode characters
 will be encoded as UTF-8. You won't get the HTML entities back::
 
- str(soup)
- # '<html><head></head><body>\xe2\x80\x9cDammit!\xe2\x80\x9d he said.</body></html>'
+ soup.encode("utf8")
+ # b'\xe2\x80\x9cDammit!\xe2\x80\x9d he said.'
 
 By default, the only characters that are escaped upon output are bare
 ampersands and angle brackets. These get turned into "&amp;", "&lt;",
 and "&gt;", so that Beautiful Soup doesn't inadvertently generate
 invalid HTML or XML::
 
- soup = BeautifulSoup("<p>The law firm of Dewey, Cheatem, & Howe</p>")
+ soup = BeautifulSoup("<p>The law firm of Dewey, Cheatem, & Howe</p>", 'html.parser')
 soup.p
 # <p>The law firm of Dewey, Cheatem, &amp; Howe</p>
 
- soup = BeautifulSoup('<a href="http://example.com/?foo=val1&bar=val2">A link</a>')
+ soup = BeautifulSoup('<a href="http://example.com/?foo=val1&bar=val2">A link</a>', 'html.parser')
 soup.a
 # <a href="http://example.com/?foo=val1&amp;bar=val2">A link</a>
 
@@ -2288,56 +2285,44 @@ The default is ``formatter="minimal"``. Strings will only be processed
 enough to ensure that Beautiful Soup generates valid HTML/XML::
 
 french = "<p>Il a dit <<Sacré bleu!>></p>"
- soup = BeautifulSoup(french)
+ soup = BeautifulSoup(french, 'html.parser')
 print(soup.prettify(formatter="minimal"))
- # <html>
- #  <body>
- #   <p>
- #    Il a dit &lt;&lt;Sacré bleu!&gt;&gt;
- #   </p>
- #  </body>
- # </html>
+ # <p>
+ #  Il a dit &lt;&lt;Sacré bleu!&gt;&gt;
+ # </p>
 
 If you pass in ``formatter="html"``, Beautiful Soup will convert
 Unicode characters to HTML entities whenever possible::
 
 print(soup.prettify(formatter="html"))
- # <html>
- #  <body>
- #   <p>
- #    Il a dit &lt;&lt;Sacr&eacute; bleu!&gt;&gt;
- #   </p>
- #  </body>
- # </html>
+ # <p>
+ #  Il a dit &lt;&lt;Sacr&eacute; bleu!&gt;&gt;
+ # </p>
 
 If you pass in ``formatter="html5"``, it's the same as
 ``formatter="html"``, but Beautiful Soup will omit the closing slash in
 HTML void tags like "br"::
 
- soup = BeautifulSoup("<br>")
+ br = BeautifulSoup("<br>", 'html.parser').br
 
- print(soup.encode(formatter="html"))
- # <html><body><br/></body></html>
+ print(br.encode(formatter="html"))
+ # b'<br/>'
 
- print(soup.encode(formatter="html5"))
- # <html><body><br></body></html>
+ print(br.encode(formatter="html5"))
+ # b'<br>'
 
 If you pass in ``formatter=None``, Beautiful Soup will not modify
 strings at all on output.
 This is the fastest option, but it may lead to Beautiful Soup
 generating invalid HTML/XML, as in these examples::
 
 print(soup.prettify(formatter=None))
- # <html>
- #  <body>
- #   <p>
- #    Il a dit <<Sacré bleu!>>
- #   </p>
- #  </body>
- # </html>
+ # <p>
+ #  Il a dit <<Sacré bleu!>>
+ # </p>
 
- link_soup = BeautifulSoup('<a href="http://example.com/?foo=val1&bar=val2">A link</a>')
+ link_soup = BeautifulSoup('<a href="http://example.com/?foo=val1&bar=val2">A link</a>', 'html.parser')
 print(link_soup.a.encode(formatter=None))
- # <a href="http://example.com/?foo=val1&bar=val2">A link</a>
+ # b'<a href="http://example.com/?foo=val1&bar=val2">A link</a>'
 
 If you need more sophisticated control over your output, you can
 use Beautiful Soup's ``Formatter`` class. Here's a formatter that
 converts strings to uppercase, whether they occur in a text node or in an
 attribute value::
 
 from bs4.formatter import HTMLFormatter
 def uppercase(str):
     return str.upper()
+
 formatter = HTMLFormatter(uppercase)
 
 print(soup.prettify(formatter=formatter))
- # <html>
- #  <body>
- #   <p>
- #    IL A DIT <<SACRÉ BLEU!>>
- #   </p>
- #  </body>
- # </html>
+ # <p>
+ #  IL A DIT <<SACRÉ BLEU!>>
+ # </p>
 
 print(link_soup.a.prettify(formatter=formatter))
 # <a href="HTTP://EXAMPLE.COM/?FOO=VAL1&BAR=VAL2">
 #  A link
 # </a>
 
@@ -2367,7 +2349,7 @@ Subclassing ``HTMLFormatter`` or ``XMLFormatter`` will give you even
 more control over the output. For example, Beautiful Soup sorts the
 attributes in every tag by default::
 
- attr_soup = BeautifulSoup(b'<p z="1" m="2" a="3"></p>')
+ attr_soup = BeautifulSoup(b'<p z="1" m="2" a="3"></p>', 'html.parser')
 print(attr_soup.p.encode())
 # <p a="3" m="2" z="1"></p>
 
@@ -2380,8 +2362,9 @@ whenever it appears::
 
 def attributes(self, tag):
     for k, v in tag.attrs.items():
         if k == 'm':
-            continue
+            continue
         yield k, v
+
 print(attr_soup.p.encode(formatter=UnsortedAttributes()))
 # <p z="1" a="3"></p>
 
@@ -2393,9 +2376,9 @@ all the strings in the document or something, but it will ignore the
 return value::
 
 from bs4.element import CData
- soup = BeautifulSoup("<a></a>")
+ soup = BeautifulSoup("<a></a>", 'html.parser')
 soup.a.string = CData("one < three")
- print(soup.a.prettify(formatter="xml"))
+ print(soup.a.prettify(formatter="html"))
 # <a>
 #  <![CDATA[one < three]]>
 # </a>
 
@@ -2408,31 +2391,31 @@ If you only want the human-readable text inside a document or tag, you
It returns all the text in a document or beneath a tag, as a single Unicode string:: - markup = '<a href="http://example.com/">\nI linked to <i>example.com</i>\n</a>' - soup = BeautifulSoup(markup) + markup = '<a href="http://example.com/">\nI linked to <i>example.com</i>\n</a>' + soup = BeautifulSoup(markup, 'html.parser') - soup.get_text() - u'\nI linked to example.com\n' - soup.i.get_text() - u'example.com' + soup.get_text() + '\nI linked to example.com\n' + soup.i.get_text() + 'example.com' You can specify a string to be used to join the bits of text together:: # soup.get_text("|") - u'\nI linked to |example.com|\n' + '\nI linked to |example.com|\n' You can tell Beautiful Soup to strip whitespace from the beginning and end of each bit of text:: # soup.get_text("|", strip=True) - u'I linked to|example.com' + 'I linked to|example.com' But at that point you might want to use the :ref:`.stripped_strings <string-generators>` generator instead, and process the text yourself:: [text for text in soup.stripped_strings] - # [u'I linked to', u'example.com'] + # ['I linked to', 'example.com'] *As of Beautiful Soup version 4.9.0, when lxml or html.parser are in use, the contents of <script>, <style>, and <template> @@ -2549,11 +2532,11 @@ or UTF-8. But when you load that document into Beautiful Soup, you'll discover it's been converted to Unicode:: markup = "<h1>Sacr\xc3\xa9 bleu!</h1>" - soup = BeautifulSoup(markup) + soup = BeautifulSoup(markup, 'html.parser') soup.h1 # <h1>Sacré bleu!</h1> soup.h1.string - # u'Sacr\xe9 bleu!' + # 'Sacr\xe9 bleu!' It's not magic. (That sure would be nice.) Beautiful Soup uses a sub-library called `Unicode, Dammit`_ to detect a document's encoding @@ -2575,29 +2558,29 @@ Unicode, Dammit can't get a lock on it, and misidentifies it as ISO-8859-7:: markup = b"<h1>\xed\xe5\xec\xf9</h1>" - soup = BeautifulSoup(markup) - soup.h1 - <h1>νεμω</h1> - soup.original_encoding - 'ISO-8859-7' + soup = BeautifulSoup(markup, 'html.parser') + print(soup.h1) + # <h1>νεμω</h1> + print(soup.original_encoding) + # iso-8859-7 We can fix this by passing in the correct ``from_encoding``:: - soup = BeautifulSoup(markup, from_encoding="iso-8859-8") - soup.h1 - <h1>םולש</h1> - soup.original_encoding - 'iso8859-8' + soup = BeautifulSoup(markup, 'html.parser', from_encoding="iso-8859-8") + print(soup.h1) + # <h1>םולש</h1> + print(soup.original_encoding) + # iso8859-8 If you don't know what the correct encoding is, but you know that Unicode, Dammit is guessing wrong, you can pass the wrong guesses in as ``exclude_encodings``:: - soup = BeautifulSoup(markup, exclude_encodings=["ISO-8859-7"]) - soup.h1 - <h1>םולש</h1> - soup.original_encoding - 'WINDOWS-1255' + soup = BeautifulSoup(markup, 'html.parser', exclude_encodings=["iso-8859-7"]) + print(soup.h1) + # <h1>םולש</h1> + print(soup.original_encoding) + # WINDOWS-1255 Windows-1255 isn't 100% correct, but that encoding is a compatible superset of ISO-8859-8, so it's close enough. 
 (``exclude_encodings``
@@ -2633,7 +2616,7 @@ document written in the Latin-1 encoding::
 
 </html>
 '''
- soup = BeautifulSoup(markup)
+ soup = BeautifulSoup(markup, 'html.parser')
 print(soup.prettify())
 # <html>
 #  <head>
 
@@ -2661,17 +2644,17 @@ You can also call encode() on the ``BeautifulSoup`` object, or any
 element in the soup, just as if it were a Python string::
 
 soup.p.encode("latin-1")
- # '<p>Sacr\xe9 bleu!</p>'
+ # b'<p>Sacr\xe9 bleu!</p>'
 
 soup.p.encode("utf-8")
- # '<p>Sacr\xc3\xa9 bleu!</p>'
+ # b'<p>Sacr\xc3\xa9 bleu!</p>'
 
 Any characters that can't be represented in your chosen encoding will
 be converted into numeric XML entity references. Here's a document
 that includes the Unicode character SNOWMAN::
 
 markup = u"<b>\N{SNOWMAN}</b>"
- snowman_soup = BeautifulSoup(markup)
+ snowman_soup = BeautifulSoup(markup, 'html.parser')
 tag = snowman_soup.b
 
@@ -2679,13 +2662,13 @@ The SNOWMAN character can be part of a UTF-8 document (it looks like
 ☃) but there's no representation for that character in ISO-Latin-1 or
 ASCII, so it's converted into "&#9731;" for those encodings::
 
 print(tag.encode("utf-8"))
- # <b>☃</b>
+ # b'<b>\xe2\x98\x83</b>'
 
- print tag.encode("latin-1")
- # <b>&#9731;</b>
+ print(tag.encode("latin-1"))
+ # b'<b>&#9731;</b>'
 
- print tag.encode("ascii")
- # <b>&#9731;</b>
+ print(tag.encode("ascii"))
+ # b'<b>&#9731;</b>'
 
 Unicode, Dammit
 ---------------
 
@@ -2725,15 +2708,15 @@ entities::
 
 markup = b"<p>I just \x93love\x94 Microsoft Word\x92s smart quotes</p>"
 
 UnicodeDammit(markup, ["windows-1252"], smart_quotes_to="html").unicode_markup
- # u'<p>I just &ldquo;love&rdquo; Microsoft Word&rsquo;s smart quotes</p>'
+ # '<p>I just &ldquo;love&rdquo; Microsoft Word&rsquo;s smart quotes</p>'
 
 UnicodeDammit(markup, ["windows-1252"], smart_quotes_to="xml").unicode_markup
- # u'<p>I just &#x201C;love&#x201D; Microsoft Word&#x2019;s smart quotes</p>'
+ # '<p>I just &#x201C;love&#x201D; Microsoft Word&#x2019;s smart quotes</p>'
 
 You can also convert Microsoft smart quotes to ASCII quotes::
 
 UnicodeDammit(markup, ["windows-1252"], smart_quotes_to="ascii").unicode_markup
- # u'<p>I just "love" Microsoft Word\'s smart quotes</p>'
+ # '<p>I just "love" Microsoft Word\'s smart quotes</p>'
 
 Hopefully you'll find this feature useful, but Beautiful Soup doesn't
 use it. Beautiful Soup prefers the default behavior, which is to
 convert Microsoft smart quotes to Unicode characters along with
 everything else::
 
 UnicodeDammit(markup, ["windows-1252"]).unicode_markup
- # u'<p>I just \u201clove\u201d Microsoft Word\u2019s smart quotes</p>'
+ # '<p>I just “love” Microsoft Word’s smart quotes</p>'
 
 Inconsistent encodings
 ^^^^^^^^^^^^^^^^^^^^^^
 
@@ -2798,31 +2781,31 @@ the original document each Tag was found. You can access this
 information as ``Tag.sourceline`` (line number) and ``Tag.sourcepos``
 (position of the start tag within a line)::
 
- markup = "<p\n>Paragraph 1</p>\n <p>Paragraph 2</p>"
- soup = BeautifulSoup(markup, 'html.parser')
- for tag in soup.find_all('p'):
-     print(tag.sourceline, tag.sourcepos, tag.string)
- # (1, 0, u'Paragraph 1')
- # (2, 3, u'Paragraph 2')
+ markup = "<p\n>Paragraph 1</p>\n <p>Paragraph 2</p>"
+ soup = BeautifulSoup(markup, 'html.parser')
+ for tag in soup.find_all('p'):
+     print(repr((tag.sourceline, tag.sourcepos, tag.string)))
+ # (1, 0, 'Paragraph 1')
+ # (3, 4, 'Paragraph 2')
 
 Note that the two parsers mean slightly different things by
 ``sourceline`` and ``sourcepos``. For html.parser, these numbers
 represent the position of the initial less-than sign.
 For html5lib,
 these numbers represent the position of the final greater-than sign::
 
- soup = BeautifulSoup(markup, 'html5lib')
- for tag in soup.find_all('p'):
-     print(tag.sourceline, tag.sourcepos, tag.string)
- # (2, 1, u'Paragraph 1')
- # (3, 7, u'Paragraph 2')
+ soup = BeautifulSoup(markup, 'html5lib')
+ for tag in soup.find_all('p'):
+     print(repr((tag.sourceline, tag.sourcepos, tag.string)))
+ # (2, 0, 'Paragraph 1')
+ # (3, 6, 'Paragraph 2')
 
 You can shut off this feature by passing ``store_line_numbers=False`
 into the ``BeautifulSoup`` constructor::
 
- markup = "<p\n>Paragraph 1</p>\n <p>Paragraph 2</p>"
- soup = BeautifulSoup(markup, 'html.parser', store_line_numbers=False)
- soup.p.sourceline
- # None
+ markup = "<p\n>Paragraph 1</p>\n <p>Paragraph 2</p>"
+ soup = BeautifulSoup(markup, 'html.parser', store_line_numbers=False)
+ print(soup.p.sourceline)
+ # None
 
 `This feature is new in 4.8.1, and the parsers based on lxml
 don't support it.`
 
@@ -2839,16 +2822,16 @@ in different parts of the object tree, because they both look like
 
 markup = "<p>I want <b>pizza</b> and more <b>pizza</b>!</p>"
 soup = BeautifulSoup(markup, 'html.parser')
 first_b, second_b = soup.find_all('b')
- print first_b == second_b
+ print(first_b == second_b)
 # True
 
- print first_b.previous_element == second_b.previous_element
+ print(first_b.previous_element == second_b.previous_element)
 # False
 
 If you want to see whether two variables refer to exactly the same
 object, use `is`::
 
- print first_b is second_b
+ print(first_b is second_b)
 # False
 
 Copying Beautiful Soup objects
 ------------------------------
 
@@ -2859,23 +2842,23 @@ You can use ``copy.copy()`` to create a copy of any ``Tag`` or
 
 import copy
 p_copy = copy.copy(soup.p)
- print p_copy
+ print(p_copy)
 # <p>I want <b>pizza</b> and more <b>pizza</b>!</p>
 
 The copy is considered equal to the original, since it represents the
 same markup as the original, but it's not the same object::
 
- print soup.p == p_copy
+ print(soup.p == p_copy)
 # True
 
- print soup.p is p_copy
+ print(soup.p is p_copy)
 # False
 
 The only real difference is that the copy is completely detached from
 the original Beautiful Soup object tree, just as if ``extract()`` had
 been called on it::
 
- print p_copy.parent
+ print(p_copy.parent)
 # None
 
 This is because two different ``Tag`` objects can't occupy the same
 
@@ -2922,7 +2905,7 @@ three ``SoupStrainer`` objects::
 
 only_tags_with_id_link2 = SoupStrainer(id="link2")
 
 def is_short_string(string):
-    return len(string) < 10
+    return string is not None and len(string) < 10
 
 only_short_strings = SoupStrainer(string=is_short_string)
 
@@ -2930,8 +2913,7 @@ I'm going to bring back the "three sisters" document one more time,
 and we'll see what the document looks like when it's parsed with
 these three ``SoupStrainer`` objects::
 
- html_doc = """
- <html><head><title>The Dormouse's story</title></head>
+ html_doc = """<html><head><title>The Dormouse's story</title></head>
 <body>
 <p class="title"><b>The Dormouse's story</b></p>
 
@@ -2973,10 +2955,10 @@ You can also pass a ``SoupStrainer`` into any of the methods covered
 in `Searching the tree`_.
@@ -2973,10 +2955,10 @@ You can also pass a ``SoupStrainer`` into any of the methods covered
 in `Searching the tree`_. This probably isn't terribly useful, but I
 thought I'd mention it::

-    soup = BeautifulSoup(html_doc)
+    soup = BeautifulSoup(html_doc, 'html.parser')
     soup.find_all(only_short_strings)
-    # [u'\n\n', u'\n\n', u'Elsie', u',\n', u'Lacie', u' and\n', u'Tillie',
-    #  u'\n\n', u'...', u'\n']
+    # ['\n\n', '\n\n', 'Elsie', ',\n', 'Lacie', ' and\n', 'Tillie',
+    #  '\n\n', '...', '\n']

 Customizing multi-valued attributes
 -----------------------------------

@@ -2985,22 +2967,22 @@ In an HTML document, an attribute like ``class`` is given a list of
 values, and an attribute like ``id`` is given a single value, because
 the HTML specification treats those attributes differently::

-    markup = '<a class="cls1 cls2" id="id1 id2">'
-    soup = BeautifulSoup(markup)
-    soup.a['class']
-    # ['cls1', 'cls2']
-    soup.a['id']
-    # 'id1 id2'
+    markup = '<a class="cls1 cls2" id="id1 id2">'
+    soup = BeautifulSoup(markup, 'html.parser')
+    soup.a['class']
+    # ['cls1', 'cls2']
+    soup.a['id']
+    # 'id1 id2'

 You can turn this off by passing in ``multi_valued_attributes=None``.
 Then all attributes will be given a single value::

-    soup = BeautifulSoup(markup, multi_valued_attributes=None)
-    soup.a['class']
-    # 'cls1 cls2'
-    soup.a['id']
-    # 'id1 id2'
+    soup = BeautifulSoup(markup, 'html.parser', multi_valued_attributes=None)
+    soup.a['class']
+    # 'cls1 cls2'
+    soup.a['id']
+    # 'id1 id2'

 You can customize this behavior quite a bit by passing in a
 dictionary for ``multi_valued_attributes``. If you need this, look at

@@ -3018,38 +3000,38 @@ When using the ``html.parser`` parser, you can use the
 Beautiful Soup does when it encounters a tag that defines the same
 attribute more than once::

-    markup = '<a href="http://url1/" href="http://url2/">'
+    markup = '<a href="http://url1/" href="http://url2/">'

 The default behavior is to use the last value found for the tag::

-    soup = BeautifulSoup(markup, 'html.parser')
-    soup.a['href']
-    # http://url2/
+    soup = BeautifulSoup(markup, 'html.parser')
+    soup.a['href']
+    # http://url2/

-    soup = BeautifulSoup(markup, 'html.parser', on_duplicate_attribute='replace')
-    soup.a['href']
-    # http://url2/
+    soup = BeautifulSoup(markup, 'html.parser', on_duplicate_attribute='replace')
+    soup.a['href']
+    # http://url2/

 With ``on_duplicate_attribute='ignore'`` you can tell Beautiful Soup
 to use the `first` value found and ignore the rest::

-    soup = BeautifulSoup(markup, 'html.parser', on_duplicate_attribute='ignore')
-    soup.a['href']
-    # http://url1/
+    soup = BeautifulSoup(markup, 'html.parser', on_duplicate_attribute='ignore')
+    soup.a['href']
+    # http://url1/

 (lxml and html5lib always do it this way; their behavior can't be
 configured from within Beautiful Soup.)

 If you need more, you can pass in a function that's called on each
 duplicate value::

-    def accumulate(attributes_so_far, key, value):
-        if not isinstance(attributes_so_far[key], list):
-            attributes_so_far[key] = [attributes_so_far[key]]
-        attributes_so_far[key].append(value)
+    def accumulate(attributes_so_far, key, value):
+        if not isinstance(attributes_so_far[key], list):
+            attributes_so_far[key] = [attributes_so_far[key]]
+        attributes_so_far[key].append(value)

-    soup = BeautifulSoup(markup, 'html.parser', on_duplicate_attribute=accumulate)
-    soup.a['href']
-    # ["http://url1/", "http://url2/"]
+    soup = BeautifulSoup(markup, 'html.parser', on_duplicate_attribute=accumulate)
+    soup.a['href']
+    # ["http://url1/", "http://url2/"]

 `(This is a new feature in Beautiful Soup 4.9.1.)`
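A note on the ``multi_valued_attributes`` dictionary mentioned in the
previous section: the sketch below assumes the dictionary maps a tag
name (or ``'*'``, meaning any tag) to a list of attribute names that
should be treated as multi-valued. That format is an assumption here,
modeled on the default policy Beautiful Soup ships for HTML::

    from bs4 import BeautifulSoup

    # Assumed format: '*' applies to every tag; other keys name specific tags.
    multi = {'*': ['class'], 'b': ['rel']}
    markup = '<b class="cls1 cls2" rel="r1 r2" id="id1 id2">'
    soup = BeautifulSoup(markup, 'html.parser', multi_valued_attributes=multi)
    soup.b['class']
    # ['cls1', 'cls2']
    soup.b['rel']
    # ['r1', 'r2']
    soup.b['id']
    # 'id1 id2'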
@@ -3062,26 +3044,28 @@ contain that information.

 Instead of that default behavior, you can tell Beautiful Soup to
 instantiate `subclasses` of ``Tag`` or ``NavigableString``, subclasses
 you define with custom behavior::

-    from bs4 import Tag, NavigableString
-    class MyTag(Tag):
-        pass
-
-    class MyString(NavigableString):
-        pass
-
-    markup = "<div>some text</div>"
-    soup = BeautifulSoup(markup)
-    isinstance(soup.div, MyTag)
-    # False
-    isinstance(soup.div.string, MyString)
-    # False
-
-    my_classes = { Tag: MyTag, NavigableString: MyString }
-    soup = BeautifulSoup(markup, element_classes=my_classes)
-    isinstance(soup.div, MyTag)
-    # True
-    isinstance(soup.div.string, MyString)
-    # True
+    from bs4 import Tag, NavigableString
+    class MyTag(Tag):
+        pass
+
+
+    class MyString(NavigableString):
+        pass
+
+
+    markup = "<div>some text</div>"
+    soup = BeautifulSoup(markup, 'html.parser')
+    isinstance(soup.div, MyTag)
+    # False
+    isinstance(soup.div.string, MyString)
+    # False
+
+    my_classes = { Tag: MyTag, NavigableString: MyString }
+    soup = BeautifulSoup(markup, 'html.parser', element_classes=my_classes)
+    isinstance(soup.div, MyTag)
+    # True
+    isinstance(soup.div.string, MyString)
+    # True

 This can be useful when incorporating Beautiful Soup into a test
 framework.

@@ -3105,6 +3089,7 @@ missing a parser that Beautiful Soup could be using::

     from bs4.diagnose import diagnose
     with open("bad.html") as fp:
         data = fp.read()
+
     diagnose(data)

     # Diagnostic running on Beautiful Soup 4.2.0

@@ -3154,7 +3139,7 @@ Version mismatch problems
 -------------------------

 * ``SyntaxError: Invalid syntax`` (on the line ``ROOT_TAG_NAME =
-  u'[document]'``): Caused by running the Python 2 version of
+  '[document]'``): Caused by running the Python 2 version of
   Beautiful Soup under Python 3, without converting the code.

 * ``ImportError: No module named HTMLParser`` - Caused by running the

@@ -3210,7 +3195,7 @@ Miscellaneous
 -------------

 * ``UnicodeEncodeError: 'charmap' codec can't encode character
-  u'\xfoo' in position bar`` (or just about any other
+  '\xfoo' in position bar`` (or just about any other
   ``UnicodeEncodeError``) - This problem shows up in two main
   situations. First, when you try to print a Unicode character that
   your console doesn't know how to display. (See `this page on the

@@ -3222,8 +3207,8 @@ Miscellaneous

 * ``KeyError: [attr]`` - Caused by accessing ``tag['attr']`` when the
   tag in question doesn't define the ``attr`` attribute. The most
-  common errors are ``KeyError: 'href'`` and ``KeyError:
-  'class'``. Use ``tag.get('attr')`` if you're not sure ``attr`` is
+  common errors are ``KeyError: 'href'`` and ``KeyError: 'class'``.
+  Use ``tag.get('attr')`` if you're not sure ``attr`` is
   defined, just as you would with a Python dictionary.

 * ``AttributeError: 'ResultSet' object has no attribute 'foo'`` - This

@@ -3323,11 +3308,11 @@ Most code written against Beautiful Soup 3 will work against Beautiful
 Soup 4 with one simple change. All you should have to do is change the
 package name from ``BeautifulSoup`` to ``bs4``. So this::

-    from BeautifulSoup import BeautifulSoup
+    from BeautifulSoup import BeautifulSoup

 becomes this::

-    from bs4 import BeautifulSoup
+    from bs4 import BeautifulSoup

 * If you get the ``ImportError`` "No module named BeautifulSoup", your
   problem is that you're trying to run Beautiful Soup 3 code, but you