By default, turn unrecognized characters into numeric XML entity refs.

author: Leonard Richardson <leonard.richardson@canonical.com> 2012-02-16 13:55:20 -0500
committer: Leonard Richardson <leonard.richardson@canonical.com> 2012-02-16 13:55:20 -0500
commit: 1a50d9623831990ae0a78ea3a7e66fa098fe92ac (patch)
tree: d31578ac86c753c6e3427f574408a1ad960d80ac /bs4/doc/source
parent: ffcebc274b84b85a0b8c93c2aca8756df4baa236 (diff)
1 files changed, 21 insertions, 0 deletions
diff --git a/bs4/doc/source/index.rst b/bs4/doc/source/index.rst
index 200317a..0467c00 100644
--- a/bs4/doc/source/index.rst
+++ b/bs4/doc/source/index.rst
@@ -2160,6 +2160,27 @@ element in the soup, just as if it were a Python string::
  soup.p.encode("utf-8")
  # '<p>Sacr\xc3\xa9 bleu!</p>'
 
+Any characters that can't be represented in your chosen encoding will
+be converted into numeric XML entity references. For instance, here's
+a document that includes the Unicode character SNOWMAN::
+
+ markup = u"<b>\N{SNOWMAN}</b>"
+ snowman_soup = BeautifulSoup(markup)
+ tag = snowman_soup.b
+
+The SNOWMAN character can be part of a UTF-8 document (it looks like
+☃), but there's no representation for that character in ISO-Latin-1 or
+ASCII, so it's converted into "&#9731;" for those encodings::
+
+ print(tag.encode("utf-8"))
+ # <b>☃</b>
+
+ print(tag.encode("latin-1"))
+ # <b>&#9731;</b>
+
+ print tag.encode("ascii")
+ # <b>&#9731;</b>
+
 Unicode, Dammit
 ---------------
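The fallback described in this patch, where unencodable characters become numeric XML entity references, can be reproduced with Python's standard `xmlcharrefreplace` codec error handler. The sketch below assumes Python 3 and uses a hypothetical helper, `encode_with_charrefs`. It illustrates the behavior only and makes no claim about how Beautiful Soup's `encode()` is implemented internally.

```python
# Sketch (assumes Python 3): reproduce the "numeric XML entity reference"
# fallback with the standard library's "xmlcharrefreplace" error handler.
# encode_with_charrefs is a hypothetical helper, not part of Beautiful Soup.
markup = "<b>\N{SNOWMAN}</b>"


def encode_with_charrefs(text, encoding):
    """Encode text, replacing characters the codec can't represent with &#NNNN; refs."""
    return text.encode(encoding, "xmlcharrefreplace")


for enc in ("utf-8", "latin-1", "ascii"):
    print(enc, encode_with_charrefs(markup, enc))
# utf-8 b'<b>\xe2\x98\x83</b>'
# latin-1 b'<b>&#9731;</b>'
# ascii b'<b>&#9731;</b>'
```

SNOWMAN is U+2603, which is 9731 in decimal, so the ISO-Latin-1 and ASCII outputs contain `&#9731;`. UTF-8 can represent the character directly, as the three bytes E2 98 83, so no reference is needed there.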