As a last-ditch attempt to turn data into Unicode, use errors=replace instead of errors=strict.

author: Leonard Richardson <leonard.richardson@canonical.com> 2012-02-09 16:15:56 -0500
committer: Leonard Richardson <leonard.richardson@canonical.com> 2012-02-09 16:15:56 -0500
commit: 4aff2ee4d6f077e06159c92ab05c0f2ea527c6fa (patch)
tree: 40951a60046f184794a011a498187053e8ad2a92 /bs4/doc/source
parent: caeb168dc47470607b3cd091e1d35db45c089385 (diff)
1 files changed, 9 insertions, 0 deletions
diff --git a/bs4/doc/source/index.rst b/bs4/doc/source/index.rst
index abea5c6..d28787b 100644
--- a/bs4/doc/source/index.rst
+++ b/bs4/doc/source/index.rst
@@ -2076,6 +2076,15 @@ We can fix this by passing in the correct ``from_encoding``::
  soup.original_encoding
  'iso8859-8'
 
+In rare cases (usually when a UTF-8 document contains text written in
+a completely different encoding), the only way to get Unicode may be
+to replace some characters with the special Unicode character
+"REPLACEMENT CHARACTER" (U+FFFD, �). If Unicode, Dammit needs to do
+this, it will set the ``.characters_were_replaced`` attribute to
+``True`` on the ``UnicodeDammit`` or ``BeautifulSoup`` object. This
+lets you know that the Unicode representation is not an exact
+representation of the original--some data was lost.
+
 Output encoding
 ---------------
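The commit message describes a two-stage strategy. First, decode the bytes with errors="strict", which is lossless but raises on any invalid byte. Only if that fails, decode with errors="replace", so that undecodable bytes become U+FFFD instead of aborting the conversion. The block below is a minimal sketch of that strategy using only Python's built-in `bytes.decode`. It makes a few assumptions of its own:

- `decode_with_fallback` is a hypothetical helper, not part of Beautiful Soup.
- It assumes the first candidate encoding (UTF-8 by default) is the one to fall back on.
- It returns a boolean that plays the role of the ``.characters_were_replaced`` attribute described in the diff.

```python
def decode_with_fallback(data, encodings=("utf-8",)):
    """Hypothetical helper (not Beautiful Soup's API).

    Returns (text, characters_were_replaced).
    """
    for encoding in encodings:
        try:
            # Preferred path: lossless decoding that fails on any invalid byte.
            return data.decode(encoding, errors="strict"), False
        except UnicodeDecodeError:
            continue
    # Last-ditch attempt: invalid bytes become U+FFFD instead of raising.
    # Strict decoding with this encoding already failed, so at least one
    # byte sequence was replaced and some data was lost.
    return data.decode(encodings[0], errors="replace"), True


# A UTF-8 document with one stray Latin-1 byte (0xE9, "é") mixed in.
text, replaced = decode_with_fallback(b"caf\xc3\xa9 and caf\xe9")
print(text)      # café and caf�
print(replaced)  # True
```

The flag is set only when strict decoding failed. A valid UTF-8 document that legitimately contains an encoded U+FFFD is therefore not reported as lossy, which a simple check for "\ufffd" in the output would get wrong.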