Added an exclude_encodings argument to UnicodeDammit and to the Beautiful Soup constructor, which lets you prohibit the detection of an encoding that you know is wrong. [bug=1469408]
author: Leonard Richardson <leonardr@segfault.org> 2015-06-27 09:55:40 -0400
committer: Leonard Richardson <leonardr@segfault.org> 2015-06-27 09:55:40 -0400
commit: feffc5a1146e2520c90682bc2c33f5fa7d3943f0 (patch)
tree: 6dce892919c201b629628647f86843382b29a60a /doc/source
parent: d728b9cbd6cd5954acf7c9c32fe2f1878809d6e8 (diff)
1 files changed, 13 insertions, 0 deletions
diff --git a/doc/source/index.rst b/doc/source/index.rst
index 1b7b1e6..821dad4 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -2397,6 +2397,19 @@ We can fix this by passing in the correct ``from_encoding``::
  soup.original_encoding
  'iso8859-8'
 
+If you don't know what the correct encoding is, but you know that
+Unicode, Dammit is guessing wrong, you can pass the wrong guesses in
+as ``exclude_encodings``::
+
+ soup = BeautifulSoup(markup, exclude_encodings=["ISO-8859-7"])
+ soup.h1
+ <h1>םולש</h1>
+ soup.original_encoding
+ 'WINDOWS-1255'
+
+(This isn't 100% correct, but Windows-1255 is a compatible superset of
+ISO-8859-8, so it's close enough.)
+
 In rare cases (usually when a UTF-8 document contains text written in
 a completely different encoding), the only way to get Unicode may be
 to replace some characters with the special Unicode character
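
The idea behind ``exclude_encodings`` in the hunk above can be illustrated with a small standard-library sketch. This is not Beautiful Soup's or Unicode, Dammit's actual implementation: the helper name `decode_with_exclusions` and the hard-coded `guesses` list are assumptions made for illustration, standing in for whatever ordered guesses a real detector would produce. The sketch only shows how ruling out one wrong guess lets the next candidate win.

```python
import codecs


def decode_with_exclusions(data, candidates, exclude_encodings=()):
    """Return (text, encoding) for the first usable candidate, else (None, None).

    Hypothetical helper: `candidates` is an ordered list of encoding guesses,
    and `exclude_encodings` lists guesses the caller knows to be wrong.
    """
    # Canonicalize names so "ISO-8859-7" and "iso-8859-7" compare equal.
    excluded = {codecs.lookup(name).name for name in exclude_encodings}
    for encoding in candidates:
        if codecs.lookup(encoding).name in excluded:
            continue  # the caller has ruled this guess out
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # the bytes are not valid in this encoding
    return None, None


# Hebrew text stored as single-byte ISO-8859-8 data.
markup = b"<h1>\xed\xe5\xec\xf9</h1>"

# Assumed guess order for this sketch; a real detector may guess differently.
guesses = ["utf-8", "ISO-8859-7", "windows-1255"]

# Without exclusions, the Greek guess decodes cleanly and wins.
wrong_text, wrong_encoding = decode_with_exclusions(markup, guesses)

# Excluding the known-wrong guess lets the Hebrew-compatible one through.
right_text, right_encoding = decode_with_exclusions(
    markup, guesses, exclude_encodings=["iso-8859-7"]
)

print(wrong_encoding, wrong_text)
print(right_encoding, right_text)
```

The exclusion is matched case-insensitively through `codecs.lookup`, so the caller can write the encoding name in whichever common spelling they have at hand.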