author Leonard Richardson <leonardr@segfault.org> 2012-05-29 13:25:01 -0400
committer Leonard Richardson <leonardr@segfault.org> 2012-05-29 13:25:01 -0400
commit 8ec2a7d9423a6269f74c47ec2475b6c5fd143437
tree c56ac2168b2d10da6057e9b84c129f081cf039e6 /doc/source
parent 49aa4dd243353f7d0f25d7c5ea51ba3344110a47
Prep for release.
Diffstat (limited to 'doc/source')
-rw-r--r-- doc/source/index.rst 16
1 files changed, 9 insertions, 7 deletions
diff --git a/doc/source/index.rst b/doc/source/index.rst
index 3a2069d..16c6020 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -2465,8 +2465,9 @@ UTF-8. Here's a simple example::
quote = (u"\N{LEFT DOUBLE QUOTATION MARK}I like snowmen!\N{RIGHT DOUBLE QUOTATION MARK}")
doc = snowmen.encode("utf8") + quote.encode("windows_1252")
-This document is a mess. You can display the snowmen or the smart
-quotes, but not both::
+This document is a mess. The snowmen are in UTF-8 and the quotes are
+in Windows-1252. You can display the snowmen or the quotes, but not
+both::
print(doc)
# ☃☃☃�I like snowmen!�
@@ -2474,10 +2475,11 @@ quotes, but not both::
print(doc.decode("windows-1252"))
# â˜ƒâ˜ƒâ˜ƒ“I like snowmen!”
-Decoding the document as UTF-8 will raise a ``UnicodeDecodeError``,
-but ``UnicodeDammit.detwingle()`` will convert the document to pure
-UTF-8, allowing you to decode it and display the snowmen and
-quote marks simultaneously::
+Decoding the document as UTF-8 raises a ``UnicodeDecodeError``, and
+decoding it as Windows-1252 gives you gibberish. Fortunately,
+``UnicodeDammit.detwingle()`` will convert the string to pure UTF-8,
+allowing you to decode it to Unicode and display the snowmen and quote
+marks simultaneously::
new_doc = UnicodeDammit.detwingle(doc)
print(new_doc.decode("utf8"))
@@ -2493,7 +2495,7 @@ constructor. Beautiful Soup assumes that a document has a single
encoding, whatever it might be. If you pass it a document that
contains both UTF-8 and Windows-1252, it's likely to think the whole
document is Windows-1252, and the document will come out looking like
-`` â˜ƒâ˜ƒâ˜ƒ“I like snowmen!”``.
+` â˜ƒâ˜ƒâ˜ƒ“I like snowmen!”`.
``UnicodeDammit.detwingle()`` is new in Beautiful Soup 4.1.0.
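
The failure mode described in the revised text can be reproduced with the standard library alone. The following is a minimal Python 3 sketch, not part of the commit: it rebuilds the mixed-encoding ``doc`` from the example, then shows that a UTF-8 decode fails and a Windows-1252 decode succeeds but garbles the snowmen. The names ``utf8_ok`` and ``as_cp1252`` are illustrative only, and ``UnicodeDammit.detwingle()`` is deliberately not called, so Beautiful Soup does not need to be installed.

```python
# Python 3 sketch, standard library only. Variable names are illustrative.
snowmen = "\N{SNOWMAN}" * 3
quote = "\N{LEFT DOUBLE QUOTATION MARK}I like snowmen!\N{RIGHT DOUBLE QUOTATION MARK}"

# Same construction as the documentation example: UTF-8 bytes for the
# snowmen, Windows-1252 bytes for the quoted sentence.
doc = snowmen.encode("utf8") + quote.encode("windows_1252")

# Strict UTF-8 decoding fails: the Windows-1252 quote bytes (0x93, 0x94)
# are not valid UTF-8 on their own.
try:
    doc.decode("utf8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Windows-1252 decoding succeeds, but each snowman's three UTF-8 bytes
# are read as three separate characters, so the snowmen become gibberish
# while the quotes come out correctly.
as_cp1252 = doc.decode("windows-1252")

print(utf8_ok)
print(ascii(as_cp1252))  # ascii() avoids depending on the terminal encoding
```

Neither single-encoding decode recovers both halves of the string, which is the gap that ``UnicodeDammit.detwingle()`` is documented to fill.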