author | Leonard Richardson <leonardr@segfault.org> | 2012-05-29 13:25:01 -0400 |
---|---|---|
committer | Leonard Richardson <leonardr@segfault.org> | 2012-05-29 13:25:01 -0400 |
commit | 8ec2a7d9423a6269f74c47ec2475b6c5fd143437 (patch) | |
tree | c56ac2168b2d10da6057e9b84c129f081cf039e6 /doc/source | |
parent | 49aa4dd243353f7d0f25d7c5ea51ba3344110a47 (diff) |
Prep for release.
Diffstat (limited to 'doc/source')
-rw-r--r-- | doc/source/index.rst | 16 |
1 files changed, 9 insertions, 7 deletions
diff --git a/doc/source/index.rst b/doc/source/index.rst
index 3a2069d..16c6020 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -2465,8 +2465,9 @@ UTF-8. Here's a simple example::
  quote = (u"\N{LEFT DOUBLE QUOTATION MARK}I like snowmen!\N{RIGHT DOUBLE QUOTATION MARK}")
  doc = snowmen.encode("utf8") + quote.encode("windows_1252")
 
-This document is a mess. You can display the snowmen or the smart
-quotes, but not both::
+This document is a mess. The snowmen are in UTF-8 and the quotes are
+in Windows-1252. You can display the snowmen or the quotes, but not
+both::
 
  print(doc)
  # ☃☃☃�I like snowmen!�
@@ -2474,10 +2475,11 @@ quotes, but not both::
  print(doc.decode("windows-1252"))
  # ☃☃☃“I like snowmen!”
 
-Decoding the document as UTF-8 will raise a ``UnicodeDecodeError``,
-but ``UnicodeDammit.detwingle()`` will convert the document to pure
-UTF-8, allowing you to decode it and display the snowmen and
-quote marks simultaneously::
+Decoding the document as UTF-8 raises a ``UnicodeDecodeError``, and
+decoding it as Windows-1252 gives you gibberish. Fortunately,
+``UnicodeDammit.detwingle()`` will convert the string to pure UTF-8,
+allowing you to decode it to Unicode and display the snowmen and quote
+marks simultaneously::
 
  new_doc = UnicodeDammit.detwingle(doc)
  print(new_doc.decode("utf8"))
@@ -2493,7 +2495,7 @@ constructor. Beautiful Soup assumes that a document has a single
 encoding, whatever it might be. If you pass it a document that
 contains both UTF-8 and Windows-1252, it's likely to think the whole
 document is Windows-1252, and the document will come out looking like
-`` ☃☃☃“I like snowmen!”``.
+` ☃☃☃“I like snowmen!”`.
 
 ``UnicodeDammit.detwingle()`` is new in Beautiful Soup 4.1.0.
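
The behaviour described in the changed paragraphs can be reproduced with a short sketch. It assumes Python 3 (the documentation example uses Python 2 style ``u""`` literals) and treats Beautiful Soup as optional, skipping the ``detwingle()`` call when ``bs4`` is not installed. The variable names ``utf8_failed``, ``as_cp1252`` and ``fixed`` are illustrative and do not come from the documentation.

```python
# Build the mixed-encoding document from the documentation example.
snowmen = "\N{SNOWMAN}" * 3
quote = "\N{LEFT DOUBLE QUOTATION MARK}I like snowmen!\N{RIGHT DOUBLE QUOTATION MARK}"
doc = snowmen.encode("utf8") + quote.encode("windows_1252")

# Strict UTF-8 decoding fails on the Windows-1252 quote bytes (0x93, 0x94).
try:
    doc.decode("utf8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Windows-1252 decoding succeeds, but the snowmen come out as gibberish
# while the quote marks survive.
as_cp1252 = doc.decode("windows-1252")

# detwingle() requires Beautiful Soup 4.1.0 or later; skip it if bs4 is missing.
try:
    from bs4 import UnicodeDammit
except ImportError:
    UnicodeDammit = None

if UnicodeDammit is not None:
    # Re-encode the embedded Windows-1252 bytes as UTF-8, then decode the
    # whole document as UTF-8.
    fixed = UnicodeDammit.detwingle(doc).decode("utf8")
    print(fixed)
```

Keeping the ``bs4`` import inside a ``try`` block lets the first two steps, which illustrate the problem itself, run with the standard library alone.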