author | Leonard Richardson <leonard.richardson@canonical.com> | 2012-02-09 16:15:56 -0500 |
---|---|---|
committer | Leonard Richardson <leonard.richardson@canonical.com> | 2012-02-09 16:15:56 -0500 |
commit | 4aff2ee4d6f077e06159c92ab05c0f2ea527c6fa (patch) | |
tree | 40951a60046f184794a011a498187053e8ad2a92 /NEWS.txt | |
parent | caeb168dc47470607b3cd091e1d35db45c089385 (diff) |
As a last-ditch attempt to turn data into Unicode, use errors=replace instead of errors=strict.
Diffstat (limited to 'NEWS.txt')
-rw-r--r-- | NEWS.txt | 6 |
1 files changed, 6 insertions, 0 deletions
@@ -20,6 +20,12 @@
 * Unicode, Dammit now detects the encoding in HTML 5-style <meta> tags
   like <meta charset="utf-8" />. [bug=837268]
 
+* If Unicode, Dammit can't figure out a consistent encoding for a
+  page, it will try each of its guesses again, with errors="replace"
+  instead of errors="strict". This may mean that some data gets
+  replaced with REPLACEMENT CHARACTER, but at least most of it will
+  get turned into Unicode. [bug=754903]
+
 * Patched over a bug in html5lib (?) that was crashing Beautiful Soup
   on certain kinds of markup. [bug=838800]
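
The fallback described in this NEWS entry can be sketched roughly as follows. This is a minimal illustration and not Beautiful Soup's actual code: the helper name `decode_with_fallback` and the list of guessed encodings are assumptions made for the example. It relies only on the standard `bytes.decode(encoding, errors)` call.

```python
def decode_with_fallback(data, guesses):
    """Hypothetical helper illustrating the strict-then-replace strategy.

    Tries every guessed encoding with errors="strict" first. Only if all
    of them fail does it try each guess again with errors="replace".
    """
    for errors in ("strict", "replace"):
        for encoding in guesses:
            try:
                return data.decode(encoding, errors), encoding, errors
            except (UnicodeDecodeError, LookupError):
                # Undecodable bytes or an unknown codec: try the next guess.
                continue
    return None, None, None


# Valid UTF-8 decodes on the first, strict pass.
clean = decode_with_fallback("café".encode("utf-8"), ["utf-8", "ascii"])

# 0xFF is never valid in UTF-8 or ASCII, so both strict attempts fail and
# the second pass substitutes U+FFFD REPLACEMENT CHARACTER for that byte.
damaged = decode_with_fallback(b"caf\xff", ["utf-8", "ascii"])
```

In the second call most of the input survives as readable text and only the undecodable byte is lost, which is the trade-off the entry describes.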