Merged in big encoding-detection refactoring branch.

author: Leonard Richardson <leonard.richardson@canonical.com> 2013-06-02 22:19:37 -0400
committer: Leonard Richardson <leonard.richardson@canonical.com> 2013-06-02 22:19:37 -0400
commit: 4a9444ac0b74fbf84cf86b9fcf6055c85e65f62a (patch)
tree: 570cbcb2c9ab9cf458edee87490afeffd8377560 /NEWS.txt
parent: 11dad27424b319a2034f59f5a7f48286551102d0 (diff)
parent: 4f9a654766df9ddd05e3ef274b4715b42668724f (diff)
1 files changed, 18 insertions, 0 deletions
diff --git a/NEWS.txt b/NEWS.txt
index bb90d04..3d0846f 100644
--- a/NEWS.txt
+++ b/NEWS.txt
@@ -1,3 +1,21 @@
+= 4.3.0 (Unreleased) =
+
+* Instead of converting incoming data to Unicode and feeding it to the
+  lxml tree builder, Beautiful Soup now makes successive guesses at
+  the encoding of the incoming data, and tells lxml to parse the data
+  as that encoding. This improves performance and avoids an issue in
+  which lxml was refusing to parse strings because they were Unicode
+  strings.
+
+  This required a major overhaul of the tree builder architecture. If
+  you wrote your own tree builder and didn't tell me, you'll need to
+  modify your prepare_markup() method.
+
+* The UnicodeDammit code that makes guesses at encodings has been
+  split into its own class, EncodingDetector. A lot of apparently
+  redundant code has been removed from Unicode, Dammit, and some
+  undocumented features have also been removed.
+
 = 4.2.1 (20130531) =
 
 * The default XML formatter will now replace ampersands even if they
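The first NEWS entry describes a new approach: make one encoding guess after another, and pass the raw bytes plus the current guess to the parser instead of decoding to Unicode first. Below is a minimal standard-library sketch of that loop. It is not Beautiful Soup's code. It swaps in `xml.etree.ElementTree.XMLParser` for lxml; that parser's `encoding` argument overrides the document's own declaration. The helper names `candidate_encodings` and `parse_with_guesses` are hypothetical, and so is the guess order: caller-supplied encodings, then a UTF-8 BOM check, then UTF-8, then ISO-8859-1 as a last resort.

```python
import codecs
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator, Tuple

# Hypothetical helpers for illustration; not part of Beautiful Soup or lxml.


def candidate_encodings(data: bytes, known: Iterable[str] = ()) -> Iterator[str]:
    """Yield encoding guesses for `data` in an assumed priority order, without repeats."""
    guesses = list(known)                  # caller-supplied encodings come first
    if data.startswith(codecs.BOM_UTF8):   # a UTF-8 byte-order mark is strong evidence
        guesses.append("utf-8")
    guesses += ["utf-8", "iso-8859-1"]     # ISO-8859-1 can decode any byte, so it goes last
    seen = set()
    for enc in guesses:
        name = codecs.lookup(enc).name     # normalize aliases such as "latin-1"
        if name not in seen:
            seen.add(name)
            yield enc


def parse_with_guesses(data: bytes, known: Iterable[str] = ()) -> Tuple[ET.Element, str]:
    """Parse raw bytes, telling the parser which encoding to assume on each attempt."""
    last_error = None
    for enc in candidate_encodings(data, known):
        # A fresh parser per guess; `encoding` overrides any in-document declaration.
        parser = ET.XMLParser(encoding=enc)
        try:
            parser.feed(data)              # the bytes are never decoded by us first
            return parser.close(), enc
        except ET.ParseError as exc:
            last_error = exc               # wrong guess (or broken markup): try the next one
    raise ValueError("no candidate encoding produced a parse") from last_error
```

For example, `parse_with_guesses(b"<p>caf\xe9</p>")` fails under the UTF-8 guess, because the lone `\xe9` byte is invalid UTF-8. It then succeeds under ISO-8859-1. The sketch keeps the guess generator separate from the code that consumes it. In spirit, this mirrors the second entry, which moves the guessing logic out of UnicodeDammit into EncodingDetector. The real EncodingDetector's ordering and options may well differ.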