author     Leonard Richardson <leonard.richardson@canonical.com>  2013-06-02 22:19:37 -0400
committer  Leonard Richardson <leonard.richardson@canonical.com>  2013-06-02 22:19:37 -0400
commit     4a9444ac0b74fbf84cf86b9fcf6055c85e65f62a
tree       570cbcb2c9ab9cf458edee87490afeffd8377560 /NEWS.txt
parent     11dad27424b319a2034f59f5a7f48286551102d0
parent     4f9a654766df9ddd05e3ef274b4715b42668724f
Merged in big encoding-detection refactoring branch.
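The NEWS entry in this commit describes a change of strategy: rather than decoding the input to Unicode first, Beautiful Soup makes successive guesses at the encoding and asks the parser to parse the raw bytes as each guess. The following is a minimal sketch of that idea, not Beautiful Soup's code. It swaps in the standard library's `xml.etree.ElementTree.XMLParser` (whose `encoding` argument overrides the document's declared encoding) for lxml, and the helper names `candidate_encodings` and `parse_with_guesses`, as well as the guess order (caller overrides, declared encoding, UTF-8, ISO-8859-1), are hypothetical assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern for an XML declaration such as
# <?xml version="1.0" encoding="iso-8859-1"?>
_DECLARED = re.compile(rb'^\s*<\?xml[^>]*?encoding=["\']([A-Za-z0-9._-]+)["\']')


def candidate_encodings(data, override_encodings=()):
    """Yield encoding guesses for raw bytes, most trusted first, without duplicates.

    The order used here (caller overrides, declared encoding, UTF-8,
    ISO-8859-1) is an assumption made for this sketch.
    """
    guesses = list(override_encodings)
    declared = _DECLARED.match(data)
    if declared:
        guesses.append(declared.group(1).decode("ascii"))
    # ISO-8859-1 can decode any byte sequence, so it serves as the last resort.
    guesses += ["utf-8", "iso-8859-1"]
    seen = set()
    for enc in guesses:
        enc = enc.lower()
        if enc not in seen:
            seen.add(enc)
            yield enc


def parse_with_guesses(data, override_encodings=()):
    """Feed the raw bytes to a fresh parser once per guess.

    Returns (root_element, encoding) for the first guess that parses;
    raises ValueError if every guess fails.
    """
    errors = []
    for enc in candidate_encodings(data, override_encodings):
        # The bytes are never decoded by us; the parser is told which encoding to use.
        parser = ET.XMLParser(encoding=enc)
        try:
            parser.feed(data)
            return parser.close(), enc
        except (ET.ParseError, LookupError, ValueError) as exc:
            errors.append(f"{enc}: {exc}")
    raise ValueError("no candidate encoding worked: " + "; ".join(errors))
```

For example, `parse_with_guesses("<r>café</r>".encode("latin-1"))` fails under the UTF-8 guess (the byte `0xE9` is not valid UTF-8 there) and then succeeds as ISO-8859-1. Keeping the guessing in its own generator, separate from the parsing loop, mirrors the split this commit describes between encoding detection and the code that uses the guesses.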
Diffstat (limited to 'NEWS.txt')
-rw-r--r-- NEWS.txt 18
1 file changed, 18 insertions, 0 deletions
diff --git a/NEWS.txt b/NEWS.txt
index bb90d04..3d0846f 100644
--- a/NEWS.txt
+++ b/NEWS.txt
@@ -1,3 +1,21 @@
+= 4.3.0 (Unreleased) =
+
+* Instead of converting incoming data to Unicode and feeding it to the
+ lxml tree builder, Beautiful Soup now makes successive guesses at
+ the encoding of the incoming data, and tells lxml to parse the data
+ as that encoding. This improves performance and avoids an issue in
+ which lxml was refusing to parse strings because they were Unicode
+ strings.
+
+ This required a major overhaul of the tree builder architecture. If
+ you wrote your own tree builder and didn't tell me, you'll need to
+ modify your prepare_markup() method.
+
+* The UnicodeDammit code that makes guesses at encodings has been
+ split into its own class, EncodingDetector. A lot of apparently
+ redundant code has been removed from Unicode, Dammit, and some
+ undocumented features have also been removed.
+
= 4.2.1 (20130531) =
* The default XML formatter will now replace ampersands even if they