author | Leonard Richardson <leonard.richardson@canonical.com> | 2013-06-02 22:19:37 -0400
---|---|---
committer | Leonard Richardson <leonard.richardson@canonical.com> | 2013-06-02 22:19:37 -0400
commit | 4a9444ac0b74fbf84cf86b9fcf6055c85e65f62a |
tree | 570cbcb2c9ab9cf458edee87490afeffd8377560 /NEWS.txt |
parent | 11dad27424b319a2034f59f5a7f48286551102d0 |
parent | 4f9a654766df9ddd05e3ef274b4715b42668724f |
Merged in big encoding-detection refactoring branch.
Diffstat (limited to 'NEWS.txt')
-rw-r--r-- | NEWS.txt | 18 |
1 files changed, 18 insertions, 0 deletions
```diff
@@ -1,3 +1,21 @@
+= 4.3.0 (Unreleased) =
+
+* Instead of converting incoming data to Unicode and feeding it to the
+  lxml tree builder, Beautiful Soup now makes successive guesses at
+  the encoding of the incoming data, and tells lxml to parse the data
+  as that encoding. This improves performance and avoids an issue in
+  which lxml was refusing to parse strings because they were Unicode
+  strings.
+
+  This required a major overhaul of the tree builder architecture. If
+  you wrote your own tree builder and didn't tell me, you'll need to
+  modify your prepare_markup() method.
+
+* The UnicodeDammit code that makes guesses at encodings has been
+  split into its own class, EncodingDetector. A lot of apparently
+  redundant code has been removed from Unicode, Dammit, and some
+  undocumented features have also been removed.
+
 = 4.2.1 (20130531) =
 
 * The default XML formatter will now replace ampersands even if they
```
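The entry says Beautiful Soup now "makes successive guesses" at the encoding. Below is a rough, stdlib-only sketch of what such a guesser can look like. It is not EncodingDetector's actual code: `guess_encodings`, its parameters, and the order of guesses (caller-supplied overrides, byte-order mark, in-document declaration, then UTF-8 and Windows-1252 as fallbacks) are assumptions made for illustration.

```python
import codecs
import re

# Byte-order marks, with UTF-32 checked before UTF-16 because the UTF-32 LE
# mark begins with the same two bytes as the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Encoding declared inside the document (XML declaration or HTML meta tag).
_XML_DECL = re.compile(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']')
_HTML_META = re.compile(rb'<meta[^>]*charset=["\']?([A-Za-z0-9._-]+)', re.IGNORECASE)


def _raw_guesses(markup, override_encodings, is_html):
    """Yield encoding names in order of decreasing trust (may repeat)."""
    yield from override_encodings  # whatever the caller specifies comes first
    for bom, name in _BOMS:
        if markup.startswith(bom):
            yield name
            break
    match = _XML_DECL.search(markup)
    if match is None and is_html:
        match = _HTML_META.search(markup)
    if match is not None:
        yield match.group(1).decode("ascii")
    # Last-resort guesses (an assumption of this sketch).
    yield "utf-8"
    yield "windows-1252"


def guess_encodings(markup, override_encodings=(), is_html=False):
    """Hypothetical helper: yield each distinct, known encoding once."""
    seen = set()
    for name in _raw_guesses(markup, override_encodings, is_html):
        try:
            canonical = codecs.lookup(name).name  # e.g. "UTF8" -> "utf-8"
        except LookupError:
            continue  # skip names Python has no codec for
        if canonical not in seen:
            seen.add(canonical)
            yield canonical
```

Because `guess_encodings` is a generator, a caller can stop at the first guess that parses cleanly and never compute the rest.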
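The other half of the change is that the parser, rather than Beautiful Soup, decodes the raw bytes under each guessed encoding, and the next guess is tried if parsing fails. As a stand-in for lxml, this sketch uses the standard library's `xml.etree.ElementTree.XMLParser`, whose `encoding` argument overrides whatever encoding the document declares. `parse_with_guesses` is a hypothetical helper for illustration; it is not Beautiful Soup's `prepare_markup()` signature.

```python
import xml.etree.ElementTree as ET


def parse_with_guesses(markup: bytes, encodings):
    """Hypothetical helper: hand the raw bytes to the parser under each
    candidate encoding in turn; return (root_element, encoding_used)."""
    errors = []
    for encoding in encodings:
        # The parser does the decoding itself; we never build a str first.
        parser = ET.XMLParser(encoding=encoding)
        try:
            parser.feed(markup)
            return parser.close(), encoding
        except (ET.ParseError, LookupError, ValueError) as exc:
            errors.append(f"{encoding}: {exc}")  # remember why, try the next guess
    raise ValueError("no candidate encoding worked: " + "; ".join(errors))
```

Feeding bytes plus an encoding, rather than a pre-decoded string, is what sidesteps the problem the entry describes of a parser refusing Unicode input.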