Prep for an alpha release.

author: Leonard Richardson <leonard.richardson@canonical.com> 2011-02-27 20:21:18 -0500
committer: Leonard Richardson <leonard.richardson@canonical.com> 2011-02-27 20:21:18 -0500
commit: 57c9ba5583abb7209a1613a54c5c55f7c779a88f (patch)
tree: 466ab17969cb3cd40b2359e24535d2929bb898a9
parent: 63dc8117e8f396b25688623c7f1920b4f0911373 (diff)
2 files changed, 24 insertions, 6 deletions
diff --git a/CHANGELOG b/CHANGELOG
index 4449279..6d05ae4 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -1,5 +1,22 @@
 = 4.0 =
 
+This is a nearly-complete rewrite that removes Beautiful Soup's custom
+HTML parser in favor of a system that lets you write a little glue
+code and plug in whatever HTML or XML parser you want.
+
+Beautiful Soup 4.0 comes with glue code for four parsers: Python's
+HTMLParser, lxml's HTML and XML parsers, and html5lib's HTML
+parser. HTMLParser is the default, but I recommend you install one of
+the other parsers, or you'll have problems handling real-world HTML.
+
+== The module name has changed ==
+
+Previously you imported the BeautifulSoup class from a module also
+called BeautifulSoup. To save keystrokes and make it clear which
+version of the API is in use, the module is now called 'bs4':
+
+    >>> from bs4 import BeautifulSoup
+
 == Better method names ==
 
 Methods have been renamed to comply with PEP 8. The old names still
@@ -25,7 +42,6 @@ So have some arguments to popular methods:
 
  * BeautifulSoup(parseOnlyThese=...) -> BeautifulSoup(parse_only=...)
  * BeautifulSoup(fromEncoding=...) -> BeautifulSoup(from_encoding=...)
- * Tag.encode(prettyPrint=...) -> Tag.encode(pretty_print=...)
 
 == Generators are now properties ==
 
@@ -77,12 +93,13 @@ being an empty-element tag.
 
 An HTML or XML entity is always converted into the corresponding
 Unicode character. There are no longer any smartQuotesTo or
-convert_entities arguments. (Unicode Dammit still has smart_quotes_to,
-but the default is now to turn smart quotes into Unicode.)
+convertEntities arguments. (Unicode, Dammit still has smart_quotes_to,
+but its default is now to turn smart quotes into Unicode.)
 
 == CDATA sections are normal text, if they're understood at all. ==
 
-Currently, both HTML parsers ignore CDATA sections in markup:
+Currently, the lxml and html5lib HTML parsers ignore CDATA sections in
+markup:
 
  <p><![CDATA[foo]]></p> => <p></p>
 
diff --git a/README.txt b/README.txt
index 6e789c2..8baa022 100644
--- a/README.txt
+++ b/README.txt
@@ -1,8 +1,9 @@
 = About Beautiful Soup 4 =
 
 Earlier versions of Beautiful Soup included a custom HTML
-parser. Beautiful Soup 4 does not include a parser. You'll need to
-install either lxml or html5lib.
+parser. Beautiful Soup 4 uses Python's default HTMLParser, which does
+fairly poorly on real-world HTML. By installing lxml or html5lib you
+can get more accurate parsing and possibly better performance as well.
 
 = Introduction =
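
For readers applying these notes, here is a minimal sketch (not part of this commit) of choosing a parser along the lines the README recommends: prefer lxml or html5lib when one is installed, otherwise fall back to Python's built-in parser. The helper names `pick_parser` and `make_soup` are hypothetical, and the parser-name strings passed to `BeautifulSoup` follow the convention of later bs4 releases; the alpha described here may select parsers differently.

```python
import importlib.util


def pick_parser(preferred=("lxml", "html5lib")):
    """Return the first installed parser from `preferred`, else the stdlib one."""
    for name in preferred:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is not None:
            return name
    # Assumption: "html.parser" is the name later bs4 releases use for
    # Python's built-in HTMLParser.
    return "html.parser"


def make_soup(markup):
    """Parse `markup` with the best available parser (requires bs4)."""
    # The module is now called bs4, per the CHANGELOG entry above.
    from bs4 import BeautifulSoup
    return BeautifulSoup(markup, pick_parser())
```

Calling `make_soup("<p>hello</p>")` would then use lxml or html5lib if either is present, and only fall back to the built-in parser when neither is.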