author Leonard Richardson <leonard.richardson@canonical.com> 2011-02-27 20:21:18 -0500
committer Leonard Richardson <leonard.richardson@canonical.com> 2011-02-27 20:21:18 -0500
commit 57c9ba5583abb7209a1613a54c5c55f7c779a88f
tree 466ab17969cb3cd40b2359e24535d2929bb898a9
parent 63dc8117e8f396b25688623c7f1920b4f0911373
Prep for an alpha release.
-rw-r--r-- CHANGELOG 25
-rw-r--r-- README.txt 5
2 files changed, 24 insertions, 6 deletions
diff --git a/CHANGELOG b/CHANGELOG
index 4449279..6d05ae4 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -1,5 +1,22 @@
= 4.0 =
+This is a nearly-complete rewrite that removes Beautiful Soup's custom
+HTML parser in favor of a system that lets you write a little glue
+code and plug in whatever HTML or XML parser you want.
+
+Beautiful Soup 4.0 comes with glue code for four parsers: Python's
+HTMLParser, lxml's HTML and XML parsers, and html5lib's HTML
+parser. HTMLParser is the default, but I recommend you install one of
+the other parsers, or you'll have problems handling real-world HTML.
+
+== The module name has changed ==
+
+Previously you imported the BeautifulSoup class from a module also
+called BeautifulSoup. To save keystrokes and make it clear which
+version of the API is in use, the module is now called 'bs4':
+
+ >>> from bs4 import BeautifulSoup
+
== Better method names ==
Methods have been renamed to comply with PEP 8. The old names still
@@ -25,7 +42,6 @@ So have some arguments to popular methods:
* BeautifulSoup(parseOnlyThese=...) -> BeautifulSoup(parse_only=...)
* BeautifulSoup(fromEncoding=...) -> BeautifulSoup(from_encoding=...)
- * Tag.encode(prettyPrint=...) -> Tag.encode(pretty_print=...)
== Generators are now properties ==
@@ -77,12 +93,13 @@ being an empty-element tag.
An HTML or XML entity is always converted into the corresponding
Unicode character. There are no longer any smartQuotesTo or
-convert_entities arguments. (Unicode Dammit still has smart_quotes_to,
-but the default is now to turn smart quotes into Unicode.)
+convertEntities arguments. (Unicode, Dammit still has smart_quotes_to,
+but its default is now to turn smart quotes into Unicode.)
== CDATA sections are normal text, if they're understood at all. ==
-Currently, both HTML parsers ignore CDATA sections in markup:
+Currently, the lxml and html5lib HTML parsers ignore CDATA sections in
+markup:
<p><![CDATA[foo]]></p> => <p></p>
diff --git a/README.txt b/README.txt
index 6e789c2..8baa022 100644
--- a/README.txt
+++ b/README.txt
@@ -1,8 +1,9 @@
= About Beautiful Soup 4 =
Earlier versions of Beautiful Soup included a custom HTML
-parser. Beautiful Soup 4 does not include a parser. You'll need to
-install either lxml or html5lib.
+parser. Beautiful Soup 4 uses Python's default HTMLParser, which does
+fairly poorly on real-world HTML. By installing lxml or html5lib you
+can get more accurate parsing and possibly better performance as well.
= Introduction =
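
The CHANGELOG and README changes above describe a new import path (`bs4`) and recommend installing lxml or html5lib instead of relying on the default HTMLParser. A minimal sketch of that advice follows. It assumes the parser names accepted by later released versions of bs4 (`"lxml"`, `"html5lib"`, `"html.parser"`), which may not match the alpha prepared by this commit. `pick_parser` and `make_soup` are hypothetical helper names, not part of Beautiful Soup.

```python
import importlib.util

# (module to look for, parser name to hand to BeautifulSoup).
# Assumption: these names follow later bs4 releases, not necessarily
# the 4.0 alpha described in this commit.
PREFERRED_PARSERS = [
    ("lxml", "lxml"),
    ("html5lib", "html5lib"),
]


def pick_parser():
    """Return the name of the first installed third-party parser.

    Falls back to the standard library parser when neither lxml nor
    html5lib can be imported.
    """
    for module_name, parser_name in PREFERRED_PARSERS:
        if importlib.util.find_spec(module_name) is not None:
            return parser_name
    return "html.parser"


def make_soup(markup):
    """Parse markup with the best available parser (hypothetical helper)."""
    # The module is now called 'bs4', per the CHANGELOG entry above.
    from bs4 import BeautifulSoup
    return BeautifulSoup(markup, pick_parser())


if __name__ == "__main__":
    print("parser chosen:", pick_parser())
    # Only parse if bs4 itself is installed.
    if importlib.util.find_spec("bs4") is not None:
        soup = make_soup("<p>Fish &amp; chips</p>")
        print(soup.p.get_text())
```

Checking for the parser modules up front keeps the choice explicit, so the same markup is not silently parsed differently on machines with different packages installed.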