summaryrefslogtreecommitdiff
path: root/CHANGELOG
diff options
context:
space:
mode:
authorLeonard Richardson <leonard.richardson@canonical.com>2012-02-08 09:21:39 -0500
committerLeonard Richardson <leonard.richardson@canonical.com>2012-02-08 09:21:39 -0500
commit23ec4e144b4d737e8fb8712e35532bb9f5e67cbf (patch)
tree2a42f41e14eda4b9acac56052a647de8a07fbcfc /CHANGELOG
parent428752ad43d7e9925f8e8b1ff884a41efb451a93 (diff)
Moved around a bunch of metadata.
Diffstat (limited to 'CHANGELOG')
-rw-r--r--CHANGELOG229
1 files changed, 0 insertions, 229 deletions
diff --git a/CHANGELOG b/CHANGELOG
deleted file mode 100644
index b0ad7be..0000000
--- a/CHANGELOG
+++ /dev/null
@@ -1,229 +0,0 @@
-= 4.0 beta 4 =
-
-Added BeautifulSoup.new_string() to go along with BeautifulSoup.new_tag()
-
-BeautifulSoup.new_tag() will follow the rules of whatever tree-builder
-was used to create the original BeautifulSoup object. A new <p> tag
-will look like "<p />" if the soup object was created to parse XML,
-but it will look like "<p></p>" if the soup object was created to
-parse HTML.
-
-We pass in strict=False to html.parser on Python 3, greatly improving
-html.parser's ability to handle bad HTML.
-
-Monkeypatch a serious bug in html.parser that made strict=False
-disastrous on Python 3.2.2.
-
-Replaced the "substitute_html_entities" argument with the "formatter" argument.
-
-Bare ampersands and angle brackets are always converted to XML
-entities unless the user prevents it.
-
-Added PageElement.insert_before().
-
-Added PageElement.insert_after().
-
-Raise an exception when the user tries to do something nonsensical
-like insert a tag into itself.
-
-= 4.0.0b3 =
-
-Beautiful Soup 4 is a nearly-complete rewrite that removes Beautiful
-Soup's custom HTML parser in favor of a system that lets you write a
-little glue code and plug in any HTML or XML parser you want.
-
-Beautiful Soup 4.0 comes with glue code for four parsers:
-
- * Python's standard HTMLParser (html.parser in Python 3)
- * lxml's HTML and XML parsers
- * html5lib's HTML parser
-
-HTMLParser is the default, but I recommend you install lxml if you
-can.
-
-For complete documentation, see the Sphinx documentation in
-bs4/doc/source/. What follows is a summary of the changes from
-Beautiful Soup 3.
-
-=== The module name has changed ===
-
-Previously you imported the BeautifulSoup class from a module also
-called BeautifulSoup. To save keystrokes and make it clear which
-version of the API is in use, the module is now called 'bs4':
-
- >>> from bs4 import BeautifulSoup
-
-=== It works with Python 3 ===
-
-Beautiful Soup 3.1.0 worked with Python 3, but the parser it used was
-so bad that it barely worked at all. Beautiful Soup 4 works with
-Python 3, and since its parser is pluggable, you don't sacrifice
-quality.
-
-Special thanks to Thomas Kluyver and Ezio Melotti for getting Python 3
-support to the finish line. Ezio Melotti is also to thank for greatly
-improving the HTML parser that comes with Python 3.2.
-
-=== CDATA sections are normal text, if they're understood at all. ===
-
-Currently, the lxml and html5lib HTML parsers ignore CDATA sections in
-markup:
-
- <p><![CDATA[foo]]></p> => <p></p>
-
-A future version of html5lib will turn CDATA sections into text nodes,
-but only within tags like <svg> and <math>:
-
- <svg><![CDATA[foo]]></svg> => <p>foo</p>
-
-The default XML parser (which uses lxml behind the scenes) turns CDATA
-sections into ordinary text elements:
-
- <p><![CDATA[foo]]></p> => <p>foo</p>
-
-In theory it's possible to preserve the CDATA sections when using the
-XML parser, but I don't see how to get it to work in practice.
-
-=== Miscellaneous other stuff ===
-
-If the BeautifulSoup instance has .is_xml set to True, an appropriate
-XML declaration will be emitted when the tree is transformed into a
-string:
-
- <?xml version="1.0" encoding="utf-8">
- <markup>
- ...
- </markup>
-
-The ['lxml', 'xml'] tree builder sets .is_xml to True; the other tree
-builders set it to False. If you want to parse XHTML with an HTML
-parser, you can set it manually.
-
-
-= 3.2.0 =
-
-The 3.1 series wasn't very useful, so I renamed the 3.0 series to 3.2
-to make it obvious which one you should use.
-
-= 3.1.0 =
-
-A hybrid version that supports 2.4 and can be automatically converted
-to run under Python 3.0. There are three backwards-incompatible
-changes you should be aware of, but no new features or deliberate
-behavior changes.
-
-1. str() may no longer do what you want. This is because the meaning
-of str() inverts between Python 2 and 3; in Python 2 it gives you a
-byte string, in Python 3 it gives you a Unicode string.
-
-The effect of this is that you can't pass an encoding to .__str__
-anymore. Use encode() to get a string and decode() to get Unicode, and
-you'll be ready (well, readier) for Python 3.
-
-2. Beautiful Soup is now based on HTMLParser rather than SGMLParser,
-which is gone in Python 3. There's some bad HTML that SGMLParser
-handled but HTMLParser doesn't, usually to do with attribute values
-that aren't closed or have brackets inside them:
-
- <a href="foo</a>, </a><a href="bar">baz</a>
- <a b="<a>">', '<a b="&lt;a&gt;"></a><a>"></a>
-
-A later version of Beautiful Soup will allow you to plug in different
-parsers to make tradeoffs between speed and the ability to handle bad
-HTML.
-
-3. In Python 3 (but not Python 2), HTMLParser converts entities within
-attributes to the corresponding Unicode characters. In Python 2 it's
-possible to parse this string and leave the &eacute; intact.
-
- <a href="http://crummy.com?sacr&eacute;&bleu">
-
-In Python 3, the &eacute; is always converted to \xe9 during
-parsing.
-
-
-= 3.0.7a =
-
-Added an import that makes BS work in Python 2.3.
-
-
-= 3.0.7 =
-
-Fixed a UnicodeDecodeError when unpickling documents that contain
-non-ASCII characters.
-
-Fixed a TypeError that occured in some circumstances when a tag
-contained no text.
-
-Jump through hoops to avoid the use of chardet, which can be extremely
-slow in some circumstances. UTF-8 documents should never trigger the
-use of chardet.
-
-Whitespace is preserved inside <pre> and <textarea> tags that contain
-nothing but whitespace.
-
-Beautiful Soup can now parse a doctype that's scoped to an XML namespace.
-
-
-= 3.0.6 =
-
-Got rid of a very old debug line that prevented chardet from working.
-
-Added a Tag.decompose() method that completely disconnects a tree or a
-subset of a tree, breaking it up into bite-sized pieces that are
-easy for the garbage collecter to collect.
-
-Tag.extract() now returns the tag that was extracted.
-
-Tag.findNext() now does something with the keyword arguments you pass
-it instead of dropping them on the floor.
-
-Fixed a Unicode conversion bug.
-
-Fixed a bug that garbled some <meta> tags when rewriting them.
-
-
-= 3.0.5 =
-
-Soup objects can now be pickled, and copied with copy.deepcopy.
-
-Tag.append now works properly on existing BS objects. (It wasn't
-originally intended for outside use, but it can be now.) (Giles
-Radford)
-
-Passing in a nonexistent encoding will no longer crash the parser on
-Python 2.4 (John Nagle).
-
-Fixed an underlying bug in SGMLParser that thinks ASCII has 255
-characters instead of 127 (John Nagle).
-
-Entities are converted more consistently to Unicode characters.
-
-Entity references in attribute values are now converted to Unicode
-characters when appropriate. Numeric entities are always converted,
-because SGMLParser always converts them outside of attribute values.
-
-ALL_ENTITIES happens to just be the XHTML entities, so I renamed it to
-XHTML_ENTITIES.
-
-The regular expression for bare ampersands was too loose. In some
-cases ampersands were not being escaped. (Sam Ruby?)
-
-Non-breaking spaces and other special Unicode space characters are no
-longer folded to ASCII spaces. (Robert Leftwich)
-
-Information inside a TEXTAREA tag is now parsed literally, not as HTML
-tags. TEXTAREA now works exactly the same way as SCRIPT. (Zephyr Fang)
-
-
-= 3.0.4 =
-
-Fixed a bug that crashed Unicode conversion in some cases.
-
-Fixed a bug that prevented UnicodeDammit from being used as a
-general-purpose data scrubber.
-
-Fixed some unit test failures when running against Python 2.5.
-
-When considering whether to convert smart quotes, UnicodeDammit now
-looks at the original encoding in a case-insensitive way.