From 3ed96ff67abaa06a1784153bc45fd68ffa121872 Mon Sep 17 00:00:00 2001
From: Leonard Richardson
Date: Wed, 8 Feb 2012 09:48:00 -0500
Subject: Moved the historical changelog into NEWS.

---
 NEWS.txt   | 391 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 README.txt |   2 +
 2 files changed, 390 insertions(+), 3 deletions(-)

diff --git a/NEWS.txt b/NEWS.txt
index 564a3fe..8515b80 100644
--- a/NEWS.txt
+++ b/NEWS.txt
@@ -1,4 +1,4 @@
-= 4.0.0b4 =
+= 4.0.0b4 (20120208) =
 
 * Added BeautifulSoup.new_string() to go along with
 BeautifulSoup.new_tag()
@@ -28,7 +28,7 @@ like insert a tag into itself.
-= 4.0.0b3 =
+= 4.0.0b3 (20120203) =
 
 Beautiful Soup 4 is a nearly-complete rewrite that removes Beautiful
 Soup's custom HTML parser in favor of a system that lets you write a
@@ -217,7 +217,6 @@ longer folded to ASCII spaces. (Robert Leftwich)
 
 Information inside a TEXTAREA tag is now parsed literally, not as HTML
 tags. TEXTAREA now works exactly the same way as SCRIPT. (Zephyr Fang)
-
 = 3.0.4 =
 
 Fixed a bug that crashed Unicode conversion in some cases.
@@ -229,3 +228,389 @@ Fixed some unit test failures when running against Python 2.5.
 
 When considering whether to convert smart quotes, UnicodeDammit now
 looks at the original encoding in a case-insensitive way.
+
+= 3.0.3 (20060606) =
+
+Beautiful Soup is now usable as a way to clean up invalid XML/HTML (be
+sure to pass in an appropriate value for convertEntities, or XML/HTML
+entities might stick around that aren't valid in HTML/XML). The result
+may not validate, but it should be good enough to not choke a
+real-world XML parser. Specifically, the output of a properly
+constructed soup object should always be valid as part of an XML
+document, but parts may be missing if they were missing in the
+original. As always, if the input is valid XML, the output will also
+be valid.
+
+= 3.0.2 (20060602) =
+
+Previously, Beautiful Soup correctly handled attribute values that
+contained embedded quotes (sometimes by escaping), but not other kinds
+of XML character. Now, it correctly handles or escapes all special XML
+characters in attribute values.
+
+I aliased methods to the 2.x names (fetch, find, findText, etc.) for
+backwards compatibility purposes. Those names are deprecated and if I
+ever do a 4.0 I will remove them. I will, I tell you!
+
+Fixed a bug where the findAll method wasn't passing along any keyword
+arguments.
+
+When run from the command line, Beautiful Soup now acts as an HTML
+pretty-printer, not an XML pretty-printer.
+
+= 3.0.1 (20060530) =
+
+Reintroduced the "fetch by CSS class" shortcut. I thought keyword
+arguments would replace it, but they don't. You can't call soup('a',
+class='foo') because class is a Python keyword.
+
+If Beautiful Soup encounters a meta tag that declares the encoding,
+but a SoupStrainer tells it not to parse that tag, Beautiful Soup will
+no longer try to rewrite the meta tag to mention the new
+encoding. Basically, this makes SoupStrainers work in real-world
+applications instead of crashing the parser.
+
+= 3.0.0 "Who would not give all else for two p" (20060528) =
+
+This release is not backward-compatible with previous releases. If
+you've got code written with a previous version of the library, go
+ahead and keep using it, unless one of the features mentioned here
+really makes your life easier. Since the library is self-contained,
+you can include an old copy of the library in your old applications,
+and use the new version for everything else.
+
+The documentation has been rewritten and greatly expanded with many
+more examples.
+
+Beautiful Soup autodetects the encoding of a document (or uses the one
+you specify), and converts it from its native encoding to
+Unicode. Internally, it only deals with Unicode strings. When you
+print out the document, it converts to UTF-8 (or another encoding you
+specify). [Doc reference]
+
+It's now easy to make large-scale changes to the parse tree without
+screwing up the navigation members. The methods are extract,
+replaceWith, and insert. [Doc reference. See also Improving Memory
+Usage with extract]
+
+Passing True in as an attribute value gives you tags that have any
+value for that attribute. You don't have to create a regular
+expression. Passing None for an attribute value gives you tags that
+don't have that attribute at all.
+
+Tag objects now know whether or not they're self-closing. This avoids
+the problem where Beautiful Soup thought that tags like
were
+self-closing even in XML documents. You can customize the self-closing
+tags for a parser object by passing them in as a list of
+selfClosingTags: you don't have to subclass anymore.
+
+There's a new built-in parser, MinimalSoup, which has most of
+BeautifulSoup's HTML-specific rules, but no tag nesting rules. [Doc
+reference]
+
+You can use a SoupStrainer to tell Beautiful Soup to parse only part
+of a document. This saves time and memory, often making Beautiful Soup
+about as fast as a custom-built SGMLParser subclass. [Doc reference,
+SoupStrainer reference]
+
+You can (usually) use keyword arguments instead of passing a
+dictionary of attributes to a search method. That is, you can replace
+soup(args={"id" : "5"}) with soup(id="5"). You can still use args if
+(for instance) you need to find an attribute whose name clashes with
+the name of an argument to findAll. [Doc reference: **kwargs attrs]
+
+The method names have changed to the better method names used in
+Rubyful Soup. Instead of find methods and fetch methods, there are
+only find methods. Instead of a scheme where you can't remember which
+method finds one element and which one finds them all, we have find
+and findAll. In general, if the method name mentions All or a plural
+noun (eg. findNextSiblings), then it finds many elements
+method. Otherwise, it only finds one element. [Doc reference]
+
+Some of the argument names have been renamed for clarity. For instance
+avoidParserProblems is now parserMassage.
+
+Beautiful Soup no longer implements a feed method. You need to pass a
+string or a filehandle into the soup constructor, not with feed after
+the soup has been created. There is still a feed method, but it's the
+feed method implemented by SGMLParser and calling it will bypass
+Beautiful Soup and cause problems.
+
+The NavigableText class has been renamed to NavigableString. There is
+no NavigableUnicodeString anymore, because every string inside a
+Beautiful Soup parse tree is a Unicode string.
+
+findText and fetchText are gone. Just pass a text argument into find
+or findAll.
+
+Null was more trouble than it was worth, so I got rid of it. Anything
+that used to return Null now returns None.
+
+Special XML constructs like comments and CDATA now have their own
+NavigableString subclasses, instead of being treated as oddly-formed
+data. If you parse a document that contains CDATA and write it back
+out, the CDATA will still be there.
+
+When you're parsing a document, you can get Beautiful Soup to convert
+XML or HTML entities into the corresponding Unicode characters. [Doc
+reference]
+
+= 2.1.1 (20050918) =
+
+Fixed a serious performance bug in BeautifulStoneSoup which was
+causing parsing to be incredibly slow.
+
+Corrected several entities that were previously being incorrectly
+translated from Microsoft smart-quote-like characters.
+
+Fixed a bug that was breaking text fetch.
+
+Fixed a bug that crashed the parser when text chunks that look like
+HTML tag names showed up within a SCRIPT tag.
+
+THEAD, TBODY, and TFOOT tags are now nestable within TABLE
+tags. Nested tables should parse more sensibly now.
+
+BASE is now considered a self-closing tag.
+
+= 2.1.0 "Game, or any other dish?" (20050504) =
+
+Added a wide variety of new search methods which, given a starting
+point inside the tree, follow a particular navigation member (like
+nextSibling) over and over again, looking for Tag and NavigableText
+objects that match certain criteria. The new methods are findNext,
+fetchNext, findPrevious, fetchPrevious, findNextSibling,
+fetchNextSiblings, findPreviousSibling, fetchPreviousSiblings,
+findParent, and fetchParents. All of these use the same basic code
+used by first and fetch, so you can pass your weird ways of matching
+things into these methods.
+
+The fetch method and its derivatives now accept a limit argument.
+
+You can now pass keyword arguments when calling a Tag object as though
+it were a method.
+
+Fixed a bug that caused all hand-created tags to share a single set of
+attributes.
+
+= 2.0.3 (20050501) =
+
+Fixed Python 2.2 support for iterators.
+
+Fixed a bug that gave the wrong representation to tags within quote
+tags like