Age | Commit message | Author
2012-07-03 | Mentioned cchardet in docs. | Leonard Richardson
2012-07-03 | When sniffing encodings, if the cchardet library is installed, use it instead of chardet. It's much faster. [bug=1020748] | Leonard Richardson
2012-07-03 | Use logging.warning() instead of warnings.warn() to notify the user that characters were replaced with REPLACEMENT CHARACTER. [bug=1013862] | Leonard Richardson
2012-07-03 | Prep for release. | Leonard Richardson
2012-07-03 | get_text() now returns an empty Unicode string if there is no text, rather than an empty bytestring. [bug=1020387] | Leonard Richardson
2012-07-03 | Added test for bug 1020300. | Leonard Richardson
2012-07-02 | Fixed a typo that made parsing much slower than it should have been. [bug=1020268] | Leonard Richardson
2012-07-02 | Correctly handle closing tags with an XML namespace declared. Patch by Andreas Kostyrka. [bug=1019635] | Leonard Richardson
2012-06-30 | Fixed an html5lib tree builder crash which happened when html5lib moved a tag with a multivalued attribute from one part of the tree to another. [bug=1019603] | Leonard Richardson
2012-06-10 | Made it clear in the doc that Beautiful Soup calls search() on regular expressions, not match() | Leonard Richardson
2012-05-29 | Removed breakpoints. | Leonard Richardson
2012-05-29 | Prep for release. | Leonard Richardson
2012-05-24 | Fixed the inability to search for non-ASCII attribute values. [bug=1003974] This caused a major refactoring of the search code. All the tests pass, but it's possible that some searches will behave differently. | Leonard Richardson
2012-05-24 | Fixed the basic failure in [bug=1003974], but not more advanced cases. | Leonard Richardson
2012-05-24 | Fixed some edge-case bugs having to do with inserting an element into a tag it's already inside, and replacing one of a tag's children with another. [bug=997529] | Leonard Richardson
2012-05-24 | Fixed a bug with the lxml treebuilder that prevented the user from adding attributes to a tag that didn't originally have any. [bug=1002378] Thanks to Oliver Beattie for the patch. | Leonard Richardson
2012-05-24 | Comments, processing instructions, document type declarations, and markup declarations are now treated as preformatted strings, the way CData blocks are. [bug=1001025] Also in this commit: renamed detwingle method to detwingle(). | Leonard Richardson
2012-05-03 | Fixed the handling of &quot; with the built-in parser. [bug=993871] | Leonard Richardson
2012-04-27 | Fixed NEWS. | Leonard Richardson
2012-04-27 | Added experimental support for fixing Windows-1252 characters embedded in UTF-8 documents. | Leonard Richardson
2012-04-27 | Tweaked doc. | Leonard Richardson
2012-04-27 | Removed completed TODO. | Leonard Richardson
2012-04-27 | Prep for release. | Leonard Richardson
2012-04-26 | Added a new method, wrap(). | Leonard Richardson
2012-04-26 | Renamed replace_with_children() to the jQuery name, unwrap(). | Leonard Richardson
2012-04-26 | Fixed a bug in decoding data that contained a byte-order mark, such as data encoded in UTF-16LE. [bug=988980] | Leonard Richardson
2012-04-26 | Upon document generation, CData objects are no longer run through the formatter. [bug=988905] | Leonard Richardson
2012-04-26 | The test suite now passes when lxml is not installed, whether or not html5lib is installed. [bug=987004] | Leonard Richardson
2012-04-26 | Fixed test failure when lxml is not installed. | Leonard Richardson
2012-04-18 | Got rid of contains_substitutions. | Leonard Richardson
2012-04-18 | Made encoding substitution in <meta> tags completely transparent (no more %SOUP-ENCODING%). | Leonard Richardson
2012-04-18 | Changed wording slightly. | Leonard Richardson
2012-04-18 | Print a warning on HTMLParseErrors to let people know they should install an external parser. | Leonard Richardson
2012-04-18 | Fixed a bug that made the HTMLParser treebuilder generate XML definitions ending with two question marks instead of one. [bug=984258] | Leonard Richardson
2012-04-16 | Prep for release. | Leonard Richardson
2012-04-16 | Doc update. | Leonard Richardson
2012-04-16 | Unicode, Dammit now has an option to turn MS smart quotes into ASCII characters. | Leonard Richardson
2012-04-16 | Attribute values are now run through the provided output formatter. Previously they were always run through the 'minimal' formatter. [bug=980237] | Leonard Richardson
2012-04-16 | Fixed a bug with the string setter that moved a string around the tree instead of copying it. [bug=983050] | Leonard Richardson
2012-04-16 | Give a more useful error when the user tries to run the Python 2 version of BS under Python 3. | Leonard Richardson
2012-04-11 | Added more common errors to doc. | Leonard Richardson
2012-04-11 | Added renderContents back. | Leonard Richardson
2012-04-07 | Have objects_was_parsed set the previous element's next_element if possible. [bug=975926] | Leonard Richardson
2012-04-03 | Prep for release. | Leonard Richardson
2012-04-03 | Got rid of the 4.0.2 workaround for HTML documents--it was unnecessary and the workaround was triggering a (possibly different, but related) bug in lxml. [bug=972466] | Leonard Richardson
2012-04-03 | Don't split up the markup into chunks when using the lxml HTML parser, which doesn't have the problems of the XML parser. | Leonard Richardson
2012-03-30 | Corrected typo. | Leonard Richardson
2012-03-30 | Corrected typo. | Leonard Richardson
2012-03-30 | Mentioned the empty-list problem people often encounter. | Leonard Richardson
2012-03-30 | Added a section to the documentation on common errors. | Leonard Richardson