Age | Commit message | Author
2013-08-15 | Make sure the optimized find_all() ResultSets actually contain the right data. | Leonard Richardson
2013-08-13 | Fixed yet another problem with the html5lib tree builder, caused by html5lib's tendency to rearrange the tree during parsing. [bug=1189267] | Leonard Richardson
2013-08-12 | Prep for release. | Leonard Richardson
2013-08-12 | Fixed incorrect superclass in super() call. | Leonard Richardson
2013-08-12 | All find_all calls should now return a ResultSet object. Patch by Aaron DeVore. [bug=1194034] | Leonard Richardson
2013-08-12 | A little cleanup. | Leonard Richardson
2013-06-03 | Updated NEWS. | Leonard Richardson
2013-06-03 | A NavigableString object now has an immutable '.name' property whose value is always None. This makes it easier to iterate over a mixed list of tags and strings without having to check whether each element is a tag or a string. | Leonard Richardson
2013-06-03 | _last_descendant can be optimized in some cases. | Leonard Richardson
2013-06-03 | Save another Element creation. | Leonard Richardson
2013-06-03 | Improved performance for html5lib. | Leonard Richardson
2013-06-03 | Added raw html5lib to the list of parsers that get tested. | Leonard Richardson
2013-06-03 | Changed _popToTag to run through a single range instead of two. | Leonard Richardson
2013-06-03 | Improved _popToTag a tiny bit. | Leonard Richardson
2013-06-03 | Inlined some commonly called code to save a function call. | Leonard Richardson
2013-06-03 | Limit how much of the document is searched via regular expression for a declared encoding. | Leonard Richardson
2013-06-03 | Improved performance of _replace_cdata_list_attribute_values, and greatly reduced the number of times it is called. | Leonard Richardson
2013-06-03 | Made it a lot faster to check whether whitespace is being preserved. | Leonard Richardson
2013-06-03 | Put the more frequently-used ASCII spaces in front. | Leonard Richardson
2013-06-03 | Wrote a more efficient replacement for string.translate() when checking whether a string is nothing but ASCII spaces. | Leonard Richardson
2013-06-03 | Let's get some profiling going. | Leonard Richardson
2013-06-03 | Test that the filename warning isn't given unless the file actually exists on disk. | Leonard Richardson
2013-06-03 | Beautiful Soup will issue a warning if instead of markup you pass it a URL or the name of a file on disk (a common beginner mistake). | Leonard Richardson
2013-06-02 | Merged in big encoding-detection refactoring branch. | Leonard Richardson
2013-06-02 | Turns out we had two bits of code to strip byte-order marks. | Leonard Richardson
2013-06-02 | It turns out most of the untested code wasn't doing anything useful. | Leonard Richardson
2013-06-02 | Treat an lxml ParserError as a ParserRejectedMarkup. | Leonard Richardson
2013-05-31 | Prep for release. | Leonard Richardson
2013-05-31 | The html.parser treebuilder can now handle numeric attributes in text when the hexadecimal name of the attribute starts with a capital X. | Leonard Richardson
2013-05-31 | Reverted the patch that gives NavigableString a .name property, because that's too big an API change for a bugfix release. | Leonard Richardson
2013-05-31 | Create a new lxml parser object for every new parsing strategy. | Leonard Richardson
2013-05-30 | Refactored code a bit. | Leonard Richardson
2013-05-30 | Split out the code that guesses at encodings from the code that tries to decode a bytestring based on those encodings. This is necessary because lxml wants to do the decoding itself. | Leonard Richardson
2013-05-20 | The default XML formatter will now replace ampersands even if they appear to be part of entities. That is, "&lt;" will become "&amp;lt;". [bug=1182183] | Leonard Richardson
2013-05-20 | A NavigableString object now has an immutable '.name' property whose value is always None. This makes it easier to iterate over a mixed list of tags and strings without having to check whether each element is a tag or a string. | Leonard Richardson
2013-05-20 | The .previous_element of a BeautifulSoup object is now always None, | Leonard Richardson
2013-05-20 | The .next_element attribute used during parsing was confusingly similar to the .next_element navigation attribute. Renamed the former to _most_recent_element. | Leonard Richardson
2013-05-20 | Fixed another bug by which the html5lib tree builder could create a disconnected tree. [bug=1182089] | Leonard Richardson
2013-05-20 | Gave new_string() the ability to create subclasses of NavigableString. [bug=1181986] | Leonard Richardson
2013-05-20 | html5lib now supports Python 3. Fixed some Python 2-specific code in the html5lib test suite. [bug=1181624] | Leonard Richardson
2013-05-20 | Fixed test failures when lxml is not installed. | Leonard Richardson
2013-05-15 | How about actually parsing the same markup with different parsers. | Leonard Richardson
2013-05-15 | Merge. | Leonard Richardson
2013-05-14 | Prep for release. | Leonard Richardson
2013-05-14 | Added diagnostic case for attempting to parse a URL as HTML. | Leonard Richardson
2013-05-14 | Added warning about using NavigableString outside of Beautiful Soup. | Leonard Richardson
2013-05-14 | Added a deprecation warning to has_key(). | Leonard Richardson
2013-05-09 | Changed lxml.feed() to handle the eventuality that it may be given a bytestring. | Leonard Richardson
2013-05-09 | Added a basic benchmark function to the diagnose module. | Leonard Richardson
2013-05-09 | Added a diagnostic function for randomly generating a simple, invalid HTML document. | Leonard Richardson
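
The 2013-05-20 entry about the default XML formatter describes escaping every ampersand, including ones that already look like the start of an entity. The sketch below illustrates that rule with plain string replacement. It is not Beautiful Soup's formatter code, and escape_xml_text is a hypothetical name chosen for this example.

```python
# Illustrative sketch only: escape_xml_text is a hypothetical helper,
# not part of Beautiful Soup.
def escape_xml_text(text):
    """Escape text for XML output, treating every ampersand as literal."""
    # Replace "&" first so that the ampersands introduced by the later
    # replacements are not escaped a second time.
    return (
        text.replace("&", "&amp;")
            .replace("<", "&lt;")
            .replace(">", "&gt;")
    )


if __name__ == "__main__":
    print(escape_xml_text("&lt;"))      # &amp;lt;
    print(escape_xml_text("AT&T <b>"))  # AT&amp;T &lt;b&gt;
```

Under this rule, a string that already contains something shaped like an entity is not passed through unchanged: its ampersand is escaped like any other.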