Age | Commit message | Author
2013-06-03 | Changed _popToTag to run through a single range instead of two. | Leonard Richardson
2013-06-03 | Improved _popToTag a tiny bit. | Leonard Richardson
2013-06-03 | Inlined some commonly called code to save a function call. | Leonard Richardson
2013-06-03 | Limit how much of the document is searched via regular expression for a declared encoding. | Leonard Richardson
2013-06-03 | Improved performance of _replace_cdata_list_attribute_values, and greatly reduced the number of times it is called. | Leonard Richardson
2013-06-03 | Made it a lot faster to check whether whitespace is being preserved. | Leonard Richardson
2013-06-03 | Put the more frequently-used ASCII spaces in front. | Leonard Richardson
2013-06-03 | Wrote a more efficient replacement for string.translate() when checking whether a string is nothing but ASCII spaces. | Leonard Richardson
2013-06-03 | Let's get some profiling going. | Leonard Richardson
2013-06-03 | Test that the filename warning isn't given unless the file actually exists on disk. | Leonard Richardson
2013-06-03 | Beautiful Soup will issue a warning if instead of markup you pass it a URL or the name of a file on disk (a common beginner mistake). | Leonard Richardson
2013-06-02 | Merged in big encoding-detection refactoring branch. | Leonard Richardson
2013-06-02 | Turns out we had two bits of code to strip byte-order marks. | Leonard Richardson
2013-06-02 | It turns out most of the untested code wasn't doing anything useful. | Leonard Richardson
2013-06-02 | Treat an lxml ParserError as a ParserRejectedMarkup. | Leonard Richardson
2013-05-31 | Prep for release. | Leonard Richardson
2013-05-31 | The html.parser treebuilder can now handle numeric attributes in text when the hexadecimal name of the attribute starts with a capital X. | Leonard Richardson
2013-05-31 | Reverted the patch that gives NavigableString a .name property, because that's too big an API change for a bugfix release. | Leonard Richardson
2013-05-31 | Create a new lxml parser object for every new parsing strategy. | Leonard Richardson
2013-05-30 | Refactored code a bit. | Leonard Richardson
2013-05-30 | Split out the code that guesses at encodings from the code that tries to decode a bytestring based on those encodings. This is necessary because lxml wants to do the decoding itself. | Leonard Richardson
2013-05-20 | The default XML formatter will now replace ampersands even if they appear to be part of entities. That is, "&lt;" will become "&amp;lt;". [bug=1182183] | Leonard Richardson
2013-05-20 | A NavigableString object now has an immutable '.name' property whose value is always None. This makes it easier to iterate over a mixed list of tags and strings without having to check whether each element is a tag or a string. | Leonard Richardson
2013-05-20 | The .previous_element of a BeautifulSoup object is now always None, | Leonard Richardson
2013-05-20 | The .next_element attribute used during parsing was confusingly similar to the .next_element navigation attribute. Renamed the former to _most_recent_element. | Leonard Richardson
2013-05-20 | Fixed another bug by which the html5lib tree builder could create a disconnected tree. [bug=1182089] | Leonard Richardson
2013-05-20 | Gave new_string() the ability to create subclasses of NavigableString. [bug=1181986] | Leonard Richardson
2013-05-20 | html5lib now supports Python 3. Fixed some Python 2-specific code in the html5lib test suite. [bug=1181624] | Leonard Richardson
2013-05-20 | Fixed test failures when lxml is not installed. | Leonard Richardson
2013-05-15 | How about actually parsing the same markup with different parsers. | Leonard Richardson
2013-05-15 | Merge. | Leonard Richardson
2013-05-14 | Prep for release. | Leonard Richardson
2013-05-14 | Added diagnostic case for attempting to parse a URL as HTML. | Leonard Richardson
2013-05-14 | Added warning about using NavigableString outside of Beautiful Soup. | Leonard Richardson
2013-05-14 | Added a deprecation warning to has_key(). | Leonard Richardson
2013-05-09 | Changed lxml.feed() to handle the eventuality that it may be given a bytestring. | Leonard Richardson
2013-05-09 | Added a basic benchmark function to the diagnose module. | Leonard Richardson
2013-05-09 | Added a diagnostic function for randomly generating a simple, invalid HTML document. | Leonard Richardson
2013-05-09 | Thanks to data-*, there's now a good use for attrs again. This lets me clean up the docs quite a bit. | Leonard Richardson
2013-05-09 | Added <body> tag to sample doc so it will work the same on all parsers. | Leonard Richardson
2013-05-08 | Updated docs with new examples. | Leonard Richardson
2013-05-08 | Updated docs with new examples. | Leonard Richardson
2013-05-08 | A CSS selector should never match the same tag twice. | Leonard Richardson
2013-05-08 | Refactored the CSS selector support, and added the sibling combinators. | Leonard Richardson
2013-05-08 | Minor cleanup. | Leonard Richardson
2013-05-08 | Added tests. | Leonard Richardson
2013-05-08 | Fixed terminology. | Leonard Richardson
2013-05-08 | Updated news. | Leonard Richardson
2013-05-08 | Moved select() to Tag. It was always an error to call select() on a string, so there's no reason for it to be in PageElement. | Leonard Richardson
2013-05-08 | Give the checker the ability to stop the iteration over the generator by raising StopIteration. | Leonard Richardson