summaryrefslogtreecommitdiff
path: root/bs4/builder/_htmlparser.py
AgeCommit message (Collapse)Author
2019-07-21Implemented line number tracking for html5lib.Leonard Richardson
2019-07-21Adapt Chris Mayo's code to track line number and position when using ↵Leonard Richardson
html.parser.
2019-07-07It's now possible to override a TreeBuilder's cdata_list_attributes ↵Leonard Richardson
dictionary by passing in a replacement. None will disable the feature altogether. [bug=1832978]
2018-12-24Clarified the software license.Leonard Richardson
2018-07-28Correctly handle invalid HTML numeric character entities like “Leonard Richardson
which reference code points that are not Unicode code points. Note that this is only fixed when Beautiful Soup is used with the html.parser parser -- html5lib already worked and I couldn't fix it with lxml. [bug=1782933]
2018-07-21Fixed a problem where the html.parser tree builder interpretedLeonard Richardson
a string like '&foo ' as the character entity '&foo;' [bug=1728706]
2018-07-15Stop data loss when encountering an empty numeric entity, andLeonard Richardson
possibly in other cases. Thanks to tos.kamiya for the fix. [bug=1698503]
2018-07-14Stopped HTMLParser from raising an exception in very rare cases ofLeonard Richardson
bad markup. [bug=1708831]
2017-05-06 Improved the handling of empty-element tags like <br> when using theLeonard Richardson
html.parser parser. [bug=1676935]
2016-07-16Removed imports to pdb, since pdb is not available in some environments. ↵Leonard Richardson
[bug=1491700]
2016-07-16Added a separate class for XML processing instructions, which have a ↵Leonard Richardson
slightly different format from SGML processing instructions. [bug=1504383]
2016-07-16Rename COPYING.txt to LICENSE. Add a reference to LICENSE in every source file.Leonard Richardson
2015-06-28 It's now possible to pickle a BeautifulSoup object no matter whichLeonard Richardson
tree builder was used to create it. However, the only tree builder that survives the pickling process is the HTMLParserTreeBuilder ('html.parser'). If you unpickle a BeautifulSoup object created with some other tree builder, soup.builder will be None. [bug=1231545]
2015-06-27Added an exclude_encodings argument to UnicodeDammit and to theLeonard Richardson
Beautiful Soup constructor, which lets you prohibit the detection of an encoding that you know is wrong. [bug=1469408]
2015-06-24Fixed an import error in Python 3.5 caused by the removal of theLeonard Richardson
2015-06-24Made double sure that we don't use the 'strict' constructor argument when ↵Leonard Richardson
it's deprecated. [bug=1341055]
2014-12-11Improved the lxml tree builder's handling of processingLeonard Richardson
instructions. [bug=1294645]
2014-12-07In Python 3.4 and above, set the new convert_charrefs argument toLeonard Richardson
the html.parser constructor to avoid a warning and future failures. Patch by Stefano Revera. [bug=1375721]
2014-12-07Issue a warning if the BeautifulSoup constructor arguments do not explicitly ↵Leonard Richardson
name a parser.
2013-10-01Fixed a bug in which short Unicode input was improperly encoded to ASCII ↵Leonard Richardson
when checking whether or not it was a file on disk. [bug=1227016]
2013-06-02Merged in big encoding-detection refactoring branch.Leonard Richardson
2013-05-31The html.parser treebuilder can now handle numeric attributes inLeonard Richardson
text when the hexidecimal name of the attribute starts with a capital X.
2013-05-31Create a new lxml parser object for every new parsing strategy.Leonard Richardson
2013-05-07Now that lxml's segfault on invalid doctype has been fixed, fix aLeonard Richardson
corresponding problem on the Beautiful Soup end that was previously invisible. [bug=984936]
2012-04-18Changed wording slightly.Leonard Richardson
2012-04-18Print a warning on HTMLParseErrors to let people know they should install an ↵Leonard Richardson
external parser.
2012-04-18Fixed a bug that made the HTMLParser treebuilder generate XML definitions ↵Leonard Richardson
ending with two question marks instead of one. [bug=984258]
2012-02-21Added nsprefix argument to the tag class.Leonard Richardson
2012-02-21Merged from trunk.Leonard Richardson
2012-02-20It's now possible to copy a BeautifulSoup object created with the ↵Leonard Richardson
html.parser treebuilder.
2012-02-20Changd the class structure so that the default parser test class uses ↵Leonard Richardson
html.parser.
2012-02-16It's a start, at least.Leonard Richardson
2012-02-09As a last-ditch attempt to turn data into Unicode, use errors=replace ↵Leonard Richardson
instead of errors=strict.
2012-02-09Minor Unicode, Dammit cleanup.Leonard Richardson
2012-02-06Monkeypatch Python 3.2 versions prior to 3.2.3 to solve some major ↵Leonard Richardson
HTMLParser bugs.
2012-01-20Made it easier to convert BS3 code to BS4.Leonard Richardson
2012-01-20Got the test suite to pass on Python 3.2 (skipping the html5lib stuff, which ↵Leonard Richardson
doesn't seem to have Python 3 support yet.)
2011-02-27Added a tree builder for the built-in HTMLParser, and tests.Leonard Richardson