Age | Commit message | Author | |
---|---|---|---|
2021-11-29 | Do a better job of keeping track of namespaces as an XML document is parsed, so that CSS selectors that use namespaces will do the right thing more often. [bug=1946243] | Leonard Richardson | |
2021-10-24 | Issue a warning when an HTML parser is used to parse a document that looks like XML but not XHTML. [bug=1939121] | Leonard Richardson | |
2021-10-23 | Added a workaround for an lxml bug (https://bugs.launchpad.net/lxml/+bug/1948551) that caused problems when parsing a Unicode string beginning with BYTE ORDER MARK. [bug=1947768] | Leonard Richardson | |
2021-09-07 | Goodbye, Python 2. [bug=1942919] | Leonard Richardson | |
2021-02-13 | Added a second way to specify encodings to UnicodeDammit and EncodingDetector, based on the order of precedence defined in the HTML5 spec, starting at: https://html.spec.whatwg.org/multipage/parsing.html#parsing-with-a-known-character-encoding Encodings in 'known_definite_encodings' are tried first, then byte-order-mark sniffing is run, then encodings in 'user_encodings' are tried. The old argument, 'override_encodings', is now a deprecated alias for 'known_definite_encodings'. This changes the default behavior of the html.parser and lxml tree builders, in a way that may slightly improve encoding detection but will probably have no effect. [bug=1889014] | Leonard Richardson | |
2019-12-24 | Added docstrings for some but not all tree builders. | Leonard Richardson | |
2019-11-11 | Added a Brazilian Portuguese translation by Cezar Peixeiro. | Leonard Richardson | |
2019-09-02 | Avoid a crash when trying to detect the declared encoding of a Unicode document. Raise an explanatory exception when the underlying parser completely rejects the incoming markup. [bug=1838877] | Leonard Richardson | |
2019-07-21 | Implemented line number tracking for html5lib. | Leonard Richardson | |
2019-07-07 | It's now possible to override a TreeBuilder's cdata_list_attributes dictionary by passing in a replacement. None will disable the feature altogether. [bug=1832978] | Leonard Richardson | |
2019-01-06 | Don't track un-prefixed namespaces | Isaac Muse | |
2018-12-24 | Clarified the software license. | Leonard Richardson | |
2018-12-24 | Keep track of the namespace abbreviations found while parsing the document. This makes select() work most of the time without requiring a value for 'namespaces'. | Leonard Richardson | |
2018-07-18 | Preserve XML namespaces when they are introduced inside an XML document, not just the ones introduced at the top level. [bug=1718787] | Leonard Richardson | |
2018-07-14 | Stopped HTMLParser from raising an exception in very rare cases of bad markup. [bug=1708831] | Leonard Richardson | |
2016-07-30 | Explained why we test both unicode and bytestring processing instructions. | Leonard Richardson | |
2016-07-26 | Fixed a reported (but not duplicated) bug involving processing instructions fed into the lxml HTML parser. | Leonard Richardson | |
2016-07-16 | Removed imports to pdb, since pdb is not available in some environments. [bug=1491700] | Leonard Richardson | |
2016-07-16 | Added a separate class for XML processing instructions, which have a slightly different format from SGML processing instructions. [bug=1504383] | Leonard Richardson | |
2016-07-16 | Rename COPYING.txt to LICENSE. Add a reference to LICENSE in every source file. | Leonard Richardson | |
2015-06-28 | Accept 'xml' as an unambiguous identifier for the lxml XML parser, since it's the only XML parser supported at the moment. | Leonard Richardson | |
2015-06-27 | Added an exclude_encodings argument to UnicodeDammit and to the Beautiful Soup constructor, which lets you prohibit the detection of an encoding that you know is wrong. [bug=1469408] | Leonard Richardson | |
2014-12-11 | Improved the lxml tree builder's handling of processing instructions. [bug=1294645] | Leonard Richardson | |
2014-12-07 | Tweaked the parser warning. | Leonard Richardson | |
2014-12-07 | Issue a warning if the BeautifulSoup constructor arguments do not explicitly name a parser. | Leonard Richardson | |
2013-06-02 | Turns out we had two bits of code to strip byte-order marks. | Leonard Richardson | |
2013-06-02 | It turns out most of the untested code wasn't doing anything useful. | Leonard Richardson | |
2013-06-02 | Treat an lxml ParserError as a ParserRejectedMarkup. | Leonard Richardson | |
2013-05-31 | Create a new lxml parser object for every new parsing strategy. | Leonard Richardson | |
2013-05-09 | Changed lxml.feed() to handle the eventuality that it may be given a bytestring. | Leonard Richardson | |
2013-05-09 | Added a diagnostic function for randomly generating a simple, invalid HTML document. | Leonard Richardson | |
2012-10-11 | Fix a bug in the lxml treebuilder which crashed when a tag included an attribute from the predefined xml: namespace. [bug=1065617] | Leonard Richardson | |
2012-09-28 | Fixed package name. | Leonard Richardson | |
2012-08-16 | Use namespace prefixes for namespaced attribute names, instead of the fully-qualified names given by the lxml parser. [bug=1037597] | Leonard Richardson | |
2012-05-29 | Removed breakpoints. | Leonard Richardson | |
2012-05-29 | Prep for release. | Leonard Richardson | |
2012-05-24 | Fixed a bug with the lxml treebuilder that prevented the user from adding attributes to a tag that didn't originally have any. [bug=1002378] Thanks to Oliver Beattie for the patch. | Leonard Richardson | |
2012-04-03 | Got rid of the 4.0.2 workaround for HTML documents--it was unnecessary and the workaround was triggering a (possibly different, but related) bug in lxml. [bug=972466] | Leonard Richardson | |
2012-04-03 | Don't split up the markup into chunks when using the lxml HTML parser, which doesn't have the problems of the XML parser. | Leonard Richardson | |
2012-03-24 | Pass data into XMLParser.feed() in chunks. [bug=963880] | Leonard Richardson | |
2012-02-28 | Fixed the generated XML declaration. | Leonard Richardson | |
2012-02-23 | Fixed handling of the closing of namespaced tags. | Leonard Richardson | |
2012-02-23 | Merge from trunk and added tests. | Leonard Richardson | |
2012-02-22 | Added comments. | Leonard Richardson | |
2012-02-22 | Treat a new namespace mapping as a set of attributes on the tag that defines it, so we don't lose the mappings. | Leonard Richardson | |
2012-02-21 | Have lxml invert namespace maps as they come in and set each tag's prefix appropriately. | Leonard Richardson | |
2012-02-21 | Added nsprefix argument to the tag class. | Leonard Richardson | |
2012-02-16 | It's a start, at least. | Leonard Richardson | |
2012-02-09 | As a last-ditch attempt to turn data into Unicode, use errors=replace instead of errors=strict. | Leonard Richardson | |
2012-02-09 | Minor Unicode, Dammit cleanup. | Leonard Richardson | |