Age | Commit message | Author | |
---|---|---|---|
2015-06-27 | Added an exclude_encodings argument to UnicodeDammit and to the Beautiful Soup constructor, which lets you prohibit the detection of an encoding that you know is wrong. [bug=1469408] | Leonard Richardson | |
2015-06-26 | Added a sanity check helper method that makes sure all the elements of a tree are properly connected via .next_element and .previous_element. | Leonard Richardson | |
2015-06-25 | Fixed a crash in Unicode, Dammit's encoding detector when the name of the encoding itself contained invalid bytes. [bug=1360913] | Leonard Richardson | |
2013-10-02 | Fixed a bug that caused Unicode data put into UnicodeDammit to return None instead of the original data. [bug=1214983] | Leonard Richardson | |
2013-06-03 | Inlined some commonly called code to save a function call. | Leonard Richardson | |
2013-06-03 | Limit how much of the document is searched via regular expression for a declared encoding. | Leonard Richardson | |
2013-06-02 | Turns out we had two bits of code to strip byte-order marks. | Leonard Richardson | |
2013-06-02 | It turns out most of the untested code wasn't doing anything useful. | Leonard Richardson | |
2013-05-31 | Create a new lxml parser object for every new parsing strategy. | Leonard Richardson | |
2013-05-30 | Refactored code a bit. | Leonard Richardson | |
2013-05-30 | Split out the code that guesses at encodings from the code that tries to decode a bytestring based on those encodings. This is necessary because lxml wants to do the decoding itself. | Leonard Richardson | |
2013-05-20 | The default XML formatter will now replace ampersands even if they appear to be part of entities. That is, "&lt;" will become "&amp;lt;". [bug=1182183] | Leonard Richardson | |
2012-11-03 | Doc fixes. | Leonard Richardson | |
2012-08-17 | Fixed cchardet import. | Leonard Richardson | |
2012-07-03 | Mentioned cchardet in docs. | Leonard Richardson | |
2012-07-03 | When sniffing encodings, if the cchardet library is installed, use it instead of chardet. It's much faster. [bug=1020748] | Leonard Richardson | |
2012-07-03 | Use logging.warning() instead of warning.warn() to notify the user that characters were replaced with REPLACEMENT CHARACTER. [bug=1013862] | Leonard Richardson | |
2012-05-24 | Comments, processing instructions, document type declarations, and markup declarations are now treated as preformatted strings, the way CData blocks are. [bug=1001025] Also in this commit: renamed detwingle method to detwingle(). | Leonard Richardson | |
2012-05-03 | Fixed the handling of &quot; with the built-in parser. [bug=993871] | Leonard Richardson | |
2012-04-27 | Added experimental support for fixing Windows-1252 characters embedded in UTF-8 documents. | Leonard Richardson | |
2012-04-26 | Fixed a bug in decoding data that contained a byte-order mark, such as data encoded in UTF-16LE. [bug=988980] | Leonard Richardson | |
2012-04-16 | Unicode, Dammit now has an option to turn MS smart quotes into ASCII characters. | Leonard Richardson | |
2012-04-16 | Attribute values are now run through the provided output formatter. Previously they were always run through the 'minimal' formatter. [bug=980237] | Leonard Richardson | |
2012-02-16 | Issue a warning if characters were replaced with REPLACEMENT CHARACTER during Unicode conversion. | Leonard Richardson | |
2012-02-09 | As a last-ditch attempt to turn data into Unicode, use errors=replace instead of errors=strict. | Leonard Richardson | |
2012-02-09 | Unicode, Dammit now detects the encoding in HTML 5-style <meta> tags like <meta charset="utf-8" />. [bug=837268] | Leonard Richardson | |
2012-02-09 | Minor Unicode, Dammit cleanup. | Leonard Richardson | |
2012-02-09 | Improved Unicode, Dammit's behavior when you give it Unicode to begin with. | Leonard Richardson | |
2011-06-29 | Various changes so most tests pass on Python 3. | Thomas Kluyver | |
2011-05-21 | OK, figured that out. | Leonard Richardson | |
2011-05-21 | Changed dammit.py to require fewer changes to be Python 3 compatible. | Leonard Richardson | |
2011-03-05 | PEP8ifying | Aaron DeVore | |
2011-02-27 | Added a tree builder for the built-in HTMLParser, and tests. | Leonard Richardson | |
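
Several entries above describe one decoding strategy: search only the start of the document for a declared encoding, skip encodings the caller has excluded, try strict decoding first, and fall back to errors=replace as a last resort. The sketch below illustrates that flow using only the Python standard library. It is not the code from dammit.py: the names `declared_encoding`, `to_unicode`, and `search_limit`, the default candidate list, and the 1024-byte limit are all assumptions made for illustration.

```python
import re

# Hypothetical pattern for HTML5-style <meta charset="..."> declarations.
META_CHARSET = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?([A-Za-z0-9_-]+)""", re.I
)


def declared_encoding(data, search_limit=1024):
    """Search only the first `search_limit` bytes for a declared encoding."""
    match = META_CHARSET.search(data[:search_limit])
    return match.group(1).decode("ascii").lower() if match else None


def to_unicode(data, candidates=("utf-8", "windows-1252"), exclude_encodings=()):
    """Return (text, encoding, contains_replacement_characters)."""
    excluded = {name.lower() for name in exclude_encodings}
    tried = []
    for name in (declared_encoding(data), *candidates):
        if name is None or name in excluded or name in tried:
            continue
        tried.append(name)
        try:
            # First pass: accept an encoding only if it decodes cleanly.
            return data.decode(name, errors="strict"), name, False
        except (UnicodeDecodeError, LookupError):
            continue
    # Last-ditch pass: accept U+FFFD REPLACEMENT CHARACTER in the output.
    for name in tried:
        try:
            return data.decode(name, errors="replace"), name, True
        except LookupError:
            continue
    return None, None, False
```

For example, `to_unicode(b"caf\xe9")` fails strict UTF-8 decoding and succeeds as windows-1252. Passing `exclude_encodings=["windows-1252"]` leaves only UTF-8, so the last-ditch pass runs and the third element of the result reports that replacement characters were introduced.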