path: root/bs4/dammit.py
Age | Commit message | Author
2021-12-19 | Remove a huge list of HTML entities that was only necessary under Python 2. | Leonard Richardson
2021-12-19 | Removed support for the iconv_codec library, which doesn't seem | Leonard Richardson
2021-12-19 | If the charset-normalizer Python module | Leonard Richardson
2021-09-07 | Goodbye, Python 2. [bug=1942919] | Leonard Richardson
2021-05-31 | The html.parser tree builder can now handle named entities | Leonard Richardson
2021-02-13 | Added a second way to specify encodings to UnicodeDammit and | Leonard Richardson
2020-05-17 | Switch entirely to Python 3-style print statements, even in Python 2. | Leonard Richardson
2019-12-24 | Fixed deprecation warning. [bug=1855301] | Leonard Richardson
2019-12-24 | Added docstrings to all public methods in dammit.py. | Leonard Richardson
2019-09-02 | Avoid a crash when trying to detect the declared encoding of a | Leonard Richardson
2019-07-07 | &apos; (which is valid in XML and XHTML, but not HTML 4) is now | Leonard Richardson
2018-12-24 | Clarified the software license. | Leonard Richardson
2018-07-14 | Fixed code that was causing deprecation warnings in recent Python 3 | Leonard Richardson
2016-12-19 | Indentation change contributed by Pranav Salunke. | Leonard Richardson
2016-07-17 | Use a dedicated logger instead of the root logger. [bug=1511661] | Leonard Richardson
2016-07-17 | Use a dedicated logger instead of the root logger. [bug=1511661] | Leonard Richardson
2016-07-16 | Removed imports to pdb, since pdb is not available in some environments. [bug... | Leonard Richardson
2016-07-16 | Rename COPYING.txt to LICENSE. Add a reference to LICENSE in every source file. | Leonard Richardson
2016-04-06 | Minor change. Extra indent for character so it looks nicer. | Pranav Salunke
2015-09-28 | Add a __license__ statement to all source files. | Leonard Richardson
2015-07-03 | Unicode data cannot have a byte-order mark. Returning early stops a warning f... | Leonard Richardson
2015-06-27 | Added an exclude_encodings argument to UnicodeDammit and to the | Leonard Richardson
2015-06-26 | Added a sanity check helper method that makes sure all the elements of a tree... | Leonard Richardson
2015-06-25 | Fixed a crash in Unicode, Dammit's encoding detector when the name | Leonard Richardson
2013-10-02 | Fixed a bug that caused Unicode data put into UnicodeDammit to | Leonard Richardson
2013-06-03 | Inlined some commonly called code to save a function call. | Leonard Richardson
2013-06-03 | Limit how much of the document is searched via regular expression for a decla... | Leonard Richardson
2013-06-02 | Turns out we had two bits of code to strip byte-order marks. | Leonard Richardson
2013-06-02 | It turns out most of the untested code wasn't doing anything useful. | Leonard Richardson
2013-05-31 | Create a new lxml parser object for every new parsing strategy. | Leonard Richardson
2013-05-30 | Refactored code a bit. | Leonard Richardson
2013-05-30 | Split out the code that guesses at encodings from the code that tries to deco... | Leonard Richardson
2013-05-20 | The default XML formatter will now replace ampersands even if they appear to ... | Leonard Richardson
2012-11-03 | Doc fixes. | Leonard Richardson
2012-08-17 | Fixed cchardet import. | Leonard Richardson
2012-07-03 | Mentioned cchardet in docs. | Leonard Richardson
2012-07-03 | When sniffing encodings, if the cchardet library is installed, use it instead... | Leonard Richardson
2012-07-03 | Use logging.warning() instead of warnings.warn() to notify the user that chara... | Leonard Richardson
2012-05-24 | Comments, processing instructions, document type declarations, and markup dec... | Leonard Richardson
2012-05-03 | Fixed the handling of &quot; with the built-in parser. [bug=993871] | Leonard Richardson
2012-04-27 | Added experimental support for fixing Windows-1252 characters embedded in UTF... | Leonard Richardson
2012-04-26 | Fixed a bug in decoding data that contained a byte-order mark, such as data e... | Leonard Richardson
2012-04-16 | Unicode, Dammit now has an option to turn MS smart quotes into ASCII characters. | Leonard Richardson
2012-04-16 | Attribute values are now run through the provided output formatter. Previousl... | Leonard Richardson
2012-02-16 | Issue a warning if characters were replaced with REPLACEMENT CHARACTER during... | Leonard Richardson
2012-02-09 | As a last-ditch attempt to turn data into Unicode, use errors=replace instead... | Leonard Richardson
2012-02-09 | Unicode, Dammit now detects the encoding in HTML 5-style <meta> tags like <me... | Leonard Richardson
2012-02-09 | Minor Unicode, Dammit cleanup. | Leonard Richardson
2012-02-09 | Improved Unicode, Dammit's behavior when you give it Unicode to begin with. | Leonard Richardson
2011-06-29 | Various changes so most tests pass on Python 3. | Thomas Kluyver