Age | Commit message | Author |
|
on disk.
|
a URL or the name of a file on disk (a common beginner mistake).
|
decode a bytestring based on those encodings. This is necessary because lxml wants to do the decoding itself.
|
be part of entities. That is, "<" will become "&lt;". [bug=1182183]
|
non-ASCII characters.
|
instead of chardet. It's much faster. [bug=1020748]
|
characters were replaced with REPLACEMENT CHARACTER. [bug=1013862]
|
attributes to a tag that didn't originally have any. [bug=1002378] Thanks to Oliver Beattie for the patch.
|
UTF-8 documents.
|
encoded in UTF-16LE. [bug=988980]
|
%SOUP-ENCODING%).
|
deprecated wrapper around BeautifulSoup.
|
wouldn't break Beautiful Soup if it changed.
|
html.parser.
|
during Unicode conversion.
|
instead of errors=strict.
|
<meta charset="utf-8" />. [bug=837268]
|
running one command.