Commit message
…were markup. Thanks to James Salter for a patch and test. [bug=1533762]
…installed. [bug=1471359]
…Beautiful Soup constructor, which lets you prohibit the detection of an encoding that you know is wrong. [bug=1469408]
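The entry above is about letting the caller rule out an encoding. Beautiful Soup 4 exposes this as the `exclude_encodings` argument to the `BeautifulSoup` constructor. The sketch below does not use bs4 at all. It assumes a hypothetical helper, `decode_with_exclusions`, which tries candidate encodings in order and skips any the caller has blocked. This is a much simpler strategy than Beautiful Soup's real encoding detection.

```python
def decode_with_exclusions(data, candidates, exclude=()):
    """Hypothetical helper: decode bytes with the first candidate encoding
    that is not excluded and decodes without error. Returns (text, encoding).

    Names are compared case-insensitively, but aliases such as
    "cp1252" and "windows-1252" are not unified.
    """
    excluded = {name.lower() for name in exclude}
    for encoding in candidates:
        if encoding.lower() in excluded:
            continue  # the caller knows this encoding is wrong: never pick it
        try:
            return data.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            continue  # invalid bytes or unknown codec name: try the next one
    raise ValueError("no acceptable encoding among the candidates")


# Greek text encoded as ISO-8859-7. Every one of these bytes is also valid
# windows-1252, so a naive "first encoding that decodes" guess yields mojibake.
raw = "Καλημέρα".encode("iso-8859-7")
candidates = ["windows-1252", "iso-8859-7"]

print(decode_with_exclusions(raw, candidates))  # picks windows-1252 (mojibake)
print(decode_with_exclusions(raw, candidates, exclude=["windows-1252"]))  # picks iso-8859-7
```

Without the blocklist, the first encoding that merely decodes without error wins. Excluding the known-wrong guess lets the correct decoding through.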
…of the encoding itself contained invalid bytes. [bug=1360913]
…name a parser.
…return None instead of the original data. [bug=1214983]
…filenames. [bug=1232604]
…when checking whether or not it was a file on disk. [bug=1227016]
…run by nosetests. [bug=1212445]
…on disk.
…a URL or the name of a file on disk (a common beginner mistake).
…decode a bytestring based on those encodings. This is necessary because lxml wants to do the decoding itself.
…be part of entities. That is, "&lt;" will become "&amp;lt;". [bug=1182183]
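Python's standard-library `html.escape` shows the difference between escaping a bare "<" and escaping text that already contains an entity. It is used here only for illustration and is not Beautiful Soup's formatter. Like the behaviour this entry describes, it replaces every ampersand, including one that begins an existing entity.

```python
import html

# quote=False: only &, < and > are escaped, not quotation marks.
print(html.escape("<", quote=False))     # &lt;
print(html.escape("&lt;", quote=False))  # &amp;lt;
```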
…non-ASCII characters.
…instead of chardet. It's much faster. [bug=1020748]
…characters were replaced with REPLACEMENT CHARACTER. [bug=1013862]
…attributes to a tag that didn't originally have any. [bug=1002378] Thanks to Oliver Beattie for the patch.
…UTF-8 documents.
…encoded in UTF-16LE. [bug=988980]
…%SOUP-ENCODING%).
…deprecated wrapper around BeautifulSoup.
…wouldn't break Beautiful Soup if it changed.
…html.parser.
…during Unicode conversion.
…instead of errors=strict.
…<meta charset="utf-8" />. [bug=837268]
…running one command.