Age | Commit message | Author |
|
|
|
warnings to make them less judgemental about what you ought to
be doing. [bug=1955450]
|
|
providing a value for the 'indent' argument to the Formatter
constructor. The 'indent' argument works very similarly to the
argument of the same name in the Python standard library's
json.dump() method. [bug=1955497]
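For reference, the json side of that comparison can be sketched with the standard library alone. This only shows how json treats an integer versus a string 'indent'; how far the analogy extends to the Formatter constructor is not demonstrated here.

```python
import json

data = {"a": [1, 2]}

# An integer indent means "that many spaces per nesting level".
with_spaces = json.dumps(data, indent=2)

# A string indent is used verbatim for each nesting level.
with_tabs = json.dumps(data, indent="\t")

print(with_spaces)
print(with_tabs)
```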
|
|
to exist anymore and was never put up on PyPI. (The closest
replacement on PyPI, iconv_codecs, is GPL-licensed, so we can't use it.)
|
|
(https://pypi.org/project/charset-normalizer/) is installed, Beautiful
Soup will use it to detect the character sets of incoming documents.
This is also the module used by newer versions of the Requests library.
For the sake of backwards compatibility, chardet and cchardet both take
precedence if installed. [bug=1955346]
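The precedence rule above can be sketched as a "first installed module wins" lookup. This is a minimal illustration, not Beautiful Soup's own code: the function name pick_detector and the relative order of cchardet and chardet are assumptions, since the entry only says that both come before charset-normalizer.

```python
import importlib.util

# Assumed order for illustration: the entry above only states that chardet
# and cchardet both take precedence over charset-normalizer.
DETECTOR_PREFERENCE = ("cchardet", "chardet", "charset_normalizer")


def pick_detector(is_installed=None):
    """Return the name of the first preferred detector module that is installed.

    `is_installed` is a hypothetical hook (name -> bool) so the lookup can be
    exercised without installing anything; by default it asks importlib.
    """
    if is_installed is None:
        def is_installed(name):
            return importlib.util.find_spec(name) is not None
    for name in DETECTOR_PREFERENCE:
        if is_installed(name):
            return name
    return None
```

Calling pick_detector() with no argument reports whichever of the three modules is available in the current environment, or None if none is.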
|
|
tree builder. [bug=1934003]
|
|
parsed, so that CSS selectors that use namespaces will do the right
thing more often. [bug=1946243]
|
|
looks like XML but not XHTML. [bug=1939121]
|
|
specifically checks that text is an alias for string.
|
|
commit to demonstrate that the renaming doesn't break anything. [bug=1947038]
|
|
(https://bugs.launchpad.net/lxml/+bug/1948551) that caused
problems when parsing a Unicode string beginning with BYTE ORDER MARK.
[bug=1947768]
|
|
html5lib parser. [bug=1948488]
|
|
to make it possible to treat ruby text specially in get_text() calls.
[bug=1941980]
|
|
and can be used to replace a single element with a sequence of elements.
Patch by Bill Chandos.
|
|
found in the HTML5 spec in much the same way that the html5lib
tree builder does. Note that the lxml tree builder still handles
named entities differently. [bug=1924908]
|
|
suite.
|
|
method, as well as the properties .strings and
.stripped_strings. These methods will either return the string
itself, or nothing, so the only reason to use this is when iterating
over a list of mixed Tag and NavigableString objects. [bug=1904309]
|
|
empty string as HTML boolean attributes. Previously (and in other
formatters), an attribute value must be set as None to be treated as
a boolean attribute. In a future release, I plan to also give this
behavior to the 'html' formatter. Patch by Isaac Muse. [bug=1915424]
|
|
depending on the type of tag. The change is visible with HTML tags
like <script>, <style>, and <template>. Starting in 4.9.0, methods
like get_text() returned no results on such tags, because the
contents of those tags are not considered 'text' within the document
as a whole.
But a user who calls script.get_text() is working from a different
definition of 'text' than a user who calls div.get_text()--otherwise
there would be no need to call script.get_text() at all. In 4.10.0,
the contents of (e.g.) a <script> tag are considered 'text' during a
get_text() call on the tag itself, but not considered 'text' during
a get_text() call on the tag's parent.
Because of this change, calling get_text() on each child of a tag
may now return a different result than calling get_text() on the tag
itself. That's because different tags now have different
understandings of what counts as 'text'. [bug=1906226] [bug=1868861]
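A small sketch of the difference described above, assuming Beautiful Soup 4.10.0 or later is installed and using the html.parser tree builder. The markup is made up for illustration.

```python
from bs4 import BeautifulSoup

markup = "<div><p>Hello</p><script>var x = 1;</script></div>"
soup = BeautifulSoup(markup, "html.parser")

# On the parent, the script's contents are not counted as 'text'.
parent_text = soup.div.get_text()

# On the <script> tag itself, its contents are counted as 'text'.
script_text = soup.script.get_text()

# Joining the per-child results can therefore differ from the parent's result.
per_child_text = "".join(child.get_text() for child in soup.div.children)

print(repr(parent_text))
print(repr(script_text))
print(repr(per_child_text))
```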
|
|
single tag may contain strings with different containers, such as
the <template> tag, which may contain both TemplateString objects
and Comment objects. [bug=1913406]
|
|
EncodingDetector, based on the order of precedence defined in the
HTML5 spec, starting at:
https://html.spec.whatwg.org/multipage/parsing.html#parsing-with-a-known-character-encoding
Encodings in 'known_definite_encodings' are tried first, then
byte-order-mark sniffing is run, then encodings in 'user_encodings'
are tried. The old argument, 'override_encodings', is now a
deprecated alias for 'known_definite_encodings'.
This changes the default behavior of the html.parser and lxml tree
builders, in a way that may slightly improve encoding
detection but will probably have no effect. [bug=1889014]
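The ordering described above (known definite encodings, then byte-order-mark sniffing, then user encodings) can be sketched as a generator of candidates. This is a simplified stand-in, not EncodingDetector itself: the function candidate_encodings and the BOM table are hypothetical, and any other detection steps the real class performs are left out.

```python
import codecs

# BOMs are checked longest-first so a UTF-32 mark is not mistaken for UTF-16.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def candidate_encodings(data, known_definite_encodings=(), user_encodings=()):
    """Yield encodings to try, in the three-step order described above."""
    tried = set()

    def fresh(names):
        # Skip any encoding that an earlier step already yielded.
        for name in names:
            key = name.lower()
            if key not in tried:
                tried.add(key)
                yield name

    # 1. Encodings the caller is certain about.
    yield from fresh(known_definite_encodings)
    # 2. At most one encoding implied by a byte-order mark.
    bom_match = [enc for mark, enc in _BOMS if data.startswith(mark)][:1]
    yield from fresh(bom_match)
    # 3. Encodings the caller merely suggests.
    yield from fresh(user_encodings)
```

For example, with a UTF-8 BOM at the start of the data, a known definite encoding of "latin-1" is still offered first, and a user-suggested "utf-8" is not repeated after the BOM has already produced it.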
|
|
tree construction by 2%. Patch by Morotti. [bug=1899358]
|
|
the name of a regular file) is passed as markup into the BeautifulSoup
constructor. [bug=1913628]
|
|
namespaced attribute is the empty string, as opposed to
None. [bug=1915583]
|
|
searching the parse tree. Patch by Morotti. [bug=1898212]
|
|
a Tag, rather than a list, into Tag.extend(). [bug=1885710]
|
|
(which are not implemented) to match PageElement.insert_before and
insert_after, quieting warnings in some IDEs. [bug=1897120]
|
|
PEP 508. Patch by Mike Nerone. [bug=1893696]
|