path: root/bs4/element.py
Age | Commit message | Author
2024-11-07 | dont use or ope [master] | haturatu
2023-04-17 | Fixed a regression such that if you set .hidden on a tag, the tag becomes invisible but its contents are still visible. User manipulation of .hidden is not a documented or supported feature, so don't do this, but it's not too difficult to keep the old behavior working. | Leonard Richardson
2023-04-09 | Backported a bug fix that knocks a full second off the test run time. | Leonard Richardson
2023-03-27 | Make it possible to pickle a deeply nested BeautifulSoup object. | Leonard Richardson
2023-03-27 | Updated __copy__ docstrings. | Leonard Richardson
2023-03-26 | Implement a proper BeautifulSoup.deepcopy rather than parsing the document again. | Leonard Richardson
2023-03-24 | Make __copy__ call __deepcopy__ instead of the other way around. | Leonard Richardson
2023-03-24 | Implement nonrecursive versions of copy and deepcopy using the new _event_stream generator. | Leonard Richardson
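The nonrecursive copy described in the commit above can be sketched like this: a generator walks the tree with an explicit stack and emits start, string and end events, and the copier rebuilds a fresh tree by replaying those events. Because of this, document depth is limited by memory rather than by Python's recursion limit. This is a minimal sketch under stated assumptions: `Node`, `event_stream()` and `deepcopy_tree()` are hypothetical stand-ins, not bs4's actual `_event_stream` code.

```python
import sys


class Node:
    """Hypothetical element: a tag name plus children (Node objects or str)."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children if children is not None else []


def event_stream(root):
    """Yield ("start", node), ("string", text) and ("end", node) events in
    document order, using an explicit stack instead of recursion."""
    stack = [("visit", root)]
    while stack:
        action, item = stack.pop()
        if action == "end":
            yield "end", item
        elif isinstance(item, str):
            yield "string", item
        else:
            yield "start", item
            stack.append(("end", item))
            # Push children in reverse so they are popped in document order.
            for child in reversed(item.children):
                stack.append(("visit", child))


def deepcopy_tree(root):
    """Build an independent copy of the tree by replaying its events."""
    copy_root = None
    open_copies = []  # copies of elements that have started but not ended
    for event, item in event_stream(root):
        if event == "start":
            new = Node(item.name)
            if open_copies:
                open_copies[-1].children.append(new)
            else:
                copy_root = new
            open_copies.append(new)
        elif event == "string":
            # Plain str is immutable in this sketch, so it can be shared.
            open_copies[-1].children.append(item)
        else:  # "end": the current element is complete
            open_copies.pop()
    return copy_root


# Usage: a chain twice as deep as the recursion limit copies without error.
limit = sys.getrecursionlimit()
deep = Node("root")
current = deep
for _ in range(limit * 2):
    child = Node("div")
    current.children.append(child)
    current = child
current.children.append("leaf")
copied = deepcopy_tree(deep)
```

A recursive copy of the same chain would raise `RecursionError`; the explicit stack trades call frames for list entries.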
2023-03-24 | Simplified the rules for going in and out of string_literal_tag, so less documentation in comments is necessary. | Leonard Richardson
2023-03-24 | Keep track of the specific tag that put us into string literal mode, and only exit when that particular tag is closed. | Leonard Richardson
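The string-literal bookkeeping in the two commits above can be illustrated with a small event-driven pretty-printer. It remembers which element switched it into literal (whitespace-preserving) mode and leaves that mode only when that same element closes, so the end of a nested `<pre>` cannot end it early. In the spirit of the "Don't indent an empty string" commit, it also skips strings that are empty after stripping. This is a minimal sketch: `Node`, `PRESERVE`, `events()` and `prettify()` are hypothetical, and the output format is simpler than bs4's `prettify()`.

```python
class Node:
    """Hypothetical element: a tag name and children (Node objects or str)."""

    def __init__(self, name, *children):
        self.name = name
        self.children = list(children)


# Hypothetical set of tags whose contents must be emitted verbatim.
PRESERVE = {"pre", "textarea"}


def events(root):
    """Nonrecursive document-order walk yielding (kind, item) pairs."""
    stack = [("visit", root)]
    while stack:
        kind, item = stack.pop()
        if kind == "end":
            yield "end", item
        elif isinstance(item, str):
            yield "string", item
        else:
            yield "start", item
            stack.append(("end", item))
            stack.extend(("visit", child) for child in reversed(item.children))


def prettify(root, indent=" "):
    pieces = []
    depth = 0
    literal_tag = None  # the specific Node that switched on literal mode
    for kind, item in events(root):
        if kind == "start":
            if literal_tag is not None:
                pieces.append("<" + item.name + ">")  # inside literal text
            elif item.name in PRESERVE:
                literal_tag = item  # remember exactly who opened literal mode
                pieces.append(indent * depth + "<" + item.name + ">")
            else:
                pieces.append(indent * depth + "<" + item.name + ">\n")
            depth += 1
        elif kind == "end":
            depth -= 1
            if item is literal_tag:
                literal_tag = None  # only the opening tag's own end exits
                pieces.append("</" + item.name + ">\n")
            elif literal_tag is not None:
                pieces.append("</" + item.name + ">")  # nested end tag
            else:
                pieces.append(indent * depth + "</" + item.name + ">\n")
        elif literal_tag is not None:
            pieces.append(item)  # strings in literal mode are kept verbatim
        elif item.strip():
            # Outside literal mode, strip strings and skip empty ones
            # rather than emitting an indented blank line.
            pieces.append(indent * depth + item.strip() + "\n")
    return "".join(pieces)


doc = Node("div", Node("pre", "  a", Node("pre", "b"), " c "), "tail")
print(prettify(doc), end="")
```

The " c " that follows the inner `</pre>` keeps its spaces, because only the outer `<pre>`, the element stored in `literal_tag`, can end literal mode.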
2023-03-24 | Don't indent an empty string. 1084 of 1474 test documents now give identical results between versions. | Leonard Richardson
2023-03-24 | Using a format string is very slightly slower than just adding all the bits of the string together. | Leonard Richardson
2023-03-23 | Found and removed accidental calls to find(), greatly improving performance. | Leonard Richardson
2023-03-21 | Reorganize code and rename saxlike, since this isn' | Leonard Richardson
2023-03-21 | Removed old implementation code. | Leonard Richardson
2023-03-21 | Reimplemented the pretty-print algorithm to remove recursive function calls. | Leonard Richardson
2023-03-20 | Make sure PageElement has the known_xml attribute. [bug=2007895] | Leonard Richardson
2023-02-03 | Move the Soup Sieve proxy and its tests into separate files. | Leonard Richardson
2023-02-03 | Consistently use the name 'tag' instead of 'element,' since CSS selectors only operate on tags. Verify that select() and filter() return ResultSets. | Leonard Richardson
2023-02-03 | Removed redundant whitespace. | Leonard Richardson
2023-02-03 | Added some docstrings and made the return values more consistent. | Leonard Richardson
2023-02-02 | Test implementation. | Leonard Richardson
2023-01-27 | Implemented the more complicated case of providing an appropriate stacklevel for the warning issued when the deprecated 'text' argument is passed in. | Leonard Richardson
2023-01-27 | Warnings now do their best to provide an appropriate stacklevel, improving the usefulness of the message. [bug=1978744] | Leonard Richardson
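The stacklevel commits can be illustrated with the standard `warnings` module. `stacklevel` counts frames upward from the `warnings.warn()` call. A function called directly by user code therefore passes 2, while a wrapper that adds one more frame has to pass 3 for the warning to point at the user's line. This is a minimal sketch: `find()`, `find_all()`, `user_code()` and the `_stacklevel` keyword are hypothetical stand-ins, not bs4's signatures.

```python
import warnings


def find_all(name=None, string=None, text=None, _stacklevel=2):
    """Hypothetical search function that still accepts the deprecated 'text'."""
    if text is not None:
        # stacklevel=1 would blame this line; 2 blames find_all()'s caller.
        warnings.warn(
            "The 'text' argument is deprecated; use 'string' instead.",
            DeprecationWarning,
            stacklevel=_stacklevel,
        )
        string = text
    return [name, string]


def find(name=None, string=None, text=None):
    """Hypothetical wrapper: it adds one stack frame, so it asks for 3."""
    return find_all(name, string, text, _stacklevel=3)


def user_code():
    return find("a", text="hello")  # the warning should point at this line


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    user_code()
for w in caught:
    print(w.lineno, w.message)
```

Hard-coding `stacklevel=2` inside `find_all()` would make every call that goes through `find()` report a line inside `find()`, which is the less useful message the commit set out to avoid.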
2023-01-25 | Tag.interesting_string_types is now propagated when a tag is copied. [bug=1990400] | Leonard Richardson
2023-01-25 | Passing a Tag's .contents into PageElement.extend() now works the same way as passing the Tag itself. | Leonard Richardson
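Passing `.contents` needs special handling for a simple reason. Moving a child into a new parent removes it from its old parent's list, so a loop that runs directly over that list while moving elements skips every other one. The following minimal sketch uses a hypothetical `Node` class (not bs4's `PageElement`) to show the pitfall and the usual fix, which is to iterate over a snapshot:

```python
class Node:
    """Hypothetical element with bs4-like 'moving' append semantics."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.contents = []

    def append(self, child):
        # Appending moves the child: detach it from its old parent first.
        if child.parent is not None:
            child.parent.contents.remove(child)
        child.parent = self
        self.contents.append(child)

    def extend(self, children):
        # Accept either a Node or a list. Iterate over a snapshot, because
        # each append() shrinks the source parent's .contents list.
        if isinstance(children, Node):
            children = children.contents
        for child in list(children):
            self.append(child)


def naive_extend(target, children):
    """Buggy variant: iterates over the very list that append() mutates."""
    for child in children:
        target.append(child)


def make_parent(name, child_names):
    parent = Node(name)
    for child_name in child_names:
        parent.append(Node(child_name))
    return parent


src = make_parent("src", "abcd")
naive_target = Node("dst")
naive_extend(naive_target, src.contents)
# Only a and c moved; b and d were skipped and stayed in src.

src2 = make_parent("src", "abcd")
fixed_target = Node("dst")
fixed_target.extend(src2.contents)  # all four children move
```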
2022-04-07 | Omit untrusted input when issuing warnings. | Leonard Richardson
2021-12-21 | It's now possible to customize the way output is indented by providing a value for the 'indent' argument to the Formatter constructor. The 'indent' argument works very similarly to the argument of the same name in the Python standard library's json.dump() method. [bug=1955497] | Leonard Richardson
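The json.dump()-style convention mentioned above can be sketched as a small normalization step. A positive integer means that many spaces per level, 0 or a negative number means newlines only, and a string is used as-is for each level. The helper names are hypothetical. The default of one space per level when no indent is given is an assumption about bs4's behavior. Note that `json.dumps()` treats `indent=None` differently and produces no newlines at all.

```python
import json


def indent_unit(indent=None):
    """Hypothetical helper: turn an 'indent' argument into the string used
    for one level of indentation, following json.dump()'s int/str rules."""
    if indent is None:
        return " "  # assumed default: one space per level
    if isinstance(indent, int):
        # Positive: that many spaces. Zero or negative: newlines only.
        return " " * indent if indent > 0 else ""
    return indent  # a string such as "\t" is used as-is


def indent_lines(lines, indent=None):
    """Render hypothetical (depth, text) pairs, one per line."""
    unit = indent_unit(indent)
    return "\n".join(unit * depth + text for depth, text in lines)


# json.dumps() applies the same rules: indent=0 gives newlines without
# indentation, and a string indent is repeated once per nesting level.
print(json.dumps([1], indent=0))
print(indent_lines([(0, "<p>"), (1, "hi"), (0, "</p>")], indent="\t"))
```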
2021-11-29 | Do a better job of keeping track of namespaces as an XML document is parsed, so that CSS selectors that use namespaces will do the right thing more often. [bug=1946243] | Leonard Richardson
2021-10-24 | Used a warning to formally deprecate the 'text' argument in favor of 'string'. | Leonard Richardson
2021-10-23 | Renamed the 'text' field to 'string' for real. Tests are not changed in this commit to demonstrate that the renaming doesn't break anything. [bug=1947038] | Leonard Richardson
2021-10-11 | Added special string classes, RubyParenthesisString and RubyTextString, to make it possible to treat ruby text specially in get_text() calls. [bug=1941980] | Leonard Richardson
2021-10-11 | Broke up some monolithic unit test files. | Leonard Richardson
2021-09-07 | Goodbye, Python 2. [bug=1942919] | Leonard Richardson
2021-06-01 | The 'replace_with()' method now takes a variable number of arguments, and can be used to replace a single element with a sequence of elements. Patch by Bill Chandos. | Leonard Richardson
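A variadic `replace_with()` amounts to splicing a sequence into the parent's child list at the position of the replaced element. The following is a minimal sketch with a hypothetical `Node` class. It detaches each replacement from its current parent before splicing and returns the removed node. Those are design choices of this sketch, not documented bs4 behavior.

```python
class Node:
    """Hypothetical element with a parent pointer and a child list."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.contents = []

    def append(self, child):
        child.parent = self
        self.contents.append(child)

    def replace_with(self, *replacements):
        """Replace this node with zero or more nodes, keeping their order."""
        parent = self.parent
        if parent is None:
            raise ValueError("Cannot replace a node that has no parent.")
        if self in replacements:
            raise ValueError("Cannot replace a node with itself.")
        # Detach replacements first: one may be a sibling whose removal
        # shifts this node's index in parent.contents.
        for node in replacements:
            if node.parent is not None:
                node.parent.contents.remove(node)
            node.parent = parent
        index = parent.contents.index(self)
        parent.contents[index:index + 1] = list(replacements)
        self.parent = None
        return self


p = Node("p")
a, b, c = Node("a"), Node("b"), Node("c")
for n in (a, b, c):
    p.append(n)
b.replace_with(Node("x"), Node("y"))
print([n.name for n in p.contents])
```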
2021-02-14 | NavigableString and its subclasses now implement the get_text() method, as well as the properties .strings and .stripped_strings. These methods will either return the string itself, or nothing, so the only reason to use this is when iterating over a list of mixed Tag and NavigableString objects. [bug=1904309] | Leonard Richardson
2021-02-13 | The behavior of methods like .get_text() and .strings now differs depending on the type of tag. The change is visible with HTML tags like <script>, <style>, and <template>. Starting in 4.9.0, methods like get_text() returned no results on such tags, because the contents of those tags are not considered 'text' within the document as a whole. But a user who calls script.get_text() is working from a different definition of 'text' than a user who calls div.get_text()--otherwise there would be no need to call script.get_text() at all. In 4.10.0, the contents of (e.g.) a <script> tag are considered 'text' during a get_text() call on the tag itself, but not considered 'text' during a get_text() call on the tag's parent. Because of this change, calling get_text() on each child of a tag may now return a different result than calling get_text() on the tag itself. That's because different tags now have different understandings of what counts as 'text'. [bug=1906226] [bug=1868861] | Leonard Richardson
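One way to get the per-tag definition of 'text' described above is to give each tag a string class it considers interesting, and to have `get_text()` keep only descendants of exactly that class. bs4's `Tag.interesting_string_types`, mentioned in the 2023-01-25 commit, plays this role. This is a minimal sketch: the `NavigableString`, `Script` and `Tag` classes here are simplified hypothetical stand-ins. The rule that only `<script>` tags treat `Script` strings as text is an assumption made for brevity.

```python
class NavigableString(str):
    """Hypothetical stand-in for ordinary document text."""


class Script(NavigableString):
    """Hypothetical stand-in for the contents of a <script> tag."""


class Tag:
    def __init__(self, name, *children):
        self.name = name
        self.children = list(children)
        # The string class that counts as 'text' when get_text() is called
        # on this particular tag (assumed rule for this sketch).
        self.interesting_string_types = (
            Script if name == "script" else NavigableString
        )

    def _all_strings(self):
        # Iterative walk over descendants in document order.
        stack = list(reversed(self.children))
        while stack:
            node = stack.pop()
            if isinstance(node, Tag):
                stack.extend(reversed(node.children))
            elif type(node) is self.interesting_string_types:
                # Exact type match: a div ignores Script strings, even
                # though Script subclasses NavigableString.
                yield node

    def get_text(self, separator=""):
        return separator.join(self._all_strings())


script = Tag("script", Script("var x;"))
div = Tag("div", NavigableString("Hello "), script, NavigableString("world"))
print(repr(div.get_text()), repr(script.get_text()))
```

Joining `get_text()` over the div's children includes the script source, while `div.get_text()` does not. This is the child-versus-parent difference the commit message warns about.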
2021-02-13 | Corrected output when the namespace prefix associated with a namespaced attribute is the empty string, as opposed to None. [bug=1915583] | Leonard Richardson
2020-10-02 | Implemented a significant performance optimization to the process of searching the parse tree. Patch by Morotti. [bug=1898212] | Leonard Richardson
2020-09-26 | Fixed a bug that inconsistently moved elements over when passing a Tag, rather than a list, into Tag.extend(). [bug=1885710] | Leonard Richardson
2020-05-17 | Switch entirely to Python 3-style print statements, even in Python 2. | Leonard Richardson
2020-05-17 | Added a keyword argument on_duplicate_attribute to the BeautifulSoupHTMLParser constructor (used by the html.parser tree builder) which lets you customize the handling of markup that contains the same attribute more than once, as in: <a href="url1" href="url2"> [bug=1878209] | Leonard Richardson
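The standard library's `html.parser.HTMLParser` passes every attribute to `handle_starttag()` as a list of `(name, value)` pairs, duplicates included. A tree builder therefore has to choose how to fold them into a dictionary. The sketch assumes three policies: keep the last value ('replace'), keep the first ('ignore'), or hand the decision to a callable. `DuplicateAttrParser`, `parse_attrs()` and `accumulate()` are hypothetical names, and this is not bs4's `BeautifulSoupHTMLParser`.

```python
from html.parser import HTMLParser


class DuplicateAttrParser(HTMLParser):
    """Hypothetical parser that records each start tag's attributes,
    resolving repeated attribute names according to a policy."""

    def __init__(self, on_duplicate_attribute="replace"):
        super().__init__()
        self.on_duplicate_attribute = on_duplicate_attribute
        self.tags = []  # list of (tag name, attribute dict)

    def handle_starttag(self, tag, attrs):
        attr_dict = {}
        for key, value in attrs:  # attrs keeps duplicates, in source order
            if key not in attr_dict:
                attr_dict[key] = value
            elif self.on_duplicate_attribute == "replace":
                attr_dict[key] = value  # last value wins
            elif self.on_duplicate_attribute == "ignore":
                pass  # first value wins
            else:
                # Any other value is treated as a callable handler.
                self.on_duplicate_attribute(attr_dict, key, value)
        self.tags.append((tag, attr_dict))


def parse_attrs(markup, policy="replace"):
    """Return the attribute dict of the first start tag in markup."""
    parser = DuplicateAttrParser(policy)
    parser.feed(markup)
    parser.close()
    return parser.tags[0][1]


def accumulate(attrs, key, value):
    """Example handler: keep every value in a list."""
    if not isinstance(attrs[key], list):
        attrs[key] = [attrs[key]]
    attrs[key].append(value)


markup = '<a href="url1" href="url2">'
for policy in ("replace", "ignore", accumulate):
    print(parse_attrs(markup, policy))
```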
2020-04-24 | If you encode a document with a Python-specific encoding like 'unicode_escape', that encoding is no longer mentioned in the final XML or HTML document. Instead, encoding information is omitted or left blank. [bug=1874955] | Leonard Richardson
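The encoding rule above can be sketched as a check against a set of codec names that only make sense inside Python. If the output encoding is one of them, the declaration leaves the encoding out. `PYTHON_ONLY_ENCODINGS` and `xml_declaration()` are hypothetical, and the set is an illustrative subset, not bs4's list.

```python
# Hypothetical, illustrative subset of codecs that only make sense inside
# Python; they should never be named in an XML or HTML document.
PYTHON_ONLY_ENCODINGS = {"unicode_escape", "raw_unicode_escape", "idna", "punycode"}


def xml_declaration(encoding=None):
    """Build an XML declaration, leaving out Python-specific encodings."""
    if encoding is not None:
        # Normalize spellings such as "Unicode-Escape" before the lookup.
        normalized = encoding.lower().replace("-", "_")
        if normalized in PYTHON_ONLY_ENCODINGS:
            encoding = None  # omit rather than advertise an unusable codec
    if encoding is None:
        return '<?xml version="1.0"?>'
    return '<?xml version="1.0" encoding="' + encoding + '"?>'


print(xml_declaration("utf-8"))
print(xml_declaration("unicode_escape"))
```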
2020-04-05 | Embedded CSS and JavaScript is now stored in distinct Stylesheet and Script tags, which are ignored by methods like get_text(). This feature is not supported by the html5lib treebuilder. [bug=1868861] | Leonard Richardson
2020-04-04 | Use an :rtype: reported to work in PyCharm. | Leonard Richardson
2020-04-04 | select() always returns a Tag, so be more specific about its return type. | Leonard Richardson
2020-03-09 | Make find() methods return a union type of the two most common PageElements, rather than PageElement itself. | Leonard Richardson
2020-03-05 | Added a performance optimization to PageElement.extract(). Patch by Arthur Darcet. | Leonard Richardson
2020-01-01 | API CHANGE - Added PageElement.decomposed, a new property which lets you check whether you've already called decompose() on a Tag or NavigableString. | Leonard Richardson
2019-12-24 | Added :rtype: to the find method docstrings. | Leonard Richardson