= 4.0 =

== Better method names ==

Methods have been renamed to comply with PEP 8. The old names still work. Here are the renames:

* findAll -> find_all
* findAllNext -> find_all_next
* findAllPrevious -> find_all_previous
* findNext -> find_next
* findNextSibling -> find_next_sibling
* findNextSiblings -> find_next_siblings
* findParent -> find_parent
* findParents -> find_parents
* findPrevious -> find_previous
* findPreviousSibling -> find_previous_sibling
* findPreviousSiblings -> find_previous_siblings

Some attributes have also been renamed:

* Tag.isSelfClosing -> Tag.is_empty_element

So have some arguments to popular methods:

* BeautifulSoup(parseOnlyThese=...) -> BeautifulSoup(parse_only=...)
* BeautifulSoup(fromEncoding=...) -> BeautifulSoup(from_encoding=...)
* Tag.encode(prettyPrint=...) -> Tag.encode(pretty_print=...)

== Generators are now properties ==

The generators have been given more sensible (and PEP 8-compliant) names, and turned into properties:

* childGenerator() -> children
* nextGenerator() -> next_elements
* nextSiblingGenerator() -> next_siblings
* previousGenerator() -> previous_elements
* previousSiblingGenerator() -> previous_siblings
* recursiveChildGenerator() -> recursive_children
* parentGenerator() -> parents

So instead of this:

 for parent in tag.parentGenerator():
     ...

You can write this:

 for parent in tag.parents:
     ...

(But the old code will still work.)

== tag.string is recursive ==

tag.string now operates recursively. If tag A contains a single tag B and nothing else, then A.string is the same as B.string. So, given this markup:

 <a><b>foo</b></a>

The value of a.string used to be None, and now it's "foo".

== Empty-element tags ==

Beautiful Soup's handling of empty-element tags (aka self-closing tags) has been improved, especially when parsing XML. Previously you had to explicitly specify a list of empty-element tags when parsing XML. You can still do that, but if you don't, Beautiful Soup now considers any empty tag to be an empty-element tag.
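
As a rough illustration of two rules in this release, the recursive tag.string and the new default that any empty tag counts as an empty-element tag, here is a toy model in plain Python. It is a hypothetical sketch: ToyTag and its empty_element_names field are made up for this example, it is not Beautiful Soup's own Tag class, and it skips parsing entirely.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyTag:
    """Hypothetical stand-in for a parsed element (not bs4's Tag class)."""

    name: str
    # Children: nested ToyTag objects or plain text strings.
    contents: list[ToyTag | str] = field(default_factory=list)
    # Hypothetical: an explicit set of empty-element tag names, like the list
    # you could supply when parsing XML. None means no list was given.
    empty_element_names: frozenset[str] | None = None

    @property
    def string(self) -> str | None:
        # Only a tag with exactly one child has a single string.
        if len(self.contents) != 1:
            return None
        child = self.contents[0]
        if isinstance(child, str):
            return child
        # The lone child is a tag: ask it for its string (recursion).
        return child.string

    @property
    def is_empty_element(self) -> bool:
        # Evaluated on each access, so it reflects the current contents.
        if self.contents:
            return False
        if self.empty_element_names is None:
            # No explicit list: any empty tag counts as an empty-element tag.
            return True
        return self.name in self.empty_element_names


# <a><b>foo</b></a>: a's only child is b, and b's only child is "foo".
a = ToyTag("a", [ToyTag("b", ["foo"])])
print(a.string)  # foo

item = ToyTag("item")
print(item.is_empty_element)  # True
item.contents.append("text")  # adding a child...
print(item.is_empty_element)  # False (...so it is no longer empty)
```

Making is_empty_element a property, rather than a flag stored when the tag is created, means the answer always follows the tag's current contents.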

The determination of empty-element-ness is now made at runtime rather than at parse time. If you add a child to an empty-element tag, it stops being an empty-element tag.

== Entities are always converted to Unicode ==

An HTML or XML entity is always converted into the corresponding Unicode character. There are no longer any smartQuotesTo or convert_entities arguments. (Unicode Dammit still has smart_quotes_to, but the default is now to turn smart quotes into Unicode.)

== CDATA sections are normal text, if they're understood at all. ==

Currently, both HTML parsers ignore CDATA sections in markup:

 <p><![CDATA[foo]]></p> => <p></p>

A future version of html5lib will turn CDATA sections into text nodes, but only within tags like