author | Leonard Richardson <leonardr@segfault.org> | 2018-12-24 09:54:10 -0500 |
---|---|---|
committer | Leonard Richardson <leonardr@segfault.org> | 2018-12-24 09:54:10 -0500 |
commit | 50ded6dbdfeba182233fff86eb98513e09fcde93 (patch) | |
tree | 5f05f605ac955b91b00d47d6c022c01fc0327d3b /doc/source | |
parent | d065751cdab7877feeffbd0f798819ef77a2aa66 (diff) |
Keep track of the namespace abbreviations found while parsing the document. This makes select() work most of the time without requiring a value for 'namespaces'.
Diffstat (limited to 'doc/source')
-rw-r--r-- | doc/source/index.rst | 23 |
1 files changed, 15 insertions, 8 deletions
diff --git a/doc/source/index.rst b/doc/source/index.rst
index 2977029..9bf9cf1 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -1781,8 +1781,7 @@ first tag that matches a selector::
  # <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
 
 If you've parsed XML that defines namespaces, you can use them in CSS
-selectors. You just have to pass a dictionary of the namespace
-mappings into ``select()``::
+selectors.::
 
  from bs4 import BeautifulSoup
  xml = """<tag xmlns:ns1="http://namespace1/" xmlns:ns2="http://namespace2/">
@@ -1794,15 +1793,23 @@ mappings into ``select()``::
  soup.select("child")
  # [<ns1:child>I'm in namespace 1</ns1:child>, <ns2:child>I'm in namespace 2</ns2:child>]
 
- namespaces = dict(ns1="http://namespace1/", ns2="http://namespace2/")
  soup.select("ns1|child", namespaces=namespaces)
  # [<ns1:child>I'm in namespace 1</ns1:child>]
 
-All of this is a convenience for people who know the CSS selector
-syntax. You can do all this stuff with the Beautiful Soup API. And if
-CSS selectors are all you need, you should parse the document
-with lxml: it's a lot faster. But this lets you `combine` CSS
-selectors with the Beautiful Soup API.
+When handling a CSS selector that uses namespaces, Beautiful Soup
+uses the namespace abbreviations it found when parsing the
+document. You can override this by passing in your own dictionary of
+abbreviations::
+
+ namespaces = dict(first="http://namespace1/", second="http://namespace2/")
+ soup.select("second|child", namespaces=namespaces)
+ # [<ns1:child>I'm in namespace 2</ns1:child>]
+
+All this CSS selector stuff is a convenience for people who already
+know the CSS selector syntax. You can do all of this with the
+Beautiful Soup API. And if CSS selectors are all you need, you should
+parse the document with lxml: it's a lot faster. But this lets you
+`combine` CSS selectors with the Beautiful Soup API.
 
 Modifying the tree
 ==================
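The idea in the commit message, remembering the namespace abbreviations seen while parsing so that later queries need no explicit mapping, can be sketched with the standard library alone. This is not Beautiful Soup's implementation: it uses `xml.etree.ElementTree` as a stand-in, and the helper names `parse_with_namespaces` and `find` are hypothetical, chosen only for this illustration.

```python
import io
import xml.etree.ElementTree as ET

XML = """<tag xmlns:ns1="http://namespace1/" xmlns:ns2="http://namespace2/">
 <ns1:child>I'm in namespace 1</ns1:child>
 <ns2:child>I'm in namespace 2</ns2:child>
</tag>"""


def parse_with_namespaces(text):
    """Parse XML and return (root, prefix -> URI mapping seen while parsing).

    Hypothetical helper: an analogue of the behaviour described in the
    commit message, not code taken from Beautiful Soup.
    """
    namespaces = {}
    root = None
    source = io.BytesIO(text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = payload
            # Keep the first URI seen for each abbreviation.
            namespaces.setdefault(prefix, uri)
        elif root is None:
            # The first "start" event carries the root element.
            root = payload
    return root, namespaces


def find(root, found, path, namespaces=None):
    """Use the abbreviations found at parse time unless the caller overrides them."""
    mapping = found if namespaces is None else namespaces
    return root.findall(path, mapping)


root, found = parse_with_namespaces(XML)

# No mapping passed: the abbreviations recorded during parsing are used.
by_document = [el.text for el in find(root, found, "ns1:child")]
# ["I'm in namespace 1"]

# Caller-supplied abbreviations override the ones from the document.
custom = {"first": "http://namespace1/", "second": "http://namespace2/"}
by_override = [el.text for el in find(root, found, "second:child", custom)]
# ["I'm in namespace 2"]
```

In the override case the matched element still belongs to namespace 2, so the `<ns1:child>` prefix in the expected output added by the diff looks like a typo for `<ns2:child>`.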