author Leonard Richardson <leonardr@segfault.org> 2019-07-21 14:58:16 -0400
committer Leonard Richardson <leonardr@segfault.org> 2019-07-21 14:58:16 -0400
commit b2294f4f05d9e8583613560986f8aa64b18866b9
tree 5af13a59eca15ea082cb46ea286bc9c5b91996da /doc/source
parent 819fa4255063d6b8d16f62469afa6c6e504f284a
Adapt Chris Mayo's code to track line number and position when using html.parser.
Diffstat (limited to 'doc/source')
-rw-r--r-- doc/source/index.rst | 14
1 files changed, 14 insertions, 0 deletions
diff --git a/doc/source/index.rst b/doc/source/index.rst
index 0c94d6a..69976fe 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -2495,6 +2495,20 @@ machines, you should specify a parser in the ``BeautifulSoup``
constructor. That will reduce the chances that your users parse a
document differently from the way you parse it.
+Line numbers
+------------
+
+The html.parser parser will keep track of where in the original
+document it found each Tag. You can access this information as
+``Tag.lineno`` (line number) and ``Tag.offset`` (position of the start
+tag within a line)::
+
+    soup = BeautifulSoup("<p>Paragraph 1</p>\n   <p>Paragraph 2</p>", 'html.parser')
+    for tag in soup.find_all('p'):
+        print(tag.lineno, tag.offset, tag.string)
+    # (1, 0, u'Paragraph 1')
+    # (2, 3, u'Paragraph 2')
+
Encodings
=========
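
Note on the patch above: the line and position data it documents most likely comes from the standard library's ``HTMLParser.getpos()``, which reports the parser's current line number and offset. The sketch below is not the code from this commit; it uses only the standard library, and ``PositionRecorder`` is a hypothetical helper name introduced for illustration. It shows the kind of values that ``Tag.lineno`` and ``Tag.offset`` would carry, assuming the second paragraph is preceded by three spaces.

```python
from html.parser import HTMLParser


class PositionRecorder(HTMLParser):
    """Hypothetical helper: records (tag, line, offset) for each start tag."""

    def __init__(self):
        super().__init__()
        self.positions = []

    def handle_starttag(self, tag, attrs):
        # getpos() returns (line number, offset). Line numbers start at 1,
        # offsets start at 0 and count characters from the start of the line.
        lineno, offset = self.getpos()
        self.positions.append((tag, lineno, offset))


recorder = PositionRecorder()
recorder.feed("<p>Paragraph 1</p>\n   <p>Paragraph 2</p>")
recorder.close()

for entry in recorder.positions:
    print(entry)
# ('p', 1, 0)
# ('p', 2, 3)
```

Line numbers are 1-based while offsets are 0-based, which matches the ``(1, 0, ...)`` and ``(2, 3, ...)`` values shown in the documentation example.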