Adapt Chris Mayo's code to track line number and position when using html.parser.

author: Leonard Richardson <leonardr@segfault.org> 2019-07-21 14:58:16 -0400
committer: Leonard Richardson <leonardr@segfault.org> 2019-07-21 14:58:16 -0400
commit: b2294f4f05d9e8583613560986f8aa64b18866b9 (patch)
tree: 5af13a59eca15ea082cb46ea286bc9c5b91996da /doc/source
parent: 819fa4255063d6b8d16f62469afa6c6e504f284a (diff)
1 files changed, 14 insertions, 0 deletions
diff --git a/doc/source/index.rst b/doc/source/index.rst
index 0c94d6a..69976fe 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -2495,6 +2495,20 @@ machines, you should specify a parser in the ``BeautifulSoup``
 constructor. That will reduce the chances that your users parse a
 document differently from the way you parse it.
 
+Line numbers
+------------
+
+The ``html.parser`` parser keeps track of where in the original
+document it found each Tag. You can access this information as
+``Tag.lineno`` (line number, counted from 1) and ``Tag.offset``
+(position of the start tag within its line, counted from 0)::
+
+   soup = BeautifulSoup("<p>Paragraph 1</p>\n    <p>Paragraph 2</p>", 'html.parser')
+   for tag in soup.find_all('p'):
+       print(tag.lineno, tag.offset, tag.string)
+   # (1, 0, u'Paragraph 1')
+   # (2, 4, u'Paragraph 2')
+       
 Encodings
 =========
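For readers who want to see where such positions can come from, here is a minimal sketch built only on the standard library's ``HTMLParser.getpos()``, which reports the parser's current line number and offset. The class name ``PositionRecorder`` and its ``positions`` attribute are hypothetical names chosen for illustration. This is not the code from this commit; it only assumes that a tree builder based on html.parser could read the position in a similar way when it sees a start tag.

```python
from html.parser import HTMLParser


class PositionRecorder(HTMLParser):
    """Hypothetical helper: records (line, offset, name) for each start tag."""

    def __init__(self):
        super().__init__()
        self.positions = []

    def handle_starttag(self, tag, attrs):
        # getpos() returns the current line number (1-based) and the
        # offset within that line (0-based). Inside this handler it
        # still points at the '<' that opens the start tag.
        lineno, offset = self.getpos()
        self.positions.append((lineno, offset, tag))


recorder = PositionRecorder()
recorder.feed("<p>Paragraph 1</p>\n    <p>Paragraph 2</p>")
recorder.close()
print(recorder.positions)
# [(1, 0, 'p'), (2, 4, 'p')]
```

The second tag is reported at offset 4 because four spaces precede it on line 2.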