From b2294f4f05d9e8583613560986f8aa64b18866b9 Mon Sep 17 00:00:00 2001
From: Leonard Richardson
Date: Sun, 21 Jul 2019 14:58:16 -0400
Subject: Adapt Chris Mayo's code to track line number and position when using html.parser.

---
 doc/source/index.rst | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

(limited to 'doc/source')

diff --git a/doc/source/index.rst b/doc/source/index.rst
index 0c94d6a..69976fe 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -2495,6 +2495,20 @@ machines, you should specify a parser in the ``BeautifulSoup``
 constructor. That will reduce the chances that your users parse a
 document differently from the way you parse it.
 
+Line numbers
+------------
+
+The html.parser parser will keep track of where in the original
+document it found each Tag. You can access this information as
+``Tag.lineno`` (line number) and ``Tag.offset`` (position of the start
+tag within a line)::
+
+   soup = BeautifulSoup("<p>Paragraph 1</p>\n   <p>Paragraph 2</p>", 'html.parser')
+   for tag in soup.find_all('p'):
+       print(tag.lineno, tag.offset, tag.string)
+   # (1, 0, u'Paragraph 1')
+   # (2, 3, u'Paragraph 2')
+
 Encodings
 =========
 
-- 
cgit v1.2.3
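
The position tracking documented in this patch can be illustrated with the standard library alone. The sketch below is not the Beautiful Soup implementation. It assumes only that the reported values correspond to what `html.parser.HTMLParser.getpos()` returns while a start tag is being handled. The class name `PositionRecorder` and the sample markup (two `<p>` tags, the second indented three spaces on line 2) are hypothetical and chosen for illustration.

```python
from html.parser import HTMLParser


class PositionRecorder(HTMLParser):
    """Hypothetical helper: records where each start tag was found."""

    def __init__(self):
        super().__init__()
        self.positions = []

    def handle_starttag(self, tag, attrs):
        # getpos() returns (line number, offset within that line) for the
        # construct currently being handled. Lines are 1-based and
        # offsets are 0-based.
        lineno, offset = self.getpos()
        self.positions.append((lineno, offset, tag))


# Assumed sample markup: second <p> sits on line 2 after three spaces.
markup = "<p>Paragraph 1</p>\n   <p>Paragraph 2</p>"

recorder = PositionRecorder()
recorder.feed(markup)
recorder.close()

for lineno, offset, tag in recorder.positions:
    print(lineno, offset, tag)
# 1 0 p
# 2 3 p
```

Recording the position inside `handle_starttag` matters here: once the parser has moved on to the tag's contents, `getpos()` reports the later location rather than the start of the tag.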