Read contents of table in Chrome browser

Hi,

How can I retrieve the contents of a table on a web page displayed in the Chrome browser please?

I don’t want to read the text using OCR, as this is cumbersome for an entire table and can also be unreliable.

I also don’t want to resort to the “Get all text on web page” action because this does not get the contents in a format that is easily relatable to the structure of the table (e.g. it comes back with every table cell in a new line).

I can use “Find elements by tag in browser” to get an element returned (in this particular case there is only one table on the page). But from there I can’t figure out how to make use of that to extract what I need.

The value returned from the “Find elements by tag in browser” action is a list something like this:

[<selenium.webdriver.remote.webelement.WebElement (session="f957e8979303469d6e3911430a48ab9d", element="6FCEB2BAF7941B384467003E96BAEAFA_element_8736")>]

I don’t mind getting the HTML of the table - I can strip what I need out of that once I have the text of that HTML.

I tried to use something along the lines of

web_element = chrome.find_element_by_id(element_id)
web_element.get_attribute('outerHTML')

but couldn’t figure out how to make that work either. The chrome object doesn’t recognise the find_element_by_id function.

Then, stepping back a bit, I tried the “Find element by ID in browser” action to see whether I could use the info returned from “Find elements by tag in browser” to identify the same element. However, no matter how I slice and dice the returned info, it doesn’t seem to include the ID value required.

Phew! So, questions:

1. If you use actions such as “Find elements by tag in browser”, what can you actually use the output of that action for, and how?

2. How can I extract the HTML of a specific table from a web page?

Thanks very much.


Okay, I must have been too tired when I was fumbling with this previously :roll_eyes:

Calling .get_attribute('outerHTML') (in a Python Code stage) on the first item in the list returned from the “Find elements by tag in browser” action yields the HTML of the table that I was after. Not sure how I missed that previously.
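For anyone who wants to see that step spelled out, here is a minimal sketch. It assumes the action's output is a plain Python list of Selenium WebElement objects, as the printed value in the question suggests. The helper name first_element_html and the variable name tables in the usage note are made up for illustration.

```python
def first_element_html(elements):
    """Return the outerHTML of the first element in a list of WebElements.

    `elements` is assumed to be the list returned by the
    "Find elements by tag in browser" action (hypothetical wiring).
    """
    if not elements:
        raise ValueError("no elements were found on the page")
    # get_attribute is the Python Selenium WebElement method; 'outerHTML'
    # covers the element's own tag plus everything nested inside it.
    return elements[0].get_attribute("outerHTML")
```

In the Python Code stage this would be called as something like table_html = first_element_html(tables), where tables holds the output of the find action.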

Not only that, but since there is only one table on the page, the “Find first element by tag in browser” action returns the web element directly, without having to extract it from a list :man_facepalming:

Finally, I also got this working using the “Find first element for an XPath in browser” action, which now seems blindingly obvious as well.

Of course, now I need to parse the HTML structure to extract the table contents into a usable form, but ChatGPT is becoming a close friend :smile:
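In case it helps the next person, here is one possible way to do that parsing with only the Python standard library. It is a rough sketch, not the only approach: it assumes a simple table with no nested tables, and the names TableParser and parse_table are made up for this example.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text chunks of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the chunks and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_table(html):
    """Return the table as a list of rows, each a list of cell strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


sample = "<table><tr><th>Name</th><th>Qty</th></tr><tr><td>Apple</td><td> 3 </td></tr></table>"
print(parse_table(sample))  # [['Name', 'Qty'], ['Apple', '3']]
```

Feeding it the outerHTML string of the table gives a list of lists that is easy to loop over or write to CSV. Cells spanning several rows or columns are not expanded in this sketch.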

Posting this reply here so the question can be closed (I don’t have permission to delete the topic), and just in case someone else stumbles across it and finds this info useful.
