HTML parsing

Post by **fractals** » 10 Jan 2017 15:49

I'm trying to find a way to read a webpage and display selected information from it using string variables. My attempts to parse the result of an HTTP GET request using:

evaluateXPathAsString

failed. Later, I came across a post on this forum in which Martin said that this function only works with proper XML and should not be used with HTML.
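To illustrate the XML versus HTML distinction, here is a minimal Python sketch (not Automagic code). It assumes that a strict XML parser behaves roughly like the one behind evaluateXPathAsString, which the thread does not confirm. Both snippets are made-up samples that differ only in whether the `br` tag is closed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample inputs, not taken from any real page.
XHTML_SNIPPET = '<div class="box"><p>first<br/>second</p></div>'
HTML_SNIPPET = '<div class="box"><p>first<br>second</p></div>'


def parses_as_xml(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # Raised for markup that is not well-formed, e.g. an unclosed <br>.
        return False


xhtml_ok = parses_as_xml(XHTML_SNIPPET)
html_ok = parses_as_xml(HTML_SNIPPET)
print(xhtml_ok, html_ok)
```

The unclosed `<br>` is legal HTML but not well-formed XML, which is the kind of input where XPath evaluation on the raw response can be expected to fail.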

I know that some people on this forum have successfully retrieved information from a website, but I did not find any specific information or suggestions on how this could be done (my apologies if I've missed it). I'd rather avoid substring operations because the HTML source I need to parse is dynamic, and I think that approach could yield inconsistent results.

Does anyone have suggestions on how this could be done? Thanks in advance.

Post by **mbirth** » 11 Jan 2017 12:08

I'm doing this in a flow with the evaluateXPathAsString() function you mentioned. However, the page I'm parsing is XHTML, so it is not only HTML but also valid XML. The queries I'm using are:

DP_percent=evaluateXPathAsString(response, '//*[@class="progressBar"]/div/@style');
...
DP_text=trim(evaluateXPathAsString(response, 'string(//*[contains(@class, "barTextBelow")])'));
...
DP_info=evaluateXPathAsString(response, '(//body//p)[1]/text()[2]');
DP_expiry=evaluateXPathAsString(response, '//*[contains(@class, "expiryTime")][2]/text()');

And they all work as expected.
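For anyone who wants to try similar queries outside Automagic, here is a minimal Python sketch using the standard library. The page fragment, its values, and the variable names are hypothetical; only the class names come from the queries above. Python's ElementTree supports just a subset of XPath (no `contains()` and no attribute selection with `@style`), so the attribute is read with `get()` and the class is matched exactly.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment; the values are made up for illustration.
# A real XHTML page usually declares a namespace, which would require
# namespace-qualified tag names in the queries below.
PAGE = """<html><body>
<div class="progressBar"><div style="width: 42%"></div></div>
<span class="barTextBelow"> 4.2 GB of 10 GB used </span>
</body></html>"""

root = ET.fromstring(PAGE)

# Rough counterpart of //*[@class="progressBar"]/div/@style
bar = root.find(".//*[@class='progressBar']/div")
percent_style = bar.get("style") if bar is not None else ""

# Rough counterpart of trim(string(//*[contains(@class, "barTextBelow")])),
# using an exact class match instead of contains().
text_node = root.find(".//*[@class='barTextBelow']")
text = "".join(text_node.itertext()).strip() if text_node is not None else ""

print(percent_style, "|", text)
```

As in the flow above, this only works because the sample is well-formed XML.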
