Is it possible to remove or replace text in an XPath string?
Using XPath I get a URL that starts with http://www. and I want to remove that prefix, so that the same XPath query returns only the link without http://www. I can't find anything about removing or replacing text in an XPath string.
Is it possible?
If so, how do I do this?
Have you tried substring-after?
substring-after('http://www.stackoverflow.com', 'http://www.')
Example:
<demo>http://www.stackoverflow.com</demo>
XPath (2.0 or later, because a function call is used as a path step):
//demo/substring-after(., 'http://www.')
Yields:
stackoverflow.com
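For readers who have to do the same trimming outside XPath, here is a minimal Python sketch of what substring-after() does. The helper name substring_after is an illustrative choice, not part of any library; it only mirrors the XPath behaviour of returning an empty string when the marker is missing.

```python
def substring_after(text: str, marker: str) -> str:
    """Return the part of text after the first occurrence of marker.

    Mirrors XPath substring-after(): an empty marker returns text
    unchanged, and a marker that does not occur returns "".
    """
    if not marker:
        return text
    _, found, rest = text.partition(marker)
    return rest if found else ""


print(substring_after("http://www.stackoverflow.com", "http://www."))
```

Only the first occurrence of the marker is used as the split point, so anything after it is returned untouched.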
Related
I have the following which is the second span in html with the class of 'ProductListOurRef':
<span class="ProductListOurRef">Product Code: 60076</span>
I've tried the following XPath:
(//span[@class="ProductListOurRef"])[2]
But that returns 'Product Code: 60076'. I need to use XPath to strip the 'Product Code: ' prefix so that the result is just '60076'.
I believe 'substring-after' should do it, but I don't know how to write it.
If you are using XPath 1.0, then the result of an XPath expression must be either a node-set, a single string, a single number, or a single boolean.
As shown in comments on the question, you can write a query using substring-after(), whose result is a string.
However, some applications expect the result of an XPath expression always to be a node-set, and it looks as if you are stuck with such an application. Because you can't construct new nodes in XPath (you can only select nodes that are already present in the input), there is no way around this.
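When the tool only hands back nodes, one workaround is to select the node with XPath and trim the text in the host language. The sketch below assumes Python's standard xml.etree.ElementTree and a well-formed stand-in document; the first span and its value are invented for illustration, only the second span comes from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page. The first span is made up.
doc = ET.fromstring(
    "<div>"
    '<span class="ProductListOurRef">Product Code: 60075</span>'
    '<span class="ProductListOurRef">Product Code: 60076</span>'
    "</div>"
)

# Select the nodes with XPath, then pick the second match in Python.
spans = doc.findall(".//span[@class='ProductListOurRef']")
second = spans[1]

# Strip the label in host code instead of inside the XPath expression.
_, _, code = second.text.partition("Product Code: ")
print(code)
```

Indexing the result list in Python sidesteps the difference between (//span)[2] and //span[2], which select different things when the spans have different parents.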
I am trying to get the following information extracted out of a link using XPath, for example:
I have LINK TEXT HERE
I would like to select the href value of the link but only anything following ag_num=
So I would end up with 470 for the link above. Any ideas are truly appreciated, thank you!!
You can use below XPath expression to get required value:
substring-after(//a/@href, "ag_num=")
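If the href is already available as a string in the calling code, parsing the query string is an alternative to substring-after(). This is a different technique from the XPath expression above, sketched with Python's urllib.parse; the URL below is hypothetical, since the original link is not shown.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical href; only the ag_num=470 part is taken from the question.
href = "http://example.com/page.php?ag_num=470&lang=en"

# parse_qs maps each parameter name to a list of its values.
query = parse_qs(urlsplit(href).query)
ag_num = query.get("ag_num", [None])[0]
print(ag_num)
```

With a URL like this one, substring-after() would return everything after ag_num=, including any parameters that follow, while the query-string parse returns only the value of ag_num.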
How do I get the "Div/yield" value from here? I've tried //td[node()='Div/yield'] and //td[text()='Div/yield']
and //td[@data-snapfield='latest_dividend-dividend_yield']/following-sibling::td
@sideshowbarker is correct that there's a newline at the end of the text, so looking for an element with that exact text returns 0 results. Another way to do this (besides @sideshowbarker's answer) is to look for an element that contains the text. So the first step is:
//td[contains(text(),'Div/yield')]
But you don't need this. Your last attempt is on the right track. You've identified the element that you're after, but I think you're looking for its text. So you need to add text() at the end:
//td[@data-snapfield='latest_dividend-dividend_yield']/following-sibling::td/text()
But if you want to use the field name, so you could use the xpath for the other fields as well, then just combine these:
//td[contains(text(),'Field name')]/following-sibling::td/text()
Now just replace Field name with the field you're after.
For example, for 'Div/yield': //td[contains(text(),'Div/yield')]/following-sibling::td/text()
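Some XPath engines lack contains() and the following-sibling axis; Python's standard xml.etree.ElementTree is one of them. In that case the same label lookup can be approximated in host code. The sketch below assumes a well-formed table whose rows and values are invented for illustration, and the helper field_value is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the table; the values are made up.
doc = ET.fromstring(
    "<table>"
    "<tr><td data-snapfield='range'>Range\n</td><td>10.00 - 11.00\n</td></tr>"
    "<tr><td data-snapfield='latest_dividend-dividend_yield'>Div/yield\n</td>"
    "<td>0.50/2.00\n</td></tr>"
    "</table>"
)


def field_value(root, field_name):
    """Return the text of the cell right after the cell containing field_name."""
    for row in root.iter("tr"):
        cells = row.findall("td")
        # Pair every cell with its immediate right-hand neighbour.
        for label, value in zip(cells, cells[1:]):
            if field_name in (label.text or ""):
                return (value.text or "").strip()
    return None


print(field_value(doc, "Div/yield"))
```

The substring test tolerates the trailing newline in the label cell, just as contains() does. Unlike following-sibling::td, which selects every later sibling, this helper returns only the next cell.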
I'm trying to extract some words from the html document using XPath.
The expression '//div[@class="adsmanager_ads_price"]/text()[3]' gives me the string Name: Tim. How can I modify it to receive just Tim? I need everything after Name: because the name can be more than one word.
You can use substring-after() for that:
substring-after(//div[@class="adsmanager_ads_price"]/text()[3], ':')
This returns everything after the first colon.
substring-after(//div[@class="adsmanager_ads_price"]/text()[3], 'Name:')
This returns everything after Name:
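NOTE:
substring-after() is available in XPath 1.0 as well as 2.0, but it returns a string, not a node. If your tool only accepts expressions that yield a node-set, select the text node with XPath and do the substring in your code.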
I want to extract "Date: 2009-09-25, 1:54PM EDT" from this webpage
http://auburn.craigslist.org/sha/1392067187.html
But I don't understand how to write Xpath expressions for that.
Can anyone help me in that.
I am getting other fields also from this page.
Why don't you just run a regexp like the one below?
'Date:\s+([0-9]{4}-[0-9]{2}-[0-9]{2}.+?\<)'
It seems to be the easiest way. And if you don't want to work on the raw text, you can use XPath 2.0, which supports regular expressions (fn:matches).
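As a rough sketch of the regular-expression approach in Python: the fragment below is a hypothetical stand-in for the page source, and the pattern is a slight variant of the one above that uses a lookahead so that the "<" of the next tag is not captured.

```python
import re

# Hypothetical fragment of the page source; the real markup may differ.
html = "<hr>\nDate: 2009-09-25, 1:54PM EDT<br>\nReply to: ..."

# Variant of the pattern above: (?=<) stops the match just before the next tag.
pattern = re.compile(r"Date:\s+([0-9]{4}-[0-9]{2}-[0-9]{2}.+?)(?=<)")

match = pattern.search(html)
date_text = match.group(1) if match else None
print(date_text)
```

The captured group holds the date and time without the "Date:" label; prepend the label if the full line is needed.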
Are you running the HTML through Tidy or some other process to turn it into XHTML? If not, how are you able to execute XPath against that HTML?
If the document is well-formed, then you could probably use the following XPath:
/html/body/hr[1]/following-sibling::text()[1]
It finds the first HR element in the document, then selects the first text() node following it (which contains the string "Date: 2009-09-25, 1:54PM EDT").
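In Python's standard xml.etree.ElementTree, the text that directly follows an element is exposed as that element's tail, which gives a rough equivalent of the expression above. The document below is a hypothetical, well-formed stand-in for the page; only the date line is taken from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page.
page = ET.fromstring(
    "<html><body>"
    "<h2>Hypothetical title</h2>"
    "<hr/>\nDate: 2009-09-25, 1:54PM EDT<br/>\n"
    "Reply to: someone<hr/>"
    "</body></html>"
)

# find() returns the first matching hr under body, like hr[1] in the XPath.
first_hr = page.find("body/hr")

# .tail is the text between this element's end and the next sibling element.
date_line = (first_hr.tail or "").strip() if first_hr is not None else None
print(date_line)
```

This only works once the HTML has been converted to well-formed XML, as the answer above points out.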