XPath syntax to extract the last word in a string - xpath

I'm trying to extract some words from an HTML document using XPath.
The expression '//div[@class="adsmanager_ads_price"]/text()[3]' gives me the string Name: Tim. How can I modify it to get just Tim? I need everything after Name:, because the name can be more than one word.

You can use substring-after() for that:
substring-after(//div[@class="adsmanager_ads_price"]/text()[3], ':')
This returns everything after the first colon.
substring-after(//div[@class="adsmanager_ads_price"]/text()[3], 'Name:')
This returns everything after Name:
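NOTE:
substring-after() is available in XPath 1.0 as well as 2.0. In XPath 1.0, when its first argument is a node-set, only the first node in document order is converted to a string, and the result of the whole expression is a string rather than a node. If your tool only accepts expressions that return nodes, select the text node with XPath and do the substring in your code.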
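If the substring has to be taken in host code, the idea can be sketched roughly as below. This is only an illustration in Python using the standard library's xml.etree.ElementTree, which does not support XPath functions such as substring-after(). The sample markup, the helper names text_children and name_after_label, and the name "Tim Smith" are assumptions, since the real page is not shown in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; the real page layout is not shown in the question.
SAMPLE = (
    '<html><body><div class="adsmanager_ads_price">'
    'Price: 10<br/>City: Oslo<br/>Name: Tim Smith'
    '</div></body></html>'
)


def text_children(element):
    """Return the non-empty direct text nodes of element, in document order."""
    parts = [element.text] + [child.tail for child in element]
    return [p for p in parts if p]


def name_after_label(root, label="Name:"):
    """Take the third text node of the div and keep what follows the label."""
    div = root.find(".//div[@class='adsmanager_ads_price']")
    third = text_children(div)[2]  # counterpart of text()[3] for this sample
    return third.partition(label)[2].strip()


root = ET.fromstring(SAMPLE)
result = name_after_label(root)
print(result)
```

Because everything after the label is kept, a name made of several words comes back whole.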

Related

Using a regex to get a Nokogiri node

I'm parsing an XML file with Nokogiri.
Currently, I'm using the following to get the value I need (the document includes multiple Phase nodes):
xml.xpath("//Phase[@text=' = STER P=P(T) ']")
But now, the uploaded XML file can have a text attribute with a different value. Thus, I'm trying to update my code using a regular expression since the value always contains STER.
After looking at a few questions on SO, I tried
xml.xpath("//Phase[@text~=/STER/]")
However, when I run it, I get
ERROR: Invalid predicate: //Phase[@text~=/STER/] (Nokogiri::XML::XPath::SyntaxError)
What am I missing here?
Alternatively, is there an XPath function similar to starts-with that looks for the substring anywhere in the value and not just at the beginning?
There are two problems with your code. First, there is no regex-match operator such as ~= or =~ in XPath. The way to test whether text matches a regex is the matches function:
//Phase[matches(@text, 'STER')]
Second, regex matching is a feature of XPath 2.0, but Nokogiri implements XPath 1.0.
Luckily, you are not actually using any regex features. You are simply checking for a fixed string, which can be done in XPath 1.0 with the contains function:
//Phase[contains(@text, 'STER')]
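When the XPath engine at hand offers neither contains() nor matches(), the same filtering can be done in the host language after selecting the Phase elements. The sketch below uses Python's standard xml.etree.ElementTree (not Nokogiri) purely as an illustration. The sample document, the second and third text values, and the helper names are assumptions.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical document; only the first text value comes from the question.
SAMPLE = """<Phases>
  <Phase text=" = STER P=P(T) "/>
  <Phase text=" = LIQ P=P(T) "/>
  <Phase text=" = STER V=V(T) "/>
</Phases>"""


def phases_containing(root, needle):
    """Host-language counterpart of //Phase[contains(@text, needle)]."""
    return [p for p in root.iter("Phase") if needle in p.get("text", "")]


def phases_matching(root, pattern):
    """Host-language counterpart of //Phase[matches(@text, pattern)]."""
    rx = re.compile(pattern)
    return [p for p in root.iter("Phase") if rx.search(p.get("text", ""))]


root = ET.fromstring(SAMPLE)
fixed = phases_containing(root, "STER")
by_regex = phases_matching(root, r"STER\s+[PV]=")
print(len(fixed), len(by_regex))
```

The regex variant is only worth it when the pattern really needs regex features. For a fixed string such as STER, the plain substring test is enough.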

XDMP-REGEX: (err:FORX0002) - String transformation with Regular expressions

I am working on xquery requirement to identify the xml tag name() from the XML document using the regex. Later, I will transform the data. The code searches the entire document and, if it finds a match, I do a string replace using XQuery/XPath.
Here is some sample code showing what I am trying to do.
let $full-doc := fn:doc($uri)
if(fn:matches($full-doc,"<Hyperlink\b[^\>]*?>([A-Z][a-z]{2} [0-3]?[0-9]
[12][890][0-9]{2})</Hyperlink>"))
then $full-doc
else "regex is not working"
I am getting the following error:
regex-match :
[1.0-ml] XDMP-REGEX: (err:FORX0002) fn:matches(fn:doc("44215.xml"), "
<Hyperlink\b[^\>]*?>([A-Z][a-z]{2} [0-3]?[0-9] [12][890][0-9]{2}...") -
- Invalid regular expression
Could someone please explain why my regex is not working?
Looking at your requirement:
I am working on xquery requirement to identify the xml tag name() from the XML document using the regex.
You are going about this entirely the wrong way. XQuery doesn't see the lexical XML, it sees a tree of nodes. To find the name of an element, use an XPath expression to find the element, then use the name() function to get its name.
If you want to find an element whose name matches a regex, use //*[matches(name(), $regex)]
The word boundary code \b is not supported in XQuery (see https://www.w3.org/TR/xpath-functions-31/#regex-syntax).
But I guess you are looking for Hyperlink elements, not for a <Hyperlink> substring, so you should use a path expression:
let $doc := fn:doc($uri)
where $doc//Hyperlink[matches(., '([A-Z][a-z]{2} [0-3]?[0-9] [12][890][0-9]{2})')]
return $doc
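To illustrate the difference between matching against the node tree and matching against serialized markup, here is a rough Python counterpart of the query above. It uses the standard xml.etree.ElementTree and re modules rather than XQuery. The sample document, its href attributes, and the helper name date_hyperlinks are assumptions.

```python
import re
import xml.etree.ElementTree as ET

# Same date pattern as in the answer, e.g. "Jan 12 2019".
DATE_RX = re.compile(r"[A-Z][a-z]{2} [0-3]?[0-9] [12][890][0-9]{2}")

# Hypothetical document; the real 44215.xml is not shown in the question.
SAMPLE = """<doc>
  <para>See <Hyperlink href="a.xml">Jan 12 2019</Hyperlink> and
  <Hyperlink href="b.xml">the appendix</Hyperlink>.</para>
</doc>"""


def date_hyperlinks(root):
    """Select Hyperlink elements whose string value contains a date."""
    return [
        link
        for link in root.iter("Hyperlink")
        if DATE_RX.search("".join(link.itertext()))
    ]


root = ET.fromstring(SAMPLE)
matches = date_hyperlinks(root)
print([link.get("href") for link in matches])
```

The regex only ever sees element text, never angle brackets, so neither the tag-matching part of the original pattern nor \b is needed.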

Xpath to strip text using substring-after

I have the following which is the second span in html with the class of 'ProductListOurRef':
<span class="ProductListOurRef">Product Code: 60076</span>
I've tried the following XPath:
(//span[@class="ProductListOurRef"])[2]
But that returns 'Product Code: 60076'. I need XPath to strip the 'Product Code: ' prefix so that the result is just '60076'.
I believe substring-after should do it, but I don't know how to write it.
If you are using XPath 1.0, then the result of an XPath expression must be either a node-set, a single string, a single number, or a single boolean.
You can write a query using substring-after(), for example substring-after((//span[@class="ProductListOurRef"])[2], 'Product Code: '), whose result is a string.
However, some applications expect the result of an XPath expression always to be a node-set, and it looks as if you are stuck with such an application. Because you can't construct new nodes in XPath (you can only select nodes that are already present in the input), there is no way around this.
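Where the host application does let you run code on the selected node, the stripping can happen there instead. A minimal sketch in Python with the standard xml.etree.ElementTree follows. The wrapping div, the first span, and the helper name second_product_code are assumptions. Only the second span comes from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment; only the second span is taken from the question.
SAMPLE = """<div>
  <span class="ProductListOurRef">Product Code: 60075</span>
  <span class="ProductListOurRef">Product Code: 60076</span>
</div>"""


def second_product_code(root, prefix="Product Code: "):
    """Select the second matching span, then drop the prefix in host code."""
    spans = root.findall(".//span[@class='ProductListOurRef']")
    text = spans[1].text or ""  # index 1 corresponds to (...)[2] in XPath
    return text.partition(prefix)[2]


root = ET.fromstring(SAMPLE)
code = second_product_code(root)
print(code)
```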

Remove or replace some text from XPath string

Is it possible to remove or replace text in an XPath string?
Using XPath I get a URL that starts with http://www and I want to remove that prefix, so that the same XPath query returns only the link without http://www. I can't find anything about removing or replacing part of an XPath string.
Is it possible?
If so, how to do this?
Have you tried substring-after?
substring-after('http://www.stackoverflow.com', 'http://www.')
Example:
<demo>http://www.stackoverflow.com</demo>
XPath (2.0 or later, because a function call is used as the last path step):
//demo/substring-after(., 'http://www.')
Yields:
stackoverflow.com

XPath: Using substring-after returns only one match

My problem is that whenever I use a substring function in XPath I get only one match, and I want to get them all.
Another problem is that whenever I combine a substring function with the | operator, it doesn't work at all (no matches).
For example: http://www.tripadvisor.com/Hotel_Review-g52024-d653910-Reviews-Ace_Hotel_Portland-Portland_Oregon.html
On this page I used the query
//SPAN[@class='ratingDate relativeDate']/@title | //*[@class='ratingDate']/text()
I got 10 matches, but some of them start with "Reviewed ", so I added substring-after
and then didn't get any matches.
The expression with substring-after:
//SPAN[@class='ratingDate relativeDate']/@title | substring-after(//*[@class='ratingDate']/text(), 'Reviewed ')
With pure XPath 1.0 you can't solve that: substring-after returns a single string, and the | operator only combines node-sets. If you use XPath 2.0 or XQuery 1.0, you can put the substring-after call into the last step of the path, e.g. //*[@class='ratingDate']/substring-after(., 'Reviewed ').
If you only have XPath 1.0 then you first need to select the elements with XPath and then iterate over the result in your host language to extract the substring for each element; how you do that depends on the host language and the XPath API.
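As a rough illustration of that second approach, here is a sketch in Python using the standard xml.etree.ElementTree. The sample markup, the dates, and the helper name review_dates are assumptions, since the real page is not reproduced here. Unlike substring-after, this version leaves a text untouched when it does not start with the prefix.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup loosely modelled on the classes named in the question.
SAMPLE = """<div>
  <span class="ratingDate relativeDate" title="July 4, 2014">Reviewed yesterday</span>
  <span class="ratingDate">Reviewed June 30, 2014</span>
  <span class="ratingDate">Reviewed June 2, 2014</span>
</div>"""


def review_dates(root, prefix="Reviewed "):
    """Collect one date per element, stripping the prefix in host code."""
    dates = []
    for el in root.iter():
        cls = el.get("class")
        if cls == "ratingDate relativeDate":
            # Relative dates carry the absolute date in the title attribute.
            dates.append(el.get("title", ""))
        elif cls == "ratingDate":
            text = (el.text or "").strip()
            dates.append(text[len(prefix):] if text.startswith(prefix) else text)
    return dates


root = ET.fromstring(SAMPLE)
dates = review_dates(root)
print(dates)
```

Every selected element contributes one entry, which is the behaviour the single-string substring-after call cannot give in XPath 1.0.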
