I want to write an XPath expression to check whether a node contains '#'.
<node1>
<node11>Some text</node11>
<node11>#2o11 PickMe</node12>
</node1>
I want to write an XPath like "//node11[contains(,'#\d+')]". What's the correct way to check for #?
The correct XPath expression is:
//node11[contains(., '#')]
In your XML, the closing tag of the second node should be </node11> instead of </node12>.
If you are using XPath 2.0 you should be able to use something like:
"//node11[matches(.,'#\d+')]"
However, XPath 1.0 has no regex support, so you won't be able to match using \d+. But this will work:
"//node11[contains(.,'#')]"
Or even:
"//node11[starts-with(.,'#')]"
Use:
/*/node11[contains(., '#')]
Note: It is recommended to avoid using the // pseudo-operator because this most often leads to very slow evaluation of the XPath expression.
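For engines without regex support (the XPath 1.0 case discussed above), one possible workaround is to select the candidate nodes with a simple path and apply the regex in the host language. The sketch below assumes Python's standard-library xml.etree.ElementTree, which implements only a small XPath subset (no contains() or matches()), and uses the sample XML with the closing tag corrected. The helper name pick_hash_nodes is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Sample document from the question, with the closing tag corrected.
XML = """<node1>
<node11>Some text</node11>
<node11>#2o11 PickMe</node11>
</node1>"""

def pick_hash_nodes(root, pattern=r"#\d+"):
    """Return node11 elements whose text matches the regex (hypothetical helper)."""
    regex = re.compile(pattern)
    # ElementTree walks the tree; Python performs the regex test.
    return [n for n in root.iter("node11") if regex.search(n.text or "")]

root = ET.fromstring(XML)
matches = pick_hash_nodes(root)
print([n.text for n in matches])
```

Passing a plain "#" as the pattern gives the same effect as the contains(., '#') test.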
I'm parsing an XML file with Nokogiri.
Currently, I'm using the following to get the value I need (the document includes multiple Phase nodes):
xml.xpath("//Phase[@text=' = STER P=P(T) ']")
But now, the uploaded XML file can have a text attribute with a different value. Thus, I'm trying to update my code using a regular expression since the value always contains STER.
After looking at a few questions on SO, I tried
xml.xpath("//Phase[@text~=/STER/]")
However, when I run it, I get
ERROR: Invalid predicate: //Phase[@text~=/STER/] (Nokogiri::XML::XPath::SyntaxError)
What am I missing here?
Alternatively, is there an XPath function similar to starts-with that looks for the substring within the entire value and not just at the beginning of it?
There are two problems with your code: first off, there is no =~ operator in XPath. The way to test whether text matches a regex is using the matches function:
//Phase[matches(@text, 'STER')]
Secondly, regex matching is a feature of XPath 2.0, but Nokogiri implements XPath 1.0.
Luckily, you are not actually using any regex features, you are simply checking for a fixed string, which can be done with XPath 1.0 using the contains function:
//Phase[contains(@text, 'STER')]
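To see how a substring test differs from a prefix test, here is a rough Python analogy using only the standard library. The Root wrapper and the second and third attribute values are invented for illustration; only the Phase element, its text attribute, and the first value come from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; only the first Phase value is from the question.
XML = """<Root>
<Phase text=" = STER P=P(T) "/>
<Phase text="STER P=P(V)"/>
<Phase text="LIQUID"/>
</Root>"""

root = ET.fromstring(XML)
values = [p.get("text", "") for p in root.iter("Phase")]

# Behaves like contains(@text, 'STER'): matches anywhere in the value.
anywhere = [v for v in values if "STER" in v]
# Behaves like starts-with(@text, 'STER'): matches only at the beginning.
at_start = [v for v in values if v.startswith("STER")]

print(len(anywhere), len(at_start))
```

The first value begins with a space, so only the substring test picks it up.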
The text is as:
text1text2
How can I specify this text in XPath? I tried:
.//*[@id='someid']//h6[text() ='text1text2]
.//*[@id='someid']//h6[text() ='text1\ntext2]
.//*[@id='someid']//h6[text() ='text1 text2]
None of them worked.
Use .//*[@id='someid']//h6[. = 'text1&#10;text2']. This assumes you are writing the path inside of XSLT or XForms where you can use &#10; to escape a new line character. If you are not using XSLT you might want to tell us in which host language (e.g. PHP, C#, Java) you use XPath.
Not very elegant, but it works:
.//*[@id='someid']//h6[contains(text(), 'text1') and contains(text(), 'text2')]
You can use normalize-space() to remove the line feed and compare text without this issue.
//*[@id='someid']//h6[normalize-space(text()) ='text1 text2']
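As a rough illustration of what normalize-space() does to such a value, the sketch below approximates it in Python: runs of spaces, tabs, and line breaks collapse to a single space, and the ends are trimmed. The markup is a hypothetical stand-in for the page in the question, and the normalize_space helper is an approximation written for this example, not a library function.

```python
import re
import xml.etree.ElementTree as ET

def normalize_space(value):
    """Approximate XPath normalize-space(): collapse whitespace runs, trim the ends."""
    return re.sub(r"[ \t\r\n]+", " ", value).strip(" ")

# Hypothetical markup: an h6 whose text is split across two lines.
XML = '<div id="someid"><h6>text1\ntext2</h6></div>'

root = ET.fromstring(XML)
h6 = root.find(".//h6")

print(repr(h6.text))                  # raw text still contains the line feed
print(repr(normalize_space(h6.text)))  # line feed replaced by a single space
```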
This is the working code
.//*[@id='someid']//h6[. = 'text1text2']
Thank you.
My problem with XPath is that whenever I use the "substring" function I get only one match, and I want to get them all.
Another problem is that whenever I combine "substring" with the | operator, it just won't work (no matches).
For example: http://www.tripadvisor.com/Hotel_Review-g52024-d653910-Reviews-Ace_Hotel_Portland-Portland_Oregon.html
On this webpage I used the query
//SPAN[@class='ratingDate relativeDate']/@title | //*[@class='ratingDate']/text()
I got 10 matches, but some of them start with "Reviewed ", so I added "substring-after"
and didn't get any matches.
The original syntax:
//SPAN[@class='ratingDate relativeDate']/@title | substring-after(//*[@class='ratingDate']/text(), 'Reviewed ')
With pure XPath 1.0 you can't solve that. If you use XPath 2.0 or XQuery 1.0, you can put the substring-after call into the last step of the path, e.g. //*[@class='ratingDate']/substring-after(., 'Reviewed ').
If you only have XPath 1.0 then you first need to select the elements with XPath and then iterate over the result in your host language to extract the substring for each element; how you do that depends on the host language and the XPath API.
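A minimal sketch of that two-step approach, assuming Python's standard-library xml.etree.ElementTree as the host API. The markup and dates are invented for illustration (only the ratingDate class name comes from the question), and substring_after is a hypothetical helper that mimics the XPath function of the same name.

```python
import xml.etree.ElementTree as ET

def substring_after(value, marker):
    """Mimic XPath substring-after(): text after the first marker, or '' if absent."""
    _, found, rest = value.partition(marker)
    return rest if found else ""

# Hypothetical markup modeled on the class name in the question.
XML = """<div>
<span class="ratingDate">Reviewed May 1, 2012</span>
<span class="ratingDate">Reviewed April 20, 2012</span>
</div>"""

root = ET.fromstring(XML)
# Step 1: select every matching element with a path expression.
spans = root.findall(".//span[@class='ratingDate']")
# Step 2: apply the string function to each node in the host language.
dates = [substring_after(s.text or "", "Reviewed ") for s in spans]

print(dates)
```

Because the string function runs once per selected node, every match is processed instead of only the first one.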
I am new to using XPath and I am trying to retrieve a node via its attribute but the problem is that the attribute is case insensitive meaning I won't exactly know how the string is cased in the document.
So for example:
Given the document:
<Document xmlns:my="http://www.MyDomain.com/MySchemaInstance">
<Machines>
<Machine FQDN="machine1.mydomain.com">
<...>
</Machine>
<Machine FQDN="Machine2.MyDomain.Com">
<...>
</Machine>
</Machines>
</Document>
If I want to retrieve the machine1 I would use the XPath:
//my:Machines/my:Machine/*[@FQDN='machine1.mydomain.com']
But a similar XPath to get machine2 would fail because the case does not match:
//my:Machines/my:Machine/*[@FQDN='machine2.mydomain.com'] //Fails
I have seen various posts mention using something like (I am not sure how to apply Namespaces to this):
translate(@FQDN, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')
But even if I got it to work it would be really cumbersome considering the number of times I would be using it.
Finally I have read that XPath 2.0 supports matches() and lower-case() but being new to XPath I don't understand how to apply them:
For example if I try the following I get an "Invalid Qualified name":
//my:Machines/my:Machine/[matches(@FQDN, '(?i)machine1.mydomain.com')]
//my:Machines/my:Machine/[lower-case(@FQDN, 'machine1.mydomain.com')]
Can someone provide a sample XPath that includes handling of Namespaces that would work?
Thanks
Your example XML and XPath statements don't match.
The sample XML elements are not bound to a namespace. The "my" namespace-prefix is declared, but not used for those elements, so they are in the "no namespace".
Your sample XPath is using predicate filters on the children of Machine rather than on the Machine element that has the @FQDN.
You could use either of these methods to look for the value case-insensitive:
matches() function, with a flag for case-insensitive matching:
//Machines/Machine[matches(@FQDN,'machine2.mydomain.com','i')]
upper-case() function to evaluate the upper-case strings:
//Machines/Machine[upper-case(@FQDN)=upper-case('machine2.mydomain.com')]
lower-case() function to evaluate the lower-case strings:
//Machines/Machine[lower-case(@FQDN)=lower-case('machine2.mydomain.com')]
"Can someone provide a sample XPath that includes handling of Namespaces that would work?"
Not sure what you meant by the handling of namespaces, but if you wanted to match on those elements regardless of their namespace then you can use the wildcard operator for the namespace:
//*:Machines/*:Machine[matches(@FQDN,'machine2.mydomain.com','i')]
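Where only an XPath 1.0 style engine is available, the same lower-case comparison can be done in the host language after selecting the Machine elements. The sketch below assumes Python's standard-library xml.etree.ElementTree and a trimmed version of the sample document. The find_machine helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the sample document; the elements are in no namespace.
XML = """<Document xmlns:my="http://www.MyDomain.com/MySchemaInstance">
<Machines>
<Machine FQDN="machine1.mydomain.com"/>
<Machine FQDN="Machine2.MyDomain.Com"/>
</Machines>
</Document>"""

def find_machine(root, fqdn):
    """Return the first Machine whose FQDN equals fqdn, ignoring case (hypothetical helper)."""
    wanted = fqdn.lower()
    for machine in root.findall("./Machines/Machine"):
        if machine.get("FQDN", "").lower() == wanted:
            return machine
    return None

root = ET.fromstring(XML)
machine = find_machine(root, "machine2.mydomain.com")
print(machine.get("FQDN"))
```

Keeping the comparison in one helper avoids repeating a translate() call in every expression.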
I want to extract "Date: 2009-09-25, 1:54PM EDT" from this webpage
http://auburn.craigslist.org/sha/1392067187.html
But I don't understand how to write XPath expressions for that.
Can anyone help me with that?
I am getting other fields also from this page.
Why don't you just run a regexp like the one below?
'Date:\s+([0-9]{4}-[0-9]{2}-[0-9]{2}.+?\<)'
It seems to be the easiest way. And if you don't want to use pure text you can use XPath 2.0 which has support for regexps (fn:matches).
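A small sketch of the regex approach in Python, assuming the page source is already available as a string. The html value is a made-up stand-in for the real page, and the pattern is a slight variation on the one above: the closing "<" sits outside the capture group so that it is not part of the result.

```python
import re

# Hypothetical snippet standing in for the downloaded page source.
html = "<hr>\nDate: 2009-09-25, 1:54PM EDT<br>\n<br>"

# Variation on the pattern above: the trailing "<" is matched but not captured.
pattern = re.compile(r"Date:\s+([0-9]{4}-[0-9]{2}-[0-9]{2}.+?)<")

match = pattern.search(html)
posted = match.group(1) if match else None
print(posted)
```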
Are you running the HTML through TIDY or some other process to turn it into XHTML? Or how are you able to execute XPATH against that HTML?
If the document was well-formed, then you could probably use the following XPATH:
/html/body/hr[1]/following-sibling::text()[1]
It finds the first HR element in the document, then selects the first text() node following it (which contains the string "Date: 2009-09-25, 1:54PM EDT").
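In Python's standard-library xml.etree.ElementTree, which has no following-sibling axis, the text that directly follows an element is exposed as that element's tail attribute. The sketch below assumes a well-formed, heavily simplified stand-in for the page; the markup around the date line is invented. It matches the XPath above only when text immediately follows the first hr.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page; only the date line is from the question.
XHTML = """<html><body>
<h2>Listing title</h2>
<hr/>
Date: 2009-09-25, 1:54PM EDT<br/>
Body text
<hr/>
</body></html>"""

root = ET.fromstring(XHTML)
# find() returns the first matching hr under body.
first_hr = root.find("./body/hr")
# The tail holds the text between this hr and the next element (the br).
date_line = (first_hr.tail or "").strip()
print(date_line)
```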