Use XPath to parse element name containing a colon - xpath

So I'm trying to parse some of the WootAPI with XPath. A problem that I'm running into is that a couple of the elements have colons in their name, such as woot:price or woot:condition. Now, trying to use the XPath //rss/channel/item/woot:price won't grab the contents of the element woot:price, because of the colon, I think. What can I do to get it anyway?

The colons are because the elements have a namespace prefix and are bound to the Woot namespace.
You should read up on XML namespaces and how they affect XPATH and XSLT.
If you want to reference the Woot elements in your XPATH you will either need to:
Declare the Woot namespace http://www.woot.com/ so that when you use that namespace prefix in your XPATH it will be understood.
Use a more generic XPATH statement that uses predicate filters that use local-name() and namespace-uri() to match the element.
//rss/channel/item/*[local-name()='price' and namespace-uri()='http://www.woot.com/']

Related

XPath to be precised into one in order to extract text from a web page?

I have a few Xpaths as below:
//*[#id="904735f0-bb82-11ea-a473-6d0f51688222"]/div/p
//*[#id="729c0860-a71d-11ea-b994-53a3e91a35c2"]/div/div/div[1]/div/p
//*[#id="2555ab30-bb84-11ea-9e8b-277e7f6208b2"]/div/div/div[1]/div/p
//*[#id="7e100250-a71d-11ea-b994-53a3e91a35c2"]/div/div/div[1]/div/p
//*[#id="811727d0-a71d-11ea-b994-53a3e91a35c2"]/div/div/div[1]/div/p
All of the above are used to extract text from a single web page since text is located at different view--ports, but I wish to find a single xpath to extract text for all of them. Is it possible to use 'and' and multiple ID's to extract all of it through one xpath?
Any other suggestions would be appreciate.
You can use the or operator for the last four.
And the merge-nodes operator | to add the first one.
So to select all 5 expression in one, use the following expression:
//*[#id="904735f0-bb82-11ea-a473-6d0f51688222"]/div/p | //*[#id="729c0860-a71d-11ea-b994-53a3e91a35c2" or #id="2555ab30-bb84-11ea-9e8b-277e7f6208b2" or #id="7e100250-a71d-11ea-b994-53a3e91a35c2" or #id="811727d0-a71d-11ea-b994-53a3e91a35c2"]/div/div/div[1]/div/p
A shorter and more generic solution could be :
(//div/div/div[1]/div/p|//div/p)[parent::*[string-length(#id)=36 and substring(#id,24,1)="-"]]
First part with () is used to specify the end of the path. Since #id attributes have the same length, we use it inside the predicate. We also verify the presence of a - at a specific position with substring.

Using a regex to get a Nokogiri node

I'm parsing an XML file with Nokogiri.
Currently, I'm using the following to get the value I need (the document includes multiple Phase nodes):
xml.xpath("//Phase[#text=' = STER P=P(T) ']")
But now, the uploaded XML file can have a text attribute with a different value. Thus, I'm trying to update my code using a regular expression since the value always contains STER.
After looking at a few questions on SO, I tried
xml.xpath("//Phase[#text~=/STER/]")
However, when I run it, I get
ERROR: Invalid predicate: //Phase[#text~=/STER/] (Nokogiri::XML::XPath::SyntaxError)
What am I missing here?
Alternatively, is there an XPATH function similar to starts-with` that looks for the substring within the entire value and not just at the beginning of it?
There are two problems with your code: first off, there is no =~ operator in XPath. The way to test whether text matches a regex is using the matches function:
//Phase[matches(#text, 'STER')]
Secondly, regex matching is a feature of XPath 2.0, but Nokogiri implements XPath 1.0.
Luckily, you are not actually using any regex features, you are simply checking for a fixed string, which can be done with XPath 1.0 using the contains function:
//Phase[contains(#text, 'STER')]

Getting element's name in XPATH

If I selected an element using XPATH how can I get its name?
I mean something like text() function in //element/[#id=elid]/text().
Use name(). (Find docs for newer versions of the XPath language here.)
Here are modified versions of your example:
Works in XPath 2.0+ only:
//element/*[#id='elid']/name()
Works in XPath 1.0 and 2.0+*:
name(//element/*[#id='elid'])
*If using 2.0+, the expression //element/*[#id='elid'] must only return one element. Otherwise you'll get an error like A sequence of more than one item is not allowed as the first argument of fn:name()
You could also use local-name() which returns the local part of the expanded name (without any namespace prefix).
The tag names tree can also be obtained with
echo "du //Element/*" | xmllint --shell response-02.xml
Ele1
id
name
Nested1
id
name
Ele2

XPath 2.0: Retrieving nodes by attribute where value is case Insensitive

I am new to using XPath and I am trying to retrieve a node via its attribute but the problem is that the attribute is case insensitive meaning I won't exactly know how the string is cased in the document.
So for example:
Given the document:
<Document xmlns:my="http://www.MyDomain.com/MySchemaInstance">
<Machines>
<Machine FQDN="machine1.mydomain.com">
<...>
</Machine>
<Machine FQDN="Machine2.MyDomain.Com">
<...>
</Machine>
</Machines>
</Document>
If I want to retrieve the machine1 I would use the XPath:
//my:Machines/my:Machine/*[#FQDN='machine1.mydomain.com']
But a similar XPath to get machine2 would fail becuase the case does not match:
//my:Machines/my:Machine/*[#FQDN='machine2.mydomain.com'] //Fails
I have seen various posts mention using something like (I am not sure how to apply Namespaces to this):
translate(#FQDN, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')
But even if I got it to work it would be really cumbersome considering the number of times I would be using it.
Finally I have read that XPath 2.0 supports matches() and lower-case() but being new to XPath I don't understand how to apply them:
For example if I try the following I get an "Invalid Qualified name":
//my:Machines/my:Machine/[matches(#FQDN, '(?i)machine1.mydomain.com')]
//my:Machines/my:Machine/[lower-case(#FQDN, 'machine1.mydomain.com')]
Can someone provide a sample XPath that includes handling of Namespaces that would work?
Thanks
Your example XML and XPath statements don't match.
The sample XML elements are not bound to a namespace. The "my" namespace-prefix is declared, but not used for those elements, so they are in the "no namespace".
Your sample XPath is using predicate filters on the children of Machine rather than on the Machine element that has the #FQDN.
You could use either of these methods to look for the value case-insensitive:
matches() function, with a flag for case-insensitive matching:
//Machines/Machine[matches(#FQDN,'machine2.mydomain.com','i')]
upper-case() function to evaluate the upper-case strings:
//Machines/Machine[upper-case(#FQDN)=upper-case('machine2.mydomain.com')]
lower-case() function to evaluate the lower-case strings:
//Machines/Machine[lower-case(#FQDN)=lower-case('machine2.mydomain.com')]
Can someone provide a sample XPath that includes handling of
Namespaces that would work?
Not sure what you meant by the handling of namespaces, but if you wanted to match on those elements regardless of their namespace then you can use the wildcard operator for the namespace:
//*:Machines/*:Machine[matches(#FQDN,'machine2.mydomain.com','i')]

xpath to check '#' present

I want to write xpath to check node contain '#'
<node1>
<node11>Some text</node11>
<node11>#2o11 PickMe</node12>
</node1>
I want to write xpath like "//node11[contains(,'#\d+')]". Whats correct way to check #
The correct XPath expression is:
//node11[contains(., '#')]
In your XML, the closing tag of the second subnote should be </node11> instead of </node12>.
If you are using xpath 2.0 you should be able to use something like:
"//node11[matches(.,'#\d+')]"
However, if you aren't using 2.0 you won't have regex support directly. If you are using 1.0 then you won't be able to match using \d+. But this will work:
"//node11[contains(.,'#')]"
Or even:
"//node11[starts-with(.,'#')]"
Use:
/*/node11[contains(., '#')]
Note: It is recommended to avoid using the // pseudo-operator because this most often leads to very slow evaluation of the XPath expression.

Resources