Getting element's name in XPath

If I select an element using XPath, how can I get its name?
I mean something like the text() function in //element/[@id=elid]/text().

Use name(). (Newer versions of the XPath language document it as fn:name().)
Here are modified versions of your example:
Works in XPath 2.0+ only:
//element/*[@id='elid']/name()
Works in XPath 1.0 and 2.0+*:
name(//element/*[@id='elid'])
*If using 2.0+, the expression //element/*[@id='elid'] must only return one element. Otherwise you'll get an error like A sequence of more than one item is not allowed as the first argument of fn:name()
You could also use local-name() which returns the local part of the expanded name (without any namespace prefix).
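Both functions can be tried outside a browser with the JDK's built-in javax.xml.xpath API, which implements XPath 1.0. The sketch below is a minimal example; the root element, the p prefix and the urn:example URI are made up for illustration. It also shows the XPath 1.0 behaviour behind the footnote: given several nodes, name() uses the first one in document order instead of raising an error.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ElementNameDemo {
    // Hypothetical sample: <element> has a prefixed child and a plain child.
    static final String XML = "<root xmlns:p='urn:example'>"
        + "<element><p:item id='elid'/><other id='x'/></element>"
        + "</root>";

    // Parses the XML (namespace-aware) and returns the string value of an XPath 1.0 expression.
    static String eval(String xml, String expr) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval(XML, "name(//element/*[@id='elid'])"));       // p:item
        System.out.println(eval(XML, "local-name(//element/*[@id='elid'])")); // item
        // XPath 1.0 accepts several nodes here and uses the first in document order.
        System.out.println(eval(XML, "name(//element/*)"));                   // p:item
    }
}
```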

The tree of tag names can also be obtained with xmllint's shell:
echo "du //Element/*" | xmllint --shell response-02.xml
Ele1
id
name
Nested1
id
name
Ele2
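For comparison, a similar listing can be produced in Java with the JDK's XPath 1.0 engine: select the nodes with //Element/* and walk each one depth-first, printing element names indented by depth. This is a sketch, not xmllint's exact output format; the class name, the two-space indentation and the sample document (whose nesting is guessed, since the output above lost its indentation) are assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TagTree {
    // Returns the element names under the nodes selected by expr, one per line, indented by depth.
    static String tree(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < hits.getLength(); i++) {
                walk(hits.item(i), 0, out);
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Depth-first walk that records element nodes only (text and comments are skipped).
    static void walk(Node n, int depth, StringBuilder out) {
        if (n.getNodeType() != Node.ELEMENT_NODE) return;
        out.append("  ".repeat(depth)).append(n.getNodeName()).append('\n');
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            walk(c, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        // Hypothetical document; its nesting is a guess based on the flattened output above.
        String xml = "<Element><Ele1><id/><name/><Nested1><id/><name/></Nested1></Ele1>"
            + "<Ele2/></Element>";
        System.out.print(tree(xml, "//Element/*"));
    }
}
```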


Using a regex to get a Nokogiri node

I'm parsing an XML file with Nokogiri.
Currently, I'm using the following to get the value I need (the document includes multiple Phase nodes):
xml.xpath("//Phase[@text=' = STER P=P(T) ']")
But now, the uploaded XML file can have a text attribute with a different value. Thus, I'm trying to update my code using a regular expression since the value always contains STER.
After looking at a few questions on SO, I tried
xml.xpath("//Phase[@text~=/STER/]")
However, when I run it, I get
ERROR: Invalid predicate: //Phase[@text~=/STER/] (Nokogiri::XML::XPath::SyntaxError)
What am I missing here?
Alternatively, is there an XPath function similar to starts-with() that looks for the substring within the entire value and not just at the beginning of it?
There are two problems with your code. First, there is no ~= operator in XPath (and no =~ either). The way to test whether text matches a regex is the matches function:
//Phase[matches(@text, 'STER')]
Second, regex matching is a feature of XPath 2.0, but Nokogiri implements XPath 1.0.
Luckily, you are not actually using any regex features; you are simply checking for a fixed string, which can be done in XPath 1.0 with the contains function:
//Phase[contains(@text, 'STER')]
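The JDK's javax.xml.xpath is also an XPath 1.0 engine, so it is a convenient stand-in for checking the contains() version outside Ruby. A minimal Java sketch; the sample document and the class name are made up for illustration:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PhaseContains {
    // Returns the text attribute of every element selected by expr.
    static List<String> texts(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(((Element) nodes.item(i)).getAttribute("text"));
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc>"
            + "<Phase text=' = STER P=P(T) '/>"
            + "<Phase text=' = STER Q '/>"
            + "<Phase text=' = OTHER '/>"
            + "</doc>";
        // contains() finds the substring anywhere in the value, not only at the start.
        System.out.println(texts(xml, "//Phase[contains(@text, 'STER')]"));
    }
}
```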

XPath: Using substring-after returns only one match

My problem with XPath is that whenever I use the "substring" functions, I get only one match, and I want to get them all.
Another problem is that whenever I combine a "substring" function with the | operator, it just won't work (no matches).
For example: http://www.tripadvisor.com/Hotel_Review-g52024-d653910-Reviews-Ace_Hotel_Portland-Portland_Oregon.html
On this web page I used the query
//SPAN[@class='ratingDate relativeDate']/@title | //*[@class='ratingDate']/text()
I got 10 matches, but some of them start with "Reviewed ", so I added substring-after
and didn't get any matches.
The exact query:
//SPAN[@class='ratingDate relativeDate']/@title | substring-after(//*[@class='ratingDate']/text(), 'Reviewed ')
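With pure XPath 1.0 you can't solve that: substring-after() returns a single string, computed from only the first node of a node-set argument, and | can only combine node-sets, not strings. If you use XPath 2.0 or XQuery 1.0, you can put the substring-after call into the last step of the path, e.g. //*[@class='ratingDate']/substring-after(., 'Reviewed ').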
If you only have XPath 1.0 then you first need to select the elements with XPath and then iterate over the result in your host language to extract the substring for each element; how you do that depends on the host language and the XPath API.
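With the JDK's XPath 1.0 API, for example, one way is to select the nodes as a node-set and then evaluate substring-after(., 'Reviewed ') with each node as the context. The sketch assumes a well-formed snippet with made-up dates; a real TripAdvisor page would first need an HTML parser to produce a DOM, which is outside this example. Because substring-after() returns an empty string when the prefix is absent, the sketch keeps the original text in that case (a design choice, not part of the answer).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ReviewDates {
    // Made-up, well-formed stand-in for the review markup.
    static final String HTML = "<div>"
        + "<span class='ratingDate'>Reviewed March 3, 2014</span>"
        + "<span class='ratingDate'>Reviewed April 1, 2014</span>"
        + "</div>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Pure XPath 1.0: the node-set argument is reduced to its first node's string value.
    static String singleResult(String xml) {
        try {
            XPath xp = XPathFactory.newInstance().newXPath();
            return xp.evaluate(
                "substring-after(//*[@class='ratingDate']/text(), 'Reviewed ')", parse(xml));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Host-language loop: select every text node, then apply substring-after per node.
    static List<String> allResults(String xml) {
        try {
            XPath xp = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xp.evaluate(
                "//*[@class='ratingDate']/text()", parse(xml), XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node node = nodes.item(i);
                String after = xp.evaluate("substring-after(., 'Reviewed ')", node);
                // Empty result means the prefix was missing; keep the text unchanged then.
                out.add(after.isEmpty() ? node.getNodeValue() : after);
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(singleResult(HTML)); // March 3, 2014
        System.out.println(allResults(HTML));   // [March 3, 2014, April 1, 2014]
    }
}
```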

XPath 2.0: Retrieving nodes by attribute where the value is case-insensitive

I am new to XPath and I am trying to retrieve a node via its attribute, but the attribute value is case-insensitive, meaning I won't know exactly how the string is cased in the document.
So for example:
Given the document:
<Document xmlns:my="http://www.MyDomain.com/MySchemaInstance">
<Machines>
<Machine FQDN="machine1.mydomain.com">
<...>
</Machine>
<Machine FQDN="Machine2.MyDomain.Com">
<...>
</Machine>
</Machines>
</Document>
If I want to retrieve the machine1 I would use the XPath:
//my:Machines/my:Machine/*[@FQDN='machine1.mydomain.com']
But a similar XPath to get machine2 would fail because the case does not match:
//my:Machines/my:Machine/*[@FQDN='machine2.mydomain.com'] //Fails
I have seen various posts mention using something like (I am not sure how to apply Namespaces to this):
translate(@FQDN, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')
But even if I got it to work it would be really cumbersome considering the number of times I would be using it.
Finally I have read that XPath 2.0 supports matches() and lower-case() but being new to XPath I don't understand how to apply them:
For example if I try the following I get an "Invalid Qualified name":
//my:Machines/my:Machine/[matches(@FQDN, '(?i)machine1.mydomain.com')]
//my:Machines/my:Machine/[lower-case(@FQDN, 'machine1.mydomain.com')]
Can someone provide a sample XPath that includes handling of Namespaces that would work?
Thanks
Your example XML and XPath statements don't match.
The sample XML elements are not bound to a namespace. The "my" namespace-prefix is declared, but not used for those elements, so they are in the "no namespace".
Your sample XPath is using predicate filters on the children of Machine rather than on the Machine element that has the @FQDN.
You could use any of these methods to look for the value case-insensitively:
matches() function, with a flag for case-insensitive matching:
//Machines/Machine[matches(@FQDN,'machine2.mydomain.com','i')]
upper-case() function to evaluate the upper-case strings:
//Machines/Machine[upper-case(@FQDN)=upper-case('machine2.mydomain.com')]
lower-case() function to evaluate the lower-case strings:
//Machines/Machine[lower-case(@FQDN)=lower-case('machine2.mydomain.com')]
Can someone provide a sample XPath that includes handling of Namespaces that would work?
Not sure what you meant by the handling of namespaces, but if you wanted to match on those elements regardless of their namespace then you can use the wildcard operator for the namespace:
//*:Machines/*:Machine[matches(@FQDN,'machine2.mydomain.com','i')]
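matches(), lower-case(), upper-case() and the *: wildcard all require XPath 2.0 or later. If only an XPath 1.0 engine is available (the JDK's built-in javax.xml.xpath is one), the translate() idea from the question still works, and the cumbersome alphabet strings can be generated by a small helper in the host language. A Java sketch; the helper name ciEquals is hypothetical, and translate() here folds only the ASCII letters A-Z, not accented characters:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CaseInsensitiveXPath {
    static final String UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    static final String LOWER = "abcdefghijklmnopqrstuvwxyz";

    // Hypothetical helper: an XPath 1.0 test comparing expr and value with A-Z folded
    // to a-z on both sides. Assumes value contains no apostrophe.
    static String ciEquals(String expr, String value) {
        return "translate(" + expr + ", '" + UPPER + "', '" + LOWER + "')"
            + " = translate('" + value + "', '" + UPPER + "', '" + LOWER + "')";
    }

    // Returns the FQDN of every Machine whose FQDN matches, ignoring ASCII case.
    static List<String> find(String xml, String fqdn) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            String query = "//Machines/Machine[" + ciEquals("@FQDN", fqdn) + "]";
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(query, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(((Element) nodes.item(i)).getAttribute("FQDN"));
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The question's sample document, with the <...> placeholders left out.
        String xml = "<Document xmlns:my='http://www.MyDomain.com/MySchemaInstance'>"
            + "<Machines>"
            + "<Machine FQDN='machine1.mydomain.com'/>"
            + "<Machine FQDN='Machine2.MyDomain.Com'/>"
            + "</Machines></Document>";
        System.out.println(find(xml, "machine2.mydomain.com")); // [Machine2.MyDomain.Com]
    }
}
```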

Trouble using XPath starts-with to parse XHTML

I'm trying to parse a web page to get posts from a forum.
Each message starts with the following format:
<div id="post_message_somenumber">
and I only want to get the first one.
I tried xpath='//div[starts-with(@id, '"post_message_')]' in YQL without success.
I'm still learning this; does anyone have suggestions?
I think I have a solution that does not require dealing with namespaces.
Here is one that selects all matching div's:
//div[@id[starts-with(.,"post_message")]]
But you said you wanted just the "first one" (I assume you mean the first "hit" in the whole page?). Here is a slight modification that selects just the first matching result:
(//div[@id[starts-with(.,"post_message")]])[1]
These use the dot to represent the id's value within the starts-with() function. You may have to escape special characters in your language.
It works great for me in PowerShell:
# Load a sample xml document
$xml = [xml]'<root><div id="post_message_somenumber"/><div id="not_post_message"/><div id="post_message_somenumber2"/></root>'
# Run the xpath selection of all matching div's
$xml.selectnodes('//div[@id[starts-with(.,"post_message")]]')
Result:
id
--
post_message_somenumber
post_message_somenumber2
Or, for just the first match:
# Run the xpath selection of the first matching div
$xml.selectnodes('(//div[@id[starts-with(.,"post_message")]])[1]')
Result:
id
--
post_message_somenumber
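The parentheses in (//div[...])[1] matter. Without them, the [1] belongs to the last location step and is applied per parent: //div[...][1] selects every matching div that is the first matching child of its own parent, which can be several elements. A Java sketch using the JDK's XPath 1.0 engine and a hypothetical document whose matching divs sit under two different parents:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FirstMatch {
    // Hypothetical page: two matching divs in the first section, one in the second.
    static final String XML = "<root>"
        + "<section><div id='post_message_1'/><div id='post_message_2'/></section>"
        + "<section><div id='post_message_3'/></section>"
        + "</root>";

    // Returns the id attributes of the elements selected by expr.
    static List<String> ids(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(((Element) nodes.item(i)).getAttribute("id"));
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // [1] inside the step: first matching div of EACH parent.
        System.out.println(ids(XML, "//div[@id[starts-with(., 'post_message')]][1]"));
        // prints [post_message_1, post_message_3]
        // [1] after the parenthesised path: first match in the whole document.
        System.out.println(ids(XML, "(//div[@id[starts-with(., 'post_message')]])[1]"));
        // prints [post_message_1]
    }
}
```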
I tried xpath='//div[starts-with(@id, '"post_message_')]' in yql without success I'm still learning this, anyone have suggestions
If the problem isn't due to the many nested apostrophes and the unclosed double-quote, then the most likely cause (we can only guess without being shown the XML document) is that a default namespace is used.
Specifying names of elements that are in a default namespace is the most frequently asked XPath question. If you search for "XPath default namespace" on SO or on the internet, you'll find many sources with the correct solution.
Generally, a special method must be called that binds a prefix (say "x:") to the default namespace. Then, in the XPath expression, every element name "someName" must be replaced by "x:someName".
Here is a good answer on how to do this in C#.
Read the documentation of your language/xpath-engine how something similar should be done in your specific environment.
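In Java's javax.xml.xpath, for instance, that binding is done by passing a NamespaceContext to XPath.setNamespaceContext. The sketch below assumes an XHTML page (default namespace http://www.w3.org/1999/xhtml) and binds the arbitrary prefix x to it; the prefix, class name and sample markup are illustrative only. Unprefixed attribute names such as @id stay unprefixed, because attributes without a prefix are in no namespace.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DefaultNamespaceXPath {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Maps the single prefix "x" to the XHTML namespace; every other prefix is unbound.
    static final NamespaceContext CTX = new NamespaceContext() {
        public String getNamespaceURI(String prefix) {
            return "x".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
        }
        public String getPrefix(String uri) {
            return XHTML_NS.equals(uri) ? "x" : null;
        }
        public Iterator<String> getPrefixes(String uri) {
            return XHTML_NS.equals(uri)
                ? Collections.singletonList("x").iterator()
                : Collections.<String>emptyIterator();
        }
    };

    // Parses namespace-aware (required) and evaluates expr with the x prefix bound.
    static String eval(String xml, String expr) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xp = XPathFactory.newInstance().newXPath();
            xp.setNamespaceContext(CTX);
            return xp.evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String page = "<html xmlns='" + XHTML_NS + "'><body>"
            + "<div id='post_message_1'>first</div><div id='post_message_2'>second</div>"
            + "</body></html>";
        // Unprefixed div means "div in NO namespace", so nothing matches here.
        System.out.println(eval(page, "count(//div[starts-with(@id, 'post_message_')])"));
        // x:div means "div in the XHTML namespace", so both divs match.
        System.out.println(eval(page, "count(//x:div[starts-with(@id, 'post_message_')])"));
    }
}
```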
@FindBy(xpath = "//div[starts-with(@id,'expiredUserDetails') and contains(text(), 'Details')]")
private WebElementFacade ListOfExpiredUsersDetails;
This locator matches div elements whose id starts with expiredUserDetails and whose text contains Details; since the field is a single WebElementFacade, it refers to one matching element rather than a list.
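The same predicate can be checked outside the browser with the JDK's XPath 1.0 engine. One subtlety worth knowing: in XPath 1.0, contains(text(), 'Details') looks only at the element's first text-node child, while contains(., 'Details') looks at all of its text, including text inside child elements. A Java sketch with made-up markup and a hypothetical class name:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DetailsLocator {
    // Hypothetical markup: the second div's first text node is "Expired ", not "Details".
    static final String XML = "<body>"
        + "<div id='expiredUserDetails1'>Details for user 1</div>"
        + "<div id='expiredUserDetails2'>Expired <b>user 2</b> Details</div>"
        + "<div id='activeUserDetails'>Details</div>"
        + "</body>";

    // Returns the id attributes of the elements selected by expr.
    static List<String> ids(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(((Element) nodes.item(i)).getAttribute("id"));
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // text(): first text child only, so only expiredUserDetails1 matches.
        System.out.println(ids(XML,
            "//div[starts-with(@id,'expiredUserDetails') and contains(text(), 'Details')]"));
        // '.': full string value of the div, so both expired divs match.
        System.out.println(ids(XML,
            "//div[starts-with(@id,'expiredUserDetails') and contains(., 'Details')]"));
    }
}
```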

Use XPath to parse element name containing a colon

So I'm trying to parse some of the WootAPI with XPath. A problem that I'm running into is that a couple of the elements have colons in their name, such as woot:price or woot:condition. Now, trying to use the XPath //rss/channel/item/woot:price won't grab the contents of the element woot:price, because of the colon, I think. What can I do to get it anyway?
The colons are because the elements have a namespace prefix and are bound to the Woot namespace.
You should read up on XML namespaces and how they affect XPath and XSLT.
If you want to reference the Woot elements in your XPath, you will need to either:
Register the Woot namespace http://www.woot.com/ under a prefix with your XPath API, so that when you use that prefix in your XPath it will be understood.
Use a more generic XPath expression with predicate filters that use local-name() and namespace-uri() to match the element:
//rss/channel/item/*[local-name()='price' and namespace-uri()='http://www.woot.com/']
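Both options can be sketched with Java's javax.xml.xpath. The feed below is a made-up fragment; only the namespace URI http://www.woot.com/ comes from the answer above. For the first option, any prefix bound to that URI would work, because the XPath engine matches on the namespace URI, not on the prefix spelling.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class WootPrice {
    static final String WOOT_NS = "http://www.woot.com/";

    // Hypothetical feed fragment with one namespaced element.
    static final String FEED = "<rss xmlns:woot='" + WOOT_NS + "'><channel><item>"
        + "<title>Example item</title><woot:price>$9.99</woot:price>"
        + "</item></channel></rss>";

    // Option 1: bind the prefix "woot" to the Woot namespace URI for the XPath engine.
    static final NamespaceContext WOOT = new NamespaceContext() {
        public String getNamespaceURI(String prefix) {
            return "woot".equals(prefix) ? WOOT_NS : XMLConstants.NULL_NS_URI;
        }
        public String getPrefix(String uri) {
            return WOOT_NS.equals(uri) ? "woot" : null;
        }
        public Iterator<String> getPrefixes(String uri) {
            return WOOT_NS.equals(uri)
                ? Collections.singletonList("woot").iterator()
                : Collections.<String>emptyIterator();
        }
    };

    // Evaluates expr on a namespace-aware DOM; ctx may be null when no prefixes are used.
    static String eval(String xml, String expr, NamespaceContext ctx) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xp = XPathFactory.newInstance().newXPath();
            if (ctx != null) xp.setNamespaceContext(ctx);
            return xp.evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval(FEED, "//rss/channel/item/woot:price", WOOT)); // $9.99
        // Option 2: no prefix binding, match on local name and namespace URI instead.
        System.out.println(eval(FEED, "//rss/channel/item/*[local-name()='price'"
            + " and namespace-uri()='http://www.woot.com/']", null));           // $9.99
    }
}
```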
