Looking for n-th instance of x node in root node - xpath

Suppose I have the following XML:
<root>
<x>
<y />
<z>
<y />
</z>
<n>
<m>
<y />*
</m>
</n>
</x>
<x>
<y />
<z>
<y />
</z>
<y />*
</x>
</root>
I would like to retrieve the y nodes that are marked with *.
So it is always the third y node within its x ancestor.
I tried something like:
//x//y[3]
However, it doesn't work. I guess it would work only if the y nodes were on the same level.
So I tried
(//x//y)[3] but it retrieves only one node (the third one) in the whole document.
So I tried something like:
//x(//y)[3]
//x(//y[3])
//x//(y[3])
etc., but I get a parse error.
Is there any way to retrieve what I need using XPath?

Use:
//x/descendant::y[3]
This selects the third y descendant of each x in the document. It sometimes helps to write out the expanded expression to see what's really going on. In this case, the following:
//x//y[3]
is equivalent to:
/descendant-or-self::node()/child::x/descendant-or-self::node()/child::y[3]
Written this way it becomes obvious why it doesn't do what you wanted: the [3] applies to the child::y step, so it selects any y that is the third y child of its parent, and no element here has three y children. What you really wanted was the third y descendant of each x. Here it is fully expanded:
/descendant-or-self::node()/child::x/descendant::y[3]
The important lesson here is that it pays to know what the XPath abbreviated syntax is really doing. The spec is actually quite readable. I recommend taking a look.
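To see the difference concretely, here is a minimal sketch in Python that spells out what the two expressions compute. It uses only the standard library's xml.etree.ElementTree, which supports just a subset of XPath (there is no descendant:: axis), so both expressions are emulated by hand rather than evaluated. The id attributes and the function names are hypothetical additions for illustration, and the sketch assumes x elements are not nested inside each other.

```python
import xml.etree.ElementTree as ET

# The sample document from the question; the id attributes are added
# here (an assumption) so the selected nodes can be told apart.
XML = """
<root>
  <x>
    <y id="a1"/>
    <z><y id="a2"/></z>
    <n><m><y id="a3"/></m></n>
  </x>
  <x>
    <y id="b1"/>
    <z><y id="b2"/></z>
    <y id="b3"/>
  </x>
</root>
"""


def third_y_descendant_ids(root):
    """Emulate //x/descendant::y[3]: the third y, in document order, under each x."""
    ids = []
    for x in root.iter("x"):
        ys = list(x.iter("y"))  # all y descendants of x, in document order
        if len(ys) >= 3:
            ids.append(ys[2].get("id"))
    return ids


def third_y_child_ids(root):
    """Emulate //x//y[3]: any y that is the third y child of its own parent."""
    ids = []
    for x in root.iter("x"):
        for parent in x.iter():  # x itself and every element below it
            y_children = parent.findall("y")  # direct y children only
            if len(y_children) >= 3:
                ids.append(y_children[2].get("id"))
    return ids


root = ET.fromstring(XML)
print(third_y_descendant_ids(root))
print(third_y_child_ids(root))
```

With the sample document the first function picks the two starred nodes (a3 and b3), while the second finds nothing, because no element has three y children of its own.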

Update: both of these examples require XPath 2.0; neither a parenthesized step nor a for expression is valid in XPath 1.0.
Using a parenthesized step:
/root//y/(ancestor::x//y)[3]
Using a for expression:
for $x in /root//x
return ($x//y)[3]

Related

Not able get element using xpath following-sibling

I am not able to select an element with XPath using following-sibling. It says element not found.
I tried the syntax below:
//x/y[contains(text(),"Status"]/following-sibling::/x/y
//x/y[contains(text(),"Status"]/following-sibling::/x/y/text()
//x/y[contains(text(),"Status"]/../x/y
This is what my HTML code looks like,
<X>
<y> Status </y>
</x>
<x>
<y> ACTIVE</y>
</x>
None of the above expressions gives ACTIVE as output; each throws an element-not-found error. Can anyone help me formulate the proper syntax to get the value?
This expression
(//y)[2]
Outputs:
ACTIVE
Is that what you're looking for?
I think you want this:
//x[contains(y,"Status")]/following-sibling::x/y/text()
As mentioned in a comment, y doesn't have any siblings; only x does. Select the x that contains a y with "Status", then select the following x sibling (and its child y's text()).
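The same navigation can be sketched in Python. ElementTree in the standard library has no following-sibling axis, so this emulates the expression by walking the children of a common parent. The wrapper element and the function name are hypothetical; the fragment in the question has no single root, so one is assumed here.

```python
import xml.etree.ElementTree as ET

# The fragment from the question, wrapped in a hypothetical root element
# so that it parses as a single document.
FRAGMENT = """
<wrapper>
  <x><y> Status </y></x>
  <x><y> ACTIVE</y></x>
</wrapper>
"""


def texts_after_label(parent, label):
    """Emulate x[contains(y, label)]/following-sibling::x/y/text()
    for the x children of one parent element."""
    results = []
    children = list(parent)
    for i, child in enumerate(children):
        if child.tag != "x":
            continue
        first_y = child.find("y")
        # contains(y, label) tests the string value of the first y child
        if first_y is None or label not in "".join(first_y.itertext()):
            continue
        # following-sibling::x means every later x sibling, not just the next
        for sibling in children[i + 1:]:
            if sibling.tag == "x":
                for y in sibling.findall("y"):
                    if y.text:
                        results.append(y.text)
    return results


wrapper = ET.fromstring(FRAGMENT)
print(texts_after_label(wrapper, "Status"))
```

Note that the text comes back with its original whitespace, so a caller would probably want to strip() it.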

Xpath for an element , all ancestors of which have the same name up to a point

I have an XML that looks like the following:
(image: XML tree)
I need those tag elements that have only son elements as their ancestors. The only non-son ancestor allowed is the root element, parent. After parent, no ancestor of tag can be anything other than son. This XPath would therefore return <tag id="t1" /> and <tag id="t2" />.
//son//tag would be one solution. Another would be //tag[ancestor::son]. You could use /descendant:: in place of //; there are differences in the order in which results are reported. There are other variants; which one is best depends on the exact context in which you're doing this.
I should have posted this earlier, or maybe it does not matter. Here is the nasty-looking XPath I wrote to solve this:
/parent/(descendant::tag except(descendant::element() except descendant::son)/descendant::tag)
Hope someone would suggest a better looking alternative.
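For comparison, the rule "descend only through son elements" is easy to state procedurally. The following is a rough Python sketch of that rule, not a replacement for the XPath. The original tree was posted as an image, so the document below is a hypothetical stand-in (the other elements and the ids t3 and t4 are invented), and the function name is an assumption too.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the tree in the question.
XML = """
<parent>
  <son>
    <tag id="t1"/>
    <son><tag id="t2"/></son>
    <other><tag id="t3"/></other>
  </son>
  <other><son><tag id="t4"/></son></other>
</parent>
"""


def tags_under_sons_only(node):
    """Collect tag elements reachable from node by passing only through son elements."""
    found = []
    for child in node:
        if child.tag == "tag":
            found.append(child)
        elif child.tag == "son":
            found.extend(tags_under_sons_only(child))
        # any other element is skipped together with its whole subtree
    return found


parent = ET.fromstring(XML)
print([t.get("id") for t in tags_under_sons_only(parent)])
```

Here t3 and t4 are skipped because each has an other element among its ancestors.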

Compare attribute of one element to attribute of another element

This seems like it should be easy, but I can never figure it out.
Presume I have the following document:
<data>
<a>
<b val="1"/>
</a>
<c val="1"/>
</data>
And assume that I am executing an XPath from the context of <b>. I need to check if there is an element c that has the same value as b.
Obviously, this doesn't work:
../a/c[@val=@val]
How do I get an XPath to remember its "current" context when traversing the tree?
Try the expression below. You'll notice that the current node is not lost since a predicate is used for finding the c node.
.[../../c/@val=@val]
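One caveat: a predicate directly on . is only allowed from XPath 2.0 onward; in XPath 1.0 the same test can be written as self::*[../../c/@val=@val]. The check itself is sketched below in Python with the standard library. ElementTree elements have no parent pointer, so this hypothetical helper takes the data element explicitly instead of navigating ../.. from b.

```python
import xml.etree.ElementTree as ET

XML = """
<data>
  <a>
    <b val="1"/>
  </a>
  <c val="1"/>
</data>
"""


def b_has_matching_c(data, b):
    """True if some c child of data has the same val attribute as b."""
    b_val = b.get("val")
    if b_val is None:
        # In XPath, comparing against a missing attribute is false.
        return False
    return any(c.get("val") == b_val for c in data.findall("c"))


data = ET.fromstring(XML)
b = data.find("a/b")
print(b_has_matching_c(data, b))
```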

Find attribute names that start with a certain pattern

I am looking to find all attributes of an element that match a certain pattern.
So for an element
<element s2="1" name="aaaa" id="1" />
<element s3="1" name="aaaa" id="2" />
I would like to be able to find all attributes that start with 's' (returning the value of s2 for the first element and s3 for the second element).
If this is outside of xpath's ability please let me know.
Use:
element/@*[starts-with(name(), 's')]
This XPath expression selects all attribute nodes whose name starts with the string 's' and that are attributes of elements named element that are children of the current node.
starts-with() is a standard function in XPath 1.0.
element/@*[substring(name(), 1,1) = "s"]
will match any attribute that starts with 's'.
The function starts-with() might look better than using substring()
I've tested the given answers from both @Dimitre-Novatchev and @Ledhund, using the lxml.html module in Python.
Both element/@*[starts-with(name(), 's')] and element/@*[substring(name(), 1,1) = "s"] return only the values of s2 and s3. You won't be able to know which value belongs to which attribute.
I think in practice I would be more interested in finding the elements themselves that contain the attributes of names starting with specific characters rather than just their values.
That is simple to achieve: just add /.. at the end,
element/@*[starts-with(name(), "s")]/..
or
element/@*[starts-with(name(), "s")]/parent::*
or
element/@*[starts-with(name(), "s")]/parent::node()
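If both the attribute names and the owning elements are needed, one option is to do the filtering in Python after parsing. This is a minimal sketch with the standard library's ElementTree rather than lxml; the items wrapper element and the function name are assumptions made so that the two sample elements form one document.

```python
import xml.etree.ElementTree as ET

# The two sample elements, wrapped in a hypothetical root element.
XML = """
<items>
  <element s2="1" name="aaaa" id="1" />
  <element s3="1" name="aaaa" id="2" />
</items>
"""


def attrs_starting_with(parent, prefix):
    """For each element child that has a matching attribute, return
    (id, {attribute name: value}) for the attributes starting with prefix."""
    matches = []
    for el in parent.findall("element"):
        selected = {k: v for k, v in el.attrib.items() if k.startswith(prefix)}
        if selected:  # keep only elements that own a matching attribute
            matches.append((el.get("id"), selected))
    return matches


items = ET.fromstring(XML)
print(attrs_starting_with(items, "s"))
```

This keeps each value paired with its attribute name and with the id of the element it came from.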
None of the above worked for me.
So I made some changes and it worked for me. :)
/*:UserCustomField[starts-with(@name, 'purchaseDate')]

Xpath to select only nodes where child elements exist?

This should be an easy one but it is giving me trouble. Given this structure:
<root>
<a>
<b/>
</a>
<a/>
</root>
I'm trying to formulate an xpath expression that gives only the non-empty "a" elements, i.e. the ones that have child elements. Therefore I want the first instance of "a" returned, but not the second.
So far I have "/root/a/self::*" but that is returning me both a's.
/root/a[count(*)>0]
will give any 'a' element that has at least one child element (count(*) counts element children only, not text nodes)
/root/a[count(*)>0]
This one works
/root/a[*]
or even
//a[*]
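For reference, the same filter can be sketched in Python with the standard library. Rather than relying on ElementTree's limited predicate support, this uses len() on an element, which counts its child elements. The id attributes and the third, text-only a are hypothetical additions to show that text alone does not count as a child element.

```python
import xml.etree.ElementTree as ET

XML = """
<root>
  <a id="first"><b/></a>
  <a id="second"/>
  <a id="third">text only</a>
</root>
"""


def a_with_child_elements(root):
    """Emulate /root/a[*]: a children of root that have at least one child element."""
    return [a.get("id") for a in root.findall("a") if len(a) > 0]


root = ET.fromstring(XML)
print(a_with_child_elements(root))
```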

Resources