XPath: get child node excluding parent

I'm looking for an XPath expression that returns a child node only if its parent node is not a specific element. For example, given XML like the following:
<Grandpa><Dad><Son /></Dad><Son /></Grandpa>
I want to return the Son element outside the Dad element.

This XPath expression selects those Son elements whose parent element is not named Dad:
//Son[local-name(..) != 'Dad']
So, applied to this XML:
<Grandpa><Dad><Son a="1"/></Dad><Son a="2"/></Grandpa>
It will select:
<Son a="2"/>
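A quick way to check the expression is to run it with Python's lxml library (a third-party package; this sketch assumes it is installed). It also tries //Son[not(parent::Dad)], an alternative written with the parent axis. One difference to keep in mind: an unprefixed parent::Dad matches only a Dad element in no namespace, whereas local-name(..) ignores namespaces.

```python
from lxml import etree  # third-party package, assumed to be installed

xml = b'<Grandpa><Dad><Son a="1"/></Dad><Son a="2"/></Grandpa>'
root = etree.fromstring(xml)

# Son elements whose parent's local name is not "Dad"
by_name = root.xpath("//Son[local-name(..) != 'Dad']")

# Alternative using the parent axis; same result for this document
by_axis = root.xpath("//Son[not(parent::Dad)]")

for son in by_name:
    print(etree.tostring(son))  # b'<Son a="2"/>'
```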

Related

Predicates: how is the expression nodeName='text' evaluated?

In this XPath expression:
/A/B[C='hello']
Is C='hello' some kind of syntactic shortcut for C[text()='hello']? Is it documented anywhere?
Edit: Okay, I discovered one difference: C= compares against all the text in C and C's descendants, while C[text()= compares only against the text nodes that are direct children of C.
Now, suppose I have the XML:
<root>
<A>
<B>
<C>hello<E>EEE</E>world</C>
<D>world</D>
</B>
<B>
<C>goodbye</C>
<D>mars</D>
</B>
</A>
</root>
How would I select the B node containing the first C node using the C[text()= syntax? I can get the B node using the C= syntax like this:
/root/A/B[C="helloEEEworld"]
But this doesn't work:
/root/A/B[C[text()="helloworld"]]
nor do these:
/root/A/B[C[text()="hello world"]]
/root/A/B[C[text()="helloEEEworld"]]
Hmmm...this works:
/root/A/B[C[text()="hello"]]
Why is that? Does text() only return the first text node? According to the W3C, text() returns all text node children of the context node.
text() really does return all text node children, as a node-set.
When you use /root/A/B[C[text()="hello"]], you are asking for a B node with a C child that has any direct child text node equal to "hello".
In the same way, you can match it with:
/root/A/B[C[text()="world"]]
or explicitly specify which direct child text node, the first or the second, must match:
/root/A/B[C[text()[1]="hello"]]
/root/A/B[C[text()[2]="world"]]
If you want to match the node by its complete text content, you can use
/root/A/B[C[.="helloEEEworld"]]
or
/root/A/B[C="helloEEEworld"]
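To see which of these predicates match, you can evaluate them against the question's XML, for example with Python's lxml (a third-party package, assumed to be installed). The XML is written on one line so that no whitespace-only text nodes get in the way; the comments summarize the explanations above.

```python
from lxml import etree  # third-party package, assumed to be installed

XML = (b"<root><A>"
       b"<B><C>hello<E>EEE</E>world</C><D>world</D></B>"
       b"<B><C>goodbye</C><D>mars</D></B>"
       b"</A></root>")
root = etree.fromstring(XML)

expressions = [
    '/root/A/B[C="helloEEEworld"]',       # string value of C: all descendant text
    '/root/A/B[C[.="helloEEEworld"]]',    # same comparison, written with "."
    '/root/A/B[C[text()="helloworld"]]',  # no single text node equals this
    '/root/A/B[C[text()="hello"]]',       # any direct child text node may match
    '/root/A/B[C[text()="world"]]',
    '/root/A/B[C[text()[1]="hello"]]',    # first direct child text node only
    '/root/A/B[C[text()[2]="world"]]',    # second direct child text node only
]

# Number of B elements each expression selects
results = {expr: len(root.xpath(expr)) for expr in expressions}
for expr, count in results.items():
    print(count, expr)
```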
C in the predicate expression [C='hello'] returns all C elements that are direct children of the context element, which is B. So the entire predicate is a boolean expression that compares a node-set with a string (note that an element is a type of node in the XPath data model), and the behavior in this case is documented in the spec as follows:
If one object to be compared is a node-set and the other is a string, then the comparison will be true if and only if there is a node in the node-set such that the result of performing the comparison on the string-value of the node and the other string is true. If one object to be compared is a node-set and the other is a boolean, then the comparison will be true if and only if the result of performing the comparison on the boolean and on the result of converting the node-set to a boolean using the boolean function is true. (XPath 1.0, section 3.4, Booleans)
C='hello' in /A/B[C='hello'] evaluates to true if any of the C elements, once converted to a string, equals 'hello'. So, if you will, it is more of a shortcut for C[string()='hello'].
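The node-set rule can be checked on a small hypothetical document in which one B has two C children, again using lxml (assumed to be installed). The comparison succeeds because the second C's string value is "hello", even though string() applied to the whole node-set returns only the first C's value, and even though no single text node of that C equals "hello".

```python
from lxml import etree  # third-party package, assumed to be installed

# Hypothetical document: B has two C children; the second one's string value
# is "hel" + "l" + "o" = "hello", split across three text nodes
doc = etree.fromstring(b"<A><B><C>hi</C><C>hel<E>l</E>o</C></B></A>")

# string() on a node-set converts only the first node in document order
first_c = doc.xpath("string(/A/B/C)")

# = against a node-set is true if ANY node's string value matches
any_match = doc.xpath("boolean(/A/B[C='hello'])")

# The longhand form from the answer gives the same result
long_form = doc.xpath("boolean(/A/B[C[string()='hello']])")

# text() only sees C's direct child text nodes ("hel" and "o"), so no match
text_match = doc.xpath("boolean(/A/B[C[text()='hello']])")

print(first_c, any_match, long_form, text_match)  # hi True True False
```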
"Hmmm...this works:
/root/A/B[C[text()="hello"]]
Why is that? Does text() only return the first text node? According to the W3C, text() returns all text node children of the context node."
Rather than only the first text node, text() in this context returns all direct child text nodes of C (and only those, not text nodes nested deeper). This is because child:: is the default axis in XPath. Contrast your XPath with its equivalent verbose version:
/child::root/child::A/child::B[child::C[child::text()="hello"]]
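The default child:: axis is also why text() does not reach the "EEE" inside E. Assuming lxml is installed, this sketch compares the abbreviated and verbose forms and contrasts them with descendant::text(), which does include the text nested in E.

```python
from lxml import etree  # third-party package, assumed to be installed

root = etree.fromstring(
    b"<root><A><B><C>hello<E>EEE</E>world</C><D>world</D></B></A></root>")
c = root.xpath("/root/A/B/C")[0]

abbreviated = [str(t) for t in c.xpath("text()")]        # child axis implied
verbose = [str(t) for t in c.xpath("child::text()")]
deep = [str(t) for t in c.xpath("descendant::text()")]  # also text inside E

# The full path from the question, in abbreviated and verbose form
short_hits = root.xpath('/root/A/B[C[text()="hello"]]')
long_hits = root.xpath(
    '/child::root/child::A/child::B[child::C[child::text()="hello"]]')

print(abbreviated)  # ['hello', 'world']
print(deep)         # ['hello', 'EEE', 'world']
```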

XPath difference between two similar paths and other questions

I have to do some exercises, but
I don't really understand the difference between two similar paths.
I have this tree:
<b>
<t></t>
<a>
<n></n>
<p></p>
<p></p>
</a>
<a>
<n></n>
<p></p>
</a>
<a></a>
</b>
Assume that each leaf element contains one text node.
I have to explain the difference between //a//text() and //a/text().
I see that //a//text() returns all text nodes, which seems right,
but why does //a/text() return only the text node of the last a node?
Another question:
why does //p[1] return, for each a node, the first p child node?
I get two results (marked with **):
<b>
<t></t>
<a>
<n></n>
**<p></p>**
<p></p>
</a>
<a>
<n></n>
**<p></p>**
</a>
<a></a>
</b>
Why isn't the answer just the first p node of the whole document?
Thanks for everything!
Difference between 1: //a//text() and 2: //a/text()
Let's break it down: //a selects all a elements, no matter where they are in the document. By contrast, /a would select only an a element that is the root element of the document.
If / comes after another step in an XPath expression, it selects nodes that are direct children of the nodes selected by the previous step.
If // comes after another step in an XPath expression, it selects all nodes that are descendants of the nodes selected by the previous step, no matter how deep they are below them.
Applying this to your two XPath expressions:
//a//text(): Select all a elements, no matter where they are in the document, and for those elements select all text nodes, no matter where they are below the selected a elements.
//a/text(): Select all a elements, no matter where they are in the document, and for those elements select only their direct child text nodes. In your tree, the first two a elements contain only element children (n and p), so their text nodes are grandchildren of a and are not selected (apart from whitespace-only text nodes, if the document is indented). Only the last a element, which is a leaf, has a text node of its own, which is why //a/text() returns just that one.
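The difference can be reproduced with lxml (a third-party Python package, assumed to be installed). The leaf text values n1, p1 and so on are made up for illustration, since the question only says that each leaf contains one text node; the document is written without indentation so there are no whitespace-only text nodes.

```python
from lxml import etree  # third-party package, assumed to be installed

# The question's tree; the leaf texts (n1, p1, ...) are hypothetical
root = etree.fromstring(
    b"<b><t>t1</t>"
    b"<a><n>n1</n><p>p1</p><p>p2</p></a>"
    b"<a><n>n2</n><p>p3</p></a>"
    b"<a>a3</a></b>")

# // after a: text nodes anywhere below each a element
all_text = [str(t) for t in root.xpath("//a//text()")]

# / after a: only text nodes that are direct children of an a element
direct_text = [str(t) for t in root.xpath("//a/text()")]

print(all_text)     # ['n1', 'p1', 'p2', 'n2', 'p3', 'a3']
print(direct_text)  # ['a3']
```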
Why does //p[1] return, for each a node, the first p child node?
Suppose you were to write //a/p[1]: this would select the first p child element of any a element anywhere in the document. By writing //p[1] you omit the explicit parent element, but the predicate is still applied per parent: //p[1] is short for /descendant-or-self::node()/child::p[1], so it selects every p element that is the first p child of its parent.
In this case two a elements have p children, and the first p child of each is selected.
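Using the same hypothetical leaf texts and lxml (assumed to be installed), the sketch below shows //p[1] selecting the first p of each parent. To get the first p in the whole document, the usual fix is to put parentheses around the path so the predicate applies to the combined node-set: (//p)[1].

```python
from lxml import etree  # third-party package, assumed to be installed

# The question's tree; the leaf texts (n1, p1, ...) are hypothetical
root = etree.fromstring(
    b"<b><t>t1</t>"
    b"<a><n>n1</n><p>p1</p><p>p2</p></a>"
    b"<a><n>n2</n><p>p3</p></a>"
    b"<a>a3</a></b>")

# [1] is applied per parent: every p that is the first p child of its parent
per_parent = [p.text for p in root.xpath("//p[1]")]

# Parentheses apply [1] to the whole node-set, in document order
first_in_doc = [p.text for p in root.xpath("(//p)[1]")]

print(per_parent)    # ['p1', 'p3']
print(first_in_doc)  # ['p1']
```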
It would be good to search for an introduction to XPath on your favorite search engine. I've always found the one from w3schools.com to be a good one.

Need an XPath where the parent has multiple children, but I only need the parent's value

In the code below, the parent div has three children: span, script and span. But I need the value of the parent div, which is "N/A". "N/A" is not in any attribute of the div; it is just the text content of the parent div.
<div class="ah-text-align-right ah-font-xsmall" style="">
<span id="_dcmanageinvestmentsportlet_WAR_ahdcmnginvportlet__FDROR_110hidden" style="display:none">
<script type="text/javascript">
<span class="ah-float-left">
N/A
</div>
To get the parent element, you can append .. (double dot) to the child element's XPath.
To get the text of an element, you can use the text() node test, but depending on the XPath implementation in whatever environment and code you use, selecting text nodes might be unsupported (some tools only accept XPath expressions that return elements). Note that text() returns only the element's own direct child text nodes; the text of child elements belongs to the element's string value, not to text().
For your case, if you search for the parent of a span with the ah-float-left class, the XPath should be something like the following:
//span[@class='ah-float-left']/..
To get the text of that parent, you'll need the following:
//span[@class='ah-float-left']/../text()
Note: looking elements up by class name may return a collection of elements, which in turn gives you a collection of parent elements and a collection of parent text nodes, which may not be what you want. I would recommend looking up the child element by id, since XHTML requires element ids to be unique. Thus, an XPath for the parent div would better look like the following:
//span[@id='_dcmanageinvestmentsportlet_WAR_ahdcmnginvportlet__FDROR_110hidden']/..
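As a sketch, assuming lxml is installed and assuming the span and script elements in the question are empty and closed (the snippet above is incomplete), the following parses a simplified copy as XML. It shows that /../text() also returns the whitespace-only text nodes between the child elements; adding the predicate [normalize-space()] keeps only the text node that holds "N/A".

```python
from lxml import etree  # third-party package, assumed to be installed

# Simplified, well-formed copy of the question's markup (assumption: the
# span and script elements are empty and closed)
html = b"""<div class="ah-text-align-right ah-font-xsmall">
<span id="_dcmanageinvestmentsportlet_WAR_ahdcmnginvportlet__FDROR_110hidden" style="display:none"></span>
<script type="text/javascript"></script>
<span class="ah-float-left"></span>
N/A
</div>"""
root = etree.fromstring(html)

# All direct text children of the parent, including whitespace-only ones
raw = root.xpath("//span[@class='ah-float-left']/../text()")

# Keep only text nodes that contain something other than white space
value = [t.strip() for t in root.xpath(
    "//span[@class='ah-float-left']/../text()[normalize-space()]")]

# Same parent reached through the (unique) id instead of the class
by_id = root.xpath(
    "//span[@id='_dcmanageinvestmentsportlet_WAR_ahdcmnginvportlet__FDROR_110hidden']/..")

print(len(raw), value, by_id[0].tag)  # 4 ['N/A'] div
```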

How to select a node whose parent has some attributes

How do I select a node whose parent has some attributes?
For example, what is the XPath to select all expiration_time elements?
In the following XML, I get an error if the states elements have attributes; otherwise there is no problem.
Thanks
<lifecycle>
<states elem="0">
<expiration_time at="rib" zing="chack">08</expiration_time>
</states>
<states elem="1">
<expiration_time at="but">4:52</expiration_time>
</states>
<states elem="2">
<expiration_time at="ute">05:40:15</expiration_time>
</states>
<states elem="3">
<expiration_time>00:00:00</expiration_time>
</states>
</lifecycle>
states/expiration_time[../@elem = "0"]?
Use:
/*/*/expiration_time
This selects all expiration_time elements that are grandchildren of the top element of the XML document.
/*/*[@*]/expiration_time
This selects any expiration_time element whose parent has at least one attribute and is a child of the top element of the XML document.
/*/*[not(@*)]/expiration_time
This selects any expiration_time element whose parent has no attributes and is a child of the top element of the XML document.
/*/*[@elem = '2']/expiration_time
This selects any expiration_time element whose parent has an elem attribute with string value '2' and is itself a child of the top element of the XML document.
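Running these expressions on the question's XML with lxml (a third-party Python package, assumed to be installed) makes the differences visible. In this particular document every states element has an elem attribute, so the [not(@*)] variant selects nothing; the relative path from the first answer is evaluated with lifecycle as the context node. The helper texts() is just for illustration.

```python
from lxml import etree  # third-party package, assumed to be installed

root = etree.fromstring(b"""<lifecycle>
<states elem="0"><expiration_time at="rib" zing="chack">08</expiration_time></states>
<states elem="1"><expiration_time at="but">4:52</expiration_time></states>
<states elem="2"><expiration_time at="ute">05:40:15</expiration_time></states>
<states elem="3"><expiration_time>00:00:00</expiration_time></states>
</lifecycle>""")

def texts(expr):
    """Hypothetical helper: text content of every element the expression selects."""
    return [e.text for e in root.xpath(expr)]

all_times = texts("/*/*/expiration_time")
with_attrs = texts("/*/*[@*]/expiration_time")      # parent has any attribute
without_attrs = texts("/*/*[not(@*)]/expiration_time")
elem_2 = texts("/*/*[@elem = '2']/expiration_time")
elem_0 = texts("states/expiration_time[../@elem = '0']")  # relative to lifecycle

print(all_times)  # ['08', '4:52', '05:40:15', '00:00:00']
```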
This will give you all elements that have at least one attribute:
//*[count(./@*) > 0]
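The predicate [@*] is a shorter equivalent, because a non-empty node-set converts to true. A sketch with a trimmed copy of the question's XML, using lxml (assumed to be installed):

```python
from lxml import etree  # third-party package, assumed to be installed

# Trimmed copy of the question's XML
root = etree.fromstring(
    b'<lifecycle><states elem="0"><expiration_time at="rib">08</expiration_time>'
    b'</states><states elem="3"><expiration_time>00:00:00</expiration_time>'
    b'</states></lifecycle>')

# Elements with at least one attribute, written two ways
counted = [e.tag for e in root.xpath("//*[count(./@*) > 0]")]
shorter = [e.tag for e in root.xpath("//*[@*]")]  # non-empty node-set is true

print(counted)  # ['states', 'expiration_time', 'states']
```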

XPath: filter tag-less children

Is there any way to specify that I want to select only tag-less child elements (in the following example, "text")?
<div>
<p>...</p>
"text"
</div>
text() is a node test that matches text nodes. For example, //div/text() matches all text children of all div elements.
Use:
/*/text()[normalize-space()]
This selects all text nodes that are children of the top element of the document and that do not consist only of white-space characters.
In the concrete example this will select only the text node with string value:
'
"text"
'
The XPath expressions:
/*/text()
or
/div/text()
both select two text nodes, the first of which contains only white-space and the second is the same text node as above:
'
"text"
'
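Both variants can be compared on the question's snippet, assuming the third-party lxml package is installed. The newlines are written explicitly so the whitespace-only text node is easy to see.

```python
from lxml import etree  # third-party package, assumed to be installed

root = etree.fromstring(b'<div>\n<p>...</p>\n"text"\n</div>')

all_text = root.xpath("/*/text()")                      # includes white space
non_blank = root.xpath("/*/text()[normalize-space()]")  # drops blank nodes

print([str(t) for t in all_text])   # ['\n', '\n"text"\n']
print([str(t) for t in non_blank])  # ['\n"text"\n']
```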
select only tag-less child elements
To me this sounds like selecting all elements that don't have other elements as children. But then again, "text" in your example is not an element but a text node, so I'm not really sure what you want to select...
Anyway, here is a solution for selecting such elements.
//*[not(*)]
Selects all elements that don't have an element as a child. Replace the first * with an element name if you only want to select certain elements that don't have child elements. Also note that using // is generally slow, since it searches the whole document. Consider using a more specific path when possible (like /div/*[not(*)] in this case).
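A quick check of both expressions on the question's snippet, assuming the third-party lxml package is installed: only p qualifies, because div has an element child and the "text" node is not an element at all.

```python
from lxml import etree  # third-party package, assumed to be installed

root = etree.fromstring(b'<div>\n<p>...</p>\n"text"\n</div>')

# Elements anywhere that have no element children (text children are fine)
leaves = [e.tag for e in root.xpath("//*[not(*)]")]

# The more specific path suggested in the answer
leaf_children = [e.tag for e in root.xpath("/div/*[not(*)]")]

print(leaves, leaf_children)  # ['p'] ['p']
```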
