XPath - How to select a node but not its child nodes

I am trying to select a node but not any of its child nodes.
Example Input:
<Header attr1="Hello">
<child1> hello </child1>
<child2>world</child2>
</Header>
Expected Output: <Header attr1="Hello"> </Header>
Code:
Document xmlDoc = saxBuilder.build(inputStream);
XPath x = XPath.newInstance("/Header");
Element eleMyElement = (Element) x.selectSingleNode(xmlDoc);
XMLOutputter output = new XMLOutputter();
output.outputString(eleMyElement); // --> this is the output
I tried with /Header as XPath, it gives me the header along with child nodes.

You need to distinguish what is selected from what is displayed.
The XPath expression /Header selects one node only, the Header element. You say "it gives me", but what is "it"? Something is displaying the results of the XPath selection, and it is choosing to display the results by rendering the selected element with all its children. You need to look at the code that is displaying the result.

In this case you can simply do
eleMyElement.getContent().clear();
and all child nodes will be deleted.
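If you would rather leave the parsed document untouched, another option is to serialize a shallow copy of the selected element. The sketch below is only an illustration and makes some assumptions: it uses the JDK's built-in DOM, XPath and Transformer classes rather than JDOM, and the class and method names (HeaderOnly, headerWithoutChildren) are made up for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class HeaderOnly {
    /** Selects /Header and serializes a copy of it that has attributes but no children. */
    static String headerWithoutChildren(String xml) {
        try {
            // The XPath selects exactly one node: the Header element.
            Node header = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate("/Header", new InputSource(new StringReader(xml)), XPathConstants.NODE);
            // cloneNode(false) copies the element and its attributes, not its child nodes.
            Node shallow = header.cloneNode(false);
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(shallow), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Header attr1=\"Hello\"><child1> hello </child1><child2>world</child2></Header>";
        System.out.println(headerWithoutChildren(xml));
    }
}
```

The selection is the same as before; only the rendering step changes, which is the distinction the first answer points out.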

Related

xPath: fetch element with an attribute containing the text of another element

Given I have the following HTML structure:
<button aria-labelledby="ref-1" id="foo" onclick="convey(event)">action 2</button>
<div class="anotherElement">foobar</div>
<div id="ref-1" hidden>target 2</div>
I would like to fetch button by its aria-labelledby attribute. I tried the following options:
//*[@aria-labelledby=string(/div[@id="ref-1"]/@id)]
//*[@aria-labelledby = string(.//*[normalize-space() = "target 2"]/@id)]
//*[@aria-labelledby = .//*[normalize-space() = "target 2"]/@id]
But I wasn't able to fetch the element. Does anyone have an idea what the right XPath could be?
Edit: simply put: how do I fetch the button element if my only information is "target 2", and if both elements can be randomly located?
//button[@aria-labelledby='ref-1']
or
//button[@aria-labelledby=(//*/@id)]
or
//button[@aria-labelledby=(//*[contains(.,'target 2')]/@id)]
or
//button[@aria-labelledby=(//*[contains(text(),'target 2')]/@id)]
?
Since button and div are siblings at the same level here, you can use the preceding-sibling axis like this:
//div[text()='target 2']/preceding-sibling::button
Note that with your actual XML this will match 2 button elements.
To make a more precise match, I think we would need more details than just the "target 2" text.
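To check a lookup that starts from the label text alone, here is a small sketch using the JDK's XPath 1.0 engine. It rests on assumptions: the markup is wrapped in a made-up root element and hidden is written as hidden="hidden" so that it parses as XML, and the class and method names are hypothetical. The expression compares the button's aria-labelledby with the id of whichever element has the text "target 2", searching from the document root rather than from the button itself.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class AriaLabelLookup {
    // The question's markup, made well-formed (assumed wrapper element and hidden="hidden").
    static final String XML =
        "<root>"
        + "<button aria-labelledby=\"ref-1\" id=\"foo\" onclick=\"convey(event)\">action 2</button>"
        + "<div class=\"anotherElement\">foobar</div>"
        + "<div id=\"ref-1\" hidden=\"hidden\">target 2</div>"
        + "</root>";

    /** Returns the id of the button labelled by the element whose text is "target 2", or null. */
    static String labelledButtonId() {
        try {
            String expr = "//button[@aria-labelledby = //*[normalize-space() = 'target 2']/@id]";
            Element button = (Element) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)), XPathConstants.NODE);
            return button == null ? null : button.getAttribute("id");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(labelledButtonId());
    }
}
```

Because the inner path starts with // it does not depend on where the two elements sit relative to each other.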

How can I select nodes that don't contain links but which do contain specific text using xpath

Given the following HTML:
$content =
'<html>
<body>
<div>
<p>During the interim there shall be nourishment supplied</p>
</div>
<div>
<p>During the interim there shall be interim nourishment supplied</p>
</div>
<div>
<ul><li>During the interim there shall be nourishment supplied</li></ul>
</div>
</body>
</html>';
I want all the nodes containing the word "interim" but not if the word "interim" is part of a link element.
The nodes I would expect back are the first P node and the LI node only.
I've tried the following:
'//*/text()[not(a) and contains(.,"interim")]'
... but this still returns the A and also returns part of its parent P node (the part after the A), neither of which is desired. You can see my attempt here: https://glot.io/snippets/ehp7hmmglm
If you use the XPath expression //*[not(self::a) and not(a) and text()[contains(.,"interim")]] then you get all elements that do not contain an a element, are not a elements and contain a text node child containing that word.
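A quick way to try that expression is the sketch below, which uses the JDK's XPath engine instead of PHP. It assumes that the second paragraph originally wrapped its first "interim" in a link; the href="#" value and the class and method names are made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class InterimFilter {
    // Assumed reconstruction of the question's markup, with a link in the second paragraph.
    static final String XML =
        "<html><body>"
        + "<div><p>During the interim there shall be nourishment supplied</p></div>"
        + "<div><p>During the <a href=\"#\">interim</a> there shall be interim nourishment supplied</p></div>"
        + "<div><ul><li>During the interim there shall be nourishment supplied</li></ul></div>"
        + "</body></html>";

    /** Names of the elements matched by the expression from the answer, in document order. */
    static List<String> matchedElementNames() {
        try {
            String expr = "//*[not(self::a) and not(a) and text()[contains(.,\"interim\")]]";
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(matchedElementNames());
    }
}
```

Note that the expression selects elements rather than text nodes, which is why the paragraph containing the link drops out as a whole.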

XPath difference between two similar path and other questions

I have to do some exercises, but
I don't really understand the difference between two similar paths.
I have this tree:
<b>
<t></t>
<a>
<n></n>
<p></p>
<p></p>
</a>
<a>
<n></n>
<p></p>
</a>
<a></a>
</b>
And we assume that each final (leaf) tag contains one text node.
I have to explain the difference between //a//text() and //a/text().
I see that //a//text() returns all text nodes, which seems legitimate,
but why does //a/text() return only the text node of the last "a" node?
Another question:
why does //p[1] return, for each "a" node, the first "p" child node?
-> I get two results:
<b>
<t></t>
<a>
<n></n>
**<p></p>**
<p></p>
</a>
<a>
<n></n>
**<p></p>**
</a>
<a></a>
</b>
Why is the answer not the first "p" node of the whole document?
Thanks to all!
Difference between 1: //a//text() and 2: //a/text()
Let's break it down: //a selects all a elements, no matter where they are in the document. By contrast, /a would select only the root element, and only if it is named a.
If / comes after a step in an XPath expression, the next step selects direct children of the nodes selected so far.
If // comes after a step in an XPath expression, the next step selects descendants at any depth below the nodes selected so far.
Applying to your two XPath expressions:
//a//text(): Select all a elements no matter where they are in the document, and for each of them select every text node anywhere beneath it.
//a/text(): Select all a elements no matter where they are in the document, and for each of them select only the text nodes that are its direct children. In your tree only the last a element has a text node as a direct child; in the first two, the text nodes sit inside n and p, so they are grandchildren of a and are not selected.
Why //p[1] returns for each "a node", the first "p" child node?
Suppose you were to write //a/p[1], this would select the first p child element of any a element anywhere in the document. //p[1] is short for /descendant-or-self::node()/child::p[1], so the predicate [1] is evaluated relative to each parent: it selects every p that is the first p child of its parent, whatever that parent is.
In this case there are two a elements that have p children, and the first p child of each is selected. To get the first p in the whole document, write (//p)[1].
It would be good to search for a good introduction to XPath on your favorite search engine. I've always found this one from w3schools.com to be a good one.
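The differences are easy to see by counting matches. The sketch below uses the JDK's XPath engine on the question's tree; the text values inside the leaf elements and the class and method names are invented for the illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SlashVsDoubleSlash {
    // The question's tree with one (made-up) text node in each leaf element.
    static final String XML =
        "<b><t>t</t>"
        + "<a><n>n1</n><p>p1</p><p>p2</p></a>"
        + "<a><n>n2</n><p>p3</p></a>"
        + "<a>a3</a></b>";

    /** Number of nodes the expression selects in the sample tree. */
    static int count(String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("//a//text()")); // text nodes at any depth below an a element
        System.out.println(count("//a/text()"));  // text nodes that are direct children of an a element
        System.out.println(count("//p[1]"));      // every p that is the first p child of its parent
        System.out.println(count("(//p)[1]"));    // the first p in the whole document
    }
}
```

With this tree the four counts should come out as 6, 1, 2 and 1. The parentheses in (//p)[1] make the predicate apply to the whole node-set instead of to each parent separately.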

Get count of specific nodes between two specific sibling nodes

I'm using HtmlAgilityPack to get a filtered DOM of <h2> and <h3> nodes and using XPath 1.0 (from my XPath 1.0 crash course this week) I need to get the number of <h3>'s (the number varies) that are between sibling <h2>'s as follows:
<div>
<h2>heading 1</h2>
<h3>sub 1.1</h3>
<h3>sub 1.2</h3>
<h2>heading 2</h2>
<h3>sub 2.1</h3>
<h2>heading 3</h2>
....
</div>
When I iterate (using C#) through the filtered nodes I want the exact number of <h3>'s that are after a <h2> and before the next <h2>. When I use the following I get all the <h3>'s as the result.
int countH3 = n.SelectNodes("./preceding-sibling::h2[2]/following-sibling::h2[3]/preceding-sibling::h3").Count(); //the [position] is set dynamically
For the node structure above would like the result of the code line to be:
countH3 = 1
but it is:
countH3 = 3
I've found many similar SO questions regarding "sibling nodes between sibling nodes" and have to thank @LarsH for his comment in another question that /preceding::h3 returns ALL <h3>'s which helped explain the issue. I think I may need to use the Kayessian method of node-set intersection but get the "invalid token" error when I include the . | union character as follows:
countH3 = n.SelectNodes("./h2[2]/following-sibling::h2[3]
[count(.|./h2[2]/following-sibling::h2[3]/preceding-sibling::h3)=
count(./h2[2]/following-sibling::h2[3]/preceding-sibling::h3)]").Count();
Any suggestions appreciated.
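One possible approach that avoids the node-set intersection is to take an <h2> as the context node and subtract: all <h3> siblings after it, minus the <h3> siblings that come after the next <h2>. The sketch below only illustrates that idea, under assumptions: it runs on the JDK's XPath 1.0 engine rather than HtmlAgilityPack (so it is not verified there), the "...." line from the sample is left out, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class H3BetweenH2 {
    static final String XML =
        "<div><h2>heading 1</h2><h3>sub 1.1</h3><h3>sub 1.2</h3>"
        + "<h2>heading 2</h2><h3>sub 2.1</h3>"
        + "<h2>heading 3</h2></div>";

    /** Number of h3 siblings after the h2 at the given 1-based position and before the next h2. */
    static int h3CountAfter(int h2Position) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node h2 = (Node) xpath.evaluate("/div/h2[" + h2Position + "]",
                    new InputSource(new StringReader(XML)), XPathConstants.NODE);
            // Evaluated with the h2 as context node: every later h3 sibling,
            // minus those that follow the next h2 (count of an empty set is 0).
            Double n = (Double) xpath.evaluate(
                    "count(following-sibling::h3)"
                    + " - count(following-sibling::h2[1]/following-sibling::h3)",
                    h2, XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 3; i++) {
            System.out.println("h2[" + i + "]: " + h3CountAfter(i));
        }
    }
}
```

For the sample structure this should give 2, 1 and 0 for the three headings. The spaces around the minus sign matter, because a hyphen directly attached to a name would be read as part of the name.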

What is the effect of this XPATH expression?

What happens when I apply following-sibling::*[1] to the last child?
Answer: it evaluates to an empty node-set, because there are no more following siblings.
If you want the following sibling of the context node, or otherwise the following sibling of the context node's parent, the right axis is following, as in:
following::*[1]
following-sibling::*[1] will get you the first following sibling of the current node.
Here is an example:
<Root>
<Div id="Hey">
Test1
</Div>
<Div>
Test2
</Div>
<Div>
Test3
</Div>
</Root>
XPath:
/Root/Div[@id = 'Hey']/following-sibling::*[1]
/Root/Div[@id = 'Hey'] will get you the Div with id=Hey and /following-sibling::*[1] will then get you the first following sibling, so in total you would get the Div with the text "Test2".
Update: I apologize (see comment), but this: /Root/Div[3]/following-sibling::*[1] will just return an empty list. (Div[3] is the last child)
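Both cases can be checked with a few lines against the JDK's XPath engine. This is a minimal sketch; the class and method names are made up for the example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LastSibling {
    static final String XML =
        "<Root><Div id=\"Hey\">Test1</Div><Div>Test2</Div><Div>Test3</Div></Root>";

    /** Evaluates the expression against the sample document and returns the selected nodes. */
    static NodeList select(String expr) {
        try {
            return (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // From the first Div: one node, the Div containing Test2.
        NodeList next = select("/Root/Div[@id = 'Hey']/following-sibling::*[1]");
        System.out.println(next.getLength() + " " + next.item(0).getTextContent());
        // From the last Div: nothing follows it, so the node-set is empty.
        System.out.println(select("/Root/Div[3]/following-sibling::*[1]").getLength());
    }
}
```

An empty result is not an error; the expression is valid and simply selects no nodes.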
