XPath / ElementTree - How to get the next element that is not a child

I want to search for a specific element <B>. If <B>'s child <C> equals the string s1, I want to search from that element onward for the next element <X> that is NOT a child of <B> and return its value (s2).
The tree would look something like this:
<A>
<B>
<C>s1</C>
</B>
<D>
<X>s2</X>
</D>
</A>

The following works for me in xsh:
//X[preceding::B[C='s1']][not(parent::A)]/text()
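Since the question mentions ElementTree: as far as I know, the XPath subset in Python's standard xml.etree.ElementTree does not include the preceding:: axis, so the expression above cannot be passed to find() directly. A minimal sketch of the same idea in plain Python, walking the tree in document order (the function name next_x_after_b is made up for illustration):

```python
import xml.etree.ElementTree as ET

def next_x_after_b(root, s1):
    """Return the text of the first <X> that comes after (and outside of)
    the first <B> whose <C> child has the text s1, or None if there is none."""
    matched_b = None
    inside_b = set()
    # iter() yields elements in document order, starting with root itself.
    for elem in root.iter():
        if matched_b is None:
            if elem.tag == "B" and any(c.text == s1 for c in elem.findall("C")):
                matched_b = elem
                # Remember everything inside the matched <B> so it can be skipped.
                inside_b = {id(d) for d in elem.iter()}
        elif elem.tag == "X" and id(elem) not in inside_b:
            return elem.text
    return None

doc = ET.fromstring("<A><B><C>s1</C></B><D><X>s2</X></D></A>")
print(next_x_after_b(doc, "s1"))
```

With a full XPath 1.0 engine the original expression can be used as is; this loop is only a stand-in for the stdlib case.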

Related

Get parents attribute value if child doesn't have a specific attribute value

I have an xml file in linux that I want to process.
I need to get the ids of parent nodes based on their children.
Here I want the id of every 'a' that has no 'c' with key "f.g".
<a id="11111">
<b>
<c key="d.e">stuff1</c>
<c key="f.g">stuff2</c>
<c key="j.k">stuff4</c>
</b>
</a>
<a id="22222">
<b>
<c key="d.e">stuff1</c>
<c key="h.i">stuff3</c>
<c key="j.k">stuff4</c>
<c key="l.m">stuff5</c>
</b>
</a>
<a id="33333">
<b>
<c key="c.d">stuff0</c>
<c key="d.e">stuff1</c>
<c key="h.i">stuff3</c>
<c key="j.k">stuff4</c>
<c key="l.m">stuff5</c>
</b>
</a>
In this case I should be getting 22222 and 33333.
I'm not really sure how to write the xpath for this.
I think you are looking for something like:
//a[not(.//c[@key="f.g"])]/@id
which can be translated as: select the id attribute of every <a> node that does NOT have a descendant <c> node whose key attribute has the value "f.g".
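For reference, a small sketch of the same filter using Python's standard xml.etree.ElementTree. It assumes the fragments are wrapped in a hypothetical <root> element (the sample has several top-level <a> elements, which is not a well-formed document by itself) and uses a shortened copy of the sample data:

```python
import xml.etree.ElementTree as ET

# Shortened sample data; the <root> wrapper is an assumption added here.
XML = """
<root>
  <a id="11111"><b><c key="d.e">stuff1</c><c key="f.g">stuff2</c></b></a>
  <a id="22222"><b><c key="d.e">stuff1</c><c key="h.i">stuff3</c></b></a>
  <a id="33333"><b><c key="c.d">stuff0</c><c key="j.k">stuff4</c></b></a>
</root>
"""

root = ET.fromstring(XML)

# Keep every <a> that has no descendant <c key="f.g">, then read its id.
ids_without_fg = [
    a.get("id")
    for a in root.iter("a")
    if a.find(".//c[@key='f.g']") is None
]
print(ids_without_fg)  # ['22222', '33333']
```

The negation is done in Python rather than in the path because ElementTree's XPath subset has no not() function, as far as I am aware.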
You can filter with not():
//a[not(.//c[@key = 'f.g'])]
It returns the needed 'a' elements, but I don't know how to get their ids.
@Jack Fleeting's answer is probably the best solution. As an alternative (more expensive):
//c[not(@key="f.g" or preceding-sibling::c[@key="f.g"] or following-sibling::c[@key="f.g"])]/ancestor::a
Look for c elements that do not have @key="f.g" themselves and have no preceding or following sibling c with @key="f.g". Then select their a ancestors.

XPath : check if first child is desired or not

I want to check if the first child is B or not in the following code:
<A>
<B>
<C> 123</C>
</B>
</A>
<A>
<E>
<C> 00</C>
</E>
<B>
<C>121</C>
</B>
</A>
Here there are two A elements, and both contain a B. I want to check whether the first child of A is B and, if so, print the value of the C inside it.
How can I do this using XPath?
How about:
//A/*[1][self::B]/C
It gets the first child element of A node that is a B node, then gets its C node content.
Use this XPath-1.0 expression:
A[child::*[1] = child::B]/B[1]/C
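It checks whether the first child of A equals one of its B children and, if so, selects the first B and returns the value of its C child. Note that = in XPath 1.0 compares string values, not node identity, so this works for the sample data but could also match when a different first child happens to have the same text as a B.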
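The same check can be sketched with Python's standard xml.etree.ElementTree by indexing the first child directly. The <root> wrapper below is an assumption, added because the sample has two top-level A elements:

```python
import xml.etree.ElementTree as ET

# Sample data from the question, wrapped in a hypothetical <root> element.
XML = (
    "<root>"
    "<A><B><C> 123</C></B></A>"
    "<A><E><C> 00</C></E><B><C>121</C></B></A>"
    "</root>"
)

root = ET.fromstring(XML)

values = []
for a in root.iter("A"):
    # a[0] is the first child element; whitespace text nodes are not children here.
    if len(a) > 0 and a[0].tag == "B":
        values.append(a[0].findtext("C", default="").strip())

print(values)  # ['123']
```

Only the first A qualifies; in the second A the first child is E, so its B is skipped.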

Why doesn't //* return the document node?

I am trying to understand the following example
<?xml version="1.0" encoding="UTF-8"?>
<c>
<a>
<b att1="5">
<c/>
</b>
<d/>
</a>
<a att1="10">
<d>
<c/>
</d>
<b/>
</a>
</c>
Now I run the XPath query
//*[c]
which I take to mean "All nodes that have a child that is a c". However, this returns only the <b> and <d> nodes that have a <c> child without returning the Document node as I expected. Can anyone explain why?
Because //* is equivalent to /descendant-or-self::node()/*. The document node is the node matched by self::node() in that expanded path, so the outermost node the path can select is a child of the document node (because of the /* step). That child is the root element c, which has no direct child c, so it is not selected.
To include the document node you want /descendant-or-self::node()[c], which is equivalent to //.[c].
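To see the difference without an XPath engine, here is a rough sketch using Python's xml.dom.minidom, which exposes the document node explicitly. The helper name nodes_with_c_child is made up for illustration; it plays the role of "any node, document node included, that has a c element child":

```python
from xml.dom import minidom

# Same structure as the sample, written compactly (no whitespace text nodes).
XML = (
    '<c>'
    '<a><b att1="5"><c/></b><d/></a>'
    '<a att1="10"><d><c/></d><b/></a>'
    '</c>'
)

def nodes_with_c_child(node):
    """Yield every node, in document order, that has a <c> element child."""
    if any(
        child.nodeType == child.ELEMENT_NODE and child.tagName == "c"
        for child in node.childNodes
    ):
        yield node
    for child in node.childNodes:
        yield from nodes_with_c_child(child)

doc = minidom.parseString(XML)
names = [node.nodeName for node in nodes_with_c_child(doc)]
print(names)  # ['#document', 'b', 'd']
```

The document node shows up because the walk starts at the document itself, whereas //* only ever lands on elements.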

How to group two nodes which are not related in xpath?

I have html structure like this:
<a>
<c>
</c>
</a>
<b>
<d>
</d>
</b>
<a>
<c>
</c>
</a>
<b>
<d>
</d>
</b>
How do I group node 'a' and node 'b' together?
The xpath should be able to select the pairs of node 'a' and 'b'.
The nodes have auto-generated ids and names, so I can't use them in the XPath.
You can use the | operator for two unrelated XPath Expressions:
(//a | //b)
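Note that the union returns one flat node-set in document order; it does not group the nodes into pairs by itself. If pairs are needed, one option is to do the pairing in the host language. A minimal sketch with Python's standard xml.etree.ElementTree, assuming the elements share a parent (a hypothetical <root> wrapper here) and that each b immediately follows its a:

```python
import xml.etree.ElementTree as ET

# Simplified copy of the structure; the <root> wrapper is an assumption.
XML = "<root><a><c/></a><b><d/></b><a><c/></a><b><d/></b></root>"

root = ET.fromstring(XML)
children = list(root)

# Pair every <a> with the sibling right after it when that sibling is a <b>.
pairs = [
    (first, second)
    for first, second in zip(children, children[1:])
    if first.tag == "a" and second.tag == "b"
]

for a_elem, b_elem in pairs:
    print(a_elem.tag, b_elem.tag)
```

If other elements can sit between an a and its b, the pairing rule would need to be adjusted accordingly.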

XPath: limit scope of result set

Given the XML
<a>
<c>
<b id="1" value="noob"/>
</c>
<b id="2" value="tube"/>
<a>
<c>
<b id="3" value="foo"/>
</c>
<b id="4" value="goo"/>
<b id="5" value="noob"/>
<a>
<b id="6" value="near"/>
<b id="7" value="bar"/>
</a>
</a>
</a>
and the Xpath 1.0 query
//b[@id=2]/ancestor::a[1]//b[@value="noob"]
The XPath above returns both node ids 1 and 5. The goal is to limit the result to just node id=1, since it is the only @value="noob" element that is a descendant of the same <a> that (//b[@id=2]) is also a descendant of.
In other words, "Find all b elements whose value is "noob" that are descendants of the a element which also has a descendant whose id is 2, but are not descendants of any other a element". How's that for convoluted? In practice the id numbers and values would be variable and there would be hundreds of node types.
If the id=2, we would expect to return element id=1, not id=5, since id=5 is contained in another a element. If the id=4, we would expect to return id=5 but not id=1, since id=1 does not share the nearest ancestor a element of id=4.
Edit:
Based on the comments of Dimitre and Alejandro, I found this helpful blog entry explaining the use of count() with the | union operator as well as some other excellent tips.
Use:
//b[@value='noob']
[count(ancestor::a[1] | //b[@id=2]/ancestor::a[1]) = 1]
Explanation:
The second predicate assures that both b elements have the same nearest ancestor a.
Remember: In XPath 1.0 the test for node identity is:
count($n1 | $n2) = 1
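The same "nearest ancestor a must be the same node" test can be sketched in Python with the standard xml.etree.ElementTree, where node identity is simply the `is` operator. ElementTree has no parent pointers, so the sketch builds a parent map first; the helper names nearest_a and noob_in_same_a are made up for illustration:

```python
import xml.etree.ElementTree as ET

XML = """
<a>
  <c><b id="1" value="noob"/></c>
  <b id="2" value="tube"/>
  <a>
    <c><b id="3" value="foo"/></c>
    <b id="4" value="goo"/>
    <b id="5" value="noob"/>
    <a>
      <b id="6" value="near"/>
      <b id="7" value="bar"/>
    </a>
  </a>
</a>
"""

def nearest_a(elem, parents):
    """Return the closest <a> ancestor of elem, or None (like ancestor::a[1])."""
    node = parents.get(elem)
    while node is not None and node.tag != "a":
        node = parents.get(node)
    return node

def noob_in_same_a(root, start_id, value="noob"):
    """Ids of <b value=...> elements sharing the nearest <a> with b[@id=start_id]."""
    parents = {child: parent for parent in root.iter() for child in parent}
    start = root.find(f".//b[@id='{start_id}']")
    scope = nearest_a(start, parents)
    return [
        b.get("id")
        for b in root.iter("b")
        if b.get("value") == value and nearest_a(b, parents) is scope
    ]

root = ET.fromstring(XML)
print(noob_in_same_a(root, 2))  # ['1']
print(noob_in_same_a(root, 4))  # ['5']
```

The `is` comparison corresponds to count($n1 | $n2) = 1: both sides must be the very same node, not merely nodes with equal content.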
First, this:
"Is there some way to limit the result set to the <b> elements that are ONLY the children of the immediate <a> element of the start node (//b[@id=2])?"
//b[@value='noob'][ancestor::a[1]/b/@id=2]
It's not the same as:
"Starting at a node whose id is equal to 2, find all the elements whose value is "noob" that are descendants of the immediate parent c element without passing through another c element"
Which is:
//c[b/@id=2]//*[.='noob'][ancestor::c[1][b/@id=2]]
Besides these expressions, when you are dealing with "context marks" you can use the set membership test, as in:
$node[count(.|$node-set)=count($node-set)]
I leave you its use for this case as an exercise...
//b[@id=2]/ancestor::a[1]//b[@value="noob" and not(ancestor::a[2]=//b[@id=2]/ancestor::a[1])] ?
That works only for your case, though; I'm not sure how generic it needs to be.
