How to return xpath union of nodes in separate trees? - xpath

It's a basic question, but I couldn't find the answer anywhere.
<a>
<b1>
<d1>D1</d1>
<e1>E1</e1>
</b1>
<b2>
<c2>
<d2>D2</d2>
<e2>E2</e2>
</c2>
</b2>
</a>
From the above I'd like to return:
<a>
<d1>D1</d1>
<e1>E1</e1>
<d2>D2</d2>
<e2>E2</e2>
</a>
And not:
<a>
<b1>
<d1>D1</d1>
<e1>E1</e1>
</b1>
<b2>
<d2>D2</d2>
<e2>E2</e2>
</b2>
</a>
If that makes any sense. I tried "/a", but that gave me:
<a>
<b1>
<d1>D1</d1>
<e1>E1</e1>
</b1>
<b2>
<c2>D2E2</c2>
</b2>
</a>

If you meant to select all leaf elements (elements without child elements), you can try this XPath:
//*[not(*)]
Or use the XPath union operator (|) to get the child elements of <b1> and <c2>:
(//b1/* | //c2/*)
Given the sample XML you posted, both XPath expressions above will return:
<d1>D1</d1>
<e1>E1</e1>
<d2>D2</d2>
<e2>E2</e2>
But if you really need the result to be wrapped in <a>, then I agree with @minopret's comment: that isn't what XPath is meant to do. XSLT is the more appropriate way to transform XML into a different format.
UPDATE:
In response to your last comment, there is no such grouping in XPath. It should be done in the host language if you need that data structure. Your best bet is to select the parents of the desired nodes in XPath so that you get them grouped by their parent. Then you can do further processing in the host language, for example:
//*[not(*)]/parent::*
//*[*[not(*)]]
Either of the two XPath queries above returns:
<b1>
<d1>D1</d1>
<e1>E1</e1>
</b1>
<c2>
<d2>D2</d2>
<e2>E2</e2>
</c2>

XPath can only return nodes that are already present in your source tree. To construct new nodes, or reorganise the tree, you need XSLT or XQuery.
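If neither XSLT nor XQuery is available, the wrapping can also be done in the host language, as suggested earlier in the thread. The following is only a minimal sketch using Python's standard xml.etree.ElementTree. That module supports just a subset of XPath (no not(), no union), so the leaf test //*[not(*)] is written as a Python filter here. The names SOURCE and flatten_leaves are made up for the illustration.

```python
import copy
import xml.etree.ElementTree as ET

# The sample document from the question, on one line to avoid whitespace-only text.
SOURCE = (
    "<a><b1><d1>D1</d1><e1>E1</e1></b1>"
    "<b2><c2><d2>D2</d2><e2>E2</e2></c2></b2></a>"
)

def flatten_leaves(xml_text):
    """Return a new element, named like the root, holding copies of all leaf elements."""
    root = ET.fromstring(xml_text)
    # Host-language equivalent of //*[not(*)]: elements that have no child elements.
    leaves = [el for el in root.iter() if len(el) == 0]
    wrapper = ET.Element(root.tag)
    for leaf in leaves:
        # Copy so that the source tree is left untouched.
        wrapper.append(copy.deepcopy(leaf))
    return wrapper

flat = flatten_leaves(SOURCE)
print(ET.tostring(flat, encoding="unicode"))
# prints <a><d1>D1</d1><e1>E1</e1><d2>D2</d2><e2>E2</e2></a>
```

The selection step is what XPath would do; building the new <a> wrapper is the part XPath itself cannot express.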

Related

How to get parent element with attribute using xpath

I have posted sample XML and expected output kindly help to get the result.
Sample XML
<root>
<A id="1">
<B id="2"/>
<C id="2"/>
</A>
</root>
Expected output:
<A id="1"/>
You can formulate this query in several ways:
Find elements that have a matching attribute, only descending all the time:
//*[@id=1]
Find the attribute, then ascend a step:
//@id[.=1]/..
Use the fn:id($id) function, given the document is validated and the ID attribute is defined as such:
/id('1')
I think what you're after is not possible. There's no way to select a node without its children using XPath (meaning it would always come back with the nodes B and C in your case).
You could achieve this using XQuery. I'm not sure if this is what you want, but here's an example that creates a new node based on an existing node stored in the $doc variable.
declare variable $doc := <root><A id="1"><B id="2"/><C id="2"/></A></root>;
element {fn:node-name($doc/*)} {$doc/*/@*}
The above returns <A id="1"></A>.
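The same idea (build a new element that carries only the name and attributes of the match) can be sketched in a host language as well. This is an illustrative Python version using the standard xml.etree.ElementTree, not the answerer's method; the variable names match and shallow are invented for the example.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring('<root><A id="1"><B id="2"/><C id="2"/></A></root>')

# ElementTree understands the simple attribute predicate [@id='1'];
# find() returns the first match in document order, which is <A>.
match = doc.find(".//*[@id='1']")

# New element with the same tag and a copy of the attributes, but no children.
shallow = ET.Element(match.tag, dict(match.attrib))

print(ET.tostring(shallow, encoding="unicode"))
```

The original <A> in doc still has its B and C children; only the newly built element is childless.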
Is this what you are looking for?
//*[@id='1']/parent::* , similar to //*[@id='1']/..
If you want to verify that the parent is root:
//*[@id='1']/parent::root
https://en.wikipedia.org/wiki/XPath
If you need not just the parent but any enclosing element with some attribute, read about axis specifiers and use the "ancestor::" axis.

Xpath - matching based on node() contains() content

I have the following HTML structure (there are many blocks using the same architecture):
<span id="mySpan">
<i>
Price
<b>
3 900
<small>€</small>
</b>
</i>
</span>
Now, I want to get the content of <b> using XPath, which I tried like so:
//span[@id="mySpan"]/i/node()[1][contains(text(),"Price")]
which does not match anything. How can I match this using the node()[1] text as anchor?
Regarding the XPath you tried: node()[1] is itself a text node, so it has no text-node children for text() to return. Simply use . (the context node) instead:
//span[@id="mySpan"]/i/node()[1][contains(.,"Price")]
For the ultimate goal, I'd suggest this XPath:
//span[@id="mySpan"]/i[contains(.,"Price")]/b
or, if you specifically want to match against the first node within <i>:
//span[@id="mySpan"]/i[contains(node(),"Price")]/b
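For readers processing this in a host language, here is a rough Python sketch of the last variant using the standard xml.etree.ElementTree. ElementTree has no contains(), so the test on the first node is done in Python: an element's .text attribute holds the text before its first child, which corresponds to node()[1] in this markup. The names MARKUP and price_text are hypothetical, and the snippet assumes the block is well-formed XML.

```python
import xml.etree.ElementTree as ET

MARKUP = """<span id="mySpan"><i>
Price
<b>
3 900
<small>€</small>
</b>
</i></span>"""

def price_text(markup, anchor="Price"):
    """Return the whitespace-normalised text of <b> when <i> starts with the anchor text."""
    root = ET.fromstring(markup)
    for i_el in root.iter("i"):
        # i_el.text is the leading text node of <i>, i.e. node()[1] here.
        if anchor in (i_el.text or ""):
            b_el = i_el.find("b")
            if b_el is not None:
                # itertext() yields all descendant text, like the XPath string value.
                return " ".join("".join(b_el.itertext()).split())
    return None

print(price_text(MARKUP))  # prints 3 900 €
```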

XPath: limit scope of result set

Given the XML
<a>
<c>
<b id="1" value="noob"/>
</c>
<b id="2" value="tube"/>
<a>
<c>
<b id="3" value="foo"/>
</c>
<b id="4" value="goo"/>
<b id="5" value="noob"/>
<a>
<b id="6" value="near"/>
<b id="7" value="bar"/>
</a>
</a>
</a>
and the Xpath 1.0 query
//b[@id=2]/ancestor::a[1]//b[@value="noob"]
The XPath above returns both node ids 1 and 5. The goal is to limit the result to just node id=1, since it is the only @value="noob" element that is a descendant of the same <a> that (//b[@id=2]) is also a descendant of.
In other words, "Find all b elements whose value is "noob" that are descendants of the a element which also has a descendant whose id is 2, but are not descendants of any other a element". How's that for convoluted? In practice the id numbers and values would be variable and there would be hundreds of node types.
If the id=2, we would expect to return element id=1 but not id=5, since id=5 is contained in another a element. If the id=4, we would expect to return id=5 but not id=1, since id=1 is not in the same first ancestor a element as id=4.
Edit:
Based on the comments of Dimitre and Alejandro, I found this helpful blog entry explaining the use of count() with the | union operator as well as some other excellent tips.
Use:
//b[@value='noob']
[count(ancestor::a[1] | //b[@id=2]/ancestor::a[1]) = 1]
Explanation:
The second predicate ensures that both b elements have the same nearest ancestor a.
Remember: In XPath 1.0 the test for node identity is:
count($n1 | $n2) = 1
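The identity test is easier to see outside XPath. Below is a small Python sketch of the same "same nearest ancestor a" check, using the standard xml.etree.ElementTree and Python's `is` operator in place of count($n1 | $n2) = 1. ElementTree elements have no parent pointer, so a parent map is built first. The helper names (nearest_ancestor, matches_in_scope) are made up for the illustration.

```python
import xml.etree.ElementTree as ET

XML = """<a>
  <c><b id="1" value="noob"/></c>
  <b id="2" value="tube"/>
  <a>
    <c><b id="3" value="foo"/></c>
    <b id="4" value="goo"/>
    <b id="5" value="noob"/>
    <a>
      <b id="6" value="near"/>
      <b id="7" value="bar"/>
    </a>
  </a>
</a>"""

def nearest_ancestor(el, tag, parent_of):
    """Walk up the parent map until an element with the given tag is found."""
    node = parent_of.get(el)
    while node is not None and node.tag != tag:
        node = parent_of.get(node)
    return node

def matches_in_scope(xml_text, start_id, value="noob"):
    """Ids of <b value=...> elements sharing the nearest <a> with <b id=start_id>."""
    root = ET.fromstring(xml_text)
    parent_of = {child: parent for parent in root.iter() for child in parent}
    start = root.find(f".//b[@id='{start_id}']")
    scope = nearest_ancestor(start, "a", parent_of)
    return [
        b.get("id")
        for b in root.iter("b")
        # 'is' compares node identity, the role count($n1 | $n2) = 1 plays in XPath 1.0.
        if b.get("value") == value and nearest_ancestor(b, "a", parent_of) is scope
    ]

print(matches_in_scope(XML, 2))  # prints ['1']
print(matches_in_scope(XML, 4))  # prints ['5']
```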
First, this:
"is there some way to limit the result set to the <b> elements that are ONLY the children of the immediate <a> element of the start node (//b[@id=2])?"
is answered by:
//b[@value='noob'][ancestor::a[1]/b/@id=2]
It's not the same as:
"Starting at a node whose id is equal to 2, find all the elements whose value is "noob" that are descendants of the immediate parent c element without passing through another c element"
Which is:
//c[b/@id=2]//*[.='noob'][ancestor::c[1][b/@id=2]]
Besides these expressions, when you are dealing with "context marks" you can use the set's membership test as in:
$node[count(.|$node-set)=count($node-set)]
I leave you its use for this case as an exercise...
//b[@id=2]/ancestor::a[1]//b[@value="noob" and not(ancestor::a[2]=//b[@id=2]/ancestor::a[1])] ?
that works only for your case though, not sure how generic it should be!

Simple xpath question

I'm thinking this is a very simple xpath question .. I'm just not sure why my xpath isn't working.
Here's what my XML looks like
<A>
<B>foo</B>
</A>
<C>
<A>
<B>foo</B>
</A>
</C>
Now .. I want to grab all "A" elements which contain a "B" with contained text "foo".
//A[B[text()='foo']]
//A matches all As
//A[B] that have a B as a child
//A[B[text()='foo']] which contains foo as text.
I suggest reading the XPath tutorial at w3schools.com
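As a quick check of the expression, here is a minimal Python sketch using the standard xml.etree.ElementTree, whose XPath subset supports the [tag='text'] predicate. The sample XML in the question has two top-level elements, so it is wrapped in a <root> element here; that wrapper is an assumption, not part of the original data.

```python
import xml.etree.ElementTree as ET

# The question's fragments, wrapped in a hypothetical <root> to make them well-formed.
XML = "<root><A><B>foo</B></A><C><A><B>foo</B></A></C></root>"

root = ET.fromstring(XML)

# ElementTree spelling of //A[B[text()='foo']]: every A, at any depth,
# that has a child B whose text content equals 'foo'.
matches = root.findall(".//A[B='foo']")

print(len(matches))  # prints 2
```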

XPath - Get node with no child of specific type

XML: /A/B or /A
I want to get all A nodes that do not have any B children.
I've tried
/A[not(B)]
/A[not(exists(B))]
without success
I prefer a solution with the syntax /*[local-name()="A" and .... ], if possible. Any ideas that work?
Clarification. The xml looks like:
<WhatEver>
<A>
<B></B>
</A>
</WhatEver>
or
<WhatEver>
<A></A>
</WhatEver>
Maybe
*[local-name() = 'A' and not(descendant::*[local-name() = 'B'])]?
Also, there should be only one root element, so for /A[...] you're either getting all your XML back or none. Maybe //A[not(B)] or /*/A[not(B)]?
I don't really understand why /A[not(B)] doesn't work for you.
~/xml% xmllint ab.xml
<?xml version="1.0"?>
<root>
<A id="1">
<B/>
</A>
<A id="2">
</A>
<A id="3">
<B/>
<B/>
</A>
<A id="4"/>
</root>
~/xml% xpath ab.xml '/root/A[not(B)]'
Found 2 nodes:
-- NODE --
<A id="2">
</A>
-- NODE --
<A id="4" />
Try this "/A[not(.//B)]" or this "/A[not(./B)]".
The first / causes XPath to start at the root of the document, I doubt that is what you intended.
Perhaps you meant //A[not(B)] which would find all A nodes in the document at any level that do not have a direct B child.
Or perhaps you are already at a node that contains A nodes in which case you just want A[not(B)] as the XPath.
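To make the "direct child" versus "any descendant" distinction concrete, here is a small Python sketch using the standard xml.etree.ElementTree. That module does not implement not(), so the negation is written in Python. The sample data mirrors the ab.xml listing above, with one extra A (id="5", made up for this example) that has a B grandchild but no B child.

```python
import xml.etree.ElementTree as ET

XML = """<root>
  <A id="1"><B/></A>
  <A id="2">
  </A>
  <A id="3"><B/><B/></A>
  <A id="4"/>
  <A id="5"><X><B/></X></A>
</root>"""

root = ET.fromstring(XML)

# Like //A[not(B)]: A elements at any level with no direct B child.
no_b_child = [a.get("id") for a in root.iter("A") if a.find("B") is None]

# Like //A[not(.//B)]: A elements with no B anywhere below them.
no_b_descendant = [a.get("id") for a in root.iter("A") if a.find(".//B") is None]

print(no_b_child)       # prints ['2', '4', '5']
print(no_b_descendant)  # prints ['2', '4']
```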
If you are trying to get A anywhere in the hierarchy from the root, this works (in XSLT 1.0 as well as 2.0, in case it's used in XSLT):
//descendant-or-self::node()[local-name(.) = 'a' and not(count(b))]
OR you can also do
//descendant-or-self::node()[local-name(.) = 'a' and not(b)]
OR also
//descendant-or-self::node()[local-name(.) = 'a' and not(child::b)]
There are any number of ways to achieve the same thing in XSLT.
Note: XPath is case-sensitive, so if your real node names differ from A and B (which I am sure they do), make sure the case matches.
Use this:
/*[local-name()='A' and not(descendant::*[local-name()='B'])]
