xpath: filter selected nodes based on type of parent node

Here's a sample of the XML I'm dealing with:
<subchapter>
<section>
</section>
</subchapter>
<part>
<section>
</section>
</part>
<part>
<section>
</section>
</part>
<quotedContent>
<section>
</section>
</quotedContent>
I'm trying to filter out certain nodes based on the type of their parent nodes. In other words, I want to find all the <section> nodes NOT in <quotedContent> nodes. There are various other parent nodes in addition to <part> and <subchapter> that I want to be included in my end result, so it's a matter of excluding just the <quotedContent> nodes. I'm pretty sure it's just a matter of getting the XPath string correct.
I'm using R's xml2 package, specifically the xml_find_all() function, as follows:
xml_find_all(ustc, "..//d1:section[parent='part']", ns = xml_ns(ustc))
Based on the above XML example, I would expect to get two nodes -- the first two, not the last one inside the <quotedContent>.

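Use not(parent::quotedContent) in the predicate, e.g. //section[not(parent::quotedContent)], or select through the parent instead: //*[not(self::quotedContent)]/section. Since your call uses the d1 prefix for the document's default namespace, every element name in the expression needs that prefix, e.g. //d1:section[not(parent::d1:quotedContent)].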
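For readers who want to try the idea outside R, here is a rough Python sketch. It rests on several assumptions: Python's standard-library xml.etree.ElementTree does not support not() or the parent:: axis, so the filter from //*[not(self::quotedContent)]/section is written as a plain loop rather than as XPath; the <root> wrapper, the id attributes and the function name sections_outside are invented for the illustration; and the sample is shown without a namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper <root> and id attributes, added only for the demo.
SAMPLE = """
<root>
  <subchapter><section id="s1"/></subchapter>
  <part><section id="s2"/></part>
  <part><section id="s3"/></part>
  <quotedContent><section id="s4"/></quotedContent>
</root>
"""


def sections_outside(root, excluded="quotedContent"):
    """Emulate //*[not(self::quotedContent)]/section with a plain loop."""
    found = []
    for parent in root.iter():           # every element, in document order
        if parent.tag == excluded:
            continue                     # ignore children of excluded parents
        found.extend(parent.findall("section"))  # direct children only
    return found


root = ET.fromstring(SAMPLE)
ids = [s.get("id") for s in sections_outside(root)]
print(ids)
```

With a full XPath 1.0 engine (xml2 in R, for instance) the loop is unnecessary and the one-line expression does the same job.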

Related

Xpath - How to select a node but not its child nodes

I am trying to select a node but not any of its child nodes.
Example Input:
<Header attr1="Hello">
<child1> hello </child1>
<child2>world</child2>
</Header>
Expected Output: <Header attr1="Hello"> </Header>
Code:
Document xmlDoc = saxBuilder.build(inputStream);
XPath x = XPath.newInstance("/Header");
// selectSingleNode returns Object, so cast to the JDOM Element type
Element eleMyElement = (Element) x.selectSingleNode(xmlDoc);
XMLOutputter output = new XMLOutputter();
output.outputString(eleMyElement); // --> this is the output
I tried with /Header as XPath, it gives me the header along with child nodes.
You need to distinguish what is selected from what is displayed.
The XPath expression /Header selects one node only, the Header element. You say "it gives me", but what is "it"? Something is displaying the results of the XPath selection, and it is choosing to display the results by rendering the selected element with all its children. You need to look at the code that is displaying the result.
In this case you can simply do
eleMyElement.getContent().clear();
and all child nodes will be deleted.
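The difference between what is selected and what is serialised can be seen in a few lines of Python. This is only an illustration using the standard-library xml.etree.ElementTree, not JDOM; the variable names are made up, and the shallow copy is one possible alternative to clearing the children when the original document should stay intact.

```python
import xml.etree.ElementTree as ET

header = ET.fromstring(
    '<Header attr1="Hello"><child1> hello </child1><child2>world</child2></Header>'
)

# Serialising the selected element always includes its whole subtree.
full_xml = ET.tostring(header, encoding="unicode")

# A shallow copy carries the tag and attributes but none of the children,
# and leaves the original element untouched.
shallow = ET.Element(header.tag, dict(header.attrib))
shallow_xml = ET.tostring(shallow, encoding="unicode")

print(full_xml)
print(shallow_xml)
```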

xpath: check if element is within other element

I have quite a large XML structure that in its simplest form looks kinda like this:
<document>
<body>
<section>
<p>Some text</p>
</section>
</body>
<backm>
<section>
<p>Some text</p>
<figure><title>This</title></figure>
</section>
</backm>
</document>
The section levels can be almost limitless (both within the body and backm elements) so I can have a section in section in section in section, etc. and the figure element can be within a numlist, an itenmlist, a p, and a lot more elements.
What I want to do is to check if the title in figure element is somewhere within the backm element. Is this possible?
A document could have multiple <backm> elements and it could have multiple <figure><title>Title</title></figure> elements in it. How you build your query depends on the situations you're trying to distinguish between.
//backm/descendant::figure/title
Will return the <title> elements that are the child of a <figure> element and the descendant of a <backm> element.
So:
count(//backm/descendant::figure/title) > 0
Will return True if there are 1 or more such title elements.
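You can also use double negation, but note that it tests a different condition:
not(//backm[not(descendant::figure/title)])
This is true only when every <backm> contains such a title, and it is also true when the document has no <backm> at all, so it is not a drop-in replacement for the count() test. I'm under the impression that this should have better performance, though that depends on the engine.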
//title[parent::figure][ancestor::backm]
Lists all <title> elements with a parent of <figure> and a <backm> ancestor.
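As a quick way to try the existence check, here is a small Python sketch. It assumes the standard-library xml.etree.ElementTree, whose limited XPath subset has no descendant:: axis or count() but does understand // for "any depth", so the path is written in that form and the count is taken in Python.

```python
import xml.etree.ElementTree as ET

DOC = """
<document>
  <body>
    <section><p>Some text</p></section>
  </body>
  <backm>
    <section>
      <p>Some text</p>
      <figure><title>This</title></figure>
    </section>
  </backm>
</document>
"""

root = ET.fromstring(DOC)

# Same idea as //backm/descendant::figure/title, in ElementTree's subset.
titles = root.findall(".//backm//figure/title")
has_backm_figure_title = len(titles) > 0

print(has_backm_figure_title, [t.text for t in titles])
```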

xpath: count preceding elements

I have an xml structure that looks like this:
<document>
<body>
<section>
<title>something</title>
<subtitle>Something again</subtitle>
<section>
<p xml:id="1234">Some text</p>
<figure id="2121"></figure>
<section>
<p xml:id="somethingagain">Some text</p>
<figure id="939393"></figure>
<p xml:id="countelement"></p>
</section>
</section>
</section>
<section>
<title>something2</title>
<subtitle>Something again2</subtitle>
<section>
<p xml:id="12345678">Some text2</p>
<figure id="939394"></figure>
<p xml:id="countelement2"></p>
</section>
</section>
</body>
</document>
How can I count the figure elements I have before the <p xml:id="countelement"></p> element using XPath?
Edit:
And I only want to count figure elements within the parent section; in the next section it should start from 0 again.
Given you're using an XPath 2.0 compatible engine, find the count element and call fn:count() for each match, using the preceding figure elements as input.
This will return the number of figures preceding each "countelement" on the same level (I guess this is what you actually want):
//p[@xml:id="countelement"]/count(preceding-sibling::figure)
This will return the number of figures preceding each "countelement" on the same level and on the level above:
//p[@xml:id="countelement"]/count(preceding-sibling::figure | parent::*/preceding-sibling::figure)
This will return the number of all figures preceding each "countelement" anywhere earlier in the document:
//p[@xml:id="countelement"]/count(preceding::figure)
If you're bound to XPath 1.0, you won't be able to get multiple results. If @id really is an id (and thus unique), you will be able to use this query:
count(//p[@xml:id="countelement"]/preceding::figure)
If there are "countelements" which are not <p/> elements, replace p by *.
count(id("countelement")/preceding-sibling::figure)
Please note that the xml:id attributes of two different elements cannot have the same value, such as "countelement". If you want two different elements to carry a same-named attribute with the value "countelement", it must be some other attribute, perhaps "kind", that is not of DTD attribute type ID. In that case, in place of id("countelement") you would use *[@kind="countelement"].
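To make the "same level only" counting concrete, here is a hedged Python sketch. The standard-library xml.etree.ElementTree has no preceding-sibling:: axis, so the behaviour of count(//p[@xml:id="countelement"]/preceding-sibling::figure) is emulated with a loop over each parent's children. The function name is hypothetical and the document is a trimmed version of the one in the question.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

DOC = """
<body>
  <section>
    <section>
      <p xml:id="1234">Some text</p>
      <figure id="2121"/>
      <section>
        <p xml:id="somethingagain">Some text</p>
        <figure id="939393"/>
        <p xml:id="countelement"/>
      </section>
    </section>
  </section>
</body>
"""


def count_preceding_sibling_figures(root, target_id):
    """Count <figure> siblings that come before the <p> with the given xml:id."""
    for parent in root.iter():
        count = 0
        for child in parent:             # children in document order
            if child.tag == "p" and child.get(XML_ID) == target_id:
                return count
            if child.tag == "figure":
                count += 1
    return None                          # no <p> with that xml:id


root = ET.fromstring(DOC)
print(count_preceding_sibling_figures(root, "countelement"))
```

Because the counter restarts for every parent, figures in enclosing or earlier sections are not included, which matches the "start from 0 again" requirement in the edit to the question.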

Using XPath expression how can i get the first text node immediately following a node?

I want to get to the exact node having this text: 'Company'. Once I get to this node I want to get to the next text node immediately following this node because this contains the company name. How can I do this with Xpath?
Fragment of XML is:
<div id="jobsummary">
<div id="jobsummary_content">
<h2>Job Summary</h2>
<dl>
<dt>Company</dt>
<!-- the following element is the one I'm looking for -->
<dd><span class="wrappable">Pinpoint IT Services, LLC</span></dd>
<dt>Location</dt>
<dd><span class="wrappable">Newport News, VA</span></dd>
<dt>Industries</dt>
<dd><span class="wrappable">All</span></dd>
<dt>Job Type</dt>
<dd class="multipledd"><span class="wrappable">Full Time</span></dd><dd class="multipleddlast"><span class="wrappable"> Employee</span></dd>
</dl>
</div>
</div>
I got to the Company tag with following xpath: //*[text()= 'Company']
Now I want to get to the next text node. My XML is dynamic, so I can't hardcode the node type like <dd> for getting the company value. But it is certain that the value will be in the immediately following text node.
So how can I get to the text node immediately after the node with text as Company?
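If you cannot hardcode any part of the following-sibling node, your XPath should look like this:
(//*[text()='Company']/following::*/*/text())[1]
assuming that the desired text is always enclosed in another element like span. The parentheses and [1] matter: without them the expression returns the text of every element that follows 'Company', not just the first.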
To test for given dt text, modify your xpath to
//*[text()='Company' or text()='Company:' or text()='Company Name']/following::*/*/text()
Use //*[text()='Company']/following-sibling::dd[1] to get the next dd.
You can add further conditions to that dd and also step further into it.
following-sibling::elementName selects the later siblings under the same parent that match the name, in document order; the [1] keeps only the nearest one.
Without [1], the expression returns every dd after 'Company', not just the next one.
The text is in the span so you might try
//*[text()='Company']/following-sibling::dd[1]/span
As another example, let's say you also want the Industries text that follows the selected 'Company'.
Starting from //*[text()='Company'],
you can extend it like this: //*[text()='Company']/following-sibling::dt[text()='Industries']/following-sibling::dd[1]/span
Of course, instead of hardcoding the values for text(), you can use variables.
You can Use XPathNavigator and go on to every node type one by one
I think XPathNavigator::MoveToNext is the method you are looking for.
There is the sample code as well at..
http://msdn.microsoft.com/en-us/library/9yxc3x24.aspx
Use this general XPath expression that selects the wanted text node even when it is wrapped in statically unknown markup elements:
(//*[text()='Company']/following-sibling::*[1]//text())[1]
When this XPath expression is evaluated against the provided XML document:
<div id="jobsummary">
<div id="jobsummary_content">
<h2>Job Summary</h2>
<dl>
<dt>Company</dt>
<!-- the following element is the one I'm looking for -->
<dd><span class="wrappable">Pinpoint IT Services, LLC</span></dd>
<dt>Location</dt>
<dd><span class="wrappable">Newport News, VA</span></dd>
<dt>Industries</dt>
<dd><span class="wrappable">All</span></dd>
<dt>Job Type</dt>
<dd class="multipledd"><span class="wrappable">Full Time</span></dd><dd class="multipleddlast"><span class="wrappable"> Employee</span></dd>
</dl>
</div>
</div>
exactly the wanted text node is selected:
Pinpoint IT Services, LLC
Even if we change the XML to this one:
<div id="jobsummary">
<div id="jobsummary_content">
<h2>Job Summary</h2>
<div>
<p>Company</p>
<!-- the following element is the one I'm looking for -->
<dd><span class="wrappable"><b><i><u>Pinpoint IT Services, LLC</u></i></b></span></dd>
<dt>Location</dt>
<dd><span class="wrappable">Newport News, VA</span></dd>
<dt>Industries</dt>
<dd><span class="wrappable">All</span></dd>
<dt>Job Type</dt>
<dd class="multipledd"><span class="wrappable">Full Time</span></dd><dd class="multipleddlast"><span class="wrappable"> Employee</span></dd>
</div>
</div>
</div>
the XPath expression above still selects the wanted text node:
Pinpoint IT Services, LLC
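The logic of (//*[text()='Company']/following-sibling::*[1]//text())[1] can be traced step by step in Python. This is a sketch under stated assumptions: the standard-library xml.etree.ElementTree lacks the following-sibling:: axis and text() nodes, so the steps are written as a loop; the helper name first_text_after is invented; the comparison uses only the element's leading text, which is narrower than XPath's text()= test; and the sample is a shortened version of the XML above.

```python
import xml.etree.ElementTree as ET

DOC = """
<dl>
  <dt>Company</dt>
  <dd><span class="wrappable"><b><i>Pinpoint IT Services, LLC</i></b></span></dd>
  <dt>Location</dt>
  <dd><span class="wrappable">Newport News, VA</span></dd>
</dl>
"""


def first_text_after(root, label):
    """Return the first text inside the element that directly follows `label`."""
    for parent in root.iter():
        children = list(parent)
        for i, child in enumerate(children[:-1]):
            if child.text == label:
                # following-sibling::*[1] is children[i + 1];
                # itertext() walks its text in document order.
                return next(children[i + 1].itertext(), None)
    return None


root = ET.fromstring(DOC)
print(first_text_after(root, "Company"))
```

The wrapper markup around the company name (span, b, i) does not matter, which is the same property the XPath expression relies on.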

XPath - Get node with no child of specific type

XML: /A/B or /A
I want to get all A nodes that do not have any B children.
I've tried
/A[not(B)]
/A[not(exists(B))]
without success
I prefer a solution with the syntax /*[local-name()="A" and .... ], if possible. Any ideas that work?
Clarification. The xml looks like:
<WhatEver>
<A>
<B></B>
</A>
</WhatEver>
or
<WhatEver>
<A></A>
</WhatEver>
Maybe
*[local-name() = 'A' and not(descendant::*[local-name() = 'B'])]?
Also, there should be only one root element, so for /A[...] you're either getting all your XML back or none. Maybe //A[not(B)] or /*/A[not(B)]?
I don't really understand why /A[not(B)] doesn't work for you.
~/xml% xmllint ab.xml
<?xml version="1.0"?>
<root>
<A id="1">
<B/>
</A>
<A id="2">
</A>
<A id="3">
<B/>
<B/>
</A>
<A id="4"/>
</root>
~/xml% xpath ab.xml '/root/A[not(B)]'
Found 2 nodes:
-- NODE --
<A id="2">
</A>
-- NODE --
<A id="4" />
Try this "/A[not(.//B)]" or this "/A[not(./B)]".
The first / causes XPath to start at the root of the document, I doubt that is what you intended.
Perhaps you meant //A[not(B)] which would find all A nodes in the document at any level that do not have a direct B child.
Or perhaps you are already at a node that contains A nodes in which case you just want A[not(B)] as the XPath.
If you are trying to get A anywhere in the hierarchy from the root, this works (in XSLT 1.0 as well as 2.0, in case it's used in XSLT):
//descendant-or-self::node()[local-name(.) = 'a' and not(count(b))]
OR you can also do
//descendant-or-self::node()[local-name(.) = 'a' and not(b)]
OR also
//descendant-or-self::node()[local-name(.) = 'a' and not(child::b)]
There are any number of ways in XSLT to achieve the same thing.
Note: XPath is case-sensitive, so the lowercase a and b above must be changed to match your actual element names (A and B in the question).
Use this:
/*[local-name()='A' and not(descendant::*[local-name()='B'])]
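For comparison, the same selection can be tried in Python with the ab.xml content from the xmllint answer above. This is a sketch, assuming the standard-library xml.etree.ElementTree: its XPath subset can find the A elements with .//A, but it has no not(), so the "no B child" test is done in Python.

```python
import xml.etree.ElementTree as ET

DOC = """
<root>
  <A id="1"><B/></A>
  <A id="2"></A>
  <A id="3"><B/><B/></A>
  <A id="4"/>
</root>
"""

root = ET.fromstring(DOC)

# .//A finds A at any depth; find("B") looks at direct children only,
# which mirrors the predicate not(B).
without_b = [a.get("id") for a in root.findall(".//A") if a.find("B") is None]

print(without_b)
```

The result lines up with the two nodes reported by the xpath command-line run, ids 2 and 4.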
