Check if preceding sibling is a title element - xpath

I need to check whether the closest preceding element of a figure is a title element. My XML structure looks like this:
<section>
<title>Something</title>
<figure><graphic></graphic></figure>
<figure><graphic></graphic></figure>
<p>Some text</p>
</section>

Take the first preceding sibling and verify it is a title:
//figure[preceding-sibling::*[1][self::title]]
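
To sanity-check the expression outside an XPath engine, the same test can be emulated in Python. This is only a sketch: the standard library's xml.etree.ElementTree does not support the preceding-sibling axis, so the helper below (a hypothetical name, figures_directly_after_title) walks each parent's children by hand instead of evaluating the XPath itself.

```python
import xml.etree.ElementTree as ET

XML = """<section>
<title>Something</title>
<figure><graphic></graphic></figure>
<figure><graphic></graphic></figure>
<p>Some text</p>
</section>"""


def figures_directly_after_title(root):
    """Emulate //figure[preceding-sibling::*[1][self::title]] by hand."""
    matches = []
    for parent in root.iter():
        previous = None  # the closest preceding sibling element seen so far
        for child in parent:
            if child.tag == "figure" and previous is not None and previous.tag == "title":
                matches.append(child)
            previous = child
    return matches


root = ET.fromstring(XML)
matches = figures_directly_after_title(root)
print(len(matches))  # prints 1: only the first figure directly follows the title
```

Only the first figure qualifies, because the closest preceding sibling of the second figure is another figure.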

Related

Select `text()` that 1) precede a given node but 2) are also descendants of another given node

This is a follow-up to an earlier question, but unfortunately the answer to that question doesn't apply here.
Say I have the following XML:
<body>
<div id="global-header">
header
</div>
<div id="a">
<h3>some title</h3>
<p>text 1
<b>bold</b>
</p>
<div>
<p>abc</p>
<p>text 2</p>
<p>def</p>
</div>
</div>
</body>
I want to:
1. find the <p> node whose value is "text 2" (assume we only have exactly one such <p>), and then
2. find all the nodes that precede this particular <p> but are also descendants of the <div id='a'> node (you can use something like [@id='a'] to locate it), and finally
3. extract text() from step 2.
The desired output should look like:
some title
text 1
bold
abc
The caveats are:
1. the preceding nodes may contain arbitrary node types, not only <h3> and <p>.
2. the <p>text 2</p> node may be embedded arbitrarily deep in the tree, hence an XPath like .//p[text()="text 2"]/preceding-sibling::* would only extract <p>abc</p> and leave out the others.
You can try this XPath expression:
//p[.='text 2']/preceding::text()[ancestor::div[@id='a']]
The disadvantage of this approach is that the text() nodes may not be clearly separated, but rather merged for the sub-elements. To separate them, you'd need some kind of for-loop.
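
As a rough illustration of the loop-based separation, here is a Python sketch using only the standard library. ElementTree has no preceding axis, so the hypothetical helper texts_before emulates preceding::text()[ancestor::div[@id='a']] by walking the scope element in document order and stopping at the target. Unlike the XPath, it also skips whitespace-only text nodes, which is an assumption about the desired output.

```python
import xml.etree.ElementTree as ET

XML = """<body>
<div id="global-header">
header
</div>
<div id="a">
<h3>some title</h3>
<p>text 1
<b>bold</b>
</p>
<div>
<p>abc</p>
<p>text 2</p>
<p>def</p>
</div>
</div>
</body>"""


def texts_before(scope, target):
    """Collect non-blank text inside `scope` that precedes `target` in document order."""
    collected = []

    def walk(elem):
        if elem is target:
            return True  # stop as soon as the target is reached
        if elem.text and elem.text.strip():
            collected.append(elem.text.strip())
        for child in elem:
            if walk(child):
                return True
            if child.tail and child.tail.strip():
                collected.append(child.tail.strip())
        return False

    walk(scope)
    return collected


root = ET.fromstring(XML)
scope = root.find(".//div[@id='a']")
target = next(p for p in scope.iter("p") if p.text == "text 2")
result = texts_before(scope, target)
print(result)
```

Each text node ends up as its own list item, so the pieces stay separated.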

xpath: how to find a node that does not contain text?

I have HTML like:
...
<div class="grid">
"abc"
<span class="searchMatch">def</span>
</div>
<div class="grid">
<span class="searchMatch">def</span>
</div>
...
I want to get the div that does not contain text, but the XPath
//div[@class='grid' and text()='']
doesn't seem to work. And if I don't know the text that the other divs have, how can I find the node?
Let's suppose I have inferred the requirement correctly as:
Find all <div> elements with @class='grid' that have no directly-contained non-whitespace text content, i.e. no non-whitespace text content unless it's within a child element like a <span>.
Then the answer to this is
//div[@class='grid' and not(text()[normalize-space(.)])]
Alternatively, combine not() with normalize-space(). Note that in XPath 1.0, normalize-space(text()) only examines the first text node child:
//div[@class='grid' and not(normalize-space(text()))]
or
//div[@class='grid' and normalize-space(text())='']
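
For comparison, the not(text()[normalize-space(.)]) test can be approximated in Python with the standard library. This is a sketch under assumptions: the <root> wrapper and the helper name has_direct_text are made up for illustration, and because ElementTree has no text() node test, direct text is read from the element's .text and its children's .tail instead.

```python
import xml.etree.ElementTree as ET

# The <root> wrapper is hypothetical; it stands in for the elided surrounding HTML.
XML = """<root>
<div class="grid">
"abc"
<span class="searchMatch">def</span>
</div>
<div class="grid">
<span class="searchMatch">def</span>
</div>
</root>"""


def has_direct_text(elem):
    """True if elem has a non-whitespace text node as a direct child."""
    if elem.text and elem.text.strip():
        return True
    # Text that follows a child element is stored in that child's .tail.
    return any(child.tail and child.tail.strip() for child in elem)


root = ET.fromstring(XML)
grids = root.findall(".//div[@class='grid']")
empty_grids = [div for div in grids if not has_direct_text(div)]
print(len(empty_grids))  # prints 1: only the second div has no direct text
```

The text inside the <span> does not count, since it belongs to the child element rather than to the div itself.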

xpath: check if element is within other element

I have quite a large XML structure that in its simplest form looks kinda like this:
<document>
<body>
<section>
<p>Some text</p>
</section>
</body>
<backm>
<section>
<p>Some text</p>
<figure><title>This</title></figure>
</section>
</backm>
</document>
The section levels can be almost limitless (both within the body and backm elements), so I can have a section in a section in a section, etc., and the figure element can be within a numlist, an itemlist, a p, and many more elements.
What I want to do is check whether the title in a figure element is somewhere within the backm element. Is this possible?
A document could have multiple <backm> elements and it could have multiple <figure><title>Title</title></figure> elements in it. How you build your query depends on the situations you're trying to distinguish between.
//backm/descendant::figure/title
This returns the <title> elements that are children of a <figure> element and descendants of a <backm> element.
So:
count(//backm/descendant::figure/title) > 0
This returns true if there are one or more such title elements.
You can also express this using double negation:
not(//backm[not(descendant::figure/title)])
Note that this is true only when every <backm> contains such a title (and also when there is no <backm> at all), so it agrees with the count() test only if the document has exactly one <backm>. I'm under the impression that this should have better performance.
//title[parent::figure][ancestor::backm]
Lists all <title> elements with a <figure> parent and a <backm> ancestor.
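
The descendant-based query happens to fall within the XPath subset that Python's xml.etree.ElementTree understands, so it can be tried directly. A minimal sketch, assuming the sample document from the question; count() is not available there, so len() stands in for it.

```python
import xml.etree.ElementTree as ET

XML = """<document>
<body>
<section>
<p>Some text</p>
</section>
</body>
<backm>
<section>
<p>Some text</p>
<figure><title>This</title></figure>
</section>
</backm>
</document>"""

root = ET.fromstring(XML)

# Same idea as //backm/descendant::figure/title in ElementTree's path syntax.
titles = root.findall(".//backm//figure/title")

# Stand-in for count(...) > 0.
has_title_in_backm = len(titles) > 0
print(has_title_in_backm, [t.text for t in titles])
```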

xpath: count preceding elements

I have an xml structure that looks like this:
<document>
<body>
<section>
<title>something</title>
<subtitle>Something again</subtitle>
<section>
<p xml:id="1234">Some text</p>
<figure id="2121"></figure>
<section>
<p xml:id="somethingagain">Some text</p>
<figure id="939393"></figure>
<p xml:id="countelement"></p>
</section>
</section>
</section>
<section>
<title>something2</title>
<subtitle>Something again2</subtitle>
<section>
<p xml:id="12345678">Some text2</p>
<figure id="939394"></figure>
<p xml:id="countelement2"></p>
</section>
</section>
</body>
</document>
How can I count the figure elements I have before the <p xml:id="countelement"></p> element using XPath?
Edit:
I only want to count figure elements within the parent section; in the next section it should start from 0 again.
Assuming you're using an XPath 2.0 compatible engine, find the count elements and call fn:count() for each of them, using all preceding figure elements as input.
This will return the number of figures preceding each "countelement" on the same level (I guess this is what you actually want):
//p[@xml:id="countelement"]/count(preceding-sibling::figure)
This will return the number of figures preceding each "countelement" on the same level and on the level above:
//p[@xml:id="countelement"]/count(preceding-sibling::figure | parent::*/preceding-sibling::figure)
This will return the number of all figures preceding each "countelement" in document order:
//p[@xml:id="countelement"]/count(preceding::figure)
If you're bound to XPath 1.0, you won't be able to get multiple results. If @xml:id really is an ID (and thus unique), you will be able to use this query:
count(//p[@xml:id="countelement"]/preceding::figure)
If there are "countelements" which are not <p/> elements, replace p by *.
count(id("countelement")/preceding-sibling::figure)
Please note that the xml:id attributes of two different elements cannot have the same value, such as "countelement". If you want two different elements to have a same-named attribute with the same value "countelement", it must be some other attribute, perhaps "kind", that is not of DTD attribute type ID. In that case, in place of id("countelement") you would use *[@kind="countelement"].
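
The per-section count, count(preceding-sibling::figure), can be cross-checked with a short Python sketch. ElementTree does not implement the preceding-sibling axis or count(), so the hypothetical helper count_preceding_sibling_figures scans each parent's children instead; note that ElementTree exposes xml:id under its namespace-qualified name.

```python
import xml.etree.ElementTree as ET

# ElementTree stores xml:id under the predefined XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

XML = """<document>
<body>
<section>
<title>something</title>
<subtitle>Something again</subtitle>
<section>
<p xml:id="1234">Some text</p>
<figure id="2121"></figure>
<section>
<p xml:id="somethingagain">Some text</p>
<figure id="939393"></figure>
<p xml:id="countelement"></p>
</section>
</section>
</section>
<section>
<title>something2</title>
<subtitle>Something again2</subtitle>
<section>
<p xml:id="12345678">Some text2</p>
<figure id="939394"></figure>
<p xml:id="countelement2"></p>
</section>
</section>
</body>
</document>"""


def count_preceding_sibling_figures(root, target_id):
    """Count <figure> siblings that come before the element with the given xml:id."""
    for parent in root.iter():
        count = 0  # restarts for every parent, i.e. for every section
        for child in parent:
            if child.get(XML_ID) == target_id:
                return count
            if child.tag == "figure":
                count += 1
    return None  # no element carries that xml:id


root = ET.fromstring(XML)
print(count_preceding_sibling_figures(root, "countelement"))
print(count_preceding_sibling_figures(root, "countelement2"))
```

Because the counter is reset for each parent, figures in enclosing or earlier sections are not included, which matches the "start from 0 again" requirement in the question.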

xquery/xpath- how to get number of descendant nodes of a particular type

Take a look at the sample XML below--
<div id="main">
<div id="1">
Some random text
</div>
<div id="2">
Some random text
</div>
<div id="3">
Some random text
</div>
<p> Some more random text</p>
<div id="4">
Some random text
</div>
</div>
Now, how do I find out the number of divs within the main div using XQuery? And how do I do this in XPath?
You can use the following XPath:
count(div[@id="main"]/div)
The function count does the counting, the main div is selected by its id.
The XPath expressions below can be used both in XPath and XQuery. This is so, because XPath (2.0) is a proper subset of XQuery.
Use:
count(/*//div)
If "the main div" isn't the top element of the XML document, and this is the only div whose id attribute has the string value "main", use:
count((//div[@id='main'])[1]//div)
If it is guaranteed that the div children of the "main div" don't have div descendants, use:
count((//div[@id='main'])[1]/div)
Do note: the XPath pseudo-operator // can be very inefficient, so try to avoid it whenever the structure of the XML document is statically known and specific paths can be used.
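
The difference between counting child divs and counting descendant divs can be tried in Python with the standard library. A minimal sketch: the <body> wrapper is a hypothetical addition so that the main div is not the top element, and len() replaces count(), which ElementTree's XPath subset does not provide.

```python
import xml.etree.ElementTree as ET

# The <body> wrapper is hypothetical; the rest is the sample from the question.
XML = """<body>
<div id="main">
<div id="1">
Some random text
</div>
<div id="2">
Some random text
</div>
<div id="3">
Some random text
</div>
<p> Some more random text</p>
<div id="4">
Some random text
</div>
</div>
</body>"""

doc = ET.fromstring(XML)

# find() returns the first match, like (//div[@id='main'])[1].
main = doc.find(".//div[@id='main']")

child_divs = len(main.findall("div"))         # like count(.../div)
descendant_divs = len(main.findall(".//div"))  # like count(...//div)
print(child_divs, descendant_divs)
```

In this sample both counts agree because none of the child divs contains a nested div; with nested divs, the descendant count would be larger.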
