Select nodes that 1) precede a given node but 2) are also descendants of another given node - xpath

Say I have the following XML:
<body>
<div id="global-header">
header
</div>
<div class="a">
<h3>some title</h3>
<p>text 1</p>
<p>text 2</p>
<p>text 3</p>
</div>
</body>
I want to
find any <p> node whose value is "text 2", and then
find all the nodes that precede this particular <p> but are also descendants of the <div class='a'> node.
The desired output should look like:
<h3>some title</h3>
<p>text 1</p>
The caveat is that the preceding nodes may be of arbitrary types, not only <h3> and <p> as in the example above.
My first try:
.//p[text()="text 2"]/preceding::*
Unfortunately, this will also select <div id="global-header">, which is not desired.

You need to use preceding-sibling instead of preceding, so that only nodes that are children of the same parent are selected:
.//p[text()="text 2"]/preceding-sibling::*
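Python's standard-library ElementTree implements only a small XPath subset (no preceding-sibling:: or preceding:: axes), so the sketch below emulates both selections by hand, assuming Python 3.7+ for the [.='text'] predicate. preceding_siblings mirrors preceding-sibling::*, and preceding_within mirrors the question's literal requirement (preceding::* restricted to descendants of <div class='a'>). Both helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

XML = """<body>
<div id="global-header">
header
</div>
<div class="a">
<h3>some title</h3>
<p>text 1</p>
<p>text 2</p>
<p>text 3</p>
</div>
</body>"""

root = ET.fromstring(XML)
# ElementTree elements have no parent pointer, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}


def preceding_siblings(elem):
    """Element siblings before elem, in document order (like preceding-sibling::*)."""
    siblings = list(parent_of[elem])
    return siblings[:siblings.index(elem)]


def preceding_within(container, elem):
    """Descendants of container that precede elem in document order, excluding
    elem's ancestors (like preceding::* limited to container's descendants)."""
    ancestors = set()
    node = elem
    while node in parent_of:
        node = parent_of[node]
        ancestors.add(node)
    result = []
    for node in container.iter():  # iter() walks the subtree in document order
        if node is elem:
            break
        if node is not container and node not in ancestors:
            result.append(node)
    return result


# [.='text 2'] is ElementTree's closest stand-in for [text()="text 2"].
target = root.find(".//p[.='text 2']")
container = root.find(".//div[@class='a']")

siblings = [(e.tag, e.text) for e in preceding_siblings(target)]
within = [(e.tag, e.text) for e in preceding_within(container, target)]
print(siblings)  # [('h3', 'some title'), ('p', 'text 1')]
print(within)    # [('h3', 'some title'), ('p', 'text 1')]
```

Both give the same answer here because the wanted nodes are siblings of the target; preceding_within would also pick up nested elements (for example, inline markup inside the <h3>) that preceding-sibling::* skips.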

Related

Extract all text from node and child nodes excluding specific nodes

I have the following HTML:
<div class="flex flex-wrap">
<span>Final result </span>
<strong>2:0</strong> (7:6 <div>
<sup>5</sup>
</div>, 6:1)
</div>
I can extract all the text using:
string(//div[@class='flex flex-wrap'])
However, I don't want to extract the superscript text in the <sup> tag. How would I do this?
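string() always returns the full string value of the node, <sup> text included, so it cannot exclude anything on its own. One common XPath 1.0 approach is to select only the text nodes that are not inside a <sup>, for example //div[@class='flex flex-wrap']//text()[not(ancestor::sup)], and concatenate them in the host language. ElementTree cannot evaluate that expression, so this stdlib-only Python sketch emulates it. It assumes the snippet parses as well-formed XML (real-world HTML usually needs an HTML parser), and text_excluding is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

HTML = """<div class="flex flex-wrap">
<span>Final result </span>
<strong>2:0</strong> (7:6 <div>
<sup>5</sup>
</div>, 6:1)
</div>"""


def text_excluding(elem, skip_tag):
    """Yield elem's text in document order, skipping the content of any
    descendant named skip_tag. A skipped child's tail is still kept, because
    the tail text lies outside that child."""
    if elem.text:
        yield elem.text
    for child in elem:
        if child.tag != skip_tag:
            yield from text_excluding(child, skip_tag)
        if child.tail:
            yield child.tail


root = ET.fromstring(HTML)
raw = "".join(text_excluding(root, "sup"))
result = " ".join(raw.split())  # collapse whitespace, roughly like normalize-space()
print(result)  # Final result 2:0 (7:6 , 6:1)
```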

XPATH: Select a node whose children do not contain some text

I'm trying to select a node whose children do not contain some specific text.
For example:
<div class="b-margin">
<div class="tag">Pt</div>
<div class="tag">En</div>
</div>
<div class="b-margin">
<div class="tag">Ru</div>
<div class="tag">En</div>
</div>
How would I go about selecting the 'div class="b-margin"' nodes that do not have children with the text "Pt"?
Here is a simple XPath:
//div[@class='b-margin' and not(div[.='Pt'])]
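ElementTree's XPath subset has no not(), so a stdlib-only Python sketch has to apply the negation itself: select the b-margin divs with a path, then keep those for which the positive predicate div[.='Pt'] finds nothing. The wrapping <root> element is a hypothetical addition so the two sibling <div>s parse as one document.

```python
import xml.etree.ElementTree as ET

# <root> is added only so the two top-level <div>s form a single XML document.
XML = """<root>
<div class="b-margin">
<div class="tag">Pt</div>
<div class="tag">En</div>
</div>
<div class="b-margin">
<div class="tag">Ru</div>
<div class="tag">En</div>
</div>
</root>"""

root = ET.fromstring(XML)
# Equivalent of //div[@class='b-margin' and not(div[.='Pt'])]:
# keep b-margin divs that have no child div whose text is 'Pt'.
matches = [
    d for d in root.iterfind(".//div[@class='b-margin']")
    if d.find("div[.='Pt']") is None
]
print([[c.text for c in d] for d in matches])  # [['Ru', 'En']]
```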

Select element based on cousin value

Let's say I have this HTML (ignore the tag names):
<div>
<card>
<h2>1</h2>
</card>
<footer>
<p>text 1</p>
</footer>
</div>
<div>
<card>
<h2>2</h2>
</card>
<footer>
<p>text 2</p>
</footer>
</div>
<div>
<card>
<h2>3</h2>
</card>
<footer>
<p>text 2</p>
</footer>
</div>
and I want to select the p tag whose cousin h2 has the value 2 (here, a p with the text "text 2").
If I use the expression //h2[text()="2"]/../following::footer/p, I get 2 p tags.
How do I select only the p tag whose cousin h2 has the value 2?
EDIT: Robbie Averill's answer was the first to work, but you should check the other answers too; they are very good.
You can navigate from the h2 matched up to the div that contains the element you want, then target footer/p elements from there:
//h2[text()="2"]/../../footer/p
Try the following XPath to select the required element:
//card[h2="2"]/following-sibling::footer/p
This XPath,
//div[card/h2="2"]/footer/p
will select footer/p cousins of card/h2 elements with string values of 2.
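ElementTree has no following-sibling:: axis, but it does support .. and [.='text'] predicates, so a stdlib-only Python sketch can mirror the first answer's approach: climb from the h2 to its card and then to the enclosing div, and descend to footer/p from there. The wrapping <root> element is a hypothetical addition so the three <div>s parse as one document; whitespace inside them has been compacted, which does not change the result.

```python
import xml.etree.ElementTree as ET

# <root> is added only so the three top-level <div>s form a single XML document.
XML = """<root>
<div><card><h2>1</h2></card><footer><p>text 1</p></footer></div>
<div><card><h2>2</h2></card><footer><p>text 2</p></footer></div>
<div><card><h2>3</h2></card><footer><p>text 2</p></footer></div>
</root>"""

root = ET.fromstring(XML)
# Same shape as //h2[text()="2"]/../../footer/p:
# h2 -> .. (card) -> .. (div) -> footer/p
cousins = root.findall(".//h2[.='2']/../../footer/p")
print([p.text for p in cousins])  # ['text 2']
# The match is the <p> from the second <div>, not the lookalike in the third.
print(cousins[0] is root[1].find("footer/p"))  # True
```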

How to select the first occurrence in each element by XPath?

Given the following HTML:
<div>
<div>
<h3>
<a href='http://Ali.org'></a>
</h3>
<div>
<p>
<a href='http://Mohammad.org'></a>
</p>
</div>
</div>
<div>
<h4>
<a href='http://Ali.org'></a>
</h4>
<p>
<a href='http://Mohammad.org'></a>
</p>
</div>
</div>
I want to select two 'a' tags, 'http://Ali.org' and 'http://YaALi.org'. With the following expression, I can:
//div//a[not(parent::*[not(following-sibling::*)])]
But what about a simpler XPath?
With the following, all of the 'a' tags are selected, since each one is the first 'a' child of its parent:
//div/div//a[1]
And with the following, only the very first 'a' tag in the document is selected:
(//div//a)[1]
I want to select the 'a' tags that come first among the 'a' tags of each div element...
In the middle of a path, // is an abbreviation for /descendant-or-self::node()/, so if you write
//div/div//a[1]
this effectively means
//div/div/descendant-or-self::node()/a[1]
For every descendant node, this picks that node's first a child. What you want is:
//div/div/descendant::a[1]
which will pick the first descendant a.
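The difference between a[1] after // and descendant::a[1] can be reproduced with Python's stdlib ElementTree. Its positional predicate [1] is evaluated per parent, like XPath's child::a[1], while Element.find('.//a') returns the first matching descendant in document order, which is what descendant::a[1] selects. This is a sketch; the inner tags have been put on single lines, which does not affect the result.

```python
import xml.etree.ElementTree as ET

XML = """<div>
<div>
<h3><a href='http://Ali.org'></a></h3>
<div><p><a href='http://Mohammad.org'></a></p></div>
</div>
<div>
<h4><a href='http://Ali.org'></a></h4>
<p><a href='http://Mohammad.org'></a></p>
</div>
</div>"""

root = ET.fromstring(XML)
inner_divs = root.findall("div")  # the outer div's direct div children

# Like //div/div//a[1]: [1] applies per parent, so every <a> that is the
# first <a> child of its own parent matches (all four here).
per_parent = [a.get("href") for d in inner_divs for a in d.findall(".//a[1]")]

# Like //div/div/descendant::a[1]: find() returns the first matching
# descendant in document order, i.e. one <a> per inner div.
per_div = [d.find(".//a").get("href") for d in inner_divs]

print(per_parent)  # ['http://Ali.org', 'http://Mohammad.org', 'http://Ali.org', 'http://Mohammad.org']
print(per_div)     # ['http://Ali.org', 'http://Ali.org']
```

One caveat worth checking against the real page: in the HTML as posted, //div/div also matches the <div> that wraps the first <p>, so //div/div/descendant::a[1] would return that <p>'s link as well. The sketch sidesteps this by starting from the outer <div>'s direct <div> children (like /div/div).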

Make use of XPath Axes to extract sibling elements' text

Given the following HTML, how can I get a list of (TIME, COMMENT, OOXX) tuples with XPath? I think I need to use XPath axes, but I'm not sure how. Furthermore, the OOXX text does not seem to belong to any tag!
<div class="contents">
<p></p>
<div class="meta">TIME</div>OOXX
<div class="comment">COMMENT</div>
<p></p>
<div class="meta">TIME</div>OOXX
<div class="comment">COMMENT</div>
<p></p>
<div class="meta">TIME</div>OOXX
<div class="comment">COMMENT</div>
<p></p>
<div class="meta">TIME</div>OOXX
<div class="comment">COMMENT</div>
<p></p>
</div>
How you deal with multiple such tuples in the input XML will depend on your requirements and on the facilities of the environment in which the XPath is evaluated.
However, here's how to get the first TIME:
/div/div[@class="meta"][1]/text()
Here's how to get the first COMMENT:
/div/div[@class="comment"][1]/text()
And here's how to get the first OOXX:
/div/div[@class="meta"][1]/following-sibling::text()[1]
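A single XPath 1.0 expression returns a node-set, not tuples, so one way to build the whole list is to iterate over the meta divs in the host language and take the related nodes relative to each one. In Python's stdlib ElementTree, the stray OOXX text after </div> is stored as that element's .tail, which corresponds to following-sibling::text()[1]. The sketch assumes each meta div is immediately followed by its comment div, as in the sample.

```python
import xml.etree.ElementTree as ET

XML = """<div class="contents">
<p></p>
<div class="meta">TIME</div>OOXX
<div class="comment">COMMENT</div>
<p></p>
<div class="meta">TIME</div>OOXX
<div class="comment">COMMENT</div>
<p></p>
<div class="meta">TIME</div>OOXX
<div class="comment">COMMENT</div>
<p></p>
<div class="meta">TIME</div>OOXX
<div class="comment">COMMENT</div>
<p></p>
</div>"""

root = ET.fromstring(XML)
children = list(root)
tuples = []
for i, el in enumerate(children):
    if el.get("class") != "meta":
        continue
    # The text right after </div> ("OOXX\n") is el.tail in ElementTree,
    # the node that following-sibling::text()[1] selects in XPath.
    ooxx = (el.tail or "").strip()
    # Next element sibling, like following-sibling::*[1]; assumed to be the comment.
    nxt = children[i + 1] if i + 1 < len(children) else None
    comment = nxt.text if nxt is not None and nxt.get("class") == "comment" else None
    tuples.append((el.text, comment, ooxx))

print(tuples)  # four ('TIME', 'COMMENT', 'OOXX') tuples
```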
