How to select sequential elements in XPath?

Suppose I have this XML:
<body>
<div id="1"></div>
<a id = "1"></a>
<a id = "2"></a>
<a id = "3"></a>
<div id="2"></div>
<a id = "4"></a>
<a id = "5"></a>
<a id = "6"></a>
</body>
Given the element //div[@id='1'], how do I select its <a> elements (ids 1 to 3) but exclude the <a> elements with id 4 or higher, since they appear after <div id='2'>?

This is one possible XPath:
//div[@id='1']/following-sibling::a[preceding-sibling::div[1][@id='1']]
This XPath selects each a after div[@id='1'] whose nearest preceding sibling div is div[@id='1']. The following simpler XPath may be enough:
//a[preceding-sibling::div[1][@id='1']]
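The same grouping logic can be sketched in Python with only the standard library. This is an illustration of the idea, not the XPath itself: xml.etree.ElementTree implements only a subset of XPath without the sibling axes, so the sketch walks the children of <body> in document order and remembers the nearest preceding div. The function name anchors_after_div is made up for this example.

```python
import xml.etree.ElementTree as ET

XML = """<body>
<div id="1"></div>
<a id="1"></a>
<a id="2"></a>
<a id="3"></a>
<div id="2"></div>
<a id="4"></a>
<a id="5"></a>
<a id="6"></a>
</body>"""


def anchors_after_div(root, div_id):
    """Return ids of <a> children whose nearest preceding <div> sibling has div_id."""
    current_div = None  # id of the most recent <div> seen so far
    result = []
    for child in root:  # children are visited in document order
        if child.tag == "div":
            current_div = child.get("id")
        elif child.tag == "a" and current_div == div_id:
            result.append(child.get("id"))
    return result


root = ET.fromstring(XML)
print(anchors_after_div(root, "1"))
```

With a library that supports full XPath 1.0, the expressions above could be used directly instead of this loop.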

Related

How to select this element with Scrapy XPATH?

Only requirement: it needs to refer to the thread-navigation class, because that page has many other pagination elements
<section id="thread-navigation" class="group">
<div class="float-left">
<div class="pagination talign-mleft">
<span class="pages">Pages (6):</span>
<span class="pagination_current">1</span>
2
3
4
5
6
Next » //<--- this one
</div>
</div>
</section>
I was trying something like this:
r.xpath('//*[@class="thread-navigation" and contains (., "Next")]').get()
But it always returns None
Thank you
The value thread-navigation is not in the @class attribute; it is in the @id attribute (the class is group). So try this XPath 1.0 expression:
r.xpath('//a[ancestor::*/@id="thread-navigation" and contains(text(), "Next")]/@href').get()
Its result is
I want this text?page=2
Alternatively, this XPath selects the href of every link inside the section:
'//section[@id="thread-navigation"]//a/@href'
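The id-versus-class point can be checked with a small standard-library sketch. The markup below is an assumption: the <a> tags and href values are reconstructed for illustration, since the snippet in the question does not show them, and the <html> wrapper is added so the section is a descendant of the parsed root. ElementTree supports only a subset of XPath, so the "Next" filter is done in Python rather than with contains().

```python
import xml.etree.ElementTree as ET

# Assumed markup: the <a> elements and their href values are hypothetical.
HTML = """<html>
<section id="thread-navigation" class="group">
<div class="float-left">
<div class="pagination talign-mleft">
<span class="pages">Pages (6):</span>
<span class="pagination_current">1</span>
<a href="?page=2">2</a>
<a href="?page=3">3</a>
<a href="?page=2">Next »</a>
</div>
</div>
</section>
</html>"""

root = ET.fromstring(HTML)

# Matching on class finds nothing, because thread-navigation is the id.
by_class = root.findall(".//*[@class='thread-navigation']")

# Matching on id finds the section; then keep links whose text contains "Next".
links = root.findall(".//section[@id='thread-navigation']//a")
next_hrefs = [a.get("href") for a in links if "Next" in (a.text or "")]

print(len(by_class), next_hrefs)
```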

Select element based on cousin value

Let's say I have this HTML (ignore the tag names):
<div>
<card>
<h2>1</h2>
</card>
<footer>
<p>text 1</p>
</footer>
</div>
<div>
<card>
<h2>2</h2>
</card>
<footer>
<p>text 2</p>
</footer>
</div>
<div>
<card>
<h2>3</h2>
</card>
<footer>
<p>text 2</p>
</footer>
</div>
and I want to select the p tag whose cousin h2 has the value 2 (that is, the p with text 2).
If I use the expression //h2[text()="2"]/../following::footer/p, I get 2 p tags.
How do I select only the p tag whose cousin h2 has the value 2?
EDIT: Robbie Averill's answer was the first to work, but you should check the other answers; they are very good too.
The following axis matches every footer that comes later in the document, not only the one in the same div. Instead, navigate from the matched h2 up to the div that contains the element you want, then target footer/p from there:
//h2[text()="2"]/../../footer/p
Try the XPath below to select the required element:
//card[h2="2"]/following-sibling::footer/p
This XPath,
//div[card/h2="2"]/footer/p
will select footer/p cousins of card/h2 elements with string values of 2.
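A minimal standard-library sketch of the same idea follows. It assumes a <root> wrapper element (not in the original) so the fragment parses as one document. ElementTree's XPath subset does not accept a path such as card/h2 inside a predicate, so the sketch uses the closely related form card[h2='2']/.. to step up to the enclosing div.

```python
import xml.etree.ElementTree as ET

# The <root> wrapper is an assumption, added so the fragment is well-formed.
HTML = """<root>
<div><card><h2>1</h2></card><footer><p>text 1</p></footer></div>
<div><card><h2>2</h2></card><footer><p>text 2</p></footer></div>
<div><card><h2>3</h2></card><footer><p>text 2</p></footer></div>
</root>"""

root = ET.fromstring(HTML)

# card[h2='2'] keeps cards that have an h2 child whose text is "2";
# ".." steps up to the enclosing div, and footer/p goes down to the cousin.
matches = root.findall(".//card[h2='2']/../footer/p")

print(len(matches), [p.text for p in matches])
```

Only the p inside the second div is returned, even though the third div's p has the same text.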

How to select the first occurrence in each element by XPath?

In the following html tags:
<div>
<div>
<h3>
<a href='http://Ali.org'></a>
</h3>
<div>
<p>
<a href='http://Mohammad.org'></a>
</p>
</div>
</div>
<div>
<h4>
<a href='http://Ali.org'></a>
</h4>
<p>
<a href='http://Mohammad.org'></a>
</p>
</div>
</div>
I want to select two 'a' tags 'http://Ali.org' & 'http://YaALi.org'. By the following, I can:
//div//a[not(parent::*[not(following-sibling::*)])]
But what about a simpler XPath?
By the following, all of 'a' tags will be selected since they are all the first child of their parents:
//div/div//a[1]
Or by the following, just the first 'a' tag will be selected:
(//div//a)[1]
I want to select 'a' tags that are the first in the 'a' tags of div elements...
// in the middle of a path is an abbreviation for descendant-or-self::node(), so if you do
//div/div//a[1]
this effectively means
//div/div/descendant-or-self::node()/a[1]
This picks, under every descendant node, the a that is the first a child of its parent. What you want is:
//div/div/descendant::a[1]
which will pick the first descendant a.
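The "first descendant a per div" idea can be sketched with the standard library as follows. It assumes only the two top-level inner divs are of interest, which corresponds to /div/div/descendant::a[1] anchored at the root. Note that //div/div also matches the nested div in the first block, so on this sample that expression would additionally return the 'http://Mohammad.org' link inside it.

```python
import xml.etree.ElementTree as ET

HTML = """<div>
<div>
<h3><a href='http://Ali.org'></a></h3>
<div><p><a href='http://Mohammad.org'></a></p></div>
</div>
<div>
<h4><a href='http://Ali.org'></a></h4>
<p><a href='http://Mohammad.org'></a></p>
</div>
</div>"""

root = ET.fromstring(HTML)

first_links = []
for inner_div in root.findall("./div"):  # the two direct child divs of the root
    # iter("a") yields descendant <a> elements in document order;
    # taking the first one mirrors descendant::a[1].
    first_a = next(inner_div.iter("a"), None)
    if first_a is not None:
        first_links.append(first_a.get("href"))

print(first_links)
```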

JSoup select numbers

<div class="sResMain">
<b>
dogukan1905
</b>
<img src="http://eu.ipstatic.net/images/male.gif" width="11" height="11" class="sResSex">
20
<br>
<div class="sResMainTxt">
<div class="sResTxtField">I study at aircraft technology...</div></div></div>
I want to select the number (20) between the img and br tags, but I couldn't.
From what you posted, the text that you are trying to parse belongs to <div class="sResMain">. Moreover this is the only text that <div class="sResMain"> has. There is a method in Jsoup that will return the text that belongs (immediate textnode child) to a node. Try ownText() of Element.
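import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
// htmlStr is a String holding the HTML shown above
Document doc = Jsoup.parse(htmlStr);
Elements elements = doc.select(".sResMain");
for (Element e : elements) {
    // ownText() returns only the text nodes that are direct children of e
    String text = e.ownText();
    System.out.println(text);
}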

Check for preceding nodes starting from a specific point in xml

I'm trying to create an xpath to find an element which doesn't have any 'p', 'li', or 'span' preceding elements under a common parent. For example I have this structure:
<a>
<div>
<div/>
<div>
<div>
<div>
<p/>
</div>
<img/>
</div>
<div>
<ci/>
</div>
</div>
</div>
</a>
The node I'm interested in is the <img> element. So far I have this xpath:
count(/a/div[1]/div[position() = last()]//img[(count(preceding::*[name() = 'p' or name() = 'li' or name() = 'span']) = 0)]) > 0
I don't care whether any of the unwanted elements are under /a/div[1]/div[1]; I only care about those under /a/div[1]/div[2]. That said, preceding won't work because it also looks under /a/div[1]/div[1], which I don't care about. The 'p' element in the above example can be nested in any number of divs.
EDIT:
I added the div containing the element <ci/>.
I was able to get this to work using the following:
count(/a/div[1]/div[position() = last()]//img[(count(preceding::*[(name() = 'p' or name() = 'li' or name() = 'span') and ancestor::div[parent::div[parent::a] and descendant::ci]]) = 0)]) > 0
