XPath expression to select self, preceding and following nodes

I'd like to select the following HTML in a document, based on the content of TARGET, i.e. if TARGET matches, select everything. However, I'm not sure where to go after id('page')/x:div/span/a='TARGET'. How do I use parent, child, and sibling expressions to get the containing div, the a preceding that div, and the two br tags following the div?
<a></a>
<div>
<br />
<span>
<a>TARGET</a>
<a></a>
<span>
<span>
<a></a>
</span>
<a></a>
<span></span>
</span>
<span>
<a></a>
</span>
</span>
</div>
<br />
<br />

Use a single XPath like:
"//*[
(self::a and following-sibling::*[1][self::div and span/a='TARGET']) or
(self::div and span/a='TARGET') or
(self::br and preceding-sibling::*[1][self::div and span/a='TARGET']) or
(self::br and preceding-sibling::*[2][self::div and span/a='TARGET'])
]"
Do note that your document is not well formed due to unclosed br tags. Moreover, I didn't include any namespace, which you can add if necessary.

You should probably first find all matching divs (I'm not sure exactly which conditions have to be met):
//div[span[a[text()="TARGET"]]][preceding-sibling::*[1][name()="a"]][following-sibling::*[1][name()="br"]]
After that, select the related elements relative to each div:
./preceding-sibling::a[1]
./following-sibling::br[1]
./following-sibling::br[2]
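
The two-step idea (find the div, then collect its neighbours) can be sketched in Python. This is only an illustration: the standard library's xml.etree.ElementTree has no sibling axes, so the second step is emulated with a parent map. The <root> wrapper and the select_groups helper are assumptions made for this sketch, not part of the question.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's markup. The <root> wrapper is an assumption,
# added only so the fragment parses as a single well-formed document.
DOC = """<root>
<a></a>
<div>
<br />
<span>
<a>TARGET</a>
<a></a>
</span>
</div>
<br />
<br />
</root>"""


def select_groups(root, target):
    """Return [a, div, br, br] for every div that has a span/a equal to target."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    groups = []
    for div in root.iter("div"):
        # Step 1: the div must have a span/a whose string value is the target.
        if not any("".join(a.itertext()) == target for a in div.findall("span/a")):
            continue
        parent = parent_of.get(div)
        if parent is None:
            continue  # the document element has no siblings
        siblings = list(parent)
        i = siblings.index(div)
        # Step 2: nearest preceding <a>, then the first two following <br>.
        before = [s for s in siblings[:i] if s.tag == "a"][-1:]
        after = [s for s in siblings[i + 1:] if s.tag == "br"][:2]
        groups.append(before + [div] + after)
    return groups


groups = select_groups(ET.fromstring(DOC), "TARGET")
print([[el.tag for el in group] for group in groups])
```

With an engine that implements full XPath 1.0, the expressions from the answers can be used directly and the parent map is unnecessary.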

Related

How to select the first occurrence in each element by XPath?

In the following html tags:
<div>
<div>
<h3>
<a href='http://Ali.org'></a>
</h3>
<div>
<p>
<a href='http://Mohammad.org'></a>
</p>
</div>
</div>
<div>
<h4>
<a href='http://Ali.org'></a>
</h4>
<p>
<a href='http://Mohammad.org'></a>
</p>
</div>
</div>
I want to select the two 'a' tags 'http://Ali.org' & 'http://YaALi.org'. I can do that with the following:
//div//a[not(parent::*[not(following-sibling::*)])]
But is there a simpler XPath?
The following selects all of the 'a' tags, since each of them is the first 'a' child of its parent:
//div/div//a[1]
And the following selects just the first 'a' tag in the whole document:
(//div//a)[1]
I want to select 'a' tags that are the first in the 'a' tags of div elements...
// in the middle of a path is an abbreviation for /descendant-or-self::node()/, so if you write
//div/div//a[1]
this effectively means
//div/div/descendant-or-self::node()/a[1]
This picks the first a child of every descendant node. What you want is:
//div/div/descendant::a[1]
which will pick the first descendant a.
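
The difference between the two forms can be checked with a small Python sketch. It uses the standard library's xml.etree.ElementTree, which only covers a subset of XPath, so descendant::a[1] is emulated by taking the first item of iter("a"). The paths are anchored at the outermost div, which is an assumption of this sketch: with //div/div the nested div would also be matched as a starting point.

```python
import xml.etree.ElementTree as ET

# The question's markup, condensed onto fewer lines.
DOC = """<div>
<div>
<h3><a href='http://Ali.org'></a></h3>
<div><p><a href='http://Mohammad.org'></a></p></div>
</div>
<div>
<h4><a href='http://Ali.org'></a></h4>
<p><a href='http://Mohammad.org'></a></p>
</div>
</div>"""

outer = ET.fromstring(DOC)

# Like /div/div//a[1]: every a that is the first a child of its own parent.
first_child_links = [a.get("href") for a in outer.findall("div//a[1]")]

# Like /div/div/descendant::a[1]: the first a descendant of each inner div.
first_descendants = [next(div.iter("a"), None) for div in outer.findall("div")]
first_descendant_links = [a.get("href") for a in first_descendants if a is not None]

print(first_child_links)
print(first_descendant_links)
```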

How to use XPath extract text without Html tag?

<div id="info" class="">
<span>
<span class="pl"> author</span>:
<a class="" href="/search/author">Peter</a>
</span><br/>
<span class="pl">publisher:</span> god cor<br/>
<span class="pl">year:</span> 2011-6<br/>
<span class="pl">page:</span> 360<br/>
<span class="pl">price:</span> 39.50<br/>
From the above HTML, I want to extract those numbers with XPath. How can I do that?
Thanks.
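Each number is a text node that directly follows its label span, so the XPath for each number is (in order as shown above):
//*[@id="info"]/span[starts-with(., "year")]/following-sibling::text()[1] --> 2011-6
//*[@id="info"]/span[starts-with(., "page")]/following-sibling::text()[1] --> 360
//*[@id="info"]/span[starts-with(., "price")]/following-sibling::text()[1] --> 39.50
The selected text nodes include the leading space, so wrap the expression in normalize-space() if you need the bare value. You can find the XPath for any tag by opening the HTML file in Chrome, right-clicking the page and choosing "Inspect". When you find the tag you want, right-click it and choose Copy -> Copy XPath.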
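
As a rough check of the idea that each value is the text right after its label span, here is a Python sketch using the standard library's xml.etree.ElementTree, where that text is exposed as the span's tail. The closing </div>, the repaired <a> tag and the labelled_values helper are assumptions made so the fragment parses and the sketch is self-contained.

```python
import xml.etree.ElementTree as ET

# The question's markup with the <a> tag repaired and a closing </div> added
# (both assumptions, needed for a well-formed document).
DOC = """<div id="info" class="">
<span>
<span class="pl"> author</span>:
<a class="" href="/search/author">Peter</a>
</span><br/>
<span class="pl">publisher:</span> god cor<br/>
<span class="pl">year:</span> 2011-6<br/>
<span class="pl">page:</span> 360<br/>
<span class="pl">price:</span> 39.50<br/>
</div>"""


def labelled_values(info_div):
    """Map each direct label span (class "pl") to the text that follows it."""
    values = {}
    for span in info_div.findall("span[@class='pl']"):
        label = (span.text or "").strip().rstrip(":")
        values[label] = (span.tail or "").strip()
    return values


print(labelled_values(ET.fromstring(DOC)))
```

The author label is skipped here because its span is nested one level deeper and its value lives in an <a> element rather than in a bare text node.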

Xpath: locate a node by multiple attributes of a parent node

Here is the code:
<li class="abc">
<div class="abc">
<input type="checkbox">
</div>
<div class="xyz">
<div class="headline">Mongo like candy</div>
</div>
</li>
<li class="abc">
<div class="abc">
<input type="checkbox">
</div>
<div class="xyz">
<div class="headline">Candygram for mongo</div>
</div>
</li>
XPath challenge: I want to locate the checkbox of the li which contains the headline "Mongo like candy" so I can select it using Selenium. In other words, how do you locate the checkbox from here:
li//div[@class='abc']//input[@type='checkbox']
but qualifying it with a different attribute within the same parent node:
li//div[@headline][contains(text(),"Mongo like candy")]
The basic idea is to qualify the final path with a predicate, i.e.
li[/*predicate here*/]//div[@class='abc']//input[@type='checkbox']
The predicate expresses the condition on the li that you want:
.//div[@class='headline' and contains(text(), "Mongo like candy")]
Putting them together yields:
li[.//div[@class='headline' and contains(text(), "Mongo like candy")]]//div[@class='abc']//input[@type='checkbox']
Something like
li[div[@class='xyz']//div[@class='headline' and contains(text(),"Mongo like candy")]]//input[@type='checkbox']
should work, unless I messed up the parentheses. That is, you select not just any li, but the proper li.
Even this works:
//li[1]/div[1]/input[@type='checkbox']
It may fail if more div tags are introduced in the page.
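
The "qualify the li with a predicate" idea can be sketched in Python without a browser. The standard library's xml.etree.ElementTree does not support nested path predicates or contains(), so the condition on the li is written as a Python test instead. The <ul> wrapper, the id attributes and the checkbox_for_headline helper are hypothetical, added only to parse the fragment and to tell the two checkboxes apart.

```python
import xml.etree.ElementTree as ET

# The question's list made well-formed: inputs are self-closed and the stray
# <div> is treated as </div>. The <ul> wrapper and the id attributes are
# hypothetical additions for this sketch.
DOC = """<ul>
<li class="abc">
<div class="abc"><input type="checkbox" id="first"/></div>
<div class="xyz"><div class="headline">Mongo like candy</div></div>
</li>
<li class="abc">
<div class="abc"><input type="checkbox" id="second"/></div>
<div class="xyz"><div class="headline">Candygram for mongo</div></div>
</li>
</ul>"""


def checkbox_for_headline(root, headline):
    """Return the checkbox of the first li whose headline div contains headline."""
    for li in root.iter("li"):
        # The "predicate": some headline div under this li contains the text.
        headlines = li.findall(".//div[@class='headline']")
        if any(headline in (div.text or "") for div in headlines):
            # The "path": the checkbox inside this li's div.abc.
            return li.find(".//div[@class='abc']//input[@type='checkbox']")
    return None


root = ET.fromstring(DOC)
print(checkbox_for_headline(root, "Mongo like candy").get("id"))
```

In Selenium itself the combined XPath from the answer would be passed to the element lookup as a single string, so no Python-side filtering is needed there.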

Xpath: robust path for a locator of an element with 1 sibling and one...cousin?

This is the code:
<li>
<a>
<h1>Quorn Stukjes</h1>
<p class="price">
</a>
<form>
<button type="submit">+</button>
</form>
</li>
I want to create a locator that finds the first <h1> that has a sibling <p> element with the class "price". Easy so far. But now I also want that <h1> to share its grandparent with a <button> element whose type attribute is "submit".
What I created was the following:
//a/p[@class="price"]/preceding-sibling::p/preceding-sibling::h1
I'm wondering if this is the most sensible solution (it does work), or if there is something more elegant and robust.
(//*[form/button[@type = 'submit']]/*[p[@class = 'price']]/h1)[1] should do (assuming a submit button only makes sense in a form parent element).
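
To make the three conditions in that expression explicit, here is a Python sketch that checks them one by one with the standard library's xml.etree.ElementTree. Nested predicates are not available there, so the grandparent and parent are looked up through a parent map. The <ul> wrapper, the first <li> (which has no price) and the first_matching_h1 helper are hypothetical, added only to give the sketch something to reject.

```python
import xml.etree.ElementTree as ET

# Second <li> follows the question (with the <p> closed); the first <li> and
# the <ul> wrapper are hypothetical.
DOC = """<ul>
<li>
<a><h1>No price here</h1></a>
<form><button type="submit">+</button></form>
</li>
<li>
<a><h1>Quorn Stukjes</h1><p class="price"></p></a>
<form><button type="submit">+</button></form>
</li>
</ul>"""


def first_matching_h1(root):
    """First h1 (document order) with a p.price sibling and a submit-button cousin."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for h1 in root.iter("h1"):
        parent = parent_of.get(h1)
        grandparent = parent_of.get(parent) if parent is not None else None
        if grandparent is None:
            continue
        # Sibling condition: the h1's parent also has a <p class="price"> child.
        if parent.find("p[@class='price']") is None:
            continue
        # Cousin condition: the grandparent has form/button[@type='submit'].
        if grandparent.find("form/button[@type='submit']") is None:
            continue
        return h1
    return None


match = first_matching_h1(ET.fromstring(DOC))
print(match.text)
```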

xPath strange behaviour - selecting ALL elements even if [1] set

Today I stumbled upon a very interesting case (at least for me). I am messing around with Selenium and XPath and tried to get some elements, but got strange behaviour:
<div class="resultcontainer">
<div class="info">
<div class="title">
<a>
some text
</a>
</div>
</div>
</div>
<div class="resultcontainer">
<div class="info">
<div class="title">
<a>
some other text
</a>
</div>
</div>
</div>
<div class="resultcontainer">
<div class="info">
<div class="title">
<a>
some even unrelated text
</a>
</div>
</div>
</div>
This is my data.
When I run the following XPath query:
//div[@class="title"][1]/a
I get ALL of them as a result instead of only the first one. But if I query:
//div[@class="resultcontainer"][1]/div[@class="info"]/div[@class="title"]/a
I get only the first one, not all.
Is there some divine reason behind that?
Best regards,
bisko
I think you want
(//div[@class="title"])[1]/a
This:
//div[@class="title"][1]/a
selects all (<a> elements that are children of) <div> elements that have a @class of 'title' and that are the first such <div> among their parent's children (in this context). Every title <div> here is the only one inside its parent, which means: it selects all of them.
The working XPath selects all <div> elements that have a @class of 'title', and of those it takes the first one.
The predicates (the expressions in square brackets []) are applied to each element that matched the preceding location step (i.e. "//div") individually. To apply a predicate to a filtered set of nodes, you need to make the grouping clear with parentheses.
Consequently, this:
//div[1][@class="title"]/a
would select every <div> that is the first <div> child of its parent, and then filter those down further by checking the @class value. Also not what you want. ;-)
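
The per-step versus grouped behaviour can be reproduced with a short Python sketch. It relies on the standard library's xml.etree.ElementTree, which implements only part of XPath: it has no parenthesised grouping, so (…)[1] is emulated by slicing the Python result list, and its positional predicate counts among same-tag siblings, which gives the same answer as XPath for this particular data. The <body> wrapper is an assumption made to obtain a single document element.

```python
import xml.etree.ElementTree as ET

# The question's data, condensed; the <body> wrapper is an assumption.
DOC = """<body>
<div class="resultcontainer"><div class="info"><div class="title">
<a>
some text
</a>
</div></div></div>
<div class="resultcontainer"><div class="info"><div class="title">
<a>
some other text
</a>
</div></div></div>
<div class="resultcontainer"><div class="info"><div class="title">
<a>
some even unrelated text
</a>
</div></div></div>
</body>"""

root = ET.fromstring(DOC)

# [1] is evaluated per parent: every title div is first inside its own parent.
per_step = root.findall(".//div[@class='title'][1]/a")

# Emulates (//div[@class="title"])[1]/a: collect all title divs, keep the first.
grouped = [a for div in root.findall(".//div[@class='title']")[:1] for a in div.findall("a")]

# [1] on the outermost step: only the first resultcontainer under <body> matches.
first_container = root.findall(
    "div[@class='resultcontainer'][1]/div[@class='info']/div[@class='title']/a"
)

print(len(per_step), [a.text.strip() for a in grouped], len(first_container))
```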
