Specify only one leaf node with XPath from a given result

Given:
<a>
<div><span></span></div>
<div></div>
<div><span></span></div>
<div><span></span></div>
</a>
I want to select only the n-th span, say, only the 2nd span.
Hard-coding the div positions does NOT count as a solution:
//a/div[1]/span selects the first span, //a/div[3]/span selects the second span.
That is not what I'm after.
Imagine the structure of the HTML is dynamic.
I tried, for example,
//a//span[2] but this doesn't work.
What I want is to first get all the leaf spans with //a/div/span, then select, for example, the second one from that result.

Will this
(//span)[2]
or this
(//a//span)[2]
work for you?
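A minimal sketch of why the parentheses matter, assuming Python with the third-party lxml library. The id attributes (s1, s2, s3) are hypothetical and were added only so the selected nodes can be told apart.

```python
from lxml import etree

# Sample document from the question. The ids are hypothetical labels,
# added only to identify which span each expression selects.
doc = etree.fromstring(
    "<a>"
    "<div><span id='s1'/></div>"
    "<div/>"
    "<div><span id='s2'/></div>"
    "<div><span id='s3'/></div>"
    "</a>"
)

# Without parentheses, [2] is evaluated per parent: "a span that is the
# second span child of its parent". No div here has two spans.
per_parent = doc.xpath("//a//span[2]")

# With parentheses, all spans are collected first, then the second node
# of that combined result is taken.
second = doc.xpath("(//a//span)[2]")

print(per_parent)                      # []
print([e.get("id") for e in second])   # ['s2']
```

Because the index is applied to the whole result, the expression keeps working when the number of empty div elements changes.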

Related

XPath: Search for HTML element within HTML element

With XPath, how would you search for elements that only contain another specific element? For example, what expression would result in getting all <p> tags that contain <strong> elements within them?
<p>This is some text that <strong>contains another HTML element</strong></p>
In XPath you use square brackets to filter; the expression inside them is called a predicate.
To select all p elements that have a strong child element, use
//p[strong]
If you want to find all p elements whose only child elements are strong elements, add a second predicate:
//p[strong][count(*)=count(strong)]
The * stands for any element.
If, as in your example, you are only interested in p elements where the strong element is the last child node, use
//p[strong[not(following-sibling::node())]]
Predicates are the way to go.
what expression would result in getting all <p> tags that contain <strong> elements within them?
If you only want to select p elements with direct strong children, you can use p[strong]; if you're looking for strong elements at any depth, use p[descendant::strong]. In both cases the context node has to be at the level of the p elements.
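The predicates from both answers can be compared side by side. This is a sketch assuming Python with lxml; the sample paragraphs, their ids (p1 to p5), and the helper name ids are made up for illustration.

```python
from lxml import etree

# Hypothetical sample: five paragraphs with different child structures.
doc = etree.fromstring(
    "<body>"
    "<p id='p1'>This is some text that <strong>contains another HTML element</strong></p>"
    "<p id='p2'>Plain text only</p>"
    "<p id='p3'><strong>bold</strong><em>and italic</em></p>"
    "<p id='p4'><em><strong>nested</strong></em></p>"
    "<p id='p5'><strong>bold</strong> then trailing text</p>"
    "</body>"
)

def ids(expr):
    """Return the ids of the p elements selected by an XPath expression."""
    return [p.get("id") for p in doc.xpath(expr)]

# p elements with a direct strong child.
direct = ids("//p[strong]")
# p elements whose child elements are all strong elements.
only_strong = ids("//p[strong][count(*)=count(strong)]")
# p elements where a strong child is the last child node (no text after it).
strong_last = ids("//p[strong[not(following-sibling::node())]]")
# p elements with a strong element at any depth.
any_depth = ids("//p[descendant::strong]")

print(direct)       # ['p1', 'p3', 'p5']
print(only_strong)  # ['p1', 'p5']
print(strong_last)  # ['p1']
print(any_depth)    # ['p1', 'p3', 'p4', 'p5']
```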

Xpath syntax to grab listed elements based on ID above containing word

I want to grab li element text and links from a list. The challenge is that the span's id varies but always contains the word 'Notable', for example:
<span class="mw-headline" id="Notable_alumni">Notable alumni</span>
OR
<span class="mw-headline" id="Notable_former_pupils">Notable former pupils</span>
So I need to use "contains" somehow, and I am working along these lines:
//li[contains(span/@id,'Notable')]/span/@id/following-sibling::text()
But I can't get this right.
Another issue is that these blocks of text and headers are not in the same containing div either.
Assuming that the span with the @id is always under an h2 (you could make this more generic by using * instead of h2 if that doesn't hold true), anchor to that containing element, then look for the first ul that is a following sibling. From there you can select the text() from all of its li elements:
//h2[span[contains(@id,'Notable')]]/following-sibling::ul[1]/li//text()
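A small sketch of this approach, assuming Python with lxml and a simplified, made-up page fragment (the real page markup is not shown in the question, so the element layout, names, and hrefs below are hypothetical).

```python
from lxml import etree

# Hypothetical fragment: headings and lists are siblings, not nested.
doc = etree.fromstring(
    "<div>"
    "<h2><span class='mw-headline' id='History'>History</span></h2>"
    "<ul><li>Founded long ago</li></ul>"
    "<h2><span class='mw-headline' id='Notable_alumni'>Notable alumni</span></h2>"
    "<p>Intro paragraph</p>"
    "<ul>"
    "<li><a href='/wiki/Alice'>Alice</a>, engineer</li>"
    "<li><a href='/wiki/Bob'>Bob</a></li>"
    "</ul>"
    "<ul><li>Unrelated later list</li></ul>"
    "</div>"
)

# Anchor on the h2 whose span id contains 'Notable', then take the
# first ul among its following siblings.
base = "//h2[span[contains(@id,'Notable')]]/following-sibling::ul[1]"

texts = doc.xpath(base + "/li//text()")   # all text nodes inside the li elements
links = doc.xpath(base + "/li//a/@href")  # link targets inside the li elements

print(texts)
print(links)
```

The [1] after following-sibling::ul keeps the later, unrelated list out of the result.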

xpath - element containing exact text, but minus sibling elements?

Without using index specificity. I'm trying to target an element with exact text, but which also ignores the text of sibling elements. For example, target the span with Save below.
<span>Click and save money!</span>
<span>
<i>Icon</i>
Save
</span>
So something like //span[contains(text(), 'Save')] would grab any span with "Save" in it.
Try this XPath: //span[text()[normalize-space(.)='Save']]
It looks for span elements that have a text node whose whitespace-trimmed value is exactly Save.
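To see the difference between a substring match and the exact text-node match, here is a sketch assuming Python with lxml. The ids and the third span ("Save the date") are hypothetical additions for contrast.

```python
from lxml import etree

# Spans from the question plus one hypothetical extra span ('other').
doc = etree.fromstring(
    "<div>"
    "<span id='promo'>Click and save money!</span>"
    "<span id='button'>\n<i>Icon</i>\nSave\n</span>"
    "<span id='other'>Save the date</span>"
    "</div>"
)

# Substring match on the span's whole string value.
loose = [s.get("id") for s in doc.xpath("//span[contains(., 'Save')]")]

# Exact match on an individual text node after trimming whitespace;
# the text inside the child <i> element is not part of that node.
exact = [s.get("id") for s in
         doc.xpath("//span[text()[normalize-space(.)='Save']]")]

print(loose)  # ['button', 'other']
print(exact)  # ['button']
```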

How to get all inner texts of tag by XPATH?

Is it possible to get all inner texts of some tag by XPath?
For example, in one case, the text could be in a nested span: root.xpath('//h2[text()="Description"]/following-sibling::p/span/span/text()')
In another case, it could be in the first span: root.xpath('//h2[text()="Description"]/following-sibling::p/span/text()')
So my question is whether there is some way to get all the text inside a tag, not only the text on the first level.
Something like root.xpath('//h2[text()="Description"]/following-sibling::p/*/text()')
How about using // (the abbreviation for the descendant-or-self axis)?
//h2[text()="Description"]/following-sibling::p/span//text()
This should return all text nodes anywhere within the span.
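A short sketch comparing /text() with //text(), assuming Python with lxml and a made-up document in which the span contains both direct text and a nested span.

```python
from lxml import etree

# Hypothetical document: the outer span mixes direct text and a nested span.
root = etree.fromstring(
    "<div>"
    "<h2>Description</h2>"
    "<p><span>Top level, <span>nested</span> and after</span></p>"
    "</div>"
)

prefix = '//h2[text()="Description"]/following-sibling::p/span'

# Only text nodes that are direct children of the outer span.
shallow = root.xpath(prefix + "/text()")

# Text nodes at any depth below the outer span, in document order.
deep = root.xpath(prefix + "//text()")

print(shallow)
print("".join(deep))
```

Joining the deep result gives the full visible text of the span regardless of how many levels of nesting it has.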

How to find the first link on the page containing this text?

If I have two links:
<div class="abc">
<a id="def1" href="/definitely">Definitely 1</a>
<a id="def2" href="/definitely">Definitely 2</a>
</div>
And I want to identify the first (def1), I thought this would work:
var linkXPath = "//div[@class='abc']//a[contains(@href,'def')][1]";
But it doesn't seem to.
What am I doing wrong?
It is a FAQ why
//someName[1]
doesn't select the first element of //someName.
Looking at the definition of the // abbreviation, one would realize that in fact
//someName[1]
is equivalent to:
/descendant-or-self::node()/someName[1]
and this selects every someName element that is the first someName child of its parent node.
Thus, if there are two or more someName elements that are the first someName child of their parent, all of them are selected.
Solution:
Instead of
//someName[1]
use:
(//someName)[1]
So, in your particular case use:
(//div[@class='abc']//a[contains(@href,'def')])[1]
Apart from this, none of the above expressions would select any node, if in the actual XML document a default namespace was specified. Selecting nodes in a document with a default namespace is the biggest XPath FAQ. To find the solution just search for "default namespace" in this SO tag and anywhere on the Internet.
Your XPath expression selects the first a element (with the right href) of every div (that has the right class) that contains one. So if there were two divs that matched, each with multiple a elements that matched, you'd get a result set containing two elements: the first a in the first div, and the first a in the second div.
To select just the first element of the entire result set, use parentheses like so:
(//div[@class='abc']//a[contains(@href,'def')])[1]
Other than that, your expression works fine for me.
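Both answers can be checked with a sketch like the following, assuming Python with lxml. The second div (with def3) is a hypothetical addition, since the per-parent behaviour only becomes visible when more than one div matches.

```python
from lxml import etree

# The markup from the question plus a hypothetical second matching div.
doc = etree.fromstring(
    "<body>"
    "<div class='abc'>"
    "<a id='def1' href='/definitely'>Definitely 1</a>"
    "<a id='def2' href='/definitely'>Definitely 2</a>"
    "</div>"
    "<div class='abc'>"
    "<a id='def3' href='/definitely'>Definitely 3</a>"
    "</div>"
    "</body>"
)

inner = "//div[@class='abc']//a[contains(@href,'def')]"

# [1] applied per parent: the first matching a within each div.
per_parent = [a.get("id") for a in doc.xpath(inner + "[1]")]

# [1] applied to the whole result set: only the very first match.
first_only = [a.get("id") for a in doc.xpath("(" + inner + ")[1]")]

print(per_parent)  # ['def1', 'def3']
print(first_only)  # ['def1']
```

With only the single div from the question, both expressions return def1, which matches the observation that the original expression appears to work.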

Resources