If I have two links:
<div class="abc">
<a id="def1" href="/definitely">Definitely 1</a>
<a id="def2" href="/definitely">Definitely 2</a>
</div>
And I want to identify the first (def1), I thought this would work:
var linkXPath = "//div[@class='abc']//a[contains(@href,'def')][1]";
But it doesn't seem to.
What am I doing wrong?
It is a FAQ why
//someName[1]
doesn't select the first element of //someName.
Looking at the definition of the // abbreviation, one would realize that in fact
//someName[1]
is equivalent to:
/descendant-or-self::node()/someName[1]
and this selects every someName element that is the first someName child of its parent node.
Thus, if there are two or more someName elements that are the first someName child of their parent, all of them are selected.
Solution:
Instead of
//someName[1]
use:
(//someName)[1]
So, in your particular case use:
(//div[@class='abc']//a[contains(@href,'def')])[1]
Apart from this, none of the above expressions would select any node if the actual XML document declared a default namespace. Selecting nodes in a document with a default namespace is the biggest XPath FAQ. To find the solution, just search for "default namespace" in this SO tag or anywhere on the Internet.
Your XPath expression selects the first a element (with the right href) of every parent that contains one, and in your markup those parents are the divs with the right class. So if there were two divs that matched, each with multiple a elements that matched, you'd get a result set containing two elements: the first a in the first div and the first a in the second div.
To select just the first element of the entire result set, use parentheses like so:
(//div[@class='abc']//a[contains(@href,'def')])[1]
Other than that, your expression works fine for me (tested here).
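Here is a minimal sketch of the difference. It uses Python's standard-library xml.etree.ElementTree rather than a browser or Nokogiri. ElementTree supports only a subset of XPath: there is no contains() and no parenthesised grouping. Its [n] predicate, however, has the same per-parent meaning, so the hypothetical two-div markup below reproduces the situation described above. The grouped form is emulated by taking the first element of the full result list.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: two matching divs, each holding two matching links.
XML = """<body>
  <div class="abc">
    <a id="def1" href="/definitely">Definitely 1</a>
    <a id="def2" href="/definitely">Definitely 2</a>
  </div>
  <div class="abc">
    <a id="def3" href="/definitely">Definitely 3</a>
    <a id="def4" href="/definitely">Definitely 4</a>
  </div>
</body>"""

root = ET.fromstring(XML)

# Like //div[@class='abc']//a[1]: [1] is evaluated per parent,
# so each matching div contributes its own first a.
per_parent = root.findall(".//div[@class='abc']//a[1]")

# Like (//div[@class='abc']//a)[1]: build the whole node-set in
# document order first, then take its first member.
all_links = root.findall(".//div[@class='abc']//a")
first_overall = all_links[0] if all_links else None

print([a.get("id") for a in per_parent])  # ['def1', 'def3']
print(first_overall.get("id"))            # def1
```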
Related
With XPath, how would you search for elements that only contain another specific element? For example, what expression would result in getting all <p> tags that contain <strong> elements within them?
<p>This is some text that <strong>contains another HTML element</strong></p>
In XPath you use square brackets to filter; the bracketed expression is called a predicate. See, e.g., this tutorial.
To select all p elements that have a strong child element, use
//p[strong]
If you want to find all p elements whose only child elements are strong elements, add a second predicate:
//p[strong][count(*)=count(strong)]
The * stands for any element.
If, as in your example, you are only interested in p elements whose strong child is the last child node, use
//p[strong[not(following-sibling::node())]]
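The three predicates above can be sketched with Python's xml.etree.ElementTree, assuming the hypothetical sample markup below. ElementTree supports the [strong] predicate natively. It has no count() or following-sibling::, though, so those two filters are written as plain Python checks over each p's children.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; only the first p passes all three filters.
XML = """<body>
<p>This is some text that <strong>contains another HTML element</strong></p>
<p><strong>bold</strong> then more text</p>
<p><em>em</em> and <strong>bold</strong>.</p>
<p>no markup at all</p>
</body>"""

root = ET.fromstring(XML)

# //p[strong]: ElementTree supports the [tag] predicate directly.
with_strong = root.findall(".//p[strong]")

# //p[strong][count(*)=count(strong)]: every element child is a strong.
only_strong = [p for p in with_strong if all(c.tag == "strong" for c in p)]

# //p[strong[not(following-sibling::node())]]: the last child element is a
# strong and no text follows it (text after an element lives in its .tail).
# Note: ElementTree's default parser drops comments, which XPath would count.
strong_last = [p for p in with_strong if p[-1].tag == "strong" and not p[-1].tail]

print(len(with_strong), len(only_strong), len(strong_last))  # 3 2 1
```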
Predicates are the way to go.
what expression would result in getting all p tags that contain strong elements within them?
If you only want to select p elements with direct strong children, you can use p[strong]; if you're looking for any descendants, use p[descendant::strong]. Both are relative paths, so in both cases the context node has to be the parent of the p elements.
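This small sketch shows the direct-child versus descendant distinction, again with Python's xml.etree.ElementTree and hypothetical markup. The context node is the div that is the parent of the p elements. ElementTree has no descendant:: axis inside predicates, so that case is checked in Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the strong in the second p is nested inside an em.
XML = """<div>
<p>direct <strong>child</strong></p>
<p>nested <em><strong>grandchild</strong></em></p>
</div>"""

root = ET.fromstring(XML)  # root is the div, i.e. the parent of the p elements

# p[strong]: p children of the context node that have a strong *child*.
direct = root.findall("p[strong]")

# p[descendant::strong]: emulated by searching each p for a strong at any depth.
any_depth = [p for p in root.findall("p") if p.find(".//strong") is not None]

print(len(direct), len(any_depth))  # 1 2
```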
I want to grab li element text and links from a list. The challenge is that the span sometimes has a different id but always has the word 'Notable' in it, for example:
<span class="mw-headline" id="Notable_alumni">Notable alumni</span>
OR
<span class="mw-headline" id="Notable_former_pupils">Notable former pupils</span>
So I need to use contains() somehow; I have something along these lines:
//li[contains(span/@id,'Notable')]/span/@id/following-sibling::text()
But can't get this right.
Another issue is that these blocks of text and headers are not in the same containing div either. I added an image to simplify things, so you can see the code.
Assume that the span with the @id is always under the h2 (you could make this more generic by using * instead of h2 if that doesn't hold true). If you anchor to that containing element and then look for the first ul that is a following sibling, you can select the text() from all of its li elements:
//h2[span[contains(@id,'Notable')]]/following-sibling::ul[1]/li//text()
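ElementTree has no following-sibling axis or contains(), so the sketch below emulates the expression step by step in Python. For each h2 whose span id contains "Notable", it takes the first ul after that h2 among the same parent's children and collects the li text and link targets. The markup and the notable_items helper are hypothetical. Like the answer, they assume the h2 and ul share a parent.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup modelled on the question: the list is a sibling of the
# h2 heading, not inside it.
XML = """<div>
<h2><span class="mw-headline" id="History">History</span></h2>
<ul><li>Founded long ago</li></ul>
<h2><span class="mw-headline" id="Notable_alumni">Notable alumni</span></h2>
<p>Some intro text.</p>
<ul>
<li><a href="/wiki/Ann">Ann</a>, writer</li>
<li><a href="/wiki/Bob">Bob</a>, actor</li>
</ul>
</div>"""


def notable_items(parent):
    """Emulates //h2[span[contains(@id,'Notable')]]/following-sibling::ul[1]/li
    for h2 elements that are direct children of `parent` (hypothetical helper)."""
    siblings = list(parent)
    items = []
    for i, el in enumerate(siblings):
        if el.tag == "h2" and any("Notable" in s.get("id", "") for s in el.findall("span")):
            # following-sibling::ul[1]: the first ul after this h2
            next_ul = next((s for s in siblings[i + 1:] if s.tag == "ul"), None)
            if next_ul is not None:
                items.extend(next_ul.findall("li"))
    return items


root = ET.fromstring(XML)
items = notable_items(root)
texts = ["".join(li.itertext()) for li in items]  # li//text(), joined per li
links = [a.get("href") for li in items for a in li.iter("a")]
print(texts)  # ['Ann, writer', 'Bob, actor']
print(links)  # ['/wiki/Ann', '/wiki/Bob']
```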
I'm looking through HTML documents for the text: "Required". What I need to find is the element that holds the text. For example:
<p>... Required</p>
I would want to get element name = p
However, it might not be in a <p> tag. It could be in any kind of tag, which is where this question differs from some of the other search text Stack Overflow questions.
Right now I'm using:
page.at(':contains("Required")')
but this only gets me the whole html element.
The problem is that the :contains pseudo-class matches any element that has the searched-for text anywhere in its descendants. You need to find the innermost element that contains such text. Since html is the ancestor of all elements, if the page contains the text anywhere then html contains it too, and so html will be the first matching element.
I’m not sure you can achieve this with CSS, but you can use XPath like this:
page.at_xpath('//*[text()[contains(., "Required")]]')
This finds the first element node that has a text() node as a child that contains Required. When you have that node (if it exists) you can then call name on it to give the name of the element.
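To show what that expression matches without a third-party gem, here is a sketch in Python's standard-library ElementTree. ElementTree cannot evaluate text()[contains(...)] itself. The hypothetical first_element_with_text helper reproduces it by checking each element's own text nodes (its .text and each child's .tail) in document order. The last line contrasts this with a whole-document text check, which is why a :contains-style match returns html.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: "Required" sits in a span, not a p.
XML = """<html><body>
<div><p>Name <span class="hint">Required</span></p></div>
</body></html>"""


def own_text_nodes(el):
    """el's text() children: .text plus the .tail of each child element."""
    return [t for t in [el.text] + [child.tail for child in el] if t]


def first_element_with_text(root, needle):
    """Emulates //*[text()[contains(., needle)]] and returns the first match;
    Element.iter() walks the tree in document order."""
    for el in root.iter():
        if any(needle in t for t in own_text_nodes(el)):
            return el
    return None


root = ET.fromstring(XML)
hit = first_element_with_text(root, "Required")
print(hit.tag if hit is not None else None)  # span

# By contrast, html's full text also contains "Required".
print("Required" in "".join(root.itertext()))  # True
```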
For CSS you can do:
page.at('[text()*="Required"]')
It's not real CSS, though, or even a jQuery extension.
You should use CSS selectors:
page.css('p').text
Can anyone help me with this? I cannot grab the 'Blue Shoes' text from this div no matter what I try! I've been at it for over an hour and still cannot work it out. I tried:
//div[@class='breadcrumbs']/text()
//div[@class='breadcrumbs']
//div[@class='breadcrumbs']/div
Nothing seems to work. Any help MUCH appreciated.
<div class="breadcrumbs">Home/Blue Shoes</div>
//div[@class='breadcrumbs']/text()
should give you what you need in this case: it selects the set of all text nodes that lie directly under the breadcrumbs div. If you want to specifically target the one at the end (e.g. if there are more than two levels of breadcrumb and there's another text node for, say, a slash between two a elements), then the slightly more specific
//div[@class='breadcrumbs']/text()[last()]
may work better.
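To see which text nodes div/text() returns, here is a sketch with Python's xml.etree.ElementTree on hypothetical breadcrumb markup in which the earlier levels are links. ElementTree does not expose text nodes as nodes. Text before the first child is stored in .text and text after each child in that child's .tail, so the sketch rebuilds the text() list from those fields and takes the last entry for text()[last()].

```python
import xml.etree.ElementTree as ET

# Hypothetical breadcrumb markup in which the earlier levels are links.
XML = ('<body><div class="breadcrumbs"><a href="/">Home</a> / '
       '<a href="/shoes">Shoes</a> / Blue Shoes</div></body>')

root = ET.fromstring(XML)
div = root.find(".//div[@class='breadcrumbs']")

# div/text(): text before the first child is div.text; text after each
# child is that child's .tail.
text_nodes = [t for t in [div.text] + [c.tail for c in div] if t]

# div/text()[last()], then trim the separator and surrounding spaces.
last_text = text_nodes[-1].strip(" /") if text_nodes else None

print(text_nodes)  # [' / ', ' / Blue Shoes']
print(last_text)   # Blue Shoes
```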
If this doesn't work then there are two other possibilities I can think of. Firstly, the HTML DOM uses upper case for element names, and since XPath is case-sensitive you may find you need //DIV instead of //div. Or maybe there's a namespace issue - if your document has an xmlns="..." on the root element then that puts your div elements in a namespace, and unprefixed names in xpath refer to nodes in no namespace. To select namespaced nodes you have to bind a prefix to the corresponding namespace URI and then use the prefix in your expressions (//xhtml:div). Exactly how you go about mapping prefixes depends on what library/tool/language you're using to execute the xpath queries.
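This is a sketch of the prefix-binding approach in Python's xml.etree.ElementTree. The document is hypothetical, and the prefix name xhtml is arbitrary; only the URI has to match the document's xmlns.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML document with a default namespace on the root.
XML = """<html xmlns="http://www.w3.org/1999/xhtml">
<body><div class="breadcrumbs">Home / Blue Shoes</div></body>
</html>"""

root = ET.fromstring(XML)

# An unprefixed name means "no namespace", so this finds nothing.
print(root.findall(".//div"))  # []

# Bind a prefix to the namespace URI and use the prefix in the path.
ns = {"xhtml": "http://www.w3.org/1999/xhtml"}
divs = root.findall(".//xhtml:div[@class='breadcrumbs']", ns)
print([d.text for d in divs])  # ['Home / Blue Shoes']
```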
I have some HTML like this:
<div> Make </div>
And I want to match it based on the fact that the content of the node contains the text "Make".
Put another way, "Make" is a substring of the div node's content, and I want to match such a node using XPath.
The obvious solution would be
//div[contains(., 'Make')]
but this will find all divs that contain the string "Make" anywhere within their content, so not only will it find the example you've given in the question but also any ancestor div of that one, or any divs where that substring is buried deep in a descendant element.
If you only want cases where that string is directly inside the div with no other intervening elements then you'd have to use the slightly more complex
//div[text()[contains(., 'Make')]]
This is subtly different from
//div[contains(text(), 'Make')]
which would look only in the first text node child of the div, so it would find <div>Make<br/>Break</div> but not <div>Break<br/>Make</div>
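The three expressions can be compared on the two divs from the answer plus a hypothetical third one, using Python's xml.etree.ElementTree. ElementTree has no contains() or text() node tests, so the sketch emulates them. string_value joins all descendant text, which is what . means inside contains(). text_children rebuilds the div's own text nodes from .text and each child's .tail. Both helpers are illustrative, not library functions.

```python
import xml.etree.ElementTree as ET

# The two divs from the answer, plus a third whose "Make" is inside a span.
XML = """<body>
<div id="a">Make<br/>Break</div>
<div id="b">Break<br/>Make</div>
<div id="c"><span>Make</span></div>
</body>"""

root = ET.fromstring(XML)


def string_value(el):
    """XPath string-value of an element: all descendant text, concatenated."""
    return "".join(el.itertext())


def text_children(el):
    """The element's own text() nodes, in document order."""
    return [t for t in [el.text] + [c.tail for c in el] if t]


divs = root.findall(".//div")

# //div[contains(., 'Make')]
by_string_value = [d.get("id") for d in divs if "Make" in string_value(d)]
# //div[text()[contains(., 'Make')]]
by_any_text_node = [d.get("id") for d in divs
                    if any("Make" in t for t in text_children(d))]
# //div[contains(text(), 'Make')]: XPath 1.0 converts the node-set text()
# to the string value of its first node only.
by_first_text_node = [d.get("id") for d in divs
                      if text_children(d) and "Make" in text_children(d)[0]]

print(by_string_value)     # ['a', 'b', 'c']
print(by_any_text_node)    # ['a', 'b']
print(by_first_text_node)  # ['a']
```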
If you want to allow for intervening elements other than div, then try
//div[contains(., 'Make')][not(.//div[contains(., 'Make')])]
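The innermost-match expression can be sketched the same way with Python's xml.etree.ElementTree. A div whose string value contains "Make" is kept only if none of its descendant divs also matches. The markup is hypothetical, and string_value and has_make are illustrative helpers, not library functions.

```python
import xml.etree.ElementTree as ET

# Hypothetical nesting: the outer div only contains "Make" via the inner one,
# and the inner div has an intervening b element.
XML = """<body>
<div id="outer"><p>intro</p><div id="inner"><b>Make</b> it so</div></div>
</body>"""

root = ET.fromstring(XML)


def string_value(el):
    return "".join(el.itertext())


def has_make(el):
    return "Make" in string_value(el)


# //div[contains(., 'Make')][not(.//div[contains(., 'Make')])]
# d.iter("div") includes d itself, so it is skipped to get .//div.
innermost = [
    d.get("id")
    for d in root.iter("div")
    if has_make(d) and not any(has_make(x) for x in d.iter("div") if x is not d)
]
print(innermost)  # ['inner']
```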
Seems like this is what you are looking for: //div[contains(text(),'Make')]
If this does not work, you can try //div[contains(.,'Make')]. This finds all divs whose string value (the concatenated text of all their descendant text nodes) contains 'Make'; attribute values are not included.
To find that node anywhere in the document, you would need this:
//div[contains(text(), "Make")]