XPath - Locate node using its flattened descendant text - xpath

I have this HTML:
<tr>
<td>
Some
<strong>
text
</strong>
<em>
and more
</em>
</td>
...
</tr>
I need to locate my td element by its text, Some text and more. I know that I can get this text with this XPath expression:
//td//text()
But I cannot find a way to locate the td element itself. I tried this:
//td[//text()='Some text and more']
but I get errors. Do you know a working XPath expression for that?

Firstly, XPath uses forward slashes, never backslashes.
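Secondly, I believe this may be the XPath you need. The string value of an element is the concatenation of all of its descendant text nodes, and normalize-space() trims that string and collapses each run of whitespace into a single space: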
//td[normalize-space(.) = 'Some text and more']
Could you give that a try?
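A minimal sketch of how the suggested expression behaves, assuming Python with the third-party lxml library (the question does not say which XPath engine is in use). The second td is an addition for contrast and is not part of the original markup.

```python
from lxml import etree  # third-party; assumed to be installed

# The question's markup, plus a second cell added here for contrast (an assumption).
row = etree.fromstring("""
<tr>
<td>
Some
<strong>
text
</strong>
<em>
and more
</em>
</td>
<td>other</td>
</tr>
""")

# The string value of td joins every descendant text node; normalize-space()
# then trims it and collapses whitespace runs into single spaces.
matches = row.xpath("//td[normalize-space(.) = 'Some text and more']")
print(len(matches), matches[0].tag)

# The original attempt compares individual text nodes, and no single text node
# equals the whole phrase, so with this engine it selects nothing.
print(row.xpath("//td[//text() = 'Some text and more']"))
```

With lxml the original attempt does not raise an error; it just returns an empty list, which may differ from what other engines report.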

Related

Xpath simplification: extract text of self and child node

Having this HTML-snippet
<td class="info">self-text
<br>
<b>child-text</b>
</td>
I would like to extract self-text and child-text.
So far I am using this XPath expression:
.//td[contains(@class, 'info')]/text() | .//td[contains(@class, 'info')]/b/text()
Is there any simpler way to do this?
You can use the following XPath expression, which returns every text node anywhere within the outer td element that is not empty or whitespace-only:
.//td[contains(@class, 'info')]//text()[normalize-space()]
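As a rough illustration, the expression can be tried with Python and lxml (an assumed toolchain, not one named in the question). The snippet is wrapped in a tr and the br is self-closed so that it parses as XML; both changes are assumptions made for the sketch.

```python
from lxml import etree  # third-party; assumed to be installed

# The question's cell, wrapped in a <tr> so the relative .//td step has a context.
row = etree.fromstring("""<tr><td class="info">self-text
<br/>
<b>child-text</b>
</td></tr>""")

# [normalize-space()] keeps only text nodes that are non-empty after trimming.
texts = row.xpath(".//td[contains(@class, 'info')]//text()[normalize-space()]")

# The selected nodes keep their own surrounding whitespace, so strip them afterwards.
cleaned = [t.strip() for t in texts]
print(cleaned)
```

The predicate only filters out whitespace-only nodes; it does not trim the ones it keeps, which is why the final strip() is still needed.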

XPath get only first Parent of nested HTML

I am a newbie in XPath. Can someone explain how to resolve this problem:
<table>
<tr>
<td>
<table>
<tr>
<td>
<table>
<tr>
<td>Label</td>
<td>value</td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
</table>
I am trying to get the <tr> that contains the Label value, but it does not work for me.
Here is my code:
//td[contains(.,'Label')]/ancestor::tr[1]
Desired result:
<tr>
<td>Label</td>
<td>value</td>
</tr>
Can someone help me ?
This expression matches the tr that you want:
//tr[contains(td/text(), 'Label')]
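This scans all tr elements in the document and uses just a single predicate. The td/text() limits the test to text nodes that are direct children of the row's own td cells, that is, grandchildren of the row. If you just used td, the test would run against the td's string value, which concatenates all of its descendant text nodes, and the outer tr elements would match as well. Note that in XPath 1.0 contains() converts a node-set to the string value of its first node only, so this relies on 'Label' being in the first text node of the row's first td.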
UPDATE: Also, for what it's worth, the ancestor step is not why your expression fails. ancestor is a reverse axis, so positions are counted outward from the context node, and in
//td[contains(.,'Label')]/ancestor::tr[1]
the [1] does pick the nearest enclosing tr of each matched td. The trouble is the first step: the string value of each outer td also contains 'Label', so all three tds match and all three rows come back. Writing
//td[contains(.,'Label')]/ancestor::tr[last()]
would not help, since it selects the outermost tr.
I had the same issue, except that the text 'Label' was sometimes in a nested span, or even further nested in the td. For example:
<td><span>Label</span></td>
The previous answer only finds 'Label' if it is in a text node that is a direct child of the td. This case is a bit harder because we need to search for a td that contains the text 'Label' in any of its descendants. Since the tds are nested, every enclosing td also qualifies as having a descendant that contains the text 'Label'. So, the only way I found to overcome this is to add a check that makes sure that the td we select does not itself contain a td with the search text.
//td[contains(., 'Label') and not(.//td[contains(., 'Label')])]/ancestor::tr[1]
This says: give me all of the tds that have descendant text containing 'Label', but exclude any td that contains another td with descendant text containing 'Label' (the nesting ancestors). This returns the innermost td that contains the text. Then you can go back to the tr that contains this td using ancestor.
Also, if you just want the lowest table that contains text use this:
//table[contains(., 'Label') and not(.//table[contains(., 'Label')])]
or you can select the tr directly:
//tr[contains(., 'Label') and not(.//tr[contains(., 'Label')])]
This seems like a common problem, but I didn't see a solution anywhere. So, I decided to post to this old unanswered question in hopes that it helps somebody.

Selenium IDE with XPath to identify cell in table based on other column

Please take a look at the snippet of html below:
<tr class="clickable">
<td id="7b8ee8f9-b66f-4fba-83c1-4cf2827130b5" class="clickable">
<a class="editLink" href="#">Single</a>
</td>
<td class="clickable">£14.00</td>
</tr>
I'm trying to assert the value of td[2] when td[1] contains "Single". I've tried assorted variants of:
//td[2][(contains(text(),'£14.00'))]/../td[1][(contains(text(),'Single'))]
I've used similar notation elsewhere successfully - but to no avail here... I think it's down to td[1] having the nested element, but not sure.
Can someone enlighten as to what I'm getting wrong? :)
Cheers!
What about:
//tr[contains(td[1], "Single")]/td[2]
First select the <tr> containing the <td> matching the text, and then select td[2].
Then,
contains(//tr[contains(td[1], "Single")]/td[2], "£14.00")
should return True.
Or, closer to the expression you tried, you could test if this matches:
//tr[contains(td[1], "Single")]/td[2][contains(., "£14.00")]
See @JensErat's answer to "find xth td with td contains in same tr xpath python".
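A small check of these expressions outside Selenium IDE, assuming Python with lxml and the row wrapped in a table element so that it parses as a standalone document (both are assumptions for the sketch):

```python
from lxml import etree  # third-party; assumed to be installed

table = etree.fromstring("""
<table>
<tr class="clickable">
<td id="7b8ee8f9-b66f-4fba-83c1-4cf2827130b5" class="clickable">
<a class="editLink" href="#">Single</a>
</td>
<td class="clickable">£14.00</td>
</tr>
</table>
""")

# Select the row by the string value of its first cell, then step to the second cell.
price_cells = table.xpath('//tr[contains(td[1], "Single")]/td[2]')
print(price_cells[0].text)

# The same test expressed as a boolean XPath expression.
is_match = table.xpath('contains(//tr[contains(td[1], "Single")]/td[2], "£14.00")')
print(is_match)

# The asker's style of test: text() here is the whitespace before <a>,
# not the link text, so nothing is selected.
original_style = table.xpath("//td[1][contains(text(), 'Single')]")
print(original_style)
```

This suggests the asker's guess was right: the nested a element means the first text node of td[1] is only whitespace, so contains(text(), ...) never sees "Single".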
Why not make it simple on yourself and do the if statement in your code? Pseudocode:
Select the top level tr.
Find the first td within the tr and check whether it contains Single.
If it does, assert that the second td contains £14.00.
Alternatively, you could just get the text of the top level tr and perform the checks on that text.
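The pseudocode above might look like this in Python, using only the standard library's xml.etree.ElementTree. The helper name price_for and the wrapping table element are hypothetical, introduced only for this sketch.

```python
import xml.etree.ElementTree as ET

table = ET.fromstring("""
<table>
<tr class="clickable">
<td id="7b8ee8f9-b66f-4fba-83c1-4cf2827130b5" class="clickable">
<a class="editLink" href="#">Single</a>
</td>
<td class="clickable">£14.00</td>
</tr>
</table>
""")


def price_for(root, label):
    """Return the trimmed text of the second cell of the first row whose
    first cell contains `label`, or None if no such row exists."""
    for tr in root.iter("tr"):
        cells = tr.findall("td")
        # itertext() walks nested elements, so the <a> text is included.
        if len(cells) >= 2 and label in "".join(cells[0].itertext()):
            return "".join(cells[1].itertext()).strip()
    return None


print(price_for(table, "Single"))
print(price_for(table, "Double"))
```

Doing the check in code trades a compact locator for logic that is easier to step through when an assertion fails.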

xpath nearest element to a given element

I am having trouble returning an element using xpath.
I need to get the text from the 2nd TD from a large table.
<tr>
<td>
<label for="PropertyA">Some text here </label>
</td>
<td> TEXT!! </td>
</tr>
I'm able to find the label element, but then I'm having trouble selecting the sibling TD to return the text.
This is how I select the label:
"//label[@for='PropertyA']"
thanks
You are looking for the following-sibling axis. It searches among the siblings that share the same parent, which here is the tr. tds that are not in the same tr are not found; if you want those as well, you can use the following axis instead.
//td[label[@for='PropertyA']]/following-sibling::td[1]
From the label element, it should be:
//label[@for='PropertyA']/following::td[1]
And then use the DOM method from the hosting language to get the string value.
Or select the text node (something I do not recommend) with:
//label[@for='PropertyA']/following::td[1]/text()
Or, if you only need this one node, you could use the string() function:
string(//label[@for='PropertyA']/following::td[1])
You can also select from the common ancestor tr like:
//tr[td/label/@for='PropertyA']/td[2]
Getting ANY following sibling element:
//td[label[@for='PropertyA']]/following-sibling::*
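The variants above can be compared on the question's markup. This sketch assumes Python with lxml and wraps the row in a table element so it parses on its own; neither is given in the question.

```python
from lxml import etree  # third-party; assumed to be installed

table = etree.fromstring("""
<table>
<tr>
<td>
<label for="PropertyA">Some text here </label>
</td>
<td> TEXT!! </td>
</tr>
</table>
""")

# From the td that holds the label, step to the next sibling td.
sibling = table.xpath("//td[label[@for='PropertyA']]/following-sibling::td[1]")

# From the label itself, take the first td that follows it in document order.
following = table.xpath("//label[@for='PropertyA']/following::td[1]")

# Let XPath return the string value directly instead of a node.
value = table.xpath("string(//label[@for='PropertyA']/following::td[1])")

# Start from the common ancestor row and index the cell.
by_row = table.xpath("//tr[td/label/@for='PropertyA']/td[2]")

print(sibling[0].text.strip(), value.strip())
```

All four select the same cell here; they only differ once the label and the value stop sharing a row, where following:: keeps matching and following-sibling:: does not.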

nokogiri: why is this an invalid xpath?

//br/preceding-sibling::normalize-space(text())
I am getting an invalid XPath expression error with Nokogiri.
normalize-space is a function. You can't use it there: an axis such as preceding-sibling:: must be followed by a node test, not a function call.
You need a node-set.
Maybe you mean
//br/preceding-sibling::*
or you could use normalize-space in a predicate, inside square brackets. Think of the predicate as a filter or selector on the node-set. So you can do this:
//br/preceding-sibling::*[normalize-space()='Fred']
In English that translates to "all elements that are preceding siblings of a <br>, and for which the (normalized) text is 'Fred' ". In this document:
<html>
<p>
<h2>Fred</h2>
<br/>
</p>
<table>
<tr>
<td>
<br/>
</td>
</tr>
</table>
</html>
...the xpath expression selects the <h2> node.
I figured this out with the free XpathVisualizer tool available on codeplex.
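The same behaviour can be reproduced without Nokogiri. This sketch assumes Python with lxml, which, like Nokogiri, evaluates XPath through libxml2; the exact error class and message may differ between tools.

```python
from lxml import etree  # third-party; assumed to be installed

doc = etree.fromstring(
    "<html><p><h2>Fred</h2><br/></p>"
    "<table><tr><td><br/></td></tr></table></html>"
)

# A function call is fine inside a predicate, where it filters the node-set.
hits = doc.xpath("//br/preceding-sibling::*[normalize-space() = 'Fred']")
tags = [el.tag for el in hits]
print(tags)

# Directly after an axis a node test is required, so the original is rejected.
try:
    doc.xpath("//br/preceding-sibling::normalize-space(text())")
    rejected = False
except etree.XPathError:
    rejected = True
print(rejected)
```

If the goal was the trimmed text itself rather than the element, wrapping the whole path in the function, as in normalize-space(//br/preceding-sibling::*[1]), is one way to get a string back.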
