Scrapy: How do I select the next `td` in this `tr`? - xpath

I want to select the next sibling of a td tag in a tr element.
The tr element is this:
<tr>
<td>Created On:</td>
<td>06/28/2018 06:32 </td>
</tr>
My Scrapy code looks like this: response.xpath("//text()[contains(.,'Created On:')]/following-sibling::td"). But that gives me an empty list [].
How do I select the next td?

Try this XPath expression:
//text()[contains(.,'Created On:')]/../following-sibling::td
You were trying to use the following-sibling axis from the wrong context node. Going back one level fixes this problem.
An alternative is matching the td element in the first place like in this expression:
//td[contains(text(),'Created On:')]/following-sibling::td
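To check the three expressions side by side outside of Scrapy, here is a minimal sketch using lxml, which Scrapy's selectors are built on (via parsel). The `<table>` wrapper and the variable names are my own additions, not part of the question.

```python
from lxml import etree

# The row from the question, wrapped in a <table> so it parses as one tree.
doc = etree.fromstring(
    "<table><tr>"
    "<td>Created On:</td>"
    "<td>06/28/2018 06:32 </td>"
    "</tr></table>"
)

# Original attempt: the context node is the text node inside the first <td>.
# That text node has no siblings, so nothing is selected.
from_text = doc.xpath("//text()[contains(., 'Created On:')]/following-sibling::td")

# Fix 1: step up to the parent <td> first, then look at its siblings.
via_parent = doc.xpath("//text()[contains(., 'Created On:')]/../following-sibling::td")

# Fix 2: match the <td> element directly.
via_td = doc.xpath("//td[contains(text(), 'Created On:')]/following-sibling::td")

created_on = via_td[0].text.strip()
print(from_text, created_on)
```

In a Scrapy callback the same expressions should work unchanged with `response.xpath(...)`, followed by `.get()` to extract the value.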

Related

Xpath simplification: extract text of self and child node

Having this HTML-snippet
<td class="info">self-text
<br>
<b>child-text</b>
</td>
I would like to extract self-text and child-text.
So far I am using this XPath expression:
.//td[contains(@class, 'info')]/text() | .//td[contains(@class, 'info')]/b/text()
Is there any simpler way to do this?
You can use the following XPath expression, which returns all text nodes that are not whitespace-only anywhere within the outer td element:
.//td[contains(@class, 'info')]//text()[normalize-space()]
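A quick way to try this is a small lxml sketch. I am assuming the attribute test is written `@class`, and I wrapped the snippet in a `<table>` and wrote `<br/>` so that it parses as XML; those details are mine, not from the question.

```python
from lxml import etree

doc = etree.fromstring(
    '<table><tr><td class="info">self-text\n'
    "<br/>\n"
    "<b>child-text</b>\n"
    "</td></tr></table>"
)

# //text() walks every descendant text node of the <td>;
# [normalize-space()] drops the ones that are only whitespace.
texts = doc.xpath(".//td[contains(@class, 'info')]//text()[normalize-space()]")

# The selected nodes still carry their surrounding newlines, so strip them.
cleaned = [t.strip() for t in texts]
print(cleaned)
```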

Xpath to select next parent of the current node

If a tr contains class="productnamecolor colors_productname", I want to select the next tr, which contains the price details. So I use:
.//a[@class="productnamecolor colors_productname"]/parent::node()/following-sibling::tr
But it didn't work. What is wrong with this expression?
HTML :
<tr>
<td valign="top" width="100%">
<a class="productnamecolor colors_productname">Trouser Suspenders</a>
</td>
</tr>
thanx in advance.
The parent of your <a> element is a td element, and the td element doesn't have a following-sibling - certainly not a following sibling that is a tr. If you want the next row in the table, use
.//a[@class="..."]/ancestor::tr[1]/following-sibling::tr[1]
or
.//tr[descendant::a/@class="..."]/following-sibling::tr[1]
If you just want the next tr after <a class="productnamecolor colors_productname">, either of the following works:
Using the following axis:
(.//a[@class="productnamecolor colors_productname"]/following::tr)[1]
Using the preceding axis:
(.//tr[preceding::a[@class="productnamecolor colors_productname"]])[1]
Hope it helps...:)
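Here is a rough lxml sketch of the approaches above. The table is hypothetical: the first row follows the question's markup with the `<a>` element the answers talk about, and the price row (with the value 19.99) is invented just to have something to select.

```python
from lxml import etree

# Hypothetical table: a product row followed by an invented price row.
doc = etree.fromstring(
    "<table>"
    '<tr><td valign="top" width="100%">'
    '<a class="productnamecolor colors_productname">Trouser Suspenders</a>'
    "</td></tr>"
    "<tr><td>19.99</td></tr>"
    "</table>"
)

link = './/a[@class="productnamecolor colors_productname"]'

# The question's attempt: parent::node() is the <td>, which has no <tr> siblings.
attempt = doc.xpath(link + "/parent::node()/following-sibling::tr")

# Climb to the enclosing row first, then take the next row.
next_row = doc.xpath(link + "/ancestor::tr[1]/following-sibling::tr[1]")

# Select the row by its descendant link, then take the next row.
by_row = doc.xpath(
    './/tr[descendant::a/@class="productnamecolor colors_productname"]'
    "/following-sibling::tr[1]"
)

# First <tr> anywhere after the link in document order.
by_following = doc.xpath("(" + link + "/following::tr)[1]")

price = next_row[0].findtext("td")
print(attempt, price)
```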

XPath get only first Parent of nested HTML

I am newbie in XPath. Can someone explain how to resolve this problem:
<table>
<tr>
<td>
<table>
<tr>
<td>
<table>
<tr>
<td>Label</td>
<td>value</td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
</table>
I try to get <tr> which contains Label value, but it does not work for me,
Here is my code :
//td[contains(.,'Label')]/ancestor::tr[1]
Desired result:
<tr>
<td>Label</td>
<td>value</td>
</tr>
Can someone help me ?
This expression matches the tr that you want:
//tr[contains(td/text(), 'Label')]
This scans all tr elements in the document and uses just a single predicate. The td/text() limits the test to actual text nodes which are grandchildren of the row (in XPath 1.0, contains() looks only at the first of them). If you just used td, the td's string value would be tested instead; that is the concatenation of all of its descendant text nodes, so the outer tr would match too.
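UPDATE: Also, for what it's worth, the ancestor step is not what breaks your expression. ancestor is a reverse axis, so inside the step ancestor::tr[1] is the nearest enclosing tr (ancestor::tr[last()] would be the outermost one). The problem is the first step: the string value of each outer td includes the text of the nested tables, so //td[contains(.,'Label')] matches a td at every nesting level, and each of them contributes its own nearest row. To make your approach work, restrict the test to the td's own text nodes and say
//td[contains(text(),'Label')]/ancestor::tr[1]
instead of
//td[contains(.,'Label')]/ancestor::tr[1]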
I had the same issue, except that the text 'Label' was sometimes in a nested span, or even further nested in the td. For example:
<td><span>Label</span></td>
The previous answer only finds 'Label' if it is in a text node that is a direct child of the td. This case is a bit harder because we need to search for a td that contains the text 'Label' in any of its descendants. Since the tds are nested, every td on the path qualifies as having a descendant that contains the text 'Label'. The only way I found to overcome this is to add a check that the td we select does not itself contain a td with the search text.
//td[contains(., 'Label') and not(.//td[contains(., 'Label')])]/ancestor::tr[1]
This says: give me all of the tds that have descendant text containing 'Label', but exclude any td that contains another td with descendant text containing 'Label' (the nesting ancestors). This returns the innermost td that contains the text. You can then go back to the tr that contains this td using the ancestor axis.
Also, if you just want the lowest table that contains text use this:
//table[contains(., 'Label') and not(.//table[contains(., 'Label')])]
or you can select the tr directly:
//tr[contains(., 'Label') and not(.//tr[contains(., 'Label')])]
This seems like a common problem, but I didn't see a solution anywhere. So, I decided to post to this old unanswered question in hopes that it helps somebody.
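If you want to see the difference between these expressions on the nested-table example, here is a small lxml sketch. The markup is the question's, with whitespace removed; the variable names are mine.

```python
from lxml import etree

doc = etree.fromstring(
    "<table><tr><td>"
    "<table><tr><td>"
    "<table><tr><td>Label</td><td>value</td></tr></table>"
    "</td></tr></table>"
    "</td></tr></table>"
)

# The question's expression: a <td> at each of the three nesting levels has
# 'Label' in its string value, so one row per level comes back.
all_rows = doc.xpath("//td[contains(., 'Label')]/ancestor::tr[1]")

# Keep only the <td> that has no nested <td> containing the text.
innermost = doc.xpath(
    "//td[contains(., 'Label') and not(.//td[contains(., 'Label')])]"
    "/ancestor::tr[1]"
)

# Variant from the first answer: test only text nodes directly inside the cells.
direct = doc.xpath("//tr[contains(td/text(), 'Label')]")

cells = [td.text for td in innermost[0]]
print(len(all_rows), cells)
```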

Selenium IDE with XPath to identify cell in table based on other column

Please take a look at the snippet of html below:
<tr class="clickable">
<td id="7b8ee8f9-b66f-4fba-83c1-4cf2827130b5" class="clickable">
<a class="editLink" href="#">Single</a>
</td>
<td class="clickable">£14.00</td>
</tr>
I'm trying to assert the value of td[2] when td[1] contains "Single". I've tried assorted variants of:
//td[2][(contains(text(),'£14.00'))]/../td[1][(contains(text(),'Single'))]
I've used similar notation elsewhere successfully - but to no avail here... I think it's down to td[1] having the nested element, but not sure.
Can someone enlighten as to what I'm getting wrong? :)
Cheers!
What about:
//tr[contains(td[1], "Single")]/td[2]
First select the <tr> containing the <td> matching the text, and then select td[2].
Then,
contains(//tr[contains(td[1], "Single")]/td[2], "£14.00")
should return True.
Or, closer to the expression you tried, you could test if this matches:
//tr[contains(td[1], "Single")]/td[2][contains(., "£14.00")]
See @JensErat's answer to "find xth td with td contains in same tr xpath python".
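The expressions from this answer can be tried with a short lxml sketch like the one below. Selenium IDE is not involved here; the `<table>` wrapper and the variable names are my own assumptions for the demo.

```python
from lxml import etree

doc = etree.fromstring(
    '<table><tr class="clickable">'
    '<td id="7b8ee8f9-b66f-4fba-83c1-4cf2827130b5" class="clickable">'
    '<a class="editLink" href="#">Single</a>'
    "</td>"
    '<td class="clickable">£14.00</td>'
    "</tr></table>"
)

# Pick the row by the string value of its first cell, then take the second cell.
price_cells = doc.xpath('//tr[contains(td[1], "Single")]/td[2]')

# The boolean form evaluates to a Python bool in lxml.
matches = doc.xpath('contains(//tr[contains(td[1], "Single")]/td[2], "£14.00")')

# The question's attempt: text() only sees text nodes directly inside td[1],
# but 'Single' sits inside the nested <a>, so nothing is selected.
attempt = doc.xpath(
    "//td[2][contains(text(), '£14.00')]/../td[1][contains(text(), 'Single')]"
)

print(price_cells[0].text, matches, attempt)
```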
Why not make it simple for yourself and do the if statement in your code? Pseudocode:
Select the top level tr.
Find the first td within the tr and check whether it contains Single.
If it does, assert that the second td contains £14.00.
Alternatively, you could just get the text of the top level tr and perform the checks on that text.

xpath nearest element to a given element

I am having trouble returning an element using xpath.
I need to get the text from the 2nd TD from a large table.
<tr>
<td>
<label for="PropertyA">Some text here </label>
</td>
<td> TEXT!! </td>
</tr>
I'm able to find the label element, but then I'm having trouble selecting the sibling TD to return the text.
This is how I select the label:
"//label[@for='PropertyA']"
thanks
You are looking for the following-sibling axis. It searches the siblings under the same parent, which here is the tr. If the tds aren't in the same tr, they won't be found; in that case you can use the following axis instead.
//td[label[@for='PropertyA']]/following-sibling::td[1]
From the label element, it should be:
//label[@for='PropertyA']/following::td[1]
And then use the DOM method from the hosting language to get the string value.
Or select the text node (something I do not recommend) with:
//label[@for='PropertyA']/following::td[1]/text()
Or, if there is only going to be this one node, you could use the string() function:
string(//label[@for='PropertyA']/following::td[1])
You can also select from the common ancestor tr like:
//tr[td/label/@for='PropertyA']/td[2]
Getting ANY following sibling element:
//td[label[@for='PropertyA']]/following-sibling::*
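To compare these options, here is a minimal lxml sketch against the row from the question. It assumes the attribute test is written `@for`; the `<table>` wrapper and the variable names are mine.

```python
from lxml import etree

doc = etree.fromstring(
    "<table><tr>"
    '<td><label for="PropertyA">Some text here </label></td>'
    "<td> TEXT!! </td>"
    "</tr></table>"
)

# From the <td> that holds the label: next sibling cell in the same row.
sibling = doc.xpath("//td[label[@for='PropertyA']]/following-sibling::td[1]")

# From the label itself: first <td> that starts after it in document order.
following = doc.xpath("//label[@for='PropertyA']/following::td[1]")

# From the common ancestor row: its second cell.
by_row = doc.xpath("//tr[td/label/@for='PropertyA']/td[2]")

# string() returns the cell's text directly instead of a node list.
value = doc.xpath("string(//label[@for='PropertyA']/following::td[1])")

print(value.strip())
```

All three element-returning expressions land on the same cell here; they would differ only if the value cell were in another row.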
