How to get td text value of nested table by XPath - xpath

I have this table inside another table inside another table and so on. And then I want to get the text value of the td element with a specific class.
<tr>
<td width="5%"></td>
<td class="wintxt">The XML ....<br/><br/>Number: xyz</td>
</tr>
I need to get the text content "The XML ....Number: xyz"
I tried using:
List<?> submissionString = resultOfsubmissionPage.getByXPath("//tr[#class=\"wintxt\"]/td/text()");
...and many other variations but I always get a zero element List. Anyone has a clue?

There is mistakes with your provided xpath you are searching text() in that row means tr which has class attribute but as your provided HTML only one td has class attribute. So try as below :-
List<?> submissionString = resultOfsubmissionPage.getByXPath("//tr/td[#class='wintxt']/text()");
Hope it helps..:)

Related

How do you select two different tags via xpath, both at different levels, when one of them is optional

I have a situation where the data is a mix of these format on the same page. In other words, some rows will show up as:
some lengthy XPATH_X uptill here:
<td/>
<td>
I Need this element td
</td>
<td/>
<td/>
<td/>
<td/>
and a few other rows will show in this format:
the same lengthy XPATH_X uptill here:
<td/>
<td>
<span>
I Need this element span
</span>
</td>
<td/>
<td/>
<td/>
<td/>
Please note that there are no differentiating attributes for each of the td tags. I need to select the second row (td) in both the cases.
I'm trying to catch both of the elements using the following xpath:
XPATH_X/*[self::td[position()=2] or self::td[position()=2]/span]
I tried this out on the page but for some reason it doesn't select anything.
Can someone please help me out with this? I've spent more than 2 hours on this already.
You should try XPATH_X/td[2]//text() to retrieve the text whether it's at the root of the td or in a child tag
You can test it here ; in this test I retrieve three results :
the text inside a span inside a td
the text at the root of a td
both the texts at the root of a td and inside the enclosed span (if this doesn't work for you and the text of the td should be retrieved only if there's no span, use XPATH_X/td[position()=2 and not(./span)]/text() | XPATH_X/td[2]/span/text() instead)
To retrieve the elements containing text nodes rather than the text node themselves, you can use the following :
XPATH_X/td[2]//self::node()[text()]

XPath get only first Parent of nested HTML

I am newbie in XPath. Can someone explain how to resolve this problem:
<table>
<tr>
<td>
<table>
<tr>
<td>
<table>
<tr>
<td>Label</td>
<td>value</td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
</table>
I try to get <tr> which contains Label value, but it does not work for me,
Here is my code :
//td[contains(.,'Label')]/ancestor::tr[1]
Desired result:
<tr>
<td>Label</td>
<td>value</td>
</tr>
Can someone help me ?
This expression matches the tr that you want:
//tr[contains(td/text(), 'Label')]
Like yours, this starts by scanning all tr elements in the document, but this version uses just a single predicate. The td/text() limits the test to actual text nodes which are grandchildren of the row. If you just used td, then all of the td's descendant text nodes would be collected and concatenated, and the outer tr would match.
UPDATE: Also, for what it's worth, the reason your expression isn't working is that the ancestor axis returns elements in document order, not "outward" from the point of the context node. This is something I've run into myself, as it is somewhat unintuitive. To make your approach work, you would need to say
//td[contains(.,'Label')]/ancestor::tr[last()]
instead of
//td[contains(.,'Label')]/ancestor::tr[1]
I had the same issue, except that the text 'Label' was sometimes in a nested span, or even further nested in the td. For example:
<td><span>Label</span></td>
The previous answer only finds 'Label' if it is in a text element that is a direct child of the td. This issue is a bit harder because we need to search for a td that contains the text 'Label' in any of its children. Since the tds are nested, all tds qualify as having a descendant that contains the text 'Label'. So, the only way I found to overcome this is to add a check that makes sure that the td we select does not contain a td with the search text.
//td[contains(., 'Label') and not(.//td[contains(., 'Label')])]/ancestor::tr[1]
This says give me all of the tds that have a decedent text containing 'Label', but exclude all tds that contain a td that has a decedent text containing 'Label' (nesting ancestors). This returns the child most td that contains the text. Then you can go back to the tr that contains this td using ancestor.
Also, if you just want the lowest table that contains text use this:
//table[contains(., 'Label') and not(.//table[contains(., 'Label')])]
or you can select the tr directly:
//tr[contains(., 'Label') and not(.//tr[contains(., 'Label')])]
This seems like a common problem, but I didn't see a solution anywhere. So, I decided to post to this old unanswered question in hopes that it helps somebody.

Selenium IDE with XPath to identify cell in table based on other column

Please take a look at the snippet of html below:
<tr class="clickable">
<td id="7b8ee8f9-b66f-4fba-83c1-4cf2827130b5" class="clickable">
<a class="editLink" href="#">Single</a>
</td>
<td class="clickable">£14.00</td>
</tr>
I'm trying to assert the value of td[2] when td[1] contains "Single". I've tried assorted variants of:
//td[2][(contains(text(),'£14.00'))]/../td[1][(contains(text(),'Single'))]
I've used similar notation elsewhere successfully - but to no avail here... I think it's down to td[1] having the nested element, but not sure.
Can someone enlighten as to what I'm getting wrong? :)
Cheers!
What about:
//tr[contains(td[1], "Single")]/td[2]
First select the <tr> containing the <td> matching the text, and then select td[2].
Then,
contains(//tr[contains(td[1], "Single")]/td[2], "£14.00")
should return True.
Or, closer to the expression you tried, you could test if this matches:
//tr[contains(td[1], "Single")]/td[2][contains(., "£14.00")]
See #JensErat's answer to find xth td with td contains in same tr xpath python .
Why not make it simple on yourself, do the if statement in your code. Psuedocode:
Select the top level tr.
Find first td within tr, check to see if it contains Single.
If it does, assert that it contains £14.00
Alternatively, you could just get the text of the top level tr and perform the checks on that text.

Parse a HTML table using Ruby, Nokogiri omitting the column headers

I have trouble parsing a HTML table using Nokogiri and Ruby. My HTML table structure looks like this
<table>
<tbody>
<tr>
<td>Firstname</td>
<td>Lastname</td>
<td>Middle</td>
</tr>
<tr>
<td>ding</td>
<td>dong</td>
<td>ling</td>
</tr>
....
....
.... {more tr's and td's with similar data exists.}
....
....
....
....
....
</tbody>
</table>
In the above HTML table I would like to entirely remove the first and corresponding elements, so remove Firstname, Lastname and Middle i.e., I want to start stripping the text only from the second . So this way I get only the contents of the table from the second or tr[2] and no column headers.
Can someone please provide me a code as to how to do this.
Thanks.
require 'rubygems'
require 'nokogiri'
doc = Nokogiri::HTML(x)
rows = doc.xpath('//table/tbody/tr[position() > 1]')
# OR
rows = doc.xpath("//table/tbody/tr")
header = rows.shift
After you've run either one of the above 2 snippets, rows will contain every <tr>...</tr> after the first one. For example puts rows.to_xml prints the following:
<tr><td>ding</td>
<td>dong</td>
<td>ling</td>
</tr>
To get the inner text, removing all the html tags, run puts rows.text
ding
dong
ling
To get the inner text of the td tags only, run rows.xpath('td').map {|td| td.text }
["ding", "dong", "ling"]
Alternatively:
table.css('tr')[1..-1]
or to strip out the text starting at row 2:
table.css('tr')[1..-1].map{|tr| tr.css('td').map &:text}
Since Nokogiri does support :has CSS pseudo-class you can get heading row with
#doc.at_css('table#table_id').css('tr:has(th)')
and since it does supports :not CSS pseudo-class as well, you can get other rows with
#doc.at_css('table#table_id').css('tr:not(:has(th))')
respectively. Depending on your preferences you might like to avoid negation and just use css('tr:has(td)').

xpath nearest element to a given element

I am having trouble returning an element using xpath.
I need to get the text from the 2nd TD from a large table.
<tr>
<td>
<label for="PropertyA">Some text here </label>
</td>
<td> TEXT!! </td>
</tr>
I'm able to find the label element, but then I'm having trouble selecting the sibling TD to return the text.
This is how I select the label:
"//label[#for='PropertyA']"
thanks
You are looking for the axes following-sibling. It searches in the siblings in the same parent - there it is tr. If the tds aren't in the same tr then they aren't found. If you want to it then you can use axes following.
//td[label[#for='PropertyA']]/following-sibling::td[1]
From the label element, it should be:
//label[#for='PropertyA']/following::td[1]
And then use the DOM method from the hosting language to get the string value.
Or select the text node (something I do not recommend) with:
//label[#for='PropertyA']/following::td[1]/text()
Or if there's going to be just this one only node, then you could use the string() function:
string(//label[#for='PropertyA']/following::td[1])
You can also select from the common ancestor tr like:
//tr[td/label/#for='PropertyA']/td[2]
Getting ANY following element:
//td[label[#for='PropertyA']]/following-sibling::*

Resources