XPath help - select a node, then child nodes

I'm stumped on why and how to do this query.
My html structure is like this (tables nested inside tables):
<root>
<table>
</table>
<table>
<tr>
<td>
<table>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
</table>
</td>
</tr>
</table>
</root>
If I start out my xpath like:
var tables = blah.SelectNodes("//table");
That returns the 3 tables. I then want to select the td's from the 2nd tr like this:
var td = tables[2].SelectNodes("//tr[2]/td");
But, when I do this, it goes back to the parent/root, the "blah" level. Why is this, and how can I keep filtering my search results down?
Note: The example xml structure may not directly match the queries written, just trying to give a general idea...

Just keep extending the XPath.
This one returns the <tr> items (four of them) of the nested table:
//table/tr/td/table/tr
This one returns the second <tr> item:
//table/tr/td/table/tr[2]
Your best bet, though, is to give individual id attributes to each table, so that you can find it directly using that attribute.
Using something like this:
<root>
<table id="1">
</table>
<table id="2">
<tr>
<td>
<table id="3">
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
</table>
</td>
</tr>
</table>
</root>
You can select the innermost table with:
//table[@id="3"]
You can get an individual <td> item from that innermost table with:
//table/tr/td/table/tr[2]/td[1]
Assigning an id attribute makes it a little easier (note that the /tr/td steps after the first table are no longer needed):
//table[@id="3"]/tr[2]/td[1]
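As for the "why" in the question: in XPath, a path that begins with // is resolved from the document root, whichever node it is called on, while a path that begins with .// is resolved from the current node. Here is a minimal sketch of the relative form, assuming Python's standard xml.etree.ElementTree module as a stand-in for the C# library in the question. The cell texts (a1, b1, ...) are made up so that the result is visible, and ElementTree implements only a subset of XPath, so only the relative form is shown.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the nested-table markup from the question.
# The cell texts are illustrative only.
DOC = """
<root>
  <table></table>
  <table>
    <tr><td>
      <table>
        <tr><td>a1</td><td>a2</td></tr>
        <tr><td>b1</td><td>b2</td></tr>
        <tr><td>c1</td><td>c2</td></tr>
      </table>
    </td></tr>
  </table>
</root>
"""

root = ET.fromstring(DOC)

# All three tables, in document order (the nested one comes last).
tables = root.findall(".//table")

# The leading dot keeps the search inside the selected table
# instead of restarting from the top of the document.
inner = tables[2]
cells = inner.findall(".//tr[2]/td")
cell_texts = [td.text for td in cells]

print(len(tables), cell_texts)
```

In the C# snippet from the question, the equivalent change would be to write ".//tr[2]/td" instead of "//tr[2]/td" in the second SelectNodes call.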

Related

Protractor expect not exist in page

I have html page like this:
<table>
<thead>
<tr>
<td>title</td>
<td>desc</td>
<td>status</td>
</tr>
</thead>
<tbody>
<tr>
<td><label>lorem1</label></td>
<td><label>desc1 lorem</label></td>
<td><label>active</label></td>
<td> Delete </td>
</tr>
<tr>
<td><label>lorem2</label></td>
<td><label>desc2 lorem</label></td>
<td><label>active</label></td>
<td> Delete </td>
</tr>
<tr>
<td><label>lorem3</label></td>
<td><label>desc3 lorem</label></td>
<td><label>deactive</label></td>
<td> Delete </td>
</tr>
</tbody>
</table>
Now I delete the record lorem2 from the list above (by clicking its delete link), and after that I want to check that lorem2 no longer exists anywhere in the page.
I wrote this code, but it's not correct:
expect(element(by.css("table")).getText()).not.toBe('lorem2');
Locate the lorem2 row with an XPath expression.
Use this one to find the link to click for deletion:
//tr/td//label[contains(text(),"lorem2")]/following::td/a
Use this one to check whether the record still exists after deletion:
//tr/td//label[contains(text(),"lorem2")]
You should parameterize the XPath so that the text lorem2 can be swapped for other values.
expect(element(by.xpath('//tr/td//label[contains(text(),"lorem2")]')).isPresent()).toBe(false);
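The parameterized "does a label with this text still exist" check can be sketched outside Protractor as well. The following is only an illustration in Python, assuming the standard xml.etree.ElementTree module; the helper name has_label is hypothetical and the table is a cut-down copy of the one in the question after lorem2 has been deleted.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the table from the question, with the lorem2 row removed.
TABLE = """
<table>
  <tbody>
    <tr><td><label>lorem1</label></td><td><label>active</label></td></tr>
    <tr><td><label>lorem3</label></td><td><label>deactive</label></td></tr>
  </tbody>
</table>
"""

def has_label(table_xml: str, text: str) -> bool:
    """Return True if any <label> directly inside a <td> contains `text`."""
    table = ET.fromstring(table_xml)
    # Mirrors //tr/td//label[contains(text(), ...)] with the text as a parameter.
    return any(text in (label.text or "") for label in table.iterfind(".//td/label"))

print(has_label(TABLE, "lorem2"), has_label(TABLE, "lorem1"))
```

The original expectation failed because it compared the text of the whole table with 'lorem2' for equality, which is never equal whether or not the row is present.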

I want to grab the nested tag using XPath in Scrapy

I am working on a problem with Scrapy and XPath where I need to grab a nested tag. Assume markup like this:
<table>
<tbody>
<tr>
<td>
<table>
<tbody>
<tr><td></td><td></td></tr>
<tr><td></td><td></td></tr>
<tr><td></td><td></td></tr>
</tbody>
</table>
</td>
</tr>
<tr>
<td>
<table>
<tbody>
<tr><td></td><td></td></tr>
<tr><td></td><td></td></tr>
<tr><td></td><td></td></tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
I only want to select the nested tr elements (<tr><td></td><td></td></tr>). How should I write the XPath for this?
To get all tr elements that have td children but no table grandchildren, use the XPath expression //tr[td][not(td/table)].
Alternatively:
//tr/td[2]/..
This selects the second td in each tr and then steps up one level to the parent tr. It works here because only the nested rows have a second td.
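The filter behind //tr[td][not(td/table)] can be checked with a short script. This is a sketch that assumes Python's standard xml.etree.ElementTree rather than Scrapy's own selectors; ElementTree does not support not(), so the predicate is written as a list comprehension, and the cell texts are invented for the demonstration.

```python
import xml.etree.ElementTree as ET

# Smaller version of the markup in the question; cell texts are illustrative.
DOC = """
<table><tbody>
  <tr><td>
    <table><tbody>
      <tr><td>1</td><td>2</td></tr>
      <tr><td>3</td><td>4</td></tr>
    </tbody></table>
  </td></tr>
  <tr><td>
    <table><tbody>
      <tr><td>5</td><td>6</td></tr>
    </tbody></table>
  </td></tr>
</tbody></table>
"""

root = ET.fromstring(DOC)

# Same idea as //tr[td][not(td/table)]: keep rows that have a td child
# but no table nested directly inside any of their td children.
nested_rows = [
    tr for tr in root.iter("tr")
    if tr.find("td") is not None and tr.find("td/table") is None
]
first_cells = [tr.find("td").text for tr in nested_rows]

print(first_cells)
```

In Scrapy itself the XPath string can be passed to response.xpath() unchanged.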

Nokogiri ruby: Iterate over table rows with no class name

I want to iterate over each row of a table.
This is the relevant source code showing 6 table rows in total.
3 of them have no class name and 3 others do, the ... represent some attributes.
<tbody>
<tr> … </tr>
<tr class="even"> … </tr>
<tr> … </tr>
<tr class="even"> … </tr>
<tr> … </tr>
<tr class="even"> … </tr>
</tbody>
Assuming that doc is a Nokogiri::HTML::Document, the following code yields only 3 tr elements instead of 6: only those with class="even".
doc.css('#main_result table tbody tr').each do |tr|
  p tr
end
How can I get an array of all tr elements so that I can iterate over them?
This actual HTML can be found on the following link:
http://www.motogp.com/en/Results+Statistics/1949/TT/500cc/RAC
I don't really know how to paste the source code nicely... sorry
The HTML in that page is malformed and is missing some <tr> tags. It actually looks something like this:
<tbody>
<td></td>
...
</tr>
<tr class="even">
<td></td>
...
</tr>
<td></td>
...
</tr>
<tr class="even">
<td></td>
...
</tr>
<td></td>
...
</tr>
<tr class="even">
<td></td>
...
</tr>
</tbody>
Note how only the tr tags with class="even" are present; the others are missing. Nokogiri therefore only sees three rows when parsing the page.
One possible solution to this could be to use Nokogumbo, which adds Google’s Gumbo HTML5 parser to Nokogiri, and better handles and corrects malformed HTML like this:
require 'nokogumbo' # install the gem first
doc = Nokogiri.HTML5(the_page)
puts doc.css('#main_result table tbody tr').size
# should now be 6 rather than 3

XPath - Get a list of nodes which has 3 children with specific tags

What is the xpath to use if I want to get the nodes that have a certain number of child nodes of a tag type?
<table>
<tr>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<p></p>
</tr>
<tr>
<td></td>
<p></p>
</tr>
</table>
For example, in the markup above, I want to get <tr> tags that have 3 <td> children. The xpath should return the 1st and 3rd <tr>.
You could try a predicate based on the count() function, for example:
/table/tr[count(td)=3]
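To see which rows the count(td)=3 predicate picks, here is a rough equivalent in Python. It assumes the standard xml.etree.ElementTree module, which has no count() function, so the child count is done with len() instead.

```python
import xml.etree.ElementTree as ET

# The markup from the question, written compactly.
DOC = """
<table>
  <tr><td/><td/><td/></tr>
  <tr><td/></tr>
  <tr><td/><td/><td/><p/></tr>
  <tr><td/><p/></tr>
</table>
"""

table = ET.fromstring(DOC)
rows = table.findall("tr")

# 1-based positions of the rows with exactly three <td> children.
# Other children, such as <p>, do not affect the count.
matching = [i for i, tr in enumerate(rows, start=1) if len(tr.findall("td")) == 3]

print(matching)
```

Only td children are counted, so the third row still matches even though it also contains a p element.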

tfooter doesn't validate for xhtml?

I had my webpage validated for xhtml transitional till I added this table (see below). Since then it doesn't validate and says "
document type does not allow element "tfoot" here <tfoot>
The element named above was found in a context where it is not allowed. This could mean that you have incorrectly nested elements -- such as a "style" element in the "body" section instead of inside "head" -- or two elements that overlap (which is not allowed).
One common cause for this error is the use of XHTML syntax in HTML documents. Due to HTML's rules of implicitly closed elements, this error can create cascading effects. For instance, using XHTML's "self-closing" tags for "meta" and "link" in the "head" section of a HTML document may cause the parser to infer the end of the "head" section and the beginning of the "body" section (where "link" and "meta" are not allowed; hence the reported error)."
Any ideas as what is happening? I checked for any opened and not closed tags but did not find any so I don't know what else is wrong.
<table>
<caption>
My first table, Anna
</caption>
<thead>
<tr>
<th>
June
</th>
<th>
July
</th>
<th>
August
</th>
</tr>
</thead>
<tbody>
<tr>
<td>
Data 1
</td>
<td>
Data 2
</td>
<td>
Data 3
</td>
<td>
Data 4
</td>
</tr>
<tr>
<td>
Data a
</td>
<td>
Date b
</td>
<td>
Data c
</td>
<td>
Data d
</td>
</tr>
<tfoot>
<tr>
<td>
Result1
</td>
</tr>
</tfoot>
</tbody>
</table>
You've got the <tfoot> at the end of the table. It should be between the <thead> and the <tbody>. It will appear at the bottom, but it's coded at the top. One of the original ideas is that as a large table loaded, the heading and footer would be visible quickly, with the rest filling in (esp. useful if the body was scrollable between them). It hasn't quite worked out like that in practice, but it does make more sense if you know that.
In the DTD it lists:
<!ELEMENT table (caption?, (col*|colgroup*), thead?, tfoot?, (tbody+|tr+))>
That is, optional caption, then zero-or-more col or colgroup, then optional thead, then optional tfoot, then at least one tbody or tr.
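The content model quoted above can be turned into a quick order check. This is only a sketch in Python, not a validator: it looks at the direct children of a table and nothing else, and the names TABLE_MODEL and table_children_valid are made up for the example.

```python
import re
import xml.etree.ElementTree as ET

# The XHTML 1.0 content model for <table>, rewritten as a regular
# expression over the child tag names (each name followed by a space).
TABLE_MODEL = re.compile(
    r"(caption )?((col )*|(colgroup )*)(thead )?(tfoot )?((tbody )+|(tr )+)"
)

def table_children_valid(table_xml: str) -> bool:
    """Check only the order of the direct children of a <table> element."""
    table = ET.fromstring(table_xml)
    sequence = "".join(child.tag + " " for child in table)
    return TABLE_MODEL.fullmatch(sequence) is not None

tfoot_before_tbody = "<table><thead/><tfoot/><tbody/></table>"
tfoot_after_tbody = "<table><thead/><tbody/><tfoot/></table>"

print(table_children_valid(tfoot_before_tbody), table_children_valid(tfoot_after_tbody))
```

A tfoot nested inside tbody, as in the question, is a separate problem that this check does not look at: the DTD allows only tr elements inside tbody.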
UPDATE: Note that HTML5 now allows the <tfoot> to be placed at the end of the table, instead of before the first <tbody> (or the first <tr> that isn't in a <thead>, <tfoot> or <tbody> and hence in a single implicit <tbody>). As such, a <tfoot> placed after the last <tbody> would now be considered valid. The markup in the question still nests the <tfoot> inside the <tbody>, though, which is not allowed either way. The older approach is also still valid, and probably advisable.
The tfoot element should be outside of the tbody element, like this:
<table>
<caption>
My first table, Anna
</caption>
<thead>
<tr>
<th>
June
</th>
<th>
July
</th>
<th>
August
</th>
</tr>
</thead>
<tfoot>
<tr>
<td>
Result1
</td>
</tr>
</tfoot>
<tbody>
<tr>
<td>
Data 1
</td>
<td>
Data 2
</td>
<td>
Data 3
</td>
<td>
Data 4
</td>
</tr>
<tr>
<td>
Data a
</td>
<td>
Date b
</td>
<td>
Data c
</td>
<td>
Data d
</td>
</tr>
</tbody>
</table>
Here is a small example of the correct nesting for those who need it.
<table>
<caption></caption>
<thead>
<tr>
<th></th>
</tr>
</thead>
<tfoot>
<tr>
<td></td>
</tr>
</tfoot>
<tbody>
<tr>
<td></td>
</tr>
</tbody>
</table>
