Nokogiri ruby: Iterate over table rows with no class name

I want to iterate over each row of a table.
This is the relevant source code showing 6 table rows in total.
Three of them have no class name and the other three do; the … stands in for some omitted markup.
<tbody>
<tr> … </tr>
<tr class="even"> … </tr>
<tr> … </tr>
<tr class="even"> … </tr>
<tr> … </tr>
<tr class="even"> … </tr>
</tbody>
Assuming that doc is a Nokogiri::HTML::Document, the following code yields only 3 tr elements instead of 6: it returns only the tr elements that have class="even".
doc.css('#main_result table tbody tr').each do |tr|
p tr
end
How can I get an array of all tr elements so that I can iterate over them?
The actual HTML can be found at the following link:
http://www.motogp.com/en/Results+Statistics/1949/TT/500cc/RAC
I don't really know how to paste the source code nicely... sorry

The HTML in that page is malformed and is missing some opening <tr> tags. It actually looks something like this:
<tbody>
<td></td>
...
</tr>
<tr class="even">
<td></td>
...
</tr>
<td></td>
...
</tr>
<tr class="even">
<td></td>
...
</tr>
<td></td>
...
</tr>
<tr class="even">
<td></td>
...
</tr>
</tbody>
Note how only the tr tags with class="even" are present; the others are missing. Nokogiri therefore sees only three rows when parsing the page.
One possible solution is to use Nokogumbo, which adds Google's Gumbo HTML5 parser to Nokogiri and is better at handling and correcting malformed HTML like this:
require 'nokogumbo' # install the gem first
doc = Nokogiri.HTML5(the_page)
puts doc.css('#main_result table tbody tr').size
# should now be 6 rather than 3
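If adding another parser is not an option, a rougher alternative (not part of the answer above) is to repair the markup as a string before handing it to Nokogiri. The sketch below assumes the only defect is the one described here: an opening <tr> is missing wherever a <td> directly follows <tbody> or </tr>. The helper name restore_missing_tr and the sample markup are made up for illustration, and a regex like this will not cope with other kinds of breakage.

```ruby
# Hypothetical helper: re-insert a <tr> wherever a <td> directly follows
# <tbody ...> or </tr>. This is the only defect assumed in this sketch.
def restore_missing_tr(html)
  html.gsub(%r{(<tbody[^>]*>|</tr>)(\s*)(?=<td[\s>])}i) do
    "#{Regexp.last_match(1)}#{Regexp.last_match(2)}<tr>"
  end
end

# Sample shaped like the malformed page: every other opening <tr> is missing.
broken = <<~HTML
  <tbody>
  <td>1</td>
  </tr>
  <tr class="even">
  <td>2</td>
  </tr>
  <td>3</td>
  </tr>
  <tr class="even">
  <td>4</td>
  </tr>
  <td>5</td>
  </tr>
  <tr class="even">
  <td>6</td>
  </tr>
  </tbody>
HTML

repaired = restore_missing_tr(broken)
puts repaired.scan(/<tr[\s>]/i).size # prints 6

# The repaired string can then be parsed as in the question, for example:
# doc = Nokogiri::HTML(restore_missing_tr(the_page))
```

Markup that already has all of its <tr> tags passes through unchanged, so the helper can be applied to every page without checking first.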

Related

I want to grab the nested tag using XPath in Scrapy

I am working on a problem with Scrapy and XPath where I need to grab a nested tag. Assume markup like this:
<table>
<tbody>
<tr>
<td>
<table>
<tbody>
<tr><td></td><td></td></tr>
<tr><td></td><td></td></tr>
<tr><td></td><td></td></tr>
</tbody>
</table>
</td>
</tr>
<tr>
<td>
<table>
<tbody>
<tr><td></td><td></td></tr>
<tr><td></td><td></td></tr>
<tr><td></td><td></td></tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
I only want to select the nested tr elements (<tr><td></td><td></td></tr>). How should I write the XPath for this?

To get all tr elements that have td children but no table grandchildren, use the XPath expression //tr[td][not(td/table)].
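The question is about Scrapy, but the expression itself is plain XPath 1.0, so it can be checked with any XPath engine. Here is a minimal sketch using Ruby's REXML library; the sample document is an assumption that mirrors the structure in the question (two outer rows, each wrapping a nested table with three rows).

```ruby
require 'rexml/document'

# Hypothetical sample mirroring the question: two outer rows, each wrapping
# a nested table that has three rows of its own.
inner = '<table><tbody>' + '<tr><td/><td/></tr>' * 3 + '</tbody></table>'
outer = '<table><tbody>' + "<tr><td>#{inner}</td></tr>" * 2 + '</tbody></table>'
doc = REXML::Document.new(outer)

# Every row, outer and nested alike.
all_rows = REXML::XPath.match(doc, '//tr')
# Rows that have td children but no table inside any of those td elements.
nested_rows = REXML::XPath.match(doc, '//tr[td][not(td/table)]')

puts all_rows.size    # prints 8
puts nested_rows.size # prints 6
```

The two outer rows are dropped because each of them has a td that contains a table.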

//tr/td[2]/..
This selects the second td in each tr and then steps one level up to the tr element itself.

Xpath help - select a node, then child nodes

I'm stumped on why and how to do this query.
My html structure is like this (tables nested inside tables):
<root>
<table>
</table>
<table>
<tr>
<td>
<table>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
</table>
</td>
</tr>
</table>
</root>
If I start out my xpath like:
var tables = blah.SelectNodes("//table");
which returns the 3 tables, and then I want to select the td elements from the 2nd tr like this:
var td = tables[2].SelectNodes("//tr[2]/td");
But, when I do this, it goes back to the parent/root, the "blah" level. Why is this, and how can I keep filtering my search results down?
Note: The example xml structure may not directly match the queries written, just trying to give a general idea...

Just keep extending the XPath.
This one returns the <tr> items (four of them) of the second table:
/table/tr/td/table/tr
This one returns the second <tr> item:
/table/tr/td/table/tr[2]
Your best bet, though, is to give individual id attributes to each table, so that you can find it directly using that attribute.
Using something like this:
<root>
<table id="1">
</table>
<table id="2">
<tr>
<td>
<table id="3">
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
</table>
</td>
</tr>
</table>
</root>
You can get the items in the innermost table with:
//table[@id="3"]
You can get an individual <td> item from that innermost table with:
//table/tr/td/table/tr[2]/td[1]
Assigning an id attribute makes it a little easier (note missing /tr/td items after the first table):
//table[@id="3"]/tr[2]/td[1]
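As for why the second query in the question jumps back to the root: in XPath, an expression that starts with // is always evaluated from the document root, regardless of which node it is called on. Starting the expression with a dot (.//) keeps it relative to the current node. The sketch below shows this with Ruby's REXML rather than the C# API from the question; the two-table document and its cell text are assumptions made up so that the results are easy to tell apart.

```ruby
require 'rexml/document'

# Hypothetical document: two sibling tables, with cell text added so the
# results can be told apart.
xml = <<~XML
  <root>
    <table id="1"><tr><td>a1</td></tr><tr><td>a2</td></tr></table>
    <table id="2"><tr><td>b1</td></tr><tr><td>b2</td></tr></table>
  </root>
XML
doc = REXML::Document.new(xml)

tables = REXML::XPath.match(doc, '//table')
second = tables[1]

# A leading // restarts at the document root, even when called on a node,
# so cells from other tables can show up in the result.
absolute = REXML::XPath.match(second, '//tr[2]/td').map(&:text)

# A leading . keeps the search inside the context node.
relative = REXML::XPath.match(second, './/tr[2]/td').map(&:text)

# An id predicate jumps straight to one table from anywhere in the document.
by_id = REXML::XPath.first(doc, '//table[@id="2"]/tr[2]/td[1]').text

p absolute
p relative
p by_id
```

In this sample the relative query and the id-based query both stay within the second table, while the absolute query also picks up the second row of the first table.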

Combine th:each with templating element using Thymeleaf

I have a list of 'product' which I want to show as a list of row table using an html template.
The html template looks like:
<tr th:fragment="productTemplate">
<td th:text="${productName}">product name</td>
<td th:text="${productPrice}">product price</td>
</tr>
Here is what I did:
<table>
<tr th:each="product : ${products}" th:substituteby="product :: productTemplate" th:with="productName=*{name}, productPrice=*{price}" />
</table>
If I use th:include, a tr ends up nested inside each tr.
If I use th:substituteby, the substitution takes priority over th:each.
I can't find a way to replace my loop items with another element.
Does anybody have a solution for this?

I got it:
<table>
<tr th:each="product : ${products}" th:include="product :: productTemplate"
th:with="productName=${product.name}, productPrice=${product.price}"
th:remove="tag" />
</table>
And here we can keep the template class on the tr element (that is what I wanted):
<tbody th:fragment="productTemplate">
<tr class="my-class">
<td th:text="${productName}">product name</td>
<td th:text="${productPrice}">product price</td>
</tr>
</tbody>
Here's the result:
<table>
<tr class="my-class">
<td>Lettuce</td>
<td>12.0</td>
</tr>
<tr class="my-class">
<td>Apricot</td>
<td>8.0</td>
</tr>
</table>
Thanks to danielfernandez from the official Thymeleaf forum.

th:include is what you are looking for. The code below works for me. I prefer to put multiple fragments in one file so I've included that here.
<table>
<tr th:each="product : ${products}" th:include="/fragments/productFragment :: productRow" />
</table>
...
/fragments/productFragment.html
...
<tr th:fragment="productRow">
<td th:text="${product.productName}">product name</td>
<td th:text="${product.productPrice}">product price</td>
</tr>
...

Element 'tr' cannot be nested within element 'tr'

I have a partial view like this:
@model List<user>
@foreach (var user in Model)
{
<tr>
<td>@user.name</td>
<td>...</td>
</tr>
}
And get an error like this:
Validation (HTML5): Element 'tr' cannot be nested within element 'tr'.
It's annoying me more than it should, but I want to get rid of it. Installing Web Standards Update didn't help. Any ideas?
Edit
This is the main view:
<table>
<thead>
<tr>
<th>@i18n.name</th>
<th>...</th>
</tr>
</thead>
<tbody id="results">
@Html.Partial("list_rows", @Model.users)
</tbody>
</table>
This is the generated HTML:
<table>
<thead>
<tr>
<th>naam</th>
<th>...</th>
</tr>
</thead>
<tbody id="results">
<tr>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
</tr>
</tbody>
</table>
Edit: Running the entire page through the W3C validator gives
This document was successfully checked as HTML5!

This error appears when you open a <tr> element before you loop through your model. So far the code you posted is correct and free of errors.

Just make sure that your code looks something like this:
<table>
@foreach (var user in Model)
{
<tr>
<td>@user.name</td>
<td>...</td>
</tr>
}
</table>
It seems like you already have an open tr tag in which you are trying to add more tr tags. If you already have tr tags in your table, just make sure they are all closed before the loop starts:
<tr>..</tr>

tfooter doesn't validate for xhtml?

My webpage validated as XHTML Transitional until I added this table (see below). Since then it doesn't validate and says "
document type does not allow element "tfoot" here <tfoot>
The element named above was found in a context where it is not allowed. This could mean that you have incorrectly nested elements -- such as a "style" element in the "body" section instead of inside "head" -- or two elements that overlap (which is not allowed).
One common cause for this error is the use of XHTML syntax in HTML documents. Due to HTML's rules of implicitly closed elements, this error can create cascading effects. For instance, using XHTML's "self-closing" tags for "meta" and "link" in the "head" section of a HTML document may cause the parser to infer the end of the "head" section and the beginning of the "body" section (where "link" and "meta" are not allowed; hence the reported error)."
Any ideas as to what is happening? I checked for tags that were opened and never closed but did not find any, so I don't know what else is wrong.
<table>
<caption>
My first table, Anna
</caption>
<thead>
<tr>
<th>
June
</th>
<th>
July
</th>
<th>
August
</th>
</tr>
</thead>
<tbody>
<tr>
<td>
Data 1
</td>
<td>
Data 2
</td>
<td>
Data 3
</td>
<td>
Data 4
</td>
</tr>
<tr>
<td>
Data a
</td>
<td>
Date b
</td>
<td>
Data c
</td>
<td>
Data d
</td>
</tr>
<tfoot>
<tr>
<td>
Result1
</td>
</tr>
</tfoot>
</tbody>
</table>

You've got the <tfoot> at the end of the table, nested inside the <tbody>. It should be between the <thead> and the <tbody>. It will appear at the bottom, but it's coded at the top. One of the original ideas was that as a large table loaded, the heading and footer would be visible quickly, with the rest filling in (especially useful if the body was scrollable between them). It hasn't quite worked out like that in practice, but the ordering makes more sense once you know that.
In the DTD it lists:
<!ELEMENT table (caption?, (col*|colgroup*), thead?, tfoot?, (tbody+|tr+))>
That is, optional caption, then zero-or-more col or colgroup, then optional thead, then optional tfoot, then at least one tbody or tr.
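To make the ordering rule concrete, here is a rough sketch in Ruby (using REXML) that turns the content model above into a regular expression over the names of an element's direct children. The `tbody` model, `(tr)+`, is also taken from the XHTML 1.0 DTD. The helper name `invalid_elements` and the two sample tables are made up for illustration, and this is in no way a substitute for a real validator.

```ruby
require 'rexml/document'

# XHTML 1.0 content models for <table> and <tbody>, rewritten as regexes
# over a string of child element names, each followed by a space.
CONTENT_MODELS = {
  'table' => /\A(caption )?((col )*|(colgroup )*)(thead )?(tfoot )?((tbody )+|(tr )+)\z/,
  'tbody' => /\A(tr )+\z/
}.freeze

# Hypothetical helper: names of the elements whose direct children break
# the content model listed for them above.
def invalid_elements(xml)
  doc = REXML::Document.new(xml)
  offenders = REXML::XPath.match(doc, '//*').select do |element|
    model = CONTENT_MODELS[element.name]
    next false unless model

    child_names = element.elements.to_a.map { |child| "#{child.name} " }.join
    !model.match?(child_names)
  end
  offenders.map(&:name)
end

# tfoot nested inside tbody, as in the question.
nested = '<table><thead><tr><th/></tr></thead>' \
         '<tbody><tr><td/></tr><tfoot><tr><td/></tr></tfoot></tbody></table>'
# tfoot moved between thead and tbody.
fixed = '<table><thead><tr><th/></tr></thead>' \
        '<tfoot><tr><td/></tr></tfoot><tbody><tr><td/></tr></tbody></table>'

p invalid_elements(nested) # prints ["tbody"]
p invalid_elements(fixed)  # prints []
```

For the table in the question the check flags the tbody rather than the table, because the tfoot sits inside the tbody, whose content model only allows tr children.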
UPDATE: Note that HTML 5 now allows one to put the <tfoot> at the end of the table, instead of before the first <tbody> (or the first <tr> that isn't in a <thead>, <tfoot> or <tbody> and hence in a single implicit <tbody>). As such the code in the question would now be considered valid. The older approach is also still valid, and probably advisable.

The tfoot element should be outside of the tbody element, like this:
<table>
<caption>
My first table, Anna
</caption>
<thead>
<tr>
<th>
June
</th>
<th>
July
</th>
<th>
August
</th>
</tr>
</thead>
<tfoot>
<tr>
<td>
Result1
</td>
</tr>
</tfoot>
<tbody>
<tr>
<td>
Data 1
</td>
<td>
Data 2
</td>
<td>
Data 3
</td>
<td>
Data 4
</td>
</tr>
<tr>
<td>
Data a
</td>
<td>
Date b
</td>
<td>
Data c
</td>
<td>
Data d
</td>
</tr>
</tbody>
</table>
Here is a small example of the correct nesting for those who need it.
<table>
<caption></caption>
<thead>
<tr>
<th></th>
</tr>
</thead>
<tfoot>
<tr>
<td></td>
</tr>
</tfoot>
<tbody>
<tr>
<td></td>
</tr>
</tbody>
</table>
