I need to get all rows whose class is 'odd_row' or 'even_row'.
HTML:
<tbody>
<tr class="first_row"> … </tr>
<tr class="subjectField" style="display:none"> … </tr>
<tr class="odd_row"> … </tr>
<tr class="subjectField" style="display:none"> … </tr>
<tr class="even_row"> … </tr>
<tr class="subjectField" style="display:none"> … </tr>
</tbody>
I tried this:
@b.table(:class => 'color_table').tbody.trs(:class => ('odd_row' || 'even_row')).size
But it returns 1.
Does anybody know how to solve this problem?
In Ruby, ('odd_row' || 'even_row') evaluates to just 'odd_row', so only the odd rows are counted. If you want to do an "or" of classes, you need to use a regular expression. In regular expressions, "or" is written with a single pipe character "|". The class locator you want is:
:class => /odd_row|even_row/
Therefore, to count all odd and even rows, you want:
@b.table(:class => 'color_table')
  .tbody
  .trs(:class => /odd_row|even_row/)
  .size
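As a quick check in plain Ruby (no browser needed), the sketch below shows why the original attempt only ever looked for one class, and that the regular expression matches both. The class_names array is made up for illustration and simply mirrors the rows in the question; it does not exercise Watir itself.

```ruby
# Ruby evaluates `||` before the locator is built: a non-nil left
# operand wins, so the locator only ever receives 'odd_row'.
locator = ('odd_row' || 'even_row')
puts locator  # prints: odd_row

# Hypothetical list of class attributes, mirroring the question's rows.
class_names = %w[first_row subjectField odd_row subjectField even_row subjectField]

# The regular expression matches either alternative.
matching = class_names.grep(/odd_row|even_row/)
puts matching.size  # prints: 2
```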
You are calling .size at the end. This returns the number of matched rows as an integer. Try it without .size if you want the rows themselves.
I have the following HTML file:
<table class="pd-table">
<caption> Tech </caption>
<tbody>
<tr data-group="1">
<td> Electrical </td>
<td> Design </td>
<tr data-group="1">
<td> Output </td>
<td> Function </td>
<tr data-group="7">
<td> EMC </td>
<table>
<tbody>
<tr>
<td> EN 6547 ESD </td>
<td> EN 8901 ESD </td>
<tr data-group="8">
<td> Weight [8] </td>
<td> 27.7 </td>
I can isolate EN 6547 ESD and EN 8901 ESD with the following XPath:
response.xpath('//table[@class="pd-table"]//tbody//tr//td/table//tr//td/text()').getall()
Any other way is always welcome :)
I would also like to get all the rest of the data, without the values isolated above.
Is there any way to do it? :)
It looks like the table tag is not closed properly in the data-group="7" row...
Anyway, in such cases you can rely on the text content of the cell, using contains() or text()="some exact text":
response.xpath('//td[contains(text(), "EMC")]').css('td~table tbody td::text').extract()
Your XPath uses a lot of unnecessary double slashes.
See meaning of double slash in Xpath.
The fewer double slashes you use, the better it will perform.
So just use single slashes like this:
//table[@class="pd-table"]/tbody/tr/td/table/tr/td/text()
Another way is to select the td's that have two table ancestors:
//td[count(ancestor::table)=2]/text()
And that leads to the answer of your second question:
//td[count(ancestor::table)=1]/text()
Another possibility would just be:
//table[@class="pd-table"]/tbody/tr/td/text()
Or (assuming the second table does not have tr's with @data-group):
//tr[@data-group]/td/text()
So you see, many XPaths lead to Rome ;-).
I need to find the whole text of a cell based on the last word in its string. I have something like this:
<table>
<tr>
<td style='white-space:nowrap;'>
<a href=''>test</a>
</td>
<td>any text</td>
<td>text text texttofind</td>
<td>Not Available</td>
<td class='aui-lozenge aui-lozenge-default'>text</td>
</tr>
<tr>
<td style='white-space:nowrap;'>
<a href=''>test</a>
</td>
<td>any text</td>
<td>text text texttofind2</td>
<td>Not Available</td>
<td class='aui-lozenge aui-lozenge-default'>text</td>
</tr>
<tr>
<td style='white-space:nowrap;'>
<a href=''>test</a>
</td>
<td>any text</td>
<td>text text texttofind3</td>
<td>Not Available</td>
<td class='aui-lozenge aui-lozenge-default'>text</td>
</tr>
</table>
I need to find the whole text value based on the last word, texttofind:
<td>text text texttofind</td>
I can't use contains, because it will find multiple values. I need something like ends-with, but I am using XPath 1.0.
I tried something like this, but it is not working and I am not sure what is wrong:
//tr[substring(., string-length(@td)
- string-length('texttofind') + 1) = 'texttofind']
or maybe it would be better to use matches?
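You're almost there. @td refers to an attribute named td, which does not exist here; apply the test to the td elements and measure the length of the current node (.) instead. Try changing your XPath expression to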
//tr//td[substring(., string-length(.)
- string-length('texttofind') + 1) = 'texttofind']
and see if it works.
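To see why the arithmetic works, here is a plain-Ruby sketch of the same idea. XPath's substring() counts characters from 1, so the suffix has to start at string-length(.) - string-length(suffix) + 1. The helper name xpath_style_ends_with? is invented for this illustration and is not part of any library; the cell texts are copied from the question.

```ruby
# Mimics: substring(., string-length(.) - string-length(suffix) + 1) = suffix
def xpath_style_ends_with?(text, suffix)
  start = text.length - suffix.length + 1  # 1-based position, as in XPath
  return false if start < 1                # text is shorter than the suffix
  text[(start - 1)..-1] == suffix          # Ruby strings are 0-based
end

cells = ['text text texttofind', 'text text texttofind2', 'text text texttofind3']
matches = cells.select { |cell| xpath_style_ends_with?(cell, 'texttofind') }
puts matches.inspect  # prints: ["text text texttofind"]
```

If the real cells carry leading or trailing whitespace, wrapping the node in normalize-space() inside the XPath expression is one way to keep the comparison exact.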
There is a table whose rows have different class names: first_row, odd_row, even_row and subjectField.
HTML:
<table class="color_table">
<thead></thead>
<tbody>
<tr class="first_row"></tr>
<td colspan="1" rowspan="1"></td>
<td colspan="1" rowspan="1"></td>
<td colspan="1" rowspan="1"></td>
<td colspan="1" rowspan="1">
**63**
</td>
<tr class="subjectField" style="display:none"></tr>
<tr class="odd_row"></tr>
<tr class="subjectField" style="display:none"></tr>
<tr class="even_row"></tr>
<tr class="subjectField" style="display:none"></tr>
</tbody>
Additional HTML:
<tbody>
<tr class="first_row"></tr>
<tr class="subjectField" style="display:none"></tr>
<tr class="odd_row"></tr>
<tr class="subjectField" style="display:none"></tr>
<tr>
<td class="separator" rowspan="1" colspan="10"></td>
</tr>
<tr class="even_row"></tr>
<tr class="subjectField" style="display:none"></tr>
</tbody>
I need to get information from all rows except those whose class name is 'subjectField'.
My code:
table = @f.div(:id => 'household').table(:class => 'color_table')
table.tbody.trs(:class => 'first_row', :class => 'odd_row', :class =>'even_row').each do |tr|
  age = tr.td(:index => 3).text
  puts age
end
This code takes all rows, including the subjectField rows.
Does anybody know how to make it work with only the rows I need?
To find everything except a class, use a regex with a negative lookahead:
table.trs(:class => /^(?!subjectField)/).size
If you want to get the text for each of these rows:
puts table.trs(:class => /^(?!subjectField)/).collect(&:text)
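If you want to get the text of the fourth column for each row (the result is assigned first, because a do...end block written directly after puts would bind to puts rather than to collect):
ages = table.trs(:class => /^(?!subjectField)/).collect do |row|
  row.td(:index => 3).text
end
puts ages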
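The lookahead can be tried on plain strings first. This sketch assumes a made-up list of class attribute values taken from the question's markup; it only exercises the regular expression, not Watir itself.

```ruby
# Hypothetical class values, in the order the rows appear in the question.
class_names = %w[first_row subjectField odd_row subjectField even_row subjectField]

# ^(?!subjectField) succeeds only where the string does not start
# with "subjectField", so those entries are filtered out.
wanted = class_names.grep(/^(?!subjectField)/)
puts wanted.inspect  # prints: ["first_row", "odd_row", "even_row"]
```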
It is really simple:
table = @f.div(:id => 'household').table(:class => 'color_table')
table_element.count #it will display the count of all rows corresponding to specified table.
table_elements(:class => 'first_row').index #will return the array count [0]
table_elements(:class => 'even_row').index #will return the array count [2]
Not a problem, actually:
table_elements(:class => 'first_row').text # if you need to take the text from the row with corresponding class
or
table_elements[0].text
Have you tried something like this?
...(:xpath, "//table/tbody/tr[@class !='subjectField']")...
Let's say I've got an ill-formed HTML page:
<table>
<thead>
<th class="what_I_need">Super sweet text<th>
</thead>
<tr>
<td>
I also need this
</td>
<td>
and this (all td's in this and subsequent tr's)
</td>
</tr>
<tr>
...all td's here too
</tr>
<tr>
...all td's here too
</tr>
</table>
On BeautifulSoup, we were able to get the <th> and then call findNext("td"). Nokogiri has the next_element call, but that might not return what I want (in this case, it would return the tr element).
Is there a way to filter the next_element call of Nokogiri? e.g. next_element("td")?
EDIT
For clarification, I'll be looking at many sites, most of them ill-formed in different ways.
For instance, the next site might be:
<table>
<th class="what_I_need">Super sweet text<th>
<tr>
<td>
I also need this
</td>
<td>
and this (all td's in this and subsequent tr's)
</td>
</tr>
<tr>
...all td's here too
</tr>
<tr>
...all td's here too
</tr>
</table>
I can't assume any structure other than there will be trs below the item that has the class what_I_need
First, note that your closing th tag is malformed: <th>. It should be </th>. Fixing that helps.
One way to do it is to use XPath to navigate to it once you've found the th node:
require 'nokogiri'
html = '
<table>
<thead>
<th class="what_I_need">Super sweet text</th>
</thead>
<tr>
<td>
I also need this
</td>
</tr>
</table>
'
doc = Nokogiri::HTML(html)
th = doc.at('th.what_I_need')
th.text # => "Super sweet text"
td = th.at('../../tr/td')
td.text # => "\n I also need this\n "
This is taking advantage of Nokogiri's ability to use either CSS accessors or XPath, and to do it pretty transparently.
Once you have the <th> node, you could also navigate using some of Node's methods:
th.parent.next_element.at('td').text # => "\n I also need this\n "
One more way to go about it, is to start at the top of the table and look down:
table = doc.at('table')
th = table.at('th')
th.text # => "Super sweet text"
td = table.at('td')
td.text # => "\n I also need this\n "
If you need to access all <td> tags within a table you can iterate over them easily:
table.search('td').each do |td|
# do something with the td...
puts td.text
end
If you want the contents of all <td> by their containing <tr> iterate over the rows then the cells:
table.search('tr').each do |tr|
cells = tr.search('td').map(&:text)
# do something with all the cells
end
How do I use {cycle} with three values?
What's wrong with this code:
<table>
<tr bgcolor="{cycle values='#aaaaaa,#bbbbbb'}">
<td bgcolor="{cycle values='#1112233,#334455'}">value</td>
<td bgcolor="{cycle values='#998811,#334466'}">value1</td>
</tr>
</table>
I think you need to give them unique names:
{cycle name='color1' values='#aaaaaa,#bbbbbb'}
{cycle name='color2' values='#1112233,#334455'}
{cycle name='color3' values='#998811,#334466'}