There is a table whose rows have different class names: first_row, odd_row, even_row and subjectField.
HTML:
<table class="color_table">
<thead></thead>
<tbody>
<tr class="first_row">
<td colspan="1" rowspan="1"></td>
<td colspan="1" rowspan="1"></td>
<td colspan="1" rowspan="1"></td>
<td colspan="1" rowspan="1">
**63**
</td>
</tr>
<tr class="subjectField" style="display:none"></tr>
<tr class="odd_row"></tr>
<tr class="subjectField" style="display:none"></tr>
<tr class="even_row"></tr>
<tr class="subjectField" style="display:none"></tr>
</tbody>
</table>
Additional HTML:
<tbody>
<tr class="first_row"></tr>
<tr class="subjectField" style="display:none"></tr>
<tr class="odd_row"></tr>
<tr class="subjectField" style="display:none"></tr>
<tr>
<td class="separator" rowspan="1" colspan="10"></td>
</tr>
<tr class="even_row"></tr>
<tr class="subjectField" style="display:none"></tr>
</tbody>
I need to get information from all rows except those whose class name is 'subjectField'.
My code:
table = @f.div(:id => 'household').table(:class => 'color_table')
table.tbody.trs(:class => 'first_row', :class => 'odd_row', :class =>'even_row').each do |tr|
  age = tr.td(:index => 3).text
  puts age
end
This code takes all rows, including the subjectField rows.
Does anybody know how to make it work with only the rows I need?
To find everything except a class, use a regex with a negative lookahead:
table.trs(:class => /^(?!subjectField)/).size
If you want to get the text for each of these rows:
puts table.trs(:class => /^(?!subjectField)/).collect(&:text)
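If you want to get the text of the fourth column for each row:
# Collect first, then print: written as "puts ... collect do |row|",
# the do...end block would bind to puts instead of collect.
texts = table.trs(:class => /^(?!subjectField)/).collect do |row|
  row.td(:index => 3).text
end
puts texts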
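The lookahead pattern can be checked in plain Ruby without a browser. This is only a sketch: the class names are copied from the question's HTML, and the row_classes array is a hypothetical stand-in for the class attribute of each row.

```ruby
# Hypothetical stand-in for the class attribute of each <tr> in the question.
row_classes = %w[first_row subjectField odd_row subjectField even_row subjectField]

# (?!subjectField) is a negative lookahead: the pattern matches at the start
# of the string only when "subjectField" does not follow.
not_subject = /^(?!subjectField)/

# grep keeps the elements the pattern matches.
wanted = row_classes.grep(not_subject)
puts wanted.inspect
```

The subjectField entries are dropped and the other class names are kept in their original order.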
It is really simple:
table = @f.div(:id => 'household').table(:class => 'color_table')
table_element.count # displays the count of all rows in the specified table
table_elements(:class => 'first_row').index # returns the array index [0]
table_elements(:class => 'even_row').index # returns the array index [2]
Not a problem, actually.
table_elements(:class => 'first_row').text # if you need the text from the row with the corresponding class
or
table_elements[0].text
Have you tried something like this?
...(:xpath, "//table/tbody/tr[@class != 'subjectField']")...
I want to count all table rows whose class name is subjectField, and to stop counting at the row that contains the separator cell.
HTML:
<div id="household" class="block">
<div class="block_title"> … </div>
<table class="color_table">
<thead> … </thead>
<tbody>
<tr class="first_row"> … </tr>
<tr class="subjectField" style="display:none"> … </tr>
<tr class="odd_row"> … </tr>
<tr class="subjectField" style="display:none"> … </tr>
<tr class="even_row"> … </tr>
<tr class="subjectField" style="display:none"> … </tr>
<tr>
<td class="separator" rowspan="1" colspan="10">
</td>
</tr>
<tr class="even_row"> … </tr>
<tr class="subjectField" style="display:none"> … </tr>
</tbody>
</table>
</div>
My code:
def countRows
  @subjects = 0
  @f.div(:id => 'household').table(:class => 'color_table').tbody.trs.find do |tr|
    if tr.td(:class => "separator").exists? == true
      break
    elsif tr(:class => "subjectField").exists? == true
      @subjects = @subjects + 1
    end
  end
  return @subjects
end
This does not work for me. It says that tr on line 6 is an undefined method.
Does anybody know how to solve this problem?
Issue 1 - Checking row class
You are passing arguments to tr, which is why Ruby treats it as a method call. In this case, tr is actually the element selected for the current iteration of the loop. To check the current tr's class attribute, the elsif statement should be:
elsif tr.class_name == "subjectField"
Issue 2 - Iterating rows
Note that you will also have a problem with the line:
@f.div(:id => 'household').table(:class => 'color_table').tbody.trs.find
The find method iterates through the trs only until the block returns a truthy value. The block returns a truthy value as soon as the first subjectField row is counted (or breaks at the separator), so you will only ever get 0 or 1 subjects. Use each instead:
@f.div(:id => 'household').table(:class => 'color_table').tbody.trs.each
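The difference between find and each can be seen in plain Ruby. This is a sketch only: row_classes is a hypothetical array standing in for the class names of the rows, with "separator" marking the separator row.

```ruby
# Hypothetical stand-in for the rows, in document order.
row_classes = ["first_row", "subjectField", "odd_row", "subjectField", "separator", "subjectField"]

# find stops as soon as the block returns a truthy value, which happens
# right after the first increment.
count_with_find = 0
row_classes.find do |name|
  break if name == "separator"
  count_with_find += 1 if name == "subjectField"
end

# each ignores the block's return value and only stops at the break.
count_with_each = 0
row_classes.each do |name|
  break if name == "separator"
  count_with_each += 1 if name == "subjectField"
end

puts count_with_find
puts count_with_each
```

With this sample data the find version stops at 1, while the each version counts both subjectField rows that come before the separator.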
Putting it together
Putting the above fixes together, the method could be written as:
def countRows
  @subjects = 0
  table = @f.div(:id => 'household').table(:class => 'color_table')
  table.tbody.trs.each do |tr|
    break if tr.td(:class => "separator").exists? == true
    @subjects += 1 if tr.class_name == "subjectField"
  end
  @subjects
end
This is how I would do it (untested code):
def count_rows
@f.div(:id => 'household')
.table(:class => 'color_table')
.trs
.take_while {|row| !row.td(:class => "separator").exists? }
.select {|row| row.class_name =~ /subjectField/}
.size
end
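The take_while / select chain can be tried on plain data first. A minimal sketch, assuming a hypothetical row_classes array in place of the real rows and using count with a block as a shorthand for select followed by size:

```ruby
# Hypothetical stand-in for the rows, in document order.
row_classes = ["first_row", "subjectField", "odd_row", "subjectField",
               "separator", "even_row", "subjectField"]

subjects_before_separator = row_classes
  .take_while { |name| name != "separator" }   # keep rows up to the separator
  .count { |name| name == "subjectField" }     # count the subject rows among them

puts subjects_before_separator
```

The subjectField row after the separator is never reached, so it is not counted.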
I need to get all rows with class named 'odd_row' or 'even_row'.
HTML:
<tbody>
<tr class="first_row"> … </tr>
<tr class="subjectField" style="display:none"> … </tr>
<tr class="odd_row"> … </tr>
<tr class="subjectField" style="display:none"> … </tr>
<tr class="even_row"> … </tr>
<tr class="subjectField" style="display:none"> … </tr>
</tbody>
I tried this:
@b.table(:class => 'color_table').tbody.trs(:class => ('odd_row' || 'even_row')).size
But it returns 1.
Does anybody know how to solve this problem?
If you want to do an "or" of classes, you need to use a regular expression. In regular expressions, "or" is done using a single pipe character "|". The class locator you would want is:
:class => /odd_row|even_row/
Therefore, to count all odd and even rows, you want:
@b.table(:class => 'color_table')
.tbody
.trs(:class => /odd_row|even_row/)
.size
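To see why the original attempt returned 1, it may help to evaluate the two locators in plain Ruby. This is a sketch: row_classes is a hypothetical array holding the class names from the question's HTML.

```ruby
# || returns its first truthy operand, so the second string is never used.
locator = ('odd_row' || 'even_row')
puts locator.inspect

# Hypothetical stand-in for the class attribute of each row.
row_classes = %w[first_row subjectField odd_row subjectField even_row subjectField]

# A regular expression with | matches either alternative.
odd_or_even = row_classes.grep(/odd_row|even_row/)
puts odd_or_even.size
```

So ('odd_row' || 'even_row') is just 'odd_row', which matches the single odd row, while the regular expression matches both rows.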
You are calling .size at the end. This gets the size of the array as an integer. You can try without it.
I'd like to parse an HTML file extracting relevant data to use in my research. Here's a piece of the HTML:
<td class="color_line1" valign="center"><a class="linkpadrao" href="javascript:Direciona('5453*SERRA@TALHADA');">Serra Talhada</a></td>
<td class="color_line" valign="center" align="center">9</td>
<td class="color_line" valign="center" align="center">2,973</td>
<td class="color_line" valign="center" align="center">0,016</td>
<td class="color_line" valign="center" align="center">2,939</td>
<td class="color_line" valign="center" align="center">3,000</td>
<td class="color_line" valign="center" align="center">0,572</td>
<td class="color_line" valign="center" align="center">2,401</td>
<td class="color_line" valign="center" align="center">0,024</td>
<td class="color_line" valign="center" align="center">2,378</td>
<td class="color_line" valign="center" align="center">2,426</td>
</tr>
More specifically, I'd like to get "Serra Talhada" (the city name) and all of the numbers below the city name (the max, min and average price of gas).
I tried this so far:
require 'rubygems'
require 'mechanize'
require 'nokogiri'
require 'open-uri'
url = "http://www.anp.gov.br/preco/prc/Resumo_Por_Estado_Municipio.asp"
agent = Mechanize.new
parameters = {'selSemana' => '737*De+28%2F07%2F2013+a+03%2F08%2F2013',
'desc_semana' => 'de+28%2F07%2F2013+a+03%2F08%2F2013',
'cod_Semana' => '737',
'tipo' => '1',
'Cod_Combustivel' => 'undefined',
'selEstado' => 'PE*PERNAMBUCO',
'selCombustivel' => '487*Gasolina',
}
municipio = []
page = agent.post(url, parameters)
extrair = page.parser
extrair.css('.linkpadrao').each do |posto|
# Municipios
municipio << posto.text
end
I can't figure out how to get the numbers as they have the same HTML structure.
Any thoughts?!
Since you need to find the cells with respect to the city link, you should find their common ancestor - in this case their tr.
Using xpath, you can locate a specific cell by its text:
# This is the table that contains all of the city data
data_table = extrair.at_css('.table_padrao')
# This is the specific row that contains the specified city
row = data_table.xpath('//tr[td/a[@class="linkpadrao" and text()="Serra Talhada"]]')
# This is the data in the specific row
data = row.css(".color_line").map{|e| e.text }
#=> ["9", "2,973", "0,016", "2,939", "3,000", "0,572", "2,401", "0,024", "2,378", "2,426"]
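If the values are needed as numbers rather than strings, they can be converted afterwards. A minimal sketch, assuming the comma in these cells is a decimal separator (which the price-like values suggest but the page does not state):

```ruby
# Cell texts copied from the sample row.
data = ["9", "2,973", "0,016", "2,939", "3,000", "0,572", "2,401", "0,024", "2,378", "2,426"]

# Assumption: the comma is a decimal separator, so swap it for a dot
# before converting. "9" has no comma and converts unchanged.
numbers = data.map { |text| text.tr(',', '.').to_f }

puts numbers.inspect
```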
You can get the numbers following each posto with:
posto.parent.search('~ td').map &:text
Let's say I've got an ill formed html page:
<table>
<thead>
<th class="what_I_need">Super sweet text<th>
</thead>
<tr>
<td>
I also need this
</td>
<td>
and this (all td's in this and subsequent tr's)
</td>
</tr>
<tr>
...all td's here too
</tr>
<tr>
...all td's here too
</tr>
</table>
On BeautifulSoup, we were able to get the <th> and then call findNext("td"). Nokogiri has the next_element call, but that might not return what I want (in this case, it would return the tr element).
Is there a way to filter the next_element call of Nokogiri? e.g. next_element("td")?
EDIT
For clarification, I'll be looking at many sites, most of them ill formed in different ways.
For instance, the next site might be:
<table>
<th class="what_I_need">Super sweet text<th>
<tr>
<td>
I also need this
</td>
<td>
and this (all td's in this and subsequent tr's)
</td>
</tr>
<tr>
...all td's here too
</tr>
<tr>
...all td's here too
</tr>
</table>
I can't assume any structure other than there will be trs below the item that has the class what_I_need
First, note that your closing th tag is malformed: <th>. It should be </th>. Fixing that helps.
One way to do it is to use XPath to navigate to it once you've found the th node:
require 'nokogiri'
html = '
<table>
<thead>
<th class="what_I_need">Super sweet text<th>
</thead>
<tr>
<td>
I also need this
</td>
<tr>
</table>
'
doc = Nokogiri::HTML(html)
th = doc.at('th.what_I_need')
th.text # => "Super sweet text"
td = th.at('../../tr/td')
td.text # => "\n I also need this\n "
This is taking advantage of Nokogiri's ability to use either CSS accessors or XPath, and to do it pretty transparently.
Once you have the <th> node, you could also navigate using some of Node's methods:
th.parent.next_element.at('td').text # => "\n I also need this\n "
One more way to go about it is to start at the top of the table and look down:
table = doc.at('table')
th = table.at('th')
th.text # => "Super sweet text"
td = table.at('td')
td.text # => "\n I also need this\n "
If you need to access all <td> tags within a table you can iterate over them easily:
table.search('td').each do |td|
# do something with the td...
puts td.text
end
If you want the contents of all <td> tags grouped by their containing <tr>, iterate over the rows, then the cells:
table.search('tr').each do |tr|
cells = tr.search('td').map(&:text)
# do something with all the cells
end
Does anyone know a quick way to count the number of entries in a table using Ruby, Cucumber & Selenium?
The table is fairly basic, I want to count the number of rows:
<table id="product_container">
<tr>
<th>Product Name</th>
<th>Qty In Stock</th>
</tr>
<tr>
<td>...</td>
<td>...</td>
</tr>
</table>
You can use:
page.should have_css "#product_container tr", :count => number_of_rows.to_i
The following step definition should work with Capybara.
Then /^I should have (\d+) table rows$/ do |number_of_rows|
actual_number = page.all('#product_container tr').size
actual_number.should == number_of_rows.to_i
end
Usage:
Then I should have 10 table rows
See the Capybara documentation for page.all.
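The comparison in the step definition is sensitive to types. A small sketch in plain Ruby, assuming the captured argument arrives as a String (as the .to_i in the have_css example suggests); step_text and actual_number are hypothetical stand-ins for the step and for page.all(...).size:

```ruby
step_text = "I should have 10 table rows"

# Pull out the first capture group, as a regex step definition would.
number_of_rows = step_text[/^I should have (\d+) table rows$/, 1]

actual_number = 10 # hypothetical stand-in for page.all('#product_container tr').size

puts number_of_rows.class
puts actual_number == number_of_rows
puts actual_number == number_of_rows.to_i
```

An Integer never equals a String in Ruby, so the comparison only succeeds after to_i. Calling to_i on a value that is already an Integer is harmless.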
I always use getXpathCount() (a Selenium method) in such situations and it works fine :)
In PHP:
$rowsCount = $this->getXpathCount("//table[@id='product_container']/tr");
And if you don't want to count header rows, you should change the table to:
<table id="product_container">
<thead>
<tr>
<th>Product Name</th>
<th>Qty In Stock</th>
</tr>
</thead>
<tbody>
<tr>
<td>...</td>
<td>...</td>
</tr>
</tbody>
</table>
Then you can get the products count:
$rowsCount = $this->getXpathCount("//table[@id='product_container']/tbody/tr");