I want to count all table rows whose class name is subjectField, and stop counting when I reach a row whose class name is separator.
HTML:
<div id="household" class="block">
<div class="block_title"> … </div>
<table class="color_table">
<thead> … </thead>
<tbody>
<tr class="first_row"> … </tr>
<tr class="subjectField" style="display:none"> … </tr>
<tr class="odd_row"> … </tr>
<tr class="subjectField" style="display:none"> … </tr>
<tr class="even_row"> … </tr>
<tr class="subjectField" style="display:none"> … </tr>
<tr>
<td class="separator" rowspan="1" colspan="10">
</td>
</tr>
<tr class="even_row"> … </tr>
<tr class="subjectField" style="display:none"> … </tr>
</tbody>
</table>
</div>
My code:
def countRows
  @subjects = 0
  @f.div(:id => 'household').table(:class => 'color_table').tbody.trs.find do |tr|
    if tr.td(:class => "separator").exists? == true
      break
    elsif tr(:class => "subjectField").exists? == true
      @subjects = @subjects + 1
    end
  end
  return subjects
end
This does not work for me: it says that tr on line 6 is an undefined method.
Does anybody know how to solve this problem?
Issue 1 - Checking row class
You are passing arguments to tr, which is why Ruby treats it as a method call. In this case, tr is actually the block parameter: the element selected for the current iteration of the loop. To check the current tr's class attribute, the elsif statement should be:
elsif tr.class_name == "subjectField"
Issue 2 - Iterating rows
Note that you will also have a problem with the line:
@f.div(:id => 'household').table(:class => 'color_table').tbody.trs.find
The find method iterates through the trs until the block returns a truthy value. The assignment @subjects = @subjects + 1 returns the new count, which is truthy, so find stops at the first subjectField row and you will only ever get 0 or 1 subjects. Use each instead:
@f.div(:id => 'household').table(:class => 'color_table').tbody.trs.each
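To see the difference without a browser, here is a minimal pure-Ruby sketch that reduces each row to its class string. The "separator" entry is a hypothetical label for the row holding the separator td (in the real table that tr has no class), and the two method names are illustrative only:

```ruby
# Class names of the tbody rows from the question, in order. "separator" is a
# hypothetical label standing in for the row that holds the separator td.
ROW_CLASSES = %w[first_row subjectField odd_row subjectField even_row
                 subjectField separator even_row subjectField].freeze

# Mirrors the original loop: find stops as soon as the block returns a
# truthy value, and `count += 1` returns the new (truthy) count.
def count_with_find(rows)
  count = 0
  rows.find do |cls|
    break if cls == "separator"
    count += 1 if cls == "subjectField"
  end
  count
end

# Same block body with each: every row is visited until the separator.
def count_with_each(rows)
  count = 0
  rows.each do |cls|
    break if cls == "separator"
    count += 1 if cls == "subjectField"
  end
  count
end

puts count_with_find(ROW_CLASSES) # prints 1
puts count_with_each(ROW_CLASSES) # prints 3
```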
Putting it together
Putting the above fixes together, the method could be written as:
def countRows
  @subjects = 0
  table = @f.div(:id => 'household').table(:class => 'color_table')
  table.tbody.trs.each do |tr|
    break if tr.td(:class => "separator").exists?
    @subjects += 1 if tr.class_name == "subjectField"
  end
  @subjects
end
This is how I would do it (untested code):
def count_rows
  @f.div(:id => 'household')
    .table(:class => 'color_table')
    .trs
    .take_while { |row| !row.td(:class => "separator").exists? }
    .select { |row| row.class_name =~ /subjectField/ }
    .size
end
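The take_while/select chain can be tried outside Watir with a hypothetical Row struct that exposes only what the chain needs: the class name and whether the row contains a separator cell. This sketch uses count with a block, which folds the select and size steps into one call:

```ruby
# Hypothetical stand-in for a Watir row: its class attribute and whether it
# contains a td with class "separator".
Row = Struct.new(:class_name, :has_separator)

# The tbody rows from the question, in document order.
ROWS = [
  Row.new("first_row", false),
  Row.new("subjectField", false),
  Row.new("odd_row", false),
  Row.new("subjectField", false),
  Row.new("even_row", false),
  Row.new("subjectField", false),
  Row.new("", true), # the unclassed tr holding <td class="separator">
  Row.new("even_row", false),
  Row.new("subjectField", false)
].freeze

# Stop at the row holding the separator cell, then count the
# subjectField rows that came before it (=~ returns 0 or nil).
def count_subject_rows(rows)
  rows.take_while { |row| !row.has_separator }
      .count { |row| row.class_name =~ /subjectField/ }
end

puts count_subject_rows(ROWS) # prints 3
```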
Related
I've been doing calculations by hand when it comes to the remaining percentage of the US Presidential election votes in various states. With so many updates and states – this is getting tiring. So why not automate the process?
Here's what I'm looking at:
The problem is that the class names have been randomized. For example, here's the one I'm interested in:
<td class="jsx-3768461732 votes votes-row">2,450,186</td>
Playing around in irb, I tried to use a wildcard on "votes votes-row", since this only appears when I need it in the doc:
require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML(open("https://www.politico.com/2020-election/results/georgia/"))
votes = doc.css("[td*='votes-row']")
...which yields no results (=> [])
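What am I doing wrong, and how do I fix it? I'm OK with XPath; I just want to make sure changes made elsewhere in the document don't affect finding these elements.
Your selector [td*='votes-row'] looks for elements with an attribute named td, and no element has one, so nothing matches; to match a td by its class attribute you would write td[class*='votes-row']. There's probably a better way, but...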
require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML(URI.open("https://www.politico.com/2020-election/results/georgia/"))
votes = doc.css('tr[class*="candidate-row"]').map { |row| row.css('td').map { |cell| cell.content } }
biden_row = votes.find_index { |row| row[0] =~ /biden/i }
trump_row = votes.find_index { |row| row[0] =~ /trump/i }
biden_votes = votes[biden_row][1].split('%')[1]
trump_votes = votes[trump_row][1].split('%')[1]
Edit: from the HTML source the relevant table looks like:
<table class="jsx-1526769828 candidate-table">
<thead class="jsx-3554868417 table-head">
<tr class="jsx-3554868417">
<th class="table-header jsx-3554868417 candidate-name">
<h5 class="jsx-3554868417">Candidate</h5>
</th>
<th class="table-header jsx-3554868417 percent">
<h5 class="jsx-3554868417">Pct.</h5>
</th>
<th class="table-header jsx-3554868417 vote-bar"></th>
</tr>
</thead>
<tbody class="jsx-2085888330 table-head">
<tr class="jsx-2677388595 candidate-row">
<td class="jsx-3948343365 candidate-name name-row">
<div class="jsx-1912693590 name-only candidate-short-name">Biden</div>
<div class="jsx-3948343365 candidate-party-tag">
<div class="jsx-1420258095 party-label dem">dem</div>
</div>
<div class="jsx-3948343365 candidate-winner-check"></div>
</td>
<td class="jsx-3830922081 percent percent-row">
<div class="candidate-percent-only jsx-3830922081">49.4%</div>
<div class="candidate-votes-next-to-percent jsx-3830922081">2,450,193</div>
</td>
<td class="jsx-3458171655 vote-bar vote-bar-row">
<div style="width:49.4%" class="jsx-3458171655 bar dem"></div>
</td>
</tr>
<tr class="jsx-2677388595 candidate-row">
<td class="jsx-3948343365 candidate-name name-row">
<div class="jsx-1912693590 name-only candidate-short-name">Trump*</div>
<div class="jsx-3948343365 candidate-party-tag">
<div class="jsx-1420258095 party-label gop">gop</div>
</div>
<div class="jsx-3948343365 candidate-winner-check"></div>
</td>
<td class="jsx-3830922081 percent percent-row">
<div class="candidate-percent-only jsx-3830922081">49.4%</div>
<div class="candidate-votes-next-to-percent jsx-3830922081">2,448,635</div>
</td>
<td class="jsx-3458171655 vote-bar vote-bar-row">
<div style="width:49.4%" class="jsx-3458171655 bar gop"></div>
</td>
</tr>
</tbody>
</table>
So you could probably use the candidate-votes-next-to-percent class to get this value, e.g.:
require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML(URI.open("https://www.politico.com/2020-election/results/georgia/"))
votes = doc.css('tr[class*="candidate-row"]').map do |row|
  [
    row.css('div[class*="candidate-short-name"]').first.content,
    row.css('div[class*="candidate-votes-next-to-percent"]').first.content
  ]
end
# => [["Biden", "2,450,193"], ["Trump*", "2,448,635"]]
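Since the goal is to automate the arithmetic, the scraped strings still need converting to numbers. A minimal sketch, assuming the pairs have the shape shown in the output above (name, comma-formatted vote count); parse_votes is a hypothetical helper name:

```ruby
# Pairs in the shape produced by the scraper above.
scraped = [["Biden", "2,450,193"], ["Trump*", "2,448,635"]]

# Hypothetical helper: drop the "*" after a name, strip the thousands
# separators, and convert with Integer() (which raises on malformed input).
def parse_votes(pairs)
  pairs.to_h { |name, votes| [name.delete("*"), Integer(votes.delete(","))] }
end

votes  = parse_votes(scraped)
total  = votes.values.sum
margin = votes["Biden"] - votes["Trump"]

puts "Total: #{total}, margin: #{margin}" # prints Total: 4898828, margin: 1558
```

Integer() is used instead of to_i so that an unexpected string (for example, if the page layout changes) fails loudly rather than silently becoming 0.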
I'm scraping this page https://www.library.uq.edu.au/uqlsm/availablepcsembed.php?branch=Duhig and for each tr I am collecting and returning the level name and the number of computers available.
The problem is that the loop runs too many times: there are only 4 tr tags, but it goes through 5 iterations, which appends an extra nil to the returned array. Why is this?
Scraped Section:
<table class="chart">
<tr valign="middle">
<td class="left">Level 1</td>
<td class="middle"><div style="width:68%;"><strong>68%</strong></div></td>
<td class="right">23 Free of 34 PC's</td>
</tr>
<tr valign="middle">
<td class="left">Level 2</td>
<td class="middle"><div style="width:78%;"><strong>78%</strong></div></td>
<td class="right">83 Free of 107 PC's</td>
</tr>
<tr valign="middle">
<td class="left">Level 4</td>
<td class="middle"><div style="width:64%;"><strong>64%</strong></div></td>
<td class="right">9 Free of 14 PC's</td>
</tr>
<tr valign="middle">
<td class="left">Level 5</td>
<td class="middle"><div style="width:97%;"><strong>97%</strong></div></td>
<td class="right">28 Free of 29 PC's</td>
</tr>
</table>
Shortened Method:
def self.scrape_details_page(library_url)
  details_page = Nokogiri::HTML(open(library_url))
  library_name = details_page.css("h3")
  details_page.css("table tr").collect do |level|
    case level.css("a[href]").text.downcase
    when "level 1"
      name = level.css("a[href]").text
      total_available = level.css(".right").text.split(" ")[0]
      out_of_available = level.css(".right").text.split(" ")[3]
      level = {name: name, total_available: total_available, out_of_available: out_of_available}
    when "level 2"
      name = level.css("a[href]").text
      total_available = level.css(".right").text.split(" ")[0]
      out_of_available = level.css(".right").text.split(" ")[3]
      level = {name: name, total_available: total_available, out_of_available: out_of_available}
    end
  end
end
The selector "table tr" matches tr elements in every table on the page, so any row outside the chart table adds an iteration. If you specify the class attribute of the table and then access the tr tags inside it, you avoid the "additional" tr, like:
details_page.css("table.chart tr").map do |level|
...
You can also simplify the scrape_details_page method a little:
require 'nokogiri'
require 'open-uri'
def scrape_details_page(library_url)
  details_page = Nokogiri::HTML(URI.open(library_url))
  details_page.css('table.chart tr').map do |level|
    right = level.css('.right').text.split
    { name: level.css('a[href]').text, total_available: right[0], out_of_available: right[3] }
  end
end
p scrape_details_page('https://www.library.uq.edu.au/uqlsm/availablepcsembed.php?branch=Duhig')
# [{:name=>"Level 1", :total_available=>"22", :out_of_available=>"34"},
# {:name=>"Level 2", :total_available=>"98", :out_of_available=>"107"},
# {:name=>"Level 4", :total_available=>"12", :out_of_available=>"14"},
# {:name=>"Level 5", :total_available=>"26", :out_of_available=>"29"}]
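The nil itself comes from the case expression in the question's method: when no when branch matches and there is no else, case returns nil, and collect/map keeps that nil in the result. A minimal sketch on plain strings, where the third row is a hypothetical stand-in for the additional tr, shows the effect and uses filter_map (Ruby 2.7+) as one way to drop rows that produce nil:

```ruby
# [left cell text, right cell text] for each tr. The last pair is a
# hypothetical row from elsewhere on the page that matches no `when`.
ROWS = [
  ["Level 1", "23 Free of 34 PC's"],
  ["Level 2", "83 Free of 107 PC's"],
  ["Footer", ""]
].freeze

# Mirrors the question's case expression: returns a hash for known levels,
# and nil (case with no matching when and no else) for anything else.
def parse_row(name, right_text)
  case name.downcase
  when "level 1", "level 2"
    words = right_text.split # "23 Free of 34 PC's" -> words 0 and 3 are the numbers
    { name: name, total_available: words[0], out_of_available: words[3] }
  end
end

with_nil    = ROWS.map { |name, right| parse_row(name, right) }
without_nil = ROWS.filter_map { |name, right| parse_row(name, right) }

puts with_nil.size    # prints 3 (the last element is nil)
puts without_nil.size # prints 2
```

Scoping the selector to table.chart, as suggested above, removes the extra row at the source; filter_map (or compact) is a fallback when you cannot make the selector that precise.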
There is a table whose rows have different class names: first_row, odd_row, even_row and subjectField.
HTML:
<table class="color_table">
<thead></thead>
<tbody>
<tr class="first_row">
<td colspan="1" rowspan="1"></td>
<td colspan="1" rowspan="1"></td>
<td colspan="1" rowspan="1"></td>
<td colspan="1" rowspan="1">
63
</td>
</tr>
<tr class="subjectField" style="display:none"></tr>
<tr class="odd_row"></tr>
<tr class="subjectField" style="display:none"></tr>
<tr class="even_row"></tr>
<tr class="subjectField" style="display:none"></tr>
</tbody>
</table>
Additional HTML:
<tbody>
<tr class="first_row"></tr>
<tr class="subjectField" style="display:none"></tr>
<tr class="odd_row"></tr>
<tr class="subjectField" style="display:none"></tr>
<tr>
<td class="separator" rowspan="1" colspan="10"></td>
</tr>
<tr class="even_row"></tr>
<tr class="subjectField" style="display:none"></tr>
</tbody>
I need to get information from all rows except those whose class name is 'subjectField'.
My code:
table = @f.div(:id => 'household').table(:class => 'color_table')
table.tbody.trs(:class => 'first_row', :class => 'odd_row', :class =>'even_row').each do
  age = tr.td(:index => 3).text
  puts age
end
This code takes all rows, including the subjectField rows.
Does anybody know how to make it work only with the rows I need?
To find everything except a class, use a regex with a negative lookahead:
table.trs(:class => /^(?!subjectField)/).size
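The negative lookahead can be checked on plain strings with Array#grep. A sketch using the class names from the question; note that on a plain string ^ only anchors at the start of a line, so a hypothetical multi-class value such as "odd_row subjectField" would still match:

```ruby
row_classes = %w[first_row subjectField odd_row subjectField even_row subjectField]

# (?!subjectField) succeeds only where the text does NOT start with subjectField.
kept = row_classes.grep(/^(?!subjectField)/)
puts kept.inspect # prints ["first_row", "odd_row", "even_row"]

# Caveat: the lookahead only inspects the start of the value.
puts "odd_row subjectField".match?(/^(?!subjectField)/) # prints true
```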
If you want to get the text for each of these rows:
puts table.trs(:class => /^(?!subjectField)/).collect(&:text)
If you want to get the text of the fourth column for each row:
texts = table.trs(:class => /^(?!subjectField)/).collect do |row|
  row.td(:index => 3).text
end
puts texts
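A note on the block form: do...end binds more loosely than {...}, so in puts list.collect do |row| ... end the block is passed to puts and collect runs without a block. A minimal sketch, with a hypothetical show method standing in for puts so the result can be inspected:

```ruby
# Hypothetical stand-in for puts: it returns its argument and ignores any block.
def show(value)
  value
end

rows = %w[a b c]

# do...end attaches to `show`, so `map` gets no block and returns an Enumerator.
loose = show rows.map do |row|
  row.upcase
end

# Braces bind to `map`, so the block actually transforms each row.
tight = show(rows.map { |row| row.upcase })

puts loose.class   # prints Enumerator
puts tight.inspect # prints ["A", "B", "C"]
```

Assigning the collected values to a variable first, or using braces, avoids the ambiguity.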
It is really simple:
table = @f.div(:id => 'household').table(:class => 'color_table')
table_element.count # displays the count of all rows in the specified table
table_elements(:class => 'first_row').index # returns the array index [0]
table_elements(:class => 'even_row').index # returns the array index [2]
Getting the text is not a problem either:
table_elements(:class => 'first_row').text # takes the text from the row with the corresponding class
or
table_elements[0].text
Have you tried something like this?
...(:xpath, "//table/tbody/tr[@class !='subjectField']")...
I need to get all rows with class named 'odd_row' or 'even_row'.
HTML:
<tbody>
<tr class="first_row"> … </tr>
<tr class="subjectField" style="display:none"> … </tr>
<tr class="odd_row"> … </tr>
<tr class="subjectField" style="display:none"> … </tr>
<tr class="even_row"> … </tr>
<tr class="subjectField" style="display:none"> … </tr>
</tbody>
I tried this:
@b.table(:class => 'color_table').tbody.trs(:class => ('odd_row' || 'even_row')).size
But it returns 1.
Does anybody know how to solve this problem?
If you want to do an "or" of classes, you need to use a regular expression. In regular expressions, "or" is done using a single pipe character "|". The class locator you would want is:
:class => /odd_row|even_row/
Therefore, to count all odd and even rows, you want:
@b.table(:class => 'color_table')
  .tbody
  .trs(:class => /odd_row|even_row/)
  .size
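Why the original returned 1: Ruby evaluates ('odd_row' || 'even_row') before Watir ever sees it, and because 'odd_row' is truthy the whole expression is just 'odd_row'. A plain-Ruby sketch with the class names from the question:

```ruby
row_classes = %w[first_row subjectField odd_row subjectField even_row subjectField]

# || returns its first truthy operand, so this is simply "odd_row".
locator = ('odd_row' || 'even_row')
puts locator                    # prints odd_row
puts row_classes.count(locator) # prints 1

# A regex alternation matches either class name.
puts row_classes.grep(/odd_row|even_row/).size # prints 2
```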
You are calling .size at the end, which returns the number of matching rows as an integer rather than the rows themselves. Try it without .size if you want the row collection.
I am using WebDriver in Ruby and want to verify that three pieces of text exist on a page.
Here is the piece of html I want to verify:
<table class="c1">
<thead>many subtags</thead>
<tbody id="id1">
<tr class="c2">
<td class="first-child">
<span>test1</span>
</td>
many other <td></td>
</tr>
<tr class="c2">
<td class="first-child">
<span>test2</span>
</td>
many other <td></td>
</tr>
<tr class="c2">
<td class="first-child">
<span>test3</span>
</td>
many other <td></td>
</tr>
</tbody>
</table>
How do I verify that "test1", "test2" and "test3" are present on this page using:
find_element
find_elements
getPageSource?
What is the best approach for it?
Thank you very much.
I would go with the #find_elements method, because find_element raises a NoSuchElementError when nothing matches, whereas find_elements simply returns an empty array.
First, collect the matches in an array:
array = driver.find_elements(:xpath, "//table[@class='c1']//tr[@class='c2']//span[text()='test1']")
Then check whether the array is empty:
puts "test1 is present" unless array.empty?
You can test for test2 and test3 the same way.
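The three checks can be folded into one loop. A minimal sketch, assuming driver is a Selenium::WebDriver::Driver or anything else that responds to find_elements(:xpath, expr); span_xpath and missing_spans are illustrative names, and FakeDriver is a hypothetical stand-in that exists only so the sketch runs without a browser:

```ruby
# XPath from the answer above, parameterised on the expected span text.
def span_xpath(text)
  "//table[@class='c1']//tr[@class='c2']//span[text()='#{text}']"
end

# Returns the expected texts for which find_elements comes back empty.
def missing_spans(driver, texts)
  texts.select { |text| driver.find_elements(:xpath, span_xpath(text)).empty? }
end

# Hypothetical stand-in for a real driver: "finds" a span if its text is listed.
FakeDriver = Struct.new(:present_texts) do
  def find_elements(_how, xpath)
    present_texts.select { |text| xpath.include?("text()='#{text}'") }
  end
end

driver = FakeDriver.new(%w[test1 test3])
p missing_spans(driver, %w[test1 test2 test3]) # prints ["test2"]
```

With a real driver, an empty result from missing_spans means all three spans were found.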
The following sample code should help with your task:
def check_element_exists
  arr = ["test1", "test2", "test3"]
  arr.each do |a|
    if $driver.page_source.include? a
      puts a
    else
      puts "#{a} Not present"
    end
  end
end
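The page_source approach reduces to a substring check on a string, so it can be sketched without a browser. A minimal version, assuming the page source is already available as a string (in Selenium's Ruby bindings, driver.page_source returns it); missing_texts is a hypothetical helper name. A plain include? also matches text embedded in longer strings, e.g. "test1" inside "test10":

```ruby
EXPECTED_TEXTS = %w[test1 test2 test3].freeze

# Hypothetical helper: texts from `expected` that do not occur anywhere in `source`.
def missing_texts(source, expected = EXPECTED_TEXTS)
  expected.reject { |text| source.include?(text) }
end

sample_source = <<~HTML
  <tbody id="id1">
    <tr class="c2"><td class="first-child"><span>test1</span></td></tr>
    <tr class="c2"><td class="first-child"><span>test3</span></td></tr>
  </tbody>
HTML

missing = missing_texts(sample_source)
puts missing.empty? ? "all present" : "missing: #{missing.join(', ')}" # prints missing: test2
```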