I am trying to search a table for specific a specific value using Ruby and Selenium-webdriver. I have a method that works but takes a lot of time for some reason. It is a one row table and the page HTML looks like this:
<div id="permitGridContainer">
<table id="calendar" class="items" style="width:430px;" name="calendar">
<thead>
<tbody>
<tr>
<td id="avail1" class="status r slct" onmouseout="return nd();" onmouseover="return overlib("Available Quota<br>River Launches : 0 of 4");">
<div class="permitStatus">R</div>
</td>
<td id="avail2" class="status r" onmouseout="return nd();" onmouseover="return overlib("Available Quota<br>River Launches : 0 of 4");">
<div class="permitStatus">R</div>
</td>
<td id="avail3" class="status a" onmouseout="return nd();" onmouseover="return overlib("Available Quota<br>River Launches : 89 of 99");">
<a onclick="javascript:setNewArrivalDate("Sun Sep 06 2015", 2);return false;" href="#">
A
<br>
<small>89</small>
</a>
</td>
<td id="avail4" class="status a" onmouseout="return nd();" onmouseover="return overlib("Available Quota<br>River Launches : 97 of 99");">
</tr>
</tbody>
</table>
</div>
... I shortened the table it has 14 columns.
I am looking for a column that has an Item available and I am checking the class for this, but the text also changes so there are other things I could look for.
This is the code I am using, but it visibly slow. I used puts statements to see the progress. My sense is that is has to do with time accessing the element. So I was hoping there is a better way to process the table quickly. Thank you.
for j in 1..days_to_check[i]
check_avail = driver.find_element(id: "avail#{j}")
check_availclass = check_avail.attribute ("class")
if check_availclass == "status a" or check_availclass == "status a slct"
#process if
end
Depending on your comment I would suggest to use the following xpath. I find this is often easier and feasible to use better xpath than looping though the html table
//td[(#class='status a') or (#class='status A')]
This xpath finds the class with status a or status A
I have this
1.9.3-p286 :073 > doc.css("tr[class~=strong]").children[3].children
=> [#<Nokogiri::XML::Element:0x3fee5e077e98 name="a" attributes=[#
<Nokogiri::XML::Attr:0x3fee5e077dd0 name="href"
value="http://somelink">]>]
Sample html:
<tr class='strong bf highbeam'>
<td>December 6th</td>
<td>Foo</td>
<td><a href='http://somelink' title='bar'>December 6th 2012 Episode</a></td>
<td><a href='http://somelink/#disqus_thread'></a></td>
</tr>
How can I fetch the value http://somelink at this point?
Don't use children, refine your css selector until you get the element you want:
doc.at('tr.strong a')[:href]
I'm trying to work out how to store an html table of drive stats in a database, but the developers have been a bit clever, and started using gifs to represent pass/fail/health stats
Here's a snippet of what I've got:
<tr class="status">
<td class="status"><img border="0" src="/tick_green.gif"></td>
<td class="status">8</td>
<td class="status">Ready</td>
<td class="status"><img border="0" src="/bar10.gif"></td>
<td class="status">SEAGATE ST3146807FC</td>
<td class="status">10000 RPM</td>
<td class="status">3HY61AG9</td>
<td class="status">XR12</td>
<td class="status">286749488</td>
<td class="status"> 28.0°C</td>
<td class="status" style="background-color: #00fa00">
</td>
**
And here's some of the ruby that I've written so far to strip the tags:
table = page.parser.xpath('//table/caption[contains(.,"Drive")]/..')
table.xpath('//table//tr').each do |row|
row.xpath('td').each do |cell|
puts cell.to_html.gsub(/<a[^>]+>/,'').gsub(/<td[^>]+>/,'').gsub(/<\/td[^>]*>/,'').gsub(/<\/a[^>]*>/,'')
#puts cell.text
end
end
I can now get semi-rational output
<img border="0" src="/tick_green.gif">
15
Ready
<img border="0" src="/bar10.gif">
SEAGATE ST3146807FC
10000 RPM
3HY61ASW
XR12
286749488
29.0°C
But I want to replace a couple of other cell elements with other bits
For example, the tick_green can also be '/cross_red.gif' or '/caution.gif' which I want to replace with regular text, likewise, the img bar10.gif, I want to replace with just text of '10'
Is it best to come up with a whole bunch of values for all of my special cases?
I'd do some 'gsub'iing.
E.g.:
example = <<-STRING
<img border="0" src="/tick_green.gif">
15
Ready
<img border="0" src="/bar10.gif">
SEAGATE ST3146807FC
10000 RPM
3HY61ASW
XR12
286749488
29.0°C
STRING
replace = Hash.new("#unknown")
replace['tick_green.gif'] = "[OK]"
replace['bar10.gif'] = "[10]"
regex = /<img [^>]* src="\/(.*)">/
result = example.gsub(regex) { replace[$1] }
Somehow the I'd like to replace the $1 with a named backreference, but don't know how yet.
http://ruby-doc.org/core-1.9.3/String.html#method-i-gsub
edit: result from above
[OK]
15
Ready
[10]
SEAGATE ST3146807FC
10000 RPM
3HY61ASW
XR12
286749488
29.0°C
A case statement will clean that up a little but:
row.css('td').each do |td|
img = td.at('img')
puts case
when img && img[:src][/bar(\d+)\.gif/] then $1
when img && img[:src][/tick_green/] then 'ok'
else td.text.strip
end
end
I have the following piece of HTMl.
<table id="movies">
<thead><tr>
<th>Movie Title</th>
<th>Rating</th>
<th>Release Date</th>
<th>More Info</th>
</tr></thead>
<tbody>
<tr>
<td>The Terminator</td>
<td>R</td>
<td>1984-10-26 00:00:00 UTC</td>
<td>More about The Terminator</td>
</tr>
<tr>
<td>When Harry Met Sally</td>
<td>R</td>
<td>1989-07-21 00:00:00 UTC</td>
<td>More about When Harry Met Sally</td>
</tr>
<tr>
<td>Amelie</td>
<td>R</td>
<td>2001-04-25 00:00:00 UTC</td>
<td>More about Amelie</td>
</tr>
</tbody>
</table>
Now, i want to write the step in my Cucumber in order to check if the specified "ratings" (the second column) are on the page or not.
So, i wrote this (this is a part of the bigger code from my step defination but i checked, everything works till this place):
txt = "//table[#id='movies']/tbody//td[2]"
page.all(:xpath, txt) do |element|
debugger
puts element.text
end
However, there seems to be somewhere a small error, because i never get inside this page.all block... no debugger is invoke, for instance.
Any Help is appreciated :)
After some meditative minutes the problem was fixed im my mind :)
I simply needed to give .each at the end.
txt = "//table[#id='movies']/tbody//td[2]"
page.all(:xpath, txt).each do |element|
debugger
puts element.text
end
I'd like to parse a HTML page with the Nokogiri. There is a table in part of the page which does not use any specific ID. Is it possible to extract something like:
Today,3,455,34
Today,1,1300,3664
Today,10,100000,3444,
Yesterday,3454,5656,3
Yesterday,3545,1000,10
Yesterday,3411,36223,15
From this HTML:
<div id="__DailyStat__">
<table>
<tr class="blh"><th colspan="3">Today</th><th class="r" colspan="3">Yesterday</th></tr>
<tr class="blh"><th>Qnty</th><th>Size</th><th>Length</th><th class="r">Length</th><th class="r">Size</th><th class="r">Qnty</th></tr>
<tr class="blr">
<td>3</td>
<td>455</td>
<td>34</td>
<td class="r">3454</td>
<td class="r">5656</td>
<td class="r">3</td>
</tr>
<tr class="bla">
<td>1</td>
<td>1300</td>
<td>3664</td>
<td class="r">3545</td>
<td class="r">1000</td>
<td class="r">10</td>
</tr>
<tr class="blr">
<td>10</td>
<td>100000</td>
<td>3444</td>
<td class="r">3411</td>
<td class="r">36223</td>
<td class="r">15</td>
</tr>
</table>
</div>
As a quick and dirty first pass I'd do:
html = <<EOT
<div id="__DailyStat__">
<table>
<tr class="blh"><th colspan="3">Today</th><th class="r" colspan="3">Yesterday</th></tr>
<tr class="blh"><th>Qnty</th><th>Size</th><th>Length</th><th class="r">Length</th><th class="r">Size</th><th class="r">Qnty</th></tr>
<tr class="blr">
<td>3</td>
<td>455</td>
<td>34</td>
<td class="r">3454</td>
<td class="r">5656</td>
<td class="r">3</td>
</tr>
<tr class="bla">
<td>1</td>
<td>1300</td>
<td>3664</td>
<td class="r">3545</td>
<td class="r">1000</td>
<td class="r">10</td>
</tr>
<tr class="blr">
<td>10</td>
<td>100000</td>
<td>3444</td>
<td class="r">3411</td>
<td class="r">36223</td>
<td class="r">15</td>
</tr>
</table>
</div>
EOT
# Today Yesterday
# Qnty Size Length Length Size Qnty
# 3 455 34 3454 5656 3
# 1 1300 3664 3545 1000 10
# 10 100000 3444 3411 36223 15
require 'nokogiri'
doc = Nokogiri::HTML(html)
Use CSS to find the start of the table, and define some places to hold the data we're capturing:
table = doc.at('div#__DailyStat__ table')
today_data = []
yesterday_data = []
Loop over the rows in the table, rejecting the headers:
table.search('tr').each do |tr|
next if (tr['class'] == 'blh')
Initialize arrays to capture the pertinent data from each row, selectively push the data into the appropriate array:
today_td_data = [ 'Today' ]
yesterday_td_data = [ 'Yesterday' ]
tr.search('td').each do |td|
if (td['class'] == 'r')
yesterday_td_data << td.text.to_i
else
today_td_data << td.text.to_i
end
end
today_data << today_td_data
yesterday_data << yesterday_td_data
end
And output the data:
puts today_data.map{ |a| a.join(',') }
puts yesterday_data.map{ |a| a.join(',') }
> Today,3,455,34
> Today,1,1300,3664
> Today,10,100000,3444
> Yesterday,3454,5656,3
> Yesterday,3545,1000,10
> Yesterday,3411,36223,15
Just to help you visualize what's going, at the exit from the "tr" loop, the today_data and yesterday_data arrays are arrays-of-arrays looking like:
[["Today", 3, 455, 34], ["Today", 1, 1300, 3664], ["Today", 10, 100000, 3444]]
Alternatively, instead of looping over the "td" tags and sensing the class for the tag, I could have grabbed the contents of the "tr" and then used scan to grab the numbers and sliced the resulting array into "today" and "yesterday" arrays:
tr_data = tr.text.scan(/\d+/).map{ |i| i.to_i }
today_td_data = [ 'Today', *tr_data[0, 3] ]
yesterday_td_data = [ 'Yesterday', *tr_data[3, 3] ]
In real-world development, like at work, I'd use that instead of what I first wrote because it's succinct.
And notice that I didn't use XPath. It's very doable in Nokogiri to use XPath and accomplish this, but for simplicity I prefer CSS accessors. XPath would have allowed accessing individual "td" tag contents, but it also would begin to look like line-noise, which is something we want to avoid when writing code, because it impacts maintenance. I could also have used CSS to drill down to the correct "td" tags like 'tr td.r', but I don't think it would improve the code, it would just be an alternate way of doing it.