Given the following HTML code :
<tr>
<th scope="row" class="navbox-group">Family</th>
<td class="navbox-list navbox-even hlist" style="text-align:left;border-left-width:2px;border-left-style:solid;width:100%;padding:0px">
<div style="padding:0em 0.25em">
<ul>
<li>Andrew Parker Bowles <small>(first husband)</small></li>
<li>Tom Parker Bowles <small>(son)</small></li>
<li>Laura Lopes <small>(daughter)</small></li>
<li>Charles, Prince of Wales <small>(second husband)</small></li>
<li>Bruce Shand <small>(father)</small></li>
<li>Rosalind Shand <small>(mother)</small></li>
<li>Annabel Elliot <small>(sister)</small></li>
<li>Mark Shand <small>(brother)</small></li>
</ul>
</div>
</td>
</tr>
I want to get all the href within the tr element , but only from tr elements that contains :
<th scope="row" class="navbox-group">Family</th>
(Where th='Family')
I try to write the following XPath :
"//tr[#th='Family']//a/#href"
But I don't get any href.
Thanks a lot.
Shany
Try below XPath:
//tr[th="Family"]//#href
It should allow you to get list of links from tr that contains th with text "Family"
Related
I'm trying to use xpath to select all the text within the elements:
between the h3 elements "Hay Point" and "Darymple Bay"
after h3 element "Darymlple Bay"
I've got this xpath syntax working which selects all the text within the td tags after <h3>Dalrymple Bay Coal Terminal</h3>.
.//h3[2]/following::td/text()
But I'm having trouble figuring out how to select all the text between the tags that fall between <h3>Hay Point Coal Terminal</h3> and <h3>Dalrymple Bay Coal Terminal</h3>
A sample of the structure of the html is below:
<h3>Hay Point Coal Terminal</h3>
<tr role="row" class="odd"><td headers="table06762r1c1" tabindex="0">July
</td><td style="text-align: left;"
headers="table06762r1c2">4,517,445</td>
<td headers="table06762r1c3">4,261,253</td>
<td headers="table06762r1c4">4,057,239</td>
<td headers="table06762r1c5">3,535,507</td>
</tr>
<h3>Dalrymple Bay Coal Terminal</h3>
<tr><td headers="table06762r1c1">July</td><td style="text-align: left;"
headers="table06762r1c2">5,462,591</td>
<td headers="table06762r1c3">5,625,700</td>
<td headers="table06762r1c4">5,816,977</td>
<td headers="table06762r1c5">5,396,644</td>
</tr>
If I understand your question correctly and given the html in the question, in order to get text nodes related to the <h3>Hay Point Coal Terminal</h3> node, try:
//h3[1]/following-sibling::tr[1]/td/text()
Output:
July
4,517,445
4,261,253
4,057,239
3,535,50
To get those related to the <h3>Dalrymple Bay Coal Terminal</h3> node, use:
//h3[2]/following-sibling::tr[1]/td/text()
or just
//h3[2]/following-sibling::tr/td/text()
Output:
July
5,462,591
5,625,700
5,816,977
5,396,644
To get both:
//h3/following-sibling::tr/td/text()
Assuming you want to group those you would do something like:
for h3 in response.css('h3'):
item = {
"h3": h3.css('*::text').extract()[0],
"tds": h3.css('* + tr td::text').extract()
}
I am trying to search a table for specific a specific value using Ruby and Selenium-webdriver. I have a method that works but takes a lot of time for some reason. It is a one row table and the page HTML looks like this:
<div id="permitGridContainer">
<table id="calendar" class="items" style="width:430px;" name="calendar">
<thead>
<tbody>
<tr>
<td id="avail1" class="status r slct" onmouseout="return nd();" onmouseover="return overlib("Available Quota<br>River Launches : 0 of 4");">
<div class="permitStatus">R</div>
</td>
<td id="avail2" class="status r" onmouseout="return nd();" onmouseover="return overlib("Available Quota<br>River Launches : 0 of 4");">
<div class="permitStatus">R</div>
</td>
<td id="avail3" class="status a" onmouseout="return nd();" onmouseover="return overlib("Available Quota<br>River Launches : 89 of 99");">
<a onclick="javascript:setNewArrivalDate("Sun Sep 06 2015", 2);return false;" href="#">
A
<br>
<small>89</small>
</a>
</td>
<td id="avail4" class="status a" onmouseout="return nd();" onmouseover="return overlib("Available Quota<br>River Launches : 97 of 99");">
</tr>
</tbody>
</table>
</div>
... I shortened the table it has 14 columns.
I am looking for a column that has an Item available and I am checking the class for this, but the text also changes so there are other things I could look for.
This is the code I am using, but it visibly slow. I used puts statements to see the progress. My sense is that is has to do with time accessing the element. So I was hoping there is a better way to process the table quickly. Thank you.
for j in 1..days_to_check[i]
check_avail = driver.find_element(id: "avail#{j}")
check_availclass = check_avail.attribute ("class")
if check_availclass == "status a" or check_availclass == "status a slct"
#process if
end
Depending on your comment I would suggest to use the following xpath. I find this is often easier and feasible to use better xpath than looping though the html table
//td[(#class='status a') or (#class='status A')]
This xpath finds the class with status a or status A
I have this
1.9.3-p286 :073 > doc.css("tr[class~=strong]").children[3].children
=> [#<Nokogiri::XML::Element:0x3fee5e077e98 name="a" attributes=[#
<Nokogiri::XML::Attr:0x3fee5e077dd0 name="href"
value="http://somelink">]>]
Sample html:
<tr class='strong bf highbeam'>
<td>December 6th</td>
<td>Foo</td>
<td><a href='http://somelink' title='bar'>December 6th 2012 Episode</a></td>
<td><a href='http://somelink/#disqus_thread'></a></td>
</tr>
How can I fetch the value http://somelink at this point?
Don't use children, refine your css selector until you get the element you want:
doc.at('tr.strong a')[:href]
I'm trying to work out how to store an html table of drive stats in a database, but the developers have been a bit clever, and started using gifs to represent pass/fail/health stats
Here's a snippet of what I've got:
<tr class="status">
<td class="status"><img border="0" src="/tick_green.gif"></td>
<td class="status">8</td>
<td class="status">Ready</td>
<td class="status"><img border="0" src="/bar10.gif"></td>
<td class="status">SEAGATE ST3146807FC</td>
<td class="status">10000 RPM</td>
<td class="status">3HY61AG9</td>
<td class="status">XR12</td>
<td class="status">286749488</td>
<td class="status"> 28.0°C</td>
<td class="status" style="background-color: #00fa00">
</td>
**
And here's some of the ruby that I've written so far to strip the tags:
table = page.parser.xpath('//table/caption[contains(.,"Drive")]/..')
table.xpath('//table//tr').each do |row|
row.xpath('td').each do |cell|
puts cell.to_html.gsub(/<a[^>]+>/,'').gsub(/<td[^>]+>/,'').gsub(/<\/td[^>]*>/,'').gsub(/<\/a[^>]*>/,'')
#puts cell.text
end
end
I can now get semi-rational output
<img border="0" src="/tick_green.gif">
15
Ready
<img border="0" src="/bar10.gif">
SEAGATE ST3146807FC
10000 RPM
3HY61ASW
XR12
286749488
29.0°C
But I want to replace a couple of other cell elements with other bits
For example, the tick_green can also be '/cross_red.gif' or '/caution.gif' which I want to replace with regular text, likewise, the img bar10.gif, I want to replace with just text of '10'
Is it best to come up with a whole bunch of values for all of my special cases?
I'd do some 'gsub'iing.
E.g.:
example = <<-STRING
<img border="0" src="/tick_green.gif">
15
Ready
<img border="0" src="/bar10.gif">
SEAGATE ST3146807FC
10000 RPM
3HY61ASW
XR12
286749488
29.0°C
STRING
replace = Hash.new("#unknown")
replace['tick_green.gif'] = "[OK]"
replace['bar10.gif'] = "[10]"
regex = /<img [^>]* src="\/(.*)">/
result = example.gsub(regex) { replace[$1] }
Somehow the I'd like to replace the $1 with a named backreference, but don't know how yet.
http://ruby-doc.org/core-1.9.3/String.html#method-i-gsub
edit: result from above
[OK]
15
Ready
[10]
SEAGATE ST3146807FC
10000 RPM
3HY61ASW
XR12
286749488
29.0°C
A case statement will clean that up a little but:
row.css('td').each do |td|
img = td.at('img')
puts case
when img && img[:src][/bar(\d+)\.gif/] then $1
when img && img[:src][/tick_green/] then 'ok'
else td.text.strip
end
end
I'd like to parse a HTML page with the Nokogiri. There is a table in part of the page which does not use any specific ID. Is it possible to extract something like:
Today,3,455,34
Today,1,1300,3664
Today,10,100000,3444,
Yesterday,3454,5656,3
Yesterday,3545,1000,10
Yesterday,3411,36223,15
From this HTML:
<div id="__DailyStat__">
<table>
<tr class="blh"><th colspan="3">Today</th><th class="r" colspan="3">Yesterday</th></tr>
<tr class="blh"><th>Qnty</th><th>Size</th><th>Length</th><th class="r">Length</th><th class="r">Size</th><th class="r">Qnty</th></tr>
<tr class="blr">
<td>3</td>
<td>455</td>
<td>34</td>
<td class="r">3454</td>
<td class="r">5656</td>
<td class="r">3</td>
</tr>
<tr class="bla">
<td>1</td>
<td>1300</td>
<td>3664</td>
<td class="r">3545</td>
<td class="r">1000</td>
<td class="r">10</td>
</tr>
<tr class="blr">
<td>10</td>
<td>100000</td>
<td>3444</td>
<td class="r">3411</td>
<td class="r">36223</td>
<td class="r">15</td>
</tr>
</table>
</div>
As a quick and dirty first pass I'd do:
html = <<EOT
<div id="__DailyStat__">
<table>
<tr class="blh"><th colspan="3">Today</th><th class="r" colspan="3">Yesterday</th></tr>
<tr class="blh"><th>Qnty</th><th>Size</th><th>Length</th><th class="r">Length</th><th class="r">Size</th><th class="r">Qnty</th></tr>
<tr class="blr">
<td>3</td>
<td>455</td>
<td>34</td>
<td class="r">3454</td>
<td class="r">5656</td>
<td class="r">3</td>
</tr>
<tr class="bla">
<td>1</td>
<td>1300</td>
<td>3664</td>
<td class="r">3545</td>
<td class="r">1000</td>
<td class="r">10</td>
</tr>
<tr class="blr">
<td>10</td>
<td>100000</td>
<td>3444</td>
<td class="r">3411</td>
<td class="r">36223</td>
<td class="r">15</td>
</tr>
</table>
</div>
EOT
# Today Yesterday
# Qnty Size Length Length Size Qnty
# 3 455 34 3454 5656 3
# 1 1300 3664 3545 1000 10
# 10 100000 3444 3411 36223 15
require 'nokogiri'
doc = Nokogiri::HTML(html)
Use CSS to find the start of the table, and define some places to hold the data we're capturing:
table = doc.at('div#__DailyStat__ table')
today_data = []
yesterday_data = []
Loop over the rows in the table, rejecting the headers:
table.search('tr').each do |tr|
next if (tr['class'] == 'blh')
Initialize arrays to capture the pertinent data from each row, selectively push the data into the appropriate array:
today_td_data = [ 'Today' ]
yesterday_td_data = [ 'Yesterday' ]
tr.search('td').each do |td|
if (td['class'] == 'r')
yesterday_td_data << td.text.to_i
else
today_td_data << td.text.to_i
end
end
today_data << today_td_data
yesterday_data << yesterday_td_data
end
And output the data:
puts today_data.map{ |a| a.join(',') }
puts yesterday_data.map{ |a| a.join(',') }
> Today,3,455,34
> Today,1,1300,3664
> Today,10,100000,3444
> Yesterday,3454,5656,3
> Yesterday,3545,1000,10
> Yesterday,3411,36223,15
Just to help you visualize what's going, at the exit from the "tr" loop, the today_data and yesterday_data arrays are arrays-of-arrays looking like:
[["Today", 3, 455, 34], ["Today", 1, 1300, 3664], ["Today", 10, 100000, 3444]]
Alternatively, instead of looping over the "td" tags and sensing the class for the tag, I could have grabbed the contents of the "tr" and then used scan to grab the numbers and sliced the resulting array into "today" and "yesterday" arrays:
tr_data = tr.text.scan(/\d+/).map{ |i| i.to_i }
today_td_data = [ 'Today', *tr_data[0, 3] ]
yesterday_td_data = [ 'Yesterday', *tr_data[3, 3] ]
In real-world development, like at work, I'd use that instead of what I first wrote because it's succinct.
And notice that I didn't use XPath. It's very doable in Nokogiri to use XPath and accomplish this, but for simplicity I prefer CSS accessors. XPath would have allowed accessing individual "td" tag contents, but it also would begin to look like line-noise, which is something we want to avoid when writing code, because it impacts maintenance. I could also have used CSS to drill down to the correct "td" tags like 'tr td.r', but I don't think it would improve the code, it would just be an alternate way of doing it.