How to grab all the elements inside a table? - ruby

I have HTML in the following format:
<tbody>
<tr>
<td> Test1 </td>
<td> .. </td>
</tr>
<tr> ... </tr>
</tbody>
<tbody>
<tr>
<td> Test2 </td>
<td> .. </td>
</tr>
<tr> .. </tr>
</tbody>
<tbody>
<tr>
<td> Test3 </td>
<td> .. </td>
</tr>
<tr> .. </tr>
</tbody>
How can I iterate through all the tbody elements and grab the text inside each tr/td? I tried the following:
items = driver.find_elements(:xpath, "//tbody/tr[1]/td[1]").map(&:text)
puts items
and
items = driver.find_elements(:xpath, ".//tbody//td[1]").map(&:text)
puts items
In both cases, puts prints nothing (I checked that the code reaches that point). How can I grab all the items inside tbody/tr/td?

Related

List only records populated with Capybara

Friends helped me with a solution that validates whether there are [active/inactive] records in the list. When I list the records using pp, Capybara also returns blank lines. How do I disregard empty records?
def validate_active_inactive_records
expect(page).to have_css("td:nth-child(5)", :text => /^(ACTIVE|INACTIVE)$/)
# ***listing records***
page.all('.tvGrid tr > td:nth-child(5)').each do |td|
puts td.text
end
end
<table width="100%" class="tvGrid">
<tbody>
<tr>
<th colspan="1" class="tvHeader">Id</th>
<th colspan="1" class="tvHeader">Code</th>
<th colspan="1" class="tvHeader">Description</th>
<th colspan="1" class="tvHeader">Operational Center</th>
<th colspan="1" class="tvHeader">Status</th>
</tr>
<tr class="tvRowEmpty">
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr class="tvRowEmpty">
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr class="tvRowEmpty">
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr class="tvRowEmpty">
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr class="tvRowEmpty">
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr class="tvRowEmpty">
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr class="tvRowEmpty">
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
</tbody>
</table>
Are you asking how to remove the rows with the class tvRowEmpty from your search results? If so, you can use the :not() pseudo-class in your selector:
def validate_active_inactive_records
expect(page).to have_css("td:nth-child(5)", :text => /^(ACTIVE|INACTIVE)$/)
# ***listing records***
page.all('.tvGrid tr:not(.tvRowEmpty) > td:nth-child(5)').each do |td|
puts td.text
end
end
If you want to exclude any td that contains only whitespace (for example a non-breaking space), you could instead filter on the cell text with a regex that requires at least one non-whitespace character:
page.all('.tvGrid tr > td:nth-child(5)', text: /[^[:space:]]/).each { |td| puts td.text }
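The filter above can be tried out on plain strings, without a browser. The following is a minimal sketch, assuming the cell texts arrive as ordinary Ruby strings; the sample values are hypothetical. It uses [[:space:]] rather than \s because Ruby's \s only covers ASCII whitespace, so it would not treat a non-breaking space (U+00A0) as blank. Whether the driver has already converted such characters to regular spaces depends on the driver, so this only illustrates the regex itself.

```ruby
# Hypothetical cell texts, as they might come back from the fifth column.
cells = ["ACTIVE", "\u00A0", "  ", "INACTIVE", ""]

# Keep only cells that contain at least one non-whitespace character.
# [[:space:]] is Unicode-aware, so it also covers the non-breaking space.
populated = cells.select { |text| text.match?(/[^[:space:]]/) }

puts populated
```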

Using contains returns too many results

In the html below, I'm trying to get the two nodes that contain values for shipment_number, but instead I get 6 <td> nodes - why? Doesn't contains limit the nodes to only those that match the text value? If so the statement below should only return two, not six?
In Chrome dev console:
$x("//tr//td[contains(.,'shipment number')]/following::td[1]")
html:
<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title></title>
</head>
<body>
<table border="1">
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td>Date</td>
<td>11/15/2019</td>
</tr>
<tr>
<td>shipment number</td>
<td>abc_123_florida-45</td>
</tr>
<tr>
<td>Departure time:</td>
<td>0430</td>
</tr>
</tbody>
</table>
</td>
<td>
<table>
<tbody>
<tr>
<td>Time arrival</td>
<td>1715</td>
</tr>
<tr>
<td>customer</td>
<td>bob smith</td>
</tr>
<tr>
<td>box type</td>
<td>square</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<table border="1">
<tbody>
<tr>
<td>
<table>
<tbody>
<tr name="laneStop">
<td>box1</td>
<td>23.45</td>
<td>lane1</td>
<td>south</td>
</tr>
<tr name="laneStop">
<td>box2</td>
<td>17.14</td>
<td>lane1</td>
<td>south</td>
</tr>
<tr name="laneStop">
<td>box3</td>
<td>17.18</td>
<td>lane1</td>
<td>north</td>
</tr>
<tr name="laneStop">
<td>box2</td>
<td>199.14</td>
<td>lane1</td>
<td>west</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<table border="1">
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td>Date</td>
<td>11/16/2019</td>
</tr>
<tr>
<td>shipment number</td>
<td>abc_222_florida-35</td>
</tr>
<tr>
<td>Departure time:</td>
<td>0630</td>
</tr>
</tbody>
</table>
</td>
<td>
<table>
<tbody>
<tr>
<td>Time arrival</td>
<td>1715</td>
</tr>
<tr>
<td>customer</td>
<td>sue smith</td>
</tr>
<tr>
<td>box type</td>
<td>rect</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<table border="1">
<tbody>
<tr>
<td>
<table>
<tbody>
<tr name="laneStop">
<td>box1</td>
<td>33.45</td>
<td>lane1</td>
<td>south</td>
</tr>
<tr name="laneStop">
<td>box2</td>
<td>1.14</td>
<td>lane1</td>
<td>south</td>
</tr>
<tr name="laneStop">
<td>box3</td>
<td>27.18</td>
<td>lane1</td>
<td>north</td>
</tr>
<tr name="laneStop">
<td>box2</td>
<td>299.14</td>
<td>lane1</td>
<td>west</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</body>
</html>
You need
//tr//td[contains(text(),'shipment number')]/following::td[1]
That's because contains(., '...') converts . to string by expanding all its text descendants, not just children.
I'm adding this answer because the text() node test might conflict with other requirements, mainly those dealing with inline markup.
The reason you are getting six td elements is that six td elements have "shipment number" as part of their string value (the concatenation of all descendant text nodes). That happens because you have nested tables, and therefore nested td elements. So you want a td element that has no descendant td element.
The expression:
//tr//td[not(.//td)][contains(.,'shipment number')]/following::td[1]
It selects:
<td>abc_123_florida-45</td>
<td>abc_222_florida-35</td>
Check in http://www.xpathtester.com/xpath/37bd889231ad68bb7bfa377433aeca00
Do note that your input sample has a default namespace declaration with the namespace URI http://www.w3.org/1999/xhtml. Because neither your code sample nor your selected answer uses namespaces, I assume you know how to work with them.
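To see why contains(., ...) also matches the enclosing cells, it may help to compare a cell's string value with its own text nodes. The following is a rough sketch, not the code from either answer: it uses REXML (shipped with Ruby) instead of a browser, works on a cut-down, hypothetical copy of the nested tables, and string_value is a helper written here only for illustration.

```ruby
require 'rexml/document'

# Cut-down, hypothetical version of the nested tables from the question.
xml = <<~XML
  <table><tr><td>
    <table><tr><td>shipment number</td><td>abc_123_florida-45</td></tr></table>
  </td></tr></table>
XML

doc = REXML::Document.new(xml)
tds = REXML::XPath.match(doc, '//td')

# String value: the concatenation of all descendant text nodes.
# This is what contains(., '...') tests against.
def string_value(node)
  node.children.map { |child|
    case child
    when REXML::Text then child.value
    when REXML::Element then string_value(child)
    else ''
    end
  }.join
end

# Cells whose string value contains the label (includes the outer cell).
by_string_value = tds.select { |td| string_value(td).include?('shipment number') }

# Cells whose own direct text nodes contain the label (only the innermost cell).
by_own_text = tds.select { |td| td.texts.any? { |t| t.value.include?('shipment number') } }

puts by_string_value.size
puts by_own_text.size
```

The outer cell is counted in the first selection only because the label sits somewhere below it, which is the same effect that produces six matches in the full document.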

nokogiri parsing first td in tr ignoring specific class

I have the following html
<table>
<tr>
<th>value</th>
<th>description</th>
</tr>
<tr>
<td>OverallHealthScore</td>
<td>
Overall HealthScore.
</td>
</tr>
<tr>
<td class="deprecated">DESTAGED_TRACKS_PER_SEC</td>
<td>
The tracks per second saved into disks.
</td>
</tr>
</table>
There are many more tr elements, but this excerpt shows the two scenarios.
I need to print only OverallHealthScore.
table.css('tr').map do |row|
puts row.css('td:not(.deprecated)').map(&:text)[0]
end
This gets me just about there, but it prints the "description" td for the deprecated items. I can't figure out what I need to do to get the results I need.
Assuming you want the value of the first td in each row, as long as that td is not deprecated:
<table>
<tr>
<th>value</th>
<th>description</th>
</tr>
<tr>
<td>OverallHealthScore</td>
<td>
Overall HealthScore.
</td>
</tr>
<tr>
<td class="deprecated">DESTAGED_TRACKS_PER_SEC</td>
<td>
The tracks per second saved into disks.
</td>
</tr>
<tr>
<td>AvaiableAnother</td>
<td>
Another Available HealthScore.
</td>
</tr>
<tr>
<td class="deprecated">OTHER_DEPRE</td>
<td>
The tracks per second saved into disks.
</td>
</tr>
</table>
Then
puts table.css('td:first-child:not(.deprecated)').map(&:text)
# OverallHealthScore
# AvaiableAnother
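The original attempt fails because td:not(.deprecated) drops the deprecated cell first, so [0] then picks the description cell of that row. The order of operations can be sketched as follows: take the first cell of each row, then test its class. This is only an illustration under assumptions: it uses REXML (shipped with Ruby) rather than Nokogiri, and the markup is a shortened, well-formed copy of the question's table.

```ruby
require 'rexml/document'

# Shortened, well-formed copy of the table from the question.
xml = <<~XML
  <table>
    <tr><th>value</th><th>description</th></tr>
    <tr><td>OverallHealthScore</td><td>Overall HealthScore.</td></tr>
    <tr><td class="deprecated">DESTAGED_TRACKS_PER_SEC</td><td>The tracks per second saved into disks.</td></tr>
  </table>
XML

doc = REXML::Document.new(xml)

names = REXML::XPath.match(doc, '//tr').filter_map do |row|
  first_cell = row.elements['td'] # first <td> child; nil for the header row
  # Skip rows without a <td> and rows whose first cell is deprecated.
  next if first_cell.nil? || first_cell.attributes['class'] == 'deprecated'
  first_cell.text.strip
end

puts names
```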

Looping through a table twice

I'm trying to create a page that has a table. The table has a service and then the items under the service so the table should look like this
<table class="table table-striped table-bordered table-condensed pricing_table">
<tr>
<td></td>
<td>
Short
</td>
<td>
Medium
</td>
<td>
Long
</td>
</tr>
<tr>
<td>
Cut & Blow Dry
</td>
<td>
R170
</td>
<td>
R190
</td>
<td>
R220
</td>
</tr>
<tr>
<td>
Blow Dry
</td>
<td>
R120
</td>
<td>
R170
</td>
<td>
R190
</td>
</tr>
<tr>
<td>
Girls under 18
</td>
<td>
R130
</td>
<td></td>
<td></td>
</tr>
<tr>
<td>
Pensioners
</td>
<td>
R130
</td>
<td></td>
<td></td>
</tr>
<tr>
<td>
Gents Cut
</td>
<td>
R120
</td>
<td></td>
<td></td>
</tr>
<tr>
<td>
Boys Cut
</td>
<td>
R90
</td>
<td></td>
<td></td>
</tr>
<tr>
<td>
Upstyles - Trail
</td>
<td>
R270
</td>
<td></td>
<td></td>
</tr>
<tr>
<td>
Upstyles
</td>
<td>
R350
</td>
<td></td>
<td></td>
</tr>
</table>
but what I'm getting is this
<table class="table table-striped table-bordered table-condensed pricing_table">
<tbody>
<tr>
<td> Cutting & Styling </td>
<td> Short </td>
<td> Medium </td>
<td> Long </td>
</tr>
<tr>
<td> Cut & Blow Dry </td>
<td> 150 </td>
<td> 170 </td>
<td> 190 </td>
</tr>
<tr>
<td> Cutting & Styling </td>
<td> Short </td>
<td> Medium </td>
<td> Long </td>
</tr>
<tr>
<td> Blow Dry </td>
<td> 100 </td>
<td> 120 </td>
<td> 140 </td>
</tr>
<tr>
<td> Cutting & Styling </td>
<td> Short </td>
<td> Medium </td>
<td> Long </td>
</tr>
<tr>
<td> Girls under 18 </td>
<td> 120 </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td> Cutting & Styling </td>
<td> Short </td>
<td> Medium </td>
<td> Long </td>
</tr>
<tr>
<td> Pensioners (Ladies) </td>
<td> 70 </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td> Chemical Service </td>
<td> Short </td>
<td> Medium </td>
<td> Long </td>
</tr>
<tr>
<td> Color </td>
<td> 150 </td>
<td> 200 </td>
<td> 250 </td>
</tr>
<tr>
<td> Chemical Service </td>
<td> Short </td>
<td> Medium </td>
<td> Long </td>
</tr>
<tr>
<td> Half Head Foils </td>
<td> 200 </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td> Chemical Service </td>
<td> Short </td>
<td> Medium </td>
<td> Long </td>
</tr>
<tr>
<td> Full Head Foils </td>
<td> 300 </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td> Chemical Service </td>
<td> Short </td>
<td> Medium </td>
<td> Long </td>
</tr>
<tr>
<td> Per Foil </td>
<td> 10 </td>
<td> 15 </td>
<td> 20 </td>
</tr>
<tr>
<td> Chemical Service </td>
<td> Short </td>
<td> Medium </td>
<td> Long </td>
</tr>
<tr>
<td> Brazilian </td>
<td> 700 </td>
<td> 800 </td>
<td> 900 </td>
</tr>
<tr>
<td> Chemical Service </td>
<td> Short </td>
<td> Medium </td>
<td> Long </td>
</tr>
<tr>
<td> Perm </td>
<td> 150 </td>
<td> 170 </td>
<td> 190 </td>
</tr>
<tr>
<td> Treatment </td>
<td> Short </td>
<td> Medium </td>
<td> Long </td>
</tr>
<tr>
<td> Salon Treatment </td>
<td> 100 </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td> Treatment </td>
<td> Short </td>
<td> Medium </td>
<td> Long </td>
</tr>
<tr>
<td> Olaplex - Stand Alon </td>
<td> 180 </td>
<td> </td>
<td> </td>
</tr>
</tbody>
</table>
The service is always being looped and displayed. I would only like the service to be looped once and the items to be displayed under their respective service.
My pricing.blade.php
<div class="row">
<div class="col-lg-12 col-md-12 col-sm-12 about_content_area">
<h1>pricing</h1>
<div class="col-lg-5 col-lg-offset-3 col-md-5 col-md-offset-3 pricing_wrapper">
<table class="table table-striped table-bordered table-condensed pricing_table">
@foreach($services_options as $services)
@foreach($services->service as $service)
<tr>
<td>
{!! $service->title !!}
</td>
<td>
Short
</td>
<td>
Medium
</td>
<td>
Long
</td>
</tr>
@endforeach
<tr>
<td>
{!! $services->title !!}
</td>
<td>
{!! $services->short !!}
</td>
<td>
{!! $services->medium !!}
</td>
<td>
{!! $services->long !!}
</td>
</tr>
@endforeach
</table>
</div>
</div>
</div>
My controller
public function content($id)
{
$menus_child = Menu::where('menu_id', 0)->with('menusP')->get();
$menu = Menu::where('id', $id)->firstOrFail();
$layout = $menu->type;
$gallery_category = Gcategory::all();
$services_options = Price::all();
return view('open::public/'.$layout, compact('menus_child', 'menu', 'gallery_category', 'services_options'));
}
You have a loop for the header part of the table (the Short, Medium, Long part) inside the loop for each service, which is why it's being output above each service row.
You just need to update pricing.blade.php to the following:
<div class="row">
<div class="col-lg-12 col-md-12 col-sm-12 about_content_area">
<h1>pricing</h1>
<div class="col-lg-5 col-lg-offset-3 col-md-5 col-md-offset-3 pricing_wrapper">
<table class="table table-striped table-bordered table-condensed pricing_table">
<tr>
<td></td>
<td>
Short
</td>
<td>
Medium
</td>
<td>
Long
</td>
</tr>
@foreach($services_options as $services)
<tr>
<td>
{!! $services->title !!}
</td>
<td>
{!! $services->short !!}
</td>
<td>
{!! $services->medium !!}
</td>
<td>
{!! $services->long !!}
</td>
</tr>
@endforeach
</table>
</div>
</div>
</div>

Need help to locate the text of element with class?

I have a file that I got using the command page.css("table.vc_result span a"), but I am not able to get the second and third span elements of the file:
File
<table border="0" bgcolor="#FFFFFF" onmouseout="resDef(this)" onmouseover="resEmp(this)" class="vc_result">
<tbody>
<tr>
<td width="260" valign="top">
<table>
<tbody>
<tr>
<td width="40%" valign="top"><span><a class="cAddName" href="/USA/Illinois/Chicago/Yellow+Page+Advertising+And+Telephone+Directory+Publica/gateway-megatech_13478733">
Gateway Megatech</a></span><br>
<span class="cAddText">P.O. BOX 99682, Chicago IL 60696</span></td>
</tr>
<tr>
<td><span class="cAddText">Cook County Illinois</span></td>
</tr>
<tr>
<td><span class="cAddCategory">Yellow Page Advertising And Telephone
Directory Publica Chicago</span></td>
</tr>
</tbody>
</table>
</td>
<td width="260">
<table align="center">
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td>
<div style=
"background: url('images/listings.png');background-position: -0px -0px; width: 16px; height: 16px">
</div>
</td>
<td><font style="font-weight:bold">847-506-7800</font></td>
</tr>
</tbody>
</table>
</td>
</tr>
<tr>
<td>
<table>
<tbody>
<tr>
<td>
<div style=
"background: url('images/listings.png');background-position: -0px -78px; width: 16px; height: 16px">
</div>
</td>
<td><a href=
"/USA/Illinois/Chicago/Yellow+Page+Advertising+And+Telephone+Directory+Publica/gateway-megatech_13478733"
class="cAddNearby">Businesses near 60696</a></td>
</tr>
</tbody>
</table>
</td>
</tr>
<tr>
<td>
<table>
<tbody>
<tr>
<td></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
...This is not the complete file; there are plenty more span entries in it.
The code I am using locates the exact text, but it cannot associate that text with the text of the spans that follow the nested span/a element.
require 'rubygems'
require 'nokogiri'
require 'open-uri'
name="yellow"
city="Chicago"
state="IL"
burl="http://www.sitename.com/"
url="#{burl}Business_Listings.php?name=#{name}&city=#{city}&state=#{state}&current=1&Submit=Search"
page = Nokogiri::HTML(URI.open(url))
rows = page.css("table.vc_result span a")
rows.each do |arow|
if arow.text == "Gateway Megatech"
puts(arow.next_element.text)
puts("Capturing the next span text")
found="Got it"
break
else
puts("Found nothing")
found="None"
end
end
Assuming that each business is a new <tr> inside the top table you have supplied, the following code gives you an array of Hashes with the values:
require 'nokogiri'
doc = Nokogiri.HTML(html)
business_rows = doc.css('table.vc_result > tbody > tr')
details = business_rows.map do |tr|
# Inside the first <td> of the row, find a <td> with a.cAddName in it
business = tr.at_xpath('td[1]//td[.//a[@class="cAddName"]]')
name = business.at_css('a.cAddName').text.strip
address = business.at_css('.cAddText').text.strip
# Inside the second <td> of the row, find the first <font> tag
phone = tr.at_xpath('td[2]//font').text.strip
# Return a hash of values for this row, using the capitalization requested
{ Name:name, Address:address, Phone:phone }
end
p details
#=> [
#=> {
#=> :Name=>"Gateway Megatech",
#=> :Address=>"P.O. BOX 99682, Chicago IL 60696",
#=> :Phone=>"847-506-7800"
#=> }
#=> ]
This is pretty fragile, but it works for what you've given; there do not seem to be many semantic hooks to hang onto in this insane, horrific abuse of HTML.
Parsing HTML with regular expressions is a bad idea, because HTML is not a regular language. Ideally, you want to parse the DOM / XML to a tree structure.
http://nokogiri.org/ is pretty popular.
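As for why the loop in the question does not reach the address: in the markup shown, the a element is the only element inside its span, so next_element on it has nothing to return. One way around this is to climb to the enclosing span and walk forward from there. The sketch below illustrates that navigation under assumptions: it uses REXML (shipped with Ruby) on a trimmed, well-formed copy of the first cell, with a placeholder href, rather than Nokogiri on the live page. The parent and next_element calls used here also exist under the same names on Nokogiri nodes.

```ruby
require 'rexml/document'

# Trimmed, well-formed copy of the first cell from the question
# (note the self-closing <br/> and the placeholder href).
xml = <<~XML
  <td><span><a class="cAddName" href="#">Gateway Megatech</a></span><br/>
  <span class="cAddText">P.O. BOX 99682, Chicago IL 60696</span></td>
XML

doc = REXML::Document.new(xml)
link = REXML::XPath.first(doc, '//a')

# The <a> is the only element inside its <span>, so it has no next element sibling.
no_sibling = link.next_element.nil?

# Climb to the enclosing <span>, then walk forward to the next <span> sibling.
sibling = link.parent.next_element
sibling = sibling.next_element until sibling.nil? || sibling.name == 'span'
address = sibling&.text&.strip

puts no_sibling
puts address
```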

Resources