Ruby code to display table element details

I have an HTML page that displays the product details in the following way:
<div class="column">
<h3 class="hidden-xs">Product Details</h3>
<table class="table table-striped">
<tbody>
<tr class="header-row hidden-xs">
<th>Product</th>
<th>Duration</th>
<th>Unit Price</th>
<th>Price</th>
</tr>
<tr>
<td>google.com</td>
<td>1 Year</td>
<td class="hidden-xs">$5</td>
<td>$5</td>
</tr>
</tbody>
</table>
<div class="totals text-right">
<p>Subtotal: $5</p>
<p>Total: $5</p>
</div>
</div>
Ruby code is given below:
require 'watir'
browser = Watir::Browser.new(:chrome)
browser.goto('file:///C:/Users/Ashwin/Desktop/text.html')
browser.table(:class, 'table table-striped').trs.each do |tr|
p tr[0].text
p tr[1].text
p tr[2].text
p tr[3].text
end
I am getting the output this way:
"Product"
"Duration"
"Unit Price"
"Price"
"google.com"
"1 Year"
"$5"
"$5"
But I want the details to be displayed as below:
Product : google.com
Duration : 1 Year
Unit Price : $5
Price : $5
Can anyone please help?

The table looks quite simple, so you can use the Table#strings method to convert the table into an array of strings. Then you can output each column header with each row value.
# Get the table you want to work with
table = browser.table(class: 'table table-striped')
# Get the text of the table
rows = table.strings
# Get the column headers and determine the width of the longest one
header = rows.shift
column_width = header.map(&:length).max
# Loop through the data rows and output the header/value
rows.each do |row|
  # `name` avoids shadowing the outer `header` array
  header.zip(row) do |name, value|
    puts "#{name.ljust(column_width)} : #{value}"
  end
end
#=> Product    : google.com
#=> Duration   : 1 Year
#=> Unit Price : $5
#=> Price      : $5
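To try the header/value pairing without a browser, here is a minimal sketch in plain Ruby. It assumes a literal array standing in for what `table.strings` returns, and `header_value_lines` is a hypothetical helper name, not part of Watir.

```ruby
# Stand-in for `table.strings`: an array of rows, each an array of cell texts.
rows = [
  ['Product', 'Duration', 'Unit Price', 'Price'],
  ['google.com', '1 Year', '$5', '$5']
]

# Pair every data row with the header row and build "header : value" lines.
def header_value_lines(rows)
  header, *data = rows
  width = header.map(&:length).max
  data.flat_map do |row|
    header.zip(row).map { |name, value| "#{name.ljust(width)} : #{value}" }
  end
end

puts header_value_lines(rows)
```

Because the helper takes the first row as the header and treats every remaining row as data, the same code should also cope with tables that have more than one data row.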

This code is only for the given table with two rows
require 'watir'
browser = Watir::Browser.new(:chrome)
browser.goto('file:///C:/Users/Ashwin/Desktop/text.html')
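# Declare first_row outside the block; a variable first assigned inside the block is reset on every iteration
first_row = nil
browser.table(:class, 'table table-striped').rows.each_with_index do |row, index|
  if index.zero?
    first_row = row
    next
  end
  p first_row[0].text + ":" + row[0].text
  p first_row[1].text + ":" + row[1].text
  p first_row[2].text + ":" + row[2].text
  p first_row[3].text + ":" + row[3].text
end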

Related

Calculate and display the balances from bottom to top in ruby on rails

To display the records in descending order, I used "created_at DESC", and it worked for every column of the table (Date, Particulars, Debit and Credit) except Balance, which is still calculated and displayed from top to bottom. I want it to be calculated and displayed from bottom to top. This can be seen in the image below.
expenses_controller.rb
class ExpensesController < ApplicationController
  def index
    @expenses = Expense.order("created_at DESC")
  end
end
For better understanding, find the below image of the Bank statement, as I need to achieve the same.
index.html.erb
<% balance = 0 %>
<div class="container">
<table style="width:100%">
<thead>
<tr>
<th>Date</th>
<th>Particulars</th>
<th>Debit</th>
<th>Credit</th>
<th>Balance</th>
</tr>
</thead>
<tbody>
<% @expenses.each do |expense| %>
<tr>
<td><%= expense.date.strftime('%d/%m/%Y') %></td>
<td><%= expense.particulars %></td>
<td class="pos"><%= expense.debit %></td>
<td class="neg"><%= expense.credit %></td>
<% balance += expense.debit.to_f-expense.credit.to_f %>
<% color = balance >= 0 ? "pos" : "neg" %>
<td class="<%= color %>"><%= number_with_precision(balance.abs, :delimiter => ",", :precision => 0) %></td>
</tr>
<% end %>
</tbody>
</table>
</div>
Any suggestions are most welcome.
Thank you in advance.
I still don't understand how the balance column [2500, 1500, 2000] is calculated, but I can infer something from the screenshot.
Basically, you are sorting by a column that does not exist in the model. So you first need to build that helper column, populate it, and then sort by it.
It should be possible to do this in SQL, but I'm showing it in plain Ruby, using an array of hashes as a fake database. You can adapt it to your case easily or look for a more efficient way (SQL).
Let's say data are the following:
expenses = [{date: 1, narration: :a, debit: 3.0, credit: 0},
{date: 2, narration: :b, debit: 0.15, credit: 0},
{date: 3, narration: :c, debit: 75.0, credit: 0}]
And the initial balance is:
balance = 1434.64
Now let's loop over the data, adding the new field balance, and sort once the loop is done:
expenses.each do |h|
  balance += h[:credit] - h[:debit]
  h[:balance] = balance
end.sort_by! { |h| h[:balance] }
Now your sorted expenses are (balances shown rounded to two decimals):
[
{:date=>3, :narration=>:c, :debit=>75.0, :credit=>0, :balance=>1356.49},
{:date=>2, :narration=>:b, :debit=>0.15, :credit=>0, :balance=>1431.49},
{:date=>1, :narration=>:a, :debit=>3.0, :credit=>0, :balance=>1431.64}
]
You can do calculation in the controller, then pass expenses to the view and loop without any need of calculation there.
For your Rails app, you could implement it as follows.
Add the temporary field balance to your model (no need to add a column to the database) and initialize to value 0:
class Expense < ApplicationRecord
attr_accessor :balance
after_initialize :init
def init
self.balance = 0
end
end
Do the calculation in the controller. I'm using an initial balance value just to emulate the example:
def index
  @expenses = Expense.all
  balance = 1434.64
  @expenses.each do |e|
    balance += e.credit - e.debit
    e.balance = balance
  end
  # sort_by (not sort) is needed when the block returns a single sort key
  @expenses = @expenses.sort_by(&:balance)
end
Then in your view, just loop:
<% @expenses.each do |expense| %>
<tr>
<td><%= expense.narration %></td>
<td><%= expense.debit %></td>
<td><%= expense.credit %></td>
<td><%= expense.balance %></td>
</tr>
<% end %>
If you insert the records as in your example, you should end up with this result:
# ["c", "0.0", "75.0", "1356.49"]
# ["b", "0.0", "0.15", "1431.49"]
# ["a", "0.0", "3.0", "1431.64"]
If you need to order by creation date first and by balance second, and balance is stored as a real database column, you could use
@expenses = Expense.order('created_at DESC, balance DESC')
Since you told it to order with Expense.order('created_at desc'), that's what it's doing. If you want to order by balance, you must instead say Expense.order('balance desc').
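If the goal is a bank-statement-style listing (newest entry first, each row showing the balance as of that entry), one option is to compute the running balance in chronological order and reverse only for display. This is a minimal sketch in plain Ruby, not the asker's actual app: the `Entry` struct, the `with_running_balance` helper and the amounts are all made up for illustration, and it follows the `debit - credit` convention used in the question's view.

```ruby
# Hypothetical stand-in for the Expense model; amounts are whole currency units.
Entry = Struct.new(:particulars, :debit, :credit, :balance)

# Entries in chronological order (oldest first), as an ascending created_at query would return them.
entries = [
  Entry.new('Salary', 3000, 0),
  Entry.new('Rent', 0, 500),
  Entry.new('Refund', 1000, 0)
]

# Walk oldest-to-newest so each balance includes everything that came before it.
def with_running_balance(entries, opening = 0)
  balance = opening
  entries.each do |entry|
    balance += entry.debit - entry.credit # same sign convention as the question's view
    entry.balance = balance
  end
end

# Reverse only for display, so the newest entry (and the latest balance) comes first.
statement = with_running_balance(entries).reverse
statement.each { |e| puts format('%-8s %6d', e.particulars, e.balance) }
```

In a Rails controller the same idea would mean loading the records in ascending order, filling in the balance, and reversing the array before handing it to the view, so the view itself needs no arithmetic.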

Display Nokogiri children nodes as raw HTML instead of &lt;tag&gt;

I am changing an XML table into an HTML table, and have to do some rearranging of nodes.
To accomplish the transformation, I scrape the XML, put it into a two-dimensional array, and then build the new HTML to output.
But some of the cells have HTML tags in them, and after my conversion <su> becomes &lt;su&gt;.
The XML data is:
<BOXHD>
<CHED H="1">Disc diameter, inches (cm)</CHED>
<CHED H="1">One-half or more of disc covered</CHED>
<CHED H="2">Number <SU>1</SU>
</CHED>
<CHED H="2">Exhaust foot <SU>3</SU>/min.</CHED>
<CHED H="1">Disc not covered</CHED>
<CHED H="2">Number <SU>1</SU>
</CHED>
<CHED H="2">Exhaust foot<SU>3</SU>/min.</CHED>
</BOXHD>
The steps I'm taking to convert this to an HTML table are:
class TableCell
  attr_accessor :text, :rowspan, :colspan
  def initialize(text='')
    @text = text
    @rowspan = 1
    @colspan = 1
  end
end
@frag = Nokogiri::HTML(xml)
# make a 2d array to store how the cells should be arranged
column = 0
prev_row = -1
@frag.xpath("boxhd/ched").each do |ched|
  row = ched.xpath("@h").first.value.to_i - 1
  if row <= prev_row
    column += 1
  end
  prev_row = row
  @data[row][column] = TableCell.new(ched.inner_html)
end
# methods to find colspan and rowspan, put them in @data
# ... snip ...
# now build an html table
doc = Nokogiri::HTML::DocumentFragment.parse ""
Nokogiri::HTML::Builder.with(doc) do |html|
html.table {
@data.each do |tr|
html.tr {
tr.each do |th|
next if th.nil?
html.th(:rowspan => th.rowspan, :colspan => th.colspan).table_header th.text
end
}
end
}
end
This gives the following HTML (notice the superscripts are escaped):
<table>
<tr>
<th rowspan="2" colspan="1" class="table_header">Disc diameter, inches (cm)</th>
<th rowspan="1" colspan="2" class="table_header">One-half or more of disc covered</th>
<th rowspan="1" colspan="2" class="table_header">Disc not covered</th>
</tr>
<tr>
<th rowspan="1" colspan="1" class="table_header">Number &lt;su&gt;1&lt;/su&gt; </th>
<th rowspan="1" colspan="1" class="table_header">Exhaust foot &lt;su&gt;3&lt;/su&gt;/min.</th>
<th rowspan="1" colspan="1" class="table_header">Number &lt;su&gt;1&lt;/su&gt;</th>
<th rowspan="1" colspan="1" class="table_header">Exhaust foot&lt;su&gt;3&lt;/su&gt;/min.</th>
</tr>
</table>
How do I get the raw HTML instead of the entities?
I've tried these with no success
@data[row][column] = TableCell.new(ched.children)
@data[row][column] = TableCell.new(ched.children.to_s)
@data[row][column] = TableCell.new(ched.to_s)
This might help you understand what's happening:
require 'nokogiri'
doc = Nokogiri::XML('<root><foo></foo></root>')
doc.at('foo').content = '<html><body>bar</body></html>'
doc.to_xml # => "<?xml version=\"1.0\"?>\n<root>\n <foo>&lt;html&gt;&lt;body&gt;bar&lt;/body&gt;&lt;/html&gt;</foo>\n</root>\n"
doc.at('foo').children = '<html><body>bar</body></html>'
doc.to_xml # => "<?xml version=\"1.0\"?>\n<root>\n <foo>\n <html>\n <body>bar</body>\n </html>\n </foo>\n</root>\n"
doc.at('foo').children = Nokogiri::XML::Document.new.create_cdata '<html><body>bar</body></html>'
doc.to_xml # => "<?xml version=\"1.0\"?>\n<root>\n <foo><![CDATA[<html><body>bar</body></html>]]></foo>\n</root>\n"
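The difference between `content=` and `children=` above comes down to whether a string is treated as text or as markup. As a rough analogy only (this uses the standard library's `CGI.escapeHTML`, not Nokogiri, and the `cell` value is copied from the question's data), the following sketch shows the two outcomes side by side:

```ruby
require 'cgi'

cell = 'Number <SU>1</SU>'

# Treated as text: markup characters are escaped, so a browser shows the tags literally.
as_text = "<th>#{CGI.escapeHTML(cell)}</th>"

# Treated as markup: the string is inserted unchanged, so <SU> stays an element.
as_markup = "<th>#{cell}</th>"

puts as_text   # <th>Number &lt;SU&gt;1&lt;/SU&gt;</th>
puts as_markup # <th>Number <SU>1</SU></th>
```

Passing a string as the text of a builder element behaves like the first case, which is presumably why the superscript tags came out escaped.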
I abandoned the builder, and simply built the HTML:
def html_headers()
  rows = Array.new
  @data.each do |row|
    cells = Array.new
    row.each do |cell|
      next if cell.nil?
      cells << "<th rowspan=\"%d\" colspan=\"%d\">%s</th>" %
        [cell.rowspan,
         cell.colspan,
         cell.text]
    end
    rows << "<tr>%s</tr>" % cells.join
  end
  rows.join
end
# Call the method only after it has been defined
headers = html_headers()
def replace_nodes(headers)
# ... snip ...
@frag.xpath("boxhd").each do |old|
puts "replacing boxhd..."
old.replace headers
end
# ... snip ...
end
I don't understand why, but it appears that the text I replaced the <BOXHD> tags with is parsed and searchable, as I was able to change tag names that came from the data in cell.text.

Scraping multiple table row siblings with Nokogiri

I’m trying to parse a table with the following markup.
<table>
<tr class="athlete">
<td colspan="2" class="name">Alex</td>
</tr>
<tr class="run">
<td>5.00</td>
<td>10.00</td>
</tr>
<tr class="run">
<td>5.20</td>
<td>10.50</td>
</tr>
<tr class="end"></tr>
<tr class="athlete">
<td colspan="2" class="name">John</td>
</tr>
<tr class="run">
<td>5.00</td>
<td>10.00</td>
</tr>
<tr class="end"></tr>
</table>
I need to loop through each .athlete table row and get each sibling .run table row underneath until I reach the .end row. Then repeat for the next athlete and so on. Some .athlete rows have two .run rows, others have one.
Here’s what I have so far. I loop through the athletes:
require 'rubygems'
require 'nokogiri'
require 'open-uri'
url = "http://myurl.com"
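doc = Nokogiri::HTML(URI.open(url)) # Kernel#open no longer opens URLs in Ruby 3+
doc.css(".athlete").each do |athlete|
puts athlete.at_css(".name").text # ".name" selects by class; "name" would look for a <name> element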
# Loop through the sibling .run rows until I reach the .end row
# output the value of the td’s in the .run row
end
I can’t figure out how to get each sibling .run row, and stop at the .end row. I feel like it would be easier if the table was better formed, but unfortunately I don’t have control of the markup. Any help would be greatly appreciated!
Voilà
require 'nokogiri'
doc = <<DOC
<table>
<tr class="athlete">
<td colspan="2" class="name">Alex</td>
</tr>
<tr class="run">
<td>5.00</td>
<td>10.00</td>
</tr>
<tr class="run">
<td>5.20</td>
<td>10.50</td>
</tr>
<tr class="end"></tr>
<tr class="athlete">
<td colspan="2" class="name">John</td>
</tr>
<tr class="run">
<td>5.00</td>
<td>10.00</td>
</tr>
<tr class="end"></tr>
</table>
DOC
doc = Nokogiri::HTML(doc)
# You can exclude .end, if it is always empty? and not required
trs = doc.css('.athlete, .run, .end').to_a
# This will return [['athlete', 'run', ...,'end'], ['athlete', 'run', ...,'end'] ...]
athletes = trs.slice_before{ |elm| elm.attr('class') =='athlete' }.to_a
athletes.map! do |athlete|
{
name: athlete.shift.at_css('.name').text,
runs: athlete
.select{ |tr| tr.attr('class') == 'run' }
.map{|run| run.text.to_f }
}
end
puts athletes.inspect
#[{:name=>"Alex", :runs=>[5.0, 5.2]}, {:name=>"John", :runs=>[5.0]}]
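The grouping step is plain `Enumerable#slice_before`, so it can be tried without Nokogiri. A minimal sketch, assuming each row has already been reduced to a `[class, text]` pair (that pair layout is an assumption made for illustration, not what Nokogiri returns):

```ruby
# Each pair is [row class, row text]; a stand-in for the parsed <tr> elements.
rows = [
  ['athlete', 'Alex'], ['run', '5.00 10.00'], ['run', '5.20 10.50'], ['end', ''],
  ['athlete', 'John'], ['run', '5.00 10.00'], ['end', '']
]

# Start a new group at every 'athlete' row, then keep only the 'run' rows of each group.
athletes = rows.slice_before { |row| row.first == 'athlete' }.map do |group|
  (_, name), *rest = group
  runs = rest.select { |row| row.first == 'run' }
             .map { |row| row.last.split.map(&:to_f) }
  { name: name, runs: runs }
end

p athletes
```

Unlike the answer above, this variant keeps both cells of every run instead of only the first value; which one is wanted depends on what the two columns mean.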
I would process the table as follows:
Locate the table you want to process
table = doc.at_css("table")
Get all the immediate rows in the table
rows = table.css("> tr")
Group the rows with boundary .athlete and .end
grouped = [[]]
rows.each do |row|
if row['class'] == 'athlete' and grouped.last.empty?
grouped.last << row
elsif row['class'] == 'end' and not grouped.last.empty?
grouped.last << row
grouped << []
elsif not grouped.last.empty?
grouped.last << row
end
end
grouped.pop if grouped.last.empty? || grouped.last.last['class'] != 'end'
Process the grouped rows
grouped.each do |group|
puts "BEGIN: >> #{group.first.text} <<"
group[1..-2].each do |row|
puts " #{row.text.squeeze}"
end
puts "END: >> #{group.last.text} <<"
end

Parsing bank statement and returning values from 2d array

I'm trying to parse my online bank statement, retrieve the values, and then get the individual values. Here's a sample statement. otherrefcode stands for the money I sent, and refcode stands for the money I received.
Date Description Type [?] In (£) Out (£) Balance (£)
29 Aug 13 person1 otherrefcode 29AUG13 18:23 FPO 42.81 662.68
29 Aug 13 person2 otherrefcode 29AUG13 18:21 FPO 599.91 705.49
29 Aug 13 person3 refcode TFR 30.80 1,305.40
28 Aug 13 person4 otherrefcode 28AUG13 14:23 FPO 25.27 1,336.20
28 Aug 13 person5 refcode TFR 41.08 1,361.47
And here's my ruby code. How do I grab the individual values?
require 'watir-webdriver'
require 'nokogiri'
def toprice(data)
data.to_s.match(/\d\d\.\d\d/).to_s
end
$browser = Watir::Browser.new :firefox
$browser.goto("bankurl")
$page_html = Nokogiri::HTML.parse($browser.html)
table_array = Array.new
table = $browser.table(:class,'statement smartRewardsOffers')
table.rows.each do |row|
row_array = Array.new
row.cells.each do |cell|
row_array << cell.text
end
table_array << row_array
end
puts "1strun"
puts table_array[1..4][1]
puts "2ndrun"
puts table_array[1][1..4]
That outputs
1strun
person1 otherrefcode 29AUG13 18:23
FPO
42.81
2ndrun
29 Aug 13
person2 otherrefcode 29AUG13 18:21
FPO
599.91
705.49
The HTML of the statement (well, the first 3 transactions - warning, 76 lines long.)
<table id="pnlgrpStatement:conS1:tblTransactionListView" class="statement smartRewardsOffers" summary="Table displaying the statement for your account Classic xxxxxxxxx xxxxxxxxx">
<thead>
<tr>
<th class="{sorter:false} first" scope="col">
<form id="pnlgrpStatement:conS1:tblTransactionListView:frmToggle" class="validationName:(pnlgrpStatement:conS1:tblTransactionListView:frmToggle) validate:()" enctype="application/x-www-form-urlencoded" autocomplete="off" action="/personal/a/viewproductdetails/ViewProductDetails.jsp" method="post" name="pnlgrpStatement:conS1:tblTransactionListView:frmToggle">
<input id="pnlgrpStatement:conS1:tblTransactionListView:frmToggle:btnASCSortStatements" class="tableSorter tableSorterReverse" type="image" title="Sort by oldest first" alt="Sort by oldest first" src="/wps/wcm/connect/xxxxxxxxxxxx/sort_arrow_up-8-1375113571.png?MOD=AJPERES&CACHEID=xxxxxxxxxxx" name="pnlgrpStatement:conS1:tblTransactionListView:frmToggle:btnASCSortStatements">
Date
<input type="hidden" value="pnlgrpStatement:conS1:tblTransactionListView:frmToggle" name="pnlgrpStatement:conS1:tblTransactionListView:frmToggle">
<input type="hidden" value="xxxxxxx" name="submitToken">
<input type="hidden" name="hasJS" value="true">
</form>
</th>
<th class="{sorter:false} description" scope="col">Description</th>
<th class="{sorter:false} transactionType" scope="col">
Type
<span class="cxtHelp">
<a class="cxtTrigger" href="#transForView" title="Click to find out more about transaction types">[?]</a>
</span>
</th>
<th class="{sorter:false} numeric" scope="col">In (£)</th>
<th class="{sorter:false} numeric" scope="col">Out (£)</th>
<th class="{sorter:false} numeric" scope="col">Balance (£)</th>
</tr>
</thead>
<tbody>
<tr class="alt">
<th class="first">29 Aug 13</th>
<td>
<span class="splitString">person1</span>
<span class="splitString"> </span>
<span class="splitString">ref</span>
<span class="splitString"> </span>
<span class="splitString">29AUG13 18:23</span>
<span class="splitString"> </span>
</td>
<td>
<abbr title="Faster Payments Outgoing">FPO</abbr>
</td>
<td class="numeric"></td>
<td class="numeric">42.81</td>
<td class="numeric">662.68</td>
</tr>
<tr>
<th class="first">29 Aug 13</th>
<td>
<span class="splitString">person2</span>
<span class="splitString"> </span>
<span class="splitString">ref</span>
<span class="splitString"> </span>
<span class="splitString">29AUG13 18:21</span>
<span class="splitString"> </span>
</td>
<td>
<abbr title="Faster Payments Outgoing">FPO</abbr>
</td>
<td class="numeric"></td>
<td class="numeric">599.91</td>
<td class="numeric">705.49</td>
</tr>
<tr class="alt">
<th class="first">29 Aug 13</th>
<td>
<span class="splitString">person3</span>
<span class="splitString"> </span>
<span class="splitString">ref</span>
</td>
<td>
<abbr title="Transfer">TFR</abbr>
</td>
<td class="numeric"></td>
<td class="numeric">30.80</td>
<td class="numeric">1,305.40</td>
</tr>
</tbody>
</table>
You have already gotten the text of each cell into table_array; you just need to pick the right cell. It is a 2D array, so the first index is the row and the second index is the column. Note that array indexes are 0-based, and row 0 is the header row, so the first data row is table_array[1].
# type in the first row
puts table_array[1][2]
#=> "FPO"
# person in the first row
puts table_array[1][1].split[0]
#=> "person1"
# out value in the second row
puts table_array[2][4]
#=> "599.91"
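The two expressions in the question differ in the order of indexing, which may explain the surprising output. A small sketch with made-up rows (a shortened stand-in for the real table_array):

```ruby
# A small stand-in for table_array: row 0 is the header, rows 1.. are data.
table_array = [
  ['Date', 'Description', 'Type'],
  ['29 Aug 13', 'person1', 'FPO'],
  ['29 Aug 13', 'person2', 'FPO'],
  ['29 Aug 13', 'person3', 'TFR']
]

# Slices rows 1..4 first, then takes element 1 of that slice: one whole row (the second data row).
row_from_slice = table_array[1..4][1]

# Takes row 1 first, then slices its columns 1..4: part of the first data row.
cells_from_row = table_array[1][1..4]

p row_from_slice # ["29 Aug 13", "person2", "FPO"]
p cells_from_row # ["person1", "FPO"]
```

So `table_array[1..4][1]` never selects a column; to get column 1 from rows 1 to 4, something like `table_array[1..4].map { |row| row[1] }` would be needed.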
Working with these indices is not so nice. As well, splitting the description column is harder at this point. Instead, I would suggest creating a hash for each row.
table_array = Array.new
table_rows = $browser.table(:class,'statement smartRewardsOffers')
table_rows.rows.to_a[1..-1].each do |row|
row_hash = Hash.new
row_hash[:date] = row.cell(:index => 0).text
row_hash[:person] = row.cell(:index => 1).span(:index => 0).text
row_hash[:code] = row.cell(:index => 1).span(:index => 2).text rescue ''
row_hash[:time] = row.cell(:index => 1).span(:index => 4).text rescue ''
row_hash[:type] = row.cell(:index => 2).text
row_hash[:in] = row.cell(:index => 3).text
row_hash[:out] = row.cell(:index => 4).text
row_hash[:balance] = row.cell(:index => 5).text
table_array << row_hash
end
# First data row's information
row = 0 # Note that the rows are 0-based index
puts table_array[row][:date] #=> "29 Aug 13"
puts table_array[row][:person] #=> "person1"
puts table_array[row][:code] #=> "ref"
puts table_array[row][:time] #=> "29AUG13 18:23"
puts table_array[row][:type] #=> "FPO"
puts table_array[row][:in] #=> ""
puts table_array[row][:out] #=> "42.81"
puts table_array[row][:balance] #=> "662.68"
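Once the cells are in a hash, the amounts are still strings. The `toprice` regex in the question (`\d\d\.\d\d`) would turn "1,305.40" into "05.40", so here is a hedged alternative: `to_pence` is a hypothetical helper, and storing whole pence as integers is just one possible choice to avoid float rounding.

```ruby
# Hypothetical helper: turns a statement amount such as "1,305.40" into whole pence.
# Returns nil for an empty cell (e.g. the unused In or Out column).
def to_pence(text)
  cleaned = text.delete(',').strip
  return nil if cleaned.empty?

  # to_r parses the decimal string exactly, so no float rounding is involved.
  (cleaned.to_r * 100).to_i
end

row_hash = { in: '', out: '42.81', balance: '1,305.40' }
amounts = row_hash.transform_values { |text| to_pence(text) }
p amounts
```

With integers, the values can be added and compared safely, and formatted back to pounds only when printing.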

Scraping Table with Nokogiri and need JSON output

So, I have a table with multiple rows and columns.
<table>
<tr>
<th>Employee Name</th>
<th>Reg Hours</th>
<th>OT Hours</th>
</tr>
<tr>
<td>Employee 1</td>
<td>10</td>
<td>20</td>
</tr>
<tr>
<td>Employee 2</td>
<td>5</td>
<td>10</td>
</tr>
</table>
There is also another table:
<table>
<tr>
<th>Employee Name</th>
<th>Revenue</th>
</tr>
<tr>
<td>Employee 2</td>
<td>$10</td>
</tr>
<tr>
<td>Employee 1</td>
<td>$50</td>
</tr>
</table>
Notice that the employee order may be random between the tables.
How can I use nokogiri to create a json file that has each employee as an object, with their total hours and revenue?
Currently, I'm able to just get the individual table cells with some xpath. For example:
puts page.xpath(".//*[@id='UC255_tblSummary']/tbody/tr[2]/td[1]/text()").inner_text
Edit:
Using the page-object gem and the link from @Dave_McNulla, I tried this piece of code just to see what I get:
class MyPage
include PageObject
table(:report, :id => 'UC255_tblSummary')
def get_some_information
report_element[1][2].text
end
end
puts get_some_information
Nothing's being returned, however.
Data: https://gist.github.com/anonymous/d8cc0524160d7d03d37b
There's a duplicate of the hours table. The first one is fine. The other table needed is the accessory revenue table. (I'll also need the activations table, but I'll try to merge that from the code that merges the hours and accessory revenue tables.)
I think the general approach is:
Create a hash for each table where the key is the employee
Merge the results from both tables together
Convert to JSON
Create a hash for each table where the key is the employee
This part you can do in Watir or Nokogiri. It only makes sense to use Nokogiri if Watir is giving poor performance due to large tables.
Watir:
#I assume you would have a better way to identify the tables than by index
hours_table = browser.table(:index, 0)
wage_table = browser.table(:index, 1)
#Turn the tables into a hash
employee_hours = {}
hours_table.trs.drop(1).each do |tr|
tds = tr.tds
employee_hours[ tds[0].text ] = {"Reg Hours" => tds[1].text, "OT Hours" => tds[2].text}
end
#=> {"Employee 1"=>{"Reg Hours"=>"10", "OT Hours"=>"20"}, "Employee 2"=>{"Reg Hours"=>"5", "OT Hours"=>"10"}}
employee_wage = {}
wage_table.trs.drop(1).each do |tr|
tds = tr.tds
employee_wage[ tds[0].text ] = {"Revenue" => tds[1].text}
end
#=> {"Employee 2"=>{"Revenue"=>"$10"}, "Employee 1"=>{"Revenue"=>"$50"}}
Nokogiri:
page = Nokogiri::HTML.parse(browser.html)
hours_table = page.search('table')[0]
wage_table = page.search('table')[1]
employee_hours = {}
hours_table.search('tr').drop(1).each do |tr|
tds = tr.search('td')
employee_hours[ tds[0].text ] = {"Reg Hours" => tds[1].text, "OT Hours" => tds[2].text}
end
#=> {"Employee 1"=>{"Reg Hours"=>"10", "OT Hours"=>"20"}, "Employee 2"=>{"Reg Hours"=>"5", "OT Hours"=>"10"}}
employee_wage = {}
wage_table.search('tr').drop(1).each do |tr|
tds = tr.search('td')
employee_wage[ tds[0].text ] = {"Revenue" => tds[1].text}
end
#=> {"Employee 2"=>{"Revenue"=>"$10"}, "Employee 1"=>{"Revenue"=>"$50"}}
Merge the results from both tables together
You want to merge the two hashes together so that for a specific employee, the hash will include their hours as well as their revenue.
employee = employee_hours.merge(employee_wage){ |key, old, new| new.merge(old) }
#=> {"Employee 1"=>{"Revenue"=>"$50", "Reg Hours"=>"10", "OT Hours"=>"20"}, "Employee 2"=>{"Revenue"=>"$10", "Reg Hours"=>"5", "OT Hours"=>"10"}}
Convert to JSON
Based on this previous question, you can then convert the hash to json.
require 'json'
employee.to_json
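Putting the merge and JSON steps together, here is a self-contained sketch using the two hashes from above as literals. The output shape (an array of objects with an added "Name" key) is an assumption about what "each employee as an object" means, not something the question specifies.

```ruby
require 'json'

employee_hours = {
  'Employee 1' => { 'Reg Hours' => '10', 'OT Hours' => '20' },
  'Employee 2' => { 'Reg Hours' => '5', 'OT Hours' => '10' }
}
employee_wage = {
  'Employee 2' => { 'Revenue' => '$10' },
  'Employee 1' => { 'Revenue' => '$50' }
}

# Combine the per-employee hashes; the block only runs for names present in both tables.
merged = employee_hours.merge(employee_wage) { |_name, hours, wage| hours.merge(wage) }

# Hypothetical output shape: one object per employee, with a "Name" key added.
employees = merged.map { |name, details| { 'Name' => name }.merge(details) }

json = JSON.pretty_generate(employees)
puts json
```

An employee who appears in only one table keeps just that table's keys, so a consumer of the JSON should not assume every object has all three fields.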
