CSS Selector for Table Row with X number of Cells - ruby

I'm trying to scrape some content off of a website and I am having trouble selecting the correct elements.
I'm using Nokogiri, and, as I know CSS best, I am trying to use it to select the data I want.
There is a big table with rows I do not want, but their positions can change; they are not always rows 4, 5, 6, 10 and 14, for example.
The only way I can tell if it's a row I want is if the row has TD tags in it.
What is the right CSS selector to do this?
# Search for nodes by css
doc.css('#mainContent p table tr').each do |td|
throw td
end
EDIT:
I'm trying to scrape boxrec.com/schedule.php. I want the row for each match, but it's a very large table with numerous rows that aren't matches. The first couple of rows of each date section aren't needed, nor is every other line, which reads "bout subject to change....", nor are the spacing rows between days.
SOLUTION:
require 'time' # Time.parse lives in the 'time' standard library

myscrape = []
match_date = nil # declared outside the block so it persists across rows

doc.xpath("//table[@align='center'][not(@id) and not(@class)]/tr").each do |trow|
  # A date row carries the date in bold; remember it for the match rows that follow
  date_cell = trow.css('.show_left b')
  match_date = date_cell.first.content if date_cell.length == 1

  # A match row has two boxer links and more than 10 cells
  if trow.css('td a').length == 2 and trow.css('* > td').length > 10
    first_boxer_td = trow.css('td:nth-child(5)').first
    second_boxer_td = trow.css('td:nth-child(9)').first # second boxer sits in the 9th cell
    match = {
      :round => trow.css('td:nth-child(3)').first.content.to_i,
      :weight => trow.css('td:nth-child(4)').first.content.to_s,
      :first_boxer_name => first_boxer_td.css('a').first.content.to_s,
      :first_boxer_link => first_boxer_td.css('a').first.attribute('href').to_s,
      :second_boxer_name => second_boxer_td.css('a').first.content.to_s,
      :second_boxer_link => second_boxer_td.css('a').first.attribute('href').to_s,
      :date => Time.parse(match_date)
    }
    myscrape.push(match)
  end
end

You won't be able to tell how many td elements a tr contains, but you can tell if it is empty or not:
doc.css('#mainContent p table tr:not(:empty)').each do |td|
throw td
end

You can do something like this.
Select tr rows that have a 4th td, using XPath:
doc.xpath('//tr/td[4]/..')
Another way, with CSS and a Ruby filter:
doc.css('tr').select{|tr| tr.css('td').length >= 4}

Related

Elixir/Phoenix sum of the column

I'm trying to get the sum of the particular column.
I have a schema of orders, with the field total, that stores the total price.
Now I'm trying to create a query that sums the total value of all the orders, but I'm not sure if I'm doing it right.
Here is what I have so far:
def create(conn, %{"statistic" => %{"date_from" => %{"day" => day_from, "month" => month_from, "year" => year_from}}}) do
date_from = Ecto.DateTime.cast!({{year_from, month_from, day_from}, {0, 0, 0, 0}})
revenue = Repo.all(from p in Order, where: p.inserted_at >= ^date_from, select: sum(p.total))
render(conn, "result.html", revenue: revenue)
end
And I'm just calling it with <%= @revenue %> in the html.eex.
As of right now, it doesn't raise any errors; it just renders a random symbol on the page instead of the total revenue.
I think my query is wrong, but couldn't find good information about how to make it work properly. Any help appreciated, thanks!
Your query returns just 1 value, and Repo.all wraps it in a list. When you print a list using <%= ... %>, it treats integers inside the list as Unicode codepoints, and you get the character with that codepoint as output on the page. The fix is to use Repo.one instead, which will return the value directly, which in this case is an integer.
revenue = Repo.one(from p in Order, where: p.inserted_at >= ^date_from, select: sum(p.total))
@Dogbert's answer is correct. It is worth noting that if you are using Ecto 2.0 (currently in release candidate), you can use Repo.aggregate/4. The query needs its own parentheses so that its keyword options stay separate from the remaining arguments:
revenue = Repo.aggregate(from(p in Order, where: p.inserted_at >= ^date_from), :sum, :total)

Write actual values to bar chart using Gruff within Ruby

I am generating a bar chart with values [1,5,10,23]. Currently, I have no way of knowing those exact values when looking at the image generated by Gruff. I just know that 23 falls somewhere between the lines of 20 and 25.
Is it possible to write the exact values within the image?
I think you are looking for labels:
g = Gruff::Bar.new
g.title = 'Wow! Look at this!'
g.data 'something', [1, 5, 10, 23]
g.labels = { 0 => '1', 1 => '5', 2 => '10', 3 => '23' }
Read the documentation for more info on labels
I think this is what you are looking for:
g.show_labels_for_bar_values = true
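If you go the labels route, the hash does not have to be typed by hand. Here is a minimal sketch in plain Ruby that builds it from the data points; the values and labels names are illustrative, and the result would be assigned to g.labels as in the first answer.

```ruby
values = [1, 5, 10, 23]

# Map each bar's position to its value as a string,
# which is the shape the labels hash above uses.
labels = values.each_with_index.map { |value, index| [index, value.to_s] }.to_h

puts labels.inspect
```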

A better way of iterating and filling data in HTML table using Watir

I have a table which might contain up to 50 rows and has 9 columns. However, the code I am using to fill in the table is taking a very long time.
Is there a faster way of doing it?
Here is my code
table = $browser.div(:id => "market").table(:id => 'tableTradeIndMarket')
i = 3 + rand(1..table.rows.length-4)
table.rows[i].cells[4].select_list.select 'Buy'
table.rows[i].cells[5].select_list.select 'Market'
table.rows[i].cells[6].text_field.set ($share)
table.rows[i+1].cells[4].select_list.select 'Buy'
table.rows[i+1].cells[5].select_list.select 'Limit'
table.rows[i+1].cells[6].text_field.set ($share)
# Strip the dollar sign from the value in the second column and put the result into another column of the same row
table.rows[i+1].cells[8].text_field.set(
table.rows[i+1].cells[2].text[1..table.rows[i+1].cells[2].text.length]
)
table.rows[i+1].cells[9].select_list.select 'Day'
table.rows[i+2].cells[4].select_list.select 'Buy'
table.rows[i+2].cells[5].select_list.select 'Stop'
table.rows[i+2].cells[6].text_field.set ($share)
table.rows[i+2].cells[7].text_field.set ( table.rows[i+2].cells[2].text[1..table.rows[i+2].cells[2].text.length])
table.rows[i+2].cells[9].select_list.select 'GTC'
table.rows[i+3].cells[4].select_list.select 'Buy'
table.rows[i+3].cells[5].select_list.select 'Stop/Limit'
table.rows[i+3].cells[6].text_field.set ($share)
table.rows[i+3].cells[7].text_field.set ( table.rows[i+3].cells[2].text[1..table.rows[i+3].cells[2].text.length])
table.rows[i+3].cells[8].text_field.set ( table.rows[i+3].cells[2].text[1..table.rows[i+3].cells[2].text.length])
table.rows[i+3].cells[9].select_list.select 'Day'
Your best bet is likely to locate the row element with the help of Nokogiri. Željko Filipin had a good blog post about doing this - http://zeljkofilipin.com/watir-nokogiri
As an example, the inputting of your ith row would be:
row_css = Nokogiri::HTML(browser.html).at_css("table#tableTradeIndMarket tr:nth-of-type(#{i})").css_path
row = browser.element(:css, row_css).to_subtype
row.cells[4].select_list.select 'Buy'
row.cells[5].select_list.select 'Market'
row.cells[6].text_field.set ($share)
You could apply the same concept to the other rows that you are inputting.
This helped at least for the test table that I was using.
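Separately from locating the row, the question reads the same cell text twice just to drop the leading dollar sign, and every such read is another browser call. A minimal sketch of a helper that works on text read once; strip_dollar is a hypothetical name, and String#delete_prefix assumes Ruby 2.5 or later.

```ruby
# Remove surrounding whitespace and one leading dollar sign, if present.
def strip_dollar(price_text)
  price_text.strip.delete_prefix('$')
end

puts strip_dollar('$12.50')
puts strip_dollar('12.50')
```

With Watir, the idea would be to store the cell text in a local variable once and pass it to the helper, rather than evaluating the full table.rows[...] chain several times.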

Ruby Watir: Selecting a specific row

Consider the following html
http://www.carbide-red.com/prog/test_table.html
I have worked out that I can move left to right on the columns using
browser.td(:text => "Equipment").parent.td(:index => "2").flash
to flash the 3rd column over on the line containing "Equipment".
But how can I move down a certain number of rows? I am having terrible luck with .tr and .rows; however I try them, the script crashes. Even something as simple as this fails:
browser.tr(:text => "Equipment").flash
Am I just misunderstanding how tr/row works?
Specific Row/Column
It sounds like you have already calculated which row/column you want. You can get the cell at a specific row/column index by simply doing:
browser.table[row_index][column_index]
Where row_index and column_index are integers for the row and column you want (note that it is zero-based index).
Specific Row
You can also do the following to select rows based on an index:
browser.table.tr(:index, 1).flash
browser.table.row(:index, 2).flash
Note that .tr includes nested tables while .row ignores nested tables.
Update - Find Rows After Specific Row
To find a row after a specific row containing a certain text, determine the index of the specific row first. Then you can locate the other rows in relation to it. Here are some examples:
#Get the 3rd row down from the row containing the text 'Equipment'
starting_row_index = browser.table.rows.to_a.index{ |row| row.text =~ /Equipment/ }
offset = 3
row = browser.table.row(:index, starting_row_index + offset)
puts row.text
# => CAT03 ...
#Get the 3rd row down from the row containing a cell with yellow background colour
starting_row_index = browser.table.rows.to_a.index{ |row| row.td(:css => "td[bgcolor=yellow]").present? }
offset = 3
row = browser.table.row(:index, starting_row_index + offset)
puts row.text
# => ETS36401 ...
#Output the first column text of each row after the row containing a cell with yellow background colour
starting_row_index = browser.table.rows.to_a.index{ |row| row.td(:css => "td[bgcolor=yellow]").present? }
(starting_row_index + 1).upto(browser.table.rows.length - 1){ |x| puts browser.table[x][0].text }
# => CAT03, CAT08, ..., INTEGRA10, INTEGRA11
Let me know if that helps or if you have a specific example you want.
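The index-plus-offset idea can be tried without a browser. Here is a minimal sketch on an array of row texts; the sample data and the row_after name are made up. It also covers the case where no row matches, in which index returns nil and adding the offset would otherwise raise an error.

```ruby
# Return the text of the row `offset` rows below the first row matching
# `pattern`, or nil when nothing matches or the offset runs past the end.
def row_after(row_texts, pattern, offset)
  start = row_texts.index { |text| text =~ pattern }
  return nil if start.nil?

  row_texts[start + offset]
end

rows = ['Header', 'Equipment A', 'CAT01', 'CAT02', 'CAT03']

puts row_after(rows, /Equipment/, 3).inspect
puts row_after(rows, /Missing/, 3).inspect
```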

increment value in a hash

I have a bunch of posts which have category tags in them.
I am trying to find out how many times each category has been used.
I'm using rails with mongodb, BUT I don't think I need to be getting the occurrence of categories from the db, so the mongo part shouldn't matter.
This is what I have so far
@recent_posts = current_user.recent_posts # returns the 10 most recent posts
@categories_hash = {'tech' => 0, 'world' => 0, 'entertainment' => 0, 'sports' => 0}
@recent_posts.each do |cat|
  cat.categories.each do |addCat|
    @categories_hash.increment(addCat) # obviously this is where I'm having problems
  end
end
the structure of the post is
{"_id" : ObjectId("idnumber"), "created_at" : "Tue Aug 03...", "categories" :["world", "sports"], "message" : "the text of the post", "poster_id" : ObjectId("idOfUserPoster"), "voters" : []}
I'm open to suggestions on other ways to get the count of categories. I will eventually want the count of voters too, so it seems to me the best way is to increment the categories_hash and then add voters.length. One thing at a time, though: for now I'm just trying to figure out how to increment values in the hash.
If you aren't familiar with map/reduce and you don't care about scaling up, this is not as elegant as map/reduce, but should be sufficient for small sites:
@categories_hash = Hash.new(0)
current_user.recent_posts.each do |post|
  post.categories.each do |category|
    @categories_hash[category] += 1
  end
end
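To see the counting on its own, here is a minimal sketch with plain hashes standing in for the post documents; the sample posts are made up. Hash.new(0) makes a missing key read as 0, so += 1 works for categories that have not been seen yet, and the voters can be totalled in the same pass.

```ruby
# Hypothetical stand-ins for the post documents from the question.
posts = [
  { 'categories' => ['world', 'sports'], 'voters' => [] },
  { 'categories' => ['tech'],            'voters' => ['a', 'b'] },
  { 'categories' => ['world'],           'voters' => ['c'] }
]

category_counts = Hash.new(0) # unknown keys default to 0
voter_total = 0

posts.each do |post|
  post['categories'].each { |category| category_counts[category] += 1 }
  voter_total += post['voters'].length
end

puts category_counts.inspect
puts voter_total
```

Reading a key that was never counted, such as category_counts['entertainment'], returns 0 without adding the key to the hash.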
If you're using MongoDB, an elegant way to aggregate tag usage is a map/reduce operation. MongoDB supports map/reduce operations written in JavaScript. Map/reduce runs on the db server(s), i.e. your application does not have to retrieve and analyze every document (which wouldn't scale well for large collections).
As an example, here are the map and reduce functions I use in my blog on the articles collection to aggregate the usage of tags (which is used to build the tag cloud in the sidebar). Documents in the articles collection have a key named 'tags' which holds an array of strings (the tags).
The map function simply emits 1 on every used tag to count it:
function () {
if (this.tags) {
this.tags.forEach(function (tag) {
emit(tag, 1);
});
}
}
The reduce function sums up the counts:
function (key, values) {
var total = 0;
values.forEach(function (v) {
total += v;
});
return total;
}
As a result, the database returns a hash that has a key for every tag and its usage count as a value. E.g.:
{ 'rails' => 5, 'ruby' => 12, 'linux' => 3 }
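To make the two steps concrete, here is a minimal in-memory sketch in plain Ruby that mimics what the map and reduce functions do. It is only an illustration of the logic, not how MongoDB runs it, and the sample articles are made up.

```ruby
# Hypothetical stand-ins for documents in the articles collection.
articles = [
  { 'tags' => ['ruby', 'rails'] },
  { 'tags' => ['ruby'] },
  { 'title' => 'untagged article' } # no 'tags' key, so it emits nothing
]

# Map step: emit a [tag, 1] pair for every tag on every article.
emitted = articles.flat_map do |article|
  (article['tags'] || []).map { |tag| [tag, 1] }
end

# Group the emitted pairs by key, as the database does before reducing.
grouped = emitted.group_by(&:first)

# Reduce step: sum the emitted values for each tag.
tag_counts = grouped.map { |tag, pairs| [tag, pairs.map(&:last).sum] }.to_h

puts tag_counts.inspect
```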

Resources