I got Ruby to visit a web site, iterate through a list of campaigns, and scrape each page for specific data. The problem I have now is getting the data out of the structure Nokogiri gives me and outputting it in a readable form.
require 'watir'
require 'nokogiri'

campaign_list = [1042360, 1042386, 1042365, 992307]

browser = Watir::Browser.new :chrome
browser.goto '<redacted>'
browser.text_field(:id => 'email').set '<redacted>'
browser.text_field(:id => 'password').set '<redacted>'
browser.send_keys :enter

file = File.new('hourlysales.csv', 'w')
data = {}

campaign_list.each do |campaign|
  browser.goto "<redacted>"
  if browser.text.include? "Application Error"
    puts "Error loading page, I recommend restarting script"
    # Possibly automatic restart of script
  else
    hourly_data = Nokogiri::HTML.parse(browser.html).text
    # file.write data
    puts hourly_data
  end
end
This is the output I get:
{"views":[[17,145],[18,165],[19,99],[20,71],[21,31],[22,26],[23,10],[0,15],[1,1], [2,18],[3,19],[4,35],[5,47],[6,44],[7,67],[8,179],[9,141],[10,112],[11,95],[12,46],[13,82],[14,79],[15,70],[16,103]],"orders":[[17,10],[18,9],[19,5],[20,1],[21,1],[22,0],[23,0],[0,1],[1,0],[2,1],[3,0],[4,1],[5,2],[6,1],[7,5],[8,11],[9,6],[10,5],[11,3],[12,1],[13,2],[14,4],[15,6],[16,7]],"conversion_rates":[0.06870229007633588,0.05442176870748299,0.050505050505050504,0.014084507042253521,0.03225806451612903,0.0,0.0,0.06666666666666667,0.0,0.05555555555555555,0.0,0.02857142857142857,0.0425531914893617,0.022727272727272728,0.07462686567164178,0.06134969325153374,0.0425531914893617,0.044642857142857144,0.031578947368421054,0.021739130434782608,0.024390243902439025,0.05063291139240506,0.08571428571428572,0.06741573033707865]}
The arrays represent { "views": [[hour, # of views], [hour, # of views], ...] }, and likewise for "orders". I don't need the conversion rates.
I also need to add up the values for each key, so that after doing this for 5 pages I have one key for each hour of the day and the total number of views for that hour. I tried a couple of each loops, but couldn't make any progress.
I appreciate any help you guys can give me.
It looks like the output (which from your code I assume is the content of hourly_data) is JSON. In that case, it's easy to parse and add up the numbers. Something like this:
require "json" # at the top of your script

# ...

def sum_hours_values(data, hours_values=nil)
  # Start with an empty hash that automatically initializes missing keys to `0`
  hours_values ||= Hash.new {|hsh, hour| hsh[hour] = 0 }

  # Iterate through the [hour, value] arrays, adding `value` to the running
  # count for that `hour`, and return `hours_values`
  data.each_with_object(hours_values) do |(hour, value), hsh|
    hsh[hour] += value
  end
end
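For example, on a small slice of your data (note how repeated hours accumulate):

views = [[17, 145], [18, 165], [17, 30]]
sum_hours_values(views) # => {17=>175, 18=>165}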
# ... Watir/Nokogiri stuff here...

# Initialize these so they persist outside the loop
hours_views, hours_orders = nil

campaign_list.each do |campaign|
  browser.goto "<redacted>"
  if browser.text.include? "Application Error"
    # ...
  else
    # ...
    hourly_data_parsed = JSON.parse(hourly_data)
    hours_views = sum_hours_values(hourly_data_parsed["views"], hours_views)
    hours_orders = sum_hours_values(hourly_data_parsed["orders"], hours_orders)
  end
end

puts "Views by hour:"
puts hours_views.sort.map {|hour_views| "%2i\t%4i" % hour_views }
puts "Orders by hour:"
puts hours_orders.sort.map {|hour_orders| "%2i\t%4i" % hour_orders }
P.S. There's a really nice recursive version of sum_hours_values I didn't include since the iterative version is clearer to most Ruby programmers. If you're into recursion I leave it as an exercise for you. ;)
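For reference, one way that recursive version might look (a quick sketch, drop-in compatible with the calls above): peel off the first [hour, value] pair, fold it into the totals, and recurse on the rest.

def sum_hours_values(data, hours_values=nil)
  hours_values ||= Hash.new {|hsh, hour| hsh[hour] = 0 }
  return hours_values if data.empty?
  hour, value = data.first
  hours_values[hour] += value
  sum_hours_values(data.drop(1), hours_values)
end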
I'm opening a set of URLs with a WebDriver in Ruby, kind of a slideshow with 3-second intervals between "slides" (pages). A person watching might press Space, and I need that page's URL saved to another file. How can I handle those interruptions, i.e., catch the Space keypress?
require "watir-webdriver"

urls = [...list of URLs here...]
saved = []
b = Watir::Browser.new

urls.each do |url|
  b.goto url
  sleep(3)
  # ...what should I put here to handle Space pressed?
  if space_pressed
    saved << b.url
  end
end
It looks like your problem might be solvable with STDIN.getch.
If you create a file with the following script and then run it from a command prompt (e.g. ruby script.rb), the script will:
Navigate to the url.
Ask if the url should be captured.
If the user does not input anything within 10 seconds, it will proceed to the next url. I changed the interval from 3 seconds to 10 since 3 was too fast; you can change it back in the line Timeout::timeout(10).
If the user did input something, it will save the url if the input was a space. Otherwise it will ignore the input and move on.
Script:
require "watir-webdriver"
require 'io/console'
require 'timeout'

urls = ['www.google.ca', 'www.yahoo.ca', 'www.gmail.com']
saved = []
b = Watir::Browser.new

urls.each do |url|
  b.goto url

  # Give user 10 seconds to provide input
  puts "Capture url '#{url}'?"
  $stdout.flush
  input = Timeout::timeout(10) {
    STDIN.getch
  } rescue ''

  # If the user's input is a space, save the url
  if input == ' '
    saved << b.url
  end
end

p saved
A couple of notes:
Note that the keypresses must go to the command prompt rather than the browser window.
If the user presses a key before the allotted timeout, the script will immediately proceed to the next url.
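If you want to keep the original 3-second cadence while listening for Space the whole time, one option is to poll STDIN with IO.select until the deadline passes. This is just a sketch (space_pressed_within? is a made-up helper name, and terminal behavior can vary):

require 'io/console'

# Wait up to `seconds`, returning true as soon as Space is pressed.
def space_pressed_within?(seconds)
  deadline = Time.now + seconds
  $stdin.raw do
    while (remaining = deadline - Time.now) > 0
      next unless IO.select([$stdin], nil, nil, remaining)
      return true if $stdin.getc == ' '
    end
  end
  false
end

# Usage inside the loop above:
#   b.goto url
#   saved << b.url if space_pressed_within?(3)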
I am using Sinatra to build a WebGuesser with Jumpstart Labs. I enter a number into a text field in my browser. I click submit and I am supposed to get a response saying if my number is too low or too high (or within 5). I use Shotgun to load the server. I want to be able to guess a number without having the random number change every time I guess.
Code:
require 'sinatra'
require 'sinatra/reloader'

def check_guess(guess)
  if params["guess"].to_i == guess
    "You got it right!"
  elsif params["guess"].to_i > guess
    if params["guess"].to_i > (guess + 5)
      "Way too high!"
    else
      "Close.. but too high!"
    end
  elsif params["guess"].to_i < guess
    if params["guess"].to_i < (guess - 5)
      "Way too low!"
    else
      "Close.. but too low!"
    end
  end
end

# Home route
get '/' do
  SECRET_NUMBER = rand(100)
  message = check_guess(SECRET_NUMBER)
  erb :index, :locals => { :message => message }
end
Currently, I get a new random number every time I guess, which doesn't help. I feel like it may have something to do with where my SECRET_NUMBER is scope-wise. Any thoughts?
Every time there is a GET request to "/", the relevant code is executed, which generates (with warnings) a new SECRET_NUMBER.
One way to deal with this is to route to different URLs for the first guess (in which case a secret number should be generated) and for subsequent guesses (in which case a new secret number should not be generated).
Also, it is very bad practice to use a constant for something that changes over time.
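To see why, note that Ruby allows reassigning a constant but warns every time, which is exactly the warning the code above produces on each request:

SECRET_NUMBER = rand(100)
SECRET_NUMBER = rand(100) # warning: already initialized constant SECRET_NUMBER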
You could store the initial value in the user session; for that you would have to enable sessions in Sinatra.
configure do
  enable :sessions
  set :session_secret, "somesecretstring"
end
After that you can create a number by going to a certain route:
get '/random' do
  session[:number] = rand(100)
end
You can then check your guesses on a different route:
get '/checkguess' do
  check_guess(session[:number]) unless session[:number].nil?
end
That's the basic idea; you'd have to flesh it out further, as in the sketch below. Hope it helps you a little.
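For example, a minimal sketch of how the pieces might fit together (the /restart route and the :index template are assumptions; check_guess is the method from the question):

require 'sinatra'
require 'sinatra/reloader'

configure do
  enable :sessions
  set :session_secret, "somesecretstring"
end

get '/' do
  # Only generate a number if the session doesn't have one yet,
  # so it survives every guess until the player starts over
  session[:number] ||= rand(100)
  message = params["guess"] ? check_guess(session[:number]) : nil
  erb :index, :locals => { :message => message }
end

get '/restart' do
  session[:number] = nil # forget the old number
  redirect '/'
end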
I was searching for the exact same question just now, and:
require "sinatra"
require "sinatra/reloader"

number = rand(100)

get '/' do
  guess = params["guess"].to_i
  message = check_guess(guess, number)
  erb :index, :locals => {:bok => number, :alert => guess, :msg => message}
end
Putting the RNG call outside the get block just worked. The generated number stays the same until you change something in the code (even adding a space to the end of the file and saving will re-randomize the number, because the reloader re-runs the script) or restart the server completely.
About the constant (SECRET_NUMBER): defining it at the top means you only have to give the check_guess method one argument. (Since I'm new to Ruby, someone can correct me if I'm wrong.)
SECRET_NUMBER = rand(100)

get '/' do ... end

def check_guess(guess)
  if guess < SECRET_NUMBER
    "Your Guess is Too LOW!"
  elsif guess > SECRET_NUMBER
    "Your Guess is Too HIGH!"
  else
    "Congratulations! You guessed it right :)"
  end
end
For anyone still looking for the answer: the random number should be generated outside the get block.
require 'sinatra'
require 'sinatra/reloader'

# A variable named `rand` would shadow Kernel#rand, so use a clearer name
secret_number = (rand() * 100).to_i

get '/' do
  "The secret number is #{secret_number}"
end
I have a database full of URLs that I need to test HTTP response time for on a regular basis. I want to have many worker threads combing the database at all times for a URL that hasn't been tested recently, and if it finds one, test it.
Of course, this could cause multiple threads to snag the same URL from the database. I don't want this. So, I'm trying to use Mutexes to prevent this from happening. I realize there are other options at the database level (optimistic locking, pessimistic locking), but I'd at least prefer to figure out why this isn't working.
Take a look at this test code I wrote:
threads = []
mutex = Mutex.new

50.times do |i|
  threads << Thread.new do
    while true do
      url = nil
      mutex.synchronize do
        url = URL.first(:locked_for_testing => false, :times_tested.lt => 150)
        if url
          url.locked_for_testing = true
          url.save
        end
      end
      if url
        # simulate testing the url
        sleep 1
        url.times_tested += 1
        url.save
        mutex.synchronize do
          url.locked_for_testing = false
          url.save
        end
      end
    end
    sleep 1
  end
end

threads.each { |t| t.join }
Of course there is no real URL testing here. But what should happen is at the end of the day, each URL should end up with "times_tested" equal to 150, right?
(I'm basically just trying to make sure the mutexes and worker-thread mentality are working)
But each time I run it, a few odd URLs here and there end up with times_tested equal to a much lower number, say 37, and locked_for_testing frozen on "true".
Now as far as I can tell from my code, if any URL gets locked, it will have to unlock. So I don't understand how some URLs are ending up "frozen" like that.
There are no exceptions and I've tried adding begin/ensure but it didn't do anything.
Any ideas?
I'd use a Queue, with a single master thread to pull the work you want done. If you have a single master, you control what's getting accessed. This isn't perfect, but it's not going to blow up because of concurrency. Remember, if you aren't locking the database, a mutex doesn't really help you if something else accesses the db.
Code completely untested:
require 'thread'

queue = Queue.new
keep_running = true
# trap cntrl_c or something to reset keep_running

master = Thread.new do
  while keep_running
    # check if we need some work to do
    if queue.size == 0
      urls = URL.all(:times_tested.lt => 150)
      urls.each do |u|
        queue << u.id
      end
    end
    # keep from spinning the loop
    sleep(0.1)
  end
end
workers = []
50.times do
  workers << Thread.new do
    while keep_running
      # get an id
      id = queue.shift
      url = URL.get(id)
      # do something with the url
      url.save
      sleep(0.1)
    end
  end
end

workers.each do |w|
  w.join
end
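One caveat with this sketch: queue.shift blocks on an empty queue, so an idle worker never gets back around to checking keep_running. A common fix (again, just a sketch) is to push one sentinel per worker at shutdown; self-contained, that looks like:

require 'thread'

queue = Queue.new

workers = 50.times.map do
  Thread.new do
    loop do
      id = queue.shift
      break if id == :stop # sentinel: wake up and exit cleanly
      # ...fetch and test the URL for `id` here...
    end
  end
end

# enqueue real ids here, then shut down:
50.times { queue << :stop }
workers.each(&:join)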
I have a Mechanize function to log me out of a site, but on very rare occasions it times out. The function goes to a specific page and then clicks a logout button. When Mechanize hits a timeout either going to the logout page or clicking the logout button, the code crashes. So I put in a small rescue, and it seems to be working, as seen in the second snippet below.
def logmeout(agent)
  page = agent.get('http://www.example.com/')
  agent.click(page.link_with(:text => /Log Out/i))
end
Logmeout with rescue:
def logmeout(agent)
  begin
    page = agent.get('http://www.example.com/')
    agent.click(page.link_with(:text => /Log Out/i))
  rescue Timeout::Error
    puts "Timeout!"
    retry
  end
end
Assuming I understand rescue correctly, this will redo both actions even if just the click timed out. So, in the interest of efficiency, I was wondering if I could wrap each action in a method that takes a code block. Would something like this work:
def trythreetimes
  tries = 0
  begin
    yield
  rescue
    tries += 1
    puts "Trying again!"
    retry if tries <= 3
  end
end

def logmeout(agent)
  trythreetimes { page = agent.get('http://www.example.com/') }
  trythreetimes { agent.click(page.link_with(:text => /Log Out/i)) }
end
Note that in my trythreetimes function I left the rescue generic so the function would be more reusable.
Thanks so much for any help anyone can provide. I realize there are a couple of different questions in here, but they are all things I am trying to learn!
Instead of retrying timeouts on some Mechanize requests, I think you'd be better off setting the Mechanize::HTTP::Agent read_timeout attribute to a reasonable number of seconds, like 2 or 5; in any case, one that prevents timeout errors for this request.
Also, it seems that your logout procedure only requires a simple HTTP GET request: there is no form to fill in, so no HTTP POST is needed.
So if I were you, I would inspect the page source (Ctrl+U in Firefox or Chrome) to identify the link that your agent.click(page.link_with(:text => /Log Out/i)) reaches, and GET that URL directly.
It should be faster, because these kinds of logout pages are usually blank, so Mechanize will not have to load a full HTML page into memory.
Here is the code I would prefer to use:
def logmeout(agent)
  begin
    agent.read_timeout = 2 # set the agent timeout
    page = agent.get('http://www.example.com/logout_url.php')
    agent.history.pop() # delete this request from the history
  rescue Timeout::Error
    puts "Timeout!"
    puts "read_timeout attribute is set to #{agent.read_timeout}s" if !agent.read_timeout.nil?
    # retry # retry is no longer needed
  end
end
But you can use your retry function too:
def trythreetimes
  tries = 0
  begin
    yield
  rescue Exception => e # deliberately broad so the function stays reusable
    tries += 1
    puts "Error: #{e.message}"
    puts "Trying again!" if tries <= 3
    retry if tries <= 3
    puts "No more attempts!"
  end
end
def logmeout(agent)
  trythreetimes do
    agent.read_timeout = 2 # set the agent timeout
    page = agent.get('http://www.example.com/logout_url.php')
    agent.history.pop() # delete this request from the history
  end
end
Hope it helps! ;-)
Using Mechanize 1.0.0, I ran into this problem from a different source of error.
In my case I was blocked by a proxy and then by SSL. This worked for me:
ag = Mechanize.new
ag.set_proxy('yourproxy', yourport)
# Disables SSL certificate verification (insecure; only acceptable when you
# trust the endpoint and the proxy is what breaks verification)
ag.agent.http.verify_mode = OpenSSL::SSL::VERIFY_NONE
ag.get(url)