How can I get my script to loop?

My script signs in and goes to a URL in the browser, but when it signs out of the current web page it just sits there and won't restart the loop. How can I get the loop to realize it's done and start over?
x = 0
while x <= 5
  File.open("yahoo_accounts.txt") do |email|
    email.each do |item|
      email, password = item.chomp.split(',')
      emails << email
      passwords << password
      emails.zip(passwords) { |name, pass|
        browser = Watir::Browser.new :ff
        browser.goto "url"
        #logs in and does what its suppose to do with the name and pass
      }
    end
    x += 1
    next
  end
end
When the script is done it just sits at the web page. I'm trying to get it to go back to the beginning again.
You would think it would take each name/pass pair and go back to the starting URL.
Thanks for your help.

It looks like you may not be calling browser.close appropriately. In my quick mock-up testing, I definitely get weird behaviour if I don't do that. You're also using non-idiomatic Ruby looping. Try this:
5.times do
  File.open("yahoo_accounts.txt") do |email|
    email.each do |item|
      email, password = item.chomp.split(',')
      emails << email
      passwords << password
      emails.zip(passwords) do |name, pass|
        browser = Watir::Browser.new :ff
        browser.goto "url"
        #logs in and does what its suppose to do with the name and pass
        browser.close
      end
    end
  end
end
EDIT:
Alternatively, if you want the same exact Watir::Browser instance to be doing all the work, initialize and close outside of your main loop. Right now, you're spawning a new Browser instance with every iteration of emails.zip, times every iteration of email.each, times the 5 iterations of your while loop. This is just ungainly, and may be screwing up your expected results. So just doing:
browser = Watir::Browser.new :ff
5.times do
  ... loop code ...
end
browser.close
This will at least make whatever's happening under the hood clearer.
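For reference, a minimal sketch of what that restructuring might look like (this reads the account file once up front and assumes the comma-separated format from the question; the actual login steps are still up to you):

require 'watir-webdriver'

# Read all the email/password pairs once
pairs = File.readlines("yahoo_accounts.txt").map { |line| line.chomp.split(',') }

browser = Watir::Browser.new :ff   # one browser instance for the whole run

5.times do
  pairs.each do |name, pass|
    browser.goto "url"
    # log in with name and pass, do the work, then sign out
  end
end

browser.close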

Related

Structuring Nokogiri output without HTML tags

I got Ruby to go to a web site, iterate through a list of campaigns, and scrape the pages for specific data. The problem I have now is getting that data out of the structure Nokogiri gives me and outputting it in a readable form.
campaign_list = Array.new
campaign_list.push(1042360, 1042386, 1042365, 992307)

browser = Watir::Browser.new :chrome
browser.goto '<redacted>'
browser.text_field(:id => 'email').set '<redacted>'
browser.text_field(:id => 'password').set '<redacted>'
browser.send_keys :enter

file = File.new('hourlysales.csv', 'w')
data = {}

campaign_list.each do |campaign|
  browser.goto "<redacted>"
  if browser.text.include? "Application Error"
    puts "Error loading page, I recommend restarting script"
    # Possibly automatic restart of script
  else
    hourly_data = Nokogiri::HTML.parse(browser.html).text
    # file.write data
    puts hourly_data
  end
end
This is the output I get:
{"views":[[17,145],[18,165],[19,99],[20,71],[21,31],[22,26],[23,10],[0,15],[1,1], [2,18],[3,19],[4,35],[5,47],[6,44],[7,67],[8,179],[9,141],[10,112],[11,95],[12,46],[13,82],[14,79],[15,70],[16,103]],"orders":[[17,10],[18,9],[19,5],[20,1],[21,1],[22,0],[23,0],[0,1],[1,0],[2,1],[3,0],[4,1],[5,2],[6,1],[7,5],[8,11],[9,6],[10,5],[11,3],[12,1],[13,2],[14,4],[15,6],[16,7]],"conversion_rates":[0.06870229007633588,0.05442176870748299,0.050505050505050504,0.014084507042253521,0.03225806451612903,0.0,0.0,0.06666666666666667,0.0,0.05555555555555555,0.0,0.02857142857142857,0.0425531914893617,0.022727272727272728,0.07462686567164178,0.06134969325153374,0.0425531914893617,0.044642857142857144,0.031578947368421054,0.021739130434782608,0.024390243902439025,0.05063291139240506,0.08571428571428572,0.06741573033707865]}
The arrays stand for { "views" => [[hour, # of views], [hour, # of views], ...] }, and the same for orders. I don't need the conversion rates.
I also need to add up the values for each key, so that after doing this for 5 pages I have one key for each hour of the day and the total number of views for that hour. I tried a couple of each loops, but couldn't make any progress.
I appreciate any help you guys can give me.
It looks like the output (which from your code I assume is the content of hourly_data) is JSON. In that case, it's easy to parse and add up the numbers. Something like this:
require "json" # at the top of your script
# ...
def sum_hours_values(data, hours_values=nil)
# Start with an empty hash that automatically initializes missing keys to `0`
hours_values ||= Hash.new {|hsh,hour| hsh[hour] = 0 }
# Iterate through the [hour, value] arrays, adding `value` to the running
# count for that `hour`, and return `hours_values`
data.each_with_object(hours_values) do |(hour, value), hsh|
hsh[hour] += value
end
end
# ... Watir/Nokogiri stuff here...

# Initialize these so they persist outside the loop
hours_views, hours_orders = nil

campaign_list.each do |campaign|
  browser.goto "<redacted>"
  if browser.text.include? "Application Error"
    # ...
  else
    # ...
    hourly_data_parsed = JSON.parse(hourly_data)
    hours_views  = sum_hours_values(hourly_data_parsed["views"], hours_views)
    hours_orders = sum_hours_values(hourly_data_parsed["orders"], hours_orders)
  end
end
puts "Views by hour:"
puts hours_views.sort.map {|hour_views| "%2i\t%4i" % hour_views }
puts "Orders by hour:"
puts hours_orders.sort.map {|hour_orders| "%2i\t%4i" % hour_orders }
P.S. There's a really nice recursive version of sum_hours_values I didn't include since the iterative version is clearer to most Ruby programmers. If you're into recursion I leave it as an exercise for you. ;)
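For the curious, one possible recursive version (not necessarily the one the answerer had in mind) that stays drop-in compatible with the calls above:

def sum_hours_values(data, hours_values=nil)
  hours_values ||= Hash.new {|hsh,hour| hsh[hour] = 0 }
  return hours_values if data.empty?

  # Peel off the first [hour, value] pair, add it in, and recurse on the rest
  (hour, value), *rest = data
  hours_values[hour] += value
  sum_hours_values(rest, hours_values)
end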

ruby webdriver: how to catch key pressed while the script is running?

I'm opening a set of URLs with a WebDriver in Ruby, kind of a slideshow with 3-second intervals between "slides" (pages). The person watching might press Space, and I need that page's URL saved to another file. How can I handle those interruptions, i.e. catch the event of Space being pressed?
require "watir-webdriver"
urls = [...list of URLs here...]
saved = []
b = Watir::Browser.new
urls.each do |url|
b.goto url
sleep(3)
# ...what should I put here to handle Space pressed?
if space_pressed
saved << b.url
end
end
It looks like your problem might be solvable with STDIN.getch.
If you create a file with the following script and then run it from a command prompt (e.g. "ruby script.rb"), the script will:
Navigate to the URL.
Ask if the URL should be captured.
If the user does not input anything within 10 seconds, proceed to the next URL. I changed this from 3 seconds since that was too fast; you can change it back in the line Timeout::timeout(10).
If the user did input something, save the URL if the input was a space; otherwise ignore it and move on.
Script:
require "watir-webdriver"
require 'io/console'
require 'timeout'
urls = ['www.google.ca', 'www.yahoo.ca', 'www.gmail.com']
saved = []
b = Watir::Browser.new
urls.each do |url|
b.goto url
# Give user 10 seconds to provide input
puts "Capture url '#{url}'?"
$stdout.flush
input = Timeout::timeout(10) {
input = STDIN.getch
} rescue ''
# If the user's input is a space, save the url
if input == ' '
saved << b.url
end
end
p saved
A couple of notes:
The input needs to be given at the command prompt rather than in the browser.
If the user presses a key before the allotted timeout, the script will immediately proceed to the next URL.
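Since the question also mentions saving the captured URLs to another file, a one-liner after the loop could handle that (the filename here is just an assumption):

# Write each captured URL on its own line (filename is an assumption)
File.write('saved_urls.txt', saved.join("\n"))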

Sinatra + Ruby: Random Number keeps changing every time I guess. Scope issue?

I am using Sinatra to build a WebGuesser with Jumpstart Labs. I enter a number into a text field in my browser. I click submit and I am supposed to get a response saying if my number is too low or too high (or within 5). I use Shotgun to load the server. I want to be able to guess a number without having the random number change every time I guess.
Code:
require 'sinatra'
require 'sinatra/reloader'

def check_guess(guess)
  if params["guess"].to_i == guess
    "You got it right!"
  elsif params["guess"].to_i > guess
    if params["guess"].to_i > (guess + 5)
      "Way too high!"
    else
      "Close.. but too high!"
    end
  elsif params["guess"].to_i < guess
    if params["guess"].to_i < (guess - 5)
      "Way too low!"
    else
      "Close.. but too low!"
    end
  end
end

# Home route
get '/' do
  SECRET_NUMBER = rand(100)
  message = check_guess(SECRET_NUMBER)
  erb :index, :locals => { :message => message }
end
Currently, I get a new random number every time I guess, which doesn't help. I feel like it may have something to do with the scope of my SECRET_NUMBER. Any thoughts?
Every time there is a GET request to "/", that code is executed, which generates a new SECRET_NUMBER (with "already initialized constant" warnings, since a constant is being reassigned).
One way to deal with this is to route to different URLs for the first guess (in which case a secret number should be generated) and for subsequent guesses (in which case a new secret number should not be generated).
Also, it is very bad practice to use a constant for something that changes over time.
You could store the value in the user session; for that, you would have to enable sessions in Sinatra.
configure do
  enable :sessions
  set :session_secret, "somesecretstring"
end
After that you can create a number by going to a certain route
get '/random' do
  session[:number] = rand(100)
end
You can then check your guesses on a different route
get '/checkguess' do
  check_guess(session[:number]) unless session[:number].nil?
end
That's the basic idea; you'd have to flesh it out further, though. Hope it helps a little.
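A rough sketch of how those pieces might fit together with the asker's existing check_guess and index view (collapsed into the single '/' route here, using ||= so the number is only generated once per session):

require 'sinatra'
require 'sinatra/reloader'

configure do
  enable :sessions
  set :session_secret, "somesecretstring"
end

get '/' do
  # Generate the secret number once per session and reuse it on later requests
  session[:number] ||= rand(100)
  message = check_guess(session[:number]) if params["guess"]
  erb :index, :locals => { :message => message }
end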
I was searching for the exact same question, and:
require "sinatra"
require "sinatra/reloader"
number = rand(100)
get '/' do
guess = params["guess"].to_i
message = check_guess(guess, number)
erb :index, :locals => {:bok => number, :alert => guess, :msg => message}
end
Putting the RNG call outside the get block just worked. The generated number stays the same until you change something in the code (even adding a space to the end of the file and saving it will re-randomize the number) or restart the server completely.
About the constant (SECRET_NUMBER): it helps to give the check_guess method only one argument, since you define it as a constant at the top. (Since I'm new to Ruby, someone can correct me if I'm wrong.)
SECRET_NUMBER = rand(100)

get '/' do ... end

def check_guess(guess)
  if guess < SECRET_NUMBER
    "Your Guess is Too LOW!"
  elsif guess > SECRET_NUMBER
    "Your Guess is Too HIGH!"
  else
    "Congratulations! You guessed it right :)"
  end
end
For anyone still looking for the answer: the random number should be generated outside of the get block.
require 'sinatra'
require 'sinatra/reloader'

rand = (rand() * 100).to_i  # note: this local variable shadows Kernel#rand from here on

get '/' do
  "The secret number is #{rand}"
end

Odd bug with DataMapper, Mutexes, and Threads?

I have a database full of URLs that I need to test HTTP response time for on a regular basis. I want to have many worker threads combing the database at all times for a URL that hasn't been tested recently, and if it finds one, test it.
Of course, this could cause multiple threads to snag the same URL from the database. I don't want this. So, I'm trying to use Mutexes to prevent this from happening. I realize there are other options at the database level (optimistic locking, pessimistic locking), but I'd at least prefer to figure out why this isn't working.
Take a look at this test code I wrote:
threads = []
mutex = Mutex.new

50.times do |i|
  threads << Thread.new do
    while true do
      url = nil

      mutex.synchronize do
        url = URL.first(:locked_for_testing => false, :times_tested.lt => 150)
        if url
          url.locked_for_testing = true
          url.save
        end
      end

      if url
        # simulate testing the url
        sleep 1
        url.times_tested += 1
        url.save

        mutex.synchronize do
          url.locked_for_testing = false
          url.save
        end
      end
    end

    sleep 1
  end
end

threads.each { |t| t.join }
Of course there is no real URL testing here. But what should happen is at the end of the day, each URL should end up with "times_tested" equal to 150, right?
(I'm basically just trying to make sure the mutexes and worker-thread mentality are working)
But each time I run it, a few odd URLs here and there end up with times_tested equal to a much lower number, say 37, and locked_for_testing frozen at "true".
As far as I can tell from my code, if any URL gets locked, it has to get unlocked again, so I don't understand how some URLs end up "frozen" like that.
There are no exceptions and I've tried adding begin/ensure but it didn't do anything.
Any ideas?
I'd use a Queue, and a single master thread to pull the work you want done. If you have a single master, you control what's getting accessed. This isn't perfect, but it's not going to blow up because of concurrency; remember, if you aren't locking the database, a mutex doesn't really help you if something else accesses the db.
Code completely untested:
require 'thread'

queue = Queue.new
keep_running = true
# trap cntrl_c or something to reset keep_running

master = Thread.new do
  while keep_running
    # check if we need some work to do
    if queue.size == 0
      urls = URL.all(:times_tested.lt => 150)
      urls.each do |u|
        queue << u.id
      end
    end
    # keep from spinning the queue
    sleep(0.1)
  end
end

workers = []
50.times do
  workers << Thread.new do
    while keep_running
      # get an id
      id = queue.shift
      url = URL.get(id)
      # do something with the url
      url.save
      sleep(0.1)
    end
  end
end

workers.each do |w|
  w.join
end
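The "trap cntrl_c" comment is left open in the answer; one way it might be wired up is shown below (a sketch; note that a worker can still block on queue.shift until another id arrives):

keep_running = true
# Flip the flag on Ctrl+C so the master and workers fall out of their loops
Signal.trap("INT") { keep_running = false }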

Catching timeout errors with ruby mechanize

I have a mechanize function to log me out of a site, but on very rare occasions it times out. The function involves going to a specific page and then clicking on a logout button. On the occasions that mechanize suffers a timeout, either when going to the logout page or when clicking the logout button, the code crashes. So I put in a small rescue, and it seems to be working, as shown in the second piece of code below.
def logmeout(agent)
  page = agent.get('http://www.example.com/')
  agent.click(page.link_with(:text => /Log Out/i))
end
Logmeout with rescue:
def logmeout(agent)
  begin
    page = agent.get('http://www.example.com/')
    agent.click(page.link_with(:text => /Log Out/i))
  rescue Timeout::Error
    puts "Timeout!"
    retry
  end
end
Assuming I understand rescue correctly, it will redo both actions even if just the clicking timed out, so in an effort to be efficient I was wondering if I could use a proc in this situation and pass it a code block. Would something like this work:
def trythreetimes
  tries = 0
  begin
    yield
  rescue
    tries += 1
    puts "Trying again!"
    retry if tries <= 3
  end
end

def logmeout(agent)
  trythreetimes { page = agent.get('http://www.example.com/') }
  trythreetimes { agent.click(page.link_with(:text => /Log Out/i)) }
end
Note in my trythreetimes function I left it as generic rescue so the function would be more re-usable.
Thanks so much for any help anyone can provide, I realize there are a couple different questions in here but they are all things I am trying to learn!
Instead of retrying timeouts on some Mechanize requests, I think you'd be better off setting the Mechanize::HTTP::Agent read_timeout attribute to a reasonable number of seconds, like 2 or 5; in any case, one that prevents timeout errors for this request.
Also, it seems that your logout procedure only requires a simple HTTP GET request; there is no form to fill in, so no HTTP POST request.
So if I were you, I would inspect the page source code (Ctrl+U in Firefox or Chrome) to identify the link that is reached by your agent.click(page.link_with(:text => /Log Out/i)), and then GET that URL directly.
It should be faster because these types of pages are usually blank, so Mechanize will not have to load a full HTML page into memory.
Here is the code I would use:
def logmeout(agent)
  begin
    agent.read_timeout = 2 # set the agent time out
    page = agent.get('http://www.example.com/logout_url.php')
    agent.history.pop() # delete this request in the history
  rescue Timeout::Error
    puts "Timeout!"
    puts "read_timeout attribute is set to #{agent.read_timeout}s" if !agent.read_timeout.nil?
    # retry # retry is no more needed
  end
end
But you can use your retry function too:
def trythreetimes
  tries = 0
  begin
    yield
  rescue Exception => e
    tries += 1
    puts "Error: #{e.message}"
    puts "Trying again!" if tries <= 3
    retry if tries <= 3
    puts "No more attempt!"
  end
end

def logmeout(agent)
  trythreetimes do
    agent.read_timeout = 2 # set the agent time out
    page = agent.get('http://www.example.com/logout_url.php')
    agent.history.pop() # delete this request in the history
  end
end
Hope it helps! ;-)
Using mechanize 1.0.0, I hit this problem from a different source of error. In my case I was blocked by a proxy, and then by SSL. This worked for me:
ag = Mechanize.new
ag.set_proxy('yourproxy', yourport)
ag.agent.http.verify_mode = OpenSSL::SSL::VERIFY_NONE
ag.get(url)
