I am going through a list of sites, visiting each one with Watir to look for something in the source code of each page. However, after about 20 or 30 sites, the browser times out while loading a certain page, which breaks my script with this error:
rbuf_fill: execution expired (Timeout::Error)
I am trying to implement a way to detect when it times out and then restart testing the sites from where it left off but am having trouble.
This is my code:
ie = Watir::Browser.new :firefox, :profile => "default"

testsite_array = Array.new
y = 0
File.open('topsites.txt').each do |line|
  testsite_array[y] = line
  y = y + 1
end

total = testsite_array.length
count = 0

begin
  while count <= total
    site = testsite_array[count]
    ie.goto site
    if ie.html.include? 'teststring'
      puts site + ' yes'
    else
      puts site + ' no'
    end
  rescue
    retry
    count = count + 1
  end
end

ie.close
Your loop can be:
# Use Ruby's method for iterating through the array
testsite_array.each do |site|
  attempt = 1
  begin
    ie.goto site
    if ie.html.include? 'teststring'
      puts site + ' yes'
    else
      puts site + ' no'
    end
  rescue
    attempt += 1
    # Retry accessing the site or stop trying
    if attempt > MAX_ATTEMPTS
      puts site + ' site failed, moving on'
    else
      retry
    end
  end
end
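If it helps, here is a hedged sketch of how the pieces could fit together, assuming you define the MAX_ATTEMPTS constant yourself and that the failures really are timeouts (rescuing Timeout::Error specifically avoids silently retrying unrelated errors):

MAX_ATTEMPTS = 3  # assumption: how many attempts per site before giving up

testsite_array.each do |site|
  attempt = 1
  begin
    ie.goto site
    if ie.html.include? 'teststring'
      puts site + ' yes'
    else
      puts site + ' no'
    end
  rescue Timeout::Error
    attempt += 1
    if attempt > MAX_ATTEMPTS
      puts site + ' site failed, moving on'
    else
      retry
    end
  end
end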
The following code, when run on Jenkins, throws this error:
The set password url is
invalid argument (Session info: chrome=100.0.4896.88) (Selenium::WebDriver::Error::InvalidArgumentError)
Backtrace: Ordinal0 [0x00A67413+2389011]
STEP FILE:
When(/^the user clicks on activate online account link$/) do
  on(CheckoutPage) do |page|
    # sleep for 30 seconds for the email to be received
    sleep 30
    @set_password_link = page.get_password_token
    puts "The set password url is #{@set_password_link}"
    page.navigate_to(@set_password_link)
  end
end
Code FILE:
def get_password_token
  begin
    retries ||= 0
    Gmail.new("xxxxxxx@gmail.com", "xxxxxxxx") do |gmail|
      email = gmail.inbox.emails(:from => 'orders@cottonon.com', :subject => 'Activate your online account').last
      html = email.html_part.body.to_s
      urls = URI.extract(html, %w(https))
      return urls[1]
    end
  rescue
    retry if (retries += 1) < $code_retry
  end
end
It could be a number of things; maybe you just need URI.parse(urls[1]), or maybe the fetched url is invalid.
It also seems like your gmail code always fetches the last mail, which can return the wrong one if the email has not been received yet.
Here is a gmail_check method that should be more resilient to variations in mail content and arrival time:
def gmail_check(url_part, receiver, timeout = 30)
  time = (Time.now - 5.minutes).to_i
  Gmail.connect("xxxxxxx@gmail.com", "xxxxxxxx") do |gmail|
    puts("Reading emails to: #{receiver}")
    while (timeout > 0)
      gmail.inbox.find(:gm => "\"after:#{time}\"").each do |mail|
        if mail.message.to.first == receiver
          content = mail.multipart? ? mail.html_part.decoded : mail.message.decoded
          Nokogiri::HTML(content).css("a").each do |a|
            href = a.attributes["href"].to_s
            return href if href.include?(url_part)
          end
        end
      end
      puts("Waiting 5 seconds before reading mail again.")
      timeout = timeout - 5
      sleep 5
    end
  end
end
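A hedged usage example for that helper (the url fragment and receiver address are placeholders, not from the original code):

# hypothetical call: wait up to 60 seconds for the activation mail, then follow the link
set_password_link = gmail_check("setPassword", "xxxxxxx@gmail.com", 60)
page.navigate_to(set_password_link)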
But you should be able to easily debug the problem by ssh-ing into the Jenkins machine:
type irb
type require 'gmail'
paste your code there
check the url
Good luck :)
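For example, a minimal sketch of that irb session (credentials, sender, and subject are copied from the code above and are placeholders; roughly the same calls as get_password_token makes):

require 'gmail'
require 'uri'
gmail = Gmail.new("xxxxxxx@gmail.com", "xxxxxxxx")
email = gmail.inbox.emails(:from => 'orders@cottonon.com', :subject => 'Activate your online account').last
urls  = URI.extract(email.html_part.body.to_s, %w(https))
puts urls.inspect   # check by hand whether urls[1] is really the link you expect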
I'm trying to return to the beginning of the loop when an error shows up, as in the code below.
I'm using the next keyword when an occasional error occurs, but it is not going back to the beginning of the loop.
describe 'Test', :test do
  before(:each) do
    visit '/admin'
  end

  it 'Adding new images' do
    image = 'barcelona.jpg'
    @imagem = Dir.pwd + '/spec/fixtures/' + image
    produto = '1'
    100.times do
      visit '/admin/' + produto
      if page.has_no_css?('#mensagem > h1')
        within_frame(:xpath, "//*[@id='app-content']/main/div/iframe") do
          find('#ctl00_Conteudo_tbxNome_txtId').set 'test_name'
          find('#ctl00_Conteudo_BtnSalvar').click
          if page.has_no_css?('#mensagem > h1')
            find('#ctl00_Conteudo_tbxIdArquivoControle_lnkInserirArquivo').click
            attach_file('ctl00_Conteudo_tbxIdArquivoControle_tbxArquivo', @imagem)
            find('#ctl00_Conteudo_tbxIdArquivoControle_btnEnviar').click
            if page.has_no_css?('#mensagem > h1')
              find('#skuTabNavigation a[href="#tabImages"]').click
              expect(page).to have_content image
              puts 'Test ok'
            else
              puts 'Error was presented, starting over..'
              next
            end
          else
            puts 'Error was presented, starting over..'
            next
          end
        end
      else
        puts 'Error was presented, starting over..'
        next
      end
    end
  end
end
I would like it so that every time the code goes into an else branch, it restarts the loop.
I don't think there is a direct way to move back to the initial iteration of a loop. redo exists but it only moves you back to the current iteration.
In this case, you probably want to change the way you're looping so you can more easily control when to start/stop. For example:
i = 0
while i <= 100 do
  if page.has_no_css?('#mensagem > h1')
    i = 0
    puts 'Error'
    next
  end
  i += 1
end
So that you don't have to reset the loop index and call puts at every check, you could raise and rescue an error instead:
class MyError < StandardError; end

i = 0
while i <= 100 do
  begin
    if page.has_no_css?('#mensagem > h1')
      raise MyError, 'thing was missing'
    end
    puts i
    i += 1
  rescue MyError => boom
    puts "Error: #{boom.message}"
    i = 0
    redo
  end
end
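Applied to the Capybara test above (a hedged sketch that reuses the asker's produto variable, selector, and message, with the frame and image steps elided), each check would raise instead of calling next, and the rescue resets the counter and redoes the iteration:

class MyError < StandardError; end

i = 0
while i < 100 do
  begin
    visit '/admin/' + produto
    raise MyError if page.has_css?('#mensagem > h1')
    # ... perform the frame/image steps here, raising MyError whenever the error banner appears ...
    i += 1
  rescue MyError
    puts 'Error was presented, starting over..'
    i = 0
    redo
  end
end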
I have the BrowserMob Proxy set up correctly with Watir, and it is capturing traffic and saving the HAR file; however, it is not capturing the traffic continuously. Here is what I'm trying to achieve:
Go to homepage
Click on a link to go to another page where I need to wait for some events to happen
Once on the second page, start capturing traffic after the event happens and wait for a specific call to occur and capture its contents.
What I'm noticing, however, is that it follows all of the above steps, but at step 3 the proxy stops capturing traffic before that call is even made on the page. The HAR that is returned doesn't have that call in it, so the test fails before it even does its job. Here is what the code looks like:
class BMP
  attr_accessor :server, :proxy, :net_har, :sel_proxy

  def initialize
    bm_path = File.path(Support::Paths.cucumber_root + "/browsermob-proxy-2.1.4/bin/browsermob-proxy")
    @server = BrowserMob::Proxy::Server.new(bm_path, {:port => 9999, :log => false, :use_little_proxy => true, :timeout => 100})
    @server.start
    @proxy = @server.create_proxy
    @sel_proxy = @proxy.selenium_proxy
    @proxy.timeouts(:read => 50000, :request => 50000, :dns_cache => 50000)
    @net_har = @proxy.new_har("new_har", :capture_binary_content => true, :capture_headers => true, :capture_content => true)
  end

  def fetch_har_entries(target_url)
    har_logs = File.join(Support::Paths.har_logs, "har_file_#{Time.now.strftime("%m%d%y_%H%M%S")}.har")
    @net_har.save_to har_logs
    index = 0
    while (@net_har.entries.count > index) do
      entry = @net_har.entries[index]
      if entry.request.url.include?(target_url) && entry.request.method.eql?("GET")
        logs = JSON.parse(entry.response.content.text) if not entry.response.content.text.nil?
        har_logs = File.join(Support::Paths.har_logs, "json_file_#{Time.now.strftime("%m%d%y_%H%M%S")}.json")
        File.open(har_logs, "w") do |json|
          json.write(logs)
        end
        break
      end
      index += 1
    end
  end
end
In my test file I have the following:
Then("I navigate to the homepage") do
  visit(HomePage) do |page|
    page.element.click
  end
end

And("I should wait for event to capture traffic") do
  visit(SecondPage) do |page|
    page.wait_until { page.element2.present? }
    BMP.fetch_har_entries("target/url")
  end
end
What am I missing that is causing the proxy to not capture traffic in its entirety?
In case anyone gets here from a Google search, I figured out how to resolve this on my own (thanks stackoverflow community for nothing, lol). To resolve the issue, I used a custom retriable loop built around an eventually method.
logs = nil
eventually(timeout: 110, interval: 1) do
  @net_har = @proxy.new_har("har", capture_binary_content: true, capture_headers: true, capture_content: true)
  @net_har.entries.each do |entry|
    begin
      break if @net_har.entries.index(entry) == @net_har.entries.count
      next unless entry.request.url.include?(target_url) && entry.request.post_data.text.include?(target_body_text)
      logs = entry.request.post_data.text
      break
    rescue TypeError
      fail("Response body for the network call came back empty")
    end
  end
  raise EOFError if logs.nil?
end
logs
Basically, I'm assuming what was happening was that BMP would only cache or capture about 30 seconds' worth of HAR logs, and if my network event didn't occur during those 30 seconds, I was SOL. So what the above code does is wait for the logs variable to be non-nil; if it is nil, it raises an EOFError, goes back to the top of the loop, initializes the HAR again, and looks for the network call again. It keeps doing that until it finds the call or 110 seconds are up. Following is the eventually method I'm using:
def eventually(options = {})
  timeout = options[:timeout] || 30
  interval = options[:interval] || 0.1
  time_limit = Time.now + timeout
  loop do
    begin
      yield
    rescue EOFError => error
    end
    return if error.nil?
    raise error if Time.now >= time_limit
    sleep interval
  end
end
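A minimal standalone usage sketch of the helper (the file check is a made-up condition, just to show the shape: the block raises EOFError until whatever it is waiting for becomes true, and eventually keeps retrying until the timeout):

eventually(timeout: 20, interval: 2) do
  # hypothetical condition; replace with whatever should eventually hold
  raise EOFError unless File.exist?("/tmp/ready.flag")
end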
The script is working fine for me. Right now, I am downloading 500 files at a time.
I want to download the files by specifying a range, for example files 10-30 in one run and 30-60 in the next, and so on, using Ruby and Watir.
This is my code:
require 'watir'
require 'rubygems'

begin
  chromedriver_path = File.join(File.absolute_path(File.dirname(__FILE__)), "browser", "chromedriver.exe")
  Selenium::WebDriver::Chrome.driver_path = chromedriver_path
  browser = Watir::Browser.new :chrome
  browser.goto ""                          # url to login
  sleep 3
  browser.text_field(:name => "").set ""   # e_id
  sleep 3
  browser.text_field(:name => "").set ""   # pwd
  browser.button(:value => "Login").click  # submit
  browser.div(:id => "DivMenu").click
  # sleep 3
  browser.span(:class => "down").click
  sleep 3
  browser.execute_script("document.getElementById('hlGenerateStatusReports').click();")
  sleep 3
  browser.execute_script("document.getElementById('Report').click();")
  sleep 3
  optncount = browser.select_list(:id => 'head_ddlClient').options.count
  puts optncount
  i = 0
  while i <= optncount do
    puts "Inside the loop i = " + i.to_s
    i += 1
    browser.select_list(:id => 'ddlClient').option(:index => i).select
    sleep 3
    browser.button(:value => "Generate Report").click  # submit
    sleep 10
  end
  browser.goto " "  # url to logout
rescue Exception => e
  puts e.message
  puts e.backtrace.inspect
end
I also have download scripts for images etc.
The small part of one I use here to demonstrate the technique is for 500px.com.
I keep a record of all downloaded files in a textfile and check against this file whether a file has already been downloaded. That way you can break off at any moment and resume later.
Of course, you could also break off once downloaded reaches a limit.
I won't publish the whole script, just what matters regarding your question.
def download url
  filename = "#{url[-32..-1]}.jpg"
  if get(url, filename, SAVE_FOLDER)
    File.open(PROGRESS_FILE, 'a+') { |f| f.puts filename }
  end
end

PROGRESS_FILE = './500px.txt'
downloaded = 0
....
response = http.get(path, headers)
json = JSON.parse(response.body)["data"]
processed = File.read(PROGRESS_FILE)
json.each do |item|
  url = item['images'].last['url']
  signature = url[-32..-1]
  filename = "#{signature}.jpg"
  # check if the filename is in the textfile, meaning it was already downloaded
  unless processed[filename]
    download url
    downloaded += 1
  end
end
The file has more than 500 million lines so far and it still works fast enough (the downloading takes much longer). If I hit a limit, I can easily move the lines into a simple database like SQLite.
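Back to the range part of the question: a hedged sketch against the asker's Watir loop (assuming the option indexes you want really are 10 through 30) would simply iterate over that slice instead of counting through every option:

# hypothetical: only generate reports for options 10 through 30 in this run
(10..30).each do |i|
  browser.select_list(:id => 'ddlClient').option(:index => i).select
  sleep 3
  browser.button(:value => "Generate Report").click
  sleep 10
end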
I am using a Mechanize Ruby script to loop through about 1,000 records in a tab-delimited file. Everything works as expected until I reach about 300 records.
Once I get to about 300 records, my script keeps hitting the rescue on every attempt and eventually stops working. I thought it was because I had not properly set max_history, but that doesn't seem to be making a difference.
Here is the error message that I start getting:
getaddrinfo: nodename nor servname provided, or not known
Any ideas on what I might be doing wrong here?
require 'mechanize'

result_counter = 0
used_file = File.open(ARGV[0])
total_rows = used_file.readlines.size

mechanize = Mechanize.new { |agent|
  agent.open_timeout = 10
  agent.read_timeout = 10
  agent.max_history = 0
}

File.open(ARGV[0]).each do |line|
  item = line.split("\t").map { |item| item.strip }
  website = item[16]
  name = item[11]

  if website
    begin
      tries ||= 3
      page = mechanize.get(website)
      primary1 = page.link_with(text: 'text')
      secondary1 = page.link_with(text: 'other_text')
      contains_primary = true
      contains_secondary = true
      unless contains_primary || contains_secondary
        1.times do |count|
          result_counter += 1
          STDERR.puts "Generate (#{result_counter}/#{total_rows}) #{name} - No"
        end
      end
      for i in [primary1]
        if i
          page_to_visit = i.click
          page_found = page_to_visit.uri
          1.times do |count|
            result_counter += 1
            STDERR.puts "Generate (#{result_counter}/#{total_rows}) #{name}"
          end
          break
        end
      end
    rescue Timeout::Error
      STDERR.puts "Generate (#{result_counter}/#{total_rows}) #{name} - Timeout"
    rescue => e
      STDERR.puts e.message
      STDERR.puts "Generate (#{result_counter}/#{total_rows}) #{name} - Rescue"
    end
  end
end
You get this error because you don't close the connection after you use it.
This should fix your problem:
mechanize = Mechanize.new { |agent|
  agent.open_timeout = 10
  agent.read_timeout = 10
  agent.max_history = 0
  agent.keep_alive = false
}
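With keep_alive disabled, Mechanize closes the socket after each request instead of holding persistent connections open, so (assuming the getaddrinfo failures come from the process gradually running out of open sockets or name-resolution resources) the loop no longer accumulates connections as it works through the file.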