BrowserMob Proxy + Watir not capturing traffic continuously - ruby

I have BrowserMob Proxy set up correctly with Watir, and it is capturing traffic and saving the HAR file; what it's not doing, however, is capturing the traffic continuously. Here is what I'm trying to achieve:
1. Go to the homepage
2. Click a link to go to another page, where I need to wait for some events to happen
3. Once on the second page, start capturing traffic after the event happens, wait for a specific call to occur, and capture its contents
What I'm noticing, however, is that it follows all of the above steps, but at step 3 the proxy stops capturing traffic before that call is even made on the page. The HAR that is returned doesn't have that call in it, so the test fails before it even does its job. Here is what the code looks like:
class BMP
  attr_accessor :server, :proxy, :net_har, :sel_proxy

  def initialize
    bm_path = File.path(Support::Paths.cucumber_root + "/browsermob-proxy-2.1.4/bin/browsermob-proxy")
    @server = BrowserMob::Proxy::Server.new(bm_path, { :port => 9999, :log => false, :use_little_proxy => true, :timeout => 100 })
    @server.start
    @proxy = @server.create_proxy
    @sel_proxy = @proxy.selenium_proxy
    @proxy.timeouts(:read => 50000, :request => 50000, :dns_cache => 50000)
    @net_har = @proxy.new_har("new_har", :capture_binary_content => true, :capture_headers => true, :capture_content => true)
  end

  def fetch_har_entries(target_url)
    har_logs = File.join(Support::Paths.har_logs, "har_file_#{Time.now.strftime('%m%d%y_%H%M%S')}.har")
    @net_har.save_to har_logs
    index = 0
    while @net_har.entries.count > index
      entry = @net_har.entries[index]
      if entry.request.url.include?(target_url) && entry.request.method.eql?("GET")
        logs = JSON.parse(entry.response.content.text) unless entry.response.content.text.nil?
        json_logs = File.join(Support::Paths.har_logs, "json_file_#{Time.now.strftime('%m%d%y_%H%M%S')}.json")
        File.open(json_logs, "w") do |json|
          json.write(logs)
        end
        break
      end
      index += 1
    end
  end
end
In my test file I have the following:
Then("I navigate to the homepage") do
  visit(HomePage) do |page|
    page.element.click
  end
end

And("I should wait for event to capture traffic") do
  visit(SecondPage) do |page|
    page.wait_until { page.element2.present? }
    BMP.fetch_har_entries("target/url")
  end
end
What am I missing that is causing the proxy to not capture traffic in its entirety?

In case anyone gets here from a Google search, I figured out how to resolve this on my own (thanks stackoverflow community for nothing, lol). To resolve the issue, I used a custom retriable loop built around an eventually method.
def fetch_har_entries(target_url, target_body_text)
  logs = nil
  eventually(timeout: 110, interval: 1) do
    @net_har = @proxy.new_har("har", capture_binary_content: true, capture_headers: true, capture_content: true)
    @net_har.entries.each do |entry|
      begin
        next unless entry.request.url.include?(target_url) &&
                    entry.request.post_data.text.include?(target_body_text)
        logs = entry.request.post_data.text
        break
      rescue TypeError
        fail("Response body for the network call came back empty")
      end
    end
    raise EOFError if logs.nil?
  end
  logs
end
Basically I'm assuming what was happening was that BMP would only cache or capture about 30 seconds' worth of HAR logs, and if my network event didn't occur during those 30 secs, I was SOL. So what the above code does is wait for the logs variable to be non-nil; if it is nil, it raises an EOFError, goes back to the loop, initializes the HAR again, and looks for the network call again. It keeps doing that until it finds the call or the 110 seconds are up. Following is the eventually method I'm using:
def eventually(options = {})
  timeout = options[:timeout] || 30
  interval = options[:interval] || 0.1
  time_limit = Time.now + timeout
  loop do
    error = nil # reset so a success after earlier failures still returns
    begin
      yield
    rescue EOFError => error
    end
    return if error.nil?
    raise error if Time.now >= time_limit
    sleep interval
  end
end

Related

Unable to fetch a link from an email in Gmail when run on Jenkins, but works and fetches the desired result locally (in the Watir framework using Ruby)

The following code, when run on Jenkins, throws the error: "The set password url is invalid argument (Session info: chrome=100.0.4896.88) (Selenium::WebDriver::Error::InvalidArgumentError)
Backtrace: Ordinal0 [0x00A67413+2389011]"
STEP FILE:
When(/^the user clicks on activate online account link$/) do
  on(CheckoutPage) do |page|
    # sleep for 30 seconds for the email to be received
    sleep 30
    p @set_password_link = page.get_password_token
    puts "The set password url is #{@set_password_link}"
    page.navigate_to(@set_password_link)
  end
end
CODE FILE:
def get_password_token
  begin
    retries ||= 0
    Gmail.new("xxxxxxx@gmail.com", "xxxxxxxx") do |gmail|
      email = gmail.inbox.emails(:from => 'orders@cottonon.com', :subject => 'Activate your online account').last
      html = email.html_part.body.to_s
      urls = URI.extract(html, %w(https))
      return urls[1]
    end
  rescue
    retry if (retries += 1) < $code_retry
  end
end
It could be a number of things; maybe you just need URI.parse(urls[1]), or the fetched url is invalid.
Also, it seems like your gmail code always fetches the last mail, which can return the wrong one if the email has not been received yet.
Here is a gmail_check method that should be more resilient to mail content and time received:
def gmail_check(url_part, receiver, timeout = 30)
  time = (Time.now - 5.minutes).to_i
  Gmail.connect("xxxxxxx@gmail.com", "xxxxxxxx") do |gmail|
    puts("Reading emails to: #{receiver}")
    while timeout > 0
      gmail.inbox.find(:gm => "\"after:#{time}\"").each do |mail|
        if mail.message.to.first == receiver
          content = mail.multipart? ? mail.html_part.decoded : mail.message.decoded
          Nokogiri::HTML(content).css("a").each do |a|
            href = a.attributes["href"].to_s
            return href if href.include?(url_part)
          end
        end
      end
      puts("Waiting 5 seconds before reading mail again.")
      timeout = timeout - 5
      sleep 5
    end
  end
end
But you should be able to easily debug the problem by ssh-ing into the Jenkins machine:
type irb
type require 'gmail'
paste your code there
check the url
good luck :)
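For instance, a debugging session on the Jenkins box might look roughly like this (the host and credentials are placeholders; the subject line is the one from the code above):
$ ssh user@your-jenkins-host   # hypothetical host
$ irb
irb(main):001:0> require 'gmail'
irb(main):002:0> require 'uri'
irb(main):003:0> gmail = Gmail.new("xxxxxxx@gmail.com", "xxxxxxxx")
irb(main):004:0> email = gmail.inbox.emails(:subject => 'Activate your online account').last
irb(main):005:0> URI.extract(email.html_part.body.to_s, %w(https))
If the array printed in the last step is empty or its second element is not a URL, the problem is the email fetch, not Selenium.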

In RoR, how do I catch an exception if I get no response from a server?

I’m using Rails 4.2.3 and Nokogiri to get data from a web site. I want to perform an action when I don’t get any response from the server, so I have:
begin
  content = open(url).read
  if content.lstrip[0] == '<'
    doc = Nokogiri::HTML(content)
  else
    begin
      json = JSON.parse(content)
    rescue JSON::ParserError => e
      content
    end
  end
rescue Net::OpenTimeout => e
  attempts = attempts + 1
  if attempts <= max_attempts
    sleep(3)
    retry
  end
end
Note that this is different than getting a 500 from the server. I only want to retry when I get no response at all, either because I get no TCP connection or because the server fails to respond (or some other reason that causes me not to get any response). Is there a more generic way to take account of this situation other than how I have it? I feel like there are a lot of other exception types I’m not thinking of.
This is a generic sample of how you can define timeout durations for an HTTP connection and perform several retries in case of any error while fetching content (edited):
require 'open-uri'
require 'nokogiri'

url = "http://localhost:3000/r503"

openuri_params = {
  # set timeout durations for the HTTP connection
  # the default value for open_timeout and read_timeout is 60 seconds
  :open_timeout => 1,
  :read_timeout => 1,
}

attempt_count = 0
max_attempts = 3
begin
  attempt_count += 1
  puts "attempt ##{attempt_count}"
  content = open(url, openuri_params).read
rescue OpenURI::HTTPError => e
  # it's 404, etc. (do nothing)
rescue SocketError, Net::ReadTimeout => e
  # server can't be reached or doesn't send any responses
  puts "error: #{e}"
  sleep 3
  retry if attempt_count < max_attempts
else
  # connection was successful,
  # content is fetched,
  # so here we can parse content with Nokogiri,
  # or call a helper method, etc.
  doc = Nokogiri::HTML(content)
  p doc
end
When it comes to rescuing exceptions, you should aim to have a clear understanding of:
Which lines in your system can raise exceptions
What is going on under the hood when those lines of code run
What specific exceptions could be raised by the underlying code
In your code, the line that's fetching the content is also the one that could see network errors:
content = open(url).read
If you go to the documentation for the OpenURI module you'll see that it uses Net::HTTP & friends to get the content of arbitrary URIs.
Figuring out what Net::HTTP can raise is actually very complicated but, thankfully, others have already done this work for you. Thoughtbot's suspenders project has lists of common network errors that you can use. Notice that some of those errors have to do with different network conditions than what you had in mind, like the connection being reset. I think it's worth rescuing those as well, but feel free to trim the list down to your specific needs.
So here's what your code should look like (skipping the Nokogiri and JSON parts to simplify things a bit):
require 'net/http'
require 'open-uri'

HTTP_ERRORS = [
  EOFError,
  Errno::ECONNRESET,
  Errno::EINVAL,
  Net::HTTPBadResponse,
  Net::HTTPHeaderSyntaxError,
  Net::ProtocolError,
  Timeout::Error,
]
MAX_RETRIES = 3

attempts = 0
begin
  content = open(url).read
rescue *HTTP_ERRORS => e
  if attempts < MAX_RETRIES
    attempts += 1
    sleep(2)
    retry
  else
    raise e
  end
end
I would think about using a Timeout that raises an exception after a short period:
require 'timeout'

MAX_RESPONSE_TIME = 2 # seconds

begin
  content = nil # needs to be defined before the following block
  Timeout.timeout(MAX_RESPONSE_TIME) do
    content = open(url).read
  end
  # parsing `content`
rescue Timeout::Error => e
  attempts += 1
  if attempts <= max_attempts
    sleep(3)
    retry
  end
end

Twitter rate limit hit while requesting friends with ruby gem

I am having trouble printing out a list of people I am following on twitter. This code worked at 250, but fails now that I am following 320 people.
Failure Description: The code request exceeds twitter's rate limit. The code sleeps for the time required for the limit to reset, and then tries again.
I think the way it's written, it just keeps retrying the same entire rejectable request, rather than picking up where it left off.
MAX_ATTEMPTS = 3
num_attempts = 0
begin
  num_attempts += 1
  @client.friends.each do |user|
    puts "#{user.screen_name}"
  end
rescue Twitter::Error::TooManyRequests => error
  if num_attempts <= MAX_ATTEMPTS
    sleep error.rate_limit.reset_in
    retry
  else
    raise
  end
end
Thanks!
The following code will return an array of usernames. The vast majority of the code was written by the author of: http://workstuff.tumblr.com/post/4556238101/a-short-ruby-script-to-pull-your-twitter-followers-who
First create the following definition.
def get_cursor_results(action, items, *args)
  result = []
  next_cursor = -1
  until next_cursor == 0
    begin
      t = @client.send(action, args[0], args[1], {:cursor => next_cursor})
      result = result + t.send(items)
      next_cursor = t.next_cursor
    rescue Twitter::Error::TooManyRequests => error
      puts "Rate limit error, sleeping for #{error.rate_limit.reset_in} seconds...".color(:yellow)
      sleep error.rate_limit.reset_in
      retry
    end
  end
  return result
end
Second, gather your twitter friends using the following two lines:
friends = get_cursor_results('friends', 'users', 'twitterusernamehere')
screen_names = friends.collect { |x| x.screen_name }
Try using a cursor, as sketched below: http://rdoc.info/gems/twitter/Twitter/API/FriendsAndFollowers#friends-instance_method (for example, https://gist.github.com/kent/451413)
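For example, manual cursoring with the friends method might look something like this (a sketch following the cursored API linked above; attribute names may vary by gem version):
next_cursor = -1
until next_cursor == 0
  begin
    page = @client.friends('twitterusernamehere', :cursor => next_cursor)
    page.users.each { |user| puts user.screen_name }
    next_cursor = page.next_cursor
  rescue Twitter::Error::TooManyRequests => error
    sleep error.rate_limit.reset_in # wait out the rate limit window
    retry # only this page is retried, not the pages already fetched
  end
end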

Catching timeout errors with ruby mechanize

I have a mechanize function to log me out of a site, but on very rare occasions it times out. The function involves going to a specific page and then clicking on a logout button. On the occasions that mechanize suffers a timeout, either going to the logout page or clicking the logout button, the code crashes. So I put in a small rescue, and it seems to be working, as seen in the second piece of code below.
def logmeout(agent)
  page = agent.get('http://www.example.com/')
  agent.click(page.link_with(:text => /Log Out/i))
end
Logmeout with rescue:
def logmeout(agent)
  begin
    page = agent.get('http://www.example.com/')
    agent.click(page.link_with(:text => /Log Out/i))
  rescue Timeout::Error
    puts "Timeout!"
    retry
  end
end
Assuming I understand rescue correctly, it will redo both actions even if just the clicking timed out, so in the effort to be efficient I was wondering if I could use a proc in this situation and pass it a code block. Would something like this work?
def trythreetimes
  tries = 0
  begin
    yield
  rescue
    tries += 1
    puts "Trying again!"
    retry if tries <= 3
  end
end

def logmeout(agent)
  trythreetimes { page = agent.get('http://www.example.com/') }
  trythreetimes { agent.click(page.link_with(:text => /Log Out/i)) }
end
Note in my trythreetimes function I left it as generic rescue so the function would be more re-usable.
Thanks so much for any help anyone can provide, I realize there are a couple different questions in here but they are all things I am trying to learn!
Instead of retrying some timeouts on some mechanize requests, I think you'd better set the Mechanize::HTTP::Agent#read_timeout attribute to a reasonable number of seconds, like 2 or 5; in any case, one that prevents timeout errors for this request.
Then, it seems that your log out procedure only requires a simple HTTP GET request. I mean, there is no form to fill in, so no HTTP POST request.
So if I were you, I would inspect the page source code (Ctrl+U with Firefox or Chrome) in order to identify the link which is reached by your agent.click(page.link_with(:text => /Log Out/i)).
It should be faster because these types of pages are usually blank and Mechanize will not have to load a full html web page in memory.
Here is the code I would use:
def logmeout(agent)
  begin
    agent.read_timeout = 2 # set the agent timeout
    page = agent.get('http://www.example.com/logout_url.php')
    agent.history.pop() # delete this request from the history
  rescue Timeout::Error
    puts "Timeout!"
    puts "read_timeout attribute is set to #{agent.read_timeout}s" if !agent.read_timeout.nil?
    # retry # retry is no longer needed
  end
end
But you can use your retry function too:
def trythreetimes
  tries = 0
  begin
    yield
  rescue Exception => e
    tries += 1
    puts "Error: #{e.message}"
    puts "Trying again!" if tries <= 3
    retry if tries <= 3
    puts "No more attempts!"
  end
end

def logmeout(agent)
  trythreetimes do
    agent.read_timeout = 2 # set the agent timeout
    page = agent.get('http://www.example.com/logout_url.php')
    agent.history.pop() # delete this request from the history
  end
end
Hope it helps! ;-)
Using mechanize 1.0.0, I got this problem from a different source of error.
In my case I was blocked by the proxy and then SSL. This worked for me:
ag = Mechanize.new
ag.set_proxy('yourproxy', yourport)
ag.agent.http.verify_mode = OpenSSL::SSL::VERIFY_NONE
ag.get( url )

Am I using EventMachine in the right way?

I am using ruby-smpp and redis to achieve a queue-based background worker to send SMPP messages.
And I am wondering if I am using EventMachine in the right way. It works, but it doesn't feel right.
#!/usr/bin/env ruby

# Sample SMS gateway that can receive MOs (mobile originated messages) and
# DRs (delivery reports), and send MTs (mobile terminated messages).
# MTs are, in the name of simplicity, entered on the command line in the format
# <sender> <receiver> <message body>
# MOs and DRs will be dumped to standard out.

require 'smpp'
require 'redis/connection/hiredis'
require 'redis'
require 'yajl'
require 'time'

LOGFILE = File.dirname(__FILE__) + "/sms_gateway.log"
PIDFILE = File.dirname(__FILE__) + '/worker_test.pid'
Smpp::Base.logger = Logger.new(LOGFILE)
# Smpp::Base.logger.level = Logger::WARN

REDIS = Redis.new

class MbloxGateway
  # MT id counter.
  @@mt_id = 0

  # expose SMPP transceiver's send_mt method
  def self.send_mt(sender, receiver, body)
    if sender =~ /[a-z]+/i
      source_addr_ton = 5
    else
      source_addr_ton = 2
    end
    @@mt_id += 1
    @@tx.send_mt(('smpp' + @@mt_id.to_s), sender, receiver, body, {
      :source_addr_ton => source_addr_ton
      # :service_type => 1,
      # :source_addr_ton => 5,
      # :source_addr_npi => 0,
      # :dest_addr_ton => 2,
      # :dest_addr_npi => 1,
      # :esm_class => 3,
      # :protocol_id => 0,
      # :priority_flag => 0,
      # :schedule_delivery_time => nil,
      # :validity_period => nil,
      # :registered_delivery => 1,
      # :replace_if_present_flag => 0,
      # :data_coding => 0,
      # :sm_default_msg_id => 0
    })
  end

  def logger
    Smpp::Base.logger
  end

  def start(config)
    # Write this worker's pid to a file
    File.open(PIDFILE, 'w') { |f| f << Process.pid }
    # The transceiver sends MT messages to the SMSC. It needs a storage with Hash-like
    # semantics to map SMSC message IDs to your own message IDs.
    pdr_storage = {}

    # Run EventMachine in a loop so we can reconnect when the SMSC drops our connection.
    loop do
      EventMachine::run do
        @@tx = EventMachine::connect(
          config[:host],
          config[:port],
          Smpp::Transceiver,
          config,
          self # delegate that will receive callbacks on MOs and DRs and other events
        )
        # Let the connection start before we check for messages
        EM.add_timer(3) do
          # Maybe there is some better way to do this. IDK, but it works!
          EM.defer do
            loop do
              # Pop a message
              message = REDIS.lpop 'messages:send:queue'
              if message # If there is a message, process it and check the queue again
                message = Yajl::Parser.parse(message, :check_utf8 => false) # Parse the message from JSON to a Ruby hash
                if !message['send_after'] or (message['send_after'] and Time.parse(message['send_after']) < Time.now)
                  self.class.send_mt(message['sender'], message['receiver'], message['body']) # Send the message
                  REDIS.publish 'log:messages', "#{message['sender']} -> #{message['receiver']}: #{message['body']}" # Push the message to the redis queue so we can listen to the channel
                else
                  REDIS.lpush 'messages:queue', Yajl::Encoder.encode(message)
                end
              else # If there is no message, sleep for a second
                sleep 1
              end
            end
          end
        end
      end
      sleep 2
    end
  end

  # ruby-smpp delegate methods
  def mo_received(transceiver, pdu)
    logger.info "Delegate: mo_received: from #{pdu.source_addr} to #{pdu.destination_addr}: #{pdu.short_message}"
  end

  def delivery_report_received(transceiver, pdu)
    logger.info "Delegate: delivery_report_received: ref #{pdu.msg_reference} stat #{pdu.stat}"
  end

  def message_accepted(transceiver, mt_message_id, pdu)
    logger.info "Delegate: message_accepted: id #{mt_message_id} smsc ref id: #{pdu.message_id}"
  end

  def message_rejected(transceiver, mt_message_id, pdu)
    logger.info "Delegate: message_rejected: id #{mt_message_id} smsc ref id: #{pdu.message_id}"
  end

  def bound(transceiver)
    logger.info "Delegate: transceiver bound"
  end

  def unbound(transceiver)
    logger.info "Delegate: transceiver unbound"
    EventMachine::stop_event_loop
  end
end

# Start the Gateway
begin
  puts "Starting SMS Gateway. Please check the log at #{LOGFILE}"

  # SMPP properties. These parameters work well with the Logica SMPP simulator.
  # Consult the SMPP spec or your mobile operator for the correct settings of
  # the other properties.
  config = {
    :host => 'server.com',
    :port => 3217,
    :system_id => 'user',
    :password => 'password',
    :system_type => 'type', # default given according to SMPP 3.4 Spec
    :interface_version => 52,
    :source_ton => 0,
    :source_npi => 1,
    :destination_ton => 1,
    :destination_npi => 1,
    :source_address_range => '',
    :destination_address_range => '',
    :enquire_link_delay_secs => 10
  }
  gw = MbloxGateway.new
  gw.start(config)
rescue Exception => ex
  puts "Exception in SMS Gateway: #{ex} at #{ex.backtrace.join("\n")}"
end
Some easy steps to make this code more EventMachine-ish:
Get rid of the blocking Redis driver; use em-hiredis.
Stop using defer. Pushing work out to threads with the Redis driver will make things even worse, as it relies on locks around the socket it's using.
Get rid of the add_timer(3).
Get rid of the inner loop; replace it by rescheduling a block for the next event loop using EM.next_tick. The outer one is somewhat unnecessary. You shouldn't loop around EM.run either; it's cleaner to properly handle a disconnect by doing a reconnect in your unbound method, by calling @@tx.reconnect, instead of stopping and restarting the event loop.
Don't sleep, just wait. EventMachine will tell you when new things come in on a network socket.
Here's how the core code around EventMachine would look with some of the improvements:
def start(config)
  File.open(PIDFILE, 'w') { |f| f << Process.pid }
  pdr_storage = {}

  EventMachine::run do
    @@tx = EventMachine::connect(
      config[:host],
      config[:port],
      Smpp::Transceiver,
      config,
      self
    )

    REDIS = EM::Hiredis.connect

    pop_message = lambda do
      REDIS.lpop 'messages:send:queue' do |message|
        if message # If there is a message, process it and check the queue again
          message = Yajl::Parser.parse(message, :check_utf8 => false) # Parse the message from JSON to a Ruby hash
          if !message['send_after'] or (message['send_after'] and Time.parse(message['send_after']) < Time.now)
            self.class.send_mt(message['sender'], message['receiver'], message['body'])
            REDIS.publish 'log:messages', "#{message['sender']} -> #{message['receiver']}: #{message['body']}"
          else
            REDIS.lpush 'messages:queue', Yajl::Encoder.encode(message)
          end
        end
        EM.next_tick(&pop_message) # reschedule instead of looping
      end
    end
    pop_message.call # kick off the polling
  end
end
Not perfect and could use some cleaning up too, but this is more what it should be like in an EventMachine manner. No sleeps, avoid using defer if possible, and don't use network drivers that potentially block; implement a traditional loop by rescheduling things on the next reactor loop. In terms of Redis, the difference is not that big, but it's more EventMachine-y this way imho.
Hope this helps. Happy to explain further if you still have questions.
You're doing blocking Redis calls in EM's reactor loop. It works, but isn't the way to go. You could take a look at em-hiredis to properly integrate Redis calls with EM.
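For reference, the basic non-blocking pattern with em-hiredis looks something like this (a minimal sketch; only the queue key is taken from the question, everything else is illustrative):
require 'eventmachine'
require 'em-hiredis'

EM.run do
  redis = EM::Hiredis.connect

  # lpop takes a callback block instead of blocking the reactor;
  # we reschedule ourselves on the next tick to keep polling.
  pop = lambda do
    redis.lpop('messages:send:queue') do |message|
      puts message if message
      EM.next_tick(&pop)
    end
  end
  pop.call
end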
