We have intermittently failing tests due to Net::ReadTimeout errors.
We have yet to figure out a permanent fix. For right now we want to try rescuing that specific error and re-running the test. Not an ideal solution or a true fix, but we need something in the short term.
How can we re-run the rspec test that failed? I mean how can the test suite do this automatically for that test?
We can catch the error like this:
# spec/support/capybara.rb
def rescue_net_read_timeout(max_tries = 1, tries = 0, &block)
  yield
rescue Net::ReadTimeout => e
  Rails.logger.error e.message
end
but how do we make it try re-running that test?
We want to try re-running the test and then if the re-run passes move on with no error (ideally log it though), else fail for real and consider the test and hence the suite to have failed.
The following method uses the wait gem to retry a test, or a portion of a test, possibly just an assertion.
def retry_test(&block)
  Wait.new(delay: 0.5, attempts: 10).until do
    begin
      yield
      true # test passed
    rescue RSpec::Expectations::ExpectationNotMetError => x
      # make sure it failed for the expected reason
      expect(x.to_s).to match(/Net::ReadTimeout/)
      false # test failed, will retry
    end
  end
end
One can verify why it failed, and report a failure immediately if ever the test failed for another reason.
it 'should get a response' do
retry_test do
# The entire test, or just the portion to retry
end
end
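If pulling in the wait gem is not an option, the same retry can be sketched in plain Ruby. This is a minimal sketch under assumptions, not the method from the answer above: `with_read_timeout_retry`, `max_tries`, and `logger` are hypothetical names, and it rescues Net::ReadTimeout directly rather than an RSpec expectation failure.

```ruby
require 'net/http' # defines Net::ReadTimeout

# Hypothetical helper: runs the block and retries only on Net::ReadTimeout.
# Any other error, or a timeout on the last allowed try, propagates normally.
def with_read_timeout_retry(max_tries: 3, logger: nil)
  tries = 0
  begin
    tries += 1
    yield
  rescue Net::ReadTimeout => e
    # Log every timeout so flaky runs stay visible even when the retry passes.
    logger&.warn("Net::ReadTimeout on try #{tries} of #{max_tries}: #{e.message}")
    retry if tries < max_tries
    raise
  end
end
```

Inside an example it could wrap the whole body or just the flaky step, for instance `with_read_timeout_retry(logger: Rails.logger) { visit '/' }`. Because the final timeout is re-raised, the example still fails for real once the tries are used up.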
I use a ruby gem called rspec-repeat. Out of a few hundred automated tests, we might run into a flaky one here or there and this helps us get past false negatives.
Ideally it's best to continue to diagnose these flaky tests, but this helps alleviate issues in the interim.
Side note: rspec-repeat is based on another library named rspec-retry, but I find the code base for this one to be tidier and the configuration easier to use.
I have the following Worker
class JobBlastingWorker
  include Sidekiq::Worker
  sidekiq_options queue: 'job_blasting_worker'

  def perform(job_id, action=nil)
    job = Job.find(job_id)
    JobBlastingService.new(job).call
    sidekiq_id = JobBlastingWorker.perform_in(2.minutes, job.id, 're-blast', true)
    job.sidekiq_trackers.create(sidekiq_id: sidekiq_id, worker_type: 'blast_version_update')
  end
end
In my RSpec test, I have the following job_blasting_worker_spec.rb:
require 'rails_helper'

describe JobBlastingWorker do
  before(:all) do
    Rails.cache.clear
  end

  describe 'perform' do
    context 'create' do
      it 'creates job schedule for next 2mins' do
        @job = create(:job)
        worker = JobBlastingWorker.new
        expect(JobBlastingWorker).to have_enqueued_sidekiq_job(@job.id, 're-blast').in(2.minutes)
        worker.perform(@job.id, 'create')
      end
    end
  end
end
I expected this to work, but the Sidekiq job that should be scheduled 2 minutes out never gets created, so the test fails.
How can I ensure that the Sidekiq job actually gets scheduled for 2 minutes from now so that the test passes?
Well... for this kind of expectation, I suggest just testing the message sent to the method.
expect(JobBlastingWorker).to have_enqueued_sidekiq_job(@job.id, 're-blast')
expect(JobBlastingWorker).to receive(:perform_in).with(2.minutes, @job.id, 're-blast', true).and_call_original
worker = JobBlastingWorker.new
worker.perform(@job.id, 'create')
Of course, if you dig really hard, I think you will finally find a way to find the active job object in the queue, for example, by using the Redis API directly.
And then you can further examine the object and get the time you set for the job to be performed.
But why? Making sure those jobs are performed at the right time is the responsibility of the job library (Sidekiq here, since the worker includes Sidekiq::Worker).
Finding this doesn't help you much, and this behavior should already be covered by the library's own tests.
I think you don't need to worry about that unless it works incorrectly and you want to reproduce the situation and report a bug.
On the other hand, the time you send to the method is what you should care about. You don't want someone to change it to 2.hours by accident.
So I suggest you should test the message you send to the method.
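The idea of asserting on the message rather than on the queue can be seen without any mocking library. In this sketch, `RecordingScheduler` and `BlastRescheduler` are hypothetical stand-ins (the real worker calls JobBlastingWorker.perform_in directly), and 120 seconds stands in for 2.minutes so that it runs without ActiveSupport.

```ruby
# Hypothetical stand-in for the Sidekiq worker class: it records what it is
# asked to schedule instead of touching Redis.
class RecordingScheduler
  attr_reader :calls

  def initialize
    @calls = []
  end

  def perform_in(interval, *args)
    @calls << [interval, args]
    'fake-jid' # made-up value standing in for the job id Sidekiq returns
  end
end

# Hypothetical, simplified version of the rescheduling step in the worker.
class BlastRescheduler
  RETRY_INTERVAL = 120 # seconds, i.e. 2.minutes without ActiveSupport

  def initialize(scheduler)
    @scheduler = scheduler
  end

  def call(job_id)
    @scheduler.perform_in(RETRY_INTERVAL, job_id, 're-blast')
  end
end

scheduler = RecordingScheduler.new
BlastRescheduler.new(scheduler).call(42)

# The check is about the message: which interval and which arguments were sent.
raise 'unexpected message' unless scheduler.calls == [[120, [42, 're-blast']]]
```

If someone changes the interval by accident, this check fails immediately, which is the same protection the `receive(:perform_in).with(...)` expectation gives in RSpec.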
I have a series of scripts that I have developed using Ruby and the Watir gem. Those are wrapped by Spinach, but that is beside what I am about to ask.
The intent of those scripts is to do some functional spot check or simply alleviate some very repetitive tasks.
They have been running well for a while, but lately, I've started to see a lot of failure due to Timeouts between the Chromedriver / Geckodriver (tried both browsers) and the scripts. Of course, I could simply restart the script, but when the success rate goes below 70 % it really starts to be aggravating.
What I ended up doing is wrapping all of my calls to Watir in a block with a begin/rescue that retries in case of a timeout.
This is ugly and violates so many rules that I am nearly ashamed to have had to resort to this solution, but at least with it my scripts now complete.
Here is how I worked around the issue:
# takes a proc and wraps it around a series of rescue
def execute_block_and_rety_if_needed
  yield
rescue Net::ReadTimeout
  puts 'Read Timeout detected, retrying operation'
  retry
rescue Net::HTTPRequestTimeOut
  puts 'Http Request Timeout detected, retrying operation'
  retry
rescue Errno::ETIMEDOUT
  puts 'Errno::ETIMEDOUT detected, retrying operation'
  retry
end
a sample use would look like this:
execute_block_and_rety_if_needed { @browser.link(name: 'OK').wait_until_present.click } # click the 'OK' button
As you can see, this clearly violates the DRY principle as I need to call this proc every single time.
My question is: how can I move this into a module / feature of Watir so that it is picked up automatically? (Ideally I would add a maximum number of retries to prevent an infinite loop.)
Version information:
- Chromedriver => 2.29.461585
- GeckoDriver => 0.16.1
- Firefox => ESR 52
- Chrome => 58
- Watir => 6.2.1
As far as the DRY comment, I referred to the fact that I had to wrap ALL of my Watir calls with the proc, sorry if this wasn't clear.
execute_block_and_rety_if_needed { @browser.link(name: 'User').wait_until_present.click } # click the 'Edit' button
execute_block_and_rety_if_needed { @browser.link(name: 'Cancel').wait_until_present.click } # click the 'Cancel' button
execute_block_and_rety_if_needed { @browser.link(name: 'OK').wait_until_present.click } # click the 'OK' button
The above is just an example that has to happen if I want to use the retry mechanism.
Given that you want to retry every command sent to the browser, you might want to consider addressing the issue in the underlying Selenium-WebDriver rather than Watir. Watir commands get sent to Selenium-WebDriver, which in turn sends them to the browser/driver.
Each command (or at least most) is currently sent through Selenium::WebDriver::Remote::Http::Default#request. You could patch the method to wrap it in a retry. Not only would your clicks retry for timeouts, but so would every other command, e.g. navigation, setting fields, getting values, etc.
# Patch to retry timeouts during requests
require 'watir'
module Selenium
  module WebDriver
    module Remote
      module Http
        module DefaultExt
          def request(*args)
            tries ||= 3
            super
          rescue Net::ReadTimeout, Net::HTTPRequestTimeOut, Errno::ETIMEDOUT => ex
            puts "#{ex.class} detected, retrying operation"
            (tries -= 1).zero? ? raise : retry
          end
        end
      end
    end
  end
end
Selenium::WebDriver::Remote::Http::Default.prepend(Selenium::WebDriver::Remote::Http::DefaultExt)
# Then you can use Watir as usual
browser = Watir::Browser.new :chrome # this will retry timeouts
browser.goto('http://www.example.com') # this will also retry timeouts
browser.link.click # this will also retry timeouts
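The prepend-and-retry pattern in the patch above can be checked in isolation, without a browser or the Selenium gem. The following is a minimal sketch under assumptions: `FlakyClient` and `RetryOnTimeout` are hypothetical names, and `FlakyClient` only imitates a client whose request times out a fixed number of times.

```ruby
require 'net/http' # defines Net::ReadTimeout

# Hypothetical stand-in for Selenium's HTTP client: fails a fixed number of
# times with Net::ReadTimeout, then succeeds.
class FlakyClient
  attr_reader :attempts

  def initialize(failures)
    @failures = failures
    @attempts = 0
  end

  def request(verb, path)
    @attempts += 1
    raise Net::ReadTimeout if @attempts <= @failures
    "#{verb} #{path} ok"
  end
end

# Same shape as the patch above: prepended so that `super` reaches the original.
module RetryOnTimeout
  def request(*args)
    tries ||= 3 # keeps its value across `retry`, which re-runs the method body
    super
  rescue Net::ReadTimeout, Errno::ETIMEDOUT => ex
    tries -= 1
    raise if tries.zero? # give up: re-raise the timeout that was just rescued
    warn "#{ex.class} detected, #{tries} tries left"
    retry
  end
end

FlakyClient.prepend(RetryOnTimeout)
```

With three tries, `FlakyClient.new(2).request('GET', '/')` succeeds on its third attempt, while a client that keeps timing out raises Net::ReadTimeout after the third attempt instead of looping forever.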
You shouldn't need to use a block for this. You can implement a method that does something like:
def ensure_click(element, retries = 3)
  @retries ||= retries
  element.click
rescue Net::ReadTimeout, Net::HTTPRequestTimeOut, Errno::ETIMEDOUT => ex
  raise unless @retries > 0
  @retries = @retries - 1
  puts "#{ex.class} detected, retrying"
  retry
end
...
ensure_click(@browser.link(name: 'User'))
...
That being said, those exceptions are not typically driver errors, but network issues of some sort. They are not normal.
Is there any way we can ensure certain code runs even after the delayed job fails or succeeds, just like we can write an ensure block in exception handling?
What's wrong with the following approach?
def delayed_job_method
do_the_job
ensure
something
end
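To see that the ensure clause runs on both paths, here is a small plain-Ruby sketch. The `log` array and the `fail_job` flag are hypothetical additions for illustration; they stand in for whatever side effects the real job has.

```ruby
# Hypothetical job body: `log` records which steps actually ran.
def delayed_job_method(log, fail_job: false)
  raise 'job failed' if fail_job
  log << :job_done
ensure
  log << :cleanup # runs whether the body returned normally or raised
end

success_log = []
delayed_job_method(success_log)

failure_log = []
begin
  delayed_job_method(failure_log, fail_job: true)
rescue RuntimeError
  # the error still propagates to the caller after the ensure clause has run
end
```

After this runs, `success_log` holds both entries and `failure_log` holds only `:cleanup`. Note that ensure only covers code that actually starts executing; it cannot help if the worker process is killed. Delayed Job also documents lifecycle hooks such as success, error, and after on custom job objects, which may fit better when the cleanup should live outside the method.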
I'm trying out the whole BDD approach and would like to test the AMQP-based aspect of a vanilla Ruby application I am writing. After choosing Minitest as the test framework for its balance of features and expressiveness as opposed to other aptly-named vegetable frameworks, I set out to write this spec:
# File ./test/specs/services/my_service_spec.rb
# Requirements for test running and configuration
require "minitest/autorun"
require "./test/specs/spec_helper"
# External requires
# Minitest Specs for EventMachine
require "em/minitest/spec"
# Internal requirements
require "./services/distribution/my_service"
# Spec start
describe "MyService", "A Gateway to an AMQP Server" do
# Connectivity
it "cannot connect to an unreachable AMQP Server" do
# This line breaks execution, commented out
# include EM::MiniTest::Spec
# ...
# (abridged) Alter the configuration by specifying
# an invalid host such as "l0c#alho$t" or such
# ...
# Try to connect and expect to fail with an Exception
MyApp::MyService.connect.must_raise EventMachine::ConnectionError
end
end
I have commented out the inclusion of the em-minitest-spec gem's functionality which should coerce the spec to run inside the EventMachine reactor, if I include it I run into an even sketchier exception regarding (I suppose) inline classes and such: NoMethodError: undefined method 'include' for #<#<Class:0x3a1d480>:0x3b29e00>.
The code I am testing against, namely the connect method within that Service is based on this article and looks like this:
# Main namespace
module MyApp
  # Gateway to an AMQP Server
  class MyService
    # External requires
    require "eventmachine"
    require "amqp"

    # Main entry method, connects to the AMQP Server
    def self.connect
      # Add debugging, spawn a thread
      Thread.abort_on_exception = true
      begin
        @em_thread = Thread.new {
          begin
            EM.run do
              @connection = AMQP.connect(@settings["amqp-server"])
              AMQP.channel = AMQP::Channel.new(@connection)
            end
          rescue
            raise
          end
        }
        # Fire up the thread
        @em_thread.join
      rescue Exception
        raise
      end
    end # method connect
  end # class MyService
end # module MyApp
The whole "exception handling" is merely an attempt to bubble the exception out to a place where I can catch/handle it, that didn't help either, with or without the begin and raise bits I still get the same result when running the spec:
EventMachine::ConnectionError: unable to resolve server address, which actually is what I would expect, yet Minitest doesn't play well with the whole reactor concept and fails the test on ground of this Exception.
The question then remains: How does one test EventMachine-related code using Minitest's spec mechanisms? Another question has also been hovering around regarding Cucumber, also unanswered.
Or should I focus on my main functionality (e.g. messaging and seeing if the messages get sent/received) and forget about edge cases? Any insight would truly help!
Of course, it can all come down to the code I wrote above, maybe it's not the way one goes about writing/testing these aspects. Could be!
Notes on my environment: ruby 1.9.3p194 (2012-04-20) [i386-mingw32] (yes, Win32 :>), minitest 3.2.0, eventmachine (1.0.0.rc.4 x86-mingw32), amqp (0.9.7)
Thanks in advance!
Sorry if this response is too pedantic, but I think you'll have a much easier time writing the tests and the library if you distinguish between your unit tests and your acceptance tests.
BDD vs. TDD
Be careful not to confuse BDD with TDD. While both are quite useful, it can lead to problems when you try to test every edge case in an acceptance test. For example, BDD is about testing what you're trying to accomplish with your service, which has more to do with what you're doing with the message queue than connecting to the queue itself. What happens when you try to connect to a non-existent message queue fits more into the realm of a unit test in my opinion. It's also worth pointing out that your service shouldn't be responsible for testing the message queue itself, since that's the responsibility of AMQP.
BDD
While I'm not sure what your service is supposed to do exactly, I would imagine your BDD tests should look something like:
start the service (can do this in a separate thread in the tests if you need to)
write something to the queue
wait for your service to respond
check the results of the service
In other words, BDD (or acceptance tests, or integration tests, however you want to think about them) can treat your app as a black box that is supposed to provide certain functionality (or behavior). The tests keep you focused on your end goal, but are more meant for ensuring one or two golden use cases, rather than the robustness of the app. For that, you need to break down into unit tests.
TDD
When you're doing TDD, let the tests guide you somewhat in terms of code organization. It's difficult to test a method that creates a new thread and runs EM inside that thread, but it's not so hard to unit test either of these individually. So, consider putting the main thread code into a separate function that you can unit test separately. Then you can stub out that method when unit testing the connect method. Also, instead of testing what happens when you try to connect to a bad server (which tests AMQP), you can test what happens when AMQP throws an error (which is your code's responsibility to handle). Here, your unit test can stub out the response of AMQP.connect to throw an exception.
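As a rough illustration of the last point, the connection step can be injected so that a unit test replaces it with something that raises. This is only a sketch under assumptions: `Gateway`, `FakeConnectionError`, and the true/false return convention are hypothetical, and `FakeConnectionError` stands in for EventMachine::ConnectionError so the example runs without the eventmachine or amqp gems.

```ruby
# Stand-in for EventMachine::ConnectionError so the sketch runs without gems.
class FakeConnectionError < RuntimeError; end

# Hypothetical gateway: the connector is injected so a test can replace it.
class Gateway
  attr_reader :last_error

  def initialize(connector)
    @connector = connector
  end

  # Returns true on success; on a connection error, remembers it and returns false.
  def connect
    @connector.call
    true
  rescue FakeConnectionError => e
    @last_error = e
    false
  end
end

# In a unit test, the "AMQP" side is just a lambda that raises.
failing_connector = -> { raise FakeConnectionError, 'unable to resolve server address' }
gateway = Gateway.new(failing_connector)
raise 'expected connect to report failure' unless gateway.connect == false
```

The test never touches the network or the reactor; it only checks how your own code reacts when the connection step raises.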
Part of my command-line Ruby program involves checking if there is an internet connection before any commands are processed. The actual check in the program is trivial (using Socket::TCPSocket), but I'm trying to test this behaviour in Cucumber for an integration test.
The code:
def self.has_internet?(force = nil)
  # An explicit override short-circuits the real check
  return force unless force.nil?
  begin
    TCPSocket.new('www.yelp.co.uk', 80)
    return true
  rescue SocketError
    return false
  end
end
if has_internet? == false
puts("Could not connect to the Internet!")
exit 2
end
The feature:
Scenario: Failing to log in due to no Internet connection
Given the Internet is down
When I run `login <email_address> <password>`
Then the exit status should be 2
And the output should contain "Could not connect to the Internet!"
I obviously don't want to change the implementation to fit the test, and I require all my scenarios to pass. Clearly if there is actually no connection, the test passes as it is, but my other tests fail as they require a connection.
My question: How can I test for this in a valid way and have all my tests pass?
You can stub your has_internet? method and return false in the implementation of the Given the Internet is down step.
YourClass.stub!(:has_internet?).and_return(false)
There are three alternative solutions I can think of:
have the test temporarily monkeypatch TCPSocket.initialize (or maybe Socket#connect, if that's where it ends up) to pretend the internet is down.
write (suid) a script that adds/removes an iptables firewall rule to disable the internet, and have your test call the script
use LD_PRELOAD on a specially written .so shared library that overrides the connect C call. This is harder.
Myself, I would probably try option 1, give up after about 5 minutes and go with option 2.
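Option 1 might look roughly like the following. This is a sketch under assumptions, not a tested Cucumber setup: `with_internet_down` is a hypothetical helper, `has_internet?` is a simplified copy of the method from the question, and the patch only affects code running in the same Ruby process as the test, so it will not reach a command that the scenario launches as a separate process.

```ruby
require 'socket'

# Simplified copy of the check from the question (defaults are assumptions).
def has_internet?(host = 'www.yelp.co.uk', port = 80)
  TCPSocket.new(host, port).close
  true
rescue SocketError
  false
end

# Hypothetical helper: while the block runs, TCPSocket.new raises SocketError.
def with_internet_down
  TCPSocket.singleton_class.class_eval do
    alias_method :__real_new, :new # keep a handle on the real constructor
    define_method(:new) { |*_args| raise SocketError, 'simulated outage' }
  end
  yield
ensure
  # Put the real constructor back even if the block raised.
  TCPSocket.singleton_class.class_eval do
    alias_method :new, :__real_new
    remove_method :__real_new
  end
end

down = with_internet_down { has_internet? }
raise 'expected has_internet? to be false while patched' unless down == false
```

A "Given the Internet is down" step could wrap the code under test in `with_internet_down`, provided the command is run in-process.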
Maybe a bit late for you :), but have a look at
https://github.com/mmolhoek/vcr-uri-catcher
I made this to test network failures, so this should do the trick for you.