Typhoeus retry if fail - ruby

Currently, Typhoeus doesn't have automatic re-download in case of failure. What would be the best way of ensuring a retry if the download is not successful?
def request
  request ||= Typhoeus::Request.new("www.example.com")
  request.on_complete do |response|
    if response.success?
      xml = Nokogiri::XML(response.body)
    else
      # retry to download it
    end
  end
end

I think you need to refactor your code. At a minimum, you should be working with two queues and two threads.
The first is a queue of URLs that you pull from to read via Typhoeus::Request.
If the queue is empty you sleep that thread for a minute, then look for a URL to retrieve. If you successfully read the page, parse it and push the resulting XML doc into a second queue of DOMs to work on. Process that from a second thread. And, if the second queue is empty, sleep that second thread until there is something to work on.
If reading a URL fails, automatically re-push it onto the first queue.
If both queues are empty you could exit the code, or let both threads sleep until something says to start processing URLs again and you repopulate the first queue.
You also need a retry counter associated with the URL, otherwise if a site goes down you could retry forever. You could push little sub-arrays onto the queue, such as:
["url", 0]
where 0 is the retry count, or get more complex and use an object or define a class. Whatever you do, increment that counter until it hits a drop-dead value, then stop re-adding that URL to the queue and report it or remove it from your database of source URLs somehow.
That's somewhat similar to code I've written a couple times to handle big spidering tasks.
See Ruby's Thread and Queue classes for examples of this.
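For illustration, here is a rough sketch of that two-queue idea. Queue#pop blocks while the queue is empty, which stands in for the sleeping described above; MAX_RETRIES, the example URL and the processing step are placeholders, and Typhoeus.get assumes a reasonably recent Typhoeus (with older versions you would run the request through a Hydra instead):
require 'typhoeus'
require 'nokogiri'
require 'thread'

MAX_RETRIES = 3 # drop-dead value; adjust to taste
url_queue = Queue.new
doc_queue = Queue.new

# seed the first queue with ["url", retry_count] pairs
url_queue << ["http://www.example.com", 0]

fetcher = Thread.new do
  loop do
    url, retries = url_queue.pop # blocks while the queue is empty
    response = Typhoeus.get(url)
    if response.success?
      doc_queue << Nokogiri::XML(response.body)
    elsif retries < MAX_RETRIES
      url_queue << [url, retries + 1] # re-push with the counter bumped
    else
      warn "giving up on #{url}"
    end
  end
end

parser = Thread.new do
  loop do
    doc = doc_queue.pop # blocks until a parsed document arrives
    # ... work on the Nokogiri document here ...
  end
end

[fetcher, parser].each(&:join)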
Also:
request ||= Typhoeus::Request.new("www.example.com")
makes no sense. request will be nil when that code runs, so the ||= will always fire. Instead use:
request = Typhoeus::Request.new("www.example.com")
modified with the appropriate code to pull the next value from the first queue mentioned above.

Related

Handling Asynchronous API Call in Jmeter

I am using JMeter for functional testing. Below is a problem I am facing, and I need some help/suggestions on how to overcome it.
I have a thread group that consists of two requests: the first is an API call and the second sends a message to ActiveMQ.
The flow is: first make the API call (which waits for a response), then send the message to a particular ActiveMQ queue, and only then will I get the response for the API.
But since JMeter executes requests sequentially, it gets stuck at the API call waiting for the reply and never executes the second part.
I tried the solution below, but even that did not help.
1. Use a Parallel Controller and put both the API and ActiveMQ calls under it.
2. Add a Timer to the ActiveMQ call so that it runs just after the API call (2 seconds).
But when I checked in detail, I saw that both requests are sent at the same time and the timer never comes into effect.
Any way I can handle this scenario?
Please note I will get a response to the API only when I send the message to the particular ActiveMQ queue; otherwise it will time out in a minute.
Your Parallel Controller approach will work; however, you need to amend the configuration a little bit, for example:
You could put your ActiveMQ request under a different Thread Group and use the Inter-Thread Communication Plugin for synchronization between threads.
You can keep the current setup but replace the JMS Sampler with a JSR223 Sampler and send the message to ActiveMQ programmatically:
Textual code representation for your convenience:
sleep(2000) // give the API call a head start before the message is sent
def connectionFactory = new org.apache.activemq.ActiveMQConnectionFactory('your activemq URL')
def connection = connectionFactory.createConnection()
connection.start()
def session = connection.createSession(false, javax.jms.Session.AUTO_ACKNOWLEDGE) // fully qualified so no import is needed
def destination = session.createQueue('your queue name')
def producer = session.createProducer(destination)
def message = session.createTextMessage('your message body')
producer.send(message)
connection.close()
For your problem statement, the following design will work:
Use two Thread Groups: add the API call to the first Thread Group and the ActiveMQ message to the second Thread Group.
Add a delay to the second Thread Group so that it does not run before the first Thread Group.
Run the Test Plan.
Use a While Controller. It will keep executing until the desired outcome is reached, and then the next request will be executed.
Hope this helps.
Update:
The While Controller executes its samplers until the specified condition evaluates to 'false'. The condition can be any variable or function that eventually evaluates to the string 'false'.
So you need to specify a variable or function in the While Controller that has the value 'true' and becomes 'false' somewhere else in the script. Once it changes to 'false', JMeter will exit the While loop.
For example, if your script uses an XPath Extractor with a variable named Status whose value changes from 'Start' to 'Finish' during execution, and you want the script to run until 'Finish' is reached, you can use the expression ${__javaScript("'${imp_Status}'!='finish'",)} in your While loop; it will execute the samplers under the While Controller until status = finish is met.
It is a sort of polling based on a certain condition. In your first API response, pick a value whose appearance indicates that the first API call was successful, and use that as the condition.
It sounds like you just need to define a timeout for the HTTP Request sampler.
If you define the Response Timeout as 60000 milliseconds, it will only wait for a minute and then continue to the next request.
Connect Timeout: number of milliseconds to wait for a connection to open.
Response Timeout: number of milliseconds to wait for a response. Note that this applies to each wait for a response; if the server response is sent in several chunks, the overall elapsed time may be longer than the timeout.

Run when you can

In my sinatra web application, I have a route:
get "/" do
temp = MyClass.new("hello",1)
redirect "/home"
end
Where MyClass is:
class MyClass
  @@instancesArray = []

  attr_reader :string, :id

  def initialize(string, id)
    @string = string
    @id = id
    @@instancesArray[id] = self # register the instance so it can be run later
  end

  def self.run(id)
    puts @@instancesArray[id].string
  end
end
At some point I would want to run MyClass.run(1), but I wouldn't want it to execute immediately because that would slow down the server's response to some clients. I would want the server to wait to run MyClass.run(temp) until there was some time with a lighter load. How could I tell it to wait until there is an empty/light load, then run MyClass.run(temp)? Can I do that?
Addendum
Here is some sample code for what I would want to do:
$var = 0
get "/" do
  $var = $var + 1 # each time a request is received, it increments
end
After that I would have a loop that counts requests per minute (after a minute it would reset $var to 0), and if $var was less than some number, it would run tasks until the load increased.
As Andrew mentioned (correctly—not sure why he was voted down), Sinatra stops processing a route when it sees a redirect, so any subsequent statements will never execute. As you stated, you don't want to put those statements before the redirect because that will block the request until they complete. You could potentially send the redirect status and header to the client without using the redirect method and then call MyClass#run. This will have the desired effect (from the client's perspective), but the server process (or thread) will block until it completes. This is undesirable because that process (or thread) will not be able to serve any new requests until it unblocks.
You could fork a new process (or spawn a new thread) to handle this background task asynchronously from the main process associated with the request. Unfortunately, this approach has the potential to get messy. You would have to code around different situations like the background task failing, or the fork/spawn failing, or the main request process not ending if it owns a running thread or other process. (Disclaimer: I don't really know enough about IPC in Ruby and Rack under different application servers to understand all of the different scenarios, but I'm confident that here there be dragons.)
The most common solution pattern for this type of problem is to push the task into some kind of work queue to be serviced later by another process. Pushing a task onto the queue is ideally a very quick operation, and won't block the main process for more than a few milliseconds. This introduces a few new challenges (where is the queue? how is the task described so that it can be facilitated at a later time without any context? how do we maintain the worker processes?) but fortunately a lot of the leg work has already been done by other people. :-)
There is the delayed_job gem, which seems to provide a nice all-in-one solution. Unfortunately, it's mostly geared towards Rails and ActiveRecord, and the efforts people have made in the past to make it work with Sinatra look to be unmaintained. The contemporary, framework-agnostic solutions are Resque and Sidekiq. It might take some effort to get up and running with either option, but it would be well worth it if you have several "run when you can" type functions in your application.
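For illustration, a minimal sketch of what the Sidekiq variant might look like; the worker class and its argument are made up here, and it assumes the sidekiq gem plus a running Redis:
require 'sidekiq'

# a hypothetical worker; Sidekiq will run perform in a separate process
class RunLaterWorker
  include Sidekiq::Worker

  def perform(id)
    # rebuild whatever context is needed from the id, then do the slow work
    MyClass.new("hello", id)
    MyClass.run(id)
  end
end

get "/" do
  RunLaterWorker.perform_async(1) # enqueue and return in a few milliseconds
  redirect "/home"
end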
MyClass.run(temp) never actually executes. In your current / route you instantiate a new instance of MyClass, and then the request immediately redirects to /home. I'm not entirely sure what the question is, though. If you want something to execute after the redirect, that functionality needs to exist within the /home route.
get '/home' do
  # some code like MyClass.run(some_arg)
end

Basic Sidekiq Questions about Idempotency and functions

I'm using Sidekiq to perform some heavy processing in the background. I looked online but couldn't find the answers to the following questions. I am using:
Class.delay.use_method(listing_id)
And then, inside the class, I have:
def self.use_method(listing_id)
  listing = Listing.find_by_id(listing_id)
  UserMailer.send_mail(listing)
  Class.call_example_function()
end
Two questions:
How do I make this function idempotent for the UserMailer sendmail? In other words, in case the delayed method runs twice, how do I make sure that it only sends the mail once? Would wrapping it in something like this work?
mail_sent = false
if !mail_sent
  UserMailer.send_mail(listing)
  mail_sent = true
end
I'm guessing not, since the function is tried again and mail_sent is reset to false for the second run-through. So how do I make sure UserMailer is only run once?
Are functions called within the delayed async method also asynchronous? In other words, is Class.call_example_function() executed asynchronously (not part of the request/response cycle)? If not, should I use Class.delay.call_example_function()?
Overall, just getting familiar with Sidekiq so any thoughts would be appreciated.
Thanks
I'm coming into this late, but having been around the loop and had this StackOverflow entry appearing prominently via Google, it needs clarification.
The issue of idempotency and the issue of unique jobs are not the same thing. The 'unique' gems look at the parameters of a job at the point it is about to be processed. If they find that there was another job with the same parameters which had been submitted within some expiry time window, then the job is not actually processed.
The gems are literally what they say they are; they consider whether an enqueued job is unique or not within a certain time window. They do not interfere with the retry mechanism. In the case of the O.P.'s question, the e-mail would still get sent twice if Class.call_example_function() threw an error thus causing a job retry, but the previous line of code had successfully sent the e-mail.
Aside: The sidekiq-unique-jobs gem mentioned in another answer has not been updated for Sidekiq 3 at the time of writing. An alternative is sidekiq-middleware which does much the same thing, but has been updated.
https://github.com/krasnoukhov/sidekiq-middleware
https://github.com/mhenrixon/sidekiq-unique-jobs (as previously mentioned)
There are numerous possible solutions to the O.P.'s email problem and the correct one is something that only the O.P. can assess in the context of their application and execution environment. One would be: If the e-mail is only going to be sent once ("Congratulations, you've signed up!") then a simple flag on the User model wrapped in a transaction should do the trick. Assuming a class User accessible as an association through the Listing via listing.user, and adding in a boolean flag mail_sent to the User model (with migration), then:
listing = Listing.find_by_id(listing_id)
unless listing.user.mail_sent?
  User.transaction do
    listing.user.mail_sent = true
    listing.user.save!
    UserMailer.send_mail(listing)
  end
end
Class.call_example_function()
...so that if the user mailer throws an exception, the transaction is rolled back and the change to the user's flag setting is undone. If the "call_example_function" code throws an exception, then the job fails and will be retried later, but the user's "e-mail sent" flag was successfully saved on the first try so the e-mail won't be resent.
Regarding idempotency, you can use https://github.com/mhenrixon/sidekiq-unique-jobs gem:
All that is required is that you specifically set the sidekiq option for unique to true, like below:
sidekiq_options unique: true
For jobs scheduled in the future it is possible to set for how long the job should be unique. The job will be unique for the number of seconds configured or until the job has been completed.
If you want the unique job to stick around even after it has been successfully processed, then just set the unique_unlock_order to anything except :before_yield or :after_yield (unique_unlock_order = :never).
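For example, a minimal worker using that option might look like the sketch below (assuming the sidekiq-unique-jobs gem is installed; the class name is made up):
class SendListingMailWorker
  include Sidekiq::Worker
  sidekiq_options unique: true # the option described in the gem documentation above

  def perform(listing_id)
    listing = Listing.find_by_id(listing_id)
    UserMailer.send_mail(listing)
  end
end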
I'm not sure I understand the second part of the question. When you delay a method call, the whole method call is deferred to the Sidekiq process. If by 'response / request cycle' you mean that you are running a web server and you call delay from there, then all the calls within use_method are made from the Sidekiq process, and hence outside of that cycle. They are called synchronously relative to each other, though...
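To illustrate that last point, a hedged sketch using the delay extension from the question (the class and method names simply mirror the question):
# in the web process: enqueues a job and returns immediately
Class.delay.use_method(listing_id)

# in the Sidekiq process, when the job is picked up:
def self.use_method(listing_id)
  listing = Listing.find_by_id(listing_id)
  UserMailer.send_mail(listing)   # runs inside the Sidekiq process
  Class.call_example_function     # also inside Sidekiq, after send_mail returns
  # Class.delay.call_example_function would enqueue it as a separate job instead
end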

Ruby Eventmachine queueing problem

I have an HTTP client written in Ruby that can make synchronous requests to URLs. However, to execute multiple requests quickly I decided to use EventMachine. The idea is to queue all the requests and execute them using EventMachine.
class EventMachineBackend
  ...
  ...
  def execute(request)
    $q ||= EM::Queue.new
    $q.push(request)
    $q.pop { |request| request.invoke }
    EM.run { EM.next_tick { EM.stop } }
  end
  ...
end
Forgive my use of a global queue variable; I will refactor it later. Is what I am doing in EventMachineBackend#execute the right way of using EventMachine queues?
One problem I see in my implementation is that it is essentially synchronous: I push a request, pop and execute it, and wait for it to complete.
Could anyone suggest a better implementation?
Your request logic has to be asynchronous for it to work with EventMachine; I suggest that you use em-http-request. You can find an example of how to use it here; it shows how to run the requests in parallel. An even better interface for running multiple connections in parallel is the MultiRequest class from the same gem.
If you want to queue requests and only run a fixed number of them in parallel you can do something like this:
require 'eventmachine'
require 'em-http-request'

EM.run do
  urls = [...] # regular array with URLs
  active_requests = 0

  # declare launch_next up front so the when_done closure can see it
  launch_next = nil

  # this routine will be used as callback and will
  # be run when each request finishes
  when_done = proc do
    active_requests -= 1
    if urls.empty? && active_requests == 0
      # if there are no more urls, and there are no active
      # requests it means we're done, so shut down the reactor
      EM.stop
    elsif !urls.empty?
      # if there are more urls launch a new request
      launch_next.call
    end
  end

  # this routine launches a request
  launch_next = proc do
    # get the next url to fetch
    url = urls.pop
    # launch the request, and register the callback
    request = EM::HttpRequest.new(url).get
    request.callback(&when_done)
    request.errback(&when_done)
    # increment the number of active requests, this
    # is important since it will tell us when all requests
    # are done
    active_requests += 1
  end

  # launch three requests in parallel, each will launch
  # a new request when done, so there will always be
  # three requests active at any one time, unless there
  # are no more urls to fetch
  3.times do
    launch_next.call
  end
end
Caveat emptor, there may very well be some detail I've missed in the code above.
If you think it's hard to follow the logic in my example, welcome to the world of evented programming. It's really tricky to write readable evented code. It all goes backwards. Sometimes it helps to start reading from the end.
I've assumed that you don't want to add more requests after you've started downloading; it doesn't look like it from the code in your question. Should you want to, you can rewrite my code to use an EM::Queue instead of a regular array and remove the part that does EM.stop, since you will not be stopping. You can probably remove the code that keeps track of the number of active requests too, since that's not relevant. The important part would look something like this:
launch_next = proc do
  urls.pop do |url|
    request = EM::HttpRequest.new(url).get
    request.callback(&launch_next)
    request.errback(&launch_next)
  end
end
Also, bear in mind that my code doesn't actually do anything with the response. The response will be passed as an argument to the when_done routine (in the first example). I also do the same thing for success and error, which you may not want to do in a real application.
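If you do want to treat success and error differently and actually use the response, a sketch along these lines should work with em-http-request (handle_response is a made-up method):
request = EM::HttpRequest.new(url).get
request.callback do
  handle_response(request.response) # the response body is available on the request object
  when_done.call
end
request.errback do
  warn "request for #{url} failed"
  when_done.call
end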

Typhoeus: How to send and run a request multiple times but stop on the first successful response?

I am using the Typhoeus gem. The official documentation refers to Memoization:
Memoization: Hydra memoizes requests within a single run call. You
can also disable memoization.
hydra = Typhoeus::Hydra.new
2.times do
  r = Typhoeus::Request.new("http://localhost:3000/users/1")
  hydra.queue r
end
hydra.run # this will result in a single request being issued. However, the on_complete handlers of both will be called.
hydra.disable_memoization
2.times do
  r = Typhoeus::Request.new("http://localhost:3000/users/1")
  hydra.queue r
end
hydra.run # this will result in two requests.
How do I write code that sends and runs a request multiple times but stops on the first successful response? Also, I would like to skip the current request if it has timed out.
Take a look at Typhoeus's times.rb example.
Don't submit multiple requests to a URL to the Hydra queue, only do one per URL.
Inside the on_complete block you have access to the response object. The response object has a timed_out? method which checks to see if the request timed out. If it did, resubmit your request to Hydra then exit the block, otherwise process the content as normal.
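Putting that together, a rough sketch of the pattern might look like this; the retry limit and process_body are placeholders, not part of Typhoeus:
hydra = Typhoeus::Hydra.new
hydra.disable_memoization # so a retry is not served from a memoized response

def queue_request(hydra, url, attempts = 0)
  request = Typhoeus::Request.new(url)
  request.on_complete do |response|
    if response.timed_out? && attempts < 3
      queue_request(hydra, url, attempts + 1) # resubmit, then exit the block
    elsif response.success?
      process_body(response.body) # handle the first successful response
    end
  end
  hydra.queue(request)
end

queue_request(hydra, "http://localhost:3000/users/1")
hydra.run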
