How to test deferred action - EventMachine - ruby

I have a Sinatra app that runs inside of EventMachine. Currently, I am taking a post request of JSON data, deferring storage, and returning a 200 OK status code. The deferred task simply pushes the data to a queue and increments a stats counter. The code is similar to:
class App < Sinatra::Base
  ...
  post '/' do
    json = request.body.read
    operation = lambda do
      push_to_queue(json)
      incr_incoming_stats
    end
    callback = lambda {}
    EM.defer(operation, callback)
  end
  ...
end
My question is: how do I test this functionality? If I use Rack::Test::Methods, I have to add something like sleep 1 to make sure the deferred task has completed before checking the queue and stats, so my test ends up looking like:
it 'should push data to queue with valid request' do
  post('/', @json)
  sleep 1
  @redis.llen("#{@opts[:redis_prefix]}-queue").should > 0
end
Any help is appreciated!

The solution was pretty simple and once I realized it, I felt kind of silly. I created a test-helper that contained the following:
module EM
  def self.defer(op, callback)
    callback.call(op.call)
  end
end
Then just require this helper in your test files. This way the defer method simply runs the operation and the callback on the calling thread.
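A slightly more defensive variant of that helper can be sketched as follows. It assumes the stub is only loaded in the test environment, and it also tolerates a missing callback or a block in place of the operation lambda, both of which EM.defer allows. This is only a sketch, not part of EventMachine itself:

```ruby
# Test-only stub: replaces EM.defer with a synchronous version so that
# specs can assert on side effects immediately, without sleep.
module EM
  def self.defer(op = nil, callback = nil, &blk)
    work = op || blk                    # accept either a callable or a block
    result = work.call                  # runs on the calling thread
    callback.call(result) if callback   # the callback is optional
  end
end

# Usage: by the time defer returns, the operation has already run.
queue = []
EM.defer(lambda { queue << 'payload'; queue.size },
         lambda { |size| queue << "callback saw #{size}" })
```

Because the real EM.defer hands the work to a thread pool, specs that rely on this stub no longer exercise any threading behavior; they only check the operation and callback themselves.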

Related

Tornado cancel httpclient.AsyncHTTPClient fetch() from on_chunk()

Inside one of the handlers I am doing the following:
async def get(self):
    client = httpclient.AsyncHTTPClient()
    url = 'some url here'
    request = httpclient.HTTPRequest(url=url, streaming_callback=self.on_chunk, request_timeout=120)
    result = await client.fetch(request)
    self.write("done")
@gen.coroutine
def on_chunk(self, chunk):
    self.write(chunk)
    yield self.flush()
The responses can sometimes be quite large, and the client may leave while the response is still being fetched and pumped to it. If this happens, an exception is raised in the on_chunk function when self.write() is attempted. My question is: how do I abort the remaining download if my client has gone away?
If your streaming_callback raises an exception, the client request should be aborted. This will spam the logs with stack traces, but there's not currently a cleaner way to do it. You can override on_connection_close to detect when the client has disconnected and set an attribute on self that you can check in on_chunk.

How to push messages from unacked to ready

My question is similar to one asked previously that never received an answer. I have a consumer that processes each message by calling a web service. If the web service does not respond for some reason, I want the consumer not to consume the RabbitMQ message but to requeue it so that it can be processed later. My consumer is the following:
require File.expand_path('../config/environment.rb', __FILE__)
conn = Rabbit.connect
conn.start
ch = conn.create_channel
x = ch.exchange("d_notification_ex", :type => "x-delayed-message", :arguments => { "x-delayed-type" => "direct" })
q = ch.queue("d_notification_q", :durable => true)
q.bind(x)
p 'Wait ....'
q.subscribe(:manual_ack => true, :block => true) do |delivery_info, properties, body|
  datos = JSON.parse(body)
  if datos['status'] == 'request'
    # Call the web service with the re-serialized JSON
    result = Notification.send_payment_notification(datos.to_json)
  else
    # Call the web service with the raw body
    result = Notification.send_payment_notification(body)
  end
  # If the web server is down, result is nil and no ack is sent, so RabbitMQ
  # leaves the message in the UNACKED state and it is not processed later.
  # I want the message to stay in the queue and be evaluated again afterwards.
  unless result.nil?
    ch.ack(delivery_info.delivery_tag)
  end
end
Is there some way, at the statement ch.ack(delivery_info.delivery_tag), to keep the message in the queue so that it can be processed later instead of deleting it? Any ideas? Thanks
The RabbitMQ team monitors this mailing list and only sometimes answers questions on StackOverflow.
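Try this:
if result.nil?
  # Third argument is requeue; Bunny defaults it to false, which would discard the message
  ch.nack(delivery_info.delivery_tag, false, true)
else
  ch.ack(delivery_info.delivery_tag)
end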
I decided to send the data back to the queue using a "producer within the consumer" approach. My code now looks like this:
if result.eql? 'ok'
  ch.ack(delivery_info.delivery_tag)
else
  if (datos['count'] < 5)
    datos['count'] += 1
    d_time = 1000
    x.publish(datos.to_json, :persistent => true, :headers => {"x-delay" => d_time})
  end
end
However, I had to add one more attribute to the JSON, count, so that a message does not stay in an infinite cycle.
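The retry bookkeeping described above can be isolated from the broker code and tested on its own. The following is a minimal sketch under stated assumptions: next_attempt and MAX_ATTEMPTS are hypothetical names, the limit of 5 mirrors the snippet above, and a message that arrives without a count is treated as a first delivery.

```ruby
require 'json'

MAX_ATTEMPTS = 5 # assumption: same limit as in the snippet above

# Hypothetical helper: returns the JSON body to republish, or nil when the
# message has used up its attempts and should not be retried again.
def next_attempt(body, max_attempts = MAX_ATTEMPTS)
  datos = JSON.parse(body)
  count = datos.fetch('count', 0) # a first delivery may not carry a count yet
  return nil if count >= max_attempts

  datos.merge('count' => count + 1).to_json
end
```

In the consumer, the returned body would be passed to x.publish with the x-delay header when it is not nil. The original delivery presumably still needs a ch.ack (or a nack without requeue) after that, since otherwise it remains in the unacked state until the channel closes.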

Why does using asyncio.ensure_future for long jobs instead of await run so much quicker?

I am downloading JSON documents from an API using the asyncio module. The crux of my question concerns the following event loop setup:
loop = asyncio.get_event_loop()
main_task = asyncio.ensure_future( klass.download_all() )
loop.run_until_complete( main_task )
and download_all(), implemented as an instance method of a class that already has downloader objects created and available to it, and which calls each downloader's download method:
async def download_all(self):
    """ Builds the coroutines, uses asyncio.wait, then sifts for those still pending, loops """
    ret = []
    async with aiohttp.ClientSession() as session:
        pending = []
        for downloader in self._downloaders:
            pending.append( asyncio.ensure_future( downloader.download(session) ) )
        while pending:
            dne, pnding= await asyncio.wait(pending)
            ret.extend( [d.result() for d in dne] )
            # Get all the tasks, cannot use "pnding"
            tasks = asyncio.Task.all_tasks()
            pending = [tks for tks in tasks if not tks.done()]
            # Exclude the one that we know hasn't ended yet (UGLY)
            pending = [t for t in pending if not t._coro.__name__ == self.download_all.__name__]
    return ret
Why is it that, in the downloaders' download methods, using asyncio.ensure_future instead of the await syntax runs much faster, that is, seemingly more "asynchronously", as I can see from the logs?
This works because of the way I detect all the tasks that are still pending: download_all is not allowed to complete and keeps calling asyncio.wait.
I thought that the await keyword allowed the event loop mechanism to do its thing and share resources efficiently. How come doing it this way is faster? Is there something wrong with it? For example:
async def download(self, session):
    async with session.request(self.method, self.url, params=self.params) as response:
        response_json = await response.json()
        # Not using await here, as I am "supposed" to
        asyncio.ensure_future( self.write(response_json, self.path) )
        return response_json
async def write(self, res_json, path):
    # using aiofiles to write, but it doesn't (seem to?) support direct json
    # so converting to raw text first
    txt_contents = json.dumps(res_json, **self.json_dumps_kwargs)
    async with aiofiles.open(path, 'w') as f:
        await f.write(txt_contents)
With full code implemented and a real API, I was able to download 44 resources in 34 seconds, but when using await it took more than three minutes (I actually gave up as it was taking so long).
When you await in each iteration of the for loop, the loop waits for that download to finish before moving on to the next iteration.
ensure_future, on the other hand, does not wait: it creates a task for every download up front, and all of them are then awaited together in the second loop.

Is it reasonable to use resque(ruby) to manage external long-running commands (and log tasks)

I frequently have to run bash heavy-job.sh <data-num> (which takes 0.5~2 days) on my computer to process data located at ~/a/data/num. The script calls a few sub-processes sequentially and writes a log to ~/a/result/num.log. Until now I have done this manually.
I wanted to visualize the processed tasks and their status (success or failure), etc. as an HTML table. I wrote a simple Sinatra app to render a table that shows:
the list of ~/a/data/num to be processed
whether ~/a/result/num.log exists (process not launched/processing/done)
its status (whether the log file contains the word "error")
It would be convenient if I could launch bash heavy-job.sh <data-num> from the Sinatra app, log each task (with info such as time and date) and its arguments (heavy-job.sh takes some optional arguments), and show them in the HTML table.
So I need something that manages jobs and logs them to files (or a database).
First I wrote the code below as a test (only a test, not yet integrated with my system), but later I found that Resque is what I wanted. I am a beginner and not sure whether my decision is reasonable.
My questions are:
Is it reasonable to use Resque to manage external long-running commands (and log tasks)?
Or should I use another tool (not necessarily a Ruby tool)?
(Extra) Should the task manager and the Sinatra app run separately (and communicate with each other over REST or something), or not?
The jobs are not critical since I can retry tasks manually later if failed.
I am not good at English and my question may be misleading. I appreciate any help :) .
class TaskSpawn
  def initialize()
    @pids = []
  end
  def spawn(command, options = {})
    #opt = {:pgroup => true}
    @pids << Kernel.spawn(command, options)
  end
  def pids()
    return @pids.clone
  end
  def waitany_nohang()
    delete_idx = nil
    ret = nil
    @pids.each_with_index do |p, idx|
      # With WNOHANG, waitpid2 returns nil if the child is still running
      pid, status = Process.waitpid2(p, Process::WNOHANG)
      unless pid.nil?
        delete_idx = idx
        ret = [pid, status]
        break
      end
    end
    if delete_idx
      @pids.delete_at(delete_idx)
      return ret
    else
      # no task finished
      return nil
    end
  end
  def waitall()
    # Process.waitall blocks until every child process has exited
    ret = Process.waitall
    raise "internal error" if ret.size != pids.size
    return ret
  end
end
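The requirement described in the question, launching an external command, capturing its output in a log file, and recording success or failure, can be sketched with the standard library alone. run_logged is a hypothetical helper name and not part of Resque or of the class above; the sketch blocks until the command exits, so for a job that runs for days it would be called from a background worker rather than from a request handler.

```ruby
# Hypothetical helper: runs a command, appends its stdout and stderr to
# log_path, and returns true when the command exits with status 0.
def run_logged(log_path, *command)
  File.open(log_path, 'a') do |log|
    # Redirect both output streams of the child process into the log file
    pid = Process.spawn(*command, out: log, err: log)
    _, status = Process.wait2(pid) # blocks until the child has exited
    status.success?
  end
end

# Example call (paths and arguments are placeholders):
#   ok = run_logged(File.expand_path('~/a/result/1.log'), 'bash', 'heavy-job.sh', '1')
```

The same helper could be called from whichever job runner ends up being used, for instance inside the perform method of a Resque job, with the returned boolean stored next to the task's arguments and start time.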

Is it possible to get the headers of a request using Ruby's HTTPClient gem before the request completes?

I have a requirement to proxy a request in a Rails app. I was hoping I could proxy it with chunking (so, 1 chunk received, one chunk is sent). The app is working fine without chunking (load the request into memory, and transmit).
Here is my code to proxy the chunks through to the end-client:
self.response.headers['Last-Modified'] = Time.now.ctime.to_s
self.response_body = Enumerator.new do |y|
  client = HTTPClient.new
  http_response = client.get(proxy_url, nil, headers) do |chunk|
    y << chunk
  end
end
The problem is, I can't inspect "http_response" until all the chunks have been received, thus I can't set the headers based on the headers of the client.
What I'm trying to do is transmit the headers returned from the client before the first chunk is sent. Is this possible?
If not, is this pattern possible in any other Ruby HTTP client gem?
Update
I have a solution for you.
If you call get_async instead, it will return immediately with an HTTPClient::Connection object that is updated with the header information as soon as it is received. The code sample at the end of this answer demonstrates this.
The patch to HTTPClient::Connection is almost certainly not necessary for you, but it lets you write things like conn.queue.size? and conn.queue.empty?.
conn.pop blocks until the response (or exception) has been pushed to the queue by the async thread and then returns the normal HTTP::Message object. (Note that, if you are using the monkey patch, you can use conn.queue.empty? to see if pop is going to block.)
resp.content returns an IO object which is the read end of a pipe, and it can be called as soon as pop has returned. The other end is written by the async thread as the data arrives, and you can read the entire content in one go or in chunks of whatever size you like using read.
require 'httpclient'
class HTTPClient::Connection
  attr_reader :queue
end
client = HTTPClient.new
conn = client.get_async 'http://en.wikipedia.org/wiki/Ruby_(programming_language)'
resp = conn.pop
resp.header.all.each { |name, val| puts "#{name}=#{val}" }
puts
pipe = resp.content
while chunk = pipe.read(8192)
  print chunk
end
You could parse the first chunk you receive to extract the headers, but I suggest you call head first to get the header information. Then do the get as well.
(Updated - the first chunk holds the beginning of the content so this won't work.)
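On the question of whether this pattern is possible with another Ruby HTTP client: the standard library's Net::HTTP yields the response object to a block before the body has been read, so the headers can be inspected first. A minimal sketch, assuming a hypothetical helper name stream_with_headers and caller-supplied lambdas; it is not taken from the answers above:

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: calls on_headers once with the status code and the
# response headers, then calls on_chunk for each body fragment as it arrives.
def stream_with_headers(url, on_headers:, on_chunk:)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request_get(uri.request_uri) do |response|
      # The status line and headers are parsed here; the body is still unread.
      on_headers.call(response.code, response.to_hash)
      response.read_body { |chunk| on_chunk.call(chunk) }
    end
  end
end
```

In the Rails action from the question, on_headers would presumably copy the upstream headers onto the response before the Enumerator starts yielding, and on_chunk would do y << chunk. Whether the headers can still be changed at that point depends on when the framework commits the response, which this sketch does not address.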
