EventMachine - how can you tell if you're falling behind? - ruby

I'm looking into using the EventMachine powered twitter-stream rubygem to track and capture tweets. I'm kind of new to the whole evented programming thing. How can I tell if whatever processing I'm doing in my event loop is causing me to fall behind? Is there an easy way to check?

You can measure the reactor's latency with a periodic timer that prints the time elapsed since it last fired. With a 1-second timer you should see about 1 second elapsed; anything beyond that tells you how much your processing is slowing down the reactor.
@last = Time.now.to_f
EM.add_periodic_timer(1) do
  puts "LATENCY: #{Time.now.to_f - @last}"
  @last = Time.now.to_f
end
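If you would rather track the lag than just print it, the same idea can be wrapped in a small helper. This is only a sketch: LagTracker is a made-up name, not part of EventMachine, and the 0.1 second warning threshold in the usage comment is an arbitrary assumption.

```ruby
# Hypothetical helper (not part of EventMachine): records how late a
# periodic timer fires compared to its nominal interval.
class LagTracker
  attr_reader :worst_lag

  def initialize(interval, now = Time.now.to_f)
    @interval = interval
    @last = now
    @worst_lag = 0.0
  end

  # Call once per timer tick. Returns the lag in seconds (0.0 when on time).
  def tick(now = Time.now.to_f)
    lag = (now - @last) - @interval
    @last = now
    lag = 0.0 if lag < 0
    @worst_lag = lag if lag > @worst_lag
    lag
  end
end

# Possible use inside a running reactor (needs the eventmachine gem):
#   tracker = LagTracker.new(1)
#   EM.add_periodic_timer(1) do
#     lag = tracker.tick
#     warn "reactor is #{(lag * 1000).round} ms late" if lag > 0.1 # assumed threshold
#   end
```

The timestamps are passed in as arguments so the helper can be checked without a reactor running.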

EventMachine::Queue has a size method that lets you peek at the current queue and get an idea of how big it is.
You could use add_periodic_timer() and, in that timer, get the size of the queue and print it.
If the number stays roughly constant you are keeping up. If it keeps going up you are falling behind.
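One way to turn those printed sizes into a verdict is to compare the newest sample with the oldest one in a short window. The sketch below is an assumption about how you might do that; BacklogTrend and its window size are invented for illustration, and `queue` in the usage comment stands for an EM::Queue you already have.

```ruby
# Hypothetical helper: classifies a series of queue-size samples.
class BacklogTrend
  def initialize(window = 5)
    @window = window
    @samples = []
  end

  # Stores a sample, keeping only the most recent `window` samples.
  def record(size)
    @samples << size
    @samples.shift if @samples.length > @window
    self
  end

  # :falling_behind when the newest sample exceeds the oldest in the window,
  # :catching_up when it is lower, :keeping_up when they are equal.
  def status
    return :unknown if @samples.length < 2
    delta = @samples.last - @samples.first
    if delta > 0
      :falling_behind
    elsif delta < 0
      :catching_up
    else
      :keeping_up
    end
  end
end

# Possible use inside a reactor, with an existing EM::Queue named queue:
#   trend = BacklogTrend.new
#   EM.add_periodic_timer(1) do
#     puts "#{queue.size} queued: #{trend.record(queue.size).status}"
#   end
```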

Related

How do I use Ruby to do a certain number of actions per second?

I want to test a rate-limiting app with Ruby where I define different behavior based on the number of requests per second.
For example, if I see 300 requests per second or more, I want it to respond with a block.
But how would I test this by generating 300 requests per second in Ruby? I understand there are hard limitations based on CPU for example, but if I kept the number well below that limitation, how would I still send something that both exceeds the threshold and stays below?
Just looping N-times doesn't guarantee me the throughput.
The quick and dirty way is to spin up 300 threads that each do one request per second. The more elegant way is to use something like EventMachine to create requests at the required rate. With the right non-blocking HTTP library it can easily generate that level of activity.
You also might try these tools:
ab, the Apache benchmarking tool, common on many systems. It's very good at abusing your system.
Siege for load testing.
How about a minimal homebrew solution:
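OPS_PER_SECOND = 300
count = 0
duration = 10 # seconds to run
start = Time.now
while true
  elapsed = Time.now - start
  break if elapsed >= duration
  # Request number `count` is due count / OPS_PER_SECOND seconds after start.
  delay = count.to_f / OPS_PER_SECOND - elapsed
  sleep(delay) if delay > 0
  do_request # placeholder for whatever issues one request
  count += 1
end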

How to know that I do not block Ruby EventMachine with a MongoDB operation

I am working on a eventmachine based application that periodically polls for changes of MongoDB stored documents.
A simplified code snippet could look like:
require 'rubygems'
require 'eventmachine'
require 'em-mongo'
require 'bson'
EM.run {
  @db = EM::Mongo::Connection.new('localhost').db('foo_development')
  @posts = @db.collection('posts')
  @comments = @db.collection('comments')
  def handle_changed_posts
    EM.next_tick do
      cursor = @posts.find(state: 'changed')
      resp = cursor.defer_as_a
      resp.callback do |documents|
        handle_comments documents.map{|h| h["comment_id"]}.map(&:to_s) unless documents.length == 0
      end
      resp.errback do |err|
        raise *err
      end
    end
  end
  def handle_comments comment_ids
    comment_ids.each do |id|
      cursor = @comments.find({_id: BSON::ObjectId(id)})
      resp = cursor.defer_as_a
      resp.callback do |documents|
        magic_value = documents.first['weight'].to_i * documents.first['importance'].to_i
      end
      resp.errback do |err|
        raise *err
      end
    end
  end
  EM.add_periodic_timer(1) do
    puts "alive: #{Time.now.to_i}"
  end
  EM.add_periodic_timer(5) do
    handle_changed_posts
  end
}
So every 5 seconds EM iterates over all posts, and selects the changed ones. For each changed post it stores the comment_id in an array. When done that array is passed to a handle_comments which loads every comment and does some calculation.
Now I have some difficulties in understanding:
1. I know that this load_posts->load_comments->calculate cycle takes 3 seconds in a Rails console with 20000 posts, so it will not be much faster in EM. I schedule the handle_changed_posts method every 5 seconds, which is fine unless the number of posts rises and the calculation takes longer than the 5 seconds after which the same run is scheduled again. In that case I'd have a problem soon. How can I avoid that?
2. I trust em-mongo but I do not trust my EM knowledge. To check that EM is still running I puts a timestamp every second. This seems to be working fine but gets a bit bumpy every 5 seconds when my calculation runs. Is that a sign that I block the loop?
3. Is there any general way to find out if I block the loop?
4. Should I nice my eventmachine process with -19 to always give it top OS priority?
I have been reluctant to answer here since I've got no mongo experience so far, but considering no one is answering and some of the stuff here is general EM stuff I may be able to help:
1. Schedule the next scan when the first scan ends (resp.callback and resp.errback in handle_changed_posts seem like good candidates for chaining the next scan), either with add_timer or with next_tick.
2. Probably. Try making your mongo trips more often so that each one handles a smaller chunk of data. Any CPU cycle hog inside your reactor would make the reactor loop too busy to accept events such as periodic timer ticks.
3. No simple way, no. One idea would be to measure the difference between Time.now and next_tick{Time.now}, benchmark it, and then trace possible culprits when the difference crosses a threshold. Simulating slow queries (see "Simulate slow query in mongodb?") and many parallel connections is a good idea.
4. I honestly don't know. I've never encountered people who do that, and I expect it depends on what else is running on that server.
To expand upon bbozo's answer, specifically in relation to your second question, there is no time when you run code that you do not block the loop. In my experience, when we talk about 'non-blocking' code what we really mean is 'code that doesn't block very long'. Typically, these are very short periods of time (less than a millisecond), but they still block while executing.
Further, the only thing next_tick really does is to say 'do this, but not right now'. What you really want to do, as bbozo mentioned, is split up your processing over multiple ticks such that each iteration blocks for as little time as possible.
To use your own benchmarks, if 20,000 records take about 3 seconds to process, 4,000 records should take about 0.6 seconds. This would be short enough to not usually affect your 1 second heartbeat. You could split it up even further to reduce the amount of blockage and make the reactor run smoother, but it really depends on how much concurrency you need from the reactor.
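A rough sketch of splitting work over multiple ticks follows. process_in_chunks is a made-up helper, not an EventMachine API. It takes the scheduler as an argument, on the assumption that EM.method(:next_tick) would be passed in a real reactor, so the slicing logic can be tried without one.

```ruby
# Hypothetical helper: processes items one slice at a time and hands control
# back to the scheduler between slices.
# `schedule` is any callable that accepts a block and runs it later; inside
# EventMachine that could be EM.method(:next_tick).
def process_in_chunks(items, chunk_size, schedule, on_done = nil, &work)
  slices = items.each_slice(chunk_size).to_a
  step = lambda do
    slice = slices.shift
    if slice.nil?
      on_done.call if on_done
    else
      slice.each(&work)
      schedule.call(&step) # queue the next slice instead of looping here
    end
  end
  schedule.call(&step)
end

# Possible use inside a reactor (4000 is the chunk size discussed above;
# documents and handle_document are assumed names):
#   process_in_chunks(documents, 4000, EM.method(:next_tick)) do |doc|
#     handle_document(doc)
#   end
```

Between slices the reactor gets a chance to fire timers and handle I/O, so each block on the loop lasts only as long as one slice takes.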

Is there a way to call a block every microsecond using celluloid?

I'm using Celluloid's every method to execute a block every microsecond; however, it seems to always call the block every second, even when I specify a decimal.
interval = 1.0 / 2.0
every interval do
  puts "*"*80
  puts "Time: #{Time.now}"
  puts "*"*80
end
I would expect this to be called every 0.5 seconds. But it is called every one second.
Any suggestions?
You can get fractional second resolution with Celluloid.
Celluloid uses the Timers gem to manage every, which does good floating-point time math, and Ruby's sleep, which has reasonable sub-second resolution.
The following code works perfectly:
class Bob
  include Celluloid
  def fred
    every 0.5 do
      puts Time.now.strftime "%M:%S.%N"
    end
  end
end
Bob.new.fred
And it produces the following output:
22:51.299923000
22:51.801311000
22:52.302229000
22:52.803512000
22:53.304800000
22:53.805759000
22:54.307003000
22:54.808279000
22:55.309358000
22:55.810017000
As you can see, it is not perfect, but close enough for most purposes.
If you are seeing different results, it is likely because of how long your code takes in the block you have given to every or other timers running and starving that particular one. I would approach it by simplifying the situation as much as possible and slowly adding parts back in to determine where the slowdown is occurring.
As for microsecond resolution, I don't think you can hope to get that far down reliably with any non-trivial code.
The trivial example:
def bob
  puts Time.now.strftime "%M:%S.%N"
  sleep 1.0e-6
  puts Time.now.strftime "%M:%S.%N"
end
Produces:
31:07.373858000
31:07.373936000
31:08.430110000
31:08.430183000
31:09.062000000
31:09.062079000
31:09.638078000
31:09.638156000
So as you can see, even just a base ruby version on my machine running nothing but a simple IO line doesn't reliably give me microsecond speeds.
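If you want to check this on your own machine, something like the following sketch measures how long sleep really takes. measure_sleep is a made-up helper; it assumes Ruby 2.1 or later for Process.clock_gettime, and the numbers will vary by OS and load.

```ruby
# Hypothetical helper: returns the real duration, in seconds, of several
# sleep calls for a requested duration.
def measure_sleep(requested, samples = 5)
  Array.new(samples) do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    sleep requested
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end
end

# Possible use:
#   measure_sleep(1.0e-6).each { |t| puts format("%.1f microseconds", t * 1e6) }
```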

ruby eventmachine timer interval too big error

I use the Timer class of the Ruby eventmachine library as follows:
EM::Timer.new(interval) do
# do something
end
If I set the interval value too big (bigger than the max Integer value) such as '5183877.350508', it will raise an error:
integer 5183883250 too big to convert to `int'
Is this a limit of Timer interval of eventmachine or a bug?
What should I do if I have to set a bigger timer interval (such as several months or years)?
What you should do depends on your use case.
While I think relying on your process to continue running uninterrupted for several months on end is optimistic, it could conceivably happen. However unless this is a watchdog timer (eg. your server should definitely die after 4 months for some reason) I think you most likely want a scheduler instead.
I believe Rufus Scheduler integrates reasonably well with EventMachine.
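If you do want to stay with plain timers, one workaround is to chain several shorter timers. The sketch below rests on an assumption drawn from the error message: that the interval is converted to milliseconds and has to fit in a signed 32-bit int, which would cap a single timer at roughly 2,147,483 seconds. split_delay, long_timer and MAX_TIMER_SECONDS are invented names, not EventMachine APIs.

```ruby
# Assumed safe upper bound for one timer, in seconds. Chosen to stay below
# 2**31 - 1 milliseconds (about 2,147,483 seconds); this limit is inferred
# from the error message, not taken from EventMachine documentation.
MAX_TIMER_SECONDS = 2_000_000.0

# Splits a long delay into a list of shorter delays that add up to it.
def split_delay(total_seconds, max_chunk = MAX_TIMER_SECONDS)
  chunks = []
  remaining = total_seconds.to_f
  while remaining > max_chunk
    chunks << max_chunk
    remaining -= max_chunk
  end
  chunks << remaining
  chunks
end

# Hypothetical chaining inside a reactor (needs the eventmachine gem):
#   def long_timer(delays, &blk)
#     EM::Timer.new(delays.first) do
#       rest = delays.drop(1)
#       rest.empty? ? blk.call : long_timer(rest, &blk)
#     end
#   end
#   long_timer(split_delay(5183877.350508)) { puts "fired" }
```

This only helps if the process really stays up for the whole period, so the caveat above about preferring a scheduler still applies.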

Ruby: connection timeout detection for a TCPServer

I've been trying to understand how to implement timeout detection in a Ruby TCP server of mine, mainly because clients with unstable internet sometimes lose their connection and I need my server to detect it.
The idea is to teach my server to detect when a connection has been silent for longer than 30 seconds and abort it. I've been trying to use timeout, but it terminates the program, so I need something like a simple timer that just returns the number of seconds passed since the timer was started.
Is there an already made solution for that? Sorry if it is a stupid question, it's just that googling it led me nowhere.
ps: using ruby 1.8 here.
The 'Time' object can report the number of seconds passed by comparing it to previously created instances. Consider:
require 'time'
t0 = Time.now
sleep(2)
t1 = Time.now
t1.to_f - t0.to_f # => 2.00059294700623
So by creating a "last transmission" time object then checking its difference from "now" you can determine the number of seconds passed and act accordingly.
This might help: http://en.wikibooks.org/wiki/Ruby_Programming/Reference/Objects/Socket#Keeping_a_connection_alive_over_time_when_there_is_no_traffic_being_sent
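As an alternative to comparing timestamps by hand, IO.select can wait for data with a time limit, which gives you the "silent for too long" check directly. This is a sketch of that different technique, not the approach from the answer above. read_or_timeout is a made-up helper, and the port number and 4096-byte read size are arbitrary.

```ruby
require 'socket'

# Hypothetical helper: waits up to idle_limit seconds for data on io.
# Returns the data read, :eof if the peer closed the connection,
# or :timeout if nothing arrived in time.
def read_or_timeout(io, idle_limit)
  ready = IO.select([io], nil, nil, idle_limit)
  return :timeout if ready.nil?
  io.readpartial(4096)
rescue EOFError
  :eof
end

# Possible use in a threaded server (port 4000 is an arbitrary example):
#   server = TCPServer.new(4000)
#   loop do
#     Thread.new(server.accept) do |conn|
#       while (data = read_or_timeout(conn, 30)).is_a?(String)
#         # handle data here
#       end
#       conn.close # silent for 30 seconds, or the client went away
#     end
#   end
```

Note that a client whose connection drops without closing cleanly simply looks silent, so it is the timeout branch, not :eof, that catches that case.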
