How do I figure out where my script is sleeping? - ruby

One of my unit tests is extremely slow, taking more than a full second for each test. Profiling that test file, the top few lines read:
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
198.71    12.28     12.28        2  6140.00  6140.00  Mutex#sleep
 14.40    13.17      0.89     9658     0.09     1.09  REXML::Element#namespace
 10.68    13.83      0.66    18814     0.04     0.08  REXML::Element#root
I can't figure out where the sleep call comes from though! My whole application only sleeps in one place, in a class called Throttle, and I've inserted a breakpoint in front of it which doesn't trigger during this test. I have tried:
class Mutex
  def sleep(time)
    require "byebug"; byebug
  end
end
Which never breaks, and I've tried:
def setup
  Mutex.define_singleton_method(:sleep) { |time|
    require "byebug"; byebug
  }
end
Which also does absolutely nothing. I have also tried both of these with Kernel instead of Mutex. I have read through my code countless times and I can't for the life of me figure out why my application is constantly sleeping! Can anybody give me any pointers?

I figured it out. It turns out Mutex#sleep was invoked by Net::HTTP#post. My unit test bootstrap script was supposed to override all functions that write to disk or make network requests, but I had forgotten to require the files defining those functions before overriding them. As a result, my overrides were themselves overwritten by the real functions when those were loaded later, and my unit tests made real requests to online APIs.
I only discovered this by chance as I was stepping through the code in byebug.
As soon as I fixed the bootstrap script, my unit tests ran as quickly as ever.
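One way to catch this kind of slip earlier is to make any real connection fail loudly during tests, so that the backtrace points at the caller. The following is only a minimal sketch, assuming the test suite should never open a real connection; the module name NoRealHTTP and the error class RealHTTPAttempted are made up for illustration and are not part of the original bootstrap script:

```ruby
require 'net/http'

# Hypothetical error raised whenever test code tries to open a real connection.
class RealHTTPAttempted < StandardError; end

# Hypothetical guard module: prepended to Net::HTTP so that #start, which
# Net::HTTP calls before opening a connection, raises instead of connecting.
module NoRealHTTP
  def start(*)
    raise RealHTTPAttempted, "real HTTP connection attempted to #{address}:#{port}"
  end
end

Net::HTTP.prepend(NoRealHTTP)

begin
  Net::HTTP.get(URI('http://example.invalid/'))
rescue RealHTTPAttempted => e
  puts e.message
  puts e.backtrace.first(5) # shows which code tried to reach the network
end
```

Because the guard is prepended rather than defined by reopening the class, it still takes effect even if the real library is loaded in a different order than expected.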

Related

Testing that a file gets created within a certain amount of time in rspec

I have this ruby project where I spawn a process. This process creates a log file when it starts.
I'm having a hard time testing this. Currently I start the process, sleep for a short amount of time (0.1 seconds) and check if the file was created.
The rspec test looks something like this:
describe 'the process' do
  it 'should create a log file' do
    start_the_process
    sleep 0.1
    expect(File.exist?('log-file.log')).to be(true)
  end
end
This works well on my machine, but this test is flaky. When it runs on CI, it fails because the process didn't have enough time to create the file.
I could increase the time. This would fix the problem on CI, but it would make the test slower.
What I really want to test is that the file gets created within the next 3 seconds. If I sleep for 3 seconds, the test would take way too much time to run. One approach would be to check that the file exists in a loop and fail if we don't see the file within the next 3 seconds. There doesn't seem to be a clean way of doing this with rspec.
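The polling loop described above can be sketched in plain Ruby. This is a minimal sketch rather than an RSpec feature: wait_until is a hypothetical helper name, and the thread below only stands in for the spawned process.

```ruby
require 'tmpdir'

# Hypothetical helper: polls the block until it returns a truthy value or
# the timeout expires. Returns true on success, false on timeout.
def wait_until(timeout: 3.0, interval: 0.05)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return true if yield
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end

# Stand-in for the spawned process: creates the file after a short delay.
Dir.mktmpdir do |dir|
  path = File.join(dir, 'log-file.log')
  writer = Thread.new { sleep 0.2; File.write(path, '') }

  found = wait_until(timeout: 3.0) { File.exist?(path) }
  puts "file seen before the deadline: #{found}"
  writer.join
end
```

In a spec this could be used as expect(wait_until(timeout: 3) { File.exist?('log-file.log') }).to be(true), which returns as soon as the file appears and only takes the full 3 seconds when the test is about to fail.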
I think perhaps you're coming at it from too high a level. You should be testing the logging class rather than the process. For example:
class Logger
  def initialize
    # create the log file
  end
end
Then in your spec:
describe Logger do
  let(:log_file_path) { 'log-file.log' }
  subject { described_class.new }
  it 'creates a log file on new' do
    subject
    expect(File.exist?(log_file_path)).to be(true)
  end
end
Then you can trust that the log file is created when your process instantiates a new Logger object.
How about this then.
describe 'the process' do
  # assumes the CI system sets the CI environment variable
  let(:sleep_time) { ENV['CI'] ? 3.0 : 0.1 }
  it 'should create a log file' do
    start_the_process
    sleep sleep_time
    expect(File.exist?('log-file.log')).to be(true)
  end
end
Normally when testing time-related code, we stub the clock to avoid problems like yours. You could do something like:
require 'time'

it "test" do
  time = Time.parse("Jan 01 2015")
  allow(Time).to receive(:now).and_return(time)
  # your test: from now on, Time.now
  # always returns 'Jan 01 2015'
end
Updated answer
I think you could use the same technique and override the sleep method in your spec so that it returns 0. That way, there will be no wait.

How to know that I do not block Ruby EventMachine with a MongoDB operation

I am working on an EventMachine-based application that periodically polls for changes to documents stored in MongoDB.
A simplified code snippet could look like:
require 'rubygems'
require 'eventmachine'
require 'em-mongo'
require 'bson'

EM.run {
  @db = EM::Mongo::Connection.new('localhost').db('foo_development')
  @posts = @db.collection('posts')
  @comments = @db.collection('comments')

  def handle_changed_posts
    EM.next_tick do
      cursor = @posts.find(state: 'changed')
      resp = cursor.defer_as_a
      resp.callback do |documents|
        handle_comments documents.map{|h| h["comment_id"]}.map(&:to_s) unless documents.length == 0
      end
      resp.errback do |err|
        raise *err
      end
    end
  end

  def handle_comments comment_ids
    comment_ids.each do |id|
      cursor = @comments.find({_id: BSON::ObjectId(id)})
      resp = cursor.defer_as_a
      resp.callback do |documents|
        magic_value = documents.first['weight'].to_i * documents.first['importance'].to_i
      end
      resp.errback do |err|
        raise *err
      end
    end
  end

  EM.add_periodic_timer(1) do
    puts "alive: #{Time.now.to_i}"
  end

  EM.add_periodic_timer(5) do
    handle_changed_posts
  end
}
So every 5 seconds EM iterates over all posts and selects the changed ones. For each changed post it stores the comment_id in an array. When done, that array is passed to handle_comments, which loads every comment and does some calculation.
Now I have some difficulties in understanding:
1. I know that this load_posts->load_comments->calculate cycle takes 3 seconds in a Rails console with 20000 posts, so it will not be much faster in EM. I schedule the handle_changed_posts method every 5 seconds, which is fine unless the number of posts rises and the calculation takes longer than the 5 seconds after which the same run is scheduled again. In that case I'd soon have a problem. How can I avoid that?
2. I trust em-mongo but I do not trust my EM knowledge. To monitor that EM is still running, I puts a timestamp every second. This seems to be working fine but gets a bit bumpy every 5 seconds when my calculation runs. Is that a sign that I block the loop?
3. Is there any general way to find out if I block the loop?
4. Should I nice my EventMachine process with -19 to always give it top OS priority?
I have been reluctant to answer here since I've got no mongo experience so far, but considering no one is answering and some of the stuff here is general EM stuff, I may be able to help:
1. Schedule the next scan when the first scan ends (resp.callback and resp.errback in handle_changed_posts seem like good candidates for chaining the next scan), either with add_timer or with next_tick.
2. Probably. Try making your mongo trips more often so they handle smaller chunks of data. Any CPU cycle hog inside your reactor would make the reactor loop too busy to accept events such as periodic timer ticks.
3. No simple way, no. One idea would be to measure the difference between Time.now and next_tick{Time.now}, benchmark it, and then trace possible culprits when the difference crosses a threshold. Simulating slow queries (see the question "Simulate slow query in mongodb?") and many parallel connections is a good idea.
4. I honestly don't know. I've never encountered people who do that, and I expect it depends on what else is running on that server.
To expand upon bbozo's answer, specifically in relation to your second question, there is no time when you run code that you do not block the loop. In my experience, when we talk about 'non-blocking' code what we really mean is 'code that doesn't block very long'. Typically, these are very short periods of time (less than a millisecond), but they still block while executing.
Further, the only thing next_tick really does is to say 'do this, but not right now'. What you really want to do, as bbozo mentioned, is split up your processing over multiple ticks such that each iteration blocks for as little time as possible.
To use your own benchmarks, if 20,000 records take about 3 seconds to process, 4,000 records should take about 0.6 seconds. This would be short enough to not usually affect your 1 second heartbeat. You could split it up even further to reduce the amount of blockage and make the reactor run smoother, but it really depends on how much concurrency you need from the reactor.
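Splitting the work over several ticks might look roughly like the sketch below. It is written in plain Ruby so it runs without EventMachine: process_in_chunks is a hypothetical helper, and the array-backed queue is only a stand-in for the reactor. Inside a real reactor, EM.method(:next_tick) could be passed as the scheduler instead.

```ruby
# Hypothetical helper: processes `records` in slices of `chunk_size`, handing
# each slice to `schedule` so other events can run in between slices.
def process_in_chunks(records, chunk_size, schedule, &work)
  slices = records.each_slice(chunk_size).to_a
  run_next = lambda do
    slice = slices.shift
    next if slice.nil?            # nothing left to do
    slice.each(&work)
    schedule.call(&run_next)      # queue the following slice for a later tick
  end
  schedule.call(&run_next)
end

queue = []                               # stand-in for the reactor's tick queue
schedule = ->(&blk) { queue << blk }     # stand-in for EM.next_tick
ticks = 0
processed = []

process_in_chunks((1..20_000).to_a, 4_000, schedule) { |r| processed << r }

# Stand-in reactor loop: run one queued block per "tick".
until queue.empty?
  queue.shift.call
  ticks += 1
end
puts "processed #{processed.size} records in #{ticks} ticks"
```

Each tick then blocks only for the time one slice needs, so timers such as the 1 second heartbeat get a chance to fire between slices.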

Getting Thread not to run until join in ruby

I am getting into Ruby and have been using threads for a little while now without fully understanding them. I notice that when I add a thread to an array with a sleep() call as its first command, the thread does not run until I do a join, which is mostly what I want. So I have 2 questions.
1. Is that supposed to happen?
2. Is there a better way to do that than the way I'm doing it? Here is some sample code to show what I'm talking about.
job = Array.new
10.times do |n|
  job << Thread.new do
    sleep 0.001
    puts "done #{n}"
  end
end
#job.each do |t|
#  t.join
#end
puts "End of script"
Output is
End of script
If I remove the comments output is
done 1
done 0
done 7
done 6
done 5
done 4
done 3
done 2
done 9
done 8
End of script
So I use this now but I don't understand why it does that. Sometimes I notice even doing something like `echo hi` instead of sleep does the trick.
Thanks in advance.
Timing of threads isn't a defined behavior. Once you put them to sleep, they are put in a queue to be run later. You can't ever expect them to run one way or another.
Your main program doesn't take very long to run, so it is likely to finish before your other threads get picked back up to run again. Really, when you think about it, 0.001 seconds is quite a long time for a computer, so spinning off 10 threads in that time is likely to happen. But even if it takes longer, there is no guarantee that a thread will resume immediately after 0.001 seconds. Often there's really no guarantee it won't start before 0.001 seconds, either, but sleep calls usually don't end early.
When you add the join calls, the main thread blocks until each thread has finished, which gives the other threads time to run before the script exits (Ruby kills any threads that are still running when the main thread ends). So this behavior is expected.
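If the goal is for threads to hold off until the main thread says so, an explicit start signal is more dependable than a leading sleep. A minimal sketch, assuming a Queue used as a gate is acceptable; the variable names are illustrative only:

```ruby
gate = Queue.new        # threads block here until released
results = Queue.new     # thread-safe collection of finished job numbers

jobs = 10.times.map do |n|
  Thread.new do
    gate.pop            # wait for the start signal instead of relying on sleep
    results << n
  end
end

puts "started: #{results.size}"   # no job has passed the gate yet
10.times { gate << :go }          # release every thread
jobs.each(&:join)                 # block until all of them finish
puts "finished: #{results.size}"
puts "End of script"
```

The order in which the jobs finish is still up to the scheduler. Only the point at which they are allowed to start is controlled.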

Is there a way to call a block every microsecond using celluloid?

I'm using Celluloid's every method to execute a block every microsecond; however, it always seems to call the block once per second, even when I specify a decimal.
interval = 1.0 / 2.0
every interval do
  puts "*"*80
  puts "Time: #{Time.now}"
  puts "*"*80
end
I would expect this to be called every 0.5 seconds. But it is called every one second.
Any suggestions?
You can get fractional second resolution with Celluloid.
Celluloid uses the Timers gem to manage every. Timers does good floating-point time math and relies on Ruby's sleep, which has reasonable sub-second resolution.
The following code works perfectly:
class Bob
  include Celluloid
  def fred
    every 0.5 do
      puts Time.now.strftime "%M:%S.%N"
    end
  end
end
Bob.new.fred
And it produces the following output:
22:51.299923000
22:51.801311000
22:52.302229000
22:52.803512000
22:53.304800000
22:53.805759000
22:54.307003000
22:54.808279000
22:55.309358000
22:55.810017000
As you can see, it is not perfect, but close enough for most purposes.
If you are seeing different results, it is likely because of how long your code takes in the block you have given to every or other timers running and starving that particular one. I would approach it by simplifying the situation as much as possible and slowly adding parts back in to determine where the slowdown is occurring.
As for microsecond resolution, I don't think you can hope to get that far down reliably with any non-trivial code.
The trivial example:
def bob
  puts Time.now.strftime "%M:%S.%N"
  sleep 1.0e-6
  puts Time.now.strftime "%M:%S.%N"
end
Produces:
31:07.373858000
31:07.373936000
31:08.430110000
31:08.430183000
31:09.062000000
31:09.062079000
31:09.638078000
31:09.638156000
So as you can see, even just a base ruby version on my machine running nothing but a simple IO line doesn't reliably give me microsecond speeds.
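To see how far sleep overshoots on a given machine, the elapsed time can be measured directly instead of being read off printed timestamps. A small sketch in plain Ruby, without Celluloid; measure_sleep is a hypothetical helper name, and the numbers it prints will differ between machines and runs:

```ruby
# Measures how long `sleep` really takes for a requested duration,
# using the monotonic clock so wall-clock adjustments do not interfere.
def measure_sleep(seconds)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  sleep seconds
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

[1.0e-6, 1.0e-3, 0.5].each do |requested|
  actual = measure_sleep(requested)
  puts format('requested %.6fs, slept %.6fs', requested, actual)
end
```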

Regulating / rate limiting ruby mechanize

I need to regulate how often a Mechanize instance connects with an API (once every 2 seconds, so limit connections to that or more)
So this:
instance.pre_connect_hooks << Proc.new { sleep 2 }
I had thought this would work, and it sort of does, BUT now every method in that class sleeps for 2 seconds, as if the Mechanize instance is told to hold for 2 seconds whenever it is touched. I'm going to try a post connect hook, but it is obvious I need something a bit more elaborate, and I don't know what at this point.
The code explains it better, so if you are interested in following along: https://github.com/blueblank/reddit_modbot. Otherwise, my question is how to efficiently and effectively rate limit a Mechanize instance to the time frame specified by an API (where overstepping that limit results in dropped requests and bans). I'm also guessing I need to integrate the Mechanize instance into my class better, so any pointers on that are appreciated too.
Pre and post connect hooks are called on every connect, so if there is some redirection it could trigger many times for one request. Try history_added which only gets called once:
instance.history_added = Proc.new {sleep 2}
I use SlowWeb to rate limit calls to a specific URL.
require 'slowweb'
SlowWeb.limit('example.com', 10, 60)
In this case calls to example.com domain are limited to 10 requests every 60 seconds.
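A fixed sleep 2 in a hook always costs 2 seconds, even when the previous request was long ago. A limiter that remembers when the last call happened only sleeps for whatever is left of the interval. This is a minimal sketch in plain Ruby; MinIntervalLimiter is a hypothetical class, not part of Mechanize or SlowWeb, and the hook wiring in the final comment is an untested assumption based on the hook shown in the question:

```ruby
# Hypothetical helper: enforces a minimum gap between calls to #wait and
# sleeps only for the part of the gap that has not already elapsed.
class MinIntervalLimiter
  def initialize(min_interval)
    @min_interval = min_interval
    @last_call = nil
  end

  # Returns the number of seconds it had to sleep (0.0 if none).
  def wait
    now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    remaining = @last_call ? (@last_call + @min_interval) - now : 0.0
    remaining = 0.0 if remaining.negative?
    sleep remaining if remaining.positive?
    @last_call = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    remaining
  end
end

limiter = MinIntervalLimiter.new(0.2)
3.times { puts format('waited %.2fs', limiter.wait) }

# Assumed wiring, untested here, using the hook from the question:
#   instance.pre_connect_hooks << Proc.new { limiter.wait }
```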
