RSpec testing of a multiprocess library - ruby

I'm trying to test a gem I'm creating with RSpec. The gem's purpose is to create queues (using 'bunny'). It will serve to communicate between processes on several servers.
But I cannot find documentation on how to safely create processes inside the RSpec environment without spawning several test-running processes (all displaying example failures and successes).
Here is what I wanted the tests to do:
Spawn child processes, waiting on the queue
Push messages from the main RSpec process
Consume the queue in the child processes
Wait for the children to stop and get the number of messages received by each child.
For now I implemented a simple case where each child consumes only one message and then stops.
Here is my code currently:
module Queues
  # Basic CR accepting only jobs of type cmd_line
  class CR
    attr_reader :nb_jobs
    def initialize
      # opening communication pipes
      @rout, @wout = IO.pipe
      @nb_jobs = nil # not yet available.
    end
    def main
      @todo = JobPipe.instance
      job = @todo.pop do |j|
        # accept only CMD_LINE type of jobs.
        j.type == Messages::Job::CMD_LINE
      end
      # run command
      %x{#{job.cmd}}
      @wout.puts "1" # saying that we did one job
    end
    def run
      @pid = Process.fork
      if @pid.nil? then
        # we are in the child
        self.main
        @rout.close
        @wout.close
        exit
      end
    end
    def wait
      @nb_jobs = @rout.gets(nil).to_i
      Process.wait(@pid)
      @rout.close
      @wout.close
      @nb_jobs
    end
  end
  @job = Messages::Job.new({:type => Messages::Job::CMD_LINE, :cmd => "sleep 1" })
  RSpec.describe JobPipe do
    context "one Orchestrator and one CR" do
      before(:each) do
        indalo_queue_pre_configure
      end
      it "can send a job with Orchestrator and be received by CR" do
        cr = CR.new
        cr.run # execute the C.R. process
        todo = JobPipe.instance
        todo.push(@job)
        nb_jobs = cr.wait
        expect(nb_jobs).to eql(1)
      end
    end
    context "one Orchestrator and severals CR" do
      it 'can send one job per CR and get all back' do
        crs = Array.new(rand(2..10)) { CR.new }
        crs.each do |cr|
          cr.run
        end
        todo = JobPipe.instance
        crs.each do |_|
          todo.push(@job)
        end
        nb_jobs = 0
        crs.each do |cr|
          nb_jobs += cr.wait
        end
        expect(nb_jobs).to eql(crs.length)
      end
    end
  end
end
Edit: The question is (sorry for not stating it right away, that was a mistake):
Is there a way to use RSpec correctly in a multi-process environment?
I'm not looking for a code review, I just wanted to show a clear example of what I want to do. Here I used fork, but this duplicates the whole process (including the RSpec part), and I got numerous RSpec outputs, which is not what you would expect from a test suite.
I would expect only the main program to report the RSpec output/stats, with the subprocesses just interacting with it.
The only way I see to do that correctly is not to fork, but to start the subprocesses by other means. Maybe I have just answered my own question...
But not knowing RSpec well, I was wondering if someone knew how to do it within RSpec without writing external code. It seems to me that having separate code linked to a single test example is not a good idea.
What I found about multi-process testing is this plugin to RSpec. The only thing is I don't know about the mock concept, but maybe I have to learn about it...

OK, I found an answer, which is to use the &block argument of the Process.fork method. The child is still a copy of the parent process, but instead of carrying on with the rest of the RSpec run, it executes only the block and then terminates with exit status 0 (as the Ruby docs say).
This prevents the children from running through the rest of the RSpec environment and displaying the state of your tests many times over.
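A minimal stdlib sketch of the block form (Unix-only, since it relies on fork). `run_in_child` is a hypothetical helper: the parent gets the child's PID back immediately, the child runs only the block and then exits with status 0, and the parent reaps it with `Process.wait2`. Redirecting the child's `$stdout` keeps its output out of the test report.

```ruby
# Run a block in a forked child and return the child's Process::Status.
# With a block, Process.fork returns the child's PID in the parent, while
# the child executes only the block and then exits with status 0.
def run_in_child(&work)
  pid = Process.fork(&work)
  # Reap the child so it does not linger as a zombie.
  _pid, status = Process.wait2(pid)
  status
end

status = run_in_child do
  $stdout.reopen(File::NULL, "w") # keep the child's output out of the report
  # child-side work (e.g. consuming the queue) would go here
end
puts status.exitstatus
```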
PS: Be careful to redirect the child process's STDOUT/STDERR if you don't want them to pollute the STDOUT/STDERR of the test.
PS2: Don't forget to close @wout on the parent side if you call @rout.gets(nil) there: as long as the parent keeps its write end open, EOF never happens (a bug in the code I presented), even if you close it in the child.
PS3: Use two pipes, one per direction, instead of one, so that the child and the parent don't both talk and listen on the same pipe. A beginner's mistake, but I made it again.
PS4: Put an exit statement at the end of the &block so the child terminates promptly and the parent isn't left waiting for the rest of the child process to die. The parent still has to call Process.wait on the child's PID, otherwise the finished child stays a zombie.
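Putting PS2 and PS4 together, here is a hedged sketch of the result-reporting part: each hypothetical worker writes how many jobs it handled into its own pipe, the parent closes its copy of every write end before reading to EOF, and then reaps each child. `fork_workers` is an illustrative name and `jobs_done = 1` is a placeholder for the real queue consumer. `exit!` is used instead of plain `exit` as a deliberate choice here, because it skips `at_exit` handlers such as a test runner's.

```ruby
# Fork `count` children; each reports the number of jobs it handled through
# its own pipe. Returns the total reported by all children.
def fork_workers(count)
  workers = Array.new(count) do
    reader, writer = IO.pipe
    pid = Process.fork do
      reader.close             # the child only writes
      jobs_done = 1            # placeholder for the real consumer
      writer.puts(jobs_done)
      writer.close             # flushes and releases the child's write end
      exit!(0)                 # skip at_exit handlers (e.g. a test runner's)
    end
    writer.close               # the parent only reads; required to ever see EOF
    [pid, reader]
  end

  workers.sum do |pid, reader|
    total = reader.read.to_i   # returns once every write end is closed
    reader.close
    Process.wait(pid)          # reap the child
    total
  end
end

puts fork_workers(3)
```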
Sorry for the long post, but it's good to keep it here for my own reference too ^^

Rspec How to test if a Sidekiq Job was scheduled to run at a certain time

I have the following Worker
class JobBlastingWorker
  include Sidekiq::Worker
  sidekiq_options queue: 'job_blasting_worker'
  def perform(job_id, action=nil)
    job = Job.find(job_id)
    JobBlastingService.new(job).call
    sidekiq_id = JobBlastingWorker.perform_in(2.minutes, job.id, 're-blast', true)
    job.sidekiq_trackers.create(sidekiq_id: sidekiq_id, worker_type: 'blast_version_update')
  end
end
In my RSpec test, I have the following job_blasting_worker_spec.rb:
require 'rails_helper'
describe JobBlastingWorker do
  before(:all) do
    Rails.cache.clear
  end
  describe 'perform' do
    context 'create' do
      it 'creates job schedule for next 2mins' do
        @job = create(:job)
        worker = JobBlastingWorker.new
        expect(JobBlastingWorker).to have_enqueued_sidekiq_job(@job.id, 're-blast').in(2.minutes)
        worker.perform(@job.id, 'create')
      end
    end
  end
end
I expect this to work, but I find that the Sidekiq job that should be scheduled for two minutes later never gets created, so the test fails.
How can I make sure the Sidekiq job actually gets scheduled for two minutes later, so that the test passes?
Well... for this kind of expectation, I suggest just testing the message sent to the method.
expect(JobBlastingWorker).to receive(:perform_in).with(2.minutes, @job.id, 're-blast', true).and_call_original
worker = JobBlastingWorker.new
worker.perform(@job.id, 'create')
# have_enqueued_sidekiq_job is evaluated immediately, so it has to come after perform
expect(JobBlastingWorker).to have_enqueued_sidekiq_job(@job.id, 're-blast', true)
Of course, if you dig hard enough, I think you will eventually find a way to locate the job in the queue, for example by using the Redis API directly.
Then you can examine it further and get the time you set for the job to be performed.
But why? It's Sidekiq's responsibility to make sure those jobs are performed at the right time.
Finding this doesn't help you much, and that behavior should already be covered by Sidekiq's own tests.
I don't think you need to worry about it unless it works incorrectly and you want to reproduce the situation to file a bug report.
On the other hand, the time you send to the method is what you should care about. You don't want someone to change it to 2.hours by accident.
So I suggest you should test the message you send to the method.
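The point of `expect(...).to receive(:perform_in).with(...)` is simply to assert on the arguments of the scheduling call. Outside RSpec and Sidekiq, the same idea can be sketched with a hand-written spy. `RecordingWorker` and `schedule_reblast` are hypothetical names, and the literal `120` stands in for `2.minutes` because ActiveSupport is not loaded here.

```ruby
# Hypothetical stand-in for JobBlastingWorker: it records perform_in calls
# instead of enqueuing anything, so a test can inspect the arguments.
class RecordingWorker
  def self.calls
    @calls ||= []
  end

  def self.perform_in(interval, *args)
    calls << [interval, *args]
    "fake-jid-#{calls.size}" # mimic returning a job id
  end
end

# Code under test, shaped like the perform_in call in JobBlastingWorker#perform.
def schedule_reblast(worker_class, job_id)
  worker_class.perform_in(120, job_id, 're-blast', true)
end

schedule_reblast(RecordingWorker, 42)
p RecordingWorker.calls
```

If someone changes the interval to two hours by accident, the recorded arguments change and the assertion fails, which is exactly what the answer wants to guard against.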

Background thread in Rails can't see instance variables

I need to gather up some data from a Rails application, aggregate it, and periodically send it off to a remote server. I instantiate my aggregation class in a global variable (I know, I know) in application.rb.
Inside my aggregation class, I fire up a thread that sleeps for 10 seconds, then looks at the queue, processes the data, and sends it. The queue is a hash stored in an instance variable of the class.
From the Rails controller, I call a method in the aggregator class to queue the data in the hash. Of course this is on a different thread than the background task that reads the queue. The problem is that the background task never sees any data in the hash. In my log, I print out the object_id of the hash both when I write to it (from the controller's thread) and when I read from it (from the background thread). The hash's object_id matches in both threads, but the background thread never sees the data.
What's killing me is that this works fine outside of Rails. I've set up tests with many threads that really pound on it, and it works fine (there is some thread protection that I am not showing for clarity). Does anyone know how the object_ids can match while the contents are inconsistent?
class Aggregator
  def initialize
    @q = {}
    @timer = nil
  end
  def start
    @timer = Thread.new do
      loop do
        sleep(10)
        flush_q
      end
    end
  end
  def flush_q
    logger.debug "flush: q.object_id = #{@q.object_id}" # matches what I get below
    logger.debug "flush: q.length = #{@q.length}" # always zero!
    @q.each_pair do |k,v|
      # pack it up and send it
    end
    @q.clear
  end
  def add(item)
    logger.debug "add: q.object_id = #{@q.object_id}" # matches what I get above
    @q[item.name] ||= item
    logger.debug "add: q.length = #{@q.length}" # increases with each add
    # not actually that simple, but not relevant
  end
end
I'm going to go out on a limb and assume that your code is deployed using a forking app server (e.g. Unicorn or Passenger).
This means that your app is loaded once and new instances are then forked from that master instance. Forking is cheap, so new instances of the app can be started up and shut down very quickly.
I believe that your aggregator instance is being created and started in this master process. When the master forks, the process's entire memory space is copied (so there is an instance of the aggregator in the new process, with the same object_id and so on).
However, fork copies only the thread that called it, so the flushing happens only in the master process, while all the appending happens in the child processes. You could confirm this by adding Process.pid to what you log: you should see that your log lines come from two different processes.
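The effect can be reproduced without a web server. This small sketch (the method name is hypothetical) starts a background thread standing in for the flush timer, forks, and compares `Thread.list.size` in the parent and in the child; the child reports its count through a pipe.

```ruby
# Return [threads_in_parent, threads_in_child] after forking while a
# background thread is running.
def thread_counts_across_fork
  timer = Thread.new { sleep } # stands in for the aggregator's flush loop
  reader, writer = IO.pipe
  pid = Process.fork do
    reader.close
    writer.puts Thread.list.size # only the thread that called fork survives
    writer.close
  end
  writer.close
  child_count = reader.read.to_i
  reader.close
  Process.wait(pid)
  parent_count = Thread.list.size # still includes the timer thread
  timer.kill
  [parent_count, child_count]
end

p thread_counts_across_fork
```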
One way of fixing this would be to start (or restart) your thread after the child process has forked. How you do this depends on how the app is being served. With Unicorn you can do this in your Unicorn config via the after_fork hook. With Passenger you can do:
PhusionPassenger.on_event(:starting_worker_process) do |forked|
  if forked
    ...
  end
end
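A different technique from the server hooks above, sketched here under its own assumptions: the aggregator remembers which process started its flush thread and lazily starts a new one whenever it is used from a different PID, which is what happens after a fork. `ForkSafeAggregator` and its parameters are hypothetical; the hook-based approach in the answer is still the more explicit option.

```ruby
# Sketch of an aggregator that (re)starts its flush thread in whichever
# process uses it, by comparing the PID that owns the thread with the current one.
class ForkSafeAggregator
  def initialize(interval: 10)
    @interval = interval
    @q = {}
    @lock = Mutex.new
    @timer = nil
    @owner_pid = nil
  end

  def add(name, item)
    @lock.synchronize do
      ensure_timer
      @q[name] ||= item
    end
  end

  # Swap out the current batch under the lock and return it.
  def flush_q
    batch = @lock.synchronize do
      current = @q
      @q = {}
      current
    end
    # pack up `batch` and send it here
    batch
  end

  private

  # After a fork, @owner_pid still holds the parent's PID and the copied
  # thread object is dead, so the child starts its own flush thread.
  def ensure_timer
    return if @owner_pid == Process.pid && @timer&.alive?
    @owner_pid = Process.pid
    @timer = Thread.new do
      loop do
        sleep(@interval)
        flush_q
      end
    end
  end
end
```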

How do I loop the restart of a daemon?

I am trying to use Ruby's daemons gem and loop the restart of a daemon that has its own loop. My code looks like this now:
require 'daemons'
while true
  listener = Daemons.call(:force => true) do
    users = accounts.get_updated_user_list
    TweetStream::Client.new.follow(users) do |status|
      puts "#{status.text}"
    end
  end
  sleep(60)
  listener.restart
end
Running this gives me the following error (after 60 seconds):
undefined method `restart' for #<Daemons::Application:0x007fc5b29f5658> (NoMethodError)
So obviously Daemons.call doesn't return a controllable daemon like I thought it would. What do I need to do to set this up correctly? Is a daemon the right tool here?
I think this is what you're after, although I haven't tested it.
class RestartingUserTracker
  def initialize
    @client = TweetStream::Client.new
  end
  def handle_status(status)
    # do whatever it is you're going to do with the status
  end
  def fetch_users
    accounts.get_updated_user_list
  end
  def restart
    @client.stop_stream
    users = fetch_users
    @client.follow(users) do |status|
      handle_status(status)
    end
  end
end
EM.run do
  client = RestartingUserTracker.new
  client.restart
  EM::PeriodicTimer.new(60) do
    client.restart
  end
end
Here's how it works:
TweetStream uses EventMachine internally, as a way of polling the API forever and handling the responses. I can see why you might have felt stuck, because the normal TweetStream API blocks forever and doesn't give you a way to intervene at any point. However, TweetStream does allow you to set up other things in the same event loop. In your case, a timer. I found the documentation on how to do that here: https://github.com/intridea/tweetstream#removal-of-on_interval-callback
By starting up our own EventMachine reactor, we're able to inject our own code into the reactor as well as use TweetStream. In this case, we're using a simple timer that just restarts the client every 60 seconds.
EventMachine is an implementation of something called the Reactor Pattern. If you want to fully understand and maintain this code, it would serve you well to find some resources about it and gain a full understanding. The reactor pattern is very powerful, but can be difficult to grasp at first.
However, this code should get you started. Also, I'd consider renaming the RestartingUserTracker to something more appropriate.
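To make the reactor idea a little more concrete, here is a toy single-threaded event loop with periodic timers. It is only an illustration of the pattern behind `EM.run` plus `EM::PeriodicTimer`; `TinyReactor` is hypothetical, is not EventMachine's API, and handles timers only (a real reactor also watches sockets and files).

```ruby
# Toy reactor: one thread, a list of periodic timers, and a loop that sleeps
# until the earliest timer is due, fires the due timers, and reschedules them.
class TinyReactor
  Timer = Struct.new(:interval, :next_at, :callback)

  def initialize
    @timers = []
    @running = false
  end

  # Register a callback to run every `interval` seconds.
  def add_periodic_timer(interval, &callback)
    @timers << Timer.new(interval, now + interval, callback)
  end

  def stop
    @running = false
  end

  def run
    @running = true
    while @running && !@timers.empty?
      delay = @timers.map(&:next_at).min - now
      sleep(delay) if delay > 0
      current = now
      @timers.select { |t| t.next_at <= current }.each do |t|
        t.next_at += t.interval
        t.callback.call
      end
    end
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

# Fire a "restart" callback `times` times, then stop the loop.
def demo_restarts(times, interval = 0.05)
  reactor = TinyReactor.new
  restarts = 0
  reactor.add_periodic_timer(interval) do
    restarts += 1 # client.restart would go here
    reactor.stop if restarts == times
  end
  reactor.run
  restarts
end

puts demo_restarts(3)
```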

Using Ruby on Rails app to control console application

The console application I would like to control is Bluesoleil. It's Bluetooth software/driver, but I don't think the details of the software are that important. What I want to do is basically type console commands in a Windows or Linux terminal environment using a web browser running a Ruby on Rails app.
So the high-level design of the Ruby on Rails app would be something like this:
Web browser showing a page with a UI for Bluesoleil
Ruby on Rails app renders the page for the UI, takes in commands from the user and displays the results in the web browser, just like a regular Ruby on Rails app
On the backend, Ruby on Rails types commands into the console that is running Bluesoleil, and the result shown in the console is captured as a string by Ruby on Rails.
Is something like this possible with Ruby on Rails?
Just to clear possible confusion, when I say console and console application here, I don't mean Rails console or Ruby console. Console here is just a terminal environment running console applications and so on.
Thank you.
If you only need to run a "one-off" command, just use backticks. If you need to maintain a long-running background process, which accepts commands and returns responses, you can do something like this (some of the details have been edited out, since this code is from a proprietary application):
class Backend
  def initialize
    @running = false
    @server = nil
    # if we forget to call "stop", make sure to close down background process on exit
    # (finalizers are called with the object's id, hence the block parameter)
    ObjectSpace.define_finalizer(self, lambda { |_object_id| stop if @running })
  end
  def start
    stop if @running
    @server = IO.popen("backend","w+")
    @running = true
  end
  def stop
    return if not @running
    @server << "exit\n"
    @server.flush
    @running = false
  end
  def query(*args)
    raise "backend not running" if not @running
    @server << "details edited out\n"
    @server.flush
    loop do
      response = parse_response
      # handle response
      # break loop when backend is finished sending data
    end
  end
  private
  def parse_response
    # details edited out, uses c = @server.getc to read data from backend
    # getc will block if there is nothing to read,
    # so there needs to be an unambiguous terminator telling you where
    # to stop reading
  end
end
You can adapt this to your own needs. Beware of situations where the background process dies and leaves the main process hanging.
Although it doesn't apply to your situation, if you are designing the background program yourself: Build the background process so that if ANYTHING makes it crash, it will send an unambiguous message like "PANIC" or something which tells the main process to either exit with an error message, or try starting another background process. Also, make sure it is completely unambiguous where each "message" begins and ends, and test the coordination between main/background process well -- if there are bugs on either end, it is very easy to get a situation where both processes get stuck waiting for each other. One more thing: design the "protocol" which the 2 processes speak to each other in a way which makes it easy to maintain synchronization between the 2.
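Here is a small, self-contained sketch of such a protocol, under assumptions of my own: the "backend" is a child Ruby process started with `IO.popen`, every data line is prefixed with `"> "`, and each reply ends with either `OK` or `ERR ...`, so a data line can never be mistaken for the terminator. `LineBackend`, `BACKEND_SCRIPT` and the `upcase`/`count` commands are hypothetical. If the child dies mid-reply, `gets` returns `nil` and the parent raises instead of hanging.

```ruby
require "rbconfig"

# Hypothetical backend program, run as a child Ruby process. It must flush
# each reply ($stdout.sync) or the parent would wait forever.
BACKEND_SCRIPT = <<~'RUBY'
  $stdout.sync = true
  while (line = $stdin.gets)
    cmd, arg = line.chomp.split(" ", 2)
    case cmd
    when "exit"
      break
    when "upcase"
      puts "> #{arg.to_s.upcase}"
      puts "OK"
    when "count"
      arg.to_i.times { |i| puts "> #{i}" }
      puts "OK"
    else
      puts "ERR unknown command #{cmd}"
    end
  end
RUBY

class LineBackend
  def start
    @server = IO.popen([RbConfig.ruby, "-e", BACKEND_SCRIPT], "r+")
  end

  # Send one command and collect data lines until OK; raise on ERR, on an
  # unexpected line, or if the backend exits before finishing its reply.
  def query(command)
    @server.puts(command)
    @server.flush
    lines = []
    while (line = @server.gets)
      line = line.chomp
      if line.start_with?("> ")
        lines << line.delete_prefix("> ")
      elsif line == "OK"
        return lines
      elsif line.start_with?("ERR")
        raise "backend error: #{line}"
      else
        raise "protocol error: unexpected line #{line.inspect}"
      end
    end
    raise "backend exited before finishing its reply"
  end

  # Ask the backend to exit; closing a popen'ed IO waits for it and sets $?.
  def stop
    @server.puts("exit")
    @server.close
    $?
  end
end
```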

How to use ruby fibers to avoid blocking IO

I need to upload a bunch of files in a directory to S3. Since more than 90% of the time required to upload is spent waiting for the http request to finish, I want to execute several of them at once somehow.
Can Fibers help me with this at all? They are described as a way to solve this sort of problem, but I can't think of any way I can do any work while an http call blocks.
Any way I can solve this problem without threads?
I'm not up on fibers in 1.9, but regular Threads from 1.8.6 can solve this problem. Try using a Queue http://ruby-doc.org/stdlib/libdoc/thread/rdoc/classes/Queue.html
Looking at the example in the documentation, your consumer is the part that does the upload. It 'consumes' a URL and a file, and uploads the data. The producer is the part of your program that keeps working and finds new files to upload.
If you want to upload multiple files at once, simply launch a new Thread for each file:
@all_threads ||= [] # created once, before the uploads start
t = Thread.new do
  upload_file(param1, param2)
end
@all_threads << t
Then, later on in your 'producer' code (which, remember, doesn't have to be in its own Thread, it could be the main program):
@all_threads.each do |t|
  t.join if t.alive?
end
The Queue can either be an @member_variable or a $global.
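A sketch of the producer/consumer version with a fixed pool of worker threads, so you don't start one thread per file. `upload_file` here is a hypothetical stand-in that just sleeps to simulate waiting on the HTTP request, and the `:done` marker (one per worker) is one simple way to tell the consumers to stop.

```ruby
# Hypothetical upload: sleeps to simulate waiting on the network.
def upload_file(path)
  sleep(0.05)
  path
end

# Producer fills the queue; `workers` consumer threads drain it.
def upload_all(paths, workers: 4)
  queue = Queue.new
  paths.each { |p| queue << p }
  workers.times { queue << :done } # one stop marker per consumer

  results = Queue.new
  threads = Array.new(workers) do
    Thread.new do
      while (path = queue.pop) != :done
        results << upload_file(path)
      end
    end
  end
  threads.each(&:join)
  Array.new(results.size) { results.pop }
end

p upload_all(%w[a.txt b.txt c.txt d.txt e.txt]).sort
```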
To answer your actual questions:
Can Fibers help me with this at all?
No, they can't. Jörg W Mittag explains why best:
No, you cannot do concurrency with Fibers. Fibers simply aren't a concurrency construct, they are a control-flow construct, like Exceptions. That's the whole point of Fibers: they never run in parallel, they are cooperative and they are deterministic. Fibers are coroutines. (In fact, I never understood why they aren't simply called Coroutines.)
The only concurrency construct in Ruby is Thread.
When he says that the only concurrency construct in Ruby is Thread, remember that there are many different implementations of Ruby and that they vary in how they implement threads. Jörg once again provides a great answer on these differences, and correctly concludes that only something like JRuby (which maps JVM threads to native threads) or forking your process gets you true parallelism.
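To see that fibers are a control-flow construct rather than a concurrency one, here is a small sketch (the method name is hypothetical): a fiber runs only when resumed and hands control back only at `Fiber.yield`, so the interleaving is the same on every run and nothing ever executes in parallel.

```ruby
# Record the order in which the main program and a fiber run.
def fiber_trace
  trace = []
  worker = Fiber.new do
    trace << "fiber: step 1"
    Fiber.yield              # hand control back to whoever resumed us
    trace << "fiber: step 2"
  end
  trace << "main: before resume"
  worker.resume              # runs the fiber until its first Fiber.yield
  trace << "main: between resumes"
  worker.resume              # runs the fiber to completion
  trace << "main: done"
  trace
end

puts fiber_trace
```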
Any way I can solve this problem without threads?
Other than forking your process, I would also suggest that you look at EventMachine and something like em-http-request. It's an event driven, non-blocking, reactor pattern based HTTP client that is asynchronous and does not incur the overhead of threads.
Aaron Patterson (@tenderlove) uses an example almost exactly like yours to explain why you can and should use threads to achieve concurrency in your situation.
Most I/O libraries are now smart enough to release the GVL (Global VM Lock, which most people know as the GIL or Global Interpreter Lock) when doing IO. There is a simple function call in C to do this. You don't need to worry about the C code, but for you this means that most IO libraries worth their salt are going to release the GVL and allow other threads to execute while the thread that is doing the IO waits for the data to return.
If what I just said was confusing, you don't need to worry about it too much. The main thing that you need to know is that if you are using a decent library to do your HTTP requests (or any other I/O operation for that matter... database, interprocess communication, whatever), the Ruby interpreter (MRI) is smart enough to be able to release the lock on the interpreter and allow other threads to execute while one thread awaits IO to return. If the next thread has its own IO to grab, the Ruby interpreter will do the same thing (assuming that the IO library is built to utilize this feature of Ruby, which I believe most are these days).
So, to sum up: use threads! You should see the performance benefit. If not, check whether your HTTP library uses the rb_thread_blocking_region() function in C (rb_thread_call_without_gvl() on newer Rubies) and, if not, find out why not. Maybe there is a good reason, or maybe you need to consider using a better library.
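A quick way to see the effect on MRI, sketched with `sleep` as a stand-in for a blocking HTTP call (an assumption: real IO only behaves this way if the library releases the GVL, while `sleep` always does). Five sequential "requests" take about five times as long as one; five threaded ones take about as long as one.

```ruby
# Stand-in for a blocking network request; sleep releases the GVL.
def fake_request
  sleep(0.1)
end

# Wall-clock seconds taken by the given block.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Returns [sequential_seconds, threaded_seconds] for n requests.
def compare(n = 5)
  sequential = elapsed { n.times { fake_request } }
  threaded = elapsed { Array.new(n) { Thread.new { fake_request } }.each(&:join) }
  [sequential, threaded]
end

seq, thr = compare
printf("sequential: %.2fs, threaded: %.2fs\n", seq, thr)
```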
The link to the Aaron Patterson video is here: http://www.youtube.com/watch?v=kufXhNkm5WU
It is worth a watch, even if just for the laughs, as Aaron Patterson is one of the funniest people on the internet.
You could use separate processes for this instead of threads:
#!/usr/bin/env ruby
$stderr.sync = true
# Number of children to use for uploading
MAX_CHILDREN = 5
# Hash of PIDs for children that are working along with which file
# they're working on.
@child_pids = {}
# Keep track of uploads that failed
@failed_files = []
# Get the list of files to upload as arguments to the program
@files = ARGV
### Wait for a child to finish, adding the file to the list of those
### that failed if the child indicates there was a problem.
def wait_for_child
  $stderr.puts " waiting for a child to finish..."
  pid, status = Process.waitpid2( 0 )
  file = @child_pids.delete( pid )
  @failed_files << file unless status.success?
end
### Here's where you'd put the particulars of what gets uploaded and
### how. I'm just sleeping for 10 microseconds per byte of the file
### (bytes * 0.00001 seconds) to simulate the upload, then returning
### either +true+ or +false+ based on a random factor.
def upload( file )
  bytes = File.size( file )
  sleep( bytes * 0.00001 )
  return rand( 100 ) > 5
end
### Start a child uploading the specified +file+.
def start_child( file )
  if pid = Process.fork
    $stderr.puts "%s: upload started by child %d" % [ file, pid ]
    @child_pids[ pid ] = file
  else
    if upload( file )
      $stderr.puts "%s: done." % [ file ]
      exit 0 # success
    else
      $stderr.puts "%s: failed." % [ file ]
      exit 255
    end
  end
end
until @files.empty?
  # If there are already the maximum number of children running, wait
  # for one to finish
  wait_for_child() if @child_pids.length >= MAX_CHILDREN
  # Start a new child working on the next file
  start_child( @files.shift )
end
# Now we're just waiting on the final few uploads to finish
wait_for_child() until @child_pids.empty?
if @failed_files.empty?
  exit 0
else
  $stderr.puts "Some files failed to upload:",
    @failed_files.collect {|file| " #{file}" }
  exit 255
end
