How can I terminate a Celluloid SupervisionGroup? (Ruby)

I am implementing a simple program in Celluloid that ideally will run a few actors in parallel, each of which will compute something, and then send its result back to a main actor, whose job is simply to aggregate results.
Following this FAQ, I introduced a SupervisionGroup, like this:
    require 'celluloid'
    require 'set'

    module Shuffling
      class AggregatorActor
        include Celluloid

        def initialize(shufflers)
          @shufflerset = shufflers
          @results = {}
        end

        def add_result(result)
          @results.merge!(result)
          @shufflerset -= result.keys
          if @shufflerset.empty?
            output
            terminate
          end
        end

        def output
          puts @results
        end
      end

      class EvalActor
        include Celluloid

        def initialize(shuffler_class)
          @shuffler = shuffler_class.new
          async.run_evaluation
        end

        def run_evaluation
          # computation here, which yields `result`
          Celluloid::Actor[:aggregator].async.add_result(result)
          terminate
        end
      end

      class ShufflerSupervisionGroup < Celluloid::SupervisionGroup
        shufflers = [RubyShuffler, PileShuffle, VariablePileShuffle, VariablePileShuffleHuman].to_set

        supervise AggregatorActor, as: :aggregator, args: [shufflers.map { |sh| sh.new.name }]
        shufflers.each do |shuffler|
          supervise EvalActor, as: shuffler.name.to_sym, args: [shuffler]
        end
      end

      ShufflerSupervisionGroup.run
    end
I terminate the EvalActors after they're done, and I also terminate the AggregatorActor when all of the workers are done.
However, the supervision thread stays alive and keeps the main thread alive. The program never terminates.
If I send .run! to the group instead, the main thread exits right after it, and nothing gets done.
What can I do to terminate the group (or, in group terminology, finalize, I suppose) after the AggregatorActor terminates?

What I did in the end was change the AggregatorActor to have a wait_for_results method:
    class AggregatorActor
      include Celluloid

      def initialize(shufflers)
        @shufflerset = shufflers
        @results = {}
      end

      def wait_for_results
        sleep 5 until @shufflerset.empty?
        output
        terminate
      end

      def add_result(result)
        @results.merge!(result)
        @shufflerset -= result.keys
        puts "Results for #{result.keys.inspect} recorded, remaining: #{@shufflerset.inspect}"
      end

      def output
        puts @results
      end
    end
Then I got rid of the SupervisionGroup (since I didn't actually need supervision, i.e. restarting actors that fail), and used it like this:
    shufflers = [RubyShuffler, PileShuffle, VariablePileShuffle, VariablePileShuffleHuman, RiffleShuffle].to_set

    Celluloid::Actor[:aggregator] = AggregatorActor.new(shufflers.map { |sh| sh.new.name })
    shufflers.each do |shuffler|
      Celluloid::Actor[shuffler.name.to_sym] = EvalActor.new(shuffler)
    end

    Celluloid::Actor[:aggregator].wait_for_results
That doesn't feel very clean, and it would be nice if there were a cleaner way, but at least it works.
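For what it's worth, the polling loop could be replaced with a Celluloid::Condition, which suspends the caller until the aggregator signals completion. An untested sketch (Celluloid::Condition ships with Celluloid itself):

    class AggregatorActor
      include Celluloid

      def initialize(shufflers)
        @shufflerset = shufflers
        @results = {}
        @done = Celluloid::Condition.new # signaled once every shuffler has reported
      end

      def add_result(result)
        @results.merge!(result)
        @shufflerset -= result.keys
        @done.signal if @shufflerset.empty?
      end

      def wait_for_results
        @done.wait until @shufflerset.empty? # suspends this call, not the whole actor
        puts @results
      end
    end

The blocking call Celluloid::Actor[:aggregator].wait_for_results at the end of the script would then return as soon as the last result arrives, rather than up to five seconds later.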


How to make sure each Minitest unit test is fast enough?

I have a large number of Minitest unit tests (methods), over 300. They all take some time, from a few milliseconds to a few seconds. Some of them hang, sporadically, and I can't tell which ones or when.
I want to apply Timeout to each of them, to make sure any single test fails if it takes longer than, say, 5 seconds. Is that achievable?
For example:
    class FooTest < Minitest::Test
      def test_calculates_something
        # Something potentially too slow
      end
    end
You can use the Minitest plugin loader to load a plugin. This is, by far, the cleanest solution. The plugin system is not very well documented, though.
Luckily, Adam Sanderson wrote an article on the plugin system.
The best news is that the article explains the plugin system by writing a sample plugin that reports slow tests. Try out minitest-snail; it is probably almost what you want.
With a little modification we can use the Reporter to mark a test as failed if it is too slow, like so (untested):
File minitest/snail_reporter.rb:
    module Minitest
      class SnailReporter < Reporter
        attr_reader :max_duration, :slow_tests

        def self.options
          @default_options ||= {
            :max_duration => 2
          }
        end

        def self.enable!(options = {})
          @enabled = true
          self.options.merge!(options)
        end

        def self.enabled?
          @enabled ||= false
        end

        def initialize(io = STDOUT, options = self.class.options)
          super
          @max_duration = options.fetch(:max_duration)
          @slow_tests = [] # was never initialized in the original sketch
        end

        def record(result)
          @passed = result.time < max_duration
          slow_tests << result unless @passed
        end

        def passed?
          @passed
        end

        def report
          return if slow_tests.empty?

          slow_tests.sort_by! { |r| -r.time }
          io.puts
          io.puts "#{slow_tests.length} slow tests."
          slow_tests.each_with_index do |result, i|
            io.puts "%3d) %s: %.2f s" % [i + 1, result.location, result.time]
          end
        end
      end
    end
File minitest/snail_plugin.rb:
    require_relative './snail_reporter'

    module Minitest
      def self.plugin_snail_options(opts, options)
        opts.on "--max-duration TIME", "Report tests that take longer than TIME seconds." do |max_duration|
          SnailReporter.enable! :max_duration => max_duration.to_f
        end
      end

      def self.plugin_snail_init(options)
        if SnailReporter.enabled?
          io = options[:io]
          Minitest.reporter.reporters << SnailReporter.new(io)
        end
      end
    end
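With those two files under a minitest/ directory on the load path, Minitest should pick the plugin up automatically, and the reporter is enabled from the command line via the option defined above (example invocation):

    ruby test/foo_test.rb --max-duration 5

If you literally want each slow test to fail rather than just be reported, a blunter, untested sketch is to prepend a module that wraps every test run in Timeout.timeout. This assumes wrapping Minitest::Test#run is acceptable for your suite; Timeout's thread-based interruption has well-known caveats (e.g. it cannot interrupt blocking C extensions):

    # test_helper.rb -- hedged sketch, unrelated to minitest-snail
    require 'minitest/autorun'
    require 'timeout'

    module FailSlowTests
      MAX_DURATION = 5 # seconds

      def run(*args)
        Timeout.timeout(MAX_DURATION) { super }
      end
    end

    Minitest::Test.prepend(FailSlowTests)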

How to write an integration test for a loop?

I am having difficulty writing integration (no stubbing) tests for the following scenario: a process (rake task) that runs in a loop, emitting some values. Below is an approximation of the use case.
The test succeeds if I Ctrl-C it, but I would like it to catch the success condition and stop on its own.
Does anyone have good suggestions? (Stubbing/mocking are not good suggestions.) I suppose there may be a way to instruct RSpec to stop the process after a matcher returns success?
    describe 'rake reactor' do
      it 'eventually returns 0.3' do
        expect { Rake::Task['reactor'].execute }.to output(/^0\.3.*/).to_stdout
      end
    end

    class Reactor
      def initialize
        @stop = false
      end

      def call
        loop do
          break if stop?
          sleep random_interval
          yield random_interval
        end
      end

      def stop
        @stop = true
      end

      def stop?
        @stop == true
      end

      def random_interval
        rand(0.1..0.4)
      end
    end

    desc 'Start reactor'
    task reactor: :environment do
      reactor = Reactor.new

      trap(:INT) do
        reactor.stop
      end

      reactor.call { |m| p m }
    end
A naïve way to handle it is to start a new thread and send INT from there after some predefined timeout:
    before do
      Thread.new do
        sleep 0.5
        Process.kill('INT', Process.pid)
      end
    end
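Putting the pieces together, the whole spec might look like this (a sketch; 0.5 s is an arbitrary grace period, and it is the rake task's own trap(:INT) handler that turns the signal into a clean stop):

    describe 'rake reactor' do
      before do
        # Watchdog thread: interrupt the reactor loop after a fixed timeout.
        Thread.new do
          sleep 0.5
          Process.kill('INT', Process.pid)
        end
      end

      it 'eventually returns 0.3' do
        expect { Rake::Task['reactor'].execute }.to output(/^0\.3.*/).to_stdout
      end
    end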

Ruby: Synchronizing fork pool output

I am trying to create a generic way of iterating Enumerables using multiple processors. I spawn a given number of workers using fork and feed them data to process, reusing idle workers. However, I would like to synchronize the input and output order. If job 1 and job 2 are started simultaneously and job 2 completes before job 1, the result order is out of sync. I would like to cache the output on the fly somehow to restore the output order, but I fail to see how this can be done.
    #!/usr/bin/env ruby

    require 'pp'

    DEBUG = false
    CPUS  = 2

    module Enumerable
      # Fork each (feach) creates a fork pool with a specified number of processes
      # to iterate over the Enumerable object, processing the specified block.
      # Calling feach with :processes => 0 disables forking, for debugging purposes.
      # It is possible to disable synchronized output with :synchronize => false,
      # which will save some overhead.
      #
      # @example - process 10 elements using 4 processes:
      #
      #   (0 ... 10).feach(:processes => 4) { |i| puts i; sleep 1 }
      def feach(options = {}, &block)
        $stderr.puts "Parent pid: #{Process.pid}" if DEBUG

        procs = options[:processes] || 0
        sync  = options.fetch(:synchronize, true) # `options[:synchronize] || true` could never be false

        if procs > 0
          workers = spawn_workers(procs, &block)
          threads = []

          self.each_with_index do |elem, index|
            $stderr.puts "elem: #{elem} index: #{index}" if DEBUG

            threads << Thread.new do
              worker = workers[index % procs]
              worker.process(elem)
            end

            if threads.size == procs
              threads.each { |thread| thread.join }
              threads = []
            end
          end

          threads.each { |thread| thread.join }
          workers.each { |worker| worker.terminate }
        else
          self.each do |elem|
            block.call(elem)
          end
        end
      end

      def spawn_workers(procs, &block)
        workers = []

        procs.times do
          child_read, parent_write = IO.pipe
          parent_read, child_write = IO.pipe

          pid = Process.fork do
            begin
              parent_write.close
              parent_read.close
              call(child_read, child_write, &block)
            ensure
              child_read.close
              child_write.close
            end
          end

          child_read.close
          child_write.close

          $stderr.puts "Spawning worker with pid: #{pid}" if DEBUG

          workers << Worker.new(parent_read, parent_write, pid)
        end

        workers
      end

      def call(child_read, child_write, &block)
        until child_read.eof?
          elem = Marshal.load(child_read)
          $stderr.puts "   call with Process.pid: #{Process.pid}" if DEBUG
          result = block.call(elem)
          Marshal.dump(result, child_write)
        end
      end

      class Worker
        attr_reader :parent_read, :parent_write, :pid

        def initialize(parent_read, parent_write, pid)
          @parent_read  = parent_read
          @parent_write = parent_write
          @pid          = pid
        end

        def process(elem)
          Marshal.dump(elem, @parent_write)
          $stderr.puts "   process with worker pid: #{@pid} and parent pid: #{Process.pid}" if DEBUG
          Marshal.load(@parent_read)
        end

        def terminate
          $stderr.puts "Terminating worker with pid: #{@pid}" if DEBUG
          Process.wait(@pid, Process::WNOHANG)
          @parent_read.close
          @parent_write.close
        end
      end
    end

    def fib(n) n < 2 ? n : fib(n - 1) + fib(n - 2); end # Lousy Fibonacci calculator <- heavy job

    (0 ... 10).feach(processes: CPUS) { |i| puts "#{i}: #{fib(35)}" }
There is no way to sync the output unless you force all the child processes to send their output to the parent and have it sort the results, or you enforce some kind of I/O locking between processes.
Without knowing your long-term goal it's difficult to suggest a solution. In general, each process needs to do a substantial amount of work to gain any significant speedup from fork, and there is no simple way to get results back to the main program.
Native threads (pthreads on Linux) might make more sense for what you are trying to accomplish; however, not all versions of Ruby support threads at that level. See:
Does ruby have real multithreading?
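Since the question is specifically about caching results to restore input order, here is a minimal reorder-buffer sketch of the first option (parent collects and sorts): results are stored under their input index, and the parent only flushes the buffer while the next expected index is present. The `record` lambda is a hypothetical helper, not part of `feach`:

    require 'thread'

    buffer     = {}        # index => result, holding out-of-order arrivals
    lock       = Mutex.new
    next_index = 0

    # Worker threads call this as results arrive, possibly out of order.
    record = lambda do |index, result|
      lock.synchronize do
        buffer[index] = result
        # Flush strictly in input order; stop at the first gap.
        while buffer.key?(next_index)
          puts buffer.delete(next_index)
          next_index += 1
        end
      end
    end

    # Simulated out-of-order completion: prints results 0, 1, 2, 3 in order.
    [1, 0, 3, 2].each { |i| record.call(i, "result #{i}") }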

How to rspec threaded code?

Having started using RSpec, I am having difficulty testing threaded code.
Here is a simplification of some code I found; I adapted it because I need a Queue with timeout capabilities:
require "thread"
class TimeoutQueue
def initialize
#lock = Mutex.new
#items = []
#new_item = ConditionVariable.new
end
def push(obj)
#lock.synchronize do
#items.push(obj)
#new_item.signal
end
end
def pop(timeout = :never)
timeout += Time.now unless timeout == :never
#lock.synchronize do
loop do
time_left = timeout == :never ? nil : timeout - Time.now
if #items.empty? and time_left.to_f >= 0
#new_item.wait(#lock, time_left)
end
return #items.shift unless #items.empty?
next if timeout == :never or timeout > Time.now
return nil
end
end
end
alias_method :<<, :push
end
But I can't find a way to test it using RSpec. Is there any effective documentation on testing threaded code? Is there a gem that can help me?
I'm a bit stuck; thanks in advance.
When unit testing we don't want any non-deterministic behavior to affect our tests, so when testing threading we should not run anything in parallel.
Instead, we should isolate our code and simulate the cases we want to test by stubbing @lock, @new_item, and perhaps even Time.now (for readability I've taken the liberty of imagining you also have attr_reader :lock, :items, :new_item):
    it 'should signal after push' do
      allow(subject.lock).to receive(:synchronize).and_yield
      expect(subject.new_item).to receive(:signal)

      subject.push('object')

      expect(subject.items).to include('object')
    end

    it 'should time out if it takes too long to enter the synchronize loop' do
      # Plain integers here; `5.seconds`/`10.seconds` would require ActiveSupport.
      @now = Time.now
      allow(Time).to receive(:now).and_return(@now, @now + 10)
      allow(subject.items).to receive(:empty?).and_return true
      allow(subject.lock).to receive(:synchronize).and_yield

      expect(subject.new_item).to_not receive(:wait)
      expect(subject.pop(5)).to be_nil
    end
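In the same spirit, the happy path can be covered deterministically as well (a sketch, using the same imagined attr_readers):

    it 'pops an already-queued item without waiting' do
      allow(subject.lock).to receive(:synchronize).and_yield
      subject.items.push('object')

      expect(subject.new_item).to_not receive(:wait)
      expect(subject.pop(:never)).to eq('object')
    end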
etc...

Ruby EventMachine & functions

I'm reading Redis sets within an EventMachine reactor loop using a suitable Redis EM gem ('em-hiredis' in my case) and have to check, in a cascade, whether some Redis sets contain members. My aim is to get the name of the first set that is not empty:
    require 'eventmachine'
    require 'em-hiredis'

    def fetch_queue
      @redis.scard('todo').callback do |scard_todo|
        if scard_todo.zero?
          @redis.scard('failed_1').callback do |scard_failed_1|
            if scard_failed_1.zero?
              @redis.scard('failed_2').callback do |scard_failed_2|
                if scard_failed_2.zero?
                  @redis.scard('failed_3').callback do |scard_failed_3|
                    if scard_failed_3.zero?
                      EM.stop
                    else
                      queue = 'failed_3'
                    end
                  end
                else
                  queue = 'failed_2'
                end
              end
            else
              queue = 'failed_1'
            end
          end
        else
          queue = 'todo'
        end
      end
    end

    EM.run do
      @redis = EM::Hiredis.connect "redis://#{HOST}:#{PORT}"
      # How to get the value of fetch_queue?
      foo = fetch_queue
      puts foo
    end
My question is: how can I tell EM to return the value of queue from fetch_queue so I can use it in the reactor loop? A simple return queue = 'todo', return queue = 'failed_1', etc. in fetch_queue results in an "unexpected return (LocalJumpError)" error.
Please, for the love of debugging, use some more methods; you wouldn't factor other code like this, would you?
Anyway, this is essentially what you probably want to do, so that you can both factor and test your code:
    require 'eventmachine'
    require 'em-hiredis'

    # This is a simple class that represents an extremely simple, linear state
    # machine. It just walks the "from" parameter one by one, until it finds a
    # non-empty set by that name. When a non-empty set is found, the given callback
    # is called with the name of the set.
    class Finder
      def initialize(redis, from, &callback)
        @redis    = redis
        @from     = from.dup
        @callback = callback
      end

      def do_next
        # If the from list is empty, we terminate, as we have no more steps.
        unless @current = @from.shift
          return EM.stop # or callback.call :error, whatever
        end

        @redis.scard(@current).callback do |scard|
          if scard.zero?
            do_next
          else
            @callback.call @current
          end
        end
      end

      alias go do_next
    end

    EM.run do
      @redis = EM::Hiredis.connect "redis://#{HOST}:#{PORT}"

      finder = Finder.new(@redis, %w[todo failed_1 failed_2 failed_3]) do |name|
        puts "Found non-empty set: #{name}"
      end

      finder.go
    end
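The underlying point is that an EventMachine program cannot return a value from code that has not run yet: fetch_queue returns long before any scard callback fires, so the result has to flow through a callback, as above, or through a deferrable. A hedged sketch of the deferrable variation (EM::Deferrable is part of EventMachine; DeferredFinder is a hypothetical name):

    # Expose the result as a deferrable instead of taking a block, so callers
    # subscribe with #callback in the usual EventMachine style.
    class DeferredFinder < Finder
      include EM::Deferrable

      def initialize(redis, from)
        super(redis, from) { |name| succeed(name) }
      end
    end

    EM.run do
      @redis = EM::Hiredis.connect "redis://#{HOST}:#{PORT}"

      finder = DeferredFinder.new(@redis, %w[todo failed_1 failed_2 failed_3])
      finder.callback { |name| puts "Found non-empty set: #{name}" }
      finder.go
    end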
