How to gracefully shut down a thread in Ruby

I have been experimenting with multi-threading in Ruby for the past week.
For practice, I am designing a file downloader that makes parallel requests for a collection of URLs. I now need to shut the threads down safely when an interrupt signal arrives. I have read the theory of multi-threading and of catching signals at runtime, yet despite all that theoretical knowledge I still have no idea how to use it in practice.
My proof-of-concept code is below.
class MultiThread
  attr_reader :limit, :threads, :queue
  def initialize(limit)
    @limit = limit
    @threads = []
    @queue = Queue.new
  end
  def add(*args, &block)
    queue << [block, args]
  end
  def invoke
    1.upto(limit).each { threads << spawn_thread }
    threads.each(&:join)
  end
  private
  def spawn_thread
    Thread.new do
      Thread.handle_interrupt(RuntimeError => :on_blocking) do
        # Nothing to do
      end
      until queue.empty?
        block, args = queue.pop
        block&.call(*args)
      end
    end
  end
end
urls = %w[https://example.com]
thread = MultiThread.new(2)
urls.each do |url|
  thread.add do
    puts "Downloading #{url}..."
    sleep 1
  end
end
thread.invoke

Yeah, the docs for handle_interrupt are confusing. Try this, which I based on the connection_pool gem used by e.g. puma.
$stdout.sync = true
threads = 3.times.map { |i|
  Thread.new {
    Thread.handle_interrupt(Exception => :never) do
      begin
        Thread.handle_interrupt(Exception => :immediate) do
          puts "Thread #{i} doing work"
          sleep 1000
        end
      ensure
        puts "Thread #{i} cleaning up"
      end
    end
  }
}
Signal.trap("INT") {
  puts 'Exiting gracefully'
  threads.each { |t|
    puts 'killing thread'
    t.kill
  }
  exit
}
threads.each { |t| t.join }
Output:
Thread 1 doing work
Thread 2 doing work
Thread 0 doing work
^CExiting gracefully
killing thread
killing thread
killing thread
Thread 0 cleaning up
Thread 1 cleaning up
Thread 2 cleaning up
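For the queue-based downloader in the question, a cooperative alternative to killing threads is to close the queue and let each worker finish what it is doing. The following is only a minimal sketch: DrainingPool and its method names are made up for illustration, and it assumes Ruby 2.3 or later, where Queue#close exists and Queue#pop returns nil once a closed queue is empty.

```ruby
# Hypothetical helper (not from the question): workers exit on their own
# once the queue has been closed and drained.
class DrainingPool
  def initialize(size)
    @queue = Queue.new
    @threads = Array.new(size) do
      Thread.new do
        # Queue#pop returns nil once the queue is closed and empty,
        # which ends the loop without interrupting a running job.
        while (job = @queue.pop)
          job.call
        end
      end
    end
  end

  # Enqueue a job; raises ClosedQueueError after shutdown has been called.
  def add(&block)
    @queue << block
  end

  # Stop accepting jobs, let queued jobs finish, then wait for the workers.
  def shutdown
    @queue.close
    @threads.each(&:join)
  end
end

pool = DrainingPool.new(2)
3.times { |i| pool.add { puts "job #{i} done" } }
pool.shutdown
```

A Signal.trap("INT") handler could then just record that a stop was requested (for example by setting a flag) and leave the call to shutdown to the main thread, so that no real work happens inside the trap handler.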

Related

Ruby synchronisation: How to make threads work one after another in proper order?

My problem is that I don't know how to synchronise multiple threads in Ruby. The task is to create six threads and start them immediately. Each of them should do some work (for example puts "Thread 1: Hi"), one after another, in the order I specify.
I've tried Mutex, Monitor and ConditionVariable, but with all of them the threads ran in random order. Could anybody explain how to achieve my goal?
After some time struggling with Mutex and ConditionVariable, I've achieved my goal.
This code is a little bit messy, and I intentionally didn't use loops, for a "clearer view".
cv = ConditionVariable.new
mutex = Mutex.new
mutex2 = Mutex.new
cv2 = ConditionVariable.new
mutex3 = Mutex.new
cv3 = ConditionVariable.new
mutex4 = Mutex.new
cv4 = ConditionVariable.new
mutex5 = Mutex.new
cv5 = ConditionVariable.new
mutex6 = Mutex.new
cv6 = ConditionVariable.new
Thread.new do
mutex.synchronize {
puts 'First: Hi'
cv.wait(mutex)
puts 'First: Bye'
#cv.wait(mutex)
cv.signal
puts 'First: One more time'
}
end
Thread.new do
mutex.synchronize {
puts 'Second: Hi'
cv.signal
cv.wait(mutex)
puts 'Second:Bye'
cv.signal
}
mutex2.synchronize {
puts 'Second: Starting third'
cv2.signal
}
end
Thread.new do
mutex2.synchronize {
cv2.wait(mutex2)
puts 'Third: Hi'
}
mutex3.synchronize {
puts 'Third: Starting forth'
cv3.signal
}
end
Thread.new do
mutex3.synchronize {
cv3.wait(mutex3)
puts 'Forth: Hi'
}
mutex4.synchronize {
puts 'Forth: Starting fifth'
cv4.signal
}
end
Thread.new do
mutex4.synchronize {
cv4.wait(mutex4)
puts 'Fifth: Hi'
}
mutex5.synchronize {
puts 'Fifth: Starting sixth'
cv5.signal
}
end
Thread.new {
mutex5.synchronize {
cv5.wait(mutex5)
puts 'Sixth:Hi'
}
}
sleep 2
Using Queue as a PV Semaphore
You can abuse Queue, using it like a traditional PV Semaphore. To do this, you create an instance of Queue:
require 'thread'
...
sem = Queue.new
When a thread needs to wait, it calls Queue#deq:
# waiting thread
sem.deq
When some other thread wants to unblock the waiting thread, it pushes something (anything) onto the queue:
# another thread that wants to unblock the waiting thread
sem.enq :go
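Putting the two halves together, here is a small sketch of the handshake. The method name semaphore_demo and the events log are only there for illustration; the sleep is not needed for correctness, it just makes it likely that the waiter is already blocked when the signal is sent.

```ruby
# Hypothetical demo: returns the order in which the two steps happened.
def semaphore_demo
  sem = Queue.new
  events = Queue.new # thread-safe log of what happened

  waiter = Thread.new do
    sem.deq # blocks until something is pushed
    events << :waiter_resumed
  end

  sleep 0.1 # give the waiter time to block (demonstration only)
  events << :about_to_signal
  sem.enq :go # unblocks the waiter
  waiter.join

  order = []
  order << events.deq until events.empty?
  order
end

p semaphore_demo # => [:about_to_signal, :waiter_resumed]
```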
A Worker class
Here's a worker class that uses Queue to synchronize its start and stop:
class Worker
  def initialize(worker_number)
    @start = Queue.new
    Thread.new do
      @start.deq
      puts "Thread #{worker_number}"
      @when_done.call
    end
  end
  def start
    @start.enq :start
  end
  def when_done(&block)
    @when_done = block
  end
end
When constructed, a worker creates a thread, but that thread then waits on the @start queue. Not until #start is called will the thread unblock.
When done, the thread will execute the block that was passed to #when_done. We'll see how this is used in just a moment.
Creating workers
First, let's make sure that if any threads raise an exception, we get to find out about it:
Thread.abort_on_exception = true
We'll need six workers:
workers = (1..6).map { |i| Worker.new(i) }
Telling each worker what to do when it's done
Here's where #when_done comes into play:
workers.each_cons(2) do |w1, w2|
w1.when_done { w2.start }
end
This takes each pair of workers in turn. Each worker except the last is told that, when it finishes, it should start the worker after it. That just leaves the last worker. When it finishes, we want it to notify this thread:
all_done = Queue.new
workers.last.when_done { all_done.enq :done }
Let's Go!
Now all that remains is to start the first thread:
workers.first.start
and wait for the last thread to finish:
all_done.deq
The output:
Thread 1
Thread 2
Thread 3
Thread 4
Thread 5
Thread 6
If you're just getting started with threads, you might want to try something simple. Let the 1st thread sleep for 1 second, the 2nd for 2 seconds, the 3rd for 3 seconds and so on:
$stdout.sync = true
threads = []
(1..6).each do |i|
threads << Thread.new {
sleep i
puts "Hi from thread #{i}"
}
end
threads.each(&:join)
Output (takes 6 seconds because the threads run in parallel):
Hi from thread 1
Hi from thread 2
Hi from thread 3
Hi from thread 4
Hi from thread 5
Hi from thread 6
You can assign each thread a number, which will denote its place in the queue, and check it to see whose turn it is:
class QueuedWorker
  def initialize(mutex, condition_variable, my_turn)
    @mutex = mutex
    @my_turn = my_turn
    @condition_variable = condition_variable
  end
  def self.turn
    @turn ||= 0
  end
  def self.done
    @turn = turn + 1
  end
  def run
    loop do
      @mutex.synchronize do
        if QueuedWorker.turn == @my_turn
          # do actual work
          QueuedWorker.done
          @condition_variable.signal
          return
        end
        @condition_variable.signal
        @condition_variable.wait(@mutex)
      end
    end
  end
end
mutex = Mutex.new
cv = ConditionVariable.new
threads = (0..10).map do |i|
  Thread.new do
    QueuedWorker.new(mutex, cv, i).run
  end
end
threads.each(&:join) # wait until every worker has taken its turn
That being said, the implementation is awkward, since threads are specifically not built for serial work. If you need something to work serially, do it in a single thread.
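If only the order of the output matters and not the order in which the work itself runs, one option is to let the threads run freely and collect their results in creation order. This is a sketch of that idea, not what the question literally asks for: ordered_results is a made-up name, and the random sleep merely stands in for real work.

```ruby
# Hypothetical helper: threads finish in any order, results come back in order.
def ordered_results(count)
  threads = (1..count).map do |i|
    Thread.new do
      sleep(rand * 0.05) # simulate work that finishes at an unpredictable time
      "Thread #{i}: Hi"
    end
  end
  # Thread#value joins the thread and returns its block's result, so mapping
  # over the array in creation order restores the intended order.
  threads.map(&:value)
end

puts ordered_results(6)
```

All printing happens in the calling thread here, which is in the spirit of the advice above: the serial part stays in a single thread.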

Ruby: Synchronizing fork pool output

I am trying to create a generic way of iterating Enumerables using multiple processors. I am spawning a given number of workers using fork and feeding them data to process, reusing idle workers. However, I would like to keep the output in the same order as the input. If job 1 and job 2 are started simultaneously and job 2 completes before job 1, the results come out in the wrong order. I would like to cache the output on the fly somehow to restore the order, but I fail to see how this can be done.
#!/usr/bin/env ruby
require 'pp'
DEBUG = false
CPUS = 2
module Enumerable
# Fork each (feach) creates a fork pool with a specified number of processes
# to iterate over the Enumerable object processing the specified block.
# Calling feach with :processes => 0 disables forking for debugging purposes.
# It is possible to disable synchronized output with :synchronize => false
# which will save some overhead.
#
# @example - process 10 elements using 4 processes:
#
# (0 ... 10).feach(:processes => 4) { |i| puts i; sleep 1 }
def feach(options = {}, &block)
$stderr.puts "Parent pid: #{Process.pid}" if DEBUG
procs = options[:processes] || 0
sync = options.fetch(:synchronize, true) # "|| true" would ignore :synchronize => false
if procs > 0
workers = spawn_workers(procs, &block)
threads = []
self.each_with_index do |elem, index|
$stderr.puts "elem: #{elem} index: #{index}" if DEBUG
threads << Thread.new do
worker = workers[index % procs]
worker.process(elem)
end
if threads.size == procs
threads.each { |thread| thread.join }
threads = []
end
end
threads.each { |thread| thread.join }
workers.each { |worker| worker.terminate }
else
self.each do |elem|
block.call(elem)
end
end
end
def spawn_workers(procs, &block)
workers = []
procs.times do
child_read, parent_write = IO.pipe
parent_read, child_write = IO.pipe
pid = Process.fork do
begin
parent_write.close
parent_read.close
call(child_read, child_write, &block)
ensure
child_read.close
child_write.close
end
end
child_read.close
child_write.close
$stderr.puts "Spawning worker with pid: #{pid}" if DEBUG
workers << Worker.new(parent_read, parent_write, pid)
end
workers
end
def call(child_read, child_write, &block)
while not child_read.eof?
elem = Marshal.load(child_read)
$stderr.puts " call with Process.pid: #{Process.pid}" if DEBUG
result = block.call(elem)
Marshal.dump(result, child_write)
end
end
class Worker
attr_reader :parent_read, :parent_write, :pid
def initialize(parent_read, parent_write, pid)
@parent_read = parent_read
@parent_write = parent_write
@pid = pid
end
def process(elem)
Marshal.dump(elem, @parent_write)
$stderr.puts " process with worker pid: #{@pid} and parent pid: #{Process.pid}" if DEBUG
Marshal.load(@parent_read)
end
def terminate
$stderr.puts "Terminating worker with pid: #{@pid}" if DEBUG
Process.wait(@pid, Process::WNOHANG)
@parent_read.close
@parent_write.close
end
end
end
def fib(n) n < 2 ? n : fib(n-1)+fib(n-2); end # Lousy Fibonacci calculator <- heavy job
(0 ... 10).feach(processes: CPUS) { |i| puts "#{i}: #{fib(35)}" }
There is no way to sync the output unless you force all the child processes to send their output to the parent and have it sort the results, or you enforce some kind of I/O locking between processes.
Without knowing what your long-term goal is, it's difficult to suggest a solution. In general, each process needs a lot of work to do before fork gives any significant speedup, and there is no simple way to get results back to the main program.
Native threads (pthreads on Linux) might make more sense for what you are trying to do; however, not all versions of Ruby support threads at that level. See:
Does ruby have real multithreading?
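To make the first option concrete (children send their results to the parent, and the parent does the ordering), here is a rough sketch. It assumes a platform where Process.fork is available, forks one short-lived child per element instead of reusing a pool, and fork_map_ordered is a made-up name.

```ruby
# Hypothetical helper: each child writes its result to its own pipe and the
# parent reads the pipes in input order, so completion order does not matter.
def fork_map_ordered(items, &block)
  children = items.map do |item|
    reader, writer = IO.pipe
    pid = Process.fork do
      reader.close
      Marshal.dump(block.call(item), writer)
      writer.close
      exit!(0) # skip at_exit handlers inherited from the parent
    end
    writer.close # the parent only reads from this pipe
    [pid, reader]
  end

  children.map do |pid, reader|
    result = Marshal.load(reader) # blocks until this child has written
    reader.close
    Process.wait(pid)
    result
  end
end

# The second child finishes first, yet the results keep the input order.
results = fork_map_ordered([3, 1, 2]) { |n| sleep(n * 0.05); n * 10 }
p results # => [30, 10, 20]
```

Because the block returns its value instead of printing it, the parent is the only process that writes to the terminal, which is what keeps the output ordered.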

EventMachine with em-synchrony I need to correctly throttle my http requests

I have a consumer which pulls messages off of a queue via an evented subscription. It takes those messages and then connects with a rather slow http interface. I have a worker pool of 8 and once those are all filled up I need to stop pulling requests from the queue and have the fibers that are working on the http jobs keep working. Here is an example I've thrown together.
def send_request(callback)
EM.synchrony do
while $available <= 0
sleep 2
puts "sleeping"
end
url = 'http://example.com/api/Restaurant/11111/images/?image%5Bremote_url%5D=https%3A%2F%2Firs2.4sqi.net%2Fimg%2Fgeneral%2Foriginal%2F8NMM4yhwsLfxF-wgW0GA8IJRJO8pY4qbmCXuOPEsUTU.jpg&image%5Bsource_type_enum%5D=3'
result = EM::Synchrony.sync EventMachine::HttpRequest.new(url, :inactivity_timeout => 0).send("apost", :head => {:Accept => 'services.v1'})
callback.call(result.response)
end
end
def display(value)
$available += 1
puts value.inspect
end
$available = 8
EM.run do
EM.add_periodic_timer(0.001) do
$available -= 1
puts "Available: #{$available}"
puts "Tick ..."
puts send_request(method(:display))
end
end
I have found that if I call sleep within a while loop in the synchrony block, the reactor loop gets stuck. If I call sleep within an if statement (sleeping just once), that is usually enough time for the requests to finish, but it is unreliable at best. If I use EM::Synchrony.sleep, then the main reactor loop will keep creating new requests.
Is there a way to pause the main loop but have the fibers finish their execution?
sleep 2
...
add_periodic_timer(0.001)
Are you serious?
Have you ever thought about how many send_request calls are sleeping in that loop? And the timer is adding 1000 more every second.
What about this:
require 'eventmachine'
require 'em-http'
require 'fiber'
class Worker
  URL = 'http://example.com/api/whatever'
  def initialize callback
    @callback = callback
  end
  def work
    f = Fiber.current
    loop do
      http = EventMachine::HttpRequest.new(URL).get :timeout => 20
      http.callback do
        @callback.call http.response
        f.resume
      end
      http.errback do
        f.resume
      end
      Fiber.yield
    end
  end
end
def display(value)
  puts "Done: #{value.size}"
end
EventMachine.run do
  8.times do
    Fiber.new do
      Worker.new(method(:display)).work
    end.resume
  end
end
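The same "fixed number of workers, stop pulling when they are all busy" idea can also be shown without EventMachine. The sketch below swaps in plain threads and the standard library's SizedQueue, so it illustrates the back-pressure principle rather than the fiber-based code above. throttled_each is a made-up name, and it assumes Ruby 2.3 or later for SizedQueue#close.

```ruby
# Hypothetical helper: at most `limit` items are processed at once, and the
# producer blocks as soon as `limit` further items are waiting in the queue.
# Items must not be nil or false, since nil marks the end of the work.
def throttled_each(items, limit:, &block)
  jobs = SizedQueue.new(limit)
  workers = Array.new(limit) do
    Thread.new do
      while (item = jobs.pop) # nil once the queue is closed and drained
        block.call(item)
      end
    end
  end
  items.each { |item| jobs.push(item) } # push blocks while the queue is full
  jobs.close
  workers.each(&:join)
end

throttled_each(%w[a b c d], limit: 2) { |name| puts "handled #{name}" }
```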

What use can I give to Ruby threads, if they are not really parallel?

When I first discovered threads, I tried checking that they actually worked as expected by calling sleep in many threads, versus calling sleep normally. It worked, and I was very happy.
But then a friend of mine told me that these threads weren't really parallel, and that sleep must be faking it.
So now I wrote this test to do some real processing:
class Test
ITERATIONS = 1000
def run_threads
start = Time.now
t1 = Thread.new do
do_iterations
end
t2 = Thread.new do
do_iterations
end
t3 = Thread.new do
do_iterations
end
t4 = Thread.new do
do_iterations
end
t1.join
t2.join
t3.join
t4.join
puts Time.now - start
end
def run_normal
start = Time.now
do_iterations
do_iterations
do_iterations
do_iterations
puts Time.now - start
end
def do_iterations
1.upto ITERATIONS do |i|
999.downto(1).inject(:*) # 999!
end
end
end
And now I'm very sad, because run_threads() not only didn't perform better than run_normal, it was even slower!
Then why should I complicate my application with threads, if they aren't really parallel?
** UPDATE **
@fl00r said that I could take advantage of threads if I used them for IO tasks, so I wrote two more variations of do_iterations:
def do_iterations
# filesystem IO
1.upto ITERATIONS do |i|
5.times do
# create file
content = "some content #{i}"
file_name = "#{Rails.root}/tmp/do-iterations-#{UUIDTools::UUID.timestamp_create.hexdigest}"
file = ::File.new file_name, 'w'
file.write content
file.close
# read and delete file
file = ::File.new file_name, 'r'
content = file.read
file.close
::File.delete file_name
end
end
end
def do_iterations
# MongoDB IO (through MongoID)
1.upto ITERATIONS do |i|
TestModel.create! :name => "some-name-#{i}"
end
TestModel.delete_all
end
The performance results are still the same: the normal version is faster than the threaded one.
But now I'm not sure if my VM is able to use all the cores. Will be back when I have tested that.
Threads can be faster only when there is some slow IO involved.
MRI (the standard Ruby interpreter) has a Global Interpreter Lock, so only one thread can run Ruby code at a time. On top of that, Ruby spends time deciding which thread should run at any given moment (thread scheduling). So in your case, where there is no IO at all, threads will be slower!
You can use Rubinius or JRuby to get real threads.
Example with IO:
module Test
extend self
def run_threads(method)
start = Time.now
threads = []
4.times do
threads << Thread.new{ send(method) }
end
threads.each(&:join)
puts Time.now - start
end
def run_forks(method)
start = Time.now
4.times do
fork do
send(method)
end
end
Process.waitall
puts Time.now - start
end
def run_normal(method)
start = Time.now
4.times{ send(method) }
puts Time.now - start
end
def do_io
system "sleep 1"
end
def do_non_io
1000.times do |i|
999.downto(1).inject(:*) # 999!
end
end
end
Test.run_threads(:do_io)
#=> ~ 1 sec
Test.run_forks(:do_io)
#=> ~ 1 sec
Test.run_normal(:do_io)
#=> ~ 4 sec
Test.run_threads(:do_non_io)
#=> ~ 7.6 sec
Test.run_forks(:do_non_io)
#=> ~ 3.5 sec
Test.run_normal(:do_non_io)
#=> ~ 7.2 sec
IO jobs are 4 times faster with threads and processes, while non-IO jobs are twice as fast with processes as with threads or plain sequential calls.
Ruby also has Fibers, lightweight "coroutines", and the awesome em-synchrony gem for handling asynchronous processes.
fl00r is right, the global interpreter lock prevents multiple threads from running at the same time in Ruby, except during IO.
The parallel library is a very simple library that is useful for truly parallel operations. Install with gem install parallel. Here is your example rewritten to use it:
require 'parallel'
class Test
ITERATIONS = 1000
def run_parallel()
start = Time.now
results = Parallel.map([1,2,3,4]) do |val|
do_iterations
end
# do what you want with the results ...
puts Time.now - start
end
def run_normal
start = Time.now
do_iterations
do_iterations
do_iterations
do_iterations
puts Time.now - start
end
def do_iterations
1.upto ITERATIONS do |i|
999.downto(1).inject(:*) # 999!
end
end
end
On my computer (4 cpus), Test.new.run_normal takes 4.6 seconds, while Test.new.run_parallel takes 1.65 seconds.
The behavior of threads is defined by the implementation. JRuby, for example, implements threads with JVM threads, which in turn uses real threads.
The Global Interpreter Lock is only there for historic reasons. If Ruby 1.9 had simply introduced real threads out of nowhere, backwards compatibility would have been broken, and it would have slowed down its adoption even more.
This answer by Jörg W Mittag provides an excellent comparison between the threading models of various Ruby implementations. Choose one which is appropriate for your needs.
With that said, threads can be used to wait for a child process to finish:
pid = Process.spawn 'program'
thread = Process.detach pid
# Later...
status = thread.value.exitstatus
Even if Threads don't execute in parallel they can be a very effective, simple way of accomplishing some tasks, such as in-process cron-type jobs. For example:
Thread.new{ loop{ download_nightly_logfile_data; sleep TWENTY_FOUR_HOURS } }
Thread.new{ loop{ send_email_from_queue; sleep ONE_MINUTE } }
# web server app that queues mail on actions and shows current log file data
I also use Threads in a DRb server to handle long-running calculations for one of my web applications. The web server starts a calculation in a thread and immediately continues responding to web requests. It can periodically peek in on the status of the job and see how it's progressing. For more details, read DRb Server for Long-Running Web Processes.
For a simple way to see the difference, use sleep instead of IO, which depends on too many variables:
class Test
ITERATIONS = 1000
def run_threads
start = Time.now
threads = []
20.times do
threads << Thread.new do
do_iterations
end
end
threads.each {|t| t.join } # also can be written: threads.each &:join
puts Time.now - start
end
def run_normal
start = Time.now
20.times do
do_iterations
end
puts Time.now - start
end
def do_iterations
sleep(10)
end
end
This shows a clear difference in favour of the threaded solution even on MRI, with the GIL.

Stop Ruby - jRuby - thread after a certain time

I'm trying to create a simple multithreaded program with jRuby. It needs to start and stop threads based on a specified amount of time e.g. run for five seconds then stop. I'm pretty new to this sort of stuff, so it's probably pretty basic but I can't get it to work.
The relevant code looks like this:
require 'java'
require 'timeout'
require './lib/t1.rb'
require './lib/t2.rb'
class Threads
[...]
def manage_threads
thread2 = T2.new
# Wait for 5 seconds before the thread starts running..
thread2.run(wait_time = 5)
Timeout::timeout(10) do
thread1 = T1.new {}
end
end
class T1 < Thread
def initialize
while super.status != "sleep"
puts "Thread 1"
sleep(1)
end
end
end
class T2
include java.lang.Runnable
def run wait_time
thread = Thread.new do
sleep(wait_time)
loop do
puts "Thread 2"
sleep(1)
end
end
end
def stop_thread(after_run_time)
sleep(after_run_time)
end
end
I have already tried a couple of things, for example:
# Used timeout
Timeout::timeout(10) do
thread1 = T1.new {}
end
# This kinda works, except that it terminates the program and therefore isn't the behavior
# I want.
Does anyone have a suggestion on how to: 1. start a thread and run it for a while; 2. start a second thread and run both threads in parallel; 3. stop thread 1 but keep thread 2 running? Any tips/suggestions would be appreciated.
I think I solved it.
This did the trick:
def run wait_time
thread = Thread.new do
sleep(wait_time)
second_counter = 0
loop do
puts "Thread 2"
second_counter += 1
if second_counter == 15
sleep
end
sleep(1)
end
end
end
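Another way to get "run for N seconds, then stop" is to give each thread its own deadline and let it end by itself, instead of parking it in an endless sleep. This is only a sketch: run_for and its interval parameter are made-up names, and it relies on nothing beyond Thread and Process.clock_gettime from core Ruby.

```ruby
# Hypothetical helper: calls the block repeatedly until `seconds` have elapsed,
# then lets the thread finish normally.
def run_for(seconds, interval: 0.05, &block)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + seconds
  Thread.new do
    ticks = 0
    while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
      block.call
      ticks += 1
      sleep interval
    end
    ticks # becomes the thread's value
  end
end

t1 = run_for(0.2) { puts "Thread 1" }
t2 = run_for(0.4) { puts "Thread 2" } # keeps running after t1 has stopped
[t1, t2].each(&:join)
```

Since each thread checks its own deadline, stopping thread 1 has no effect on thread 2, and neither thread has to be killed from outside.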
