Ruby threading pass control to main - ruby

I am writing an application in Ruby which creates a new thread for every new job. It works like a queue manager: I check in a database how many threads may be started. When a thread finishes, I want to call the method that starts a new job (i.e. a new thread). I do not want to create nested threads, so is there any way to join/terminate/exit the calling thread and pass control over to the main thread? Just to make the situation clear, there can be other threads running at this time.
I tried simply joining the calling thread if it's not the main thread, and I get the following error:
"thread 0x7f8cf8dcf438 tried to join itself"
Any suggestions will be highly appreciated.
Thanks in advance.

I'd propose two solutions.
The first one is indeed to join a thread, but join has to be called from the main thread (assuming you started all of your worker threads from the main thread):
def thread_proc(s)
  sleep rand(5)
  puts "#{Thread.current.inspect}: #{s}"
end
strings = ["word", "test", "again", "value", "fox", "car"]
threads = []
2.times {
  threads << Thread.new(strings.shift) { |s| thread_proc(s) }
}
# Join the oldest thread, then start a replacement from the main thread.
# (Deleting from threads while iterating over it with each can skip elements.)
until threads.empty?
  threads.shift.join
  threads << Thread.new(strings.shift) { |s| thread_proc(s) } unless strings.empty?
end
But that method is rather inefficient, because creating threads over and over again costs memory and CPU time.
It is better to reuse a fixed pool of threads that pull their work from a Queue:
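require 'thread'
strings = ["word", "test", "again", "value", "fox", "car"]
q = Queue.new
strings.each { |s| q << s }
threads = []
2.times { threads << Thread.new {
  # Non-blocking pop: raises ThreadError once the queue is empty, which the
  # rescue turns into nil and ends the loop. Checking q.empty? and then calling
  # a blocking pop could hang if another thread takes the last item in between.
  while (s = q.pop(true) rescue nil)
    sleep(rand(5))
    puts "#{Thread.current.inspect}: #{s}"
  end
}}
threads.each { |t| t.join }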
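If you do want one thread per job, as in the question, the following is a minimal sketch of keeping all thread creation in the main thread: workers never start threads themselves, they only push onto a second Queue when they finish, and the main thread reacts to that. The names done, start_job and max_running are made up for illustration, and the fixed limit of 2 stands in for the value the question reads from a database.

```ruby
require 'thread'

jobs = ["word", "test", "again", "value", "fox", "car"]  # example payloads
max_running = 2          # assumed limit; the question reads it from a database
done = Queue.new         # workers announce completion here
finished = []            # filled only by the main thread
running = 0

start_job = lambda do |job|
  Thread.new(job) do |j|
    begin
      sleep(rand * 0.1)  # stand-in for real work
    ensure
      done << j          # always notify the main thread, even on error
    end
  end
end

# Only the main thread runs this loop, so no worker ever starts another thread.
until jobs.empty? && running.zero?
  while running < max_running && !jobs.empty?
    start_job.call(jobs.shift)
    running += 1
  end
  finished << done.pop   # blocks until some worker finishes
  running -= 1
end

puts "finished: #{finished.sort.inspect}"
```

Because a worker only pushes onto the queue and then exits, it never has to join itself, which avoids the "tried to join itself" error.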

t1 = Thread.new { Thread.current[:status] = "1"; sleep 10; Thread.pass; sleep 100 }
t2 = Thread.new { Thread.current[:status] = "2"; sleep 1000 }
t3 = Thread.new { Thread.current[:status] = "3"; sleep 1000 }
sleep 0.1 # give the threads a moment to set their status
puts Thread.list.map { |x| x[:status] }.compact # compact drops the main thread, which has no :status
#=> 1,2,3
Thread.list.each do |x|
  if x[:status] == "2" # the status was stored as a string
    x.kill # kill the thread
    x.join # wait until it has actually terminated
    break
  end
end
puts Thread.list.map { |x| x[:status] }.compact
#=> 1,3
"Thread::pass" hands control to the scheduler, which can then schedule any other thread. The thread voluntarily gives up control; we cannot specify which thread receives it.
"Thread#kill" terminates the thread it is called on.
"Thread::list" returns the list of live threads.
Threads are managed by the scheduler; if you want explicit control, check out fibers. They have some gotchas, though: at the time of writing, fibers were not supported in JRuby.
Also check out thread-local variables; they let a thread communicate its status or return value without you having to join it.
http://github.com/defunkt/resque is a good option for a queue, check it out. Also try JRuby if you are going to make heavy use of threads. Its advantage is that it wraps Java threads in Ruby goodness.
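As a small illustration of the last two points, here is a sketch that reads a thread's return value with Thread#value and its status through a thread-local key. The :status key and the arithmetic are placeholders. Note that value does wait for the thread to finish, whereas worker[:status] can be read at any time.

```ruby
worker = Thread.new do
  Thread.current[:status] = "working"   # readable from other threads via worker[:status]
  result = 6 * 7
  Thread.current[:status] = "done"
  result                                # the block's return value
end

answer = worker.value     # waits for the thread, then returns the block's result
status = worker[:status]
puts "answer=#{answer} status=#{status}"
```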

Related

Ruby and Threads - Why isn't my script finishing?

I have a script that provisions nodes for a cluster remotely using Thread so I can do all of them at once.
Is there a better way I can do this so I get a return code, a clean exit, and a script that does not seem to run for an inordinate amount of time?
@threads = []
def process_node(apps_to_close, node)
  @threads << Thread.new do
    puts "now processing node #{node}\n"
    ...
  end
end
nodes.each do |node|
  process_node(apps_to_close, node)
end
@threads.each {|thr| thr.join}
You can set a limit on how long you wait for each thread by passing a timeout to join, which returns nil if the thread is still running when the timeout is reached. Note that the timeout applies to each join call separately, so the total wait can be up to one second per thread. You could rewrite this as:
...
results = @threads.collect { |thr| thr.join(1) } # wait up to 1 second for each thread
@threads.each { |t| Thread.kill(t) }
results.any?(&:nil?) ? exit(1) : exit(0)
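If you would rather cap the total waiting time than the time per thread, one option is to compute a single deadline and give each join only the time that remains. This is a sketch under assumptions: the two sleeping threads and the 0.5 second budget are invented for the example, and the exit code is only computed, not passed to exit.

```ruby
# Hypothetical workloads: one quick, one that would overrun the deadline.
threads = [
  Thread.new { sleep 0.05; :ok },
  Thread.new { sleep 5; :ok }
]

deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + 0.5  # total budget in seconds

results = threads.map do |thr|
  remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
  thr.join([remaining, 0].max)   # returns nil if the thread is still running
end

timed_out = threads.zip(results).select { |_, r| r.nil? }.map(&:first)
timed_out.each(&:kill)           # stop only the stragglers
exit_code = timed_out.empty? ? 0 : 1
puts "exit code would be #{exit_code}"
```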

Ruby: Wait for all threads completed using join and ThreadsWait.all_waits - what the difference?

Consider the following example:
threads = []
(0..10).each do |_|
threads << Thread.new do
# do async stuff there
sleep Random.rand(10)
end
end
Then there are two ways to wait until they are done:
Using join:
threads.each(&:join)
Using ThreadsWait:
ThreadsWait.all_waits(threads)
Is there any difference between these two ways of doing this?
I know that the ThreadsWait class has other useful methods; I am asking specifically about the all_waits method.
The documentation clearly states that all_waits will execute any passed block after each thread's execution; join doesn't offer anything like this.
require "thwait"
threads = [Thread.new { 1 }, Thread.new { 2 }]
ThreadsWait.all_waits(threads) do |t|
puts "#{t} complete."
end # will return nil
# output:
# #<Thread:0x00000002773268> complete.
# #<Thread:0x00000002772ea8> complete.
To accomplish the same with join, I imagine you would have to do this:
threads.each do |t|
t.join
puts "#{t} complete."
end # will return threads
Apart from this, the all_waits method eventually calls the join_nowait method, which processes each thread by calling join on it.
Without a block, I would expect calling join directly to be faster, since you skip all the ThreadsWait method calls leading up to it. So I gave it a shot:
require "thwait"
require "benchmark"
loops = 100_000
Benchmark.bm do |x|
x.report do
loops.times do
threads = [Thread.new { 2 * 1000 }, Thread.new { 4 * 2000 }]
threads.each(&:join)
end
end
x.report do
loops.times do
threads = [Thread.new { 2 * 1000 }, Thread.new { 4 * 2000 }]
ThreadsWait.all_waits(threads)
end
end
end
# results:
# user system total real
# 4.030000 5.750000 9.780000 ( 5.929623 )
# 12.810000 17.060000 29.870000 ( 17.807242 )
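The per-thread callback can also be approximated without thwait. The sketch below is not how ThreadsWait is implemented; it is a plain-Ruby alternative in which each thread pushes itself onto a Queue when it finishes, so the main thread handles threads in completion order rather than creation order. The delays and the names done and order are made up for the example.

```ruby
require 'thread'

done = Queue.new
delays = [0.3, 0.05, 0.15]        # hypothetical workloads, in seconds

threads = delays.map do |d|
  Thread.new do
    begin
      sleep d
    ensure
      done << Thread.current      # announce completion, even on error
    end
  end
end

order = []
threads.size.times do
  t = done.pop                    # next thread to finish, whichever it is
  t.join
  order << threads.index(t)
  puts "thread #{threads.index(t)} complete."
end
```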
Using map instead of each also works: calling value on each thread waits for it to finish, because its result is needed to build the resulting array.
(0..10).map do |_|
Thread.new do
# do async stuff there
sleep Random.rand(10)
end
end.map(&:join).map(&:value)

How to wait for all threads to complete before executing next line

I have something like below:
all_hosts.each do |hostname|
Thread.new {
...
}
end
# next line of execution
Each of the hosts above opens its own thread and executes the commands. I want to wait for all threads to finish executing before moving onto next part of file. Is there an easy way of doing this?
Use Thread#join, which waits for the thread to terminate.
To do that you need to keep references to the threads, so use map instead of each:
threads = all_hosts.map do |hostname|
Thread.new {
# commands
}
end
threads.each(&:join)
The Thread documentation explains it:
Alternatively, you can use an array for handling multiple threads at once, like in the following example:
threads = []
threads << Thread.new { puts "Whats the big deal" }
threads << Thread.new { 3.times { puts "Threads are fun!" } }
After creating a few threads we wait for them all to finish consecutively.
threads.each { |thr| thr.join }
Applied to your code:
threads = []
all_hosts.each do |hostname|
threads << Thread.new { ... }
end
threads.each(&:join)
# next line of execution
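One thing worth knowing when joining: if a thread died with an exception, join re-raises that exception in the joining thread. The following sketch collects such failures instead of letting the first one abort the loop. The failing thread and its message are invented for the example, and report_on_exception (available in recent Ruby versions) is switched off only to keep stderr quiet.

```ruby
threads = [
  Thread.new { :fine },
  Thread.new do
    Thread.current.report_on_exception = false  # silence the default stderr report
    raise ArgumentError, "host unreachable"     # hypothetical failure
  end
]

failures = []
threads.each do |t|
  begin
    t.join                 # re-raises the exception that killed the thread, if any
  rescue => e
    failures << e.message
  end
end

puts "failures: #{failures.inspect}"
```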

Disposing of a thread in ruby or jruby

I have a rabbitmq queue subscriber that spins up a new thread every time a new message is consumed:
AMQP.start(@conf) do |connection|
channel = AMQP::Channel.new(connection)
requests_queue = channel.queue("one")
requests_queue.subscribe(:ack => true) do |header, body|
puts "we have a message at #{Time.now} and n is #{n}"
url_search = MultiJson.decode(body)
Thread.new do
5.times do
lead = get_lead(n, (n == 5))
puts "message #{n} is_last = #{lead.is_last} at #{Time.now}";
AMQP::Exchange.default.publish(
MultiJson.encode(lead),
:routing_key => header.reply_to,
:correlation_id => header.correlation_id
)
n += 1
sleep(2)
end
end
end
end
My question is, how do I dispose of the thread after the message is handled? Should I be using the threadpool?
I am using JRuby. Does the above code create a Java JVM thread behind the scenes using the normal ruby syntax or should I be explicitly creating a Java thread?
I don't think you have to dispose of the thread manually, and you should use Ruby threads. From what I gather, they are Java threads in JRuby, which is where JRuby gets its nice performance.
A common thing to do is to spin up a couple of threads and then join all of them before continuing if you want to be sure that all are complete before the next step, but it doesn't seem to be required here.
Here's a little test program:
# foo.rb
a = Thread.new { print "a"; sleep(1); print "b"; print "c" }
require 'pp'
pp Thread.list
puts "foo"
sleep(2);
pp Thread.list
puts "bar"
As you can see, the spawned background thread is automatically removed. (Tested in JRuby as well as 1.9.2.)
$ ruby foo.rb
a[#<Thread:0x00000100887678 run>, #<Thread:0x0000010086c7d8 sleep>]
foo
bc[#<Thread:0x00000100887678 run>]
bar
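The same thing can be checked programmatically rather than by reading the Thread.list output. A minimal sketch:

```ruby
t = Thread.new { :finished }
t.join                         # wait for the block to return

alive  = t.alive?              # false once the block has returned
status = t.status              # false means "terminated normally"
listed = Thread.list.include?(t)

puts "alive=#{alive} status=#{status.inspect} listed=#{listed}"
```

Once nothing references the Thread object any more, it is garbage collected like any other object.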

Thread and Queue

I am interested in knowing what would be the best way to implement a thread based queue.
For example:
I have 10 actions which I want to execute with only 4 threads. I would like to create a queue with all 10 actions placed in order and start the first 4 actions on 4 threads. Once one of the threads is done executing, the next action starts, and so on, so that at any time the number of running threads is 4 or fewer.
There is a Queue class in thread in the standard library. Using that you can do something like this:
require 'thread'
queue = Queue.new
threads = []
# add work to the queue
queue << work_unit
4.times do
threads << Thread.new do
# loop until there are no more things to do
until queue.empty?
# pop with the non-blocking flag set, this raises
# an exception if the queue is empty, in which case
# work_unit will be set to nil
work_unit = queue.pop(true) rescue nil
if work_unit
# do work
end
end
# when there is no more work, the thread will stop
end
end
# wait until all threads have completed processing
threads.each { |t| t.join }
The reason I pop with the non-blocking flag is that between the until queue.empty? and the pop another thread may have pop'ed the queue, so unless the non-blocking flag is set we could get stuck at that line forever.
If you're using MRI, the default Ruby interpreter, bear in mind that threads will not run truly in parallel, because a global interpreter lock lets only one thread execute Ruby code at a time. If your work is CPU bound you may just as well run single threaded. If you have some operation that blocks on IO you may get some parallelism, but YMMV. Alternatively, you can use an interpreter that lets threads run in parallel, such as JRuby or Rubinius.
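To see the effect for blocking operations, here is a small sketch that uses sleep as a stand-in for blocking IO and compares running four waits one after another with running them in four threads. The 0.2 second duration is arbitrary, and actual timings will vary by machine.

```ruby
require 'benchmark'

# Four blocking waits, one after another.
serial = Benchmark.realtime { 4.times { sleep 0.2 } }

# The same four waits, each in its own thread.
threaded = Benchmark.realtime do
  4.times.map { Thread.new { sleep 0.2 } }.each(&:join)
end

puts format("serial: %.2fs, threaded: %.2fs", serial, threaded)
```

The threaded run should take roughly as long as a single wait, since sleeping threads do not hold the interpreter lock. A CPU-bound block in place of sleep would not show this gain on MRI.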
There are a few gems that implement this pattern for you: parallel, peach, and mine, which is called threach (or jruby_threach under JRuby). It's a drop-in replacement for #each but allows you to specify how many threads to run with, using a SizedQueue underneath to keep things from spiraling out of control.
So...
(1..10).threach(4) {|i| do_my_work(i) }
Not pushing my own stuff; there are plenty of good implementations out there to make things easier.
If you're using JRuby, jruby_threach is a much better implementation -- Java just offers a much richer set of threading primitives and data structures to use.
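For readers who have not used SizedQueue, this is a rough sketch of the general idea, not threach's actual implementation: the producer blocks whenever the queue is full, so work is never queued up faster than the workers consume it. The queue size of 2, the :stop sentinel and the squaring "work" are all assumptions made for the example.

```ruby
require 'thread'

queue = SizedQueue.new(2)        # producer blocks once 2 items are waiting
results = Queue.new

workers = 4.times.map do
  Thread.new do
    # A blocking pop is safe here because every worker receives a :stop sentinel.
    while (item = queue.pop) != :stop
      results << item * item     # stand-in for real work
    end
  end
end

(1..10).each { |i| queue << i }        # blocks whenever the queue is full
workers.size.times { queue << :stop }  # one sentinel per worker
workers.each(&:join)

squares = []
squares << results.pop until results.empty?
puts squares.sort.inspect
```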
Executable descriptive example:
require 'thread'
p tasks = [
{:file => 'task1'},
{:file => 'task2'},
{:file => 'task3'},
{:file => 'task4'},
{:file => 'task5'}
]
tasks_queue = Queue.new
tasks.each {|task| tasks_queue << task}
# run workers
workers_count = 3
workers = []
workers_count.times do |n|
workers << Thread.new(n+1) do |my_n|
while (task = tasks_queue.shift(true) rescue nil) do
delay = rand(0)
sleep delay
task[:result] = "done by worker ##{my_n} (in #{delay})"
p task
end
end
end
# wait for all threads
workers.each(&:join)
# output results
puts "all done"
p tasks
You could use a thread pool. It's a fairly common pattern for this type of problem.
http://en.wikipedia.org/wiki/Thread_pool_pattern
Github seems to have a few implementations you could try out:
https://github.com/search?type=Everything&language=Ruby&q=thread+pool
Celluloid has a worker pool example that does this.
I use a gem called work_queue. It's really practical.
Example:
require 'work_queue'
wq = WorkQueue.new 4, 10
(1..10).each do |number|
wq.enqueue_b("Thread#{number}") do |thread_name|
puts "Hello from the #{thread_name}"
end
end
wq.join
