How to wait for all threads to complete before executing next line - ruby

I have something like below:
all_hosts.each do |hostname|
Thread.new {
...
}
end
# next line of execution
Each of the hosts above opens its own thread and executes the commands. I want to wait for all threads to finish executing before moving on to the next part of the file. Is there an easy way of doing this?

Use Thread#join, which waits for the thread to terminate.
To do that you need to keep references to the threads, so use map instead of each:
threads = all_hosts.map do |hostname|
Thread.new {
# commands
}
end
threads.each(&:join)
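As a minimal runnable sketch of the same pattern: the host names and the check_host method below are hypothetical placeholders for the real per-host commands, and Thread#value is used in place of a bare join so that each block's return value is collected as well.

```ruby
# Hypothetical host list and per-host work; replace with the real commands.
all_hosts = ["alpha", "beta", "gamma"]

def check_host(hostname)
  sleep 0.05                 # stands in for slow remote work
  "#{hostname}: ok"
end

# map (not each) so the Thread objects are kept.
threads = all_hosts.map do |hostname|
  Thread.new { check_host(hostname) }
end

# value waits for the thread (like join) and returns the block's result.
results = threads.map(&:value)

# Execution reaches this line only after every thread has finished.
puts results
```

The results come back in the order of all_hosts, not in the order the threads happened to finish, because map walks the threads array in sequence.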

The Thread documentation explains it:
Alternatively, you can use an array for handling multiple threads at once, like in the following example:
threads = []
threads << Thread.new { puts "Whats the big deal" }
threads << Thread.new { 3.times { puts "Threads are fun!" } }
After creating a few threads we wait for them all to finish consecutively.
threads.each { |thr| thr.join }
Applied to your code:
threads = []
all_hosts.each do |hostname|
threads << Thread.new { ... }
end
threads.each(&:join)
# next line of execution

Related

Ruby and Threads - Why isn't my script finishing?

I have a script that provisions nodes for a cluster remotely using Thread so I can do all of them at once.
Is there a better way I can do this so I get a return code, a clean exit, and a script that does not seem to run for an inordinate amount of time?
@threads = []
def process_node(apps_to_close, node)
@threads << Thread.new do
puts "now processing node #{node}\n"
...
end
end
nodes.each do |node|
process_node(apps_to_close, node)
end
@threads.each {|thr| thr.join}
You can set a limit on the amount of time you allow threads to run by passing a timeout to join, which returns nil if the thread is still running when the timeout is reached. So you could rewrite this as:
...
results = @threads.collect { |thr| thr.join(1) } # wait up to 1 second for each thread
@threads.each { |t| Thread.kill(t) }
results.any?(&:nil?) ? exit(1) : exit(0)
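To make the return values of a timed join concrete, here is a small sketch with two made-up workloads (one quick, one deliberately too slow). It reports the outcome with puts instead of calling exit so it can be pasted into irb.

```ruby
# Hypothetical workloads: one finishes quickly, one takes too long.
fast = Thread.new { sleep 0.01; :done }
slow = Thread.new { sleep 5; :never }
threads = [fast, slow]

# join(limit) returns the thread itself if it finished in time,
# or nil if it was still running when the limit expired.
results = threads.map { |t| t.join(0.5) }

# Stop whatever is still running, then wait for the kill to take effect.
threads.each(&:kill)
threads.each(&:join)

timed_out = results.any?(&:nil?)
puts timed_out ? "at least one thread timed out" : "all threads finished"
```

Note that the joins run one after another, so the limits add up: with N threads the worst-case wait is N times the limit, not the limit itself.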

Ruby: Wait for all threads completed using join and ThreadsWait.all_waits - what the difference?

Consider the following example:
threads = []
(0..10).each do |_|
threads << Thread.new do
# do async stuff there
sleep Random.rand(10)
end
end
Then there are two ways to wait until they are done:
Using join:
threads.each(&:join)
Using ThreadsWait:
ThreadsWait.all_waits(threads)
Is there any difference between these two ways of doing this?
I know that the ThreadsWait class has other useful methods.
I'm asking specifically about the all_waits method.
The documentation clearly states that all_waits will execute any passed block after each thread's execution; join doesn't offer anything like this.
require "thwait"
threads = [Thread.new { 1 }, Thread.new { 2 }]
ThreadsWait.all_waits(threads) do |t|
puts "#{t} complete."
end # will return nil
# output:
# #<Thread:0x00000002773268> complete.
# #<Thread:0x00000002772ea8> complete.
To accomplish the same with join, I imagine you would have to do this:
threads.each do |t|
t.join
puts "#{t} complete."
end # will return threads
Apart from this, the all_waits method eventually calls the join_nowait method, which processes each thread by calling join on it.
Without any block, I would imagine that directly using join would be faster since you would cut back on all ThreadsWait methods leading up to it. So I gave it a shot:
require "thwait"
require "benchmark"
loops = 100_000
Benchmark.bm do |x|
x.report do
loops.times do
threads = [Thread.new { 2 * 1000 }, Thread.new { 4 * 2000 }]
threads.each(&:join)
end
end
x.report do
loops.times do
threads = [Thread.new { 2 * 1000 }, Thread.new { 4 * 2000 }]
ThreadsWait.all_waits(threads)
end
end
end
# results:
# user system total real
# 4.030000 5.750000 9.780000 ( 5.929623 )
# 12.810000 17.060000 29.870000 ( 17.807242 )
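For readers who cannot load thwait (as far as I know it stopped shipping with Ruby in 2.7 and has to be installed as a gem), the "run something as each thread completes" behaviour can be approximated with core classes only. This is a sketch under my own assumptions, not how ThreadsWait is implemented: each thread pushes itself onto a Queue when it is done, and the main thread pops from that queue. The delays are arbitrary.

```ruby
done = Queue.new

# Each thread sleeps for a different (arbitrary) time and returns that delay.
threads = [0.3, 0.05, 0.15].map do |delay|
  Thread.new do
    begin
      sleep delay
      delay
    ensure
      done << Thread.current   # announce completion, even on error
    end
  end
end

finished_order = []
threads.size.times do
  t = done.pop                 # blocks until some thread announces itself
  t.join                       # make sure it has fully terminated
  finished_order << t.value
  puts "#{t} complete."
end
```

Unlike threads.each { |t| t.join; ... }, which reports threads in array order, this handles them in the order they actually finish.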
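If you build the threads with map instead of each, you can chain map(&:join).map(&:value) onto the result to wait for every thread and collect each block's return value. Thread#value already waits for the thread, so the join step is optional: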
(0..10).map do |_|
Thread.new do
# do async stuff there
sleep Random.rand(10)
end
end.map(&:join).map(&:value)

Ruby Parallel each loop

I have the following code:
FTP ... do |ftp|
files.each do |file|
...
ftp.put(file)
sleep 1
end
end
I'd like to process each file in a separate thread or in some other parallel way. What's the correct way to do this? Would this be right?
Here's my try on the parallel gem
FTP ... do |ftp|
Parallel.map(files) do |file|
...
ftp.put(file)
sleep 1
end
end
The issue with parallel is that output from puts can occur at the same time and get interleaved, like so:
as = [1,2,3,4,5,6,7,8]
results = Parallel.map(as) do |a|
puts a
end
How can I force puts to print the way it normally would, one line per call?
The whole point of parallelization is to run at the same time. But if there's some part of the code that you'd like to run sequentially, you could use a mutex like:
semaphore = Mutex.new
as = [1,2,3,4,5,6,7,8]
results = Parallel.map(as, in_threads: 3) do |a|
# Parallel stuff
sleep rand
semaphore.synchronize {
# Sequential stuff
puts a
}
# Parallel stuff
sleep rand
end
You'll see that it prints each value on its own line, but not necessarily in the original order. I used in_threads instead of in_processes (the default) because Mutex doesn't work across processes. See below for an alternative if you do need processes.
References:
http://ruby-doc.org/core-2.2.0/Mutex.html
http://dev.housetrip.com/2014/01/28/efficient-cross-processing-locking-in-ruby/
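The same locking idea can be tried without installing the parallel gem. The sketch below swaps Parallel.map for plain core threads (one per element, which is my simplification, not what the gem does) and records what was printed so the effect can be inspected afterwards.

```ruby
lock = Mutex.new
printed = []                  # records the order in which values were printed

threads = (1..8).map do |a|
  Thread.new do
    sleep rand * 0.05         # parallel part: runs concurrently
    lock.synchronize do       # sequential part: one thread at a time
      puts a
      printed << a
    end
  end
end

threads.each(&:join)
```

Every value ends up on its own line, but the order in printed depends on the random sleeps and will usually differ between runs.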
In the interest of keeping it simple, here's what I'd do with built-in Thread:
threads = files.map do |file|
  Thread.new do
    ftp.put(file)
  end
end
# value waits for each thread and returns its block's result
results = threads.map(&:value)
Note that this code assumes that ftp.put(file) returns safely. If that isn't guaranteed, you'll have to handle it yourself: wrap each call in a timeout block, have each thread return the exception if one is raised, and after the loop do a blocking check that results does not contain any exceptions.
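A sketch of that timeout-and-collect approach follows. The upload method is a hypothetical stand-in for ftp.put(file), the file names are invented, and the 2-second limit is arbitrary; only Timeout.timeout and Thread#value are real library calls here.

```ruby
require "timeout"

# Hypothetical stand-in for ftp.put(file); "bad.txt" simulates a failure.
def upload(file)
  raise IOError, "upload failed: #{file}" if file == "bad.txt"
  sleep 0.01
  file
end

files = ["a.txt", "bad.txt", "c.txt"]

threads = files.map do |file|
  Thread.new do
    begin
      Timeout.timeout(2) { upload(file) }   # raises Timeout::Error if too slow
    rescue StandardError => e
      e                                     # the exception becomes the thread's value
    end
  end
end

# Blocking check at the very end: value waits for each thread.
results = threads.map(&:value)
failures = results.grep(Exception)

puts failures.empty? ? "all uploads succeeded" : "#{failures.size} upload(s) failed"
```

Returning the exception instead of letting it escape keeps one failed upload from raising out of value and hiding the results of the other threads.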

Ruby multithreading questions

I've started looking into multi-threading in Ruby.
So basically, I want to create a few threads, and have them all execute, but not display any of the output until the thread has successfully completed.
Example:
#!/usr/bin/env ruby
t1 = Thread.new {
puts "Hello_1"
sleep(5)
puts "Hello_1 after 5 seconds of sleep"
}
t2 = Thread.new {
puts "Hello_2"
sleep(5)
puts "Hello_2 after 5 seconds of sleep"
}
t1.join
t2.join
puts "Hello_3"
sleep(5)
puts "Hello_3 after 5 seconds of sleep"
The first Hello_1 / Hello_2 execute immediately. I wouldn't want any of the output to show until the thread has successfully completed.
Because puts prints to a single output stream (stdout), you can't use it directly if you want to capture the output of each thread.
You will have to use a separate buffered stream for each thread, write to that stream inside the thread, and then dump it to stdout when the thread terminates.
Here is an example of a thread:
require "stringio"
t = Thread.new do
io = StringIO.new
io << "mary"
io.puts "fred"
io.puts "fred"
puts io.string
end
You will have to pass io to every method in the thread.
Alternatively, you can create a module that redirects stdout for a thread.
Then, in each thread that you start, wrap your code with:
Thread.start do
# capture the STDOUT by storing a StringIO in the thread space
Thread.current[:stdout] = StringIO.new
# Do your stuff.. print using puts
puts 'redirected to StringIO'
# print everything before we exit
STDOUT.puts Thread.current[:stdout].string
end.join
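A variant of the per-thread buffer idea that needs no stdout redirection: each thread writes to its own StringIO and returns the buffered text as its value, and the main thread prints everything after the threads have finished. The thread names and the short sleep are illustrative only.

```ruby
require "stringio"

threads = %w[Hello_1 Hello_2].map do |name|
  Thread.new do
    out = StringIO.new              # private buffer for this thread
    out.puts name
    sleep 0.05                      # stands in for real work
    out.puts "#{name} after a short sleep"
    out.string                      # buffered text becomes the thread's value
  end
end

# value waits for each thread, so nothing is printed until it has completed.
outputs = threads.map(&:value)
outputs.each { |text| puts text }
```

Because the buffers are never shared, no Mutex is needed, and each thread's lines stay together in the final output.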
You can share a buffer but you should 'synchronize' access to it:
buffer = ""
lock = Mutex.new
t1 = Thread.new {
lock.synchronize{buffer << "Hello_1\n"}
sleep(5)
lock.synchronize{buffer << "Hello_1 after 5 seconds of sleep\n"}
}
t2 = Thread.new {
lock.synchronize{buffer << "Hello_2\n"}
sleep(5)
lock.synchronize{buffer << "Hello_2 after 5 seconds of sleep\n"}
}
t1.join
t2.join
puts buffer

Ruby threading pass control to main

I am programming an application in Ruby which creates a new thread for every new job. So this is like a queue manager, where I check how many threads can be started from a database. Now when a thread finishes, I want to call the method to start a new job (i.e. a new thread). I do not want to create nested threads, so is there any way to join/terminate/exit the calling thread and pass control over to the main thread? Just to make the situation clear, there can be other threads running at this time.
I tried simply joining the calling thread if it's not the main thread, and I get the following error:
"thread 0x7f8cf8dcf438 tried to join itself"
Any suggestions will be highly appreciated.
Thanks in advance.
I'd propose two solutions.
The first is to join on a thread, but join has to be called from the main thread (assuming you started all of your worker threads from main):
def thread_proc(s)
sleep rand(5)
puts "#{Thread.current.inspect}: #{s}"
end
strings = ["word", "test", "again", "value", "fox", "car"]
threads = []
2.times {
threads << Thread.new(strings.shift) { |s| thread_proc(s) }
}
while !threads.empty?
threads.each { |t|
t.join
threads << Thread.new(strings.shift) { |s| thread_proc(s) } unless strings.empty?
threads.delete(t)
}
end
But that method is somewhat inefficient, because creating threads over and over again adds memory and CPU overhead.
A better approach is to synchronize a fixed pool of reused threads by using a Queue:
require 'thread'
strings = ["word", "test", "again", "value", "fox", "car"]
q = Queue.new
strings.each { |s| q << s }
threads = []
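2.times { threads << Thread.new {
  loop do
    begin
      # Non-blocking pop raises ThreadError once the queue is empty. Checking
      # empty? first and then calling a blocking pop could hang a worker if
      # another thread takes the last item in between.
      s = q.pop(true)
    rescue ThreadError
      break
    end
    sleep(rand(5))
    puts "#{Thread.current.inspect}: #{s}"
  end
}}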
threads.each { |t| t.join }
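Another common way to shut down a fixed pool is to push one sentinel value per worker after the real jobs, so that every worker can use a plain blocking pop and stops when it receives the sentinel. The sketch below assumes :stop as the sentinel and upcase as placeholder work; both are my choices for illustration.

```ruby
jobs = Queue.new
%w[word test again value fox car].each { |s| jobs << s }

worker_count = 2
worker_count.times { jobs << :stop }   # one sentinel per worker

processed = Queue.new                  # Queue is safe to share between threads

workers = Array.new(worker_count) do
  Thread.new do
    # Blocking pop is fine here: each worker is guaranteed to get a :stop.
    while (job = jobs.pop) != :stop
      processed << job.upcase          # placeholder for the real work
    end
  end
end

workers.each(&:join)

# Drain the result queue into an array now that all workers are done.
results = Array.new(processed.size) { processed.pop }
puts results.sort
```

The same two threads handle all six jobs, so no thread is created per job, and the main thread regains control as soon as join returns.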
t1 = Thread.new { Thread.current[:status] = "1"; sleep 10; Thread.pass; sleep 100 }
t2 = Thread.new { Thread.current[:status] = "2"; sleep 1000 }
t3 = Thread.new { Thread.current[:status] = "3"; sleep 1000 }
puts Thread.list.map { |x| x[:status] }
#=> 1,2,3
Thread.list.each do |x|
  if x[:status] == "2" # statuses were stored as strings
    x.kill # kill the thread
    break
  end
end
puts Thread.list.map { |x| x[:status] }
#=> 1,3
"Thread::pass" passes control to the scheduler, which can then schedule any other thread. The thread voluntarily gives up control, but we cannot direct control to a specific thread.
"Thread#kill" kills the thread it is called on
"Thread::list" will return the list of threads
Threads are managed by the scheduler; if you want explicit control, check out fibers. They have some gotchas, though: on JRuby, for example, fibers are backed by native threads.
Also check out thread-local variables; they let you communicate the status or return value of a thread without joining it.
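A small sketch of reading a status from outside the thread: the worker stores its state with Thread.current[:status] (strictly speaking fiber-local storage, which behaves as thread-local when no fibers are involved) and the main thread reads it through the Thread object. The two queues are only there to make the example deterministic; they are not part of the technique.

```ruby
ready   = Queue.new   # worker -> main: "status has been set"
release = Queue.new   # main -> worker: "you may finish now"

worker = Thread.new do
  Thread.current[:status] = "working"
  ready << true
  release.pop                          # stay alive until main has looked
  Thread.current[:status] = "done"
end

ready.pop
status_while_running = worker[:status] # read without joining the worker
puts "while running: #{status_while_running}"

release << true
worker.join
status_after = worker[:status]         # still readable after the thread ends
puts "after join: #{status_after}"
```

A manager thread can poll such a status on each worker to decide when to start the next job, without blocking in join.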
http://github.com/defunkt/resque is a good option for a queue, check it out. Also try JRuby if you are going to make heavy use of threads. Its advantage is that it wraps Java threads in Ruby goodness.

Resources