Ruby and Threads - Why isn't my script finishing? - ruby

I have a script that provisions nodes for a cluster remotely using Thread so I can do all of them at once.
Is there a better way I can do this so I get a return code, a clean exit, and a script that does not seem to run for an inordinate amount of time?
#threads = []
def process_node(apps_to_close, node)
#threads << Thread.new do
puts "now processing node #{node}\n"
...
end
end
nodes.each do |node|
process_node(apps_to_close, node)
end
#threads.each {|thr| thr.join}

You can a limit to the amount of time you allow threads to run via join, which returns nil if the thread is still running when the timeout is reached. So you could rewrite this as:
...
results = #threads.collect { |thr| thr.join(1) } # wait 1 second for thread
#threads.each { |t| Thread.kill(t) }
results.any?(&:nil?) ? exit(1) : exit(0)

Related

Limit the number of threads in an iteration ruby

When I have my code like this, I get "can't create thread, resource temporarily unavailable". There are over 24k files in the directory to process.
frames.each do |image|
Thread.new do
pipeline = ImageProcessing::MiniMagick.
source(File.open("original/#{image}"))
.append("-fuzz", "30%")
.append("-transparent", "#ff00fe")
result = pipeline.call
puts result.path
file_parts = image.split("_")
frame_number = file_parts[2]
FileUtils.cp(result.path, "transparent/image_transparent_#{frame_number}")
puts "Done with #{image}!"
puts "#{Dir.children("transparent").count.to_s} / #{Dir.children("original").count.to_s}"
puts "\n"
end
end.each{ |thread| thread.join }
So, I tried the first 1001 files by calling the index 0-1000, and did it this way:
frames[0..1000].each_with_index do |image, index|
thread = Thread.new do
pipeline = ImageProcessing::MiniMagick.
source(File.open("original/#{image}"))
.append("-fuzz", "30%")
.append("-transparent", "#ff00fe")
result = pipeline.call
puts result.path
file_parts = image.split("_")
frame_number = file_parts[2]
FileUtils.cp(result.path, "transparent/image_transparent_#{frame_number}")
puts "Done with #{image}!"
puts "#{Dir.children("transparent").count.to_s} / #{Dir.children("original").count.to_s}"
puts "\n"
end
thread.join
end
And while this is processing, the speed seems to be about the same as if it was on a single thread when I'm watching it in the Terminal.
But I want the code to be able to limit to whatever the OS will allow before it disallows, so that it can parse through them all faster.
Or at lease:
Find the maximum threads allowed
Get original directory's count, divided by the number of threads allowed.
Run this each in batches of that division.

Ruby: Wait for all threads completed using join and ThreadsWait.all_waits - what the difference?

Consider the following example:
threads = []
(0..10).each do |_|
threads << Thread.new do
# do async staff there
sleep Random.rand(10)
end
end
Then there is 2 ways to wait when it's done:
Using join:
threads.each(&:join)
Using ThreadsWait:
ThreadsWait.all_waits(threads)
Is there any difference between these two ways of doing this?
I know that the ThreadsWait class has other useful methods.
And asking especially about all_waits method.
The documentation clearly states that all_waits will execute any passed block after each thread's execution; join doesn't offer anything like this.
require "thwait"
threads = [Thread.new { 1 }, Thread.new { 2 }]
ThreadsWait.all_waits(threads) do |t|
puts "#{t} complete."
end # will return nil
# output:
# #<Thread:0x00000002773268> complete.
# #<Thread:0x00000002772ea8> complete.
To accomplish the same with join, I imagine you would have to do this:
threads.each do |t|
t.join
puts "#{t} complete."
end # will return threads
Apart from this, the all_waits methods eventually calls the join_nowait method which processes each thread by calling join on it.
Without any block, I would imagine that directly using join would be faster since you would cut back on all ThreadsWait methods leading up to it. So I gave it a shot:
require "thwait"
require "benchmark"
loops = 100_000
Benchmark.bm do |x|
x.report do
loops.times do
threads = [Thread.new { 2 * 1000 }, Thread.new { 4 * 2000 }]
threads.each(&:join)
end
end
x.report do
loops.times do
threads = [Thread.new { 2 * 1000 }, Thread.new { 4 * 2000 }]
ThreadsWait.all_waits(threads)
end
end
end
# results:
# user system total real
# 4.030000 5.750000 9.780000 ( 5.929623 )
# 12.810000 17.060000 29.870000 ( 17.807242 )
Using map instead of each, will wait for them as it needs their values to build the map.
(0..10).map do |_|
Thread.new do
# do async staff there
sleep Random.rand(10)
end
end.map(&:join).map(&:value)

How to wait for all threads to complete before executing next line

I have something like below:
all_hosts.each do |hostname|
Thread.new {
...
}
end
# next line of execution
Each of the hosts above opens its own thread and executes the commands. I want to wait for all threads to finish executing before moving onto next part of file. Is there an easy way of doing this?
Use Thread#join which will wait termination of the thread.
To do that you need to save threads; so use map instead of each:
threads = all_hosts.map do |hostname|
Thread.new {
# commands
}
end
threads.each(&:join)
The Thread documentation explains it:
Alternatively, you can use an array for handling multiple threads at once, like in the following example:
threads = []
threads << Thread.new { puts "Whats the big deal" }
threads << Thread.new { 3.times { puts "Threads are fun!" } }
After creating a few threads we wait for them all to finish consecutively.
threads.each { |thr| thr.join }
Applied to your code:
threads = []
all_hosts.each do |hostname|
threads << Thread.new { ... }
end
threads.each(&:join)
# next line of execution

Ruby Pause thread

In ruby, is it possible to cause a thread to pause from a different concurrently running thread.
Below is the code that I've written so far. I want the user to be able to type 'pause thread' and the sample500 thread to pause.
#!/usr/bin/env ruby
# Creates a new thread executes the block every intervalSec for durationSec.
def DoEvery(thread, intervalSec, durationSec)
thread = Thread.new do
start = Time.now
timeTakenToComplete = 0
loopCounter = 0
while(timeTakenToComplete < durationSec && loopCounter += 1)
yield
finish = Time.now
timeTakenToComplete = finish - start
sleep(intervalSec*loopCounter - timeTakenToComplete)
end
end
end
# User input loop.
exit = nil
while(!exit)
userInput = gets
case userInput
when "start thread\n"
sample500 = Thread
beginTime = Time.now
DoEvery(sample500, 0.5, 30) {File.open('abc', 'a') {|file| file.write("a\n")}}
when "pause thread\n"
sample500.stop
when "resume thread"
sample500.run
when "exit\n"
exit = TRUE
end
end
Passing Thread object as argument to DoEvery function makes no sense because you immediately overwrite it with Thread.new, check out this modified version:
def DoEvery(intervalSec, durationSec)
thread = Thread.new do
start = Time.now
Thread.current["stop"] = false
timeTakenToComplete = 0
loopCounter = 0
while(timeTakenToComplete < durationSec && loopCounter += 1)
if Thread.current["stop"]
Thread.current["stop"] = false
puts "paused"
Thread.stop
end
yield
finish = Time.now
timeTakenToComplete = finish - start
sleep(intervalSec*loopCounter - timeTakenToComplete)
end
end
thread
end
# User input loop.
exit = nil
while(!exit)
userInput = gets
case userInput
when "start thread\n"
sample500 = DoEvery(0.5, 30) {File.open('abc', 'a') {|file| file.write("a\n")} }
when "pause thread\n"
sample500["stop"] = true
when "resume thread\n"
sample500.run
when "exit\n"
exit = TRUE
end
end
Here DoEvery returns new thread object. Also note that Thread.stop called inside running thread, you can't directly stop one thread from another because it is not safe.
You may be able to better able to accomplish what you are attempting using Ruby Fiber object, and likely achieve better efficiency on the running system.
Fibers are primitives for implementing light weight cooperative
concurrency in Ruby. Basically they are a means of creating code
blocks that can be paused and resumed, much like threads. The main
difference is that they are never preempted and that the scheduling
must be done by the programmer and not the VM.
Keeping in mind the current implementation of MRI Ruby does not offer any concurrent running threads and the best you are able to accomplish is a green threaded program, the following is a nice example:
require "fiber"
f1 = Fiber.new { |f2| f2.resume Fiber.current; while true; puts "A"; f2.transfer; end }
f2 = Fiber.new { |f1| f1.transfer; while true; puts "B"; f1.transfer; end }
f1.resume f2 # =>
# A
# B
# A
# B
# .
# .
# .

Ruby threading pass control to main

I am programming an application in Ruby which creates a new thread for every new job. So this is like a queue manager, where I check how many threads can be started from a database. Now when a thread finishes, I want to call the method to start a new job (i.e. a new thread). I do not want to create nested threads, so is there any way to join/terminate/exit the calling thread and pass control over to the main thread? Just to make the situation clear, there can be other threads running at this time.
I tried simply joining the calling thread, if its not the main thread and I get the following error;
"thread 0x7f8cf8dcf438 tried to join itself"
Any suggestions will be highly appreciated.
Thanks in advance.
I'd propose two solutions:
the first one is effectively to join on a thread, but join has to be called from the main thread (assuming you started all of your worker threads from the main) :
def thread_proc(s)
sleep rand(5)
puts "#{Thread.current.inspect}: #{s}"
end
strings = ["word", "test", "again", "value", "fox", "car"]
threads = []
2.times {
threads << Thread.new(strings.shift) { |s| thread_proc(s) }
}
while !threads.empty?
threads.each { |t|
t.join
threads << Thread.new(strings.shift) { |s| thread_proc(s) } unless strings.empty?
threads.delete(t)
}
end
but that method is kind of inefficient, because creating threads over and over again induces memory and CPU overhead.
You should better synchronize a fixed pool of reused threads by using a Queue:
require 'thread'
strings = ["word", "test", "again", "value", "fox", "car"]
q = Queue.new
strings.each { |s| q << s }
threads = []
2.times { threads << Thread.new {
while !q.empty?
s = q.pop
sleep(rand(5))
puts "#{Thread.current.inspect}: #{s}"
end
}}
threads.each { |t| t.join }
t1 = Thread.new { Thread.current[:status] = "1"; sleep 10; Thread.pass; sleep 100 }
t2 = Thread.new { Thread.current[:status] = "2"; sleep 1000 }
t3 = Thread.new { Thread.current[:status] = "3"; sleep 1000 }
puts Thread.list.map {|X| x[:status] }
#=> 1,2,3
Thread.list.each do |x|
if x[:status] == 2
x.kill # kill the thread
break
end
end
puts Thread.list.map {|X| x[:status] }
#=> 1,3
"Thread::pass" will pass control to the scheduler which can now schedule any other thread. The thread has voluntarily given up control to the scheduler - we cannot specify to pass control onto a specific thread
"Thread#kill" will kill the instance the thread
"Thread::list" will return the list of threads
Threads are managed by the scheduler, if you want explicit control then checkout fibers. But it has some gotchas, fibers are not supported in JRuby.
also checkout thread local variables, it will help you to communicate the status or return value of the thread, without joining to the thread.
http://github.com/defunkt/resque is a good option for a queue, check it out. Also try JRuby if you are going make heavy use of threads. It' advantage is that it will wrap java threads in ruby goodness.

Resources