I have a main function with a basic loop inside it. I want to fire off a child process for every iteration of the loop (each one goes off and does an HTTP request; more on that later).
My problem with processes is that each child appears to continue executing the main code after the loop, whereas I want only the main process to go on after the loop and the children to die once their HTTP request is finished. The main process is not interested in waiting for each child to finish before continuing.
Looks something like this now:
data.each do |k, v|
  (pid = fork) ? Process.detach(pid) : doHttpQuery("#{v}:#{k}")
end
# code after this comment should only get executed once
Also, when the processes finish, I get this
thread.rb:189:in `sleep': deadlock detected (fatal)
If I use threads like this
threads << Thread.new { doHttpQuery("#{v}:#{k}") }
and then
threads.each { |thr| thr.join }
The threads are fired, but for some reason the HTTP requests are never actually made, and the whole process just grinds to a halt.
The child process must call exit or exit! to stop executing:
data.each do |k, v|
  if pid = fork
    Process.detach(pid)
  else
    doHttpQuery("#{v}:#{k}")
    exit
  end
end
The difference between exit and exit! is
exit runs at_exit functions. Its default exit code is 0.
exit! does not run at_exit functions. Its default exit code is 1.
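A small sketch of that difference (Unix-only, since it relies on fork); the child reports through a pipe whether its at_exit hook ran:

```ruby
# exit runs at_exit hooks (default status 0); exit! skips them
# (default status 1). Each child reports through a pipe.
r, w = IO.pipe
pid = fork do
  r.close
  at_exit { w.write "ran" }   # fires for exit...
  exit
end
w.close
Process.wait(pid)
exit_output = r.read              # "ran"
status_exit = $?.exitstatus       # 0
r.close

r, w = IO.pipe
pid = fork do
  r.close
  at_exit { w.write "ran" }   # ...but is skipped by exit!
  exit!
end
w.close
Process.wait(pid)
exit_bang_output = r.read         # ""
status_exit_bang = $?.exitstatus  # 1
r.close

puts exit_output.inspect, status_exit
puts exit_bang_output.inspect, status_exit_bang
```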
This works as expected:
signals = %w[INT TERM]
signals.each do |signal|
  Signal.trap(signal) do
    puts "trapping the signal and sleeping for 5 seconds"
    sleep 5
    puts "done, exiting"
    exit
  end
end
sleep 100
so does this:
# ...signal trap code from above...
t = Thread.new{ sleep 10 }
t.join
so does this:
# ...signal trap code from above...
`sleep 10`
however, this does not:
# ...signal trap code from above...
t = Thread.new{ `sleep 10` }
t.join
For the first three, starting the code and then immediately sending control-c results in ruby waiting 5 seconds before exiting.
For the fourth, starting the code and then immediately sending control-c results in ruby immediately exiting. What's amazing is that the two puts messages, "trapping the..." and "done...", are both printed, but the sleep 5 in between them is seemingly skipped over.
Using the Terrapin gem instead of backticks produces another clue: Terrapin complains that the child process exited with a non-zero status. When attempting to print this status, it prints nothing, perhaps suggesting that the process was unceremoniously hard-killed.
So it seems that Ruby has some sort of default behavior regarding child processes created within threads other than the main thread. I suspected this might be by design, but I wasn't able to find any documentation or discussion about it.
I also tried
t = Thread.new{ `ls -R /` }
instead of sleep, just in case there was some sort of interaction between competing sleep implementations; same behavior.
Why is this happening?
More Experiments
This also behaves as expected: it waits 10 seconds for the subprocess within the thread to finish. So the strange behavior only happens in the context of a signal trap.
thread = Thread.new { `sleep 10` }
thread.join
I also wanted to test whether a subthread was trapping the signal at the same time, either by design or because of a bug. But this behaves as expected:
main_thread = Thread.current.object_id
puts "main thread: #{main_thread}"
signals = %w[INT TERM]
signals.each do |signal|
  Signal.trap(signal) do
    puts Thread.current.object_id
    next unless Thread.current.object_id == main_thread
    puts "thread from trap: #{Thread.current.object_id}"
    puts "trapping the signal and sleeping for 5 seconds"
    sleep 5
    puts "done, exiting"
    exit
  end
end
I have a couple threads running and want to wait for them to finish:
[thread_a, thread_b].each(&:join)
Say that one of the threads suffers an immediate fatal exception as soon as it's started. Meanwhile, the other takes 10 minutes to complete. If we're lucky and thread_a is the thread that fails, we'll get its exception immediately, since it's being joined first. However, if we're unlucky, we have to wait 10 minutes for thread_a to finish, and only then does thread_b get to raise its exception to the parent thread.
I have an existing solution that uses an ensure block in each thread, to insert that thread into a queue when it exits. The parent thread can then poll the queue and join each thread as it finishes. However, I'm wondering if Ruby has a more idiomatic way to handle this?
Existing solution:
dead_threads = Queue.new

threads = 2.times.map do |i|
  Thread.new do
    begin
      case i
      when 0; sleep
      when 1; raise "I'm the problem thread!"
      end
    ensure
      dead_threads << Thread.current
    end
  end
end

live_threads_count = threads.size
until live_threads_count == 0
  dead_threads.shift.join
  live_threads_count -= 1
end
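As a compact, runnable illustration of that queue-based approach (with short sleeps standing in for the 10-minute thread), joining threads in completion order surfaces the exception right away:

```ruby
dead_threads = Queue.new

slow = Thread.new do
  begin
    sleep 2                      # stands in for the slow thread
  ensure
    dead_threads << Thread.current
  end
end

bad = Thread.new do
  Thread.current.report_on_exception = false  # silence the default dump (Ruby >= 2.4)
  begin
    raise "I'm the problem thread!"
  ensure
    dead_threads << Thread.current
  end
end

caught  = nil
started = Time.now
begin
  2.times { dead_threads.shift.join }  # join threads in completion order
rescue => e
  caught = e.message
end
elapsed = Time.now - started

puts caught        # the failing thread's error, almost immediately
puts elapsed < 2   # true: we did not wait for the slow thread
```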
I'm new to Ruby and object-oriented languages, and I'm having trouble figuring out how to fork a process inside a method and pass the delayed output to be used outside the method while also returning the process id.
def method(arg)
  proc_id = fork do
    var = `command #{arg}`
  end
  return [proc_id, var]
end
This doesn't work, as var will be nil: the assignment happens inside the forked child, whose memory is separate from the parent's, and the process has not yet finished. How could I accomplish something like this?
UPDATE:
Using IO.pipe I was able to accomplish inter-process communication. However, trying to use this solution inside a method will not allow me to return both proc_id and var without first waiting for the process to finish, which forces me to create new arrays and iterations that would otherwise be unnecessary. The objective here is to have the freedom to execute code outside the method while the forked process inside the method is still working.
arg_array = ["arg1", "arg2", "arg3", "arg4"]
input = []
output = []
proc_id = []

arg_array.each_index do |i|
  input[i], output[i] = IO.pipe
  proc_id[i] = fork do
    input[i].close
    output[i].write `command #{arg_array[i]}`
  end
  output[i].close
end

command2
command3

include Process
waitpid(proc_id[0])
command4

Process.waitall
arg_array.each_index do |x|
  puts input[x].read
end
You need to spend a little more time studying the concept of fork. After a fork, the parent and child processes cannot communicate (exchange variables) with each other without using IPC (Inter-Process Communication), which is somewhat complicated.
But for your purpose (getting the child process id, and its output), it's easier with Open3.popen2 or Open3.popen3.
http://www.ruby-doc.org/stdlib-1.9.3/libdoc/open3/rdoc/Open3.html#method-c-popen2
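A minimal sketch of what that looks like, using echo as a stand-in for the real command:

```ruby
require 'open3'

# popen2 returns the child's stdin, its stdout, and a waiter thread
# that knows the pid and (eventually) the exit status.
stdin, stdout, wait_thr = Open3.popen2("echo", "hello")
stdin.close                      # no input to send

child_pid = wait_thr.pid         # available immediately, while the child runs
output    = stdout.read          # blocks until the child closes its stdout
stdout.close

puts child_pid.is_a?(Integer)    # true
puts output                      # hello
puts wait_thr.value.success?     # true once the child has exited cleanly
```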
if you want to kick something off and save the child pid, that's fairly simple.
pid = fork
if pid
  return pid
else
  system("command #{arg}")
  exit
end
A little bit clumsy, but basically: fork returns the child's pid to the parent process, and nil to the child process. Make sure you exit the child; it won't do that automatically.
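A quick illustration of that return-value difference:

```ruby
pid = fork
if pid
  Process.wait(pid)                   # parent: fork returned the child's Integer pid
  puts "parent got an Integer pid: #{pid.is_a?(Integer)}"
else
  puts "child sees: #{pid.inspect}"   # child: fork returned nil
  exit                                # don't let the child fall through
end
```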
Thanks to jaeheung's suggestion, I've solved using Open3.popen2 (requires version 1.9.3).
arguments = ["arg1", "arg2", "arg3", "arg4"]
require 'open3'
include Open3
def method(arg)
  input, output, thread = Open3.popen2("command #{arg}")
  input.close
  return [thread.pid, output]
end

thread_output = []
arguments.each do |i|
  thread_output << method(i)
end
command1
command2
include Process
waitpid(thread_output[0][0])
command3
Process.waitall
thread_output.each do |x|
  puts x[1].read
end
I have the following code:
data_set = [1,2,3,4,5,6]
results = []
data_set.each do |ds|
  puts "Before fork #{ds}"
  r, w = IO.pipe
  if pid = Process.fork
    w.close
    child_result = r.read
    results << child_result
  else
    puts "Child worker for #{ds}"
    sleep(ds * 5)
    r.close
    w.write(ds * 2)
    exit
  end
end
Process.waitall
puts "Ended everything #{results}"
Basically, I want each child to do some work and then pass the result to the parent. My code doesn't run in parallel now, and I don't know exactly where the problem lies; probably it's because I'm doing a read in the parent, but I'm not sure. What would I need to do to get it to run asynchronously?
EDIT: I changed the code to this, and it seems to work ok. Is there any problem that I'm not seeing?
data_set = [1,2,3,4,5,6]
child_pipes = []
results = []
data_set.each do |ds|
  puts "Before fork #{ds}"
  r, w = IO.pipe
  if pid = Process.fork
    w.close
    child_pipes << r
  else
    puts "Child worker for #{ds}"
    sleep(ds * 5)
    r.close
    w.write(ds * 2)
    exit
  end
end
Process.waitall
puts child_pipes.map(&:read)
It's possible for a child to block writing to the pipe to the parent if its output is larger than the pipe capacity. Ideally the parent would perform a select loop on the child pipes or spawn threads reading from the child pipes so as to consume data as it becomes available to prevent children from stalling on a full pipe and failing. In practice, if the child output is small, just doing the waitall and read will work.
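Here is one way that select loop could look. This is a sketch, with stand-in child work that deliberately writes more than a typical 64 KB pipe buffer to show why draining matters:

```ruby
pipes   = {}   # read-end IO => accumulated output for that child
results = []

3.times do |i|
  r, w = IO.pipe
  if fork
    w.close
    pipes[r] = +""
  else
    r.close
    w.write("result-#{i} " * 20_000)  # 180 KB, larger than the pipe buffer
    exit!
  end
end

# Drain whichever pipes have data, so no child blocks on a full pipe.
until pipes.empty?
  ready, = IO.select(pipes.keys)
  ready.each do |r|
    begin
      pipes[r] << r.readpartial(4096)
    rescue EOFError                   # this child closed its end: it's done
      results << pipes.delete(r)
      r.close
    end
  end
end
Process.waitall

puts results.map(&:bytesize).inspect  # three outputs of 180000 bytes each
```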
Others have solved these problems in reusable ways; you might try the parallel gem to avoid writing a bunch of unnecessary code.
I have a computation that can be divided into independent units and the way I'm dealing with it now is by creating a fixed number of threads and then handing off chunks of work to be done in each thread. So in pseudo code here's what it looks like
# main thread
work_units.take(10).each { |work_unit| spawn_thread_for work_unit }

def spawn_thread_for(work)
  Thread.new do
    do_some work
    more_work = work_units.pop
    spawn_thread_for more_work unless more_work.nil?
  end
end
Basically, once the initial number of threads is created, each one does some work and then keeps taking items from the work stack until nothing is left. Everything works fine when I run things in irb, but when I execute the script with the interpreter things don't work out so well. I'm not sure how to make the main thread wait until all the work is finished. Is there a nice way of doing this, or am I stuck with executing sleep 10 until work_units.empty? in the main thread?
In ruby 1.9 (and 2.0), you can use ThreadsWait from the stdlib for this purpose:
require 'thread'
require 'thwait'
threads = []
threads << Thread.new { }
threads << Thread.new { }
ThreadsWait.all_waits(*threads)
If you modify spawn_thread_for to save a reference to your created Thread, then you can call Thread#join on the thread to wait for completion:
x = Thread.new { sleep 0.1; print "x"; print "y"; print "z" }
a = Thread.new { print "a"; print "b"; sleep 0.2; print "c" }
x.join # Let the threads finish before
a.join # main thread exits...
produces:
abxyzc
(Stolen from the ri Thread.new documentation. See the ri Thread.join documentation for some more details.)
So, if you amend spawn_thread_for to save the Thread references, you can join on them all:
(Untested, but ought to give the flavor)
# main thread
work_units = Queue.new # and fill the queue...

threads = []
10.downto(1) do
  threads << Thread.new do
    loop do
      w = work_units.pop
      Thread.exit if w.nil?
      do_some_work(w)
    end
  end
end

# Queue#pop blocks on an empty queue, so push one nil sentinel per
# thread once all the real work has been queued:
10.times { work_units << nil }

# main thread continues while work threads devour work
threads.each(&:join)
Thread.list.each{ |t| t.join unless t == Thread.current }
It seems like you are replicating what the Parallel Each (Peach) library provides.
You can use Thread#join
join(limit) → thr or nil
The calling thread will suspend execution and run thr. Does not return until thr exits or until limit seconds have passed. If the time limit expires, nil will be returned; otherwise thr is returned.
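A small example of the timeout behavior (the sleeps are stand-ins for real work):

```ruby
slow = Thread.new { sleep 5 }
fast = Thread.new { :done }

timed_out = slow.join(0.1)   # nil: the 0.1-second limit expired
finished  = fast.join(5)     # the thread itself, once it has exited

puts timed_out.inspect       # nil
puts finished == fast        # true
slow.kill                    # clean up the stand-in thread
```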
Also you can use Enumerable#each_slice to iterate over the work units in batches
work_units.each_slice(10) do |batch|
  # handle each work unit in a thread
  threads = batch.map do |work_unit|
    spawn_thread_for work_unit
  end
  # wait until current batch work units finish before handling the next batch
  threads.each(&:join)
end
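For a runnable picture, here is a self-contained version of that batching pattern, with inline stand-in work instead of spawn_thread_for:

```ruby
work_units = (1..7).to_a
results = Queue.new   # thread-safe collector

work_units.each_slice(3) do |batch|
  threads = batch.map do |unit|
    Thread.new { results << unit * 2 }   # stand-in for real work
  end
  threads.each(&:join)   # finish this batch before starting the next
end

puts results.size   # 7
```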