I'm new to Ruby and object-oriented languages, and I'm having trouble figuring out how to fork a process inside a method and pass its delayed output to code outside the method while also returning the process ID.
def method(arg)
proc_id = fork do
var = `command #{arg}`
end
return both = [proc_id, var]
end
This doesn't work as var will return nil since the process has not yet finished. How could I accomplish something like this?
UPDATE:
Using IO.pipe I was able to accomplish Inter-Process Communication. However, trying to use this solution inside a method will not allow me to return both proc_id and var without first waiting for the process to finish which forces me to create new arrays and iterations that would be otherwise unnecessary. The objective here is to have freedom to execute code outside the method while the fork process inside the method is still working.
arg_array = ["arg1", "arg2", "arg3", "arg4"]
input = []
output = []
proc_id = []
arg_array.each_index do |i|
input[i], output[i] = IO.pipe
proc_id[i] = fork do
input[i].close
output[i].write `command #{arg_array[i]}`
end
output[i].close
end
command2
command3
include Process
waitpid(proc_id[0])
command4
Process.waitall
arg_array.each_index do |x|
puts input[x].read
end
You need to spend a little more time studying the concept of fork. After a fork, the parent and child processes cannot communicate (exchange variables) with each other without using IPC (Inter-Process Communication), which is somewhat complicated.
But for your purpose (getting the child process id, and its output), it's easier with Open3.popen2 or Open3.popen3.
http://www.ruby-doc.org/stdlib-1.9.3/libdoc/open3/rdoc/Open3.html#method-c-popen2
If you want to kick something off and save the child pid, that's fairly simple.
pid = fork
if pid
return pid
else
system("command #{arg}")
exit
end
It's a little bit clumsy, but basically fork returns the child pid to the parent process and nil to the child process. Make sure you exit the child; it won't do that automatically.
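To tie this back to the original question, here is a minimal sketch of a method that returns both the child's pid and a handle for its delayed output. It assumes IO.pipe for the communication; the helper name run_in_background and the echo command are placeholders for illustration, not something from the question.

```ruby
# Hypothetical helper: forks a child that runs `cmd` and returns
# [pid, reader] immediately, without waiting for the child to finish.
def run_in_background(cmd)
  reader, writer = IO.pipe
  pid = fork do
    reader.close              # the child only writes
    writer.write(`#{cmd}`)    # run the command and send its output to the parent
    writer.close
  end
  writer.close                # the parent only reads
  [pid, reader]
end

pid, reader = run_in_background("echo hello")
# ... the parent is free to do other work here ...
output = reader.read          # blocks until the child closes its end of the pipe
reader.close
Process.wait(pid)             # reap the child so it does not linger as a zombie
puts output
```

Reading before waiting is deliberate: if the child produced more output than the pipe can hold, waiting first would stall both processes.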
Thanks to jaeheung's suggestion, I've solved it using Open3.popen2 (requires version 1.9.3).
arguments = ["arg1", "arg2", "arg3", "arg4"]
require 'open3'
include Open3
def method(arg)
input, output, thread = Open3.popen2("command #{arg}")
input.close
return [thread.pid, output]
end
thread_output = []
arguments.each do |i|
thread_output << method("#{i}")
end
command1
command2
include Process
waitpid(thread_output[0][0])
command3
Process.waitall
thread_output.each do |x|
puts x[1].read
end
I have the following code snippet (a simplified representation of what I'm trying to do - training wheels). The sleep(2) would represent some network operation in my real code:
arr = []
5.times do |i|
rd, wr = IO.pipe
fork do
sleep(2) # I'm still waiting for the sleep to happen on each process ... not good, there is no parallelism here
rd.close
arr[i] = i
Marshal.dump(arr, wr)
end
wr.close
result = rd.read
arr = Marshal.load(result)
end
# Process.waitall
p arr
Q: Is it possible to somehow create new processes in a loop and pass the results back without waiting on each iteration? I'm pretty rusty and don't know / remember a great deal about IPC ... especially in Ruby.
Actual result: a wait time of 2s*5 = 10s
Expected: ~2s total (async processing of the sleep())
So a good comment clarifying things, explaining the theory would help a lot. Thanks.
In your loop you wait for each child process to write its results to the pipe before starting the next iteration.
The simplest fix would be to save the read ends of the pipes in an array and don’t read any of them until the loop is finished and you’ve started all the child processes:
arr = []
# array to store the IOs
pipes = []
5.times do |i|
rd, wr = IO.pipe
fork do
sleep(2)
rd.close
# Note only returning the value of i here, not the whole array
Marshal.dump(i, wr)
end
wr.close
#store the IO for later
pipes[i] = rd
end
# Now all child processes are started, we can read the results in turn
# Remember each child is returning i, not the whole array
pipes.each_with_index do |rd, i|
arr[i] = Marshal.load(rd.read)
end
A more complex solution, if the wait/network times for different child processes vary, might be to look at select, so you could read from whichever pipe was ready first.
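A rough sketch of that select-based variant, assuming IO.select and three children with staggered sleeps chosen only for illustration. The hash that maps each read end to its index is an assumption of this sketch, not part of the code above.

```ruby
pipes = {}  # read end => child index, for pipes that are still open

3.times do |i|
  rd, wr = IO.pipe
  fork do
    rd.close
    sleep((3 - i) * 0.1)      # child 0 is the slowest, child 2 the fastest
    Marshal.dump(i, wr)
    wr.close
  end
  wr.close
  pipes[rd] = i
end

order = []
until pipes.empty?
  # Blocks until at least one pipe has data (or has reached EOF).
  ready, = IO.select(pipes.keys)
  ready.each do |rd|
    order << Marshal.load(rd.read)
    rd.close
    pipes.delete(rd)
  end
end
Process.waitall

p order  # results in the order the children finished, not the order they started
```

With these sleeps the fastest child usually reports first, but the exact order depends on scheduling, so nothing here should rely on it.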
I'm a frontend developer, somewhat familiar with Ruby. I only know how to do Ruby in a synchronous/sequential manner, while in JS i'm used to async/non-blocking callbacks.
Here's sample Ruby code:
results = []
rounds = 5
callback = ->(item) {
# This imitates that the callback may take time to complete
sleep rand(1..5)
results.push item
if results.size == rounds
puts "All #{rounds} requests have completed! Here they are:", *results
end
}
1.upto(rounds) { |item| callback.call(item) }
puts "Hello"
The goal is to have the callbacks run without blocking main script execution. In other words, i want "Hello" line to appear in output above the "All 5 requests..." line. Also, the callbacks should run concurrently, so that the callback fastest to finish makes it into the resulting array first.
With JavaScript, i would simply wrap the callback call into a setTimeout with zero delay:
setTimeout( function() { callback(item); }, 0);
This JS approach does not implement true multithreading/concurrency/parallel execution. Under the hood, the callbacks would run all in one thread sequentially, or rather interlaced on the low level.
But on practical level it would appear as concurrent execution: the resulting array would be populated in an order corresponding to the amount of time spent by each callback, i. e. the resulting array would appear sorted by the time it took each callback to finish.
Note that i only want the asynchronous feature of setTimeout(). I don't need the sleep feature built into setTimeout() (not to be confused with a sleep used in the callback example to imitate a time-consuming operation).
I tried to inquire into how to do that JS-style async approach with Ruby and was given suggestions to use:
Multithreading. This is probably THE approach for Ruby, but it requires a substantial amount of scaffolding:
Manually define an array for threads.
Manually define a mutex.
Start a new thread for each callback, add it to the array.
Pass the mutex into each callback.
Use mutex in the callback for thread synchronization.
Ensure all threads are completed before program completion.
Compared to JavaScript's setTimeout(), this is just too much. As i don't need true parallel execution, i don't want to build that much scaffolding every time i want to execute a proc asynchronously.
A sophisticated Ruby library like Celluloid or Event Machine. They look like they will take weeks to learn.
A custom solution like this one (the author, apeiros#freenode, claims it to be very close to what setTimeout does under the hood). It requires almost no scaffolding to build and it does not involve threads. But it seems to run callbacks synchronously, in the order they've been executed.
I have always considered Ruby to be a programming language most close to my ideal, and JS to be a poor man's programming language. And it kinda discourages me that Ruby is not able to do a thing which is trivial with JS, without involving heavy machinery.
So the question is: what is the simplest, most intuitive way to do an async/non-blocking callback with Ruby, without involving complicated machinery like threads or complex libraries?
PS If there will be no satisfying answer during the bounty period, i will dig into #3 by apeiros and probably make it the accepted answer.
Like people said, it's not possible to achieve what you want without using Threads or a library that abstracts their functionality. But, if it's just the setTimeout functionality you want, then the implementation is actually very small.
Here's my attempt at emulating Javascript's setTimeout in ruby:
require 'thread'
require 'set'
module Timeout
  @timeouts = Set[]
  @exiting = false
  @exitm = Mutex.new
  @mutex = Mutex.new
  # Wait for any outstanding timeouts before the program exits
  at_exit { wait_for_timeouts }
  def self.set(delay, &blk)
    thrd = Thread.start do
      sleep delay
      blk.call
      @exitm.synchronize do
        unless @exiting
          @mutex.synchronize { @timeouts.delete thrd }
        end
      end
    end
    @mutex.synchronize { @timeouts << thrd }
  end
  def self.wait_for_timeouts
    @exitm.synchronize { @exiting = true }
    @timeouts.each(&:join)
    @exitm.synchronize { @exiting = false }
  end
end
Here's how to use it:
$results = []
$rounds = 5
mutex = Mutex.new
def callback(n, mutex)
-> {
sleep rand(1..5)
mutex.synchronize {
$results << n
puts "Fin: #{$results}" if $results.size == $rounds
}
}
end
1.upto($rounds) { |i| Timeout.set(0, &callback(i, mutex)) }
puts "Hello"
This outputs:
Hello
Fin: [1, 2, 3, 5, 4]
As you can see, the way you use it is essentially the same, the only thing I've changed is I've added a mutex to prevent race conditions on the results array.
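If passing a mutex around feels like too much, one alternative is to collect results in Ruby's built-in thread-safe Queue. This is only a sketch under that assumption: it uses plain threads rather than the Timeout module above, and the short random sleeps are placeholders for real work.

```ruby
results = Queue.new  # Queue synchronizes pushes and pops internally

threads = 1.upto(5).map do |i|
  Thread.new do
    sleep rand * 0.1   # imitates a callback that takes some time
    results << i       # safe without an explicit mutex
  end
end

puts "Hello"           # printed without waiting for the threads
threads.each(&:join)   # wait for every callback before draining the queue

finished = Array.new(results.size) { results.pop }
puts "Fin: #{finished}"
```

The array ends up in completion order, so it will usually differ from run to run.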
Aside: Why we need the mutex in the usage example
Even if JavaScript is only running on a single core, that does not prevent race conditions, because operations are not necessarily atomic. Pushing to an array is not an atomic operation: more than one instruction is executed.
Suppose it is two instructions, putting the element at the end, and incrementing the size. (SET, INC).
Consider all the ways two pushes can be interleaved (taking symmetry into account):
SET1 INC1 SET2 INC2
SET1 SET2 INC1 INC2
The first one is what we want, but the second one results in the second append overwriting the first.
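The same hazard applies to any read-modify-write on shared state. As a small illustration, assuming a shared counter instead of the results array, the sketch below guards each increment with a mutex so that the read, the addition and the write always happen as one unit.

```ruby
counter = 0
mutex = Mutex.new

threads = 10.times.map do
  Thread.new do
    1000.times do
      # `counter += 1` is a read, an addition and a write; the mutex
      # guarantees no other thread runs between those steps.
      mutex.synchronize { counter += 1 }
    end
  end
end
threads.each(&:join)

puts counter
```

Without the synchronize call the final value may or may not come out short, depending on the Ruby implementation and on how the threads happen to be scheduled.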
Okay, after some fiddling with threads and studying contributions by apeiros and asQuirreL, i came up with a solution that suits me.
I'll show sample usage first, source code in the end.
Example 1: simple non-blocking execution
First, a JS example that i'm trying to mimic:
setTimeout( function() {
console.log("world");
}, 0);
console.log("hello");
// 'Will print "hello" first, then "world"'.
Here's how i can do it with my tiny Ruby library:
# You wrap all your code into this...
Branch.new do
# ...and you gain access to the `branch` method that accepts a block.
# This block runs non-blockingly, just like in JS `setTimeout(callback, 0)`.
branch { puts "world!" }
print "Hello, "
end
# Will print "Hello, world!"
Note how you don't have to take care of creating threads, waiting for them to finish. The only scaffolding required is the Branch.new { ... } wrapper.
Example 2: synchronizing threads with a mutex
Now we'll assume that we're working with some input and output shared among threads.
JS code i'm trying to reproduce with Ruby:
var
results = [],
rounds = 5;
for (var i = 1; i <= rounds; i++) {
console.log("Starting thread #" + i + ".");
// "Creating local scope"
(function(local_i) {
setTimeout( function() {
// "Assuming there's a time-consuming operation here."
results.push(local_i);
console.log("Thread #" + local_i + " has finished.");
if (results.length === rounds)
console.log("All " + rounds + " threads have completed! Bye!");
}, 0);
})(i);
}
console.log("All threads started!");
This code produces the following output:
Starting thread #1.
Starting thread #2.
Starting thread #3.
Starting thread #4.
Starting thread #5.
All threads started!
Thread #5 has finished.
Thread #4 has finished.
Thread #3 has finished.
Thread #2 has finished.
Thread #1 has finished.
All 5 threads have completed! Bye!
Notice that the callbacks finish in reverse order.
We're also gonna assume that working with the results array may produce a race condition. In JS this is never an issue, but in multithreaded Ruby this has to be addressed with a mutex.
Ruby equivalent of the above:
Branch.new 1 do
# Setting up an array to be filled with that many values.
results = []
rounds = 5
# Running `branch` N times:
1.upto(rounds) do |item|
puts "Starting thread ##{item}."
# The block passed to `branch` accepts a hash with mutexes
# that you can use to synchronize threads.
branch do |mutexes|
# This imitates that the callback may take time to complete.
# Threads will finish in reverse order.
sleep (6.0 - item) / 10
# When you need a mutex, you simply request one from the hash.
# For each unique key, a new mutex will be created lazily.
mutexes[:array_and_output].synchronize do
puts "Thread ##{item} has finished!"
results.push item
if results.size == rounds
puts "All #{rounds} threads have completed! Bye!"
end
end
end
end
puts "All threads started."
end
puts "All threads finished!"
Note how you don't have to take care of creating threads, waiting for them to finish, creating mutexes and passing them into the block.
Example 3: delaying execution of the block
If you need the delay feature of setTimeout, you can do it like this.
JS:
setTimeout(function(){ console.log('Foo'); }, 2000);
Ruby:
branch(2) { puts 'Foo' }
Example 4: waiting for all threads to finish
With JS, there's no simple way to have the script wait for all threads to finish. You'll need an await/defer library for that.
But in Ruby it's possible, and Branch makes it even simpler. If you write code after the Branch.new{} wrapper, it will be executed after all branches within the wrapper have been completed. You don't need to manually ensure that all threads have finished, Branch does that for you.
Branch.new do
branch { sleep 10 }
branch { sleep 5 }
# This will be printed immediately
puts "All threads started!"
end
# This will be printed after 10 seconds (the duration of the slowest branch).
puts "All threads finished!"
Sequential Branch.new{} wrappers will be executed sequentially.
Source
# (c) lolmaus (Andrey Mikhaylov), 2014
# MIT license http://choosealicense.com/licenses/mit/
class Branch
  def initialize(mutexes = 0, &block)
    @threads = []
    @mutexes = Hash.new { |hash, key| hash[key] = Mutex.new }
    # Executing the passed block within the context
    # of this class' instance.
    instance_eval(&block)
    # Waiting for all threads to finish
    @threads.each { |thr| thr.join }
  end
  # This method will be available within a block
  # passed to `Branch.new`.
  def branch(delay = false, &block)
    # Starting a new thread
    @threads << Thread.new do
      # Implementing the timeout functionality
      sleep delay if delay.is_a? Numeric
      # Executing the block passed to `branch`,
      # providing mutexes into the block.
      block.call @mutexes
    end
  end
end
I wonder how I can interact with a never-ending (eternally looping) child process.
Source code of loop_puts.rb, the child process:
loop do
str = gets
puts str.upcase
end
main.rb :
Process.spawn("ruby loop_puts.rb",{:out=>$stdout, :in=>$stdin})
I want to send it some input programmatically, not by typing it by hand, and get the result (the current one, not the previous one) in a variable.
How can I do this?
thanks
There are a number of ways to do this and it's hard to recommend one without more context.
Here's one way using a forked process and a pipe:
# When given '-' as the first param, IO#popen forks a new ruby interpreter.
# Both parent and child processes continue after the return to the #popen
# call which returns an IO object to the parent process and nil to the child.
pipe = IO.popen('-', 'w+')
if pipe
# in the parent process
%w(please upcase these words).each do |s|
STDERR.puts "sending: #{s}"
pipe.puts s # pipe communicates with the child process
STDERR.puts "received: #{pipe.gets}"
end
pipe.puts '!quit' # a custom signal to end the child process
else
# in the child process
until (str = gets.chomp) == '!quit'
# std in/out here are connected to the parent's pipe
puts str.upcase
end
end
See the documentation for IO.popen for details. Note that this may not work on all platforms.
Other possible ways to approach this include Named Pipes, drb, and message queues.
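If the child has to stay a separate script, as in the question, another option is Open3.popen2, which gives the parent both ends of the conversation. The sketch below is an assumption-laden stand-in: it inlines a looping child via ruby -e instead of loop_puts.rb, and that child sets $stdout.sync so each reply is flushed immediately rather than sitting in a buffer.

```ruby
require 'open3'
require 'rbconfig'

# Stand-in for loop_puts.rb: upcases every line until stdin is closed.
child_code = '$stdout.sync = true; while (line = gets); puts line.upcase; end'

stdin, stdout, wait_thr = Open3.popen2(RbConfig.ruby, '-e', child_code)

replies = %w[please upcase these words].map do |word|
  stdin.puts word       # send one line to the child
  stdout.gets.chomp     # read the matching reply into a variable
end

stdin.close             # EOF ends the child's loop
status = wait_thr.value # wait for the child to exit
stdout.close

p replies
```

Closing stdin is what lets a child written this way terminate; the original loop_puts.rb would raise on nil input instead, which also ends it, though less gracefully.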
I have a main function with a basic loop inside it. I want to fire off a child process for every iteration of the loop (that goes off doing an HTTP request, more on that later).
If I am using processes, my problem is that it looks like each child process continues the execution of the main thread, whereas I want only the main process to go on after the loop, and the children to die after the HTTP request is finished. The main process does not need to wait for each child to finish before continuing.
Looks something like this now:
data.each do |k, v|
(pid = fork) ? Process.detach(pid) : doHttpQuery(v + ":" + "k")
end
# code after this comment should only get executed once
Also, when the processes finish, I get this
thread.rb:189:in `sleep': deadlock detected (fatal)
If I use threads like this
threads << Thread.new { doHttpQuery(v + ":" + "k")}
and then
threads.each { |thr| thr.join }
The threads are fired, but for some reason it is not actually doing the HTTP request, and the whole process just comes to a halt.
The child process must call exit or exit! to stop executing:
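data.each do |k, v|
  if pid = fork
    # Parent: don't wait for the child, just let it be reaped
    Process.detach(pid)
  else
    # Child: do the work, then exit so it never runs the code after the loop
    doHttpQuery(v + ":" + "k")
    exit
  end
end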
The difference between exit and exit! is
exit runs at_exit functions. Its default exit code is 0.
exit! does not run at_exit functions. Its default exit code is 1.
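A quick way to see the difference, sketched with a forked child that registers an at_exit handler and reports back over a pipe. The helper name child_status and the message text are made up for this illustration.

```ruby
# Hypothetical helper: forks a child that registers an at_exit handler,
# then leaves via exit or exit!. Returns [exit status, handler output].
def child_status(use_bang)
  rd, wr = IO.pipe
  pid = fork do
    rd.close
    at_exit { wr.syswrite("at_exit ran") }  # syswrite bypasses Ruby's buffering
    use_bang ? exit! : exit
  end
  wr.close
  _, status = Process.wait2(pid)
  message = rd.read
  rd.close
  [status.exitstatus, message]
end

p child_status(false)  # child leaves with exit
p child_status(true)   # child leaves with exit!
```

Note that a forked child also inherits any at_exit handlers the parent registered before the fork, which is one reason exit! is sometimes preferred in children.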
I have the following code:
data_set = [1,2,3,4,5,6]
results = []
data_set.each do |ds|
puts "Before fork #{ds}"
r,w = IO.pipe
if pid = Process.fork
w.close
child_result = r.read
results << child_result
else
puts "Child worker for #{ds}"
sleep(ds * 5)
r.close
w.write(ds * 2)
exit
end
end
Process.waitall
puts "Ended everything #{results}"
Basically, I want each child to do some work, and then pass the result to the parent. My code doesn't run in parallel now, and I don't know where exactly my problem lies, probably it's because I'm doing a read in the parent, but I'm not sure. What would I need to do to get it to run async?
EDIT: I changed the code to this, and it seems to work ok. Is there any problem that I'm not seeing?
data_set = [1,2,3,4,5,6]
child_pipes = []
results = []
data_set.each do |ds|
puts "Before fork #{ds}"
r,w = IO.pipe
if pid = Process.fork
w.close
child_pipes << r
else
puts "Child worker for #{ds}"
sleep(ds * 5)
r.close
w.write(ds * 2)
exit
end
end
Process.waitall
puts child_pipes.map(&:read)
It's possible for a child to block writing to the pipe to the parent if its output is larger than the pipe capacity. Ideally the parent would perform a select loop on the child pipes or spawn threads reading from the child pipes so as to consume data as it becomes available to prevent children from stalling on a full pipe and failing. In practice, if the child output is small, just doing the waitall and read will work.
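The thread-per-pipe variant might look roughly like this. The output sizes are assumptions picked to exceed a commonly seen 64 KiB pipe buffer, so each child would stall on its write if nobody were reading.

```ruby
sizes = [100_000, 200_000, 300_000]  # bytes each child writes (illustrative)

# Start all children first, keeping only the read ends in the parent.
readers = sizes.map do |size|
  rd, wr = IO.pipe
  fork do
    rd.close
    wr.write("x" * size)   # blocks whenever the pipe is full and unread
    wr.close
  end
  wr.close
  rd
end

# One reader thread per child drains its pipe as data arrives.
threads = readers.map do |rd|
  Thread.new(rd) do |io|
    data = io.read
    io.close
    data
  end
end

results = threads.map(&:value)  # value joins the thread and returns its result
Process.waitall                 # safe now: every pipe has been fully read

lengths = results.map(&:bytesize)
p lengths
```

Calling Process.waitall before reading, as in the edited question, would hang with outputs of this size.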
Others have solved these problems in reusable ways; you might try the parallel gem to avoid writing a bunch of unnecessary code.