How do I see the output of multiple forked processes simultaneously - ruby

I am using fork in Ruby as follows:
pid1 = fork do
  pid1_output = `ruby scrape1.rb`
  puts "#{pid1_output}"
  puts ""
  exit
end
pid2 = fork do
  pid2_output = `ruby scrape2.rb`
  puts "#{pid2_output}"
  puts ""
  exit
end
pid3 = fork do
  pid3_output = `ruby scrape3.rb`
  puts "#{pid3_output}"
  puts ""
  exit
end
pid4 = fork do
  pid4_output = `ruby scrape4.rb`
  puts "#{pid4_output}"
  puts ""
  exit
end
Process.waitall
The problem here is that sometimes one of the processes (e.g. ruby scrape1.rb) might fail, or might return so much text that it can't sensibly be captured in a variable. How do I still run all 4 processes simultaneously and see their output in one terminal window in real time? I understand the order of the output might get jumbled, and that's alright. I basically want to re-route the STDOUT and STDERR of each forked process to the main program, so that I can see what each of my scrapers is scraping and follow their progress and errors as they happen.

Because exec replaces each forked child with the scraper process, and the child inherits the parent's STDOUT and STDERR, every scraper writes straight to your terminal as it runs:
fork do
  exec("ruby scrape1.rb")
end
fork do
  exec("ruby scrape2.rb")
end
fork do
  exec("ruby scrape3.rb")
end
fork do
  exec("ruby scrape4.rb")
end
Process.waitall
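If you also want to tell the four scrapers' interleaved lines apart, here is a minimal sketch (using the same scrape1.rb..scrape4.rb filenames from the question) that reads each child through its own pipe and tags every line with its source:
require 'open3'

threads = (1..4).map do |i|
  Thread.new do
    # popen2e merges the child's stdout and stderr into one stream
    Open3.popen2e("ruby scrape#{i}.rb") do |_stdin, out_err, wait_thr|
      out_err.each_line { |line| print "[scrape#{i}] #{line}" }
      wait_thr.value # reap the child
    end
  end
end
threads.each(&:join)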

Related

How to terminate a child process as part of terminating the thread that created it

I have a Ruby application that spawns a thread on-demand which in turn does a system call to execute a native binary.
I want to abort this call before the native call completes.
I tried using all options the Thread documentation provided, like kill, raise and terminate, but nothing seems to help.
This is what I'm trying to do:
class Myserver < Grape::API
  @@thr = nil

  post "/start" do
    puts "Starting script"
    @@thr = Thread.new do
      op = `sh chumma_sh.sh`
      puts op
    end
    puts @@thr.status
  end

  put "/stop" do
    @@thr.terminate
    @@thr.raise
    Thread.kill(@@thr)
    puts @@thr.status
  end
end
The thread appears to enter a sleep state while the IO operation is in progress, but how do I kill the thread so that all the child processes it created are terminated too, rather than left running and reparented to init?
Doing ps -ef | grep for the script returns the pid, and I could try Process.kill pid, but I wanted to know if there are better options.
I don't have the option at this moment of modifying how the script is executed, as it is part of an inherited library.
Using ps is the only approach I've found that works. If you also want to kill child processes recursively, you could use something like this:
def child_pids_recursive(pid)
  # get children
  pipe = IO.popen("ps -ef | grep #{pid}")
  child_pids = pipe.readlines.map do |line|
    parts = line.split(/\s+/)
    # parts[2] is the PID and parts[3] the PPID here because ps -ef output
    # begins with whitespace, so split yields a leading empty field
    parts[2] if parts[3] == pid.to_s && parts[2] != pipe.pid.to_s
  end.compact
  pipe.close

  # get grandchildren
  grandchild_pids = child_pids.map do |cpid|
    child_pids_recursive(cpid)
  end.flatten

  child_pids + grandchild_pids
end

def kill_all(pid)
  # kill the deepest descendants first
  child_pids_recursive(pid).reverse.each do |p|
    begin
      Process.kill('TERM', p.to_i)
    rescue
      # the process may already be gone; ignore
    end
  end
end
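Hypothetical usage, where script_pid stands for the pid you located via ps as described above:
kill_all(script_pid)                        # descendants first, deepest last
Process.kill('TERM', script_pid) rescue nil # then the script itself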

Sinatra 1.3 Streaming w/ Ruby stdout redirection

I would like to use Sinatra's Streaming capability introduced in 1.3 coupled with some stdout redirection. It would basically be a live streaming output of a long running job. I looked into this question and the Sinatra streaming sample in the README.
Running 1.8.7 on OSX:
require 'stringio'
require 'sinatra'

$stdout.sync = true

module Kernel
  def capture_stdout
    out = StringIO.new
    $stdout = out
    yield out
  ensure
    $stdout = STDOUT
  end
end
get '/' do
  stream do |out|
    out << "Part one of a three part series... <br>\n"
    sleep 1
    out << "...part two... <br>\n"
    sleep 1
    out << "...and now the conclusion...\n"
    Kernel.capture_stdout do |stream|
      Thread.new do
        until (line = stream.gets).nil? do
          out << line
        end
      end
      method_that_prints_text
    end
  end
end
def method_that_prints_text
  puts "starting long running job..."
  sleep 3
  puts "almost there..."
  sleep 3
  puts "work complete!"
end
So this bit of code prints the first three strings properly, then blocks while method_that_prints_text executes and never prints anything to the browser. My feeling is that stdout is empty on the first call to gets and nothing ever reaches the out buffer. I'm not quite sure what the proper ordering would be and would appreciate any suggestions.
I tried a few of the EventMachine implementations mentioned in the question above, but couldn't get them to work.
UPDATE
I tried something slightly different to where I had the method run in a new thread, and override STDOUT for that thread as described here...
Instead of Kernel.capture_stdout above...
s = StringIO.new
Thread.start do
  Thread.current[:stdout] = s
  method_that_prints_text
end.join

while line = s.gets do
  out << line
end
out << s.string
With the ThreadOut module listed in the link above, this seems to work a bit better. However it doesn't stream. The only time something is printed to the browser is on the final line out << s.string. Does StringIO not have the capability to stream?
I ended up solving this by discovering that s.string was updated periodically as time went on, so I just captured the output in a separate thread and grabbed the differences and streamed them out. It appears as though string redirection doesn't behave like a normal IO object.
s = StringIO.new
t = Thread.start do
  Thread.current[:stdout] = s
  method_that_prints_text
  sleep 2
end

displayed_text = ''
while t.alive? do
  current_text = s.string
  unless current_text.eql?(displayed_text)
    # send only the portion that hasn't been streamed yet
    new_text = current_text[displayed_text.length..current_text.length]
    out << new_text
    # `* 1` copies the string so it can't alias the StringIO's buffer
    displayed_text = current_text * 1
  end
  sleep 2
end

Change STDIN with a pipe and it's a directory

I have this
pipe_in, pipe_out = IO.pipe

fork do
  # child 1
  pipe_in.close
  STDOUT.reopen pipe_out
  STDERR.reopen pipe_out
  puts "Hello World"
  pipe_out.close
end

fork do
  # child 2
  pipe_out.close
  STDIN.reopen pipe_in
  while line = gets
    puts 'child2:' + line
  end
  pipe_in.close
end

Process.wait
Process.wait
gets will always raise an error saying "gets: Is a directory", which doesn't make sense to me. If I change gets to pipe_in.gets it works. What I want to know is: why doesn't gets work after STDIN.reopen pipe_in?
It works for me, with the following change:
 pipe_in.close
 end
+pipe_in.close
+pipe_out.close
+
 Process.wait
 Process.wait
Without this change, you still have both pipe ends open in the original (parent) process, so the reader will never see an end of file: the process doing the wait still has the write end of the pipe open, leading to a deadlock.
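For reference, a sketch of the full program with that fix applied:
pipe_in, pipe_out = IO.pipe
fork do
  # child 1: writer
  pipe_in.close
  STDOUT.reopen pipe_out
  STDERR.reopen pipe_out
  puts "Hello World"
  pipe_out.close
end
fork do
  # child 2: reader
  pipe_out.close
  STDIN.reopen pipe_in
  while line = gets
    puts 'child2:' + line
  end
  pipe_in.close
end
pipe_in.close  # the parent must close its copies too,
pipe_out.close # or child 2 never sees end-of-file
Process.wait
Process.wait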

Why is IO::WaitReadable being raised differently for STDOUT than STDERR?

Given that I wish to test non-blocking reads from a long command, I created the following script, saved it as long, made it executable with chmod 755, and placed it in my path (saved as ~/bin/long where ~/bin is in my path).
I am on a *nix variant with ruby 1.9.2p290 (2011-07-09 revision 32553) [x86_64-darwin11.0.0] compiled with RVM defaults. I do not use Windows, and am therefore unsure if the test script will work for you if you do.
#!/usr/bin/env ruby
3.times do
  STDOUT.puts 'message on stdout'
  STDERR.puts 'message on stderr'
  sleep 1
end
Why does long_err produce each STDERR message as it is printed by "long"
def long_err(bash_cmd = 'long', maxlen = 4096)
  stdin, stdout, stderr = Open3.popen3(bash_cmd)
  begin
    begin
      puts 'err -> ' + stderr.read_nonblock(maxlen)
    end while true
  rescue IO::WaitReadable
    IO.select([stderr])
    retry
  rescue EOFError
    puts 'EOF'
  end
end
while long_out remains blocked until all STDOUT messages are printed?
def long_out(bash_cmd = 'long', maxlen = 4096)
  stdin, stdout, stderr = Open3.popen3(bash_cmd)
  begin
    begin
      puts 'out -> ' + stdout.read_nonblock(maxlen)
    end while true
  rescue IO::WaitReadable
    IO.select([stdout])
    retry
  rescue EOFError
    puts 'EOF'
  end
end
I assume you will require 'open3' before testing either function.
Why is IO::WaitReadable being raised differently for STDOUT than STDERR?
Workarounds using other ways to start subprocesses also appreciated if you have them.
On most systems STDOUT is buffered (fully buffered when attached to a pipe, as it is here) while STDERR is not. What popen3 does is basically open a pipe between the executable you launch and Ruby.
Any output that is in buffered mode is not sent through this pipe until either:
The buffer is filled (thereby forcing a flush).
The sending application exits (EOF is reached, forcing a flush).
The stream is explicitly flushed.
The reason STDERR is not buffered is that it's usually considered important for error messages to appear instantly, rather than be delayed for the sake of efficiency by buffering.
So, knowing this, you can emulate STDERR behaviour with STDOUT like this:
#!/usr/bin/env ruby
3.times do
  STDOUT.puts 'message on stdout'
  STDOUT.flush
  STDERR.puts 'message on stderr'
  sleep 1
end
and you will see the difference.
You might also want to check "Understanding Ruby and OS I/O buffering".
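If you cannot modify the child at all, one workaround sketch is to run it under a pseudo-terminal: stdio then believes it is talking to a terminal and line-buffers STDOUT, so each line arrives as it is printed:
require 'pty'

PTY.spawn('long') do |reader, _writer, pid|
  begin
    reader.each_line { |line| puts "out -> #{line}" }
  rescue Errno::EIO
    # the child closed its side of the pty
  end
  Process.wait(pid)
end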
Here's the best I've got so far for starting subprocesses. I launch a lot of network commands so I needed a way to time them out if they take too long to come back. This should be handy in any situation where you want to remain in control of your execution path.
I adapted this from a Gist, adding code to test the exit status of the command for 3 outcomes:
Successful completion (exit status 0)
Error completion (exit status is non-zero) - raises an exception
Command timed out and was killed - raises an exception
Also fixed a race condition, simplified parameters, added a few more comments, and added debug code to help me understand what was happening with exits and signals.
Call the function like this:
output = run_with_timeout("command that might time out", 15)
output will contain the combined STDOUT and STDERR of the command if it completes successfully. If the command doesn't complete within 15 seconds it will be killed and an exception raised.
Here's the function (2 constants you'll need defined at the top):
DEBUG = false       # change to true for some debugging info
BUFFER_SIZE = 4096  # in bytes, this should be fine for many applications

def run_with_timeout(command, timeout)
  output = ''
  tick = 1
  begin
    # Start task in another thread, which spawns a process
    stdin, stderrout, thread = Open3.popen2e(command)
    # Get the pid of the spawned process
    pid = thread[:pid]
    start = Time.now

    while (Time.now - start) < timeout and thread.alive?
      # Wait up to `tick' seconds for output/error data
      Kernel.select([stderrout], nil, nil, tick)
      # Try to read the data
      begin
        output << stderrout.read_nonblock(BUFFER_SIZE)
        puts "we read some data..." if DEBUG
      rescue IO::WaitReadable
        # No data was ready to be read during the `tick', which is fine
        print "." # give feedback each tick that we're waiting
      rescue EOFError
        # Command has completed, not really an error...
        puts "got EOF." if DEBUG
        # Wait briefly for the thread to exit...
        # We don't want to kill the process if it's about to exit on its
        # own. We decide success or failure based on whether the process
        # completes successfully.
        sleep 1
        break
      end
    end

    if thread.alive?
      # The timeout has been reached and the process is still running, so
      # we need to kill the process, because killing the thread leaves
      # the process alive but detached.
      Process.kill("TERM", pid)
    end
  ensure
    stdin.close if stdin
    stderrout.close if stderrout
  end

  status = thread.value # returns Process::Status when the process ends
  if DEBUG
    puts "thread.alive?: #{thread.alive?}"
    puts "status: #{status}"
    puts "status.class: #{status.class}"
    puts "status.exited?: #{status.exited?}"
    puts "status.exitstatus: #{status.exitstatus}"
    puts "status.signaled?: #{status.signaled?}"
    puts "status.termsig: #{status.termsig}"
    puts "status.stopsig: #{status.stopsig}"
    puts "status.stopped?: #{status.stopped?}"
    puts "status.success?: #{status.success?}"
  end

  # See how the process ended: .success? => true, false, or nil if exited? is not true
  if status.success? == true       # process exited (0)
    return output
  elsif status.success? == false   # process exited (non-zero)
    raise "command `#{command}' returned non-zero exit status (#{status.exitstatus}), see below output\n#{output}"
  elsif status.signaled?           # we killed the process (timeout reached)
    raise "shell command `#{command}' timed out and was killed (timeout = #{timeout}s): #{status}"
  else
    raise "process didn't exit and wasn't signaled. We shouldn't get here."
  end
end
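A hypothetical call site, with some_network_command standing in for whatever might hang; both failure and timeout surface as exceptions:
begin
  output = run_with_timeout("some_network_command", 15)
  puts output
rescue RuntimeError => e
  warn e.message
end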
Hope this is useful.

How to proxy a shell process in ruby

I'm creating a script to wrap jdb (java debugger). I essentially want to wrap this process and proxy the user interaction. So I want it to:
start jdb from my script
send the output of jdb to stdout
pause and wait for input when jdb does
when the user enters commands, pass them to jdb
At the moment I really just want a pass-through to jdb. The reason for this is to initialize the process with specific parameters and potentially add more commands in the future.
Update:
Here's the shell of what ended up working for me using expect:
PTY.spawn("jdb -attach 1234") do |read,write,pid|
write.sync = true
while (true) do
read.expect(/\r\r\n> /) do |s|
s = s[0].split(/\r\r\n/)
s.pop # get rid of prompt
s.each { |line| puts line }
print '> '
STDOUT.flush
write.print(STDIN.gets)
end
end
end
Use Open3.popen3(). e.g.:
require 'open3'

Open3.popen3("jdb args") { |stdin, stdout, stderr|
  # stdin  = jdb's input stream
  # stdout = jdb's output stream
  # stderr = jdb's stderr stream
  threads = []
  threads << Thread.new(stderr) do |terr|
    while (line = terr.gets)
      puts "stderr: #{line}"
    end
  end
  threads << Thread.new(stdout) do |tout|
    while (line = tout.gets)
      puts "stdout: #{line}"
    end
  end
  stdin.puts "blah"
  threads.each { |t| t.join } # in order to clean up when you're done
}
I've given you examples for threads, but you of course want to be responsive to what jdb is doing. The above is merely a skeleton for how you open the process and handle communication with it.
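To make it interactive, a sketch of one more thread that forwards the user's keystrokes to jdb in place of the fixed stdin.puts "blah" above (assumed to sit inside the same popen3 block):
threads << Thread.new(stdin) do |tin|
  while (line = STDIN.gets)
    tin.puts line
  end
  tin.close # lets jdb see EOF when the user hits Ctrl-D
end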
The Ruby standard library includes expect, which is designed for just this type of problem. See the documentation for more information.
