How can we lock an IO that is shared by multiple Ruby processes?
Consider this script:
#!/usr/bin/ruby -w
# vim: ts=2 sw=2 et

if ARGV.length != 2
  $stderr.puts "Usage: test-io-fork.rb num_child num_iteration"
  exit 1
end

CHILD     = ARGV[0].to_i
ITERATION = ARGV[1].to_i

def now
  t = Time.now
  "#{t.strftime('%H:%M:%S')}.#{t.usec}"
end

MAP = %w(nol satu dua tiga empat lima enam tujuh delapan sembilan)

IO.popen('-', 'w') {|pipe|
  unless pipe
    # Logger child
    File.open('test-io-fork.log', 'w') {|log|
      log.puts "#{now} Program start"
      $stdin.each {|line|
        log.puts "#{now} #{line}"
      }
      log.puts "#{now} Program end"
    }
    exit!
  end

  pipe.sync = true
  pipe.puts "Before fork"

  CHILD.times {|c|
    fork {
      pid = Process.pid
      srand
      ITERATION.times {|i|
        n = rand(9)
        sleep(n / 100000.0)
        pipe.puts "##{c}:#{i} #{MAP[n]} => #{n}, #{n} => #{MAP[n]} ##{c}:#{i}"
      }
    }
  }
}
And try it like this:
./test-io-fork.rb 200 50
As expected, the test-io-fork.log file will contain signs of an IO race condition.
What I want to achieve is a TCP server for a custom GPS protocol that saves the GPS points to a database. Because this server would handle 1000 concurrent clients, I would like to restrict database access to a single child instead of opening 1000 database connections simultaneously. This server would run on Linux.
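For what it's worth, the fan-in shape I have in mind looks roughly like this (a sketch only; handle_client and save_point are hypothetical placeholders, and each record is assumed to stay under PIPE_BUF bytes):

require 'socket'

# Sketch of the intended design: one forked child owns the database
# connection; every client handler funnels records through a shared pipe.
# handle_client and save_point are hypothetical placeholders.
db_pipe = IO.popen('-', 'w')
unless db_pipe
  # Database-writer child: the only process talking to the database.
  $stdin.each { |record| save_point(record) }
  exit!
end
db_pipe.sync = true

server = TCPServer.new(7777)
loop do
  client = server.accept
  fork do
    server.close
    handle_client(client) do |point|
      db_pipe.write("#{point}\n") # one write(2), atomic if <= PIPE_BUF
    end
    client.close
  end
  client.close
end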
UPDATE
It may be bad form to update after the answer was accepted, but the original is a bit misleading. Whether or not ruby makes a separate write(2) call for the automatically-appended newline is dependent upon the buffering state of the output IO object.
$stdout (when connected to a tty) is generally line-buffered, so the effect of a puts() with an implicitly added newline (given a reasonably sized string) is a single call to write(2). Not so, however, with IO.pipe and $stderr, as the OP discovered.
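If you'd rather sidestep the buffering question entirely, IO#syswrite bypasses Ruby's internal buffering and issues exactly one write(2) per call (a minimal sketch):

rd, wr = IO.pipe
# syswrite maps directly to a single write(2); the kernel guarantees
# atomicity for pipe writes of up to PIPE_BUF bytes.
wr.syswrite("one record, one syscall\n")
puts rd.gets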
ORIGINAL ANSWER
Change your chief pipe.puts() argument to be a newline-terminated string:
pipe.puts "##{c} ... #{i}\n" # <-- note the newline
Why? You set pipe.sync hoping that the pipe writes would be atomic and non-interleaved, since they are (presumably) less than PIPE_BUF bytes. But it didn't work, because Ruby's puts() implementation for pipes makes a separate call to write(2) to append the trailing newline, and that's why your writes are sometimes interleaved exactly where you expected a newline.
Here's a corroborating excerpt from a fork-following strace of your script:
$ strace -s 2048 -fe trace=write ./so-1326067.rb
....
4574 write(4, "#0:12 tiga => 3, 3 => tiga #0:12", 32) = 32
4574 write(4, "\n", 1)
....
But putting in your own newline solves the problem, making sure that your entire record is transmitted in one syscall:
....
5190 write(4, "#194:41 tujuh => 7, 7 => tujuh #194:41\n", 39 <unfinished ...>
5179 write(4, "#183:38 enam => 6, 6 => enam #183:38\n", 37 <unfinished ...>
....
If for some reason that cannot work for you, you'll have to coordinate with an interprocess mutex (like File#flock).
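A minimal sketch of that fallback, assuming an agreed-upon lock file; each writer takes an exclusive lock around its write so records from different processes cannot interleave:

# Sketch: serialize writes across processes with an exclusive file lock.
# The lock file path is an arbitrary choice for illustration.
LOCK_PATH = '/tmp/test-io-fork.lock'

def locked_puts(pipe, line)
  File.open(LOCK_PATH, File::RDWR | File::CREAT, 0644) do |lock|
    lock.flock(File::LOCK_EX) # blocks until no other process holds the lock
    pipe.puts(line)
  end # closing the file releases the lock
end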
Related
I'm using a command-line program that works as shown below:
$ ROUTE_TO_FOLDER/app < "long text"
If "long text" is written using the parameters "app" needs, then it will fill a text file with results. If not, it will fill the text file with dots continuously (I can't handle or modify the code of "app" in order to avoid this).
In a Ruby script there's a line like this:
text = "long text that will be used by app"
output = system("ROUTE_TO_FOLDER/app < #{text}")
Now, if text is well written, there won't be any problems and I will get an output file as mentioned before. The problem comes when text is not well written: my Ruby script hangs and I'm not sure how to kill it.
I've found Open3 and I've used the method like this:
irb> cmd = "ROUTE_TO_FOLDER/app < #{text}"
irb> stdin, stdout, stderr, wait_thr = Open3.popen3(cmd)
=> [#<IO:fd 10>, #<IO:fd 11>, #<IO:fd 13>, #<Thread:0x007f3a1a6f8820 run>]
When I do:
irb> wait_thr.value
it also hangs, and :
irb> wait_thr.status
=> "sleep"
How can I avoid these problems? Is it not recognizing that "app" has failed?
wait_thr.pid gives you the pid of the started process. Just call
Process.kill("KILL",wait_thr.pid)
when you need to kill it.
You can combine it with detecting whether the process is hung (continuously outputting dots) in one of two ways.
1) Set a timeout for waiting for the process:
require 'open3'
require 'timeout'

get '/process' do
  text = "long text that will be used by app"
  cmd = "ROUTE_TO_FOLDER/app < #{text}"
  Open3.popen3(cmd) do |i, o, e, w|
    begin
      Timeout.timeout(10) do # timeout set to 10 sec, change if needed
        # process the output; it will produce EOF when done
        until o.eof?
          # o.read_nonblock(N) ...
        end
      end
    rescue Timeout::Error
      # here you know that the process took longer than 10 seconds
      Process.kill("KILL", w.pid)
      # do whatever other error processing you need
    end
  end
end
2) Check the process output. (The code below is simplified; you probably don't want to read the output of your process into a single String buf first and then process it, but I guess you get the idea.)
get '/process' do
  text = "long text that will be used by app"
  cmd = "ROUTE_TO_FOLDER/app < #{text}"
  Open3.popen3(cmd) do |i, o, e, w|
    # Process the output; it will produce EOF when done.
    # If you get 16 dots in a row, the process is in the continuous loop.
    # (You may want to deal with stderr instead, depending on where these dots are sent to.)
    buf = ""
    error = false
    until o.eof?
      buf << o.read_nonblock(16)
      if buf.size >= 16 && buf[-16..-1] == '.' * 16
        # ok, the process is hung
        Process.kill("KILL", w.pid)
        error = true
        # you should also get o.eof? the next time you check (or after flushing
        # the pipe buffer), so you will get out of the until o.eof? loop
      end
    end
    if error
      # do whatever error processing you need
    else
      # process buf, it contains all the output
    end
  end
end
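One caveat with both variants: only stdout is drained, so a child that writes heavily to stderr can fill that pipe and deadlock. A hedged sketch that sidesteps this by merging the two streams with Open3.popen2e (handle is a hypothetical callback):

require 'open3'

# Sketch: popen2e merges stdout and stderr into one readable stream,
# so a chatty stderr can no longer fill an undrained pipe.
Open3.popen2e(cmd) do |stdin, out_and_err, wait_thr|
  stdin.close
  out_and_err.each_line { |line| handle(line) }
  wait_thr.value # wait for the process to finish
end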
I wrote this code to run my process as a daemon. The goal is to keep this process running even if I close its parent. Now I would like to be able to write something to its stdin. What should I do? Here's the code.
def daemonize(cmd, options = {})
  rd, wr = IO.pipe

  p1 = Process.fork {
    Process.setsid
    p2 = Process.fork {
      $0 = cmd # name of the command
      pidfile = File.new(options[:pid_file], 'w')
      pidfile.chmod(0644)
      pidfile.puts "#{Process.pid}"
      pidfile.close
      Dir.chdir(ENV["PWD"] = options[:working_dir].to_s) if options[:working_dir]
      File.umask 0000

      STDIN.reopen '/dev/null'
      STDOUT.reopen '/dev/null', 'a'
      STDERR.reopen STDOUT

      Signal.trap("USR1") do
        Console.show 'I just received a USR1', 'warning'
      end

      ::Kernel.exec(*Shellwords.shellwords(cmd)) # replace this daemon process with the command
      exit
    }
    raise 'Fork failed!' if p2 == -1
    Process.detach(p2) # divorce p2 from parent process (p1)
    rd.close
    wr.write p2
    wr.close
    exit
  }
  raise 'Fork failed!' if p1 == -1
  Process.detach(p1) # divorce p1 from parent process (shell)
  wr.close
  daemon_id = rd.read.to_i
  rd.close
  daemon_id
end
Is there a way to reopen stdin on something like a pipe instead of /dev/null, to which I would be able to write?
How about a fifo? On Linux, you can use the mkfifo command:
$ mkfifo /tmp/mypipe
Then you can reopen STDIN on that pipe:
STDIN.reopen '/tmp/mypipe'
# Do read-y things
Anything else can write to that pipe:
$ echo "roflcopter" > /tmp/mypipe
allowing that data to be read by the Ruby process.
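The same works from Ruby; note that opening a fifo for writing blocks until the reading side has it open too (a small sketch):

# Sketch: write a line into the fifo from another Ruby process.
# File.open blocks here until the daemon has the fifo open for reading.
File.open('/tmp/mypipe', 'w') { |f| f.puts 'roflcopter' }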
(Update) Caveat with blocking
Since fifos block until both ends are connected (a read blocks unless there's a writer, and vice versa), this is best handled with multiple threads. One thread should do the reading and pass the data to a queue, and another should handle that input. Here's an example of that situation:
require 'thread'

input = Queue.new
threads = []

# Read from the fifo and add to an input queue (glorified array)
threads << Thread.new(input) do |ip|
  STDIN.reopen 'mypipe'
  loop do
    if line = STDIN.gets
      puts "Read: #{line}"
      ip.push line
    end
  end
end

# Handle the input passed by the reader thread
threads << Thread.new(input) do |ip|
  loop do
    puts "Output: #{ip.pop}"
  end
end

threads.map(&:join)
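If you'd rather create the fifo from Ruby instead of shelling out to mkfifo, File.mkfifo is available on POSIX platforms since Ruby 2.3 (a minimal sketch):

# Sketch: create the fifo from Ruby (Ruby >= 2.3, POSIX only).
File.mkfifo('/tmp/mypipe', 0600) unless File.exist?('/tmp/mypipe')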
I'm aware that there are great gems like Parallel, but I came up with the class below as an exercise.
It's working fine, but with many iterations it sometimes happens that Ruby gets "stuck". When pressing CTRL+C, I can see from the backtrace that it's always in line 38 or 45 (the two Marshal lines).
Can you see anything that is wrong here? It seems to be that the Pipes are "hanging", so I thought I might be using them in a wrong way.
My goal was to iterate through an array (which I pass as 'objects') with a limited number of forks (max_forks) and to return some values. Additionally, I wanted to guarantee that all children get killed when the parent gets killed (even in the case of kill -9), which is why I introduced the "life_line" pipe (I've read here on Stack Overflow that this might do the trick).
class Parallel
  def self.do_fork(max_forks, objects)
    waiter_threads = []
    fork_counter = []
    life_line = {}
    comm_line = {}

    objects.each do |object|
      key = rand(24 ** 24).to_s(36)
      sleep(0.01) while fork_counter.size >= max_forks
      if fork_counter.size < max_forks
        fork_counter << true

        life_line[key] = {}
        life_line[key][:r], life_line[key][:w] = IO.pipe
        comm_line[key] = {}
        comm_line[key][:r], comm_line[key][:w] = IO.pipe

        pid = fork {
          life_line[key][:w].close
          comm_line[key][:r].close

          Thread.new {
            begin
              life_line[key][:r].read
            rescue SignalException, SystemExit => e
              raise e
            rescue Exception => e
              Kernel.exit
            end
          }

          Marshal.dump(yield(object), comm_line[key][:w]) # return yield
        }

        waiter_threads << Thread.new {
          Process.wait(pid)
          comm_line[key][:w].close
          reply = Marshal.load(comm_line[key][:r])
          # process reply here
          comm_line[key][:r].close
          life_line[key][:r].close
          life_line[key][:w].close
          life_line[key] = nil
          fork_counter.pop
        }
      end
    end

    waiter_threads.each { |k| k.join } # wait for all threads to finish
  end
end
The bug was this: a pipe can buffer only a limited amount of data (typically 64 KB on Linux). Once a child writes more than that, its write blocks until the other end is read; since the parent here only reads after Process.wait, both processes hang forever. An easy solution is to start reading the pipe in a thread before you begin waiting for the writer.
comm_line = IO.pipe

# Buffered pipe reading (in case the payload is bigger than 64 KB):
# drain the pipe in a thread *before* waiting for the child.
reply = ""
read_buffer = Thread.new {
  reply = Marshal.load(comm_line[0]) until comm_line[0].eof?
}

child_pid = fork {
  comm_line[0].close # the child only writes
  Marshal.dump("HUGE DATA LARGER THAN 64 KB", comm_line[1])
}

Process.wait(child_pid)
comm_line[1].close # close the write end so the reader sees EOF
read_buffer.join
comm_line[0].close

puts reply # outputs the "HUGE DATA"
I don't think the problem is with Marshal. The more obvious one seems to be that your fork may finish execution before the waiter thread gets to it (leading the latter to wait forever).
Try changing Process.wait(pid) to Process.wait(pid, Process::WNOHANG). The Process::WNOHANG flag instructs Ruby to not hang if there are no children (matching the given PID, if any) available. Note that this may not be available on all platforms but at the very least should work on Linux.
There are a number of other potential problems with your code, but if you just came up with it "as an exercise", they probably don't matter. For example, Marshal.load does not like to encounter EOFs, so I'd probably guard against those by saying something like Marshal.load(comm_line[key][:r]) unless comm_line[key][:r].eof?, or loop until comm_line[key][:r].eof? if you expect there to be several objects to be read.
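For the several-objects case, that guard might look like this (a small sketch; it assumes the write end has already been closed so eof? can become true):

# Sketch: read every marshaled object from the pipe, stopping cleanly at EOF.
replies = []
replies << Marshal.load(comm_line[key][:r]) until comm_line[key][:r].eof?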
I've got a simple test case for reading from a pipe that passes in MRI and fails in JRuby 1.7+. It's reported as a critical bug in JRUBY-6986.
tl;dr Why would a 16kb (16384 bytes) file be read perfectly from a pipe and a 16kb + 1 byte file fail? On another system, the threshold was found to be 72kb (73728 bytes).
$ ruby testpipe.rb 16384
Starting...didn't hang!
$ ruby testpipe.rb 16385
Starting...^C
There, the script hung at 16kb + 1 byte.
require 'open3'

path = "test.txt"
File.open(path, 'w') { |f| f.write('Z' * ARGV[0].to_i) }

STDOUT.write "Starting..."

Open3.popen3('cat', path) do |stdin, stdout, stderr|
  stdin.close
  readers = [stdout, stderr]
  while readers.any?
    ready = IO.select(readers, [], readers)
    # no writers
    # ready[1].each { ... }
    ready[0].each do |fd|
      if fd.eof?
        fd.close
        readers.delete fd
      else
        # read from the pipe one byte at a time
        fd.readpartial 1
      end
    end
  end
end
puts "didn't hang!"
puts "didn't hang!"
Could some operating system or JVM buffer size be causing different signals to be sent to IO.select or something?
PS: I double-checked with Jesse Storimer, author of Working with Unix Processes, and he believes my usage of IO.select, etc. is correct.
I would like to use Sinatra's Streaming capability introduced in 1.3 coupled with some stdout redirection. It would basically be a live streaming output of a long running job. I looked into this question and the Sinatra streaming sample in the README.
Running 1.8.7 on OS X:
require 'stringio'
require 'sinatra'

$stdout.sync = true

module Kernel
  def capture_stdout
    out = StringIO.new
    $stdout = out
    yield out
  ensure
    $stdout = STDOUT
  end
end

get '/' do
  stream do |out|
    out << "Part one of a three part series... <br>\n"
    sleep 1
    out << "...part two... <br>\n"
    sleep 1
    out << "...and now the conclusion...\n"

    Kernel.capture_stdout do |stream|
      Thread.new do
        until (line = stream.gets).nil?
          out << line
        end
      end
      method_that_prints_text
    end
  end
end

def method_that_prints_text
  puts "starting long running job..."
  sleep 3
  puts "almost there..."
  sleep 3
  puts "work complete!"
end
So this bit of code prints out the first three strings properly, then blocks while method_that_prints_text executes and never prints anything to the browser. My feeling is that stdout is empty on the first call and it never outputs to the out buffer. I'm not quite sure what the proper ordering would be and would appreciate any suggestions.
I tried a few of the EventMachine implementations mentioned in the question above, but couldn't get them to work.
UPDATE
I tried something slightly different to where I had the method run in a new thread, and override STDOUT for that thread as described here...
Instead of Kernel.capture_stdout above...
s = StringIO.new
Thread.start do
  Thread.current[:stdout] = s
  method_that_prints_text
end.join

while line = s.gets
  out << line
end
out << s.string
With the ThreadOut module listed in the link above, this seems to work a bit better. However, it doesn't stream: the only time something is printed to the browser is on the final line, out << s.string. Does StringIO not have the capability to stream?
I ended up solving this by discovering that s.string was updated periodically as time went on, so I just captured the output in a separate thread, grabbed the differences and streamed them out. It appears that a StringIO redirection doesn't behave like a normal IO object.
s = StringIO.new
t = Thread.start do
  Thread.current[:stdout] = s
  method_that_prints_text
  sleep 2
end

displayed_text = ''
while t.alive?
  current_text = s.string
  unless current_text.eql?(displayed_text)
    new_text = current_text[displayed_text.length..current_text.length]
    out << new_text
    displayed_text = current_text * 1 # String#* makes a copy, not a reference
  end
  sleep 2
end