IO.copy_stream performance in ruby - ruby

I am trying to continously read a file in ruby (which is growing over time and needs to be processed in a separate process). Currently I am archiving this with the following bit of code:
r,w = IO.pipe
pid = Process.spawn('ffmpeg'+ffmpeg_args, {STDIN => r, STDERR => STDOUT})
Process.detach pid
while true do
IO.copy_stream(open(#options['filename']), w)
sleep 1
end
However - while working - I can't imagine that this is the most performant way of doing it. An alternative would be to use the following variation:
step = 1024*4
copied = 0
pid = Process.spawn('ffmpeg'+ffmpeg_args, {STDIN => r, STDERR => STDOUT})
Process.detach pid
while true do
IO.copy_stream(open(#options['filename']), w, step, copied)
copied += step
sleep 1
end
which would only continously copy parts of the file (the issue here being if the step should ever overreach the end of the file). Other approaches such a simple read-file led to ffmpeg failing when there was no new data. With this solution the frames are dropped if no new data is available (which is what I need).
Is there a better way (more performant) to archive something like that?
EDIT:
Using the method proposed by #RaVeN I am now using the following code:
open(#options['filename'], 'rb') do |stream|
stream.seek(0, IO::SEEK_END)
queue = INotify::Notifier.new
queue.watch(#options['filename'], :modify) do
w.write stream.read
end
queue.run
end
However now ffmpeg complaints about invalid data. Is there another method than read?

Related

Ruby: intercept popen system call and log stdout and stderr to same file

In ruby code I am running a system call with Open3.popen3 and using the resultant IO for stdout and stderr to do some log message formatting before writing to one log file. I was wondering what would be the best way to do this so log messages will maintain the correct order, note I need to do separate formatting for error messages as for stdout messages.
Here's my current code (Assume logger is thread safe)
Open3.popen3("my_custom_script with_some_args") do |_in, stdout, stderr|
stdout_thr = Thread.new do
while line = stdout.gets.chomp
logger.info(format(:info, line))
end
end
stderr_thr = Thread.new do
while line = stderr.gets.chomp
logger.error(format(:error, line))
end
end
[stdout_thr, stderr_thr].each(&:join)
end
This has worked for me so far, but I'm not so confident that I can guarantee the correct order of the log messages. Is there a better way?
What you're trying to achieve is not possible with a guarantee. First thing to note is that your code could only possibly order based on the time that the data was received, not when it was produced, which is not quite the same. The only way to guarantee this would be to do something on the source which will add some guaranteed ordering between the two systems.
The below code should make it "more likely" to be correct by removing the threads. Assuming that you're using MRI, the threads are "green" so technically can't be running at the same time. That means you're beholden upon the scheduler choosing to run your thread at the "right" time.
Open3.popen3("my_custom_script with_some_args") do |_in, stdout, stderr|
for_reading = [stdout, stderr]
until(for_reading.empty?) do
wait_timeout = 1
# IO.select blocks until one of the streams is has something to read
# or the wait timeout is reached
readable, _writable, errors = IO.select(for_reading, [], [], wait_timeout)
# readable is nil in the case of a timeout - loop back again
if readable.nil?
Thread.pass
else
# In the case that both streams are readable (and thus have content)
# read from each of them. In this case, we cannot guarantee any order
# because we recieve the items at essentially the same time.
# We can still ensure that we don't mix data incorrectly.
readable.each do |stream|
buffer = ''
# loop through reading data until there is an EOF (value is nil)
# or there is no more data to read (value is empty)
while(true) do
tmp = stream.read_nonblock(4096, buffer, exception: false)
if tmp.nil?
# stream is EOF - nothing more to read on that one..
for_reading -= [stream]
break
elsif tmp.empty? || tmp == :wait_readable
# nothing more to read right now...
# continue on to process the buffer into lines and log them
break
end
end
if stream == stdout
buffer.split("\n").each { |line| logger.info(format(:info, line)) }
elsif stream == stderr
buffer.split("\n").each { |line| logger.info(format(:error, line)) }
end
end
end
end
end
Note that in a system generating a lot of output in a very short period of time there is more likely to be an overlap where things get out of order. This likelihood increases with the amount time taken to read the stream and process it. It would be best to ensure that the absolute minimum processing is done inside the loop. If the formatting (and writing) are expensive, consider moving those items into a separate thread reading from a single queue, and have the code inside the loop only push the buffer (and source identifier) onto the queue.

Handling zombie processes when using waitpid

I have the following method, the idea is to run a shell command and both stream output to stdout as its recieved and store the information as a variable so I can return a hash of the information, I found no standard way of doing this (you either get streaming or captured output).
It does this by creating a forks to stream the output and append to an IO pipe that I can read in at a later date.
def self.run_cmd(cmd)
stdout_rd, stdout_wr = IO.pipe
stderr_rd, stderr_wr = IO.pipe
status = Open4::popen4(cmd) do |_pid, _stdin, _stdout, _stderr|
pids = []
pids << fork do
_stdout.each_line do |l|
print l
stdout_wr.puts l
end
end
pids << fork do
_stderr.each_line do |l|
print l
stderr_wr.puts l
end
end
pids.each{|pid| Process.waitpid(pid)}
end
stdout_wr.close
stderr_wr.close
out = stdout_rd.gets
out = '' if out.nil?
err = stderr_rd.gets
err = '' if err.nil?
{ stdout: out, stderr: err, status: status.exitstatus }
end
This works great in most scenarios but specifically unzip doesn't play well with this approach, what happens is after a fixed amount of output from zip it will stall at stdout_wr.puts l
I've observed that when the ruby process has stalled that a zombie unzip is visible when running ps
Is there any way I can make this work?
Is there a better way of doing this? I appreciate that its a complex solution and it must be easier.
My potential idea is that my IO pipe is running out of buffered space but I'm able to print 10,000 lines of output without issue.

How to timeout named pipes in ruby?

I saw an article which suggests the following code for a writer:
output = open("my_pipe", "w+") # the w+ means we don't block
output.puts "hello world"
output.flush # do this when we're done writing data
and a reader:
input = open("my_pipe", "r+") # the r+ means we don't block
puts input.gets # will block if there's nothing in the pipe
But could it happen that open, puts, gets will block the program? Is there some kind of timeout in place? Can one change it? Also, how come w+ means non-blocking call? Which open system call flags is it converted to?
Okay, let me share with you my picture of the world. As rogerdpack said, there are two options: 1) using select in blocking mode, 2) using non-blocking mode (O_NONBLOCK flag, read_nonblock, write_nonblock, select methods). I haven't tried, so these are just speculations.
As to why open, puts and gets may block the thread. open call blocks until there are at least one reader and at least one writer. And that must be the reason why we need to specify r+, w+ for open call. Judging from strace output they both are converted to O_RDWR flag. Then there must be some buffer, where not yet received data are stored. And that must be the reason why write methods may block. Read methods may block because they expect more data to be available, than it really is.
UPD
If a process attempts to read from an empty pipe, then read(2) will block until data is available. If a process attempts to write to a full pipe (see below), then write(2) blocks until sufficient data has been read from the pipe to allow the write to complete.
-- http://linux.die.net/man/7/pipe
The FIFO must be opened on both ends (reading and writing) before data can be passed. Normally, opening the FIFO blocks until the other end is opened also.
Under Linux, opening a FIFO for read and write will succeed both in blocking and nonblocking mode. POSIX leaves this behavior undefined. This can be used to open a FIFO for writing while there are no readers available.
-- http://linux.die.net/man/7/fifo
And here's the implementation I came up with:
#!/home/yuri/.rbenv/shims/ruby
require 'timeout'
data = ((0..15).to_a.map { |v|
(v < 10 ? '0'.ord + v : 'a'.ord + v - 10).chr
} * 4096 * 2).reduce('', :+)
timeout = 10
start = Time.now
open('1.fifo', File::WRONLY | File::NONBLOCK) { |out|
out.flock(File::LOCK_EX)
nwritten = 0
data_len = data.length
begin
delta = out.write_nonblock data
data = data[delta..-1]
nwritten += delta
rescue IO::WaitWritable, Errno::EINTR
timeout_left = timeout - (Time.now - start)
if timeout_left < 0
puts Time.now - start
raise Timeout::Error
end
IO.select nil, [out], nil, timeout_left
retry
end while nwritten < data_len
}
puts Time.now - start
But for my problem at hand I decided to ignore this timeout thing. It probably will suffice to handle just situations when there is no reader on the other end of the pipe (Errno::ENXIO):
open('1.fifo', File::WRONLY | File::NONBLOCK) { |out|
out.flock(File::LOCK_EX)
nwritten = 0
data_len = data.length
begin
delta = out.write_nonblock data
data = data[delta..-1]
nwritten += delta
rescue IO::WaitWritable, Errno::EINTR
IO.select nil, [out]
retry
end while nwritten < data_len
}
P.S. Your feedback is appreciated.
This page should answer all your questions... http://www.ruby-doc.org/core-2.0.0/IO.html
In general, puts can always block the current thread, since they may have to wait for IO to complete for it to return. gets can also block the current thread because it will read and read forever until it hits the first newline, then it will return everything it read. HTH.

Using parfor and labSend/labRecieve

I want to run two matlab scripts in parallel for a project and communicate between them. The purpose of this is to have one script do image analysis and sending the results to the other which will use it for more calculations (time consuming, but not related to the task of finding stuff in the images). Since both tasks are time consuming, and should preferably be done in real time, I believe that parallelization is necessary.
To get a feel for how this should be done I created a test script to find out how to communicate between the two scripts.
The first script takes a user input using the built in function input, and then using labSend sends it to the other, which recieves it, and prints it.
function [blarg] = inputStuff(blarg)
mpiInit(); %added because of error message, but do not work...
for i=1:2
labBarrier; % added because of error message
inp = input('Enter a number to write');
labSend(inp);
if (inp == 0)
break;
else
i = 1;
end
end
end
function [ blarg ] = testWrite( blarg )
mpiInit(); % added because of error message, but does not help
par = 0;
if ( blarg == 0)
par = 1;
end
for i = 1:10
if (par == 1)
labBarrier
delta = labReceive();
i = 1;
else
delta = input('Enter number to write');
end
if (delta == 0)
break;
end
s = strcat('This lab no', num2str(labindex), '. Delta is = ')
delta
end
end
%%This is the file test_parfor.m
funlist = {#inputStuff, #testWrite};
matlabpool(2);
mpiInit(); % added because of error message, but does not help
parfor i=1:2
funlist{i}(0);
end
matlabpool close;
Then, when the code is run, the following error message appears:
Starting matlabpool using the 'local' profile ... connected to 2 labs.
Error using parallel_function (line 589)
The MPI implementation has not yet been loaded. Please
call mpiInit.
Error stack:
testWrite.m at 11
Error in test_parfor (line 8)
parfor i=1:2
Calling the method mpiInit does not help... (Called as shown in the code above.)
And nowhere in the examples that mathworks have in the documentation, or on their website, show this error or what to do with it.
Any help is appreciated!
You would typically use constructs such as labSend, labRecieve and labBarrier within an spmd block, rather than a parfor block.
parfor is intended for implementing embarrassingly parallel algorithms, in other words algorithms that consist of multiple independent tasks that can be run in parallel, and do not require communication between tasks.
I'm stretching my knowledge here (perhaps someone more expert can correct me), but as I understand things, it does not set up an MPI ring for communication between workers, which is probably the explanation for the (rather uninformative) error message you're getting.
An spmd block enables communication between workers using labSend, labRecieve and labBarrier. There are quite a few examples of using them all in the documentation.
Sam is right that the MPI functionality is not enabled during parfor, only during spmd. You need to do something more like this:
spmd
funlist{labindex}(0);
end
(Sam is also quite right that the error message you saw is pretty unhelpful)

How do I block on reading a named pipe in Ruby?

I'm trying to set up a Ruby script that reads from a named pipe in a loop, blocking until input is available in the pipe.
I have a process that periodically puts debugging events into a named pipe:
# Open the logging pipe
log = File.open("log_pipe", "w+") #'log_pipe' created in shell using mkfifo
...
# An interesting event happens
log.puts "Interesting event #4291 occurred"
log.flush
...
I then want a separate process that will read from this pipe and print events to the console as they happen. I've tried using code like this:
input = File.open("log_pipe", "r+")
while true
puts input.gets #I expect this to block and wait for input
end
# Kill loop with ctrl+c when done
I want the input.gets to block, waiting patiently until new input arrives in the fifo; but instead it immediately reads nil and loops again, scrolling off the top of the console window.
Two things I've tried:
I've opened the input fifo with both "r" and "r+"--I have the same problem either way;
I've tried to determine if my writing process is sending EOF (which I've heard will cause the read fifo to close)--AFAIK it isn't.
SOME CONTEXT:
If it helps, here's a 'big picture' view of what I'm trying to do:
I'm working on a game that runs in RGSS, a Ruby based game engine. Since it doesn't have good integrated debugging, I want to set up a real-time log as the game runs--as events happen in the game, I want messages to show up in a console window on the side. I can send events in the Ruby game code to a named pipe using code similar to the writer code above; I'm now trying to set up a separate process that will wait for events to show up in the pipe and show them on the console as they arrive. I'm not even sure I need Ruby to do this, but it was the first solution I could think of.
Note that I'm using mkfifo from cygwin, which I happened to have installed anyway; I wonder if that might be the source of my trouble.
If it helps anyone, here's exactly what I see in irb with my 'reader' process:
irb(main):001:0> input = File.open("mypipe", "r")
=> #<File:mypipe>
irb(main):002:0> x = input.gets
=> nil
irb(main):003:0> x = input.gets
=> nil
I don't expect the input.gets at 002 and 003 to return immediately--I expect them to block.
I found a solution that avoids using Cygwin's unreliable named pipe implementation entirely. Windows has its own named pipe facility, and there is even a Ruby Gem called win32-pipe that uses it.
Unfortunately, there appears to be no way to use Ruby Gems in an RGSS script; but by dissecting the win32-pipe gem, I was able to incorporate the same idea into an RGSS game. This code is the bare minimum needed to log game events in real time to a back channel, but it can be very useful for deep debugging.
I added a new script page right before 'Main' and added this:
module PipeLogger
# -- Change THIS to change the name of the pipe!
PIPE_NAME = "RGSSPipe"
# Constant Defines
PIPE_DEFAULT_MODE = 0 # Pipe operation mode
PIPE_ACCESS_DUPLEX = 0x00000003 # Pipe open mode
PIPE_UNLIMITED_INSTANCES = 255 # Number of concurrent instances
PIPE_BUFFER_SIZE = 1024 # Size of I/O buffer (1K)
PIPE_TIMEOUT = 5000 # Wait time for buffer (5 secs)
INVALID_HANDLE_VALUE = 0xFFFFFFFF # Retval for bad pipe handle
#-----------------------------------------------------------------------
# make_APIs
#-----------------------------------------------------------------------
def self.make_APIs
$CreateNamedPipe = Win32API.new('kernel32', 'CreateNamedPipe', 'PLLLLLLL', 'L')
$FlushFileBuffers = Win32API.new('kernel32', 'FlushFileBuffers', 'L', 'B')
$DisconnectNamedPipe = Win32API.new('kernel32', 'DisconnectNamedPipe', 'L', 'B')
$WriteFile = Win32API.new('kernel32', 'WriteFile', 'LPLPP', 'B')
$CloseHandle = Win32API.new('kernel32', 'CloseHandle', 'L', 'B')
end
#-----------------------------------------------------------------------
# setup_pipe
#-----------------------------------------------------------------------
def self.setup_pipe
make_APIs
##name = "\\\\.\\pipe\\" + PIPE_NAME
##pipe_mode = PIPE_DEFAULT_MODE
##open_mode = PIPE_ACCESS_DUPLEX
##pipe = nil
##buffer = 0.chr * PIPE_BUFFER_SIZE
##size = 0
##bytes = [0].pack('L')
##pipe = $CreateNamedPipe.call(
##name,
##open_mode,
##pipe_mode,
PIPE_UNLIMITED_INSTANCES,
PIPE_BUFFER_SIZE,
PIPE_BUFFER_SIZE,
PIPE_TIMEOUT,
0
)
if ##pipe == INVALID_HANDLE_VALUE
# If we could not open the pipe, notify the user
# and proceed quietly
print "WARNING -- Unable to create named pipe: " + PIPE_NAME
##pipe = nil
else
# Prompt the user to open the pipe
print "Please launch the RGSSMonitor.rb script"
end
end
#-----------------------------------------------------------------------
# write_to_pipe ('msg' must be a string)
#-----------------------------------------------------------------------
def self.write_to_pipe(msg)
if ##pipe
# Format data
##buffer = msg
##size = msg.size
$WriteFile.call(##pipe, ##buffer, ##buffer.size, ##bytes, 0)
end
end
#------------------------------------------------------------------------
# close_pipe
#------------------------------------------------------------------------
def self.close_pipe
if ##pipe
# Send kill message to RGSSMonitor
##buffer = "!!GAMEOVER!!"
##size = ##buffer.size
$WriteFile.call(##pipe, ##buffer, ##buffer.size, ##bytes, 0)
# Close down the pipe
$FlushFileBuffers.call(##pipe)
$DisconnectNamedPipe.call(##pipe)
$CloseHandle.call(##pipe)
##pipe = nil
end
end
end
To use this, you only need to make sure to call PipeLogger::setup_pipe before writing an event; and call PipeLogger::close_pipe before game exit. (I put the setup call at the start of 'Main', and add an ensure clause to call close_pipe.) After that, you can add a call to PipeLogger::write_to_pipe("msg") at any point in any script with any string for "msg" and write into the pipe.
I have tested this code with RPG Maker XP; it should also work with RPG Maker VX and later.
You will also need something to read FROM the pipe. There are any number of ways to do this, but a simple one is to use a standard Ruby installation, the win32-pipe Ruby Gem, and this script:
require 'rubygems'
require 'win32/pipe'
include Win32
# -- Change THIS to change the name of the pipe!
PIPE_NAME = "RGSSPipe"
Thread.new { loop { sleep 0.01 } } # Allow Ctrl+C
pipe = Pipe::Client.new(PIPE_NAME)
continue = true
while continue
msg = pipe.read.to_s
puts msg
continue = false if msg.chomp == "!!GAMEOVER!!"
end
I use Ruby 1.8.7 for Windows and the win32-pipe gem mentioned above (see here for a good reference on installing gems). Save the above as "RGSSMonitor.rb" and invoke it from the command line as ruby RGSSMonitor.rb.
CAVEATS:
The RGSS code listed above is fragile; in particular, it does not handle failure to open the named pipe. This is not usually an issue on your own development machine, but I would not recommend shipping this code.
I haven't tested it, but I suspect you'll have problems if you write a lot of things to the log without running a process to read the pipe (e.g. RGSSMonitor.rb). A Windows named pipe has a fixed size (I set it here to 1K), and by default writes will block once the pipe is filled (because no process is 'relieving the pressure' by reading from it). Unfortunately, the RPGXP engine will kill a Ruby script that has stopped running for 10 seconds. (I'm told that RPGVX has eliminated this watchdog function--in which case, the game will hang instead of abruptly terminating.)
What's probably happening is the writing process is exiting, and as there are no other writing processes, EOF is sent to the pipe which causes gets to return nil, and so your code loops continually.
To get around this you can usually just open the pipe read-write at the reader end. This works for me (on a Mac), but isn't working for you (you've tried "r" and "r+"). I'm guessing this is to due with Cygwin (POSIX says opening a FIFO read-write is undefined).
An alternative is to open the pipe twice, once read-only and once write-only. You don't use the write-only IO for anything, it's just so that there's always an active writer attached to the pipe so it doesn't get closed.
input = File.open("log_pipe", "r") # note 'r', not 'r+'
keep_open = File.open("log_pipe", "w") # ensure there's always a writer
while true
puts input.gets
end

Resources