What's the meaning of `0x0100` for a `Process::Status` value? - ruby

I have code:
Process.spawn(RbConfig.ruby, "a ruby file", "arg")
and I wait and check its status by:
Process.wait
$?.success?
Most of the time, it works well. But sometimes, $?.success? is false and $?.to_i is 0x0100. It seems the failed process didn't get a chance to run any code before 0x0100 was returned (I didn't send any signal to the process). What does 0x0100 mean? I'd also like to know whether Ruby's spawn can fail even when the command itself is fine. Could anyone help?

Here is a quote from the Process::Status class documentation:
Posix systems record information on processes using a 16-bit integer. The lower bits record the process status (stopped, exited, signaled) and the upper bits possibly contain additional information (for example the program's return code in the case of exited processes). Pre Ruby 1.8, these bits were exposed directly to the Ruby program. Ruby now encapsulates these in a Process::Status object. To maximize compatibility, however, these objects retain a bit-oriented interface. In the descriptions that follow, when we talk about the integer value of stat, we're referring to this 16 bit value.
The method Process::Status#to_i returns this stat as a Fixnum.
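To make the quoted bit layout concrete, here is a minimal sketch that splits such an integer by hand. It assumes the conventional Linux encoding (exit code in the upper byte, terminating signal in the low seven bits), and the helper name decode_wait_status is made up for this illustration. In real code, Process::Status#exitstatus and #termsig are the portable way to get these values.

```ruby
# Hypothetical helper: split a raw wait status into its two common fields.
# Assumes the conventional layout used on Linux; not guaranteed everywhere.
def decode_wait_status(stat)
  {
    exit_code: (stat >> 8) & 0xff, # upper byte: exit code of an exited process
    term_signal: stat & 0x7f       # low seven bits: terminating signal, 0 if none
  }
end

decoded = decode_wait_status(0x0100)
puts decoded.inspect
```

Under that assumption, 0x0100 reads as "exited normally with exit code 1, not killed by a signal".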

OK, I finally found the answer: when a Ruby process raises an uncaught exception, it exits with code 1, which shows up in $?.to_i as 0x0100 (the exit code sits in the upper byte of the status). This is from my observation on Ubuntu 14.04 and Ruby 2.2. For example: there's a Ruby file a.rb, and in another file, say src.rb, there's this code snippet:
Process.spawn(RbConfig.ruby, "a.rb", "arg")
Process.wait
If a.rb throws an uncaught exception, then $?.to_i will be 0x0100. What's more, I also observed that a.rb sometimes didn't get executed before its process failed with 0x0100. So I guess it may have something to do with the Ruby interpreter since I'm sure a.rb is OK.
Anyway, there's no official documentation describing this exact behavior, so take my experience as a reference only.
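The observation can be reproduced with a small sketch like the one below. It assumes a POSIX system and uses an inline one-liner (the child_code string is hypothetical) instead of a.rb, with the child's stderr discarded so the backtrace does not clutter the output.

```ruby
require "rbconfig"

# Hypothetical child program: raise an exception that nobody rescues.
child_code = 'raise "boom"'

pid = Process.spawn(RbConfig.ruby, "-e", child_code, err: File::NULL)
Process.wait(pid)
status = $?

exit_code = status.exitstatus # exit code reported by the child interpreter
raw_value = status.to_i       # platform-specific integer form of the status
ok        = status.success?

puts format("exit code %d, raw status 0x%04x, success? %s", exit_code, raw_value, ok)
```

Checking exitstatus rather than the raw integer keeps the test independent of how the platform packs the status word.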

Related

Job control in Ruby - SIGCONT handlers not working, SIGTSTP handler working only for irb. What am I missing?

I was working on trying to implement some kind of shell job control for a custom event loop handler with the GLib2 API in Ruby-GNOME. Ideally, this would be able to handle SIGTSTP and SIGCONT signals, to background the process at a TTY when running under a shell and to resume the background process on 'fg' from the shell.
I've not been able to figure out how to completely approach this with the API available in Ruby.
For a simpler usage case, I thought that I'd try adding a similar job support for IRB. I've added the following to my ~/.irbrc. The SIGTSTP handler seems to work, but the process remains suspended even after SIGCONT from fg in BASH.
## conditional section for ~/.irbrc
## can be activated with `IRB_JOBS_TEST=Defined irb`
if ENV['IRB_JOBS_TEST']
  module Jobs
    TSTP_HDLR_ORIG ||= Signal.trap("TSTP") do
      STDERR.puts "\nJobs: backgrounding #{Process.pid} (#{TSTP_HDLR_ORIG.inspect}, #{CONT_HDLR_ORIG.inspect})"
      Process.setpgid(0, Process.ppid)
      TSTP_HDLR_ORIG.call if TSTP_HDLR_ORIG.respond_to?(:call)
    end
    CONT_HDLR_ORIG ||= Signal.trap("CONT") do
      Process.setpgid(0, Process.pid)
      STDERR.puts "Continuing in #{Process.pid}" ## not reached, not shown
      IRB.CurrentContext.thread.wakeup ## no effect
      CONT_HDLR_ORIG.call if CONT_HDLR_ORIG.respond_to?(:call)
    end
  end
end
I'm testing this on FreeBSD 13.1. I've read the FreeBSD termios(4), tcsetpgrp(3), and fcntl(2) manual pages. I'm not sure how much of the terminal-related API is available in Ruby.
The TSTP handler here seems to work, but the CONT handler is apparently not ever reached. I'm not sure if the TSTP handler is actually doing enough for - in effect - backgrounding the process in the shell's process group and relinquishing the controlling terminal.
With that TSTP handler, I can then background the IRB process in the shell with Ctrl-z. I can also foreground the process with 'fg' or BASH '%', but then the process is unresponsive. FreeBSD's Ctrl-t handler shows the process as suspended. Apparently nothing in my CONT handler is reached.
I'm really stumped about what's failing in this approach - what my TSTP/CONT handlers are missing, what's available in Ruby, and why the process stays suspended after 'fg' in the shell.
In a more complex example, with the code I've written for glib2 it was apparently not enough to just call
Process.setpgid(0, Process.ppid)
as the process was not being backgrounded then. This would probably need another question though, as the example code for it isn't quite so short. So, I thought I'd try starting with IRB ...
After trying to foreground the process, then with Ctrl-t at the TTY on FreeBSD, I'm seeing the following
$ %
IRB_JOBS_TEST=Defined irb
load: 0.16 cmd: ruby31 4076 [suspended] 2.36r 0.19u 0.03s 1% 23828k
mi_switch+0xc2 thread_suspend_check+0x260 sleepq_catch_signals+0x113 sleepq_wait_sig+0x9 _cv_wait_sig+0xec tty_wait_background+0x30d ttydev_ioctl+0x14b devfs_ioctl+0xc6 vn_ioctl+0x1a4 devfs_ioctl_f+0x1e kern_ioctl+0x25b sys_ioctl+0xf1 amd64_syscall+0x10c fast_syscall_common+0xf8
So, it's blocking in an ioctl on resume?
Update
After a few hours of ineffectual hacking about this, I've removed the SIGTSTP and SIGCONT signal handlers from my GLib example code and now it "Just Works". I can background the example app at the console ... at least when it's not running under IRB ... and I can bring it back to the process group foreground with the shell. It resumes running on SIGCONT and everything looks alright in the logging from its main event loop.
I'm still not certain what the missing parts may have been in my handlers/hacks for SIGTSTP and SIGCONT with IRB. Of course, with the input history recording in IRB it's typically simple enough to just restart the process.
Looking at how other applications have approached job control at the console (mainly Emacs' terminal.c), I think Emacs wraps its TTY I/O streams in some kind of encapsulating struct?
Glad to see if there's job control available in Ruby though, and it does not even need a custom signal handler for some applications?
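As a rough illustration of why no custom handler was needed, the sketch below stops and resumes a child without installing any Ruby signal handlers; the kernel's default dispositions do the work. It is only a sketch under assumptions: it uses SIGSTOP in place of the terminal-generated SIGTSTP so that it runs without a controlling TTY, it assumes a POSIX system with fork, and the final SIGKILL is just cleanup for the demo.

```ruby
# Child with default signal dispositions: no Signal.trap anywhere.
child = fork { sleep 30 }

# Stop the child, then wait for the "stopped" notification (WUNTRACED).
Process.kill("STOP", child)
Process.wait(child, Process::WUNTRACED)
was_stopped = $?.stopped?

# Resume it, which is what the shell's `fg` does via SIGCONT, then clean up.
Process.kill("CONT", child)
Process.kill("KILL", child)
Process.wait(child)
was_killed = $?.signaled?

puts "stopped: #{was_stopped}, killed afterwards: #{was_killed}"
```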

Ruby - fork, exec, detach .... do we have a race condition here?

Simple example, which doesn't work on my platform (Ruby 2.2, Cygwin):
#!/usr/bin/ruby
backtt = fork { exec('mintty','/usr/bin/zsh','-i') }
Process.detach(backtt)
exit
This tiny program (when started from the shell) is supposed to spawn a terminal window (mintty) and then get me back to the shell prompt.
However, while it DOES create the mintty window, I don't have a shell prompt afterwards, and I can't type anything in the calling shell.
But when I introduce a small delay before the detach, either using 'sleep', or by printing something on stdout, it works as expected:
#!/usr/bin/ruby
backtt = fork { exec('mintty','/usr/bin/zsh','-i') }
sleep 1
Process.detach(backtt)
exit
Why is this necessary?
BTW, I'm well aware that I could (from the shell) do a
mintty /usr/bin/zsh -i &
directly, or I could use system(...... &) from inside Ruby, but this is not the point here. I'm particularly interested in the fork/exec/detach behaviour in Ruby. Any insights?
Posting as an answer, because it is too long for a comment
Although I am no specialist in Ruby, and do not know Cygwin at all, this situation sounds very familiar to me, coming from C/C++.
This script is too short, so the parent of the parent completes, while the grandchild tries to start.
What would happen if you put the sleep after detach and before exit?
If my theory is correct, it should work too. Your program exits before any (or enough) thread-switching happens.
I call such problems "interrupted hand shaking". Although this is psychology terminology, it describes what happens.
Sleep "gives up the time slice", leading to thread-switching,
Console output (any file I/O) runs into semaphores, also leading to thread switching.
If my idea is correct, it should also work if you don't "sleep" but just count to 1e9 (depending on the speed of computation), because then preemptive multitasking forces a thread switch even though the program itself never gives up the CPU.
So it is an error in programming (IMHO: race condition is philosophical in that case), but it will get hard to find "who" is responsible. There are many things involved.
According to the documentation:
Process::detach prevents this by setting up a separate Ruby thread whose sole job is to reap the status of the process pid when it terminates.
NB: I can’t reproduce this behaviour on any of available to me operating systems, and I’m posting this as an answer just for the sake of formatting.
Since Process.detach(backtt) transparently creates a thread, I would suggest you to try:
#!/usr/bin/ruby
backtt = fork { exec('mintty','/usr/bin/zsh','-i') }
# ⇓⇓⇓⇓⇓
Process.detach(backtt).join
exit
This is no hack by any means (as opposed to the silly sleep), since you are likely aware that the underlying command should return more or less immediately. I am not a guru in Cygwin, but it might have some specific issues with threads, so let this thread be handled explicitly.
I'm neither a Ruby nor a Cygwin guy, so what I propose here may not work at all. Anyways: I guess, you're not even hitting a Ruby or Cygwin specific bug here. In a program called "start" I've written in C many years ago, I hit the same issue. Here is a comment from the start of the function void daemonize_now():
/*
* This is a little bit trickier than I expected: If we simply call
* setsid(), it may fail! We have to fork() and exit(), and let our
* child call setsid().
*
* Now the problem: If we fork() and exit() immediatelly, our child
* will be killed before it ever had been run. So we need to sleep a
* little bit. Now the question: How long? I don't know an answer. So
* let us being killed by our child :-)
*/
So, the strategy is this: Let the parent wait on its child (that can be done immediately, before the child has actually had a chance to do anything) and then let the child do the detaching part. How? Let it create a new session and process group (it will be reparented to the init process). That's what the setsid() call I'm talking about in the comment is for. It will work something like this (C syntax; you should be able to look up the correct usage for Ruby and apply the needed changes yourself):
parentspid = getpid();
Fork = fork();
if (Fork) {
    if (Fork == -1) { // fork() failed
        /* handle error */
    } else { // parent, Fork is the pid of the child
        int tmp; waitpid(0, &tmp, 0);
    }
} else { // child
    if (setsid() == -1) {
        /* handle error - possibly by doing nothing
           and just let the parent wait ... */
    } else {
        kill(parentspid, SIGUSR1);
    }
    exec(...);
}
You can use any signal that terminates the process (e.g. SIGKILL). I used SIGUSR1 and installed a signal handler that exit(0)s the parent process, so the caller gets a success message. Only caveat: You get a success even if the exec fails. However, that is a problem that can't really be worked around, since after a successful exec you can't signal your parent anything anymore. And since you don't know when the exec will have failed (if it fails), you're back at the race condition part.
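For Ruby readers, the same "let the child tell the parent it has run" idea might be sketched as follows. This is not the C program above: it swaps the SIGUSR1 handshake for a pipe, the helper name spawn_detached is made up for this example, and it assumes a POSIX system where fork and Process.setsid are available (it has not been tried on Cygwin). The parent blocks on the pipe instead of guessing how long to sleep.

```ruby
require "rbconfig"

# Hypothetical helper: start `command` in its own session and return only
# after the child has actually called setsid.
def spawn_detached(*command)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    session_id = Process.setsid    # child becomes leader of a new session
    writer.puts(session_id)        # handshake: tell the parent we got this far
    writer.close
    exec(*command)
  end
  writer.close
  child_session = reader.gets.to_i # blocks until the child has written
  reader.close
  reaper = Process.detach(pid)     # background thread that reaps the child
  [pid, child_session, reaper]
end

# Usage with a trivial child command.
pid, child_session, reaper = spawn_detached(RbConfig.ruby, "-e", "exit 0")
puts "child #{pid} runs in session #{child_session}"
```

As with the C version, the handshake only proves that the child ran up to setsid; a failing exec still goes unnoticed by the parent.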

EINTR and non-blocking calls

As is known, some blocking calls like read and write can return -1 and set errno to EINTR, and we need to handle this.
My question is: does this also apply to non-blocking calls, e.g. on a socket set to O_NONBLOCK?
Some articles and sources I have read say that non-blocking calls don't need to bother with this, but I have found no authoritative reference for it. If so, does it hold across different implementations?
I cannot give you a definitive answer to this question, and the answer may further vary from system to system, but I would expect a non-blocking socket to never fail with EINTR. If you take a look at the man pages of various systems for the socket functions bind(), connect(), send(), and recv(), or look those up in the POSIX standard, you'll notice something interesting: All these functions except one may return -1 and set errno to EINTR. The one function that is not documented to ever fail with EINTR is bind(). And bind() is also the only function of that list that will never block by default. So it seems that only blocking functions may fail because of EINTR, including read() and write(), yet if these functions never block, they also will never fail with EINTR, and if you use O_NONBLOCK, those functions will never block.
It would also make no sense from a logical perspective. E.g. consider you are using blocking I/O and you call read() and this call has to block, but while it was blocking, a signal is sent to your process and thus the read request is unblocked. How should the system handle this situation? Claiming that read() did succeed? That would be a lie, it did not succeed because no data was read. Claiming it did succeed, but zero bytes of data were read? This wouldn't be correct either, since a "zero read result" is used to indicate end-of-stream (or end-of-file), so your process would have to assume that no data was read because the end of a file has been reached (or a socket/pipe has been closed at the other end), which simply isn't the case. The end-of-file (or end-of-stream) has not been reached; if you call read() again, it will be able to return more data. So that would also be a lie. Your expectation is that this read call either succeeds and reads data or fails with an error. Thus the read call has to fail and return -1 in that case, but what errno value shall the system set? All the other error values indicate a critical error with the file descriptor, yet there was no critical error and indicating such an error would also be a lie. That's why errno is set to EINTR, which means: "There was nothing wrong with the stream. Your read call just failed, because it was interrupted by a signal. If it wasn't interrupted, it may still have succeeded, so if you still care for the data, please try again."
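The "please try again" rule can be sketched in Ruby as a retry wrapper around a blocking read. This is only an illustration: the method name read_retrying is made up here, and Ruby's own IO layer already retries many interrupted calls internally, so an explicit loop like this is mostly relevant when EINTR does reach your code as Errno::EINTR.

```ruby
# Hypothetical helper: a blocking read that simply tries again when the
# underlying call was interrupted by a signal.
def read_retrying(io, max_bytes)
  io.sysread(max_bytes)
rescue Errno::EINTR
  retry # nothing was wrong with the stream, so repeat the same call
end

reader, writer = IO.pipe
writer.write("abc")
result = read_retrying(reader, 3)
puts result
reader.close
writer.close
```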
If you now switch to non-blocking I/O, the situation of above never arises. The read call will never block and if it cannot read data immediately, it will fail with an error EAGAIN (POSIX) or EWOULDBLOCK (unofficial, on Linux both are the same error, just alternative names for it), which means: "There is no data available right now and thus your read call would have to block and wait for data arriving, but blocking is not allowed, so it failed instead." So there is an error for every situation that may arise.
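The three outcomes described so far (would block, data, end-of-stream) can be observed with a small Ruby sketch on a pipe. It relies on IO#read_nonblock with exception: false, which reports the EAGAIN/EWOULDBLOCK case as the symbol :wait_readable rather than raising; the variable names are just for this example.

```ruby
reader, writer = IO.pipe

# Nothing written yet: the call would have to block, so it reports that instead.
empty_result = reader.read_nonblock(16, exception: false)

# Data available: the call returns it immediately.
writer.write("ping")
data_result = reader.read_nonblock(16, exception: false)

# Writer closed and pipe drained: end-of-stream is a distinct result.
writer.close
eof_result = reader.read_nonblock(16, exception: false)
reader.close

puts [empty_result, data_result, eof_result].inspect
```

Each situation has its own result here, which matches the point that no EINTR-style "try again" outcome is needed for non-blocking reads.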
Of course, even with non-blocking I/O, the read call may have been temporarily interrupted by a signal, but why would the system have to indicate that? Every function call, whether this is a system function or one written by the user, may be temporarily interrupted by a signal, really every single one, no exception. If the system would have to inform the user whenever that happens, all system functions could possibly fail because of EINTR. However, even if there was a signal interruption, the functions usually perform their task all the way to the end, that's why this interruption is irrelevant. The error EINTR is used to tell the caller that the action he has requested was not performed because of a signal interruption, but in case of non-blocking I/O, there is no reason why the function should not perform the read or the write request, unless it cannot be performed right now, but then this can be indicated by an appropriate error.
To confirm my theory, I took a look at the kernel of MacOS (10.8), which is still largely based on the FreeBSD kernel and it seems to confirm the suspicion. If a read call is currently not possible, as no data are available, the kernel checks for the O_NONBLOCK flag in the file descriptor flags. If this flag is set, it fails immediately with EAGAIN. If it is not set, it puts the current thread to sleep by calling a function named msleep(). The function is documented here (as I said, OS X uses plenty of FreeBSD code in its kernel). This function causes the current thread to sleep until it is explicitly woken up (which is the case if data becomes ready for reading) or a timeout has been hit (e.g. you can set a receive timeout on sockets). Yet the thread is also woken up, if a signal is delivered, in which case msleep() itself returns EINTR and the next higher layer just passes this error through. So it is msleep() that produces the EINTR error, but if the O_NONBLOCK flag is set, msleep() is never called in the first place, hence this error cannot be returned.
Of course that was MacOS/FreeBSD, other systems may be different, but since most systems try to keep at least a certain level of consistency among these APIs, if a system breaks the assumption that non-blocking I/O calls can never fail because of EINTR, this is probably not by intention and may even get fixed if you report it.
@Mecki Great explanation. To add to the accepted answer, the book "Unix Network Programming - Volume 1, Third Edition" (Stevens) makes a distinction between slow system calls and others in chapter/section 5.9 - "Handling Interrupted System Calls". I am quoting from the book -
We used the term "slow system call" to describe accept, and we use
this term for any system call that can block forever. That is, the
system call need never return.
In the next para of the same section -
The basic rule that applies here is that when a process is blocked in
a slow system call and the process catches a signal and the signal
handler returns, the system call can return an error of EINTR.
Going by this explanation, a read / write on a non-blocking socket is not a slow system call and hence should not return an error of EINTR.
Just to add some evidence to @Mecki's answer, I found this discussion about fixing a bug in Linux where a patch caused non-blocking recvmsg to return EINTR. It was stated:
EINTR always means that you asked for a blocking operation, and a
signal arrived meanwhile.
Once you invert the "blocking" part of that set of conditions, EINTR
becomes an impossible event.
Also:
Look at what we do for AF_INET. We handle this the proper way.
If we are 'interrupted' by a signal while sleeping in lock_sock(),
recvmsg() on a non blocking socket, we return -EAGAIN properly, not
-EINTR.
Fact that we potentially sleep to get the socket lock is hidden for
the user, its an implementation detail of the kernel.
We never return -EINTR, as stated in manpage for non blocking sockets.
Source here: https://patchwork.ozlabs.org/project/netdev/patch/1395798147.12610.196.camel@edumazet-glaptop2.roam.corp.google.com/#741015

Ruby: Read large data from stdout and stderr of an external process on Windows

Greetings, all,
I need to run a potentially long-running process from Ruby 1.9.2 on Windows and subsequently capture and parse the data from the external process's standard output and error. A large amount of data can be sent to each, but I am only necessarily interested in one line at a time (not capturing and storing the whole of the output).
After a bit of research, I found that the Open3 class would take care of executing the process and giving me IO objects connected to the process's standard output and error (via popen3).
Open3.popen3("external-program.bat") do |stdin, out, err, thread|
# Step3.profit() ?
end
However, I'm not sure how to continually read from both streams without blocking the program. Since calling IO#readlines on out or err when a lot of data has been sent results in a memory allocation error, I'm trying to continuously check both streams for available input, but not having much luck with any of my implementations.
Thanks in advance for any advice!
After a lot of different trial and error attempts, I eventually came up with using two threads, one to read from each stream (generator.rb is just a script I wrote to output things to standard out and err):
require 'open3'
data = {}
Open3.popen3("ruby generator.rb") do |stdin, out, err, external|
  # Create a thread to read from each stream
  readers = { :out => out, :err => err }.map do |key, stream|
    Thread.new do
      until (line = stream.gets).nil? do
        data[key] = line
      end
    end
  end
  # Don't exit until the external process is done
  external.join
  # ... and until both reader threads have drained their streams
  readers.each(&:join)
end
puts data[:out]
puts data[:err]
It simply outputs the last line sent to standard output and error by the external program, but could obviously be extended to do additional processing (with different logic in each thread). A method I was using before I finally came up with this was resulting in some failures due to race conditions; I don't know if this code is still vulnerable, but I've yet to experience a similar failure.
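One possible extension, sketched under assumptions: the reader threads push each line onto a Queue and a single consumer handles them one at a time, so nothing is stored beyond the line being processed. The helper name each_output_line and the inline generator one-liner are made up for this example; it uses only Open3.popen3, Thread and Queue from the standard library and has not been checked on Windows.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: yield [stream_key, line] for every line the command
# prints on stdout or stderr, then return its exit status.
def each_output_line(*command)
  Open3.popen3(*command) do |stdin, out, err, wait_thread|
    stdin.close
    queue = Queue.new
    readers = { out: out, err: err }.map do |key, stream|
      Thread.new do
        stream.each_line { |line| queue << [key, line] }
        queue << [key, nil] # end-of-stream marker for this stream
      end
    end
    open_streams = readers.size
    while open_streams > 0
      key, line = queue.pop
      if line.nil?
        open_streams -= 1
      else
        yield key, line
      end
    end
    readers.each(&:join)
    wait_thread.value # Process::Status of the external process
  end
end

# Usage with a tiny stand-in for generator.rb.
generator = 'STDOUT.puts "out 1", "out 2"; STDERR.puts "err 1"'
counts = Hash.new(0)
status = each_output_line(RbConfig.ruby, "-e", generator) do |key, _line|
  counts[key] += 1
end
puts counts.inspect
```

Handling all lines in one consumer avoids sharing mutable state between the two reader threads, at the cost of serializing the per-line work.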

EWOULDBLOCK equivalent errno under Windows Perl

G'day Stackoverflowers,
I'm the author of Perl's autodie pragma, which changes Perl's built-ins to throw exceptions on failure. It's similar to Fatal, but with lexical scope, an extensible exception model, more intelligent return checking, and much, much nicer error messages. It will be replacing the Fatal module in future releases of Perl (provisionally 5.10.1+), but can currently be downloaded from the CPAN for Perl 5.8.0 and above.
The next release of autodie will add special handling for calls to flock with the LOCK_NB (non-blocking) option. While a failed flock call would normally result in an exception under autodie, a failed call to flock using LOCK_NB will merely return false if the returned errno ($!) is EWOULDBLOCK.
The reason for this is so people can continue to write code like:
use Fcntl qw(:flock);
use autodie; # All perl built-ins now succeed or die.
open(my $fh, '<', 'some_file.txt');
my $lock = flock($fh, LOCK_EX | LOCK_NB); # Lock the file if we can.
if ($lock) {
    # Opportunistically do something with the locked file.
}
In the above code, a lock that fails because someone else has the file locked already (EWOULDBLOCK) is not considered to be a hard error, so autodying flock merely returns a false value. In the situation that we're working with a filesystem that doesn't support file-locks, or a network filesystem and the network just died, then autodying flock generates an appropriate exception when it sees that our errno is not EWOULDBLOCK.
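For comparison only, Ruby's File#flock follows the same "return false rather than fail when the lock is merely busy" convention for LOCK_NB. The sketch below is not autodie and says nothing about the Windows errno question; it assumes a system where flock locks belong to the open file, so two separate opens of one file conflict even within a single process (as on Linux), and the helper name try_lock is made up.

```ruby
require "tempfile"

# Hypothetical helper: true if the exclusive lock was taken, false if
# someone else holds it. Other failures still raise.
def try_lock(file)
  file.flock(File::LOCK_EX | File::LOCK_NB) ? true : false
end

holder    = Tempfile.new("lock-demo")
contender = File.open(holder.path)

first_attempt  = try_lock(holder)    # nobody holds the lock yet
second_attempt = try_lock(contender) # busy: reported as false, not an error
holder.flock(File::LOCK_UN)
third_attempt  = try_lock(contender) # free again

contender.close
holder.close!

puts [first_attempt, second_attempt, third_attempt].inspect
```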
This works just fine in my dev version on Unix-flavoured systems, but it fails horribly under Windows. It appears that while Perl under Windows supports the LOCK_NB option, it doesn't define EWOULDBLOCK. Instead, the errno returned is 33 ("Domain error") when blocking would occur.
Obviously I can hard-code this as a constant into autodie, but that's not what I want to do here, because it means that I'm screwed if the errno ever changes (or has changed). I would love to compare it to the Windows equivalent of POSIX::EWOULDBLOCK, but I can't for the life of me find where such a thing would be defined. If you can help, let me know.
Answers I specifically don't want:
Suggestions to hard-code it as a constant (or worse still, leave a magic number floating about).
Not supporting LOCK_NB functionality at all under Windows.
Assuming that any failure from a LOCK_NB call to flock should return merely false.
Suggestions that I ask on p5p or perlmonks. I already know about them.
An explanation of how flock, or exceptions, or Fatal work. I already know. Intimately.
Under Win32 "native" Perl, note that $^E is more descriptive at 33, "The process cannot access the file because another process locked a portion of the file" which is ERROR_LOCK_VIOLATION (available from Win32::WinError).
For the Windows-specific error code, you want to use $^E. In this case, it's 33: "The process cannot access the file because another process has locked a portion of the file" (ERROR_LOCK_VIOLATION in winerror.h).
Unfortunately, I don't think Win32::WinError is in core. On the other hand, if Microsoft ever renumbered the Windows error codes, pretty much every Windows program ever written would stop working, so I don't think there'll be a problem with hardcoding it.

Resources