Is there a way to get a process's child process status based on its PID in Ruby?
For example, in Python you can do psutil.Process(pid).status
I don't know of a portable ruby method to get process state of a running process. You can do Process.wait and check $?.exitstatus, but that doesn't look like what you want. For a posix solution, you could use
`ps -o state -p #{pid}`.chomp
to get the letter code ps produces for process state
PROCESS STATE CODES
Here are the different values that the s, stat and state output specifiers
(header "STAT" or "S") will display to describe the state of a process.
D Uninterruptible sleep (usually IO)
R Running or runnable (on run queue)
S Interruptible sleep (waiting for an event to complete)
T Stopped, either by a job control signal or because it is being traced.
W paging (not valid since the 2.6.xx kernel)
X dead (should never be seen)
Z Defunct ("zombie") process, terminated but not reaped by its parent.
I was looking for the same thing. It's a shame ProcessStatus doesn't seem to be able to get initialized from a live pid. This is vital stuff if you want to do anything like a safe timed kill of a child process.
In any case,
it's the second line in /proc/$pid/status if you're on Linux.:
status_line = File.open("/proc/#{pid}/status") {|f| f.gets; f.gets }
Most likely much much faster than anything involving an external program.
On OS X, I setup a string:
outputstring="ps -O=S -p #{mypid}"
then execute it in a %x call:
termoutput=%x[#{outputstring}]
I can display that if needed, or just keep the output clean and act on the State I found with the call.
Related
I have this code:
#!/bin/bash
pids=()
for i in $(seq 1 999); do
sleep 1 &
pids+=( "$!" )
done
for pid in "${pids[#]}"; do
wait "$pid"
done
I expect the following behavior:
spin through the first loop
wait about a second on the first pid
spin through the second loop
Instead, I get this error:
./foo.sh: line 8: wait: pid 24752 is not a child of this shell
(repeated 171 times with different pids)
If I run the script with shorter loop (50 instead of 999), then I get no errors.
What's going on?
Edit: I am using GNU bash 4.4.23 on Windows.
POSIX says:
The implementation need not retain more than the {CHILD_MAX} most recent entries in its list of known process IDs in the current shell execution environment.
{CHILD_MAX} here refers to the maximum number of simultaneous processes allowed per user. You can get the value of this limit using the getconf utility:
$ getconf CHILD_MAX
13195
Bash stores the statuses of at most twice as that many exited background processes in a circular buffer, and says not a child of this shell when you call wait on the PID of an old one that's been overwritten. You can see how it's implemented here.
The way you might reasonably expect this to work, as it would if you wrote a similar program in most other languages, is:
sleep is executed in the background via a fork+exec.
At some point, sleep exits leaving behind a zombie.
That zombie remains in place, holding its PID, until its parent calls wait to retrieve its exit code.
However, shells such as bash actually do this a little differently. They proactively reap their zombie children and store their exit codes in memory so that they can deallocate the system resources those processes were using. Then when you wait the shell just hands you whatever value is stored in memory, but the zombie could be long gone by then.
Now, because all of these exit statuses are being stored in memory, there is a practical limit to how many background processes can exit without you calling wait before you've filled up all the memory you have available for this in the shell. I expect that you're hitting this limit somewhere in the several hundreds of processes in your environment, while other users manage to make it into the several thousands in theirs. Regardless, the outcome is the same - eventually there's nowhere to store information about your children and so that information is lost.
I can reproduce on ArchLinux with docker run -ti --rm bash:5.0.18 bash -c 'pids=; for ((i=1;i<550;++i)); do true & pids+=" $!"; done; wait $pids' and any earlier. I can't reproduce with bash:5.1.0 .
What's going on?
It looks like a bug in your version of Bash. There were a couple of improvements in jobs.c and wait.def in Bash:5.1 and Make sure SIGCHLD is blocked in all cases where waitchld() is not called from a signal handler is mentioned in the changelog. From the look of it, it looks like an issue with handling a SIGCHLD signal while already handling another SIGCHLD signal.
This is being done on windows
I am getting error: The process cannot access the file because it is being used by another process. It seems that even after the child is exiting(exit 0) and the parent is waiting for the child to complete (waitpid($lkpid, 0)),the child's subprocesses are not being killed. Hence, when the next iteration (test case) is running, it is finding the process already running, and hence gives the error message.
Code Snippet ($bashexe and $bePath are defined):
my $MSROO = "/home/abc";
if (my $fpid = fork()) {
for (my $i=1; $i<=1200; $i++) {
sleep 1;
if (-e "$MSROO/logs/Complete") {
last;
}
}
elsif (defined ($fpid)) {
&runAndMonitor (\#ForRun, "$MSROO/logs/Test.log"); ### #ForRun has the list of test cases
system("touch $MSROO/logs/Complete");
exit 0;
}
sub runAndMonitor {
my #ForRunPerProduct = #{$_[0]};
my $logFile = $_[1];
foreach my $TestVar (#ForRunPerProduct) {
my $TestVarDirName = $TestVar;
$TestVarDirName = dirname ($TestVarDirName);
my $lkpid;
my $filehandle;
if ( !($pid = open( $filehandle, "-|" , " $bashexe -c \" echo abc \; perl.exe reg_script.pl $TestVarDirName -t wint\" >> $logFile "))) {
die( "Failed to start process: $!" );
}
else {
print "$pid is pid of shell running: $TestVar\n"; ### Issue (error message above) is coming here after piped open is launched for a new test
my $taskInfo=`tasklist | grep "$pid"`;
chomp ($taskInfo);
print "$taskInfo is taskInfo\n";
}
if ($lkpid = fork()) {
sleep 1;
chomp ($lkpid);
LabelToCheck:
my $pidExistingOrNotInParent = kill 0, $pid;
if ($pidExistingOrNotInParent) {
sleep 10;
goto LabelToCheck;
}
}
elsif (defined ($lkpid)) {
sleep 12;
my $pidExistingOrNot = kill 0, $pid;
if ($pidExistingOrNot){
print "$pid still exists\n";
my $taskInfoVar1 =`tasklist | grep "$pid"`;
chomp ($taskInfoVar1);
my $killPID = kill 15, $pid;
print "$killPID is the value of PID\n"; ### Here, I am getting output 1 (value of $killPID). Also, I tried with signal 9, and seeing same behavior
my $taskInfoVar2 =`tasklist | grep "$pid"`;
sleep 10;
exit 0;
}
}
system("TASKKILL /F /T /PID $lkpid") if ($lkpid); ### Here, child pid is not being killed . Saying "ERROR: The process "-1472" not found"
sleep 2;
print "$lkpid is lkpid\n"; ## Here, though I am getting message "-1472 is lkpid"
#waitpid($lkpid, 0);
return;
}
Why is it that even after "exit 0 in child" and then "waitpid in parent", child subprocesses are not being killed? What can be done to fully clean child process and its subprocesses?
The exit doesn't touch child processes; it's not meant to. It just exits the process. In order to shut down its child processes as well you'd need to signal them.†
However, since this is Windows, where fork is merely emulated, here is what perlfork says
Behavior of other Perl features in forked pseudo-processes
...
kill() "kill('KILL', ...)" can be used to terminate a pseudo-process by passing it the ID returned by fork(). The outcome of kill on a
pseudo-process is unpredictable and it should not be used except under dire circumstances, because the operating system may not
guarantee integrity of the process resources when a running thread is terminated
...
exit() exit() always exits just the executing pseudo-process, after automatically wait()-ing for any outstanding child pseudo-processes. Note
that this means that the process as a whole will not exit unless all running pseudo-processes have exited. See below for some
limitations with open filehandles.
So don't do kill, while exit behaves nearly opposite to what you need.
But the Windows command TASKKILL can terminate a process and its tree
system("TASKKILL /F /T /PID $pid");
This should terminate a process with $pid and its children processes. (The command can use a process's name instead, TASKKILL /F /T /IM $name, but using names on a busy modern system, with a lot going on, can be tricky.) See taskkill on MS docs.
A more reliable way about this, altogether, is probably to use dedicated modules for Windows process management.
A few other comments
I also notice that you use pipe-open, while perlfork says for that
Forking pipe open() not yet implemented
The open(FOO, "|-") and open(BAR, "-|") constructs are not yet implemented.
So I am confused, does that pipe-open work in your code? But perlfork continues with
This limitation can be easily worked around in new code by creating a pipe explicitly. The following example shows how to write to a forked child: [full code follows]
That C-style loop, for (my $i=1; $i<=1200; $i++), is better written as
for my $i (1..1200) { ... }
(or foreach, synonyms) A C-style loop is very rarely needed in Perl.
† A kill with a negative signal (name or number) OR process-id generally terminates the whole tree under the signaled process. This is on Linux.
So one way would be to signal that child from its parent when ready, instead of exit-ing from it. (Then the child would have signal the parent in some way when it's ready.)
Or, the child can send a negative terminate signal to all its direct children process, then exit.
You didn't say which perl you are using. On Windows with Strawberry Perl (and presumably Active State), fork() emulation is ... very problematic, (maybe just "broken") as #zdim mentioned. If you want a longer explanation, see Proc::Background::Win32 - Perl Fork Limitations
Meanwhile, if you use Cygwin's Perl, fork works perfectly. This is because Cygwin does a full emulation of Unix fork() semantics, so anything built against cygwin works just like it does on Unix. The downside is that file paths show up weird, like /cygdrive/c/Program Files. This may or may not trip up code you've already written.
But, you might also have confusion about process trees. Even on Unix, killing a parent process does not kill the child processes. This usually happens for various reasons, but it is not enforced. For example, most child processes have a pipe open to the parent, and when the parent exits that pipe closes and then reading/writing the pipe gives SIGPIPE that kills the child. In other cases, the parent catches SIGTERM and then re-broadcasts that to its children before exiting gracefully. In other cases, monitors like Systemd or Docker create a container inherited by all children of the main process, and when the main process exits the monitor kills off everything else in the container.
Since it looks like you're writing your own task monitor, I'll give some advice from one that I wrote for Windows (and is running along happily years later). I ended up with a design using Proc::Background where the parent starts a task that writes to a file as STDOUT/STDERR. Then it opens that same log file and wakes up every few seconds to try reading more of the log file to see what the task is doing, and check with the Proc::Background object to see if the task exited. When the task exits, it appends the exit code and timestamp to the log file. The monitor has a timeout setting that if the child exceeds, it just un-gracefully runs TerminateProcess. (you could improve on that by leaving STDIN open as a pipe between monitor and worker, and then have the worker check STDIN every now and then, but on Windows that will block, so you have to use PeekNamedPipe, which gets messy)
Meanwhile, the monitor parses any new lines of the log file to read status information and send updates to the database. The other parts of the system can watch the database to see the status of background tasks, including a web admin interface that can also open and read the log file. If the monitor sees that a child has run for too long, it can use TerminateProcess to stop it. Missing from this design is any way for the monitor to know when it's being asked to exit, and clean up, which is a notable deficiency, and one you're probably looking for. However, there actually isn't any way to intercept a TerminateProcess aimed at the parent! Windows does have some Message Queue API stuff where you can set up to receive notifications about termination, but I never chased down the full details there. If you do, please come back and drop a comment for me :-)
I'm having python application which needs to execute proprietary application (which crashes from time to time) about 20 000 times a day.
The problem is when application crashes, Windows automatically triggers WerFault which will keep program hanging, thus python's subprocess.call() will wait forever for user input (that application has to run on weekends, on holidays, 24/7... so this is not acceptable).
If though about using sleep; poll; kill; terminate but that would mean losing ability to use communicate(), application can run from few miliseconds to 2 hours, so setting fixed timeout will be ineffective
I also tried turning on automatic debugging (use a script which would take a crash dump of an application and terminate id), but somehow this howto doesn't work on my server (WerFault still appears and waits for user input).
Several other tutorials like this didn't take any effect either.
Question:
is there a way how to prevent WerFault from displaying (waiting for user input)? this is more system then programming question
Alternative question: is there a graceful way in python how to detect application crash (whether WerFault was displayed)
Simple (and ugly) answer, monitor for WerFault.exe instances from time to time, specially the one associated with the PID of the offending application. And kill it. Dealing with WerFault.exe is complicated but you don't want to disable it -- see Windows Error Reporting service.
Get a list of processes by name that match WerFault.exe. I use psutil package. Be careful with psutil because processes are cached, use psutil.get_pid_list().
Decode its command line by using argparse. This might be overkill but it leverages existing python libraries.
Identify the process that is holding your application according to its PID.
This is a simple implementation.
def kill_proc_kidnapper(self, child_pid, kidnapper_name='WerFault.exe'):
"""
Look among all instances of 'WerFault.exe' process for an specific one
that took control of another faulting process.
When 'WerFault.exe' is launched it is specified the PID using -p argument:
'C:\\Windows\\SysWOW64\\WerFault.exe -u -p 5012 -s 68'
| |
+-> kidnapper +-> child_pid
Function uses `argparse` to properly decode process command line and get
PID. If PID matches `child_pid` then we have found the correct parent
process and can kill it.
"""
parser = argparse.ArgumentParser()
parser.add_argument('-u', action='store_false', help='User name')
parser.add_argument('-p', type=int, help='Process ID')
parser.add_argument('-s', help='??')
kidnapper_p = None
child_p = None
for proc in psutil.get_pid_list():
if kidnapper_name in proc.name:
args, unknown_args = parser.parse_known_args(proc.cmdline)
print proc.name, proc.cmdline
if args.p == child_pid:
# We found the kidnapper, aim.
print 'kidnapper found: {0}'.format(proc.pid)
kidnapper_p = proc
if psutil.pid_exists(child_pid):
child_p = psutil.Process(child_pid)
if kidnapper_p and child_pid:
print 'Killing "{0}" ({1}) that kidnapped "{2}" ({3})'.format(
kidnapper_p.name, kidnapper_p.pid, child_p.name, child_p.pid)
self.taskkill(kidnapper_p.pid)
return 1
else:
if not kidnapper_p:
print 'Kidnapper process "{0}" not found'.format(kidnapper_name)
if not child_p:
print 'Child process "({0})" not found'.format(child_pid)
return 0
Now, taskkill function invokes taskkill commmand with correct PID.
def taskkill(self, pid):
"""
Kill task and entire process tree for this process
"""
print('Task kill for PID {0}'.format(pid))
cmd = 'taskkill /f /t /pid {0}'.format(pid)
subprocess.call(cmd.split())
I see no reason as to why your program needs to crash, find the offending piece of code, and put it into a try-statement.
http://docs.python.org/3.2/tutorial/errors.html#handling-exceptions
My scripts cdist-deploy-to and cdist-mass-deploy (from cdist configuration management) run interactively (i.e. are called by a user).
These scripts call a lot of scripts, which again call some scripts:
cdist-mass-deploy ...
cdist-deploy-to ...
cdist-explorer-run-global ...
cdist-dir ....
What I want is to exit / kill all scripts, as soon as cdist-mass-deploy is either stopped by control C (SIGINT) or killed with SIGTERM.
cdist-deploy-to can also be called interactively and should exhibit the same behaviour.
Using ps -ef... and co variants to find out all processes with the ppid looks like it could be quite unportable. Using $! does not work as in the deeper levels the children are no background processes.
I tried using the following code:
__cdist_kill_on_interrupt()
{
__cdist_tmp_removal
kill 0
exit 1
}
trap __cdist_kill_on_interrupt INT TERM
But this leads to ugly Terminated messages as well as to a segfault in the shells (dash, bash, zsh) and seems not to stop everything instantly anyway:
# cdist-mass-deploy -p ikq04.ethz.ch ikq05.ethz.ch
core: Waiting for cdist-deploy-to jobs to finish
^CTerminated
Terminated
Terminated
Terminated
Segmentation fault
So the question is, how to cleanly exit including all (sub-)children in a portable manner (bourne shell, no csh support needed)?
You don't need to handle ^C, that will result in a signal being sent to the whole process group, which will kill all the processes that are not in the background. So you don't need to catch INT.
The only reason you get a Terminated when you kill them is that kill sends TERM by default, but that's reasonable if you are handling a TERM in the first place. You could use kill -INT 0 if you want to avoid the messages.
(responding with extra info)
If the child processes are run in the background, you can get their process ids just after you start them, using the $! special shell variable. Gather these together in a variable and just kill them all when you need to terminate.
Given a very simple ruby script:
child = fork do
system 'sleep 10000'
end
5.times do
sleep 1
puts "send kill to #{child}"
Process.kill("QUIT", child)
end
QUIT signal is just lost. Where does it go? Something with default handler which just ignores it?
How to send signal to all processes created by that fork? Is it possible to do that without searching for all child processes?
The problem is that the system call creates yet another child process running the given command in a subshell, so there are actually three processes running in your example. Additionally, the Ruby Kernel#system command is implemented via the standard C function system(3), which calls fork and exec to create the new process and (on most systems) ignores SIGINT and SIGQUIT, and blocks SIGCHLD.
If you simply call sleep(10000) instead of system("sleep 10000") then things should work as you expect. You can also trap SIGQUIT in the child to handle it gracefully:
child = fork do
Signal.trap("QUIT") { puts "CHILD: ok, quitting time!"; exit }
sleep(10000)
end
If you really need to use a "system" call from the child process then you might be better off using an explicit fork/exec pair (instead of the implicit ones in the system call), so that you can perform your own signal handling in the third forked child.
I think that you are sending signal to fork process corectly. I think that the problem is with the system command. System command creates new fork and waits until it ends and I think that this waiting is blocking your quit signal. If you run your example as test.rb you'll see three processes:
test.rb
test.rb
sleep 10000
If you send signal "TERM" or "KILL" instead of "QUIT" the second test.rb will die but sleep 10000 will continue!