Perl: Child subprocesses are not killed when the child is killed - Windows

This is being done on Windows.
I am getting the error: "The process cannot access the file because it is being used by another process." It seems that even after the child exits (exit 0) and the parent waits for the child to complete (waitpid($lkpid, 0)), the child's subprocesses are not killed. Hence, when the next iteration (test case) runs, it finds the process already running and gives the error message.
Code snippet ($bashexe and $bePath are defined):
my $MSROO = "/home/abc";
if (my $fpid = fork()) {
    for (my $i=1; $i<=1200; $i++) {
        sleep 1;
        if (-e "$MSROO/logs/Complete") {
            last;
        }
    }
}
elsif (defined ($fpid)) {
    &runAndMonitor (\@ForRun, "$MSROO/logs/Test.log"); ### @ForRun has the list of test cases
    system("touch $MSROO/logs/Complete");
    exit 0;
}
sub runAndMonitor {
    my @ForRunPerProduct = @{$_[0]};
    my $logFile = $_[1];
    foreach my $TestVar (@ForRunPerProduct) {
        my $TestVarDirName = $TestVar;
        $TestVarDirName = dirname ($TestVarDirName);
        my $lkpid;
        my $filehandle;
        if ( !($pid = open( $filehandle, "-|" , " $bashexe -c \" echo abc \; perl.exe reg_script.pl $TestVarDirName -t wint\" >> $logFile "))) {
            die( "Failed to start process: $!" );
        }
        else {
            print "$pid is pid of shell running: $TestVar\n"; ### Issue (error message above) is coming here after piped open is launched for a new test
            my $taskInfo=`tasklist | grep "$pid"`;
            chomp ($taskInfo);
            print "$taskInfo is taskInfo\n";
        }
        if ($lkpid = fork()) {
            sleep 1;
            chomp ($lkpid);
            LabelToCheck:
            my $pidExistingOrNotInParent = kill 0, $pid;
            if ($pidExistingOrNotInParent) {
                sleep 10;
                goto LabelToCheck;
            }
        }
        elsif (defined ($lkpid)) {
            sleep 12;
            my $pidExistingOrNot = kill 0, $pid;
            if ($pidExistingOrNot){
                print "$pid still exists\n";
                my $taskInfoVar1 =`tasklist | grep "$pid"`;
                chomp ($taskInfoVar1);
                my $killPID = kill 15, $pid;
                print "$killPID is the value of PID\n"; ### Here, I am getting output 1 (value of $killPID). Also, I tried with signal 9, and I see the same behavior
                my $taskInfoVar2 =`tasklist | grep "$pid"`;
                sleep 10;
                exit 0;
            }
        }
        system("TASKKILL /F /T /PID $lkpid") if ($lkpid); ### Here, the child pid is not being killed. It says "ERROR: The process "-1472" not found"
        sleep 2;
        print "$lkpid is lkpid\n"; ## Here, though, I get the message "-1472 is lkpid"
        #waitpid($lkpid, 0);
    }
    return;
}
Why is it that, even after "exit 0" in the child and then "waitpid" in the parent, the child's subprocesses are not killed? What can be done to fully clean up the child process and its subprocesses?

The exit doesn't touch child processes; it's not meant to. It just exits the process. In order to shut down its child processes as well, you'd need to signal them.†
However, since this is Windows, where fork is merely emulated, here is what perlfork says:
Behavior of other Perl features in forked pseudo-processes
...
kill()
"kill('KILL', ...)" can be used to terminate a pseudo-process by passing it the ID returned by fork(). The outcome of kill on a pseudo-process is unpredictable and it should not be used except under dire circumstances, because the operating system may not guarantee integrity of the process resources when a running thread is terminated
...
exit()
exit() always exits just the executing pseudo-process, after automatically wait()-ing for any outstanding child pseudo-processes. Note that this means that the process as a whole will not exit unless all running pseudo-processes have exited. See below for some limitations with open filehandles.
So don't use kill, while exit behaves nearly the opposite of what you need.
But the Windows command TASKKILL can terminate a process and its tree:
system("TASKKILL /F /T /PID $pid");
This should terminate the process with $pid and its child processes. (The command can use a process's name instead, TASKKILL /F /T /IM $name, but using names on a busy modern system, with a lot going on, can be tricky.) See taskkill on MS docs.
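For comparison, here is a minimal Python sketch of the same tree kill, assuming Python 3 on Windows; the helper names taskkill_tree_cmd and kill_tree are made up for illustration. It builds the TASKKILL command as an argument list, which sidesteps shell quoting. It needs a real Windows PID (such as the one the piped open returns): a Perl fork() pseudo-process ID on Windows, like the -1472 in the question, names an interpreter thread rather than an OS process, which is likely why TASKKILL reports it as not found.

```python
import subprocess
import sys


def taskkill_tree_cmd(pid):
    """Build the TASKKILL argument list that force-kills pid and its tree."""
    # /F forces termination, /T includes all child processes, /PID selects by ID.
    return ["taskkill", "/F", "/T", "/PID", str(pid)]


def kill_tree(pid):
    """Run TASKKILL (Windows only) and return its exit code."""
    if sys.platform != "win32":
        raise OSError("TASKKILL is only available on Windows")
    # A list argument avoids going through a shell at all.
    return subprocess.call(taskkill_tree_cmd(pid))


if __name__ == "__main__":
    print(" ".join(taskkill_tree_cmd(1472)))
```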
A more reliable approach altogether is probably to use dedicated modules for Windows process management.
A few other comments:
I also notice that you use pipe-open, while perlfork says of that:
Forking pipe open() not yet implemented
The open(FOO, "|-") and open(BAR, "-|") constructs are not yet implemented.
So I am confused: does that pipe-open work in your code? But perlfork continues with:
This limitation can be easily worked around in new code by creating a pipe explicitly. The following example shows how to write to a forked child: [full code follows]
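As a cross-language illustration only (Python rather than Perl, and not perlfork's own example), subprocess.Popen with stdout=PIPE creates the pipe explicitly, with no forking pipe-open involved; run_and_capture is a hypothetical helper name. Leaving the with-block closes the pipe and waits for the child, so the exit code is available afterwards.

```python
import subprocess
import sys


def run_and_capture(argv):
    """Start argv with an explicit pipe on its stdout and collect the lines."""
    lines = []
    # Popen creates the pipe itself; the parent reads one end, the child writes the other.
    with subprocess.Popen(argv, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            lines.append(line.rstrip("\n"))
    # Exiting the with-block waited for the child, so returncode is set.
    return proc.returncode, lines


if __name__ == "__main__":
    print(run_and_capture([sys.executable, "-c", "print('abc')"]))
```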
That C-style loop, for (my $i=1; $i<=1200; $i++), is better written as:
for my $i (1..1200) { ... }
(or foreach; the two are synonyms). A C-style loop is very rarely needed in Perl.
† A kill with a negative signal (name or number) OR a negative process ID signals a whole process group rather than a single process, which generally takes out the tree under the signaled process (as long as its descendants stay in that group). This is on Linux.
So one way would be to signal that child from its parent when ready, instead of exiting from it. (Then the child would have to signal the parent in some way when it's ready.)
Or, the child can send a negative terminate signal to all its direct child processes, then exit.
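On Linux/Unix, the process-group idea from this footnote can be sketched in Python as follows, assuming a POSIX system (it does not apply to Windows); start_in_own_group and terminate_group are illustrative names. Starting the child with start_new_session=True makes it a group leader, so os.killpg on its PID also reaches any grandchildren it starts, unless they move into a group of their own.

```python
import os
import signal
import subprocess


def start_in_own_group(argv):
    """Start argv as the leader of a new session and process group (POSIX)."""
    # start_new_session=True runs setsid() in the child, so the child's PID
    # is also the process-group ID shared by everything it spawns later.
    return subprocess.Popen(argv, start_new_session=True)


def terminate_group(proc, sig=signal.SIGTERM):
    """Send sig to the whole group led by proc, then reap proc."""
    try:
        os.killpg(proc.pid, sig)  # same effect as kill(-pgid, sig)
    except ProcessLookupError:
        pass  # the group is already gone
    return proc.wait()


if __name__ == "__main__" and os.name == "posix":
    # A shell that starts a background grandchild and waits for it.
    p = start_in_own_group(["sh", "-c", "sleep 30 & wait"])
    print(terminate_group(p))
```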

You didn't say which perl you are using. On Windows with Strawberry Perl (and presumably ActiveState), fork() emulation is ... very problematic (maybe just "broken"), as @zdim mentioned. If you want a longer explanation, see Proc::Background::Win32 - Perl Fork Limitations.
Meanwhile, if you use Cygwin's Perl, fork works perfectly. This is because Cygwin does a full emulation of Unix fork() semantics, so anything built against Cygwin works just like it does on Unix. The downside is that file paths look weird, like /cygdrive/c/Program Files. This may or may not trip up code you've already written.
But you might also be confused about process trees. Even on Unix, killing a parent process does not kill the child processes. It often happens anyway, for various reasons, but it is not enforced. For example, most child processes have a pipe open to the parent; when the parent exits, that pipe closes, and the child's next write to it raises SIGPIPE, which kills the child (a read just hits end-of-file). In other cases, the parent catches SIGTERM and then re-broadcasts it to its children before exiting gracefully. In still other cases, monitors like systemd or Docker create a container inherited by all children of the main process, and when the main process exits, the monitor kills off everything else in the container.
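The "catch SIGTERM and re-broadcast it" pattern might look like this in Python, assuming the parent keeps its subprocess.Popen objects in a list; install_forwarder is a hypothetical name. The handler asks every still-running child to stop, reaps them all so no zombies are left, and then exits. On Windows, Popen.terminate() maps to TerminateProcess.

```python
import signal
import subprocess
import sys


def install_forwarder(children):
    """Install a SIGTERM handler that stops every tracked child, then exits."""

    def handler(signum, frame):
        # First ask every still-running child to stop...
        for child in children:
            if child.poll() is None:
                child.terminate()  # SIGTERM on POSIX, TerminateProcess on Windows
        # ...then reap them all so none is left behind.
        for child in children:
            child.wait()
        sys.exit(128 + signum)  # conventional "killed by signal" exit status

    signal.signal(signal.SIGTERM, handler)
    return handler
```

In a real parent, you would append each Popen object to the list as you start it and then go about your work; a SIGTERM sent to the parent now takes the tracked children down with it.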
Since it looks like you're writing your own task monitor, I'll give some advice from one that I wrote for Windows (and is running along happily years later). I ended up with a design using Proc::Background where the parent starts a task that writes to a file as STDOUT/STDERR. Then it opens that same log file and wakes up every few seconds to try reading more of the log file to see what the task is doing, and check with the Proc::Background object to see if the task exited. When the task exits, it appends the exit code and timestamp to the log file. The monitor has a timeout setting that if the child exceeds, it just un-gracefully runs TerminateProcess. (you could improve on that by leaving STDIN open as a pipe between monitor and worker, and then have the worker check STDIN every now and then, but on Windows that will block, so you have to use PeekNamedPipe, which gets messy)
Meanwhile, the monitor parses any new lines of the log file to read status information and send updates to the database. The other parts of the system can watch the database to see the status of background tasks, including a web admin interface that can also open and read the log file. If the monitor sees that a child has run for too long, it can use TerminateProcess to stop it. Missing from this design is any way for the monitor to know when it's being asked to exit, and clean up, which is a notable deficiency, and one you're probably looking for. However, there actually isn't any way to intercept a TerminateProcess aimed at the parent! Windows does have some Message Queue API stuff where you can set up to receive notifications about termination, but I never chased down the full details there. If you do, please come back and drop a comment for me :-)
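A rough Python rendering of that monitor design, using only the standard library instead of Proc::Background (the function name, the on_new_text callback, and the log-line format are assumptions for illustration): the task writes STDOUT/STDERR to a log file, the monitor wakes every poll_s seconds to read any new log text and check whether the task exited, and past timeout_s it kills the task un-gracefully. Popen.kill() uses TerminateProcess on Windows.

```python
import os
import subprocess
import sys
import tempfile
import time
from datetime import datetime


def run_monitored(argv, log_path, timeout_s, poll_s=2.0, on_new_text=None):
    """Run argv with output going to log_path; tail the log and enforce a timeout."""
    with open(log_path, "ab") as log:
        offset = os.path.getsize(log_path)  # start tailing at the current end
        # The task gets its own handle on the log as STDOUT and STDERR.
        proc = subprocess.Popen(argv, stdout=log, stderr=subprocess.STDOUT)
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            rc = proc.wait(timeout=poll_s)  # wake up every poll_s seconds
        except subprocess.TimeoutExpired:
            rc = None
        # Read whatever the task appended since the last check.
        with open(log_path, "rb") as f:
            f.seek(offset)
            chunk = f.read()
        offset += len(chunk)
        if chunk and on_new_text is not None:
            on_new_text(chunk.decode(errors="replace"))
        if rc is not None:
            break
        if time.monotonic() > deadline:
            proc.kill()  # un-graceful: TerminateProcess on Windows, SIGKILL on POSIX
            rc = proc.wait()
            break
    # Append the exit code and a timestamp to the same log.
    with open(log_path, "a") as log:
        log.write(f"exit={rc} at {datetime.now().isoformat()}\n")
    return rc


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "task.log")
        print(run_monitored([sys.executable, "-c", "print('hello')"], path,
                            timeout_s=30, poll_s=0.5, on_new_text=print))
```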

Related

Why does bash "forget" about my background processes?

I have this code:
#!/bin/bash
pids=()
for i in $(seq 1 999); do
    sleep 1 &
    pids+=( "$!" )
done
for pid in "${pids[@]}"; do
    wait "$pid"
done
I expect the following behavior:
spin through the first loop
wait about a second on the first pid
spin through the second loop
Instead, I get this error:
./foo.sh: line 8: wait: pid 24752 is not a child of this shell
(repeated 171 times with different pids)
If I run the script with shorter loop (50 instead of 999), then I get no errors.
What's going on?
Edit: I am using GNU bash 4.4.23 on Windows.
POSIX says:
The implementation need not retain more than the {CHILD_MAX} most recent entries in its list of known process IDs in the current shell execution environment.
{CHILD_MAX} here refers to the maximum number of simultaneous processes allowed per user. You can get the value of this limit using the getconf utility:
$ getconf CHILD_MAX
13195
Bash stores the statuses of at most twice that many exited background processes in a circular buffer, and says "not a child of this shell" when you call wait on the PID of an old one that's been overwritten. You can see how it's implemented here.
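To make the circular-buffer behavior concrete, here is a toy Python model; it illustrates the idea only and is not bash's actual jobs.c code, and ExitStatusTable is a made-up name. Once more than capacity children have been recorded, the oldest statuses are dropped, and waiting on one of those fails with the same message bash prints.

```python
from collections import OrderedDict


class ExitStatusTable:
    """Toy model: remember exit statuses for at most `capacity` reaped children."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._statuses = OrderedDict()  # pid -> exit status, oldest first

    def record(self, pid, status):
        self._statuses.pop(pid, None)  # a reused PID counts as the newest entry
        self._statuses[pid] = status
        if len(self._statuses) > self.capacity:
            self._statuses.popitem(last=False)  # drop (overwrite) the oldest entry

    def wait(self, pid):
        if pid not in self._statuses:
            raise LookupError(f"wait: pid {pid} is not a child of this shell")
        return self._statuses.pop(pid)
```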
The way you might reasonably expect this to work, as it would if you wrote a similar program in most other languages, is:
sleep is executed in the background via a fork+exec.
At some point, sleep exits leaving behind a zombie.
That zombie remains in place, holding its PID, until its parent calls wait to retrieve its exit code.
However, shells such as bash actually do this a little differently. They proactively reap their zombie children and store their exit codes in memory so that they can deallocate the system resources those processes were using. Then when you wait the shell just hands you whatever value is stored in memory, but the zombie could be long gone by then.
Now, because all of these exit statuses are being stored in memory, there is a practical limit to how many background processes can exit without you calling wait before you've filled up all the memory you have available for this in the shell. I expect that you're hitting this limit somewhere in the several hundreds of processes in your environment, while other users manage to make it into the several thousands in theirs. Regardless, the outcome is the same - eventually there's nowhere to store information about your children and so that information is lost.
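The proactive reaping described above can be sketched in Python on a POSIX system (spawn_exiting_child and reap_all are hypothetical helpers): os.waitpid(-1, os.WNOHANG) collects whichever children have already exited, and their exit codes go into a dict, so a later "wait" is just a lookup. This dict grows without bound; a real shell has to cap it somewhere, which is where the limit discussed above comes from.

```python
import os


def spawn_exiting_child(code):
    """Fork a child that exits at once with `code` (POSIX only)."""
    pid = os.fork()
    if pid == 0:
        os._exit(code)  # child: leave immediately, skipping Python cleanup
    return pid


def reap_all(statuses):
    """Reap every child that has already exited, without blocking."""
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no children at all
        if pid == 0:
            return  # children exist, but none has exited yet
        # The zombie is gone once waitpid returns; only its status remains.
        if os.WIFEXITED(status):
            statuses[pid] = os.WEXITSTATUS(status)
        else:
            statuses[pid] = -os.WTERMSIG(status)
```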
I can reproduce on Arch Linux with docker run -ti --rm bash:5.0.18 bash -c 'pids=; for ((i=1;i<550;++i)); do true & pids+=" $!"; done; wait $pids' and any earlier version. I can't reproduce with bash:5.1.0.
What's going on?
It looks like a bug in your version of Bash. There were a couple of improvements in jobs.c and wait.def in Bash 5.1, and "Make sure SIGCHLD is blocked in all cases where waitchld() is not called from a signal handler" is mentioned in the changelog. From the look of it, it is an issue with handling a SIGCHLD signal while already handling another SIGCHLD signal.

How to launch crashing (rarely) application in subprocess

I have a Python application that needs to execute a proprietary application (which crashes from time to time) about 20,000 times a day.
The problem is that when the application crashes, Windows automatically triggers WerFault, which keeps the program hanging, so Python's subprocess.call() will wait forever for user input (the application has to run on weekends, on holidays, 24/7, so this is not acceptable).
I thought about using sleep; poll; kill; terminate, but that would mean losing the ability to use communicate(). The application can run from a few milliseconds to 2 hours, so setting a fixed timeout would be ineffective.
I also tried turning on automatic debugging (using a script that takes a crash dump of the application and terminates it), but somehow this howto doesn't work on my server (WerFault still appears and waits for user input).
Several other tutorials like this didn't take any effect either.
Question:
Is there a way to prevent WerFault from displaying (waiting for user input)? This is more a system than a programming question.
Alternative question: is there a graceful way in Python to detect an application crash (whether WerFault was displayed)?
Simple (and ugly) answer: monitor for WerFault.exe instances from time to time, especially the one associated with the PID of the offending application, and kill it. Dealing with WerFault.exe is complicated, but you don't want to disable it; see the Windows Error Reporting service.
Get a list of processes by name that match WerFault.exe. I use the psutil package. Be careful with psutil because processes are cached; use psutil.get_pid_list() (renamed psutil.pids() in later psutil versions).
Decode its command line by using argparse. This might be overkill, but it leverages existing Python libraries.
Identify the process that is holding your application according to its PID.
This is a simple implementation.
import argparse
import subprocess

import psutil


def kill_proc_kidnapper(self, child_pid, kidnapper_name='WerFault.exe'):
    """
    Look among all instances of the 'WerFault.exe' process for the specific one
    that took control of another faulting process.
    When 'WerFault.exe' is launched, the PID is specified with the -p argument:
    'C:\\Windows\\SysWOW64\\WerFault.exe -u -p 5012 -s 68'
     |                                     |
     +-> kidnapper                         +-> child_pid
    The function uses `argparse` to properly decode the process command line
    and get the PID. If the PID matches `child_pid`, then we have found the
    correct parent process and can kill it.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument('-u', action='store_false', help='User name')
    parser.add_argument('-p', type=int, help='Process ID')
    parser.add_argument('-s', help='??')
    kidnapper_p = None
    child_p = None
    # psutil.pids() is the current name of psutil.get_pid_list(); it returns
    # plain integers, so each PID is wrapped in a Process object here.
    for pid in psutil.pids():
        try:
            proc = psutil.Process(pid)
            name = proc.name()
            if kidnapper_name not in name:
                continue
            cmdline = proc.cmdline()
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue  # the process exited meanwhile or cannot be inspected
        # cmdline[0] is the executable path; parse only the flags after it.
        args, unknown_args = parser.parse_known_args(cmdline[1:])
        print(name, cmdline)
        if args.p == child_pid:
            # We found the kidnapper, aim.
            print('kidnapper found: {0}'.format(proc.pid))
            kidnapper_p = proc
    if psutil.pid_exists(child_pid):
        child_p = psutil.Process(child_pid)
    if kidnapper_p and child_p:
        print('Killing "{0}" ({1}) that kidnapped "{2}" ({3})'.format(
            kidnapper_p.name(), kidnapper_p.pid, child_p.name(), child_p.pid))
        self.taskkill(kidnapper_p.pid)
        return 1
    else:
        if not kidnapper_p:
            print('Kidnapper process "{0}" not found'.format(kidnapper_name))
        if not child_p:
            print('Child process "({0})" not found'.format(child_pid))
        return 0
Now, the taskkill function invokes the taskkill command with the correct PID.
def taskkill(self, pid):
    """
    Kill task and entire process tree for this process
    """
    print('Task kill for PID {0}'.format(pid))
    cmd = 'taskkill /f /t /pid {0}'.format(pid)
    subprocess.call(cmd.split())
I see no reason why your program needs to crash: find the offending piece of code and put it into a try statement.
http://docs.python.org/3.2/tutorial/errors.html#handling-exceptions

perl alarm with subprocess

I have a perl script that runs a series of batch scripts for regression testing. I want to implement a timeout on the batch scripts. I currently have the following code.
my $pid = open CMD, "$cmd 2>&1 |";
eval {
    # setup the alarm
    local $SIG{ALRM} = sub { die "alarm\n" };
    # alarm on the timeout
    alarm $MAX_TIMEOUT;
    log_output("setting alarm to $MAX_TIMEOUT\n");
    # run our exe
    while( <CMD> ) {
        $$out_ref .= $_;
    }
    $timeRemaining = alarm 0;
};
if ($@) {
    #catch the alarm, kill the executable
}
The problem is that no matter what I set the max timeout to, the alarm is never tripped. I've tried using Perl::Unsafe::Signals but that did not help.
Is this the best way to execute the batch scripts if I want to be able to capture their output? Is there another way that would do the same thing that would allow me to use alarms, or is there another method besides alarms to timeout the program?
I have built a test script to confirm that alarm works with my Perl and Windows versions, but it does not work when I run a command like this.
I'm running this with ActivePerl 5.10.1 on Windows 7 x64.
It's hard to tell when alarm will work, when a system call will and won't get interrupted by a SIGALRM, how the same code might behave differently on different operating systems, etc.
If your job times out, you want to kill the subprocess you have started. This is a good use case for the poor man's alarm:
my $pid = open CMD, "$cmd 2>&1 |";
my $time = $MAX_TIMEOUT;
my $poor_mans_alarm = "sleep 1,kill(0,$pid)||exit for 1..$time;kill -9,$pid";
if (fork() == 0) {
    exec($^X, "-e", $poor_mans_alarm);
    die "Poor man's alarm failed to start";  # shouldn't get here
}
# on Windows, instead of fork+exec, you can say
#     system 1, qq[$^X -e "$poor_mans_alarm"]
...
The poor man's alarm runs in a separate process. Every second, it checks whether the process with identifier $pid is still alive. If the process isn't alive, the alarm process exits. If the process is still alive after $time seconds, it sends a kill signal to the process (I used 9 to make it untrappable and -9 to take out the whole subprocess tree, your needs may vary).
(The exec actually may not be necessary. I use it because I also use this idiom to monitor processes that might outlive the Perl script that launched them. Since that wouldn't be the case with this problem, you could skip the exec call and say
if (fork() == 0) {
    for (1..$time) { sleep 1; kill(0,$pid) || exit }
    kill -9, $pid;
    exit;
}
instead.)
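The same poor man's alarm, sketched in Python with a watchdog thread instead of a separate process (a deliberate substitution; poor_mans_alarm is a hypothetical name, and it only works for children started with subprocess.Popen). Unlike kill -9 with a negative signal, proc.kill() ends only the direct child, not its whole tree.

```python
import subprocess
import threading
import time


def poor_mans_alarm(proc, seconds, interval=1.0):
    """Kill `proc` if it is still running after `seconds`.

    Runs in a daemon thread (not a separate process) and returns the thread.
    """
    def watch():
        deadline = time.monotonic() + seconds
        while time.monotonic() < deadline:
            if proc.poll() is not None:
                return  # exited on its own: nothing to do
            time.sleep(interval)
        if proc.poll() is None:
            proc.kill()  # SIGKILL on POSIX, TerminateProcess on Windows

    watcher = threading.Thread(target=watch, daemon=True)
    watcher.start()
    return watcher
```

Usage would be roughly: start the command with subprocess.Popen(..., stdout=subprocess.PIPE), call poor_mans_alarm(proc, MAX_TIMEOUT), then read proc.stdout as usual; if the alarm fires, the read typically ends at EOF because the writer is gone.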

Ruby run two processes, output results to terminal, and end safely

In Ruby, I'm running a system("command here") that is constantly watching changes for files, similar to tail. I'd like my program to continue to run and not halt at the system() call. Is there a way in Ruby to create another process so both can run independently, output results to the terminal, and then when you exit the program all processes the application created are removed?
Just combine spawn and waitall:
spawn 'sleep 6'
spawn 'sleep 8'
Process.waitall
You don't want to use system as that waits for the process to complete. You could use spawn instead and then wait for the processes (to avoid zombies). Then, when you want to exit, send a SIGTERM to your spawned processes. You could also use fork to launch your child processes but spawn is probably easier if you're using external programs.
You could also use process groups instead of tracking all the process IDs, then a single Process.kill('TERM', -process_group_id) call would take care of things. Your child processes should end up in the same process group but there is Process.setpgid if you need it.
Here's an example that uses fork (easier to get it all wrapped in one package that way).
def launch(id, sleep_for)
  pid = Process.fork do
    while(true)
      puts "#{id}, pgid = #{Process.getpgid(Process.pid())}, pid = #{Process.pid()}"
      sleep(sleep_for)
    end
  end
  # No zombie processes please: Process.detach reaps the child in a
  # background thread once it exits.
  Process.detach(pid)
  pid
end
# These just forward the signals to the whole process group and
# then immediately exit.
pgid = Process.getpgid(Process.pid())
Signal.trap('TERM') { Process.kill('TERM', -pgid); exit }
Signal.trap('INT' ) { Process.kill('INT', -pgid); exit }
launch('a', 5)
launch('b', 3)
while(true)
  puts "p, pgid = #{Process.getpgid(Process.pid())}, pid = #{Process.pid()}"
  sleep 2
end
If you run that in one terminal and then kill it from another (using the shell's kill command), you'll see that the children are also killed. If you remove the "forward this signal to the whole process group" Signal.trap stuff, then a simple SIGTERM will leave the children still running.
All of this assumes that you're working on some sort of Unixy system (such as Linux or OSX), YMMV anywhere else.
One more vote for using spawn. We use it in production a lot and it's very stable.

How to kill all children of the current shell on interrupt?

My scripts cdist-deploy-to and cdist-mass-deploy (from cdist configuration management) run interactively (i.e. are called by a user).
These scripts call a lot of scripts, which again call some scripts:
cdist-mass-deploy ...
cdist-deploy-to ...
cdist-explorer-run-global ...
cdist-dir ....
What I want is to exit / kill all scripts, as soon as cdist-mass-deploy is either stopped by control C (SIGINT) or killed with SIGTERM.
cdist-deploy-to can also be called interactively and should exhibit the same behaviour.
Using ps -ef... and similar variants to find all processes with a given ppid looks like it could be quite unportable. Using $! does not work, as in the deeper levels the children are not background processes.
I tried using the following code:
__cdist_kill_on_interrupt()
{
    __cdist_tmp_removal
    kill 0
    exit 1
}
trap __cdist_kill_on_interrupt INT TERM
But this leads to ugly Terminated messages as well as to a segfault in the shells (dash, bash, zsh) and seems not to stop everything instantly anyway:
# cdist-mass-deploy -p ikq04.ethz.ch ikq05.ethz.ch
core: Waiting for cdist-deploy-to jobs to finish
^CTerminated
Terminated
Terminated
Terminated
Segmentation fault
So the question is, how to cleanly exit including all (sub-)children in a portable manner (bourne shell, no csh support needed)?
You don't need to handle ^C; the terminal sends SIGINT to the whole foreground process group, which will kill all the processes that are not in the background. So you don't need to catch INT.
The only reason you get a Terminated when you kill them is that kill sends TERM by default, but that's reasonable if you are handling a TERM in the first place. You could use kill -INT 0 if you want to avoid the messages.
(responding with extra info)
If the child processes are run in the background, you can get their process ids just after you start them, using the $! special shell variable. Gather these together in a variable and just kill them all when you need to terminate.

Resources