How to launch crashing (rarely) application in subprocess - windows

I'm having python application which needs to execute proprietary application (which crashes from time to time) about 20 000 times a day.
The problem is when application crashes, Windows automatically triggers WerFault which will keep program hanging, thus python's subprocess.call() will wait forever for user input (that application has to run on weekends, on holidays, 24/7... so this is not acceptable).
If though about using sleep; poll; kill; terminate but that would mean losing ability to use communicate(), application can run from few miliseconds to 2 hours, so setting fixed timeout will be ineffective
I also tried turning on automatic debugging (use a script which would take a crash dump of an application and terminate id), but somehow this howto doesn't work on my server (WerFault still appears and waits for user input).
Several other tutorials like this didn't take any effect either.
Question:
is there a way how to prevent WerFault from displaying (waiting for user input)? this is more system then programming question
Alternative question: is there a graceful way in python how to detect application crash (whether WerFault was displayed)

Simple (and ugly) answer, monitor for WerFault.exe instances from time to time, specially the one associated with the PID of the offending application. And kill it. Dealing with WerFault.exe is complicated but you don't want to disable it -- see Windows Error Reporting service.
Get a list of processes by name that match WerFault.exe. I use psutil package. Be careful with psutil because processes are cached, use psutil.get_pid_list().
Decode its command line by using argparse. This might be overkill but it leverages existing python libraries.
Identify the process that is holding your application according to its PID.
This is a simple implementation.
def kill_proc_kidnapper(self, child_pid, kidnapper_name='WerFault.exe'):
"""
Look among all instances of 'WerFault.exe' process for an specific one
that took control of another faulting process.
When 'WerFault.exe' is launched it is specified the PID using -p argument:
'C:\\Windows\\SysWOW64\\WerFault.exe -u -p 5012 -s 68'
| |
+-> kidnapper +-> child_pid
Function uses `argparse` to properly decode process command line and get
PID. If PID matches `child_pid` then we have found the correct parent
process and can kill it.
"""
parser = argparse.ArgumentParser()
parser.add_argument('-u', action='store_false', help='User name')
parser.add_argument('-p', type=int, help='Process ID')
parser.add_argument('-s', help='??')
kidnapper_p = None
child_p = None
for proc in psutil.get_pid_list():
if kidnapper_name in proc.name:
args, unknown_args = parser.parse_known_args(proc.cmdline)
print proc.name, proc.cmdline
if args.p == child_pid:
# We found the kidnapper, aim.
print 'kidnapper found: {0}'.format(proc.pid)
kidnapper_p = proc
if psutil.pid_exists(child_pid):
child_p = psutil.Process(child_pid)
if kidnapper_p and child_pid:
print 'Killing "{0}" ({1}) that kidnapped "{2}" ({3})'.format(
kidnapper_p.name, kidnapper_p.pid, child_p.name, child_p.pid)
self.taskkill(kidnapper_p.pid)
return 1
else:
if not kidnapper_p:
print 'Kidnapper process "{0}" not found'.format(kidnapper_name)
if not child_p:
print 'Child process "({0})" not found'.format(child_pid)
return 0
Now, taskkill function invokes taskkill commmand with correct PID.
def taskkill(self, pid):
"""
Kill task and entire process tree for this process
"""
print('Task kill for PID {0}'.format(pid))
cmd = 'taskkill /f /t /pid {0}'.format(pid)
subprocess.call(cmd.split())

I see no reason as to why your program needs to crash, find the offending piece of code, and put it into a try-statement.
http://docs.python.org/3.2/tutorial/errors.html#handling-exceptions

Related

Perl: Child subprocesses are not being killed when child is being killed

This is being done on windows
I am getting error: The process cannot access the file because it is being used by another process. It seems that even after the child is exiting(exit 0) and the parent is waiting for the child to complete (waitpid($lkpid, 0)),the child's subprocesses are not being killed. Hence, when the next iteration (test case) is running, it is finding the process already running, and hence gives the error message.
Code Snippet ($bashexe and $bePath are defined):
my $MSROO = "/home/abc";
if (my $fpid = fork()) {
for (my $i=1; $i<=1200; $i++) {
sleep 1;
if (-e "$MSROO/logs/Complete") {
last;
}
}
elsif (defined ($fpid)) {
&runAndMonitor (\#ForRun, "$MSROO/logs/Test.log"); ### #ForRun has the list of test cases
system("touch $MSROO/logs/Complete");
exit 0;
}
sub runAndMonitor {
my #ForRunPerProduct = #{$_[0]};
my $logFile = $_[1];
foreach my $TestVar (#ForRunPerProduct) {
my $TestVarDirName = $TestVar;
$TestVarDirName = dirname ($TestVarDirName);
my $lkpid;
my $filehandle;
if ( !($pid = open( $filehandle, "-|" , " $bashexe -c \" echo abc \; perl.exe reg_script.pl $TestVarDirName -t wint\" >> $logFile "))) {
die( "Failed to start process: $!" );
}
else {
print "$pid is pid of shell running: $TestVar\n"; ### Issue (error message above) is coming here after piped open is launched for a new test
my $taskInfo=`tasklist | grep "$pid"`;
chomp ($taskInfo);
print "$taskInfo is taskInfo\n";
}
if ($lkpid = fork()) {
sleep 1;
chomp ($lkpid);
LabelToCheck:
my $pidExistingOrNotInParent = kill 0, $pid;
if ($pidExistingOrNotInParent) {
sleep 10;
goto LabelToCheck;
}
}
elsif (defined ($lkpid)) {
sleep 12;
my $pidExistingOrNot = kill 0, $pid;
if ($pidExistingOrNot){
print "$pid still exists\n";
my $taskInfoVar1 =`tasklist | grep "$pid"`;
chomp ($taskInfoVar1);
my $killPID = kill 15, $pid;
print "$killPID is the value of PID\n"; ### Here, I am getting output 1 (value of $killPID). Also, I tried with signal 9, and seeing same behavior
my $taskInfoVar2 =`tasklist | grep "$pid"`;
sleep 10;
exit 0;
}
}
system("TASKKILL /F /T /PID $lkpid") if ($lkpid); ### Here, child pid is not being killed . Saying "ERROR: The process "-1472" not found"
sleep 2;
print "$lkpid is lkpid\n"; ## Here, though I am getting message "-1472 is lkpid"
#waitpid($lkpid, 0);
return;
}
Why is it that even after "exit 0 in child" and then "waitpid in parent", child subprocesses are not being killed? What can be done to fully clean child process and its subprocesses?
The exit doesn't touch child processes; it's not meant to. It just exits the process. In order to shut down its child processes as well you'd need to signal them.†
However, since this is Windows, where fork is merely emulated, here is what perlfork says
Behavior of other Perl features in forked pseudo-processes
...
kill() "kill('KILL', ...)" can be used to terminate a pseudo-process by passing it the ID returned by fork(). The outcome of kill on a
pseudo-process is unpredictable and it should not be used except under dire circumstances, because the operating system may not
guarantee integrity of the process resources when a running thread is terminated
...
exit() exit() always exits just the executing pseudo-process, after automatically wait()-ing for any outstanding child pseudo-processes. Note
that this means that the process as a whole will not exit unless all running pseudo-processes have exited. See below for some
limitations with open filehandles.
So don't do kill, while exit behaves nearly opposite to what you need.
But the Windows command TASKKILL can terminate a process and its tree
system("TASKKILL /F /T /PID $pid");
This should terminate a process with $pid and its children processes. (The command can use a process's name instead, TASKKILL /F /T /IM $name, but using names on a busy modern system, with a lot going on, can be tricky.) See taskkill on MS docs.
A more reliable way about this, altogether, is probably to use dedicated modules for Windows process management.
A few other comments
I also notice that you use pipe-open, while perlfork says for that
Forking pipe open() not yet implemented
The open(FOO, "|-") and open(BAR, "-|") constructs are not yet implemented.
So I am confused, does that pipe-open work in your code? But perlfork continues with
This limitation can be easily worked around in new code by creating a pipe explicitly. The following example shows how to write to a forked child: [full code follows]
That C-style loop, for (my $i=1; $i<=1200; $i++), is better written as
for my $i (1..1200) { ... }
(or foreach, synonyms) A C-style loop is very rarely needed in Perl.
† A kill with a negative signal (name or number) OR process-id generally terminates the whole tree under the signaled process. This is on Linux.
So one way would be to signal that child from its parent when ready, instead of exit-ing from it. (Then the child would have signal the parent in some way when it's ready.)
Or, the child can send a negative terminate signal to all its direct children process, then exit.
You didn't say which perl you are using. On Windows with Strawberry Perl (and presumably Active State), fork() emulation is ... very problematic, (maybe just "broken") as #zdim mentioned. If you want a longer explanation, see Proc::Background::Win32 - Perl Fork Limitations
Meanwhile, if you use Cygwin's Perl, fork works perfectly. This is because Cygwin does a full emulation of Unix fork() semantics, so anything built against cygwin works just like it does on Unix. The downside is that file paths show up weird, like /cygdrive/c/Program Files. This may or may not trip up code you've already written.
But, you might also have confusion about process trees. Even on Unix, killing a parent process does not kill the child processes. This usually happens for various reasons, but it is not enforced. For example, most child processes have a pipe open to the parent, and when the parent exits that pipe closes and then reading/writing the pipe gives SIGPIPE that kills the child. In other cases, the parent catches SIGTERM and then re-broadcasts that to its children before exiting gracefully. In other cases, monitors like Systemd or Docker create a container inherited by all children of the main process, and when the main process exits the monitor kills off everything else in the container.
Since it looks like you're writing your own task monitor, I'll give some advice from one that I wrote for Windows (and is running along happily years later). I ended up with a design using Proc::Background where the parent starts a task that writes to a file as STDOUT/STDERR. Then it opens that same log file and wakes up every few seconds to try reading more of the log file to see what the task is doing, and check with the Proc::Background object to see if the task exited. When the task exits, it appends the exit code and timestamp to the log file. The monitor has a timeout setting that if the child exceeds, it just un-gracefully runs TerminateProcess. (you could improve on that by leaving STDIN open as a pipe between monitor and worker, and then have the worker check STDIN every now and then, but on Windows that will block, so you have to use PeekNamedPipe, which gets messy)
Meanwhile, the monitor parses any new lines of the log file to read status information and send updates to the database. The other parts of the system can watch the database to see the status of background tasks, including a web admin interface that can also open and read the log file. If the monitor sees that a child has run for too long, it can use TerminateProcess to stop it. Missing from this design is any way for the monitor to know when it's being asked to exit, and clean up, which is a notable deficiency, and one you're probably looking for. However, there actually isn't any way to intercept a TerminateProcess aimed at the parent! Windows does have some Message Queue API stuff where you can set up to receive notifications about termination, but I never chased down the full details there. If you do, please come back and drop a comment for me :-)

Ruby force close IO.popen java process

How can I force quit/kill a java process that was stared with IO.popen("command", "r+")?
I am running a script a small java program from ruby doing the following:
pipe = IO.popen("nice -n 19 java -Xmx2g -Djava.awt.headless=true -jar java_program.jar", 'r+')
Then I use stdio to send arguments back and forth, like this:
pipe.puts "data"
result = pipe.gets
This works fine for most data I send, but for some the java process seems to lock up or something, and I would like to force close / kill the java process.
I am currently doing the following, which does not seem to kill the java process (from another thread which watches over this stuff):
thepid = nil
thepid = pipe.pid if pipe.respond_to?(:pid)
pipe.puts('') if pipe.respond_to?(:puts) #This is the java_program's approach to closing, writing empty string to stdio will cause it to start it's shutdown procedure.
pipe.close if pipe.respond_to?(:close)
Process.kill('KILL', thepid) if thepid && Process.getpgid( thepid )
The java process lingers and refuses to die. What can I do to actually force the process to exit (it uses lots of ram :( )
Also: Is there a cross platform way of doing this?
What you may be seeing here is that you're killing the nice process and not the java process it launches.
You could avoid this by launching the java process directly and then altering the nice level using renice on the PID you get.
It's also worth checking that you're trying to kill the correct process. As you point out, spawning a second instance by accident would mean you're killing the wrong one.
A tool like popen3 allows for a lot more control over the child process and gives you the ability to feed input and capture output directly.

How can I create a process in Bash that has zero overhead but which gives me a process ID

For those of you who know what you're talking about I apologise for butchering the way that I'm going to phrase this question. I know nothing about bash whatsoever. With that caveat out of the way, let me get out my cleaver...
I am building a Rails app which has what's called a procfile which sets up any processes that need to be run in different environments
e.g.
web: bundle exec unicorn -p $PORT -c ./config/unicorn.rb
redis: redis-server
worker: bundle exec sidekiq
proxylocal: bin/proxylocal_local
Each one of these lines specs a process to be run. It also expects a pid to be returned after the process spins up. The syntax is
process_name: process_invokation_script
However the last process, proxylocal, only actually starts a process in development. In production it doesn't do anything.
Unfortunately that causes the Procfile to choke as it needs a process ID returned. So is there some super-simple, zero-overhead process that I can spawn in that case just to keep the procfile happy?
The sleep command does nothing for a specified period of time, with very low overhead. Give it an argument longer than your code will run.
For example
sleep 2147483647
does nothing for 231-1 seconds, just over 68 years. I picked that number because any reasonable implementation of sleep should be able to handle it.
In the unlikely event that that doesn't work (say if you're on an old 16-bit system that can't sleep for more than 216-1 seconds), you can do a sleep in an infinite loop:
sh -c 'while : ; do sleep 30000 ; done'
This assumes that you need the process to run for a very long time; that depends on what your application needs to do with the process ID. If it's required to be unique as long as the application is running, you need something that will continue to run for a long time; if the process terminates, its PID can be re-used by another process.
If that's not a requirement, you can use sleep 0 or true, which will terminate immediately.
If you need to give the application a little time to get the process ID before the process terminates, something like sleep 10 or even sleep 1 might work, though determining just how long it needs to run can be tricky and error-prone.
If Heroku isn't doing anything with proxylocal I'm not sure why you'd even want this in your Procifle. I'm also a bit confused about whether you want to change the Procfile or what bin/proxylocal_local does and how you would even do that.
That being said, if you are able to do anything you like for production your script can just call cat and it will create a pid and then just sit waiting for the next command (which never comes).
For truly minimal overhead, you don't want to run any external commands. When the shell starts a command, it first forks itself, then the child shell execs the external command. If the forked child can run a builtin, you can skip the exec.
Start by creating a read-only fifo somewhere.
mkfifo foo
chmod 400 foo
Then, whenever you need a do-nothing process, just fork a shell which tries to read from the fifo. It's read-only, so no one can write to it, so all reads will block.
read < foo &

Get process status by pid in Ruby

Is there a way to get a process's child process status based on its PID in Ruby?
For example, in Python you can do psutil.Process(pid).status
I don't know of a portable ruby method to get process state of a running process. You can do Process.wait and check $?.exitstatus, but that doesn't look like what you want. For a posix solution, you could use
`ps -o state -p #{pid}`.chomp
to get the letter code ps produces for process state
PROCESS STATE CODES
Here are the different values that the s, stat and state output specifiers
(header "STAT" or "S") will display to describe the state of a process.
D Uninterruptible sleep (usually IO)
R Running or runnable (on run queue)
S Interruptible sleep (waiting for an event to complete)
T Stopped, either by a job control signal or because it is being traced.
W paging (not valid since the 2.6.xx kernel)
X dead (should never be seen)
Z Defunct ("zombie") process, terminated but not reaped by its parent.
I was looking for the same thing. It's a shame ProcessStatus doesn't seem to be able to get initialized from a live pid. This is vital stuff if you want to do anything like a safe timed kill of a child process.
In any case,
it's the second line in /proc/$pid/status if you're on Linux.:
status_line = File.open("/proc/#{pid}/status") {|f| f.gets; f.gets }
Most likely much much faster than anything involving an external program.
On OS X, I setup a string:
outputstring="ps -O=S -p #{mypid}"
then execute it in a %x call:
termoutput=%x[#{outputstring}]
I can display that if needed, or just keep the output clean and act on the State I found with the call.

Can a standalone ruby script (windows and mac) reload and restart itself?

I have a master-workers architecture where the number of workers is growing on a weekly basis. I can no longer be expected to ssh or remote console into each machine to kill the worker, do a source control sync, and restart. I would like to be able to have the master place a message out on the network that tells each machine to sync and restart.
That's where I hit a roadblock. If I were using any sane platform, I could just do:
exec('ruby', __FILE__)
...and be done. However, I did the following test:
p Process.pid
sleep 1
exec('ruby', __FILE__)
...and on Windows, I get one ruby instance for each call to exec. None of them die until I hit ^C on the window in question. On every platform I tried this on, it is executing the new version of the file each time, which I have verified this by making simple edits to the test script while the test marched along.
The reason I'm printing the pid is to double-check the behavior I'm seeing. On windows, I am getting a different pid with each execution - which I would expect, considering that I am seeing a new process in the task manager for each run. The mac is behaving correctly: the pid is the same for every system call and I have verified with dtrace that each run is trigging a call to the execve syscall.
So, in short, is there a way to get a windows ruby script to restart its execution so it will be running any code - including itself - that has changed during its execution? Please note that this is not a rails application, though it does use activerecord.
After trying a number of solutions (including the one submitted by Byron Whitlock, which ultimately put me onto the path to a satisfactory end) I settled upon:
IO.popen("start cmd /C ruby.exe #{$0} #{ARGV.join(' ')}")
sleep 5
I found that if I didn't sleep at all after the popen, and just exited, the spawn would frequently (>50% of the time) fail. This is not cross-platform obviously, so in order to have the same behavior on the mac:
IO.popen("xterm -e \"ruby blah blah blah\"&")
The classic way to restart a program is to write another one that does it for you. so you spawn a process to restart.exe <args>, then die or exit; restart.exe waits until the calling script is no longer running, then starts the script again.

Resources