On Linux, Pid namespaces can be used to robustly kill all descendent (including orphaned & zombie) processes – see this answer for example.
What's the closest to a "robust" way to do the same on macOS? I can't rely on process groups unfortunately as some of the descendent processes alter them.
It's a gross kludge, but it might work: The first process would open a file descriptor so that, by default, all descendant processes inherit it. When it wants to kill them all, it runs lsof to find all processes with that file open and kills them.
It won't work for processes which have detached themselves, but you could walk the child process tree using proc_listchildpids() and send signals to each PID you obtain. There are probably some timing edge cases between checking a process's children and killing it - it could spawn more processes in this time. You could perhaps suspend all processes before listing their children and killing them. Processes whose parent has died should I think be reattached to their grandparent anyway though (I may be wrong on this) so in that case, as long as you keep calling proc_listchildpids() after sending each round of signals you should eventually end up in a steady state. (Ideally with no child processes left. But if they get into a really bad state [due to a kernel bug], some processes may be completely unkillable.)
proc_listchildpids() is declared in <libproc/libproc.h>.
Related
I am looking for a cross platform (various flavours of Unix, including Linux) to find and kill all processes spawned by my program. For Linux, I can walk to /proc to obtain this information, and I am sure I can find somethinf similar for OS X and *BSD. But I'd prefer if there were a standard library for this.
Background: I am writing a custom job schedular which needs to terminate the jobs that don't complete within a given period of time. Simply killing (SIGTERM, followed by SIGKILL- if the formar is not ignored) the child process works fine when the job doesn't spawn any other process or handles SIGTERM properly and takes care of the cleanup. But I don't control the jobs - and I know at least some that are poorly written. In the latter case, the system is left with a bunch of orphaned process which keep holdin on to certain resouces and cause all sorts of problems.
Any pointer to libraries or some cross platform way of doing this would be welcome.
A colleague insists that I need to call wait() after using & in a Bash script to spawn multiple child processes. I believe that the concern is that because the parent process is exiting before the child processes do, they'll be orphaned and left in zombie states.
I understand that fork() requires wait() or waitpid() to properly delete the file descriptors it creates. However, is this really needed for Bash? Isn't this something the bash subshell each child process is running in takes care of? In my own experimentation, I have not been able to recreate a situation in which a Bash child process I've created is left in a zombie state.
Processes whose parents die are reparented to init, which should eventually reap them when they exit. What causes zombie processes is when the parent process keeps living, but never gets around to reaping the child for some reason.
I'm refactoring a bit of concurrent processing in my Ruby on Rails server (running on Linux) to use Spawn. Spawn::fork_it documentation claims that forked processes can still be waited on after being detached: https://github.com/tra/spawn/blob/master/lib/spawn.rb (line 186):
# detach from child process (parent may still wait for detached process if they wish)
Process.detach(child)
However, the Ruby Process::detach documentation says you should not do this: http://www.ruby-doc.org/core/classes/Process.html
Some operating systems retain the status of terminated child processes until the parent collects that status (normally using some variant of wait(). If the parent never collects this status, the child stays around as a zombie process. Process::detach prevents this by setting up a separate Ruby thread whose sole job is to reap the status of the process pid when it terminates. Use detach only when you do not intent to explicitly wait for the child to terminate.
Yet Spawn::wait effectively allows you to do just that by wrapping Process::wait. On a side note, I specifically want to use the Process::waitpid2 method to wait on the child processes, instead of using the Spawn::wait method.
Will detach-and-wait not work correctly on Linux? I'm concerned that this may cause a race condition between the detached reaper thread and the waiting parent process, as to who collects the child status first.
The answer to this question is there in the documentation. Are you writing code for your own use in a controlled environment? Or to be used widely by third parties? Ruby is written to be widely used by third parties, so their recommendation is to not do something that could fail on "some operating systems". Perhaps the Spawn library is designed primarily for use on Linux machines and tested only on a small subset thereof where this tactic works.
If you're distributing the code you're writing to be used by anyone and everyone, I would take Ruby's approach.
If you control the environment where this code will be run, I would write two tests:
A test that spawns a process, detaches it and then waits for it.
A test that spawns a process and then just waits for it.
Count the failure rate for both and if they are equal (within a margin that you feel is acceptable), go for it!
I've just wrote a program that forks one process. The child process just displays "HI" 200 times. The father process just says he's the father.
I've printed out both pids.
When I run my program multiple times, I see that the parent's pid stays the same, which is normal. What I don't understand is why the child's pid keeps getting incremented by 2, and exactly 2.
My question: Is this the standard method of pid generation in Ubuntu? Incrementing by 2?
PIDs happen to be handed out monotonically increasing in Linux 2.6, but why does it matter which you get? Don't rely on any specific behavior. If there is a skip of +2 it might simply be because another process happened to spawn a child. Or because +1 would have reached a PID that is already in use.
Found a reference here saying that vfork() consumes a pid as a byproduct of its operation. As well, in some cases, if you're forking from a shell script, the fork might spawn a new shell before your actual script gets involved, which would also consume a pid.
I'd suggest suspending your program between a couple forks, and see if there's another process occupying those "missing" pids.
I would like to run arbitrary console-based sub-processes and manage them from a single master process. The console based sub-processes communicate via stdin, stdout and stderr, and if you run them in a genuine console they terminate cleanly when you press CTRL+C. Some of them may in fact be a tree of processes, such as a batch script that runs an executable which may in turn run another executable to do some work. I would like to redirect their standard I/O (for example, so that I can show their output in a GUI window) and in certain circumstances to send them a CTRL+C event so that they will give up and terminate cleanly.
The following two diagrams show first the normal structure - one master process has four worker sub-processes, and some of those workers have their own subprocesses; and then what should happen when one of the workers needs to be stopped - it and all of its children should get the CTRL+C event, but no other processes should receive the CTRL+C event.
(source: livejournal.com)
Additionally, I would much prefer that there are no extra windows visible to the user.
Here's what I've tried (note that I'm working in Python, but solutions for C would still be helpful):
Spawning an extra intermediate process with CREATE_NEW_CONSOLE, and then having it spawn the worker process. Then have it call GenerateConsoleCtrlEvent(CTRL_C_EVENT, 0) when we want to kill the worker. Unfortunately, CREATE_NEW_CONSOLE seems to prevent me from redirecting the standard I/O channels, so I'm left with no easy way to get the output back to the main program.
Spawning an extra intermediate process with CREATE_NEW_PROCESS_GROUP, and then having it spawn the worker process. Then have it call GenerateConsoleCtrlEvent(CTRL_C_EVENT, 0) when we want to kill the worker. Somehow, this manages to send the CTRL+C only to the master process, which is completely useless. On closer inspection, GenerateConsoleCtrlEvent says that CTRL+C cannot be sent to process groups.
Spawning the subprocess with CREATE_NEW_PROCESS_GROUP. Then call GenerateConsoleCtrlEvent(CTRL_BREAK_EVENT, pid) to kill the worker. This is not ideal, because CTRL+BREAK is less friendly than CTRL+C and will probably result in a messier termination. (E.g. if it's a Python process, no KeyboardInterrupt can be caught and no finally blocks run.)
Is there any good way to do what I want? I can see that I could theoretically build on the first attempt and find some other way to communicate between the processes, but I am worried it will turn out to be extremely awkward. Are there good examples of other programs that achieve the same effect? It seems so simple that it can't be all that uncommon a requirement.
I don't know about managing/redirecting stdin et. al., but for managing the subprocess tree
have you considered using the Windows Job Objects api?
There are several other questions about managing process trees (How do I automatically destroy child processes in Windows? Performing equivalent of “Kill Process Tree” in c++ on windows) and it looks like the cleanest method if you can use it.
Chapter 5 of Windows Via C/C++ by Jeffery Richter has a good discussion on using CreateJobObject and the related APIs.