Best way to handle SIGKILL in Linux kernel - linux-kernel

I'm writing a syscall in Linux 3.0, and while I wait for some event to occur (using a waitqueue), I would like to check for a pending SIGKILL and, if one arrives, have the current task die as soon as possible. As far as I can tell, as soon as I return from the syscall (well, really: as soon as the process is about to enter user mode), the kernel checks for pending signals and, upon seeing the SIGKILL, kills current before it returns to user mode.
Question: Is my above assumption correct about how SIGKILL works? My other option is to see that the fatal SIGKILL is pending, and instead of returning from the syscall, I just perform a do_exit(). I'd like to be as consistent as possible with other Linux use cases...and it appears that simply returning from the syscall is what other code does. I just want to ensure that the above assumption about how SIGKILL kills the task is correct.

Yes. Pending signals are checked on the way back to user mode, after the system call handler has returned.
See e.g. ret_from_sys_call at arch/x86/kernel/entry_64.S.

Related

kill child process - exec.Command

How do you kill child processes?
I have a long running application starting a new process with "exec.Command":
// ...I am a long running application in the background
// Now I am starting a child process that should be killed together with the parent application.
cmd := exec.Command("sh", "-c", execThis)
// create a new process group
// cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
Now if I kill <pid of long running application in the background>, it does not kill the child process. Do you know how to make it do so?
There are quite a few things to be teased apart here.
First, there's what the OS itself does. Then, once we know what the OS is and what it does, there's what the program does.
What the OS does is, obviously, OS-dependent. POSIX-flavored OSes have two kinds of kill though: plain kill and process-group-based kill, or killpg. The killpg variety of this function is the only one that sends a signal to an entire process group; plain kill just sends a signal to a single process.
When a program is run from a controlling terminal, keyboard signals (^C, ^Z, etc.) get sent to the foreground process group of that controlling terminal (see the linked page for a reasonably good description of these, and note that BSD/macOS has ^T and SIGINFO as well). But if the signals are being sent from some other program, rather than from a controlling terminal, it is up to that program whether to call killpg or kill, and what signal(s) to send.
Some signals cannot be caught. This is the case for SIGKILL and SIGSTOP. These signals should not be sent willy-nilly; they should be reserved as a last resort. Instead, programs that want another program to stop should generally send one of SIGINT, SIGTERM, SIGHUP, or (rarely) SIGQUIT. Go tends to tie SIGQUIT to debugging (the runtime on POSIX systems makes ^\ dump the stacks of the various goroutines), so that one is not a good choice. However, the choice is not up to the Go program you write here, which can only try to catch the signal. The choice of what to send is up to the sender.
The "Go way" to catch the signal is to use a goroutine and a channel. The signal.Notify function turns an OS-level signal into an event on the channel. What you do not (and cannot) know is whether the signal reached your process through kill or killpg (though if it came from a controlling terminal interaction, the POSIX-y kernel sent it via the equivalent of killpg). If you want to propagate that signal on your own, simply use the notification event to invoke code that makes an OS-level kill call. When using the os/exec package, use cmd.Process.Signal: note that this invokes the POSIX kill, not its killpg, but you would not want to use killpg here since we're assuming a non-process-group signal in the first place (a pgroup-based signal presumably needs no propagation).
The os and os/exec packages offer no portable way to send a signal to a POSIX process group (which is not surprising, since process groups do not exist on non-POSIX systems). On Unix-like systems, though, syscall.Kill accepts a negative PID, which kill(2) interprets as a process group ID.
On non-POSIX systems, everything is quite different. See the discussion near the front of the os/signal package.

Interrupt a kernel module when a user process terminates/receives a signal?

I am working on a kernel module where I need to be "aware" that a given process has crashed.
Right now my approach is to set up a periodic timer interrupt in the kernel module; on every timer interrupt, I check the task_struct.state and task_struct.exit_state values for that process.
I am wondering if there's a way to set up an interrupt in the kernel module that would go off when the process terminates, or, when the process receives a given signal (e.g., SIGINT or SIGHUP).
Thanks!
EDIT: A catch here is that I can't modify the user application. Or at least, it would be a much tougher sell to the customer if I place additional requirements/constraints on s/w from another vendor...
You could have your module create a character device node and then open that node from your userspace process. It's only about a dozen lines of boilerplate to register a simple cdev in your module. Your cdev's open method will get called when the process opens the device node and the release method will be called when the device node is closed. If a process exits, either intentionally or because of a signal, all open file descriptors are closed by the kernel. So you can be certain that release will be called. This avoids any need to poll the process status and you can avoid modifying any kernel code outside of your module.
You could also set up a watchdog-style system, where your process must write one byte to the device every so often. Have the write method of the cdev reset a timer. If too much time passes without a write and the timer expires, it is assumed the process has somehow failed, even if it hasn't crashed and terminated, for instance because of a programming bug that caused a mutex deadlock or put the process into an infinite loop.
There is a point in the kernel code where signals are delivered to user processes. You could patch that, check the process name, and signal a condition variable if it matches. This would just catch signals, not intentional process exits. IMHO, this is much uglier and you'll need to deal with maintaining a kernel patch. But it's not that hard, there's a single point, I don't recall what function, sorry, where one can insert the necessary code and it will catch all signals.

catch SIGKILL in MacOS driver

I'm currently debugging my daemon, which supposedly dies due to SIGKILL.
I'd like to catch that signal when it is sent to my process and add a printout reporting that the process received it.
I'm aware that SIGKILL cannot be caught in a process-level signal handler, so I've decided to use a kext.
I've looked in the xnu source code and saw that psignal is the function that passes the signal to the target process. So I tried to patch it with a trampoline, but it only calls another function named psignal_internal, which is static and has probably been inlined or eliminated by compiler optimization.
Perhaps there are other mechanisms that could help catch a SIGKILL event, and maybe provide an option to set a proper callback function in this case?
thanks

how to figure out if process is really running or waiting to run on Linux?

This is a purely academic question, I don't really need to know this information for anything, but I would like to understand the kernel a bit more :-)
According to the kernel documentation at http://www.tldp.org/LDP/tlk/kernel/processes.html, processes in the Linux kernel have the following states:
Running
The process is either running (it is the current process in the
system) or it is ready to run (it is waiting to be assigned to one of
the system's CPUs).
Waiting
The process is waiting for an event or for a resource. Linux
differentiates between two types of waiting process; interruptible and
uninterruptible. Interruptible waiting processes can be interrupted by
signals whereas uninterruptible waiting processes are waiting directly
on hardware conditions and cannot be interrupted under any
circumstances.
Stopped
The process has been stopped, usually by receiving a signal. A process
that is being debugged can be in a stopped state.
Zombie
This is a halted process which, for some reason, still has a
task_struct data structure in the task vector. It is what it sounds
like, a dead process.
As you can see, when I take a snapshot of process states using a command like ps and a process shows up as Running, it was either literally running or just waiting to be assigned to a CPU by the kernel.
In my opinion, these two states (which are represented by a single state in task_struct) are quite different.
Why is there no state like "Ready" that would mean the process is ready to run but has not yet been assigned to a CPU, so that task_struct would be clearer about the real state? Is it even possible to retrieve this information, or is it for some reason hidden which process is literally running on a CPU?
The struct task_struct contains a long to represent current state:
volatile long state; /* -1 unrunnable, 0 runnable, >0 stopped */
This simply indicates if a process is 'runnable'.
To see the currently executing process you should look at the runqueue. Specifically a struct rq (as defined in kernel/sched/sched.h) contains:
struct task_struct *curr, *idle, *stop;
The pointer *curr is the currently running process on this runqueue (there exists a runqueue per CPU).
You should consult files under kernel/sched/ to see how the Kernel determines which processes should be scheduled according to the different scheduling algorithms if you are interested in exactly how it arrives at the running state.
This is not a Linux-kernel-specific answer but a more general one about scheduling ^^
A core part of any OS is the scheduler: http://en.wikipedia.org/wiki/Process_scheduler
Many schedulers work by giving every process a time slice of execution and letting each of them do a little bit of work before switching (referred to as a context switch) to another process.
Since the length of a time slice is on the order of milliseconds, the state has almost certainly changed by the time the information you requested is displayed, so differentiating between "really running" and "ready but not really running" would give inaccurate information most of the time.

linux kernel check if process is still running

I'm working in kernel space and I want to find out when an application has stopped or crashed.
When I receive an ioctl call, I can get the struct task_struct where I have a lot of information regarding the process of the application.
My problem is that I want to periodically check if the process is still alive or better yet, to have some asynchronous call when the process is killed.
My test environment was on QEMU and after a while in the application I've run a system("kill -9 pid"). Meanwhile in the kernel I've had a periodical check on task_struct with:
volatile long state; /* -1 unrunnable, 0 runnable, >0 stopped */
static inline int pid_alive(struct task_struct *p)
The problem is that the task_struct behind my pointer seems to be unmodified. Normally I would say that each process has a task_struct and that of course it corresponds to the process state. Otherwise I don't see the point of "volatile long state".
What am I missing? Is it because I'm testing on QEMU, or because I'm checking the task_struct in a while(1) loop with an msleep of 100? Any help would be appreciated.
I would be partially happy if I could receive the pid of the application when the app is closing the file descriptor of the module ("/dev/driver").
Thanks!
You cannot hive off the task_struct pointer and refer to it later. If the process has been killed, the pointer is no longer valid - that task_struct is gone. You also should not be using PID values within the kernel to refer to processes. PID values are re-used, so you might not even be talking about the same process.
Your driver can supply a .release callback, which will be called when your driver file is closed, including if the process is terminated or killed. You can access current from this callback. Note that if a process opens your file and then forks, the process calling .release could well be different from the process that called .open. Your driver must be able to handle this.
It has been a long time since I mucked around inside the kernel. It seems to me if your process actually dies, then your best bet would be to put hooks into the code that tears down processes. If it doesn't die but gets caught in a non-responsive loop, you'd probably be better off causing an application level core dump.
A solution that worked beautifully in my operating systems homework is to use a kprobe to detect when do_exit is called. What's beautiful is that do_exit will always be called, no matter how the process is closed. I think even in the case of a kernel oops this one will still be called.
You should also hook into _do_fork, just in case.
Oh, and look at the .release callback mentioned in the other answer (do note that dup2 and fork will cause unexpected behavior -- you will only be notified when the last of the copies created by these two is closed).
