I'm working in kernel space and I want to find out when an application has stopped or crashed.
When I receive an ioctl call, I can get the struct task_struct where I have a lot of information regarding the process of the application.
My problem is that I want to periodically check if the process is still alive or better yet, to have some asynchronous call when the process is killed.
My test environment was on QEMU and after a while in the application I've run a system("kill -9 pid"). Meanwhile in the kernel I've had a periodical check on task_struct with:
volatile long state; /* -1 unrunnable, 0 runnable, >0 stopped */
static inline int pid_alive(struct task_struct *p)
The problem is that my task_struct pointer seems to be unmodified. Normally I would say that each process has a task_struct and of course it is corespondent with the process state. Otherwise I don't see the point of "volatile long state"
What am I missing? Is it that I'm testing on QEMU, it is that I've tested checking the task_struct in a while(1) with an msleep of 100? Any help would be appreciated.
I would be partially happy if I could receive the pid of the application when the app is closing the file descriptor of the module ("/dev/driver").
Thanks!
You cannot hive off the task_struct pointer and refer to it later. If the process has been killed, the pointer is no longer valid - that task_struct is gone. You also should not be using PID values within the kernel to refer to processes. PID values are re-used, so you might not even be talking about the same process.
Your driver can supply a .release callback, which will be called when your driver file is closed, including if the process is terminated or killed. You can access current from this callback. Note that if a process opens your file and then forks, the process calling .release could well be different from the process that called .open. Your driver must be able to handle this.
It has been a long time since I mucked around inside the kernel. It seems to me if your process actually dies, then your best bet would be to put hooks into the code that tears down processes. If it doesn't die but gets caught in a non-responsive loop, you'd probably be better off causing an application level core dump.
A solution that worked beautifully in my operating systems homework is to use a kprobe to detect when do_exit is called. What's beautiful is that do_exit will always be called, no matter how the process is closed. I think even in the case of a kernel oops this one will still be called.
You should also hook into _do_fork, just in case.
Oh, and look at the .release callback mentioned in the other answer (do note that dup2 and fork will cause unexpected behavior -- you will only be notified when the last of the copies created by these two is closed).
Related
I am working on a kernel module where I need to be "aware" that a given process has crashed.
Right now my approach is to set up a periodic timer interrupt in the kernel module; on every timer interrupt, I check the task_struct.state and task_struct.exitstate values for that process.
I am wondering if there's a way to set up an interrupt in the kernel module that would go off when the process terminates, or, when the process receives a given signal (e.g., SIGINT or SIGHUP).
Thanks!
EDIT: A catch here is that I can't modify the user application. Or at least, it would be a much tougher sell to the customer if I place additional requirements/constraints on s/w from another vendor...
You could have your module create a character device node and then open that node from your userspace process. It's only about a dozen lines of boilerplate to register a simple cdev in your module. Your cdev's open method will get called when the process opens the device node and the release method will be called when the device node is closed. If a process exits, either intentionally or because of a signal, all open file descriptors are closed by the kernel. So you can be certain that release will be called. This avoids any need to poll the process status and you can avoid modifying any kernel code outside of your module.
You could also setup a watchdog style system, where your process must write one byte to the device every so often. Have the write method of the cdev reset a timer. If too much time passes without a write and the timer expires, it is assumed the process has somehow failed, even if it hasn't crashed and terminated. For instance a programming bug that allowed for a mutex deadlock or placed the process into an infinite loop.
There is a point in the kernel code where signals are delivered to user processes. You could patch that, check the process name, and signal a condition variable if it matches. This would just catch signals, not intentional process exits. IMHO, this is much uglier and you'll need to deal with maintaining a kernel patch. But it's not that hard, there's a single point, I don't recall what function, sorry, where one can insert the necessary code and it will catch all signals.
This is purely academic question, I don't really need to know this information for anything, but I would like to understand kernel a bit more :-)
According to kernel documentation http://www.tldp.org/LDP/tlk/kernel/processes.html processes in linux kernel have following states:
Running
The process is either running (it is the current process in the
system) or it is ready to run (it is waiting to be assigned to one of
the system's CPUs).
Waiting
The process is waiting for an event or for a resource. Linux
differentiates between two types of waiting process; interruptible and
uninterruptible. Interruptible waiting processes can be interrupted by
signals whereas uninterruptible waiting processes are waiting directly
on hardware conditions and cannot be interrupted under any
circumstances.
Stopped
The process has been stopped, usually by receiving a signal. A process
that is being debugged can be in a stopped state.
Zombie
This is a halted process which, for some reason, still has a
task_struct data structure in the task vector. It is what it sounds
like, a dead process.
As you can see, when I take a snapshot of processes state, using command like ps, I can see, if it's in Running state, that process either was literally Running or just waiting to be assigned to some CPU by kernel.
In my opinion, these 2 states (that are actually both represented by 1 state in task_struct) are quite different.
Why there is no state like "Ready" that would mean the process is "ready to run" but wasn't assigned to any CPU so far, so that the task_struct would be more clear about the real state? Is it even possible to retrieve this information, or is it secret for whatever reason which process is "literally running" on the CPU?
The struct task_struct contains a long to represent current state:
volatile long state; /* -1 unrunnable, 0 runnable, >0 stopped */
This simply indicates if a process is 'runnable'.
To see the currently executing process you should look at the runqueue. Specifically a struct rq (as defined in kernel/sched/sched.h) contains:
struct task_struct *curr, *idle, *stop;
The pointer *curr is the currently running process on this runqueue (there exists a runqueue per CPU).
You should consult files under kernel/sched/ to see how the Kernel determines which processes should be scheduled according to the different scheduling algorithms if you are interested in exactly how it arrives at the running state.
This is not a linux-kernel answer but a more general about scheduling ^^
A core part of any OS is the Scheduler: http://en.wikipedia.org/wiki/Process_scheduler
Many of them work giving every process a time slice of execution and letting each of them do a little bit of work before switching (referred as a context switch) to another process.
Since the length of a time slice is in the order of milliseconds by the time the information you requested is shown, the state has surely changed so differentiate between "Really Running" and "Ready-but-not-really-running" could result (most of the time) in inaccurate informations.
I'm writing a syscall in Linux 3.0, and while I wait for some event to occur (using a waitqueue), I would like to check for a pending SIGKILL and if one occurs, I would like for the current task to die as soon as possible. As far as I can tell, as soon as I return from the syscall (well, really: as soon as the process is to enter into user mode) returns, the kernel checks for pending signals and upon seeing the SIGKILL, the kernel will kill current before it returns to user mode.
Question: Is my above assumption correct about how SIGKILL works? My other option is to see that the fatal SIGKILL is pending, and instead of returning from the syscall, I just perform a do_exit(). I'd like to be as consistent as possible with other Linux use cases...and it appears that simply returning from the syscall is what other code does. I just want to ensure that the above assumption about how SIGKILL kills the task is correct.
Signal checking happens after system call exit, yes.
See e.g. ret_from_sys_call at arch/x86/kernel/entry_64.S.
I want to write a program, which will launch a child process. The child process may be windows mode or console mode program.
I want to monitor the child process status and resource usage. e.g. I want to know the child process is still running or terminated. If it terminated, I want to know the reason (is terminated normally or because of crash?).
And during the child process running and/or it terminated, I want to know its resource usage, especially CPU time (user time, system) and memory usage (virtual size and/or rss). It is OK if the numbers are not very accurate.
In Unix terminology, I want to fork, exec, waitpid and getrusage . And fork+setrusage+exec can limit child's resource usage. But I don't know how to do these on the Windows platform.
Please point me the Windows API name. I could study the rest myself.
Prefer not using library other than the Windows API. Prefer it is not parent working as debugger and attaching to child process. Just not prefer, but still acceptable.
When you call CreateProcess, it returns a handle to the process.
WaitForSingleObject on a process handle will block until the process has exited or time-out has expired. A timeout of zero will return immediately and indicate if the process is still running.
BOOL IsProcessRunning(HANDLE process)
{
return WaitForSingleObject(process, 0) != WAIT_OBJECT_0;
}
void WaitForProcessToExit(HANDLE process)
{
WaitForSingleObject(process, INFINITE);
}
To get the exit code of a running process, you can use GetExitCodeProcess. You'll need to interpret what the error code means, however. 0xC0000005 is typical for an access violation, but not all crashes result in this error code.
For resource usage, you can call GetProcessTimes to get total CPU time, GetGuiResources to get GDI handle info, GetProcessMemoryInfo to get memory stats, and GetProcessIoCounters to get IO info.
Do we have any sort of relationship between fork() and CreateThread? Is there anything that
CreateThread internally calls fork()?
In NT, the fundamental working unit is called a thread (ie NT schedules threads, not processes.). User threads run in the context of a process. When you call CreateThread, you request the NT kernel to allocate a working unit within the context of your process (you also have fibres that are basically threads you can schedule yourself but that's beyond the topic of your question).
When you call CreateThread you provide the function with an entry point that is going to be run after the function is called. The code must be within the virtual space of the process and the page must have execution rights. Put simply, you give a function pointer. ;)
fork() is an UNIX function that requests the kernel to create copy of the running process. The parent process gets the pid of the child process and the child process gets 0 (this way you know who you are).
If you wish to create a process in Windows, you call the CreateProcess function, but that doesn't behave like fork(). The reason being that most of the time you will create threads, not processes.
As you can see, there is no relation between CreateThread and fork.
fork() only exists on Unix systems and it creates a new process with the same state as the caller. CreateThread() creates a new thread in the same process.
The Windows and Unix process model is fundamentally very different, so there is no way of directly mapping the API from one on top of the other.
fork() clones the current process into two. In the parent process, fork() returns the pid, and in the child it returns 0. This is typically used like this:
int pid;
if (pid = fork()) {
// this code is executed in the parent
} else {
// this code is executed in the child
}
Cygwin is an emulation layer for building and running Unix applications on Windows which emulates the behavior of fork() using CreateProcess().
CreateThread - is for threads, fork - is for creating duplicate process. And there is no native way to have fork functionality for windows (at least through Win32 ).
You might want to know Microsoft provides fork() in high-end versions of Windows with component called Subsystem for UNIX-based Applications (SUA). You can find details in my answer here.
Found this link which i believe could be helpful in clearing few facts regarding forking/threading.
Sharing over here: http://www.geekride.com/index.php/2010/01/fork-forking-vs-threading-thread-linux-kernel/