Modifying exit.c system call code - linux-kernel

Hello, I need a little help here. After many hours of study and research I gave up; I couldn't do it. I'm new to kernel programming and I have this task: modify the exit() system call so that it terminates all children of the calling process and then terminates the process itself.
As far as I know, the exit() system call reparents the children to the init process after the parent terminates. I thought I could terminate each child by using its PID and calling:
kill (child_pid, SIGTERM);
I also know that we can access the calling process's task_struct through the current global variable.
Does anyone know how I can get all the children's PIDs from current? Is there any other solution?
UPDATE:
I found a way to traverse the children of the current process. Here is my modified code.
void do_exit(long code)
{
        struct task_struct *tsk = current;

        /* code added by me */
        int nice = current->static_prio - 120;

        if (tsk->myFlag == 1 && nice > 10) {
                struct task_struct *task;
                struct list_head *list;

                list_for_each(list, &current->children) {
                        task = list_entry(list, struct task_struct, sibling);
                        /* kill child */
                        kill(task->pid, SIGKILL);
                }
        }
        /* ... rest of the original do_exit() continues here ... */
Will this even work?

SIGTERM is catchable and in particular can be ignored. You want to send SIGKILL instead. You can't just use the kill system call either: once you grab the pointer to the child, you send the signal to that task_struct directly. An example of how to do it is, well, in the implementation of the kill syscall.
Code which has to modify the children list (add an element) is clone. Code which is very likely to traverse the list (and does in your kernel version) is the wait* family, e.g. waitid.
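For illustration, here is a minimal sketch (not the actual kill() implementation) of how kernel code could deliver SIGKILL to each direct child through its task_struct pointer. The helper name kill_direct_children is mine; send_sig() and tasklist_lock are real kernel symbols, but the exact headers they live in vary between kernel versions.

#include <linux/sched.h>
#include <linux/sched/signal.h>   /* send_sig(); on older kernels plain <linux/sched.h> */
#include <linux/sched/task.h>     /* tasklist_lock */
#include <linux/list.h>

static void kill_direct_children(struct task_struct *parent)
{
        struct task_struct *child;

        read_lock(&tasklist_lock);      /* keep the children list stable while we walk it */
        list_for_each_entry(child, &parent->children, sibling) {
                /* priv=1: the signal comes from the kernel itself, so skip permission checks */
                send_sig(SIGKILL, child, 1);
        }
        read_unlock(&tasklist_lock);
}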

Related

Is there a race between starting and seeing yourself in WinApi's EnumProcesses()?

I just found this code in the wild:
def _scan_for_self(self):
    win32api.Sleep(2000)  # sleep to give time for process to be seen in system table.
    basename = self.cmdline.split()[0]
    pids = win32process.EnumProcesses()
    if not pids:
        UserLog.warn("WindowsProcess", "no pids", pids)
    for pid in pids:
        try:
            handle = win32api.OpenProcess(
                win32con.PROCESS_QUERY_INFORMATION | win32con.PROCESS_VM_READ,
                pywintypes.FALSE, pid)
        except pywintypes.error, err:
            UserLog.warn("WindowsProcess", str(err))
            continue
        try:
            modlist = win32process.EnumProcessModules(handle)
        except pywintypes.error, err:
            UserLog.warn("WindowsProcess", str(err))
            continue
This line caught my eye:
win32api.Sleep(2000) # sleep to give time for process to be seen in system table.
It suggests that if you call EnumProcesses() too fast after starting, you won't see yourself. Is there any truth to this?
There is a race, but it's not the race the code tried to protect against.
A successful call to CreateProcess returns only after the kernel object representing the process has been created and enqueued into the kernel's process list. A subsequent call to EnumProcesses accesses the same list, and will immediately observe the newly created process object.
That is, unless the process object has since been destroyed. This isn't entirely unusual since processes in Windows are initialized in-process. The documentation even makes note of that:
Note that the function returns before the process has finished initialization. If a required DLL cannot be located or fails to initialize, the process is terminated.
What this means is that if a call to EnumProcesses immediately following a successful call to CreateProcess doesn't observe the newly created process, it does so because it was late rather than early. If you are late already then adding a delay will only make you more late.
Which swiftly leads to the actual race here: Process IDs uniquely identify processes only for a finite time interval. Once a process object is gone, its ID is up for grabs, and the system will reuse it at some point. The only reliable way to identify a process is by holding a handle to it.
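To make that concrete, here is a minimal C sketch of the reliable pattern: keep the handle that CreateProcess returns and use it directly instead of rediscovering the process by PID later. The child command line is a placeholder.

#include <windows.h>
#include <stdio.h>

int main(void)
{
    STARTUPINFOA si = { sizeof(si) };
    PROCESS_INFORMATION pi;
    char cmdline[] = "notepad.exe";   /* placeholder child command */

    if (!CreateProcessA(NULL, cmdline, NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi)) {
        fprintf(stderr, "CreateProcess failed: %lu\n", GetLastError());
        return 1;
    }

    /* The handle stays valid even if the PID is later reused, so queries
     * against it (waiting, exit code, termination) cannot hit the wrong process. */
    WaitForSingleObject(pi.hProcess, INFINITE);

    DWORD exit_code = 0;
    GetExitCodeProcess(pi.hProcess, &exit_code);
    printf("child exited with %lu\n", exit_code);

    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return 0;
}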
Now it's anyone's guess what the author of _scan_for_self was trying to accomplish. As written, the code takes more time to do something that's probably altogether wrong [1] anyway.
[1] Turns out my gut feeling was correct. This is just your average POSIX developer who, in the process of learning that POSIX is insufficient, would rather call out Microsoft than actually use an all-around superior API.
The documentation for EnumProcesses (Win32 API - EnumProcesses function) does not mention anything about a delay needed to see the current process in the list it returns.
Microsoft's example of how to use EnumProcesses to enumerate all running processes (Enumerating All Processes) also does not contain any delay before calling EnumProcesses.
A small test application I created in C++ (see below) always reports that the current process is in the list (tested on Windows 10):
#include <Windows.h>
#include <Psapi.h>
#include <iostream>
#include <vector>

const DWORD MAX_NUM_PROCESSES = 4096;
DWORD aProcesses[MAX_NUM_PROCESSES];

int main(void)
{
    // Get the list of running process Ids:
    DWORD cbNeeded;
    if (!EnumProcesses(aProcesses, MAX_NUM_PROCESSES * sizeof(DWORD), &cbNeeded))
    {
        return 1;
    }

    // Check if the current process is in the list:
    DWORD curProcId = GetCurrentProcessId();
    bool bFoundCurProcId{ false };
    DWORD numProcesses = cbNeeded / sizeof(DWORD);
    for (DWORD i = 0; i < numProcesses; ++i)
    {
        if (aProcesses[i] == curProcId)
        {
            bFoundCurProcId = true;
        }
    }
    std::cout << "bFoundCurProcId: " << bFoundCurProcId << std::endl;
    return 0;
}
Note: I am aware that the fact that the program reported the expected result does not mean that there is no race. Maybe I just couldn't catch it manifest. But trying to run code like that can give you a hint sometimes (especially if the result would have been that there is a race).
The fact that I never had a problem running this test (I did it many times), together with the lack of any mention of the need for a delay in Microsoft's documentation, makes me believe that it is not required.
My conclusion is that either:
There is an issue unique to calling it from Python (I doubt it),
or:
The code you found is doing something unnecessary.
There is no race.
EnumProcesses calls an NT API function that switches to kernel mode to walk the linked list of processes. Your own process has been added to the list before it starts running.

Ruby - fork, exec, detach .... do we have a race condition here?

Simple example, which doesn't work on my platform (Ruby 2.2, Cygwin):
#!/usr/bin/ruby
backtt = fork { exec('mintty','/usr/bin/zsh','-i') }
Process.detach(backtt)
exit
This tiny program (when started from the shell) is supposed to spawn a terminal window (mintty) and then get me back to the shell prompt.
However, while it DOES create the mintty window, I don't have a shell prompt afterwards, and I can't type anything in the calling shell.
But when I introduce a small delay before the detach, either using 'sleep', or by printing something on stdout, it works as expected:
#!/usr/bin/ruby
backtt = fork { exec('mintty','/usr/bin/zsh','-i') }
sleep 1
Process.detach(backtt)
exit
Why is this necessary?
BTW, I'm well aware that I could (from the shell) do a
mintty /usr/bin/zsh -i &
directly, or I could use system(...... &) from inside Ruby, but this is not the point here. I'm particularly interested in the fork/exec/detach behaviour in Ruby. Any insights?
Posting as an answer, because it is too long for a comment
Although I am no specialist in Ruby, and do not know Cygwin at all, this situation sounds very familiar to me, coming from C/C++.
Your script is too short, so the parent of the parent completes while the grandchild is still trying to start.
What would happen if you put the sleep after detach and before exit?
If my theory is correct, it should work too. Your program exits before any (or enough) thread switching happens.
I call such problems "interrupted handshaking". Although this is psychology terminology, it describes what happens.
Sleep "gives up the time slice", leading to thread switching.
Console output (any file I/O) runs into semaphores, also leading to thread switching.
If my idea is correct, it should also work if you don't sleep but just count to 1e9 (depending on the speed of computation), because preemptive multitasking will then force a thread switch even though you never voluntarily give up the CPU.
So it is a programming error (IMHO, whether to call it a race condition is philosophical in this case), but it will be hard to find out "who" is responsible. There are many things involved.
According to the documentation:
Process::detach prevents this by setting up a separate Ruby thread whose sole job is to reap the status of the process pid when it terminates.
NB: I can't reproduce this behaviour on any of the operating systems available to me, and I'm posting this as an answer just for the sake of formatting.
Since Process.detach(backtt) transparently creates a thread, I would suggest you try:
#!/usr/bin/ruby
backtt = fork { exec('mintty','/usr/bin/zsh','-i') }
# ⇓⇓⇓⇓⇓
Process.detach(backtt).join
exit
This is not a hack by any means (as opposed to the silly sleep), since you presumably know that the underlying command returns more or less immediately. I am not a Cygwin guru, but it might have some specific issues with threads, so let that detach thread be joined before you exit.
I'm neither a Ruby nor a Cygwin guy, so what I propose here may not work at all. Anyway: I guess you're not even hitting a Ruby- or Cygwin-specific bug here. I hit the same issue in a program called "start" that I wrote in C many years ago. Here is a comment from the top of its function void daemonize_now():
/*
* This is a little bit trickier than I expected: If we simply call
* setsid(), it may fail! We have to fork() and exit(), and let our
* child call setsid().
*
* Now the problem: If we fork() and exit() immediately, our child
* will be killed before it has ever had a chance to run. So we need to sleep a
* little bit. Now the question: How long? I don't know an answer. So
* let us be killed by our child :-)
*/
So, the strategy is this: let the parent wait on its child (that can be done immediately, before the child has actually had a chance to do anything), and then let the child do the detaching part. How? Let it create a new session (it will be reparented to the init process); that is what the setsid() call mentioned in the comment is for. It works something like this (C syntax; you should be able to look up the correct usage for Ruby and apply the needed changes yourself):
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

pid_t parentspid = getpid();
pid_t Fork = fork();

if (Fork == -1) {               /* fork() failed */
        /* handle error */
} else if (Fork > 0) {          /* parent, Fork is the pid of the child */
        int tmp;
        waitpid(Fork, &tmp, 0); /* blocks until the child signals us or dies */
} else {                        /* child */
        if (setsid() == -1) {
                /* handle error - possibly by doing nothing
                 * and just letting the parent wait ... */
        } else {
                kill(parentspid, SIGUSR1);  /* tell the parent we are detached */
        }
        exec(...);              /* replace with the real exec*() call, e.g. execvp() */
}
You can use any signal that terminates the process (e.g. SIGKILL). I used SIGUSR1 and installed a signal handler that exit(0)s the parent process, so the caller gets a success status. Only caveat: you get a success even if the exec fails. However, that problem can't really be worked around, since after a successful exec you can't signal your parent anymore, and since you don't know when the exec will have failed (if it fails at all), you're back at the race condition part.
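For completeness, here is a sketch of the parent-side handler described above (the function names are mine): install it in the parent before the fork, so that the child's SIGUSR1 terminates the parent with a success status.

#include <signal.h>
#include <unistd.h>

/* The child sends SIGUSR1 once setsid() has succeeded; the parent reports success. */
static void on_child_ready(int signo)
{
        (void)signo;
        _exit(0);               /* _exit() is async-signal-safe, exit() is not */
}

static void install_ready_handler(void)
{
        struct sigaction sa;

        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;
        sa.sa_handler = on_child_ready;
        sigaction(SIGUSR1, &sa, NULL);  /* call this in the parent before fork() */
}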

What is the relation between `task_struct` and `pid_namespace`?

I'm studying some kernel code and trying to understand how the data structures are linked together. I know the basic idea of how a scheduler works, and what a PID is. Yet I have no idea what a namespace is in this context, and can't figure out how all of those work together.
I have read some explanations (including parts of O'Reilly's "Understanding the Linux Kernel") and understand that the same PID can end up attached to two processes because one has terminated and the ID was reallocated. But I can't figure out how all of this is done.
So:
What is a namespace in this context?
What is the relation between task_struct and pid_namespace? (I already figured it has to do with pid_t, but don't know how)
Some references:
Definition of pid_namespace
Definition of task_struct
Definition of upid (see also pid just beneath it)
Perhaps these links might help:
PID namespaces in operation
A brief introduction to PID namespaces (this one comes from a sysadmin)
After going through the second link it becomes clear that namespaces are a great way to isolate resources. And in any OS, Linux included, processes are one of the most crucial resources there are. In the author's own words:
Yes, that's it, with this namespace it is possible to restart PID
numbering and get your own "1" process. This could be seen as a
"chroot" in the process identifier tree. It's extremely handy when you
need to deal with pids in day to day work and are stuck with 4-digit
numbers...
So you sort of create your own private process tree and then assign it to a specific user and/or a specific task. Within this tree, the processes need not worry about PIDs conflicting with those outside this 'container'. Hence it is as good as handing the tree over to a different 'root' user altogether. That fine fellow has done a wonderful job of explaining things, with a nice little example to top it off, so I won't repeat it here.
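As a concrete illustration of that 'private process tree', here is a minimal userspace sketch (my own, requiring root or CAP_SYS_ADMIN): a child created with CLONE_NEWPID sees itself as PID 1, while the parent sees it under a normal, larger PID.

#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

static int child_fn(void *arg)
{
        (void)arg;
        /* Inside the new namespace this prints 1. */
        printf("child sees itself as PID %ld\n", (long)getpid());
        return 0;
}

int main(void)
{
        char *stack = malloc(STACK_SIZE);
        pid_t pid;

        if (!stack)
                return 1;
        /* CLONE_NEWPID requires privilege (root or CAP_SYS_ADMIN). */
        pid = clone(child_fn, stack + STACK_SIZE, CLONE_NEWPID | SIGCHLD, NULL);
        if (pid == -1) {
                perror("clone");
                return 1;
        }
        printf("parent sees the same child as PID %ld\n", (long)pid);
        waitpid(pid, NULL, 0);
        free(stack);
        return 0;
}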
As far as the kernel is concerned, I can give you a few pointers to get you started. I am not an expert here but I hope this should help you to some extent.
This LWN article describes the older and the newer way of looking at PIDs. In its own words:
All the PIDs that a task may have are described in the struct pid. This structure contains the ID value, the list of tasks having this
ID, the reference counter and the hashed list node to be stored in the
hash table for a faster search. A few more words about the lists of
tasks. Basically a task has three PIDs: the process ID (PID), the
process group ID (PGID), and the session ID (SID). The PGID and the
SID may be shared between the tasks, for example, when two or more
tasks belong to the same group, so each group ID addresses more than
one task. With the PID namespaces this structure becomes elastic. Now,
each PID may have several values, with each one being valid in one
namespace. That is, a task may have PID of 1024 in one namespace, and
256 in another. So, the former struct pid changes. Here is how the
struct pid looked like before introducing the PID namespaces:
struct pid {
        atomic_t count;                        /* reference counter */
        int nr;                                /* the pid value */
        struct hlist_node pid_chain;           /* hash chain */
        struct hlist_head tasks[PIDTYPE_MAX];  /* lists of tasks */
        struct rcu_head rcu;                   /* RCU helper */
};
And this is how it looks now:
struct upid {
        int nr;                       /* moved from struct pid */
        struct pid_namespace *ns;     /* the namespace this value is visible in */
        struct hlist_node pid_chain;  /* moved from struct pid */
};

struct pid {
        atomic_t count;
        struct hlist_head tasks[PIDTYPE_MAX];
        struct rcu_head rcu;
        int level;                    /* the number of upids */
        struct upid numbers[0];
};
As you can see, the struct upid now represents the PID value -- it is stored in the hash and has the PID value. To convert the struct pid to the PID or vice versa one may use a set of helpers like
task_pid_nr(), pid_nr_ns(), find_task_by_vpid(), etc.
Though a bit dated, this information is fair enough to get you started. There's one more important structure that needs a mention here: struct nsproxy. This structure is the focal point of all things namespace for the process it is associated with. It contains a pointer to the PID namespace that this process's children will use. The PID namespace of the current process is found using task_active_pid_ns.
Within struct task_struct we have a namespace proxy pointer aptly called nsproxy, which points to this process's struct nsproxy structure. If you trace the steps needed to create a new process, you can find the relationships between task_struct, struct nsproxy and struct pid.
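Here is a minimal module sketch of that chain (a sketch only; the field and helper names are as found in kernels around 3.x, where nsproxy->pid_ns_for_children replaced the older pid_ns field). It prints the current task's PID as seen globally and as seen from its own active namespace.

#include <linux/module.h>
#include <linux/sched.h>
#include <linux/nsproxy.h>
#include <linux/pid_namespace.h>
#include <linux/printk.h>

static int __init ns_demo_init(void)
{
        struct task_struct *tsk = current;
        struct pid_namespace *active = task_active_pid_ns(tsk);

        /* The same task, seen from two namespaces */
        pr_info("global pid: %d, pid in active ns: %d (ns level %u)\n",
                task_pid_nr(tsk), task_pid_vnr(tsk), active->level);

        /* The namespace that children of this task will be created in */
        pr_info("pid_ns_for_children level: %u\n",
                tsk->nsproxy->pid_ns_for_children->level);
        return 0;
}

static void __exit ns_demo_exit(void)
{
}

module_init(ns_demo_init);
module_exit(ns_demo_exit);
MODULE_LICENSE("GPL");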
A new process in Linux is always forked from an existing process, and its image is later replaced using execve (or similar functions from the exec family). Thus, as part of do_fork, copy_process is invoked.
As part of copying the parent process the following important things happen:
task_struct is first duplicated using dup_task_struct.
The parent process's namespaces are also copied using copy_namespaces. This also creates a new nsproxy structure for the child, and its nsproxy pointer points to this newly created structure.
For a non-init process (init being the original global PID 1, the first process spawned on boot), a struct pid is allocated using alloc_pid for the newly forked process. A short snippet from this function:
nr = alloc_pidmap(tmp);
if (nr < 0)
        goto out_free;

pid->numbers[i].nr = nr;
pid->numbers[i].ns = tmp;
This populates the upid structure, giving it a new PID value as well as the namespace to which it belongs.
Further on in copy_process, this newly allocated PID is linked to the corresponding task_struct via the function pid_nr, i.e. its global ID (the PID value as seen from the init namespace) is stored in the pid field of task_struct.
In the final stages of copy_process, a link is established between the task_struct and this new struct pid through the pid_link field within task_struct, using the function attach_pid.
There's a lot more to it, but I hope this at least gives you a head start.
NOTE: I am referring to the latest (as of now) kernel version viz. 3.17.2.

Make a system call to get list of processes

I'm new to module programming and I need to make a system call that retrieves the system's processes and shows how much CPU each of them is consuming.
How can I make this call?
Why would you implement a system call for this? You don't want to add a syscall to the existing Linux API. This is the primary Linux interface to userspace, and nobody touches syscalls except top kernel developers who know what they are doing.
If you want to get a list of processes and their parameters and real-time statuses, use /proc. Every directory that's an integer in there is an existing process ID and contains a bunch of useful dynamic files which ps, top and others use to print their output.
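Since the question also asks about CPU usage, here is a small userspace sketch (plain C, no kernel changes needed; the helper name is mine) that reads the utime and stime fields, fields 14 and 15 of /proc/<pid>/stat, expressed in clock ticks.

#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Print user and system CPU time (in clock ticks) for one PID by reading
 * /proc/<pid>/stat. We scan from the last ')' because the comm field
 * may itself contain spaces. */
static int print_cpu_ticks(int pid)
{
        char path[64], buf[1024];
        unsigned long utime, stime;
        FILE *f;
        char *p;
        int i;

        snprintf(path, sizeof(path), "/proc/%d/stat", pid);
        f = fopen(path, "r");
        if (!f)
                return -1;
        if (!fgets(buf, sizeof(buf), f)) {
                fclose(f);
                return -1;
        }
        fclose(f);

        p = strrchr(buf, ')');
        if (!p)
                return -1;
        p += 2;                         /* skip ") " so p points at field 3 (state) */
        for (i = 0; i < 11 && p; i++) { /* skip fields 3..13 */
                p = strchr(p, ' ');
                if (p)
                        p++;
        }
        if (!p || sscanf(p, "%lu %lu", &utime, &stime) != 2)
                return -1;

        printf("pid %d: utime=%lu stime=%lu (ticks, %ld per second)\n",
               pid, utime, stime, sysconf(_SC_CLK_TCK));
        return 0;
}

int main(void)
{
        return print_cpu_ticks(getpid());
}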
If you want to get a list of processes within the kernel (e.g. within a module), you should know that the processes are kept internally as a doubly linked list that starts with the init process (symbol init_task in the kernel). You should use macros defined in include/linux/sched.h to get processes. Here's an example:
#include <linux/module.h>
#include <linux/printk.h>
#include <linux/sched.h>

static int __init ex_init(void)
{
        struct task_struct *task;

        /* Walk the kernel's task list, starting from init_task. */
        for_each_process(task)
                pr_info("%s [%d]\n", task->comm, task->pid);
        return 0;
}

static void __exit ex_fini(void)
{
}

module_init(ex_init);
module_exit(ex_fini);
MODULE_LICENSE("GPL");
This should be okay to gather information. However, don't change anything in there unless you really know what you're doing (which will require a bit more reading).
There are already syscalls for that: open and read. The information about all processes is kept in the /proc/{pid} directories. You can gather process information by reading the corresponding files.
More is explained here: http://www.tldp.org/LDP/Linux-Filesystem-Hierarchy/html/proc.html

linux kernel check if process is still running

I'm working in kernel space and I want to find out when an application has stopped or crashed.
When I receive an ioctl call, I can get the caller's struct task_struct, which holds a lot of information about the application's process.
My problem is that I want to periodically check whether the process is still alive or, better yet, get some asynchronous callback when the process is killed.
My test environment was on QEMU and after a while in the application I've run a system("kill -9 pid"). Meanwhile in the kernel I've had a periodical check on task_struct with:
volatile long state; /* -1 unrunnable, 0 runnable, >0 stopped */
static inline int pid_alive(struct task_struct *p)
The problem is that my task_struct pointer seems to be unmodified. Normally I would say that each process has a task_struct and that it of course reflects the process state; otherwise I don't see the point of volatile long state.
What am I missing? Is it that I'm testing on QEMU, or that I'm checking the task_struct in a while(1) loop with an msleep of 100? Any help would be appreciated.
I would be partly happy if I could at least receive the PID of the application when it closes the module's file descriptor ("/dev/driver").
Thanks!
You cannot hive off the task_struct pointer and refer to it later. If the process has been killed, the pointer is no longer valid - that task_struct is gone. You also should not be using PID values within the kernel to refer to processes. PID values are re-used, so you might not even be talking about the same process.
Your driver can supply a .release callback, which will be called when your driver file is closed, including if the process is terminated or killed. You can access current from this callback. Note that if a process opens your file and then forks, the process calling .release could well be different from the process that called .open. Your driver must be able to handle this.
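Here is a minimal sketch of that approach (the misc device and all names are mine, not the question's driver): the .release handler runs when the last reference to the open file goes away, whether the process closed it, exited normally, or was killed.

#include <linux/module.h>
#include <linux/fs.h>
#include <linux/miscdevice.h>
#include <linux/sched.h>
#include <linux/printk.h>

static int demo_open(struct inode *inode, struct file *file)
{
        pr_info("demo: opened by pid %d (%s)\n", current->pid, current->comm);
        return 0;
}

static int demo_release(struct inode *inode, struct file *file)
{
        /* Called when the last file reference is dropped: explicit close(),
         * normal exit, or the process being killed. Note that current here
         * may be a child that inherited the fd, not the original opener. */
        pr_info("demo: released by pid %d (%s)\n", current->pid, current->comm);
        return 0;
}

static const struct file_operations demo_fops = {
        .owner   = THIS_MODULE,
        .open    = demo_open,
        .release = demo_release,
};

static struct miscdevice demo_dev = {
        .minor = MISC_DYNAMIC_MINOR,
        .name  = "demo_driver",         /* appears as /dev/demo_driver */
        .fops  = &demo_fops,
};

static int __init demo_init(void)
{
        return misc_register(&demo_dev);
}

static void __exit demo_exit(void)
{
        misc_deregister(&demo_dev);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");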
It has been a long time since I mucked around inside the kernel. It seems to me that if your process actually dies, your best bet is to put hooks into the code that tears down processes. If it doesn't die but gets caught in a non-responsive loop, you'd probably be better off causing an application-level core dump.
A solution that worked beautifully in my operating systems homework is to use a kprobe to detect when do_exit is called. What's beautiful is that do_exit will always be called, no matter how the process is terminated. I think it will still be called even in the case of a kernel oops.
You should also hook into _do_fork, just in case.
Oh, and look at the .release callback mentioned in the other answer (do note that dup2 and fork will cause unexpected behavior -- you will only be notified when the last of the copies created by these two is closed).
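A hedged sketch of that kprobe approach (standard kprobes API; it assumes do_exit can be probed on your kernel and architecture): the pre-handler runs in the context of the exiting task, so you can match the PID you care about there.

#include <linux/module.h>
#include <linux/kprobes.h>
#include <linux/sched.h>
#include <linux/printk.h>

static int do_exit_pre(struct kprobe *p, struct pt_regs *regs)
{
        /* Runs in the context of the exiting task; compare current->pid
         * against the PID you are watching here. */
        pr_info("kprobe: pid %d (%s) is entering do_exit\n",
                current->pid, current->comm);
        return 0;
}

static struct kprobe kp = {
        .symbol_name = "do_exit",
        .pre_handler = do_exit_pre,
};

static int __init kp_init(void)
{
        return register_kprobe(&kp);
}

static void __exit kp_exit(void)
{
        unregister_kprobe(&kp);
}

module_init(kp_init);
module_exit(kp_exit);
MODULE_LICENSE("GPL");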
