What is the relation between `task_struct` and `pid_namespace`? - linux-kernel

I'm studying some kernel code and trying to understand how the data structures are linked together. I know the basic idea of how a scheduler works and what a PID is, yet I have no idea what a namespace is in this context, and I can't figure out how all of these work together.
I have read some explanations (including parts of O'Reilly's "Understanding the Linux Kernel") and understand that the same PID may end up on two processes because one has terminated and the ID got reallocated. But I can't figure out how all this is done.
So:
What is a namespace in this context?
What is the relation between task_struct and pid_namespace? (I have already figured out that it has to do with pid_t, but I don't know how.)
Some references:
Definition of pid_namespace
Definition of task_struct
Definition of upid (see also pid just beneath it)

Perhaps these links might help:
PID namespaces in operation
A brief introduction to PID namespaces (this one comes from a sysadmin)
After going through the second link it becomes clear that namespaces are a great way to isolate resources. And in any OS, Linux included, processes are one of the most crucial resources there are. In his own words:
Yes, that's it, with this namespace it is possible to restart PID numbering and get your own "1" process. This could be seen as a "chroot" in the process identifier tree. It's extremely handy when you need to deal with pids in day to day work and are stuck with 4 digits numbers…
So you sort of create your own private process tree and then assign it to a specific user and/or to a specific task. Within this tree, the processes need not worry about PIDs conflicting with those outside this 'container'. Hence it is as good as handing over this tree to a different 'root' user altogether. That fine fellow has done a wonderful job of explaining things with a nice little example to top it off, so I won't repeat it here.
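To see the "own PID 1" effect from userspace, here is a minimal, untested sketch along the lines of the demo in the linked LWN article (CLONE_NEWPID requires root privileges):
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static char child_stack[1024 * 1024];

/* Runs in a fresh PID namespace, so it sees itself as PID 1. */
static int child(void *arg)
{
    printf("child sees its pid as %d\n", (int)getpid());
    return 0;
}

int main(void)
{
    /* clone() takes the top of the child's stack; stacks grow down. */
    pid_t p = clone(child, child_stack + sizeof(child_stack),
                    CLONE_NEWPID | SIGCHLD, NULL);

    if (p < 0) {
        perror("clone");
        return 1;
    }
    printf("parent sees the child as pid %d\n", (int)p);
    waitpid(p, NULL, 0);
    return 0;
}
The parent prints the child's PID as seen from the parent's namespace, while the child prints 1: the same task, two PID values, one per namespace.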
As far as the kernel is concerned, I can give you a few pointers to get you started. I am not an expert here but I hope this should help you to some extent.
This LWN article describes the older and the newer way of looking at PIDs. In its own words:
All the PIDs that a task may have are described in the struct pid. This structure contains the ID value, the list of tasks having this ID, the reference counter and the hashed list node to be stored in the hash table for a faster search. A few more words about the lists of tasks. Basically a task has three PIDs: the process ID (PID), the process group ID (PGID), and the session ID (SID). The PGID and the SID may be shared between the tasks, for example, when two or more tasks belong to the same group, so each group ID addresses more than one task. With the PID namespaces this structure becomes elastic. Now, each PID may have several values, with each one being valid in one namespace. That is, a task may have PID of 1024 in one namespace, and 256 in another. So, the former struct pid changes. Here is how the struct pid looked like before introducing the PID namespaces:
struct pid {
    atomic_t count;                       /* reference counter */
    int nr;                               /* the pid value */
    struct hlist_node pid_chain;          /* hash chain */
    struct hlist_head tasks[PIDTYPE_MAX]; /* lists of tasks */
    struct rcu_head rcu;                  /* RCU helper */
};
And this is how it looks now:
struct upid {
    int nr;                      /* moved from struct pid */
    struct pid_namespace *ns;    /* the namespace this value is visible in */
    struct hlist_node pid_chain; /* moved from struct pid */
};

struct pid {
    atomic_t count;
    struct hlist_head tasks[PIDTYPE_MAX];
    struct rcu_head rcu;
    int level;                   /* the number of upids */
    struct upid numbers[0];
};
As you can see, the struct upid now represents the PID value -- it is stored in the hash and has the PID value. To convert the struct pid to the PID or vice versa one may use a set of helpers like
task_pid_nr(), pid_nr_ns(), find_task_by_vpid(), etc.
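As a quick illustrative sketch of such helpers in kernel code (task_pid_vnr is a real helper alongside the ones named above; this is not a complete module):
#include <linux/printk.h>
#include <linux/sched.h>

/* Print a task's PID as seen globally and from its own namespace. */
static void show_pids(struct task_struct *task)
{
    pid_t global_nr = task_pid_nr(task);  /* value in the initial namespace */
    pid_t local_nr  = task_pid_vnr(task); /* value in the task's own namespace */

    pr_info("%s: global pid %d, namespace-local pid %d\n",
            task->comm, global_nr, local_nr);
}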
Though a bit dated, this information is fair enough to get you started. There's one more important structure that needs a mention here: struct nsproxy. This structure is the focal point of everything namespace-related for the process it is attached to. It contains a pointer to the PID namespace that this process's children will use. The PID namespace of the current process is found using task_active_pid_ns.
Within struct task_struct, we have a namespace proxy pointer aptly called nsproxy, which points to this process's struct nsproxy structure. If you trace the steps needed to create a new process, you can find the relationship(s) between task_struct, struct nsproxy and struct pid.
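To make those two pointers concrete, a small sketch (field names as in the 3.17-era kernels, where the children's namespace pointer is called pid_ns_for_children):
#include <linux/nsproxy.h>
#include <linux/pid_namespace.h>
#include <linux/printk.h>
#include <linux/sched.h>

/* Illustrative: reach the two PID-namespace pointers discussed above. */
static void show_ns(struct task_struct *task)
{
    /* namespace in which this task's future children will get PIDs */
    struct pid_namespace *child_ns = task->nsproxy->pid_ns_for_children;
    /* namespace in which the task's own PID is active */
    struct pid_namespace *active_ns = task_active_pid_ns(task);

    pr_info("child ns %p, active ns %p\n", child_ns, active_ns);
}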
A new process in Linux is always forked from an existing process, and its image is later replaced using execve (or a similar function from the exec family). Thus, as part of do_fork, copy_process is invoked.
As part of copying the parent process, the following important things happen:
The task_struct is first duplicated using dup_task_struct.
The parent process's namespaces are also copied using copy_namespaces. This creates a new nsproxy structure for the child, and the child's nsproxy pointer is set to this newly created structure.
For any process other than init (the first process spawned on boot, holding the original global PID), a PID structure is allocated using alloc_pid, which allocates a new struct pid for the newly forked process. A short snippet from this function:
nr = alloc_pidmap(tmp);
if (nr < 0)
    goto out_free;

pid->numbers[i].nr = nr;
pid->numbers[i].ns = tmp;
This populates the upid structure by giving it a new PID value as well as the namespace to which it currently belongs.
Further, as part of copy_process, this newly allocated PID is linked to the corresponding task_struct via the function pid_nr, i.e. its global ID (the PID nr as seen from the init namespace) is stored in the pid field of task_struct.
In the final stages of copy_process, a link is established between the task_struct and this new pid structure via the pid_link field within task_struct, using the function attach_pid.
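Putting the pieces together, here is a sketch of how one could walk from a task_struct to its per-namespace PID values on a 3.17-era kernel (the pids[] array layout has changed in later versions, so treat this as version-specific):
#include <linux/pid.h>
#include <linux/printk.h>
#include <linux/sched.h>

/* Walk from a task to each of its per-namespace PID values. */
static void dump_upids(struct task_struct *task)
{
    struct pid *pid = task->pids[PIDTYPE_PID].pid; /* set up by attach_pid() */
    int i;

    for (i = 0; i <= pid->level; i++)
        pr_info("level %d: nr %d, ns %p\n",
                i, pid->numbers[i].nr, pid->numbers[i].ns);
}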
There's a lot more to it, but I hope this at least gives you a head start.
NOTE: I am referring to the latest (as of now) kernel version viz. 3.17.2.

Related

In node* newNode=new node() , whose address is exactly getting returned by new here?

node* newNode = new node();
Here node is a typical linked-list node class, and newNode is the pointer used to dynamically create a new node containing int data and node* next attributes. Please tell me exactly which address gets returned by the new keyword and stored in newNode here.
For instance, in int* p = arr;, specifically the address of arr[0] is stored.
There is a lot happening behind the scenes when you use the new keyword. "Which address gets returned by the new keyword" is not the primary thing to focus on; rather, you need to know how a program (more specifically, a process) deals with memory via the operating system.
What is a process?
A process is more than program code (sometimes referred to as the text section); it also includes the current activity, as represented by the value of the program counter and the contents of the processor's registers.
A process generally includes:
the process stack, which contains temporary data such as function parameters, return addresses and local variables;
the data section, which contains global variables.
A process also includes a heap area (which is important for your question), which is memory that is dynamically allocated during the process's runtime.
How is memory allocated dynamically?
The new operator allocates memory from the heap, creates the object in that memory, and then returns a pointer containing the address of the memory that has been allocated.
new int; // dynamically allocate an integer.
Most often, we’ll assign the return value to our own pointer variable so we can access the allocated memory later.
int *ptr{ new int };
We can then perform indirection through the pointer to access the memory:
*ptr = 7;
Also, some points to mention:
When you request memory with the new keyword, the runtime (with the operating system's help) first searches the heap area (mentioned above) for free memory large enough to fulfill your request; if it is found, it is allocated to your variable(s).
When the process terminates, its memory is returned to the OS so that it can be used by other processes; within a running program, though, you hand dynamically allocated memory back yourself with delete.
Reference: Operating System Concepts by Silberschatz, Galvin and Gagne.

Working of mmap()

I am trying to get an idea of how memory mapping takes place using the system call mmap.
So far I know that mmap takes arguments from the user and returns a logical address where the file is mapped. When the user accesses that address, it is translated through the page tables into a physical address and the requested operation is carried out.
However, I found these articles: a code example and a theoretical explanation.
What they mention is that the memory mapping is carried out as:
A. using the system call mmap()
B. a file operation with the signature (struct file *filp, struct vm_area_struct *vma)
What I am trying to figure out is:
How are the arguments passed to the mmap system call used in the struct vm_area_struct *vma? More generally, how are these two related?
For instance, struct vm_area_struct has members such as the starting address, the ending address, permissions, etc. How are the values sent by the user used to fill these members?
I am trying to write a driver, so: does the kernel fill in the values of the structure's members for us, and I simply use the structure and pass values to remap_pfn_range?
And a more fundamental question: why is a separate file operation needed at all? The fact that mmap returns a virtual address means that it has already established a mapping, doesn't it?
Finally, I am not that clear on how the entire process works in user as well as kernel space. Any documentation explaining the process in detail would be helpful.
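For orientation only (a sketch, not an answer from the original thread): the kernel does fill in the vm_area_struct from the user's mmap() arguments (vm_start/vm_end from the requested address and length, vm_page_prot from the protection flags) before calling the driver's mmap file operation, whose only job is to install the actual mapping, e.g. with remap_pfn_range. buf_phys below is a hypothetical physical address of a buffer the driver owns:
#include <linux/fs.h>
#include <linux/mm.h>

/* Hypothetical: physical address of a buffer this driver manages. */
static unsigned long buf_phys;

static int my_mmap(struct file *filp, struct vm_area_struct *vma)
{
    unsigned long size = vma->vm_end - vma->vm_start; /* length from mmap() */

    /* Map the buffer's pages into the user's requested address range. */
    return remap_pfn_range(vma, vma->vm_start,
                           buf_phys >> PAGE_SHIFT, size,
                           vma->vm_page_prot);
}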

Modifying exit.c system call code

Hello guys, I need a little help here. After many hours of study and research I gave up; I couldn't do it. I'm new to kernel programming and I have this task to do: I am asked to modify the exit() system call code so that it terminates all the child processes of the calling process and then terminates the process itself.
As far as I know, the exit() system call gives the children to the init process after the parent terminates. I thought I could terminate each child by using its PID and calling:
kill(child_pid, SIGTERM);
I also know that we can access the calling process's task_struct using the current global variable.
Does anyone know how I can get all the children's PIDs from the current variable? Is there any other solution you know of?
UPDATE:
I found a way how to traverse the children of current process. Here is my modified code.
void do_exit(long code)
{
    struct task_struct *tsk = current;

    // code added by me
    int nice = current->static_prio - 120;

    if (tsk->myFlag == 1 && nice > 10) {
        struct task_struct *task;
        struct list_head *list;

        list_for_each(list, &current->children) {
            task = list_entry(list, struct task_struct, sibling);
            // kill child
            kill(task->pid, SIGKILL);
        }
    }
    /* ... rest of the original do_exit() body ... */
Will this even work?
SIGTERM is catchable and, in particular, can be ignored. You want to send SIGKILL instead. You can't just use the kill system call either; instead, once you grab the pointer to the child, you send the signal to that. An example of how to do it is, well, in the implementation of the kill syscall.
An example of code which has to modify the children list (add an element) would be clone. An example of code which very likely traverses the list (and likely does in your version) would be the wait* family, e.g. waitid.
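An untested sketch of that advice (send_sig is an in-kernel way to deliver a signal to a specific task; a real version would also need to hold tasklist_lock or RCU while walking the children list):
#include <linux/sched.h>
#include <linux/signal.h>

/* Deliver an uncatchable SIGKILL to every direct child of current. */
static void kill_children(void)
{
    struct task_struct *task;

    list_for_each_entry(task, &current->children, sibling)
        send_sig(SIGKILL, task, 1); /* priv=1: signal sent from the kernel */
}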

Make a system call to get list of processes

I'm new to module programming and I need to make a system call to retrieve the system's processes and show how much CPU they are consuming.
How can I make this call?
Why would you implement a system call for this? You don't want to add a syscall to the existing Linux API. This is the primary Linux interface to userspace, and nobody touches syscalls except top kernel developers who know what they're doing.
If you want to get a list of processes and their parameters and real-time statuses, use /proc. Every directory in there whose name is an integer is an existing process ID and contains a bunch of useful dynamic files which ps, top and others use to print their output.
If you want to get a list of processes within the kernel (e.g. within a module), you should know that the processes are kept internally as a doubly linked list that starts with the init process (symbol init_task in the kernel). You should use macros defined in include/linux/sched.h to get processes. Here's an example:
#include <linux/module.h>
#include <linux/printk.h>
#include <linux/sched.h>

static int __init ex_init(void)
{
    struct task_struct *task;

    for_each_process(task)
        pr_info("%s [%d]\n", task->comm, task->pid);
    return 0;
}

static void __exit ex_fini(void)
{
}

module_init(ex_init);
module_exit(ex_fini);

MODULE_LICENSE("GPL"); /* without this, loading the module taints the kernel */
This should be okay to gather information. However, don't change anything in there unless you really know what you're doing (which will require a bit more reading).
There are syscalls for that already, called open and read. The information about all processes is kept under the /proc/{pid} directories. You can gather process information by reading the corresponding files.
More is explained here: http://www.tldp.org/LDP/Linux-Filesystem-Hierarchy/html/proc.html
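A minimal userspace sketch of that approach (fields 14 and 15 of /proc/<pid>/stat are utime and stime in clock ticks, per proc(5); PID 1 is used only as an example):
#include <stdio.h>

int main(void)
{
    char path[64], buf[512];
    FILE *f;

    /* Read one process's stat line; repeat for each numeric /proc entry. */
    snprintf(path, sizeof(path), "/proc/%d/stat", 1);
    f = fopen(path, "r");
    if (!f)
        return 1;
    if (fgets(buf, sizeof(buf), f))
        printf("%s", buf); /* parse utime/stime out of this line for CPU use */
    fclose(f);
    return 0;
}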

kernel: synchronizing deletion of shared field of task_struct

I would like to add a pointer (to an object) to task_struct that is shared between all threads in the group. After the object has been deleted by one thread, how can I ensure that another thread will not attempt to dereference the now-invalid pointer?
Could I add an atomic reference-count field to task_struct and then update the counts in sync across all threads of a process (holding a global spinlock while traversing the task_structs)?
Or should I implement a kernel thread that manages the objects and their reference counts? It seems like this problem must have been solved already by other shared entities, like virtual memory and file handles.
You could do this by defining your own data structure:
struct my_task_data {
    void *real_data;
};
The task_struct must be enhanced:
struct task_struct {
    ...
    struct my_task_data *mtd;
};
In the clone() call you need to handle the mtd member of the task_struct.
real_data points to whatever you want. Doing it this way means you have one pointer from each task_struct to a shared object (mtd) which is always valid and can be dereferenced at any time. This shared object contains a pointer to your actual data item. When you want to access the item, do:
data = current->mtd->real_data;
If data is NULL, another thread has deleted it; otherwise it can be used.
Locking issues are not shown in this example; of course you need to protect access to real_data with some locking mechanism, like a mutex or semaphore in the my_task_data structure, and take it while manipulating my_task_data.
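A minimal sketch of that locking, assuming the hypothetical my_task_data above with a mutex guarding real_data:
#include <linux/mutex.h>
#include <linux/slab.h>

struct my_task_data {
    struct mutex lock; /* serializes all access to real_data */
    void *real_data;   /* shared object; NULL once deleted */
};

/* Deleting thread: free the object and mark it gone. */
static void mtd_delete(struct my_task_data *mtd)
{
    mutex_lock(&mtd->lock);
    kfree(mtd->real_data);
    mtd->real_data = NULL;
    mutex_unlock(&mtd->lock);
}

/* Any other thread: dereference only while holding the lock. */
static int mtd_use(struct my_task_data *mtd)
{
    int ret = -ENOENT;

    mutex_lock(&mtd->lock);
    if (mtd->real_data) {
        /* ... safely use mtd->real_data here ... */
        ret = 0;
    }
    mutex_unlock(&mtd->lock);
    return ret;
}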
