Usage of the for_each_process macro in the Linux kernel

I want to iterate over each process in the kernel and modify some parameters in task_struct. I think I can use the for_each_process() macro to do so.
However, to do it safely, I have to ensure that the process is not currently executing, and after I get a reference to its task_struct, I want to lock it down so that no one else accesses it while I am modifying it.
How can I accomplish these two goals?

You can use:
unsigned long flags;
raw_spin_lock_irqsave(&task->pi_lock, flags);
/* ... modify the task here ... */
raw_spin_unlock_irqrestore(&task->pi_lock, flags);
to lock the task you are currently processing. Note that flags must be an unsigned long, not an int; the irqsave/irqrestore helpers expect that type. The lock acquisition already provides the necessary memory ordering, so no explicit barrier such as smp_wmb() is needed before it.
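For context, here is a minimal sketch of how that lock might be combined with for_each_process(). This is an illustration under stated assumptions, not a drop-in solution: the iteration runs under rcu_read_lock() to keep the task list stable, and pi_lock is held while touching each task. Which task_struct fields pi_lock actually protects depends on what you modify; many fields have their own locking rules, so check the field you care about. The function name touch_all_tasks is made up for this sketch, and the header layout varies by version (for_each_process moved to linux/sched/signal.h around 4.11):
#include <linux/sched.h>
#include <linux/sched/signal.h> /* for_each_process() on kernels >= 4.11 */

static void touch_all_tasks(void)
{
	struct task_struct *task;
	unsigned long flags;

	rcu_read_lock(); /* keeps the task list stable while we walk it */
	for_each_process(task) {
		raw_spin_lock_irqsave(&task->pi_lock, flags);
		/* ... modify fields protected by pi_lock here ... */
		raw_spin_unlock_irqrestore(&task->pi_lock, flags);
	}
	rcu_read_unlock();
}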

Is returning while holding a spinlock automatically unsafe?

The venerated book Linux Device Drivers says that
The flags argument passed to spin_unlock_irqrestore must be the same variable passed to spin_lock_irqsave. You must also call spin_lock_irqsave and spin_unlock_irqrestore in the same function; otherwise your code may break on some architectures.
Yet I can't find any such restriction required by the official documentation bundled with the kernel code itself. And I find driver code that violates this guidance.
Obviously it isn't a good idea to call spin_lock_irqsave and spin_unlock_irqrestore from separate functions, because you're supposed to minimize the work done while holding a lock (with interrupts disabled, no less!). But have changes to the kernel made it possible if done with care, was it never actually against the API contract, or is it still verboten to do so?
If the restriction has been removed at some point, did it apply to version 3.10.17?
This is just a guess, but the book might be (somewhat unclearly) referring to a potential bug which can happen if you use a nonlocal variable or storage location for flags.
Basically, flags has to be private to the current execution context, which is why spin_lock_irqsave is a macro that takes the name of the flags variable. While flags is being saved, you don't hold the spinlock yet.
How this is related to locking and unlocking in a different function:
Consider two functions that some driver developer might write:
void my_lock(my_object *ctx)
{
	spin_lock_irqsave(&ctx->mylock, ctx->myflags); /* BUG */
}
void my_unlock(my_object *ctx)
{
	spin_unlock_irqrestore(&ctx->mylock, ctx->myflags);
}
This is a bug because at the time ctx->myflags is written, the lock is not yet held, and ctx->myflags is a shared variable visible to other contexts and processors. The flags must first be saved to a private location on the stack. Then, once the caller owns the lock, a copy of the flags can be stored into the exclusively owned object. In other words, it can be fixed like this:
void my_lock(my_object *ctx)
{
	unsigned long flags;
	spin_lock_irqsave(&ctx->mylock, flags);
	ctx->myflags = flags;
}
void my_unlock(my_object *ctx)
{
	unsigned long flags = ctx->myflags; /* probably unnecessary */
	spin_unlock_irqrestore(&ctx->mylock, flags);
}
If it couldn't be fixed like that, it would be very difficult to implement higher level primitives which need to wrap IRQ spinlocks.
How it could be arch-dependent:
Suppose that spin_lock_irqsave expands into machine code which saves the current flags in a register, then acquires the lock, and then stores that register into the specified flags destination. In that case, the buggy code is actually safe. If the expanded code instead saves the flags into the actual flags object designated by the caller and then tries to acquire the lock, it's broken.
I have never seen that constraint anywhere other than in the book. The information in the book is probably just outdated, or simply wrong.
In current kernels (and at least since 2.6.32, which is when I started working with the kernel), the actual locking is done through several levels of nested calls from spin_lock_irqsave (see, e.g., __raw_spin_lock_irqsave, which is called along the way). So locking and unlocking from different functions' contexts is hardly a reason for malfunction by itself.
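To illustrate, on an SMP build the macros expand roughly like this (paraphrased from include/linux/spinlock.h; the exact shape varies by kernel version and configuration):
#define raw_spin_lock_irqsave(lock, flags)		\
	do {						\
		typecheck(unsigned long, flags);	\
		flags = _raw_spin_lock_irqsave(lock);	\
	} while (0)
Two things are worth noting: the typecheck enforces that flags is an unsigned long, and the caller-visible flags destination is only assigned from the return value of _raw_spin_lock_irqsave, i.e. after interrupts are disabled and the lock is taken, which matches the "safe" ordering described in the other answer.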

Asynchronous copy to Global memory in OpenCL without waiting for event

I have a program running under OpenCL where, after I perform the calculations in private memory, I would like to write the results to global memory. I have no use for the results further down the road; essentially I am looking for a built-in way to write to global memory from either __local or __private memory asynchronously.
I already tried async_work_group_copy, and I noticed that in order to ensure the data is copied correctly, I have to wait for the event. For my card (AMD HD 7970) this performs the same as doing a synchronous copy directly to global memory.
Does anyone have any experience with async_work_group_copy without waiting for the event or any other viable alternative?
for (...) {
	// Calculate some results and copy to __local array src
	event_t e = async_work_group_copy(dest, src, size, 0);
	wait_group_events(1, &e); // Can we safely skip this??
}
Here src is __local and dest is __global.
I suspect that, since this function has to be encountered by every work-item in the work-group, skipping the wait may not work, since other work-items may not have finished yet. The fact that this is inside a for loop complicates things further.
I think there isn't much you have to (or can) do in this situation. I know that Intel's GPU implementation will not stall on a global write unless there's a register dependency hazard too soon after the write (e.g. if the program reuses that register too soon after the write, it will stall until the hazard clears). Sadly, you can't really control register allocation, or even see it.
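One alternative worth sketching, assuming each work-item writes to its own, non-overlapping location in the destination buffer: skip the __local staging entirely and have each work-item store its private result straight to __global. On most GPUs a plain global store is fire-and-forget from the work-item's point of view, so no event or barrier is needed as long as nothing in the same kernel reads the data back. The kernel below is a made-up illustration, not code from the question:
__kernel void compute(__global float *dest)
{
	size_t gid = get_global_id(0);

	// ... computation kept in private memory/registers ...
	float result = (float)gid * 2.0f; /* placeholder computation */

	// Plain global store: no event, no wait_group_events() needed,
	// provided no other work-item reads dest[gid] in this kernel.
	dest[gid] = result;
}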

Make a system call to get list of processes

I'm new to kernel module programming and I need to make a system call to retrieve the system's processes and show how much CPU each of them is consuming.
How can I make this call?
Why would you implement a system call for this? You don't want to add a syscall to the existing Linux API. Syscalls are the primary Linux interface to userspace, and nobody touches them except top kernel developers who know what they're doing.
If you want to get a list of processes and their parameters and real-time statuses, use /proc. Every directory that's an integer in there is an existing process ID and contains a bunch of useful dynamic files which ps, top and others use to print their output.
If you want to get a list of processes within the kernel (e.g. within a module), you should know that processes are kept internally as a doubly linked list that starts at the initial task (the init_task symbol in the kernel). You should use the macros defined in include/linux/sched.h to traverse it. Here's an example:
#include <linux/module.h>
#include <linux/printk.h>
#include <linux/sched.h>

static int __init ex_init(void)
{
	struct task_struct *task;

	rcu_read_lock(); /* protect the task list while we walk it */
	for_each_process(task)
		pr_info("%s [%d]\n", task->comm, task->pid);
	rcu_read_unlock();

	return 0;
}

static void __exit ex_fini(void)
{
}

module_init(ex_init);
module_exit(ex_fini);
MODULE_LICENSE("GPL");
This should be okay to gather information. However, don't change anything in there unless you really know what you're doing (which will require a bit more reading).
There already are syscalls for that: open and read. Information about every process is kept under the /proc/{pid} directories. You can gather process information by reading the corresponding files.
More explained here: http://www.tldp.org/LDP/Linux-Filesystem-Hierarchy/html/proc.html
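As a concrete starting point, here is a minimal userspace sketch of that approach. It walks /proc and, for each numeric directory, reads the stat file, whose fields 14 and 15 (utime and stime) hold the CPU time consumed in clock ticks; a real tool would sample twice and divide the delta by sysconf(_SC_CLK_TCK) to get a rate. This is an illustration, not a polished utility:
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	DIR *proc = opendir("/proc");
	struct dirent *ent;

	if (!proc) {
		perror("opendir /proc");
		return 1;
	}
	while ((ent = readdir(proc)) != NULL) {
		char path[288], buf[1024];
		unsigned long utime, stime;
		char *p;
		FILE *f;

		if (!isdigit((unsigned char)ent->d_name[0]))
			continue; /* only numeric entries are PIDs */
		snprintf(path, sizeof(path), "/proc/%s/stat", ent->d_name);
		f = fopen(path, "r");
		if (!f)
			continue; /* the process may have exited already */
		if (fgets(buf, sizeof(buf), f) && (p = strrchr(buf, ')'))) {
			/* comm may contain spaces, so parse after the last ')':
			 * state, five ints, five unsigneds, then utime/stime */
			if (sscanf(p + 2, "%*c %*d %*d %*d %*d %*d "
			           "%*u %*u %*u %*u %*u %lu %lu",
			           &utime, &stime) == 2)
				printf("%5s  cpu ticks: %lu\n",
				       ent->d_name, utime + stime);
		}
		fclose(f);
	}
	closedir(proc);
	return 0;
}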

linux kernel check if process is still running

I'm working in kernel space and I want to find out when an application has stopped or crashed.
When I receive an ioctl call, I can get the struct task_struct where I have a lot of information regarding the process of the application.
My problem is that I want to periodically check if the process is still alive or better yet, to have some asynchronous call when the process is killed.
My test environment was QEMU. After a while, in the application, I ran system("kill -9 pid"). Meanwhile, in the kernel, I periodically checked the task_struct with:
volatile long state; /* -1 unrunnable, 0 runnable, >0 stopped */
static inline int pid_alive(struct task_struct *p)
The problem is that my task_struct pointer seems to be unmodified. Normally I would say that each process has a task_struct, and of course it corresponds to the process state. Otherwise I don't see the point of "volatile long state".
What am I missing? Is it because I'm testing on QEMU, or because I'm checking the task_struct in a while(1) loop with an msleep of 100? Any help would be appreciated.
I would be partially happy if I could receive the pid of the application when the app is closing the file descriptor of the module ("/dev/driver").
Thanks!
You cannot stash the task_struct pointer away and refer to it later. If the process has been killed, the pointer is no longer valid; that task_struct is gone. You also should not use PID values within the kernel to refer to processes: PID values are reused, so you might not even be talking about the same process.
Your driver can supply a .release callback, which will be called when your driver file is closed, including when the owning process is terminated or killed. You can access current from this callback. Note that if a process opens your file and then forks, the process calling .release could well be different from the process that called .open; your driver must be able to handle this. A minimal sketch follows.
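Here is a minimal sketch of such a driver (all names are made up for illustration; module_misc_device() assumes a reasonably recent kernel, roughly 4.9+):
#include <linux/fs.h>
#include <linux/miscdevice.h>
#include <linux/module.h>
#include <linux/sched.h>

static int drv_release(struct inode *inode, struct file *file)
{
	/* Runs when the last reference to this open file goes away,
	 * including when the owning process exits or is killed. */
	pr_info("driver: released by pid %d (%s)\n",
	        current->pid, current->comm);
	return 0;
}

static const struct file_operations drv_fops = {
	.owner   = THIS_MODULE,
	.release = drv_release,
};

static struct miscdevice drv_dev = {
	.minor = MISC_DYNAMIC_MINOR,
	.name  = "driver", /* appears as /dev/driver */
	.fops  = &drv_fops,
};

module_misc_device(drv_dev);
MODULE_LICENSE("GPL");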
It has been a long time since I mucked around inside the kernel, but it seems to me that if your process actually dies, your best bet would be to put hooks into the code that tears down processes. If it doesn't die but gets caught in an unresponsive loop, you'd probably be better off triggering an application-level core dump.
A solution that worked beautifully in my operating systems homework is to use a kprobe to detect when do_exit is called (a sketch is below). What's beautiful is that do_exit is always called, no matter how the process is terminated. I think it is called even in the case of a kernel oops.
You should also hook into _do_fork, just in case.
Oh, and look at the .release callback mentioned in the other answer (but do note that dup2 and fork will cause unexpected behavior: you will only be notified when the last of the copies created by these two is closed).
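A minimal sketch of that kprobe approach (a hypothetical module with error handling trimmed; the pattern follows the kernel's own samples/kprobes examples):
#include <linux/kprobes.h>
#include <linux/module.h>
#include <linux/sched.h>

static int do_exit_pre(struct kprobe *p, struct pt_regs *regs)
{
	/* Runs just before do_exit executes, in the exiting task's context */
	pr_info("do_exit: pid %d (%s) is exiting\n",
	        current->pid, current->comm);
	return 0;
}

static struct kprobe kp = {
	.symbol_name = "do_exit",
	.pre_handler = do_exit_pre,
};

static int __init kp_init(void)
{
	return register_kprobe(&kp);
}

static void __exit kp_exit(void)
{
	unregister_kprobe(&kp);
}

module_init(kp_init);
module_exit(kp_exit);
MODULE_LICENSE("GPL");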

Fixed Memory I don't need to allocate?

I just need a fixed address in any win32 process where I can store 8 bytes without using any WinAPI function. I also cannot use assembler prefixes like fs:, and I have no stack pointer.
What I need:
- 8 bytes of memory
- a constant address, present in any process
- read and write access (via pointer, from the same process)
- should not crash the application (at least not instantly) if modified
Don't even ask why I need it.
The only way I'm aware of to do this is to use a DLL with a shared section...
// This goes in a DLL loaded by all apps that want to share the data
#pragma data_seg(".sharedseg")
long long myShared8Bytes = 0; // has to be initialized or this fails
#pragma data_seg()
Then, you add the following to the link command for the DLL (the section name must match the pragma, including the leading dot):
/SECTION:.sharedseg,RWS
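To actually read and write the variable from the processes that load the DLL, the DLL would export it in the usual way (the macro and symbol names here are made up for illustration):
// In a header shared between the DLL and its clients
#ifdef BUILDING_SHARED_DLL
#define SHARED_API __declspec(dllexport)
#else
#define SHARED_API __declspec(dllimport)
#endif

extern SHARED_API long long myShared8Bytes;

// In any client process that has loaded the DLL:
//     myShared8Bytes = 42;  // visible to every process using the DLL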
I am also curious why you want this...
Not that I recommend this, but the PEB probably has some unused or inconsequential fields in it that you could overwrite. I still think this is a terrible idea, though.
constant address and present in any process
You won't be able to achieve that. Win32 uses paged virtual memory, so different processes can use the same memory address even though it refers to different physical memory.
