how to transfer string(char*) in kernel into user process using copy_to_user - linux-kernel

I'm making code to transfer string in kernel to usermode using systemcall and copy_to_user
here is my code
kernel
#include<linux/kernel.h>
#include<linux/syscalls.h>
#include<linux/sched.h>
#include<linux/slab.h>
#include<linux/errno.h>
asmlinkage int sys_getProcTagSysCall(pid_t pid, char **tag){
printk("getProcTag system call \n\n");
struct task_struct *task= (struct task_struct*) kmalloc(sizeof(struct task_struct),GFP_KERNEL);
read_lock(&tasklist_lock);
task = find_task_by_vpid(pid);
if(task == NULL )
{
printk("corresponding pid task does not exist\n");
read_unlock(&tasklist_lock);
return -EFAULT;
}
read_unlock(&tasklist_lock);
printk("Corresponding pid task exist \n");
printk("tag is %s\n" , task->tag);
/*
task -> tag : string is stored in task->tag (ex : "abcde")
this part is well worked
*/
if(copy_to_user(*tag, task->tag, sizeof(char) * task->tag_length) !=0)
;
return 1;
}
and this is user
#include<stdio.h>
#include<stdlib.h>
int main()
{
char *ret=NULL;
int pid = 0;
printf("PID : ");
scanf("%4d", &pid);
if(syscall(339, pid, &ret)!=1) // syscall 339 is getProcTagSysCall
printf("pid %d does not exist\n", pid);
else
printf("Corresponding pid tag is %s \n",ret); //my output is %s = null
return 0;
}
actually i don't know about copy_to_user well. but I think copy_to_user(*tag, task->tag, sizeof(char) * task->tag_length) is operated like this code
so i use copy_to_user like above
#include<stdio.h>
int re();
void main(){
char *b = NULL;
if (re(&b))
printf("success");
printf("%s", b);
}
int re(char **str){
char *temp = "Gdg";
*str = temp;
return 1;
}

Is this a college assignment of some sort?
asmlinkage int sys_getProcTagSysCall(pid_t pid, char **tag){
What is this, Linux 2.6? What's up with ** instead of *?
printk("getProcTag system call \n\n");
Somewhat bad. All strings are supposed to be prefixed.
struct task_struct *task= (struct task_struct*) kmalloc(sizeof(struct task_struct),GFP_KERNEL);
What is going on here? Casting malloc makes no sense whatsoever, if you malloc you should have used sizeof(*task) instead, but you should not malloc in the first place. You want to find a task and in fact you just overwrite this pointer's value few lines later anyway.
read_lock(&tasklist_lock);
task = find_task_by_vpid(pid);
find_task_by_vpid requires RCU. The kernel would have told you that if you had debug enabled.
if(task == NULL )
{
printk("corresponding pid task does not exist\n");
read_unlock(&tasklist_lock);
return -EFAULT;
}
read_unlock(&tasklist_lock);
So... you unlock... but you did not get any kind of reference to the task.
printk("Corresponding pid task exist \n");
printk("tag is %s\n" , task->tag);
... in other words by the time you do task->tag, the task may already be gone. What requirements are there to access ->tag itself?
if(copy_to_user(*tag, task->tag, sizeof(char) * task->tag_length) !=0)
;
What's up with this? sizeof(char) is guaranteed to be 1.
I'm really confused by this entire business.
When you have a syscall which copies data to userspace where amount of data is not known prior to the call, teh syscall accepts both buffer AND its size. Then you can return appropriate error if the thingy you are trying to copy would not fit.
However, having a syscall in the first place looks incorrect. In linux per-task data is exposed to userspace in /proc/pid/. Figuring out how to add a file to proc is easy and left as an exercise for the reader.

It's quite obvious from the way you fixed it. copy_to_user() will only copy data between two memory regions - one accessible only to kernel and the other accessible also to user. It will not, however, handle any memory allocation. Userspace buffer has to be already allocated and you should pass address of this buffer to the kernel.
One more thing you can change is to change your syscall to use normal pointer to char instead of pointer to pointer which is useless.
Also note that you are leaking memory in your kernel code. You allocate memory for task_struct using kmalloc and then you override the only pointer you have to this memory when calling find_task_by_vpid() and this memory is never freed. find_task_by_vpid() will return a pointer to a task_struct which already exists in memory so there is no need to allocate any buffer for this.

i solved my problem by making malloc in user
I changed
char *b = NULL;
to
char *b = (char*)malloc(sizeof(char) * 100)
I don't know why this work properly. but as i guess copy_to_user get count of bytes as third argument so I should malloc before assigning a value
I don't know. anyone who knows why adding malloc is work properly tell me

Related

vmalloc() allocates from vm_struct list

Kernel document https://www.kernel.org/doc/gorman/html/understand/understand010.html says, that for vmalloc-ing
It searches through a linear linked list of vm_structs and returns a new struct describing the allocated region.
Does that mean vm_struct list is already created while booting up, just like kmem_cache_create and vmalloc() just adjusts the page entries? If that is the case, say if I have a 16GB RAM in x86_64 machine, the whole ZONE_NORMAL i.e
16GB - ZONE_DMA - ZONE_DMA32 - slab-memory(cache/kmalloc)
is used to create vm_struct list?
That document is fairly old. It's talking about Linux 2.5-2.6. Things have changed quite a bit with those functions from what I can tell. I'll start by talking about code from kernel 2.6.12 since that matches Gorman's explanation and is the oldest non-rc tag in the Linux kernel Github repo.
The vm_struct list that the document is referring to is called vmlist. It is created here as a struct pointer:
struct vm_struct *vmlist;
Trying to figure out if it is initialized with any structs during bootup took some deduction. The easiest way to figure it out was by looking at the function get_vmalloc_info() (edited for brevity):
if (!vmlist) {
vmi->largest_chunk = VMALLOC_TOTAL;
}
else {
vmi->largest_chunk = 0;
prev_end = VMALLOC_START;
for (vma = vmlist; vma; vma = vma->next) {
unsigned long addr = (unsigned long) vma->addr;
if (addr >= VMALLOC_END)
break;
vmi->used += vma->size;
free_area_size = addr - prev_end;
if (vmi->largest_chunk < free_area_size)
vmi->largest_chunk = free_area_size;
prev_end = vma->size + addr;
}
if (VMALLOC_END - prev_end > vmi->largest_chunk)
vmi->largest_chunk = VMALLOC_END - prev_end;
}
The logic says that if the vmlist pointer is equal to NULL (!NULL), then there are no vm_structs on the list and the largest_chunk of free memory in this VMALLOC area is the entire space, hence VMALLOC_TOTAL. However, if there is something on the vmlist, then figure out the largest chunk based on the difference between the address of the current vm_struct and the end of the previous vm_struct (i.e. free_area_size = addr - prev_end).
What this tells us is that when we vmalloc, we look through the vmlist to find the absence of a vm_struct in a virtual memory area big enough to accomodate our request. Only then can it create this new vm_struct, which will now be part of the vmlist.
vmalloc will eventually call __get_vm_area(), which is where the action happens:
for (p = &vmlist; (tmp = *p) != NULL ;p = &tmp->next) {
if ((unsigned long)tmp->addr < addr) {
if((unsigned long)tmp->addr + tmp->size >= addr)
addr = ALIGN(tmp->size +
(unsigned long)tmp->addr, align);
continue;
}
if ((size + addr) < addr)
goto out;
if (size + addr <= (unsigned long)tmp->addr)
goto found;
addr = ALIGN(tmp->size + (unsigned long)tmp->addr, align);
if (addr > end - size)
goto out;
}
found:
area->next = *p;
*p = area;
By this point in the function we have already created a new vm_struct named area. This for loop just needs to find where to put the struct in the list. If the vmlist is empty, we skip the loop and immediately execute the "found" lines, making *p (the vmlist) point to our struct. Otherwise, we need to find the struct that will go after ours.
So in summary, this means that even though the vmlist pointer might be created at boot time, the list isn't necessarily populated at boot time. That is, unless there are vmalloc calls during boot or functions that explicitly add vm_structs to the list during boot as in future kernel versions (see below for kernel 6.0.9).
One further clarification for you. You asked if ZONE_NORMAL is used for the vmlist, but those are two separate memory address spaces. ZONE_NORMAL is describing physical memory whereas vm is virtual memory. There are lots of resources for explaining the difference between the two (e.g. this Stack Overflow question). The specific virtual memory address range for vmlist goes from VMALLOC_START to VMALLOC_END. In x86, those were defined as:
#define VMALLOC_START 0xffffc20000000000UL
#define VMALLOC_END 0xffffe1ffffffffffUL
For kernel version 6.0.9:
The creation of the vm_struct list is here:
static struct vm_struct *vmlist __initdata;
At this point, there is nothing on the list. But in this kernel version there are a few boot functions that may add structs to the list:
void __init vm_area_add_early(struct vm_struct *vm)
void __init vm_area_register_early(struct vm_struct *vm, size_t align)
As for vmalloc in this version, the vmlist is now only a list used during initialization. get_vm_area() now calls get_vm_area_node(), which is a NUMA ready function. From there, the logic goes deeper and is much more complicated than the linear search described above.

Trap memory accesses inside a standard executable built with MinGW

So my problem sounds like this.
I have some platform dependent code (embedded system) which writes to some MMIO locations that are hardcoded at specific addresses.
I compile this code with some management code inside a standard executable (mainly for testing) but also for simulation (because it takes longer to find basic bugs inside the actual HW platform).
To alleviate the hardcoded pointers, i just redefine them to some variables inside the memory pool. And this works really well.
The problem is that there is specific hardware behavior on some of the MMIO locations (w1c for example) which makes "correct" testing hard to impossible.
These are the solutions i thought of:
1 - Somehow redefine the accesses to those registers and try to insert some immediate function to simulate the dynamic behavior. This is not really usable since there are various ways to write to the MMIO locations (pointers and stuff).
2 - Somehow leave the addresses hardcoded and trap the illegal access through a seg fault, find the location that triggered, extract exactly where the access was made, handle and return. I am not really sure how this would work (and even if it's possible).
3 - Use some sort of emulation. This will surely work, but it will void the whole purpose of running fast and native on a standard computer.
4 - Virtualization ?? Probably will take a lot of time to implement. Not really sure if the gain is justifiable.
Does anyone have any idea if this can be accomplished without going too deep? Maybe is there a way to manipulate the compiler in some way to define a memory area for which every access will generate a callback. Not really an expert in x86/gcc stuff.
Edit: It seems that it's not really possible to do this in a platform independent way, and since it will be only windows, i will use the available API (which seems to work as expected). Found this Q here:
Is set single step trap available on win 7?
I will put the whole "simulated" register file inside a number of pages, guard them, and trigger a callback from which i will extract all the necessary info, do my stuff then continue execution.
Thanks all for responding.
I think #2 is the best approach. I routinely use approach #4, but I use it to test code that is running in the kernel, so I need a layer below the kernel to trap and emulate the accesses. Since you have already put your code into a user-mode application, #2 should be simpler.
The answers to this question may provide help in implementing #2. How to write a signal handler to catch SIGSEGV?
What you really want to do, though, is to emulate the memory access and then have the segv handler return to the instruction after the access. This sample code works on Linux. I'm not sure if the behavior it is taking advantage of is undefined, though.
#include <stdint.h>
#include <stdio.h>
#include <signal.h>
#define REG_ADDR ((volatile uint32_t *)0x12340000f000ULL)
static uint32_t read_reg(volatile uint32_t *reg_addr)
{
uint32_t r;
asm("mov (%1), %0" : "=a"(r) : "r"(reg_addr));
return r;
}
static void segv_handler(int, siginfo_t *, void *);
int main()
{
struct sigaction action = { 0, };
action.sa_sigaction = segv_handler;
action.sa_flags = SA_SIGINFO;
sigaction(SIGSEGV, &action, NULL);
// force sigsegv
uint32_t a = read_reg(REG_ADDR);
printf("after segv, a = %d\n", a);
return 0;
}
static void segv_handler(int, siginfo_t *info, void *ucontext_arg)
{
ucontext_t *ucontext = static_cast<ucontext_t *>(ucontext_arg);
ucontext->uc_mcontext.gregs[REG_RAX] = 1234;
ucontext->uc_mcontext.gregs[REG_RIP] += 2;
}
The code to read the register is written in assembly to ensure that both the destination register and the length of the instruction are known.
This is how the Windows version of prl's answer could look like:
#include <stdint.h>
#include <stdio.h>
#include <windows.h>
#define REG_ADDR ((volatile uint32_t *)0x12340000f000ULL)
static uint32_t read_reg(volatile uint32_t *reg_addr)
{
uint32_t r;
asm("mov (%1), %0" : "=a"(r) : "r"(reg_addr));
return r;
}
static LONG WINAPI segv_handler(EXCEPTION_POINTERS *);
int main()
{
SetUnhandledExceptionFilter(segv_handler);
// force sigsegv
uint32_t a = read_reg(REG_ADDR);
printf("after segv, a = %d\n", a);
return 0;
}
static LONG WINAPI segv_handler(EXCEPTION_POINTERS *ep)
{
// only handle read access violation of REG_ADDR
if (ep->ExceptionRecord->ExceptionCode != EXCEPTION_ACCESS_VIOLATION ||
ep->ExceptionRecord->ExceptionInformation[0] != 0 ||
ep->ExceptionRecord->ExceptionInformation[1] != (ULONG_PTR)REG_ADDR)
return EXCEPTION_CONTINUE_SEARCH;
ep->ContextRecord->Rax = 1234;
ep->ContextRecord->Rip += 2;
return EXCEPTION_CONTINUE_EXECUTION;
}
So, the solution (code snippet) is as follows:
First of all, i have a variable:
__attribute__ ((aligned (4096))) int g_test;
Second, inside my main function, i do the following:
AddVectoredExceptionHandler(1, VectoredHandler);
DWORD old;
VirtualProtect(&g_test, 4096, PAGE_READWRITE | PAGE_GUARD, &old);
The handler looks like this:
LONG WINAPI VectoredHandler(struct _EXCEPTION_POINTERS *ExceptionInfo)
{
static DWORD last_addr;
if (ExceptionInfo->ExceptionRecord->ExceptionCode == STATUS_GUARD_PAGE_VIOLATION) {
last_addr = ExceptionInfo->ExceptionRecord->ExceptionInformation[1];
ExceptionInfo->ContextRecord->EFlags |= 0x100; /* Single step to trigger the next one */
return EXCEPTION_CONTINUE_EXECUTION;
}
if (ExceptionInfo->ExceptionRecord->ExceptionCode == STATUS_SINGLE_STEP) {
DWORD old;
VirtualProtect((PVOID)(last_addr & ~PAGE_MASK), 4096, PAGE_READWRITE | PAGE_GUARD, &old);
return EXCEPTION_CONTINUE_EXECUTION;
}
return EXCEPTION_CONTINUE_SEARCH;
}
This is only a basic skeleton for the functionality. Basically I guard the page on which the variable resides, i have some linked lists in which i hold pointers to the function and values for the address in question. I check that the fault generating address is inside my list then i trigger the callback.
On first guard hit, the page protection will be disabled by the system, but i can call my PRE_WRITE callback where i can save the variable state. Because a single step is issued through the EFlags, it will be followed immediately by a single step exception (which means that the variable was written), and i can trigger a WRITE callback. All the data required for the operation is contained inside the ExceptionInformation array.
When someone tries to write to that variable:
*(int *)&g_test = 1;
A PRE_WRITE followed by a WRITE will be triggered,
When i do:
int x = *(int *)&g_test;
A READ will be issued.
In this way i can manipulate the data flow in a way that does not require modifications of the original source code.
Note: This is intended to be used as part of a test framework and any penalty hit is deemed acceptable.
For example, W1C (Write 1 to clear) operation can be accomplished:
void MYREG_hook(reg_cbk_t type)
{
/** We need to save the pre-write state
* This is safe since we are assured to be called with
* both PRE_WRITE and WRITE in the correct order
*/
static int pre;
switch (type) {
case REG_READ: /* Called pre-read */
break;
case REG_PRE_WRITE: /* Called pre-write */
pre = g_test;
break;
case REG_WRITE: /* Called after write */
g_test = pre & ~g_test; /* W1C */
break;
default:
break;
}
}
This was possible also with seg-faults on illegal addresses, but i had to issue one for each R/W, and keep track of a "virtual register file" so a bigger penalty hit. In this way i can only guard specific areas of memory or none, depending on the registered monitors.

Find out the process name by pid in osx kernel extension

I am working on kernel extension and want to find out how to find process name by pid in kernel extension
This code works great in user space
static char procdata[4096];
int mib[3] = { CTL_KERN, KERN_PROCARGS, pid };
procdata[0] = '\0'; // clear
size_t size = sizeof(procdata);
if (sysctl(mib, 3, procdata, &size, NULL, 0)) {
return ERROR(ERROR_INTERNAL);
}
procdata[sizeof(procdata)-2] = ':';
procdata[sizeof(procdata)-1] = '\0';
ret = procdata;
return SUCCESS;
but for the kernel space, there are errors such as "Use of undeclared identifier 'CTL_KERN'" (even if I add #include )
What is the correct way to do it in kernel extension?
The Kernel.framework header <sys/proc.h> is what you're looking for.
In particular, you can use proc_name() to get a process's name given its PID:
/* this routine copies the process's name of the executable to the passed in buffer. It
* is always null terminated. The size of the buffer is to be passed in as well. This
* routine is to be used typically for debugging
*/
void proc_name(int pid, char * buf, int size);
Note however, that the name will be truncated to MAXCOMLEN - 16 bytes.
You might also be able to use the sysctl via sysctlbyname() from the kernel. In my experience, that function doesn't work well though, as the sysctl buffer memory handling isn't expecting buffers in kernel address space, so most types of sysctl will cause a kernel panic if called from a non-kernel thread. It also doesn't seem to work for all sysctls.

Regarding how the parameters to the read function is passed in simple char driver

I am newbei to driver programming i am started writing the simple char driver . Then i created special file for my char driver mknod /dev/simple-driver c 250 0 .when it type cat /dev/simple-driver. it shows the string "Hello world from Kernel mode!". i know that function
static const char g_s_Hello_World_string[] = "Hello world tamil_vanan!\n\0";
static const ssize_t g_s_Hello_World_size = sizeof(g_s_Hello_World_string);
static ssize_t device_file_read(
struct file *file_ptr
, char __user *user_buffer
, size_t count
, loff_t *possition)
{
printk( KERN_NOTICE "Simple-driver: Device file is read at offset =
%i, read bytes count = %u", (int)*possition , (unsigned int)count );
if( *possition >= g_s_Hello_World_size )
return 0;
if( *possition + count > g_s_Hello_World_size )
count = g_s_Hello_World_size - *possition;
if( copy_to_user(user_buffer, g_s_Hello_World_string + *possition, count) != 0 )
return -EFAULT;
*possition += count;
return count;
}
is get called . This is mapped to (*read) in file_opreation structure of my driver .My question is how this function is get called , how the parameters like struct file,char,count, offset are passed bcoz is i simply typed cat command ..Please elabroate how this happening
In Linux all are considered as files. The type of file, whether it is a driver file or normal file depends upon the mount point where it is mounted.
For Eg: If we consider your case : cat /dev/simple-driver traverses back to the mount point of device files.
From the device file name simple-driver it retrieves Major and Minor number.
From those number(especially from minor number) it associates the driver file for your character driver.
From the driver it uses struct file ops structure to find the read function, which is nothing but your read function:
static ssize_t device_file_read(struct file *file_ptr, char __user *user_buffer, size_t count, loff_t *possition)
User_buffer will always take sizeof(size_t count).It is better to keep a check of buffer(In some cases it throws warning)
String is copied to User_buffer(copy_to_user is used to check kernel flags during copy operation).
postion is 0 for first copy and it increments in the order of count:position+=count.
Once read function returns the buffer to cat. and cat flushes the buffer contents on std_out which is nothing but your console.
cat will use some posix version of read call from glibc. Glibc will put the arguments on the stack or in registers (this depends on your hardware architecture) and will switch to kernel mode. In the kernel the values will be copied to the kernel stack. And in the end your read function will be called.

How to know the buffer size passed from user space?

I'm trying to develop a new syscall for the linux kernel. This syscall will write info on the user buffer that is taken as argument, e.g.:
asmlinkage int new_syscall(..., char *buffer,...){...}
In user space this buffer is statically allocated as:
char buffer[10000];
There's a way (as sizeof() in the user level) to know the whole buffer size (10000 in this case)?
I have tried strlen_user(buffer) but it returns me the length of the string that is currently into the buffer, so if the buffer is empty it returns 0.
You can try passing structure which will contain the buffer pointer & the size of the buffer. But the same structure should also be defined in both user-space application & inside your system-call's code in kernel.
struct new_struct
{
void *p; //set this pointer to your buffer...
int size;
};
//from user-application...
int main()
{
....
struct new_struct req_kernel;
your_system_call_function(...,(void *)&req_kernel,...);
}
........................................................................................
//this is inside your kernel...
your_system_call(...,char __user optval,...)
{
.....
struct new_struct req;
if (copy_from_user(&req, optval, sizeof(req)))
return -EFAULT;
//now you have the address or pointer & size of the buffer
//which you want in kernel with struct req...
}

Resources