Basic mmap implementation - linux-kernel

I am pretty much new to linux and am writing a custom driver for which I need to use mmap. I have found the sample code in many places as below
static int simple_remap_mmap(struct file *filp, struct vm_area_struct *vma)
{
if (remap_pfn_range(vma, vma->vm_start, vm->vm_pgoff,
vma->vm_end - vma->vm_start,
vma->vm_page_prot))
return -EAGAIN;
vma->vm_ops = &simple_remap_vm_ops;
simple_vma_open(vma);
return 0;
}
But I want to know where exactly does this code specify which portion of kernel memory to map e.g if I have allocated a buffer in the kernel using kmalloc pointed to by *kptr how can i mmap that buffer to user space. Thank you.

Related

What is the purpose of the function "blk_rq_map_user" in the NVME disk driver?

I am trying to understand the nvme linux drivers. I am now tackling the function nvme_user_submit_cmd, which I report partially here:
static int nvme_submit_user_cmd(struct request_queue *q,
struct nvme_command *cmd, void __user *ubuffer,
unsigned bufflen, void __user *meta_buffer, unsigned meta_len,
u32 meta_seed, u32 *result, unsigned timeout)
{
bool write = nvme_is_write(cmd);
struct nvme_ns *ns = q->queuedata;
struct gendisk *disk = ns ? ns->disk : NULL;
struct request *req;
struct bio *bio = NULL;
void *meta = NULL;
int ret;
req = nvme_alloc_request(q, cmd, 0, NVME_QID_ANY);
[...]
if (ubuffer && bufflen) {
ret = blk_rq_map_user(q, req, NULL, ubuffer, bufflen,
GFP_KERNEL);
[...]
The ubufferis a pointer to some data in the virtual address space (since this comes from an ioctl command from a user application).
Following blk_rq_map_user I was expecting some sort of mmap mechanism to translate the userspace address into a physical address, but I can't wrap my head around what the function is doing. For reference here's the call chain:
blk_rq_map_user -> import_single_range -> blk_rq_map_user_iov
Following those function just created some more confusion for me and I'd like some help.
The reason I think that this function is doing a sort of mmap is (apart from the name) that this address will be part of the struct request in the struct request queue, which will eventually be processed by the NVME disk driver (https://lwn.net/Articles/738449/) and my guess is that the disk wants the physical address when fulfilling the requests.
However I don't understand how is this mapping done.
ubuffer is a user virtual address, which means it can only be used in the context of a user process, which it is when submit is called. To use that buffer after this call ends, it has to be mapped to one or more physical addresses for the bios/bvecs. The unmap call frees the mapping after the I/O completes. If the device can't directly address the user buffer due to hardware constraints then a bounce buffer will be mapped and a copy of the data will be made.
Edit: note that unless a copy is needed, there is no kernel virtual address mapped to the buffer because the kernel never needs to touch the data.

Create array of struct scatterlist from buffer

I am trying to build an array of type "struct scatterlist", from a buffer pointed by a virtual kernel address (I know the byte size of the buffer, but it may be large). Ideally I would like to have function like init_sg_array_from_buf:
void my_function(void *buffer, int buffer_length)
{
struct scatterlist *sg;
int sg_count;
sg_count = init_sg_array_from_buf(buffer, buffer_length, sg);
}
Which function in the scatterlist api, does something similar? Currently the only possibility I see, is to manually determine the amount of pages, spanned by the buffer. Windows has a kernel macro called "ADDRESS_AND_SIZE_TO_SPAN_PAGES", but I didn't even manage to find something like this in the linux kernel.

Freeing platform driver device struct [duplicate]

I have found devm_kzalloc() and kzalloc() in device driver programmong. But I don't know when/where to use these functions. Can anyone please specify the importance of these functions and their usage.
kzalloc() allocates kernel memory like kmalloc(), but it also zero-initializes the allocated memory. devm_kzalloc() is managed kzalloc(). The memory allocated with managed functions is associated with the device. When the device is detached from the system or the driver for the device is unloaded, that memory is freed automatically. If multiple managed resources (memory or some other resource) were allocated for the device, the resource allocated last is freed first.
Managed resources are very helpful to ensure correct operation of the driver both for initialization failure at any point and for successful initialization followed by the device removal.
Please note that managed resources (whether it's memory or some other resource) are meant to be used in code responsible for the probing the device. They are generally a wrong choice for the code used for opening the device, as the device can be closed without being disconnected from the system. Closing the device requires freeing the resources manually, which defeats the purpose of managed resources.
The memory allocated with kzalloc() should be freed with kfree(). The memory allocated with devm_kzalloc() is freed automatically. It can be freed with devm_kfree(), but it's usually a sign that the managed memory allocation is not a good fit for the task.
In simple words devm_kzalloc() and kzalloc() both are used for memory allocation in device driver but the difference is if you allocate memory by kzalloc() than you have to free that memory when the life cycle of that device driver is ended or when it is unloaded from kernel but if you do the same with devm_kzalloc() you need not to worry about freeing memory,that memory is freed automatically by device library itself.
Both of them does the exactly the same thing but by using devm_kzalloc little overhead of freeing memory is released from programmers
Let explain you by giving example, first example by using kzalloc
static int pxa3xx_u2d_probe(struct platform_device *pdev)
{
int err;
u2d = kzalloc(sizeof(struct pxa3xx_u2d_ulpi), GFP_KERNEL); 1
if (!u2d)
return -ENOMEM;
u2d->clk = clk_get(&pdev->dev, NULL);
if (IS_ERR(u2d->clk)) {
err = PTR_ERR(u2d->clk); 2
goto err_free_mem;
}
...
return 0;
err_free_mem:
kfree(u2d);
return err;
}
static int pxa3xx_u2d_remove(struct platform_device *pdev)
{
clk_put(u2d->clk);
kfree(u2d); 3
return 0;
}
In this example you can this in funtion pxa3xx_u2d_remove(),
kfree(u2d)(line indicated by 3) is there to free memory allocated by u2d
now see the same code by using devm_kzalloc()
static int pxa3xx_u2d_probe(struct platform_device *pdev)
{
int err;
u2d = devm_kzalloc(&pdev->dev, sizeof(struct pxa3xx_u2d_ulpi), GFP_KERNEL);
if (!u2d)
return -ENOMEM;
u2d->clk = clk_get(&pdev->dev, NULL);
if (IS_ERR(u2d->clk)) {
err = PTR_ERR(u2d->clk);
goto err_free_mem;
}
...
return 0;
err_free_mem:
return err;
}
static int pxa3xx_u2d_remove(struct platform_device *pdev)
{
clk_put(u2d->clk);
return 0;
}
there is no kfree() to free function because the same is done by devm_kzalloc()

how to transfer string(char*) in kernel into user process using copy_to_user

I'm making code to transfer string in kernel to usermode using systemcall and copy_to_user
here is my code
kernel
#include<linux/kernel.h>
#include<linux/syscalls.h>
#include<linux/sched.h>
#include<linux/slab.h>
#include<linux/errno.h>
asmlinkage int sys_getProcTagSysCall(pid_t pid, char **tag){
printk("getProcTag system call \n\n");
struct task_struct *task= (struct task_struct*) kmalloc(sizeof(struct task_struct),GFP_KERNEL);
read_lock(&tasklist_lock);
task = find_task_by_vpid(pid);
if(task == NULL )
{
printk("corresponding pid task does not exist\n");
read_unlock(&tasklist_lock);
return -EFAULT;
}
read_unlock(&tasklist_lock);
printk("Corresponding pid task exist \n");
printk("tag is %s\n" , task->tag);
/*
task -> tag : string is stored in task->tag (ex : "abcde")
this part is well worked
*/
if(copy_to_user(*tag, task->tag, sizeof(char) * task->tag_length) !=0)
;
return 1;
}
and this is user
#include<stdio.h>
#include<stdlib.h>
int main()
{
char *ret=NULL;
int pid = 0;
printf("PID : ");
scanf("%4d", &pid);
if(syscall(339, pid, &ret)!=1) // syscall 339 is getProcTagSysCall
printf("pid %d does not exist\n", pid);
else
printf("Corresponding pid tag is %s \n",ret); //my output is %s = null
return 0;
}
actually i don't know about copy_to_user well. but I think copy_to_user(*tag, task->tag, sizeof(char) * task->tag_length) is operated like this code
so i use copy_to_user like above
#include<stdio.h>
int re();
void main(){
char *b = NULL;
if (re(&b))
printf("success");
printf("%s", b);
}
int re(char **str){
char *temp = "Gdg";
*str = temp;
return 1;
}
Is this a college assignment of some sort?
asmlinkage int sys_getProcTagSysCall(pid_t pid, char **tag){
What is this, Linux 2.6? What's up with ** instead of *?
printk("getProcTag system call \n\n");
Somewhat bad. All strings are supposed to be prefixed.
struct task_struct *task= (struct task_struct*) kmalloc(sizeof(struct task_struct),GFP_KERNEL);
What is going on here? Casting malloc makes no sense whatsoever, if you malloc you should have used sizeof(*task) instead, but you should not malloc in the first place. You want to find a task and in fact you just overwrite this pointer's value few lines later anyway.
read_lock(&tasklist_lock);
task = find_task_by_vpid(pid);
find_task_by_vpid requires RCU. The kernel would have told you that if you had debug enabled.
if(task == NULL )
{
printk("corresponding pid task does not exist\n");
read_unlock(&tasklist_lock);
return -EFAULT;
}
read_unlock(&tasklist_lock);
So... you unlock... but you did not get any kind of reference to the task.
printk("Corresponding pid task exist \n");
printk("tag is %s\n" , task->tag);
... in other words by the time you do task->tag, the task may already be gone. What requirements are there to access ->tag itself?
if(copy_to_user(*tag, task->tag, sizeof(char) * task->tag_length) !=0)
;
What's up with this? sizeof(char) is guaranteed to be 1.
I'm really confused by this entire business.
When you have a syscall which copies data to userspace where amount of data is not known prior to the call, teh syscall accepts both buffer AND its size. Then you can return appropriate error if the thingy you are trying to copy would not fit.
However, having a syscall in the first place looks incorrect. In linux per-task data is exposed to userspace in /proc/pid/. Figuring out how to add a file to proc is easy and left as an exercise for the reader.
It's quite obvious from the way you fixed it. copy_to_user() will only copy data between two memory regions - one accessible only to kernel and the other accessible also to user. It will not, however, handle any memory allocation. Userspace buffer has to be already allocated and you should pass address of this buffer to the kernel.
One more thing you can change is to change your syscall to use normal pointer to char instead of pointer to pointer which is useless.
Also note that you are leaking memory in your kernel code. You allocate memory for task_struct using kmalloc and then you override the only pointer you have to this memory when calling find_task_by_vpid() and this memory is never freed. find_task_by_vpid() will return a pointer to a task_struct which already exists in memory so there is no need to allocate any buffer for this.
i solved my problem by making malloc in user
I changed
char *b = NULL;
to
char *b = (char*)malloc(sizeof(char) * 100)
I don't know why this work properly. but as i guess copy_to_user get count of bytes as third argument so I should malloc before assigning a value
I don't know. anyone who knows why adding malloc is work properly tell me

Memory limit for mmap

I am trying to mmap a char device. It works for 65536 bytes. But I get the following error if I try for more memory.
mmap: Resource temporarily unavailable
I want to mmap 1MB memory for a device. I use alloc_chrdev_region, cdev_init, cdev_add for the char device. How can I mmap memory larger than 65K? Should I use block device?
Using the MAP_LOCKED flag in the mmap call can cause this error. The used mlock can return EAGAIN if the amount of memory can not be locked.
From man mmap:
MAP_LOCKED (since Linux 2.5.37) Lock the pages of the mapped region
into memory in the manner of mlock(2). This flag is ignored in older
kernels.
From man mlock:
EAGAIN:
Some or all of the specified address range could not be
locked.
Did you implement *somedevice_mmap()* file operation?
static int somedev_mmap(struct file *filp, struct vm_area_struct *vma)
{
/* Do something. You probably need to use ioremap(). */
return 0;
}
static const struct file_operations somedev_fops = {
.owner = THIS_MODULE,
/* Initialize other file operations. */
.mmap = somedev_mmap,
};

Resources