dma_common_mmap documentation to let user read/write physical address

I am trying to write a Linux kernel module that maps a buffer back to user space using dma_common_mmap(). I then want the user to mmap() the region and read/write it.
My main problem is that I can't find any documentation for dma_common_mmap(). Does any exist? I have googled but couldn't find out how to use it so that the user can read and write the address range.

Documentation for dma_common_mmap() as such doesn't exist, but you can look at the kernel-doc comment for the dma_mmap_attrs() function:
/**
 * dma_mmap_attrs - map a coherent DMA allocation into user space
 * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
 * @vma: vm_area_struct describing requested user mapping
 * @cpu_addr: kernel CPU-view address returned from dma_alloc_attrs
 * @handle: device-view address returned from dma_alloc_attrs
 * @size: size of memory originally requested in dma_alloc_attrs
 * @attrs: attributes of mapping properties requested in dma_alloc_attrs
 *
 * Map a coherent DMA buffer previously allocated by dma_alloc_attrs
 * into user space. The coherent DMA buffer must not be freed by the
 * driver until the user space mapping has been released.
 */
static inline int
dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma, void *cpu_addr,
	       dma_addr_t dma_addr, size_t size, struct dma_attrs *attrs)
{
	struct dma_map_ops *ops = get_dma_ops(dev);

	BUG_ON(!ops);
	if (ops->mmap)
		return ops->mmap(dev, vma, cpu_addr, dma_addr, size, attrs);
	return dma_common_mmap(dev, vma, cpu_addr, dma_addr, size);
}

#define dma_mmap_coherent(d, v, c, h, s) dma_mmap_attrs(d, v, c, h, s, NULL)
dma_mmap_attrs() in turn calls dma_common_mmap(), so all of the documentation above (except for the attrs param) applies to dma_common_mmap() as-is.
EDIT
I think you should use dma_mmap_coherent() (along with dma_alloc_coherent()), which does pretty much the same thing as dma_common_mmap() (see the code above). See this example to get some idea of how to use it both on the kernel side and in user space. See also how dma_mmap_coherent() is used in the ALSA kernel code, in the snd_pcm_lib_default_mmap() function.
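As a rough sketch of how the pieces fit together on the kernel side (the mydev structure, its fields, and the probe-time allocation are hypothetical, and error handling is abbreviated):

/* Hypothetical driver state, filled in once at probe time:
 *   mydev->cpu_addr = dma_alloc_coherent(mydev->dev, mydev->size,
 *                                        &mydev->dma_handle, GFP_KERNEL);
 */
struct mydev {
	struct device *dev;
	void *cpu_addr;        /* kernel CPU-view address */
	dma_addr_t dma_handle; /* device-view address */
	size_t size;
};

/* .mmap file operation: hand the same coherent buffer to user space. */
static int mydev_mmap(struct file *filp, struct vm_area_struct *vma)
{
	struct mydev *mydev = filp->private_data;

	return dma_mmap_coherent(mydev->dev, vma, mydev->cpu_addr,
				 mydev->dma_handle, mydev->size);
}

User space then simply open()s the char device and calls mmap() on it; the returned pointer reads and writes the same buffer the device DMAs into.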

Related

What is the purpose of the function "blk_rq_map_user" in the NVME disk driver?

I am trying to understand the NVMe Linux drivers. I am now tackling the function nvme_submit_user_cmd(), which I report partially here:
static int nvme_submit_user_cmd(struct request_queue *q,
		struct nvme_command *cmd, void __user *ubuffer,
		unsigned bufflen, void __user *meta_buffer, unsigned meta_len,
		u32 meta_seed, u32 *result, unsigned timeout)
{
	bool write = nvme_is_write(cmd);
	struct nvme_ns *ns = q->queuedata;
	struct gendisk *disk = ns ? ns->disk : NULL;
	struct request *req;
	struct bio *bio = NULL;
	void *meta = NULL;
	int ret;

	req = nvme_alloc_request(q, cmd, 0, NVME_QID_ANY);
	[...]
	if (ubuffer && bufflen) {
		ret = blk_rq_map_user(q, req, NULL, ubuffer, bufflen,
				GFP_KERNEL);
	[...]
The ubuffer is a pointer to some data in the virtual address space of a user process (since this comes from an ioctl command issued by a user application).
Following blk_rq_map_user() I was expecting some sort of mmap mechanism that translates the userspace address into a physical address, but I can't wrap my head around what the function is doing. For reference, here's the call chain:
blk_rq_map_user -> import_single_range -> blk_rq_map_user_iov
Following those functions just created more confusion for me, and I'd like some help.
The reason I think this function is doing a sort of mmap is (apart from the name) that this address will become part of the struct request on the request queue, which will eventually be processed by the NVMe disk driver (https://lwn.net/Articles/738449/), and my guess is that the disk wants a physical address when fulfilling requests.
However, I don't understand how this mapping is done.
ubuffer is a user virtual address, which means it is only valid in the context of a user process, which it is when nvme_submit_user_cmd() is called. To use that buffer after this call returns, it has to be mapped to one or more physical addresses for the bios/bvecs. The unmap call frees that mapping after the I/O completes. If the device can't directly address the user buffer due to hardware constraints, a bounce buffer is mapped instead and a copy of the data is made.
Edit: note that unless a copy is needed, no kernel virtual address is ever mapped for the buffer, because the kernel never needs to touch the data.
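To make the map/unmap pairing concrete, here is a hedged sketch of the usual pattern (simplified from the function quoted above; error paths omitted):

/* Pin the user pages behind ubuffer and attach them to req as bios/bvecs;
 * no data is copied unless a bounce buffer turns out to be necessary. */
ret = blk_rq_map_user(q, req, NULL, ubuffer, bufflen, GFP_KERNEL);
if (ret)
	return ret;
bio = req->bio;

/* ... submit req (e.g. via blk_execute_rq()) and wait for completion ... */

/* Release the pinned pages; for a read that went through a bounce
 * buffer, the data is copied back to the user buffer at this point. */
if (bio)
	blk_rq_unmap_user(bio);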

Create array of struct scatterlist from buffer

I am trying to build an array of struct scatterlist from a buffer pointed to by a kernel virtual address (I know the byte size of the buffer, but it may be large). Ideally I would like to have a function like init_sg_array_from_buf:
void my_function(void *buffer, int buffer_length)
{
	struct scatterlist *sg;
	int sg_count;

	sg_count = init_sg_array_from_buf(buffer, buffer_length, sg);
}
Which function in the scatterlist API does something similar? Currently the only possibility I see is to manually determine the number of pages spanned by the buffer. Windows has a kernel macro called ADDRESS_AND_SIZE_TO_SPAN_PAGES, but I didn't manage to find anything like it in the Linux kernel.
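For what it's worth, if the buffer is physically contiguous (i.e. it came from kmalloc(), not vmalloc()), the page-spanning arithmetic is short enough to open-code; here is a hedged sketch of such a helper (init_sg_array_from_buf is a made-up name, and the caller must provide an sg array with enough entries):

/* Assumes a physically contiguous (kmalloc'ed) buffer, so that
 * virt_to_page() is valid for every page the buffer touches. */
static int init_sg_array_from_buf(void *buffer, int buffer_length,
				  struct scatterlist *sg)
{
	/* Linux analog of ADDRESS_AND_SIZE_TO_SPAN_PAGES: */
	int nents = DIV_ROUND_UP(offset_in_page(buffer) + buffer_length,
				 PAGE_SIZE);
	int i;

	sg_init_table(sg, nents);
	for (i = 0; i < nents; i++) {
		unsigned int off = offset_in_page(buffer);
		unsigned int len = min_t(unsigned int,
					 PAGE_SIZE - off, buffer_length);

		sg_set_page(&sg[i], virt_to_page(buffer), len, off);
		buffer += len;
		buffer_length -= len;
	}
	return nents;
}

Note that an sg entry's length is not limited to one page, so for a contiguous buffer a single entry filled in with sg_init_one(sg, buffer, buffer_length) may also be enough; a vmalloc'ed buffer, by contrast, needs one entry per page, with each page looked up via vmalloc_to_page().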

Allocation and usage of cuda device variable in different functions

I am quite new to CUDA and I have a question regarding the memory management for an object. I have one member function that loads the data to the device, and when another member function is called, the computation is carried out.
I have read some parts of the NVIDIA programming guide and some SO questions, but they do the data copying and the computation in a single function, so there is no need for multiple functions there.
Some more specifications:
The data is read one time. I do not know the data size at compile time, therefore I need a dynamic allocation. My current device has compute capability 2.1 (it will be upgraded soon to 6.1).
I want to copy the data in a first function and use it in a different function. For example:
__constant__ int dev_size;
__device__ float* dev_data; // <- not sure about this

/* kernel */
__global__ void computeSomething(float* dev_output)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < dev_size)
    {
        dev_output[idx] = dev_data[idx] * 100; // some computation
    }
}

// function 1
void OBJECT::copyVolumeToGPU(int size, float* data)
{
    cudaMalloc(&dev_data, size * sizeof(float));
    cudaMemcpy(dev_data, data, size * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(dev_size, &size, sizeof(int));
}

// function 2
void OBJECT::computeSmthOnDevice(int size)
{
    // allocate output array
    auto host_output = new float[size];
    float* dev_output;
    cudaMalloc(&dev_output, size * sizeof(float));

    int block = 256;
    int grid = (size + block - 1) / block; // integer round-up
    computeSomething<<<grid, block>>>(dev_output);
    cudaMemcpy(host_output, dev_output, size * sizeof(float), cudaMemcpyDeviceToHost);
    /* ... do something with output ... */
    delete[] host_output;
    cudaFree(dev_output);
}
Error checking (gpuErrChk, as in https://stackoverflow.com/a/14038590/3921660) is performed on every call, but omitted in this example.
Can I copy the data using a __device__ pointer (like __device__ float* dev_data;)?
Generally, your idea is workable, but this:
cudaMalloc(&dev_data, size * sizeof(float));
is not legal. It is not legal to take the address of a __device__ item in host code. So if you know the size at compile time, the easiest approach is to convert this to a static allocation, e.g.
__device__ float dev_data[1000];
If you really want to make this a dynamically allocated __device__ pointer, then you will need to use a method such as the one described here, which involves using cudaMalloc on an ordinary device pointer in host code that acts as a "temporary", then copying that "temporary" pointer to the __device__ pointer via cudaMemcpyToSymbol. And when you want to copy data to or from that particular allocation via cudaMemcpy, you would use cudaMemcpy to/from the temporary pointer from host code.
Note that for the purposes of "communicating" data from one function to the next, or one kernel to the next, there's no reason you couldn't just use an ordinary pointer dynamically allocated with cudaMalloc, and pass that pointer around to wherever you need it. You can even pass it via a global variable to any host function that needs it, like an ordinary global pointer. For kernels, however, you would still need to pass such a global pointer to the kernel as a kernel argument.
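For reference, the temporary-pointer pattern referred to above looks roughly like this (a sketch, with error checking omitted):

__device__ float *dev_data; // device-side global pointer

void OBJECT::copyVolumeToGPU(int size, float *data)
{
    float *tmp; // ordinary host-side device pointer (the "temporary")
    cudaMalloc(&tmp, size * sizeof(float));

    // Publish the allocation to the __device__ pointer:
    cudaMemcpyToSymbol(dev_data, &tmp, sizeof(float *));

    // Copy the payload through the temporary pointer:
    cudaMemcpy(tmp, data, size * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(dev_size, &size, sizeof(int));
}

The kernel can then dereference dev_data directly, exactly as in the question; only host code has to go through the temporary pointer.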

How to read the value of a variable of type dma_addr_t

dma_alloc_coherent() returns a virtual address for storing data, and it also fills in a variable of type dma_addr_t, which is what the DMA operations use. I want to read this value before the DMA operation starts.
According to DMA-API.txt, dma_alloc_coherent() returns an address in the CPU's virtual address space, while dma_handle is the address of the same region as seen by the device that does the actual DMA. If you would like to get that value, just use it as an integer wide enough to contain it, or print it as shown below:
dma_addr_t handle;
void *cpu_addr;
cpu_addr = dma_alloc_coherent(…, &handle, …);
pr_info("%s: got DMA address: %pad\n", __func__, &handle);
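The value in handle is what you eventually program into the device, typically by writing it to the device's DMA address registers; the register offsets below are hypothetical:

/* Hypothetical MMIO register offsets; a 64-bit capable device usually
 * takes the handle split into low and high halves. */
writel(lower_32_bits(handle), base + MYDEV_REG_DMA_ADDR_LO);
writel(upper_32_bits(handle), base + MYDEV_REG_DMA_ADDR_HI);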

Memory limit for mmap

I am trying to mmap a char device. It works for 65536 bytes, but I get the following error when I try to map more memory:
mmap: Resource temporarily unavailable
I want to mmap 1 MB of memory for a device. I use alloc_chrdev_region(), cdev_init(), and cdev_add() for the char device. How can I mmap more than 64 KB? Should I use a block device instead?
Using the MAP_LOCKED flag in the mmap call can cause this error: the underlying mlock can return EAGAIN if the requested amount of memory cannot be locked.
From man mmap:

    MAP_LOCKED (since Linux 2.5.37)
        Lock the pages of the mapped region into memory in the manner
        of mlock(2). This flag is ignored in older kernels.

From man mlock:

    EAGAIN
        Some or all of the specified address range could not be locked.
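So the first thing to try is dropping MAP_LOCKED from the mmap() call (or raising the locked-memory limit, e.g. with ulimit -l). A minimal user-space sketch without MAP_LOCKED, assuming a hypothetical device node /dev/somedev:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
	int fd = open("/dev/somedev", O_RDWR);
	if (fd < 0)
		return 1;

	/* 1 MB mapping; without MAP_LOCKED, RLIMIT_MEMLOCK does not apply. */
	void *p = mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE,
		       MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	/* ... read/write the mapping ... */
	return 0;
}
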
Did you implement the somedev_mmap() file operation?
static int somedev_mmap(struct file *filp, struct vm_area_struct *vma)
{
	/* Do something here. For device memory you will typically need
	 * remap_pfn_range() (ioremap() only maps it into kernel space). */
	return 0;
}

static const struct file_operations somedev_fops = {
	.owner = THIS_MODULE,
	/* Initialize other file operations. */
	.mmap  = somedev_mmap,
};
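For example, here is a hedged sketch of a filled-in handler for a 1 MB physically contiguous buffer, where buf_phys (the buffer's physical base address) and SOMEDEV_BUF_SIZE are hypothetical driver-specific values:

static int somedev_mmap(struct file *filp, struct vm_area_struct *vma)
{
	unsigned long size = vma->vm_end - vma->vm_start;

	/* Refuse mappings larger than the backing buffer. */
	if (size > SOMEDEV_BUF_SIZE)
		return -EINVAL;

	/* Map the physical pages straight into the caller's VMA. */
	return remap_pfn_range(vma, vma->vm_start,
			       buf_phys >> PAGE_SHIFT,
			       size, vma->vm_page_prot);
}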
