kernel and user space sync - caching

I have a memory area mapped to user space with do_mmap_pgoff() and remap_pfn_range(), and the same area mapped into the kernel with ioremap().
When I write to this area from user space and then read it from kernel space, I see that not all bytes were written to the memory area.
When I write from user space, then read from user space, and only after that read from the kernel, everything is fine: the read from user space pushes out the changes made previously.
I understand that a cache or buffer exists between kernel and user space, and that I need to implement some kind of flush/invalidate or buffer write-back to the memory area.
I tried making the VMA uncached with pgprot_noncached(), and I tried an outer-cache range flush/invalidate, a VMA cache range flush, and a VMA TLB range flush, but none of it works as I expected. The flush/invalidate operations just clear the memory area, whereas I need to keep the changes made from user space. And using uncached memory slows down the data transfer.
How do I synchronize between user and kernel space correctly?

I have nearly the same question as you.
I use a shared memory region to pass data between kernel and user space. In kernel, I directly use physical address to access data. In user space, I open /dev/mem and mmap it to read/write.
And the problem comes: when I write data to address A from user space, the kernel may not receive the data, and may even overwrite A with its previous value. I think the CPU cache may cause this problem.
Here is my solution:
I open /dev/mem like this:
fd = open("/dev/mem", O_RDWR);
NOT this:
fd = open("/dev/mem", O_RDWR | O_SYNC);
And problem solved.

Related

Difference between copying user space memory and mapping userspace memory

What is the difference between copying from a user space buffer to a kernel space buffer, versus mapping a user space buffer into kernel space and then copying from a kernel data structure into it?
What I meant to say is:
The first method is copy_from_user() function.
The second method is, say: a user space buffer is mapped into kernel space, and the kernel is passed the physical address (say, using /proc/self/pagemap); kernel space then calls phys_to_virt() on the passed physical address to get its corresponding kernel virtual address. Then the kernel copies the data from one of its data structures, say a struct sk_buff, to the kernel virtual address it got from the phys_to_virt() call.
Note: phys_to_virt() adds an offset of 0xc0000000 to the passed physical address to get kernel virtual address, right?
The second method describes the functionality of the KNI module in DPDK, and the documentation says it eliminates the overhead of copying from user space to kernel space. Please explain how.
It really depends on what you're trying to accomplish, but here are some differences I can think of.
To begin with, copy_from_user() has some built-in security checks that should be considered.
Mapping your data "manually" into kernel space lets you read from it continuously, and maybe monitor something that the user process is doing to the data in that page, whereas the copy_from_user() method requires calling it constantly to be aware of changes.
Can you elaborate on what you are trying to do?

Physical Memory Allocation in Kernel

I am writing a kernel module that is going to trigger an external PCIe device to read a block of data from my internal memory. To do this I need to send the PCIe device a pointer to the physical memory address of the data that I would like to send. Ultimately this data is going to be written from userspace to the kernel with the write() function (userspace) and copy_from_user() (kernel space). As I understand it, the address that my kernel module sees is still a virtual memory address. I need a way to get its physical address so that the PCIe device can find it.
1) Can I just use mmap() from userspace and place my data in a known location in DDR memory, instead of using copy_from_user()? I do not want to accidentally overwrite another process's data in memory, though.
2) My kernel module reserves PCIe data space at initialization using ioremap_nocache(); can I do the same for this buffer, or is it a bad idea to treat this memory as I/O memory? If I can, what would happen if the memory that I try to reserve is already in use? I do not want to hard-code a static memory location and then find out that it is in use.
Thanks in advance for your help.
You don't choose a memory location and put your data there. Instead, you ask the kernel to tell you the location of your data in physical memory, and tell the board to read that location. Each page of memory (4KB) will be at a different physical location, so if you are sending more data than that, your device likely supports "scatter gather" DMA, so it can read a sequence of pages at different locations in memory.
The API is this: dma_map_page() returns a value of type dma_addr_t, which you can give to the board; call dma_unmap_page() when the transfer is finished. If you're doing scatter-gather, you'll instead put that value in the list of descriptors that you feed to the board. Also, if scatter-gather is supported, dma_map_sg() and friends will help with mapping a large buffer into a set of pages. It's still your responsibility to set up the page descriptors in the format expected by your device.
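For the single-page case, the calls might be sketched like this (kernel-side sketch only, not tested code: it assumes you already have a `struct device *dev` for your PCIe device and a `struct page *page` holding the data; `write_descriptor()` and `start_transfer()` are hypothetical stand-ins for your board's programming sequence):

```c
/* Map one page for device reads, hand the bus address to the
 * hardware, then unmap once the device signals completion. */
dma_addr_t bus = dma_map_page(dev, page, 0, PAGE_SIZE, DMA_TO_DEVICE);
if (dma_mapping_error(dev, bus))
        return -EIO;

write_descriptor(board, bus, PAGE_SIZE); /* hypothetical, device-specific */
start_transfer(board);                   /* hypothetical */
/* ... wait for the completion interrupt ... */

dma_unmap_page(dev, bus, PAGE_SIZE, DMA_TO_DEVICE);
```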
This is all very well written up in Linux Device Drivers (Chapter 15), which is required reading. http://lwn.net/images/pdf/LDD3/ch15.pdf. Some of the APIs have changed from when the book was written, but the concepts remain the same.
Finally, mmap(): Sure, you can allocate a kernel buffer, mmap() it out to user space and fill it there, then dma_map that buffer for transmission to the device. This is in fact probably the cleanest way to avoid copy_from_user().
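A sketch of that last pattern (not a complete driver: `BUF_SIZE`, `my_probe`, and `my_mmap` are illustrative names, and `dev` is assumed to be stashed in the driver's state between probe and mmap):

```c
static void *buf;          /* kernel virtual address of the shared buffer */
static dma_addr_t buf_dma; /* bus address the device will read from */

static int my_probe(struct device *dev)
{
        /* One buffer visible to both the CPU and the device. */
        buf = dma_alloc_coherent(dev, BUF_SIZE, &buf_dma, GFP_KERNEL);
        return buf ? 0 : -ENOMEM;
}

/* The driver's .mmap file operation: expose the same buffer to
 * userspace ('dev' assumed reachable from driver state). */
static int my_mmap(struct file *filp, struct vm_area_struct *vma)
{
        return dma_mmap_coherent(dev, vma, buf, buf_dma,
                                 vma->vm_end - vma->vm_start);
}
```

Userspace then fills the mapping it gets back from mmap() on the device node, and the driver hands `buf_dma` to the board.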

User space mmap and driver space mmap point to different addresses..?

[I am a newbie to device driver programming, so requesting people to be patient]
I am writing a character device driver, and I am trying to mmap some portion of the allocated memory in the driver to the user space.
In the init_module() function, I allocate some buffer space like this -
buf = (char*)vmalloc_user(SIZE_OF_BUFFER);
buf now points to some address.
Now, in the driver's mmap function, I set the VM_RESERVED flag, and call
remap_vmalloc_range(vma, (void*)buf, 0);
Then I create a character device file in /dev with the correct major number.
Now I create a simple program in the user space to open the character device file, then call mmap() and read data from this mmap'ed memory.
In the call to mmap() in userspace, I know there is an option to pass the start address of the area. But is there a way to make the user space mmap() point to the same address as buf in the driver?
I think that because the address of buf in the driver is different from the one returned by mmap() in user space, my user space program ends up reading junk values. Is there any other way to solve this problem than actually passing the address to mmap() in user space?
You pretty much have to design your driver interface so that the userspace map address doesn't matter. This means, for example, not storing pointers in an mmap region that's accessed outside of a single userspace process.
Typically, you'd store offsets from the base mapped address instead of full pointers. The kernel driver and userspace code can both add these offsets to their base pointers, and get to the virtual address that's right for their respective contexts.

How remap_pfn_range remaps kernel memory to user space?

The remap_pfn_range() function (used in a driver's mmap call) can be used to map kernel memory to user space. How is it done? Can anyone explain the precise steps? Kernel mode is a privileged mode (PM) while user space is non-privileged (NPM). In PM the CPU can access all memory, while in NPM some memory is restricted and cannot be accessed by the CPU. When remap_pfn_range() is called, how does that range of memory, previously restricted to PM, become accessible to user space?
Looking at the remap_pfn_range() code, there is a pgprot_t struct. This is the protection-mapping struct. What is protection mapping? Is it the answer to the above question?
It's simple, really: kernel memory (usually) just has a page table entry with the architecture-specific bit that says "this page table entry is only valid while the CPU is in kernel mode".
What remap_pfn_range() does is create another page table entry, with a different virtual address, to the same physical memory page, that doesn't have that bit set.
Usually, it's a bad idea btw :-)
The core of the mechanism is the MMU page table:
(figure: x86 page-table walk, http://windowsitpro.com/content/content/3686/figure_01.gif)
The picture above shows characteristics of the x86 hardware MMU, and has nothing to do with the Linux kernel.
Below is how the VMAs are linked to the process's task_struct:
(figure: http://image9.360doc.com/DownloadImg/2010/05/0320/3083800_2.gif, source: slideplayer.com)
And looking into the function itself here:
http://lxr.free-electrons.com/source/mm/memory.c#L1756
The data in physical memory can be accessed by the kernel through the kernel's PTE (figure omitted, source: tldp.org).
But after calling remap_pfn_range(), a new PTE is derived (for the existing kernel memory, but to be used from userspace to access it) with different page-protection flags. The process's VMA is updated to use this PTE to access the same memory, which avoids wasting memory on a copy. Kernel and userspace PTEs have different attributes, which control access to the physical memory, and the VMA also specifies the attributes at the process level:
vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
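Put together, a driver's mmap handler using remap_pfn_range() might look like this sketch (assuming the driver already holds a kernel page at virtual address `kaddr`, e.g. from get_zeroed_page(); names are illustrative):

```c
static int my_mmap(struct file *filp, struct vm_area_struct *vma)
{
        unsigned long pfn = virt_to_phys((void *)kaddr) >> PAGE_SHIFT;

        vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;

        /* Creates userspace PTEs for the same physical page, using
         * vma->vm_page_prot instead of the kernel-only protection bits. */
        return remap_pfn_range(vma, vma->vm_start, pfn,
                               vma->vm_end - vma->vm_start,
                               vma->vm_page_prot);
}
```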

How can I access memory at known physical address inside a kernel module?

I am trying to solve this problem: a user-space program keeps polling a buffer for requests from a kernel module, serves them, and then responds to the kernel.
I want to make the solution much faster, so instead of creating a device file and communicating through it, I allocate a memory buffer from user space and mark it as pinned, so the memory pages never get swapped out. Then user space invokes a special syscall to tell the kernel about the buffer, so that the kernel module can get its physical address (because the user-space program may be context-switched out, and hence the virtual address means nothing if the kernel module accesses the buffer at that time).
When the module wants to send a request, it needs to put the request into the buffer via the physical address. The question is: how can I access the buffer inside the kernel module via its physical address?
I noticed there is get_user_pages(), but I don't know how to use it; or maybe there are other, better methods?
Thanks.
You are better off doing this the other way around - have the kernel allocate the buffer, then allow the userspace program to map it into its address space using mmap().
Finally I figured out how to handle this problem...
It's quite simple, but may not be safe:
I use phys_to_virt(), which calls __va(pa), to get the kernel virtual address, and then I can access the buffer. And because the buffer is pinned, the physical location is guaranteed to stay available.
What's more, I don't need a special syscall to tell the kernel about the buffer. Instead, a proc file is enough, because I only need to tell the kernel once.
