I am writing a kernel driver that exposes my I/O device to user space.
Using mmap, the application gets a virtual address it can write to in order to reach the device.
Since I want the application's writes to be combined into large PCIe transactions, the driver maps this memory as write-combining.
Depending on the memory type (write-combining or non-cached), the application chooses the optimal method for working with the device.
However, some architectures do not support write-combining, or support it only for part of the memory space.
Hence, it is important that the kernel driver tells the application whether it succeeded in mapping the memory as write-combining.
I need a generic way for the kernel driver to check whether the memory it mapped (or is about to map) is write-combining.
How can I do that?
Here is part of my code:
vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
io_remap_pfn_range(vma, vma->vm_start, pfn, PAGE_SIZE, vma->vm_page_prot);
First, you can find out if an architecture supports write-combining at compile time, with the macro ARCH_HAS_IOREMAP_WC. See, for instance, here.
At run time, you can check the return values of ioremap_wc, or of set_memory_wc and friends, to see whether the request succeeded.
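For example, a minimal sketch combining the compile-time and run-time checks could look like the following; bar_start/bar_len are placeholders for your device BAR, and falling back to a plain uncached ioremap() on failure is an assumption about how you want to handle unsupported architectures:

#include <linux/io.h>
#include <linux/types.h>

static void __iomem *map_device_bar(phys_addr_t bar_start, size_t bar_len,
                                    bool *is_wc)
{
	void __iomem *regs;

#ifdef ARCH_HAS_IOREMAP_WC
	/* The architecture provides a real write-combining ioremap. */
	regs = ioremap_wc(bar_start, bar_len);
	if (regs) {
		*is_wc = true;
		return regs;
	}
#endif
	/* Fall back to an uncached mapping and report that to user space. */
	*is_wc = false;
	return ioremap(bar_start, bar_len);
}

The is_wc flag can then be handed to the application (for example through an ioctl) so it can pick its access method.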
I am writing a custom Linux driver that needs to DMA memory between multiple PCIe devices. I have the following situation:
I'm using dma_alloc_coherent to allocate memory for DeviceA
I then use DeviceA to fill the memory buffer.
Everything is fine so far, but at this point I would like to DMA the memory to DeviceB, and I'm not sure of the proper way to do it. For now I am calling dma_map_single for DeviceB using the address returned from dma_alloc_coherent called on DeviceA. This seems to work fine on x86_64, but it feels like I'm breaking the rules because:
dma_map_single is supposed to be called with memory allocated by kmalloc ("and friends"). Is it a problem to call it with an address returned from another device's dma_alloc_coherent call?
If #1 is OK, I'm still not sure whether it is necessary to call the dma_sync_* functions that dma_map_single memory normally requires. Since the memory was originally allocated with dma_alloc_coherent, it should be uncached, so I believe the answer is "the dma_sync_* calls are not necessary", but I am not sure.
I'm worried that I'm just getting lucky having this work and that a future kernel update will break me, since it is unclear whether I'm following the API rules correctly.
My code will eventually have to run on ARM and PPC too, so I need to make sure I'm doing things in a platform-independent manner instead of getting by with some x86_64 architecture hack.
I'm using this as a reference:
https://www.kernel.org/doc/html/latest/core-api/dma-api.html
dma_alloc_coherent() acts similarly to __get_free_pages(), but with size granularity rather than page granularity, so I would guess there is no issue here.
First, call dma_mapping_error() after dma_map_single() to catch any platform-specific issue. The dma_sync_*() helpers are used with streaming DMA mappings to keep the device and the CPU in sync. At minimum, dma_sync_single_for_cpu() is required, since device-modified buffers need to be synchronized before the CPU accesses them.
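A rough sketch of that flow, assuming devA and devB are the struct device pointers for the two PCIe functions and BUF_SIZE is a placeholder buffer size:

#include <linux/dma-mapping.h>

#define BUF_SIZE (64 * 1024)   /* assumed buffer size */

static int copy_between_devices(struct device *devA, struct device *devB)
{
	dma_addr_t hwaddr_a, hwaddr_b;
	void *buf;

	/* Coherent allocation for DeviceA; it fills the buffer via hwaddr_a. */
	buf = dma_alloc_coherent(devA, BUF_SIZE, &hwaddr_a, GFP_KERNEL);
	if (!buf)
		return -ENOMEM;

	/* ... program DeviceA to write into hwaddr_a and wait for completion ... */

	/* Streaming mapping of the same CPU address for DeviceB to read from. */
	hwaddr_b = dma_map_single(devB, buf, BUF_SIZE, DMA_TO_DEVICE);
	if (dma_mapping_error(devB, hwaddr_b)) {
		dma_free_coherent(devA, BUF_SIZE, buf, hwaddr_a);
		return -EIO;
	}

	/* ... program DeviceB to read from hwaddr_b and wait for completion ... */

	/* If DeviceB had modified the buffer (DMA_FROM_DEVICE or
	 * DMA_BIDIRECTIONAL), dma_sync_single_for_cpu() would be needed here
	 * before the CPU touches the data, as noted above. */
	dma_unmap_single(devB, hwaddr_b, BUF_SIZE, DMA_TO_DEVICE);
	dma_free_coherent(devA, BUF_SIZE, buf, hwaddr_a);
	return 0;
}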
I am using the GCC toolchain and an ARM Cortex-M0 microcontroller. I would like to ask whether it is possible to define a space in the linker so that read and write operations to it would call the external device's driver functions for reading and writing its space (e.g. SPI memory). Can anyone give some hints on how to do it?
Regards, Rafal
EDIT:
Thank you for your comments and replies. My setup is:
The random-access SPI memory is connected via an SPI controller, and I use a "standard" driver to access the memory space and store/read data from it.
What I wanted to do is avoid calling the driver's functions explicitly, and instead hide them behind some fixed RAM address, so that any read of that address would call the SPI memory driver's read function and any write would call its write function (the offset from the initial address would be the address of the data in the external memory). I doubt that this is possible at all on a microcontroller without an MMU, but I think it is always worth asking someone else who might have had a similar idea.
No, this is not how it works. The Cortex-M0 has no Memory Management Unit and is therefore unable to intercept accesses to specific memory regions.
It's not really clear what you are trying to achieve. If you have SPI memory connected externally to the chip, you have to perform all the accesses through a driver; it is not possible to memory-map the SPI port abstraction.
If this is an on-device SPI memory controller, it will have two regions in the memory map. One will be the 'memory' region, which will probably behave as read-only; the other will be the control registers for the memory controller hardware, and it is these registers that the device driver talks to. Specifically, to write to the SPI memory, you need to perform driver accesses to carry out the write.
In the extreme case (for example the Cortex-M1 for Xilinx), there will be an eXecute In Place (XIP) peripheral for the memory-map behaviour, and an SPI master device for the read/write functionality. A GPIO pin is used to multiplex the SPI EEPROM pins between 'memory mode' and 'configuration mode'.
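Since the accesses cannot be hidden behind a plain pointer without an MMU, the usual approach is a thin accessor layer around the SPI driver. A minimal sketch, assuming hypothetical spi_mem_read()/spi_mem_write() driver entry points:

#include <stdint.h>
#include <stddef.h>

/* Hypothetical low-level driver entry points for the external SPI memory. */
int spi_mem_read(uint32_t addr, void *buf, size_t len);
int spi_mem_write(uint32_t addr, const void *buf, size_t len);

/* Accessors used instead of dereferencing a fixed address;
 * 'offset' is the location of the data inside the external memory. */
static inline int ext_read32(uint32_t offset, uint32_t *value)
{
	return spi_mem_read(offset, value, sizeof(*value));
}

static inline int ext_write32(uint32_t offset, uint32_t value)
{
	return spi_mem_write(offset, &value, sizeof(value));
}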
I'd like to access PCIe I/O from userland.
In the kernel module, I'm able to read/write using the pointer returned by ioremap() without any problem.
From userland, I want to use the pointer returned by mmap(), but the host hangs on any read or write to the PCIe bus.
I implemented the mmap call in the file_operations structure; it calls io_remap_pfn_range(vma, vma->vm_start, start >> PAGE_SHIFT, vma->vm_end - vma->vm_start, vma->vm_page_prot); where start is the value returned by pci_resource_start().
What did I miss?
Note that my module works fine on x86.
Thanks,
Fred
The POWER architecture does not support PCIe IO accesses; you'll need to use PCIe memory cycles instead. It's likely that your PCIe device has a corresponding resource for MMIO space, perhaps you can use that.
Also, depending on your usage, you may want to perform accesses on the resource<N> files in sysfs, under /sys/bus/pci/devices/<id>. This might mean that you don't need any kernel code at all.
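For example, a userland-only sketch of that sysfs approach might look like the following; the device address 0000:01:00.0, the BAR number, and the 4 KB mapping size are placeholders, and this only works on a memory (MMIO) BAR, not an I/O BAR:

/* Userspace sketch: mmap a PCIe memory BAR through sysfs. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	/* Placeholder device address and BAR number. */
	const char *path = "/sys/bus/pci/devices/0000:01:00.0/resource0";
	int fd = open(path, O_RDWR | O_SYNC);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Map the first page of the BAR; adjust size to the range you access. */
	volatile uint32_t *bar = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
				      MAP_SHARED, fd, 0);
	if (bar == MAP_FAILED) {
		perror("mmap");
		close(fd);
		return 1;
	}

	printf("first register: 0x%08x\n", bar[0]);

	munmap((void *)bar, 4096);
	close(fd);
	return 0;
}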
I am writing my very first Linux device driver, and I have run into a problem. I want to prevent one memory region from being cached, so I have been trying to use flush_cache_range() and flush_tlb_range() to flush the cache for this memory region. Everything compiles well, but when I try to load the kernel module I get the following errors:
Unknown symbol flush_cache_range (err 0)
Unknown symbol flush_tlb_range (err 0)
I find this very strange. Shouldn't they be defined in kernel?
I know that, alternatively, I could use dma_alloc_coherent() to allocate a non-cached memory region. But I don't have a device structure; passing NULL for this parameter didn't cause any errors, but I also couldn't see any of the data that was supposed to be there.
Some information about my system: I'm trying to get this running on an ARM microcontroller with an integrated FPGA (the Xilinx Zynq). The FPGA copies some data to a memory location specified by the CPU. Now I want to access this memory without getting old data from the caches.
Any help is very appreciated.
You cannot use functions such as flush_cache_range() because they are not intended to be used by modules.
To allocate memory that can be accessed by a DMA device, you must use dma_alloc_coherent().
This requires a valid device structure so that it can do proper mapping between memory addresses and bus addresses.
If your device is not on a bus that is handled by an existing framework (such as PCI), you have to create a platform device.
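As an illustration of the platform-device route, a minimal sketch with placeholder names ("fpga-dma", FPGA_BUF_SIZE, a 32-bit DMA mask) might look like this; the details depend on how your FPGA logic is described to the kernel:

#include <linux/init.h>
#include <linux/err.h>
#include <linux/platform_device.h>
#include <linux/dma-mapping.h>

#define FPGA_BUF_SIZE (64 * 1024)          /* assumed buffer size */

static struct platform_device *fpga_pdev;  /* placeholder device */
static void *fpga_buf;
static dma_addr_t fpga_handle;

static int __init fpga_buf_init(void)
{
	int ret;

	/* Register a platform device so the DMA API has a struct device. */
	fpga_pdev = platform_device_register_simple("fpga-dma", -1, NULL, 0);
	if (IS_ERR(fpga_pdev))
		return PTR_ERR(fpga_pdev);

	/* Tell the DMA API which addresses the FPGA logic can reach (assumed 32-bit). */
	ret = dma_coerce_mask_and_coherent(&fpga_pdev->dev, DMA_BIT_MASK(32));
	if (ret)
		goto err_unregister;

	/* Coherent buffer: safe for the FPGA to write and the CPU to read
	 * without explicit cache maintenance. */
	fpga_buf = dma_alloc_coherent(&fpga_pdev->dev, FPGA_BUF_SIZE,
				      &fpga_handle, GFP_KERNEL);
	if (!fpga_buf) {
		ret = -ENOMEM;
		goto err_unregister;
	}

	/* ... write fpga_handle (the bus address) into the FPGA registers ... */
	return 0;

err_unregister:
	platform_device_unregister(fpga_pdev);
	return ret;
}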
A few notes:
1- flush_cache_range() doesn't "prevent one memory region from being cached". It simply flushes (cleans + invalidates) the caches; any future reads/writes to this memory region through the same virtual range will go through the cache again.
2- If the FPGA writes to memory and the CPU is then going to read from this memory, flushing the cache probably isn't the right thing to do anyway. Usually what you need to do is invalidate the memory region and then tell the FPGA to write (see the sketch after this list).
3- Please take a look at "${kernel-src}/Documentation/DMA-API.txt" in the kernel sources. It has plenty of information about how you can safely (cache maintenance + phys_to_dma translation) use a specific region of memory for DMA.
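In module code, the cache maintenance described in point 2 is normally expressed through the streaming DMA API rather than by calling the low-level flush/invalidate routines directly. A rough sketch, assuming dev is a valid struct device (for example the platform device above), and buf/len describe a kmalloc'd buffer:

#include <linux/dma-mapping.h>

/* Hand the buffer to the FPGA: the mapping performs whatever cache
 * maintenance the architecture needs for a device-written buffer. */
dma_addr_t handle = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
if (dma_mapping_error(dev, handle))
	return -EIO;

/* ... give 'handle' to the FPGA and wait for it to finish writing ... */

/* Take the buffer back: after this the CPU sees the data the FPGA wrote. */
dma_unmap_single(dev, handle, len, DMA_FROM_DEVICE);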
I am writing a kernel module that is going to trigger an external PCIe device to read a block of data from my internal memory. To do this I need to send the PCIe device a pointer to the physical address of the data that I would like to send. Ultimately this data is going to be written from user space to the kernel with the write() function (user space) and copy_from_user() (kernel space). As I understand it, the address that my kernel module will see is still a virtual memory address. I need a way to get its physical address so that the PCIe device can find it.
1) Can I just use mmap() from userspace and place my data in a known location in DDR memory, instead of using copy_from_user()? I do not want to accidentally overwrite another process's data in memory, though.
2) My kernel module reserves PCIe data space at initialization using ioremap_nocache(); can I do the same here, or is it a bad idea to treat this memory as I/O memory? If I can, what would happen if the memory I try to reserve is already in use? I do not want to hard-code a static memory location and then find out that it is in use.
Thanks in advance for your help.
You don't choose a memory location and put your data there. Instead, you ask the kernel to tell you the location of your data in physical memory, and tell the board to read that location. Each page of memory (4KB) will be at a different physical location, so if you are sending more data than that, your device likely supports "scatter gather" DMA, so it can read a sequence of pages at different locations in memory.
The API is this: dma_map_page() returns a value of type dma_addr_t, which you can give to the board, and dma_unmap_page() is called when the transfer is finished. If you're doing scatter-gather, you'll put that value in the list of descriptors that you feed to the board instead. Again, if scatter-gather is supported, dma_map_sg() and friends will help with mapping a large buffer into a set of pages. It's still your responsibility to set up the page descriptors in the format expected by your device.
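A minimal sketch of the single-page case, assuming dev is the struct device of your PCIe function, page/offset/len describe the data, and the descriptor programming itself is device-specific:

#include <linux/dma-mapping.h>

/* Map one page for the board to read (CPU -> device direction). */
dma_addr_t bus_addr = dma_map_page(dev, page, offset, len, DMA_TO_DEVICE);
if (dma_mapping_error(dev, bus_addr))
	return -EIO;

/* ... write bus_addr and len into the board's descriptor and start it ... */

/* After the board signals completion, release the mapping. */
dma_unmap_page(dev, bus_addr, len, DMA_TO_DEVICE);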
This is all very well written up in Linux Device Drivers (Chapter 15), which is required reading. http://lwn.net/images/pdf/LDD3/ch15.pdf. Some of the APIs have changed from when the book was written, but the concepts remain the same.
Finally, mmap(): Sure, you can allocate a kernel buffer, mmap() it out to user space and fill it there, then dma_map that buffer for transmission to the device. This is in fact probably the cleanest way to avoid copy_from_user().
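A very small sketch of that mmap() path, assuming kbuf is a page-aligned kernel buffer (e.g. from __get_free_pages()) wired into your file_operations; size checks and error handling are omitted:

#include <linux/mm.h>
#include <linux/io.h>

static void *kbuf;  /* allocated elsewhere, page-aligned */

static int my_mmap(struct file *filp, struct vm_area_struct *vma)
{
	unsigned long size = vma->vm_end - vma->vm_start;
	unsigned long pfn = virt_to_phys(kbuf) >> PAGE_SHIFT;

	/* Expose the kernel buffer to user space; the same buffer can later
	 * be dma_map'ed for the device, avoiding copy_from_user(). */
	return remap_pfn_range(vma, vma->vm_start, pfn, size, vma->vm_page_prot);
}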