Linux Kernel SRAM Map Into Addresses - linux-kernel

I have an FPGA that I am interfacing with an AM335x processor through the GPMC interface. The FPGA basically acts as an audio codec. I am doing a DMA transfer from the FPGA's SRAM into main memory, and I need to map that memory into the address space via the device tree. However, I want my module to be the only one that can access that memory (for the DMA transfer). How can I ensure that the memory is used only for the DMA transfer?
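One common way to keep the rest of the kernel away from such a region is a reserved-memory node in the device tree. A minimal sketch is below; the node name, address, and size are placeholders, not values from the question, and whether you want no-map (driver maps it itself) or shared-dma-pool semantics depends on how the driver claims the region:

```
/ {
    reserved-memory {
        #address-cells = <1>;
        #size-cells = <1>;
        ranges;

        /* hypothetical FPGA SRAM window on the GPMC bus */
        fpga_sram: fpga_sram@1000000 {
            reg = <0x01000000 0x00010000>; /* 64 KiB, example only */
            no-map; /* kernel will not map or allocate from it */
        };
    };
};
```

The driver can then look the region up via its phandle (memory-region property in the device's node) and map it exclusively, so no other allocator ever hands it out.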

Related

DMA on FPGA Cannot Access Kernel Memory Allocated with GFP_KERNEL Flag

I would first like to give a brief description of the scenario that I am working on.
What I am trying to accomplish is to load image data from my user space application and transfer it over PCIe to a custom acceleration engine located inside a FPGA board.
The specifications of my host machine are:
Intel Xeon processor with 16 GB of RAM.
64-bit Debian Linux with kernel version 4.18.
The FPGA is a Virtex 7 KC705 development board.
The FPGA uses a PCIe controller (bridge) for the communication between the PCIe infrastructure and the AXI interface of the FPGA.
In addition, the FPGA is equipped with a DMA engine which is supposed to read data through the PCIe controller from kernel memory and forward it to the accelerator.
Since in future implementations I would like to make multiple kernel allocations of up to 256 MB, I have configured my kernel to support CMA and the DMA Contiguous Allocator.
According to dmesg I can verify that my system reserves the CMA area at startup.
Regarding the acceleration procedure:
The driver initially allocates 4 MB of kernel memory using dma_alloc_coherent() with the GFP_KERNEL flag. This allocation is inside the range of the CMA.
Then from my user space application I call mmap() with the PROT_READ/PROT_WRITE and MAP_SHARED/MAP_LOCKED flags to map the previously allocated CMA memory and load the image data into it.
Once the image data is loaded I forward the dma_addr_t physical address of the CMA allocated memory and I start the DMA to transfer the data to the accelerator. When the acceleration is completed the DMA is supposed to write the processed data back to the same CMA kernel allocated memory.
On completion the user space application reads the processed data from the CMA memory and saves it to a .bmp file. When I check the "processed" image it is the same as the original one. I suppose that the processed data were never written to the CMA memory.
Is there some kind of memory protection that does not allow writing to the CMA memory when using GFP_KERNEL flag?
An interesting fact is that when I allocate kernel memory with dma_alloc_coherent but with either GFP_ATOMIC or GFP_DMA the processed data are written correctly to the kernel memory but unfortunately the allocated memory does not belong to the range of the CMA area.
What is wrong in my implementation?
Please let me know if you need more information!
In order to use mmap() I have adopted the debugfs file operations method.
Initially, I open a debugfs file as follows:
shared_image_data_file = open("/sys/kernel/debug/shared_image_data_mmap_value", O_RDWR);
The shared_image_data_mmap_value is my debugfs file which is created in my kernel driver and the shared_image_data_file is just an integer.
Then, I call mmap() from userspace as follows:
kernel_address = (unsigned int *)mmap(0, (4 * MBYTE), PROT_READ | PROT_WRITE, MAP_SHARED | MAP_LOCKED, shared_image_data_file, 0);
When I call the mmap() function in user space the mmap file operation of my debugfs file executes the following function in the kernel driver:
dma_mmap_coherent(&dev->dev, vma, shared_image_data_virtual_address, shared_image_data_physical_address, length);
The shared_image_data_virtual_address is a pointer of type uint64_t, while shared_image_data_physical_address is of type dma_addr_t; they were created earlier when I used the following code to allocate the memory in kernel space:
shared_image_data_virtual_address = dma_alloc_coherent(&dev->dev, 4 * MBYTE, &shared_image_data_physical_address, GFP_KERNEL);
The address that I pass to the DMA of the FPGA is the shared_image_data_physical_address.
I hope that the above are helpful.
Thank you!
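Assembled in one place, the driver side described above might look like the sketch below. The shared_image_data_* names and MBYTE come from the question; the struct device pointer, the debugfs file creation, and error handling are omitted, so this is a fragment, not a buildable module:

```c
/* Sketch only: assumes MBYTE is (1024 * 1024) and that `dev` is the
 * struct device obtained at probe time (kept global here for brevity). */
static struct device *dev;
static void *shared_image_data_virtual_address;
static dma_addr_t shared_image_data_physical_address;

/* At probe time: carve 4 MB out of the coherent (CMA-backed) pool. */
static int my_probe(void)
{
        shared_image_data_virtual_address =
                dma_alloc_coherent(dev, 4 * MBYTE,
                                   &shared_image_data_physical_address,
                                   GFP_KERNEL);
        return shared_image_data_virtual_address ? 0 : -ENOMEM;
}

/* The debugfs file's .mmap handler maps that same buffer to user space,
 * with the caching attributes the coherent API expects. */
static int shared_image_data_mmap(struct file *filp,
                                  struct vm_area_struct *vma)
{
        return dma_mmap_coherent(dev, vma,
                                 shared_image_data_virtual_address,
                                 shared_image_data_physical_address,
                                 4 * MBYTE);
}
```

Using dma_mmap_coherent() (rather than a hand-rolled remap_pfn_range()) matters here, because it keeps the user-space mapping's caching attributes consistent with the coherent allocation.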

How are software interrupts different from port IN/OUT?

I am confused about port mapping and ISRs.
I am following an article which mentioned that hardware ports are mapped to memory from 0x00000000 to 0x000003FF.
It says we can talk to that hardware's microcontroller using these port numbers with the IN and OUT instructions.
But then what is the IVT? I read that the IVT contains the addresses of the interrupt service routines.
Everything is mixed up in my mind.
When we use IN/OUT with a port number, does the CPU check the IVT, and how do the microcontrollers know their numbers?
When hardware ports are mapped to memory locations, this is called memory-mapped I/O.
Hardware is accessed by reading/writing data and commands in its registers. In memory-mapped I/O, instead of transmitting data or commands to hardware registers directly, the CPU reads and writes particular memory locations which are mapped to those hardware registers. Communication between the hardware and the CPU therefore happens via reads and writes to specific memory locations.
When a hardware device is installed, it is given a set of fixed memory locations for the purpose of memory-mapped I/O, and these memory locations are recorded. Also, every device has an ISR whose address is stored in the IVT. When a particular device interrupts the CPU, the CPU finds the interrupting device's ISR address in the IVT. Once the CPU has identified which device the I/O needs to be done with, it communicates with that device via memory-mapped I/O, using the fixed memory locations allocated to it.

What is the use of the DMA controller in a processor?

DMA controllers are present on disks and networking devices, so they can transfer data to main memory directly. Then what is the use of the DMA controller inside the processor chip? Also, I would like to know: if there are different buses (I2C, PCI, SPI) outside the processor chip and only one bus (AXI) inside the processor, how does this work? (Shouldn't it result in some bottleneck?)
The on-chip DMA can take over the task of copying data between devices and memory for simple devices that cannot implement a DMA engine of their own. Such devices might be a mouse, a keyboard, a sound card, a Bluetooth device, etc. These devices have simple logic, and their requests are multiplexed and sent to a single general-purpose DMA controller on the chip.
Peripherals with high bandwidths like GPU cards, Network Adapters, Hard Disks implement their own DMA that communicates with the chip's bus in order to initiate uploads and downloads to the system's memory.
if there are different buses (i2c, pci, spi) outside of processor chip and only one bus (AXI) inside processor. how does this work? (shouldn't it result in some bottleneck)
That's actually simple. The on-chip AXI bus is much faster: it runs at a much higher frequency (equal to or in the same range as the CPU's frequency) and has a much higher bandwidth than all the aggregated bandwidths of I2C + PCI + SPI. Of course multiple hardware elements compete for the AXI bus, but usually priorities and various arbitration and optimization techniques are implemented.
From Wikipedia:
Direct memory access (DMA) is a feature of computerized systems that allows certain hardware subsystems to access main system memory independently of the central processing unit (CPU). [...] A DMA controller can generate memory addresses and initiate memory read or write cycles. It contains several processor registers that can be written and read by the CPU. These include a memory address register, a byte count register, and one or more control registers.

How does the CPU know the PCI adress-space

I understand that PCI and PCIe devices can be configured by the CPU (via code in the BIOS or OS) to respond to certain physical memory addresses by writing to specific areas of the device's configuration space.
In fact the Linux kernel has quite the complicated algorithm for doing this taking into account a lot of requirements of the device (memory alignment, DMA capabilities etc).
Seeing that software seems to be in control of if, when and where this memory is mapped, my question is: How can a piece of software control mapping of physical memory?
After this configuration, the PCI device will know to respond to the given address range, but how does the CPU know that it should go on the PCI bus for those specific addresses that were just dynamically decided?
The northbridge is programmed with the address range(s) that are to be routed to the memory controller(s).
All other addresses go to the external bus.
It is based on the address-mapping information the CPU has.
A 64-bit processor has a 2^64-byte address space.
Say the machine has 16 GB of RAM, which is 2^34 bytes.
The configuration space of all the devices the CPU has (even legacy PCI and PCIe devices) can be mapped to addresses above this physical RAM range.
Any I/O to this space can then be forwarded to the respective device.
In our case the CPU finds out that the config space it wants to access belongs to a PCI or PCIe device, so it forwards the instruction to the CPU's host bridge (run lspci on a box and you will see the host bridge at BDF 00:00.0).
Once the host bridge finds that the target device is behind it, the instruction (which can be I/O or memory) is converted into the appropriate TLP request.

In X86 Platform, does the DMA operation mean to move data between MMIO addr space and system memory addr space?

On a modern x86/x86_64 platform, given the MMIO mechanism, do DMA operations move data between the MMIO address space and the memory address space? In the Linux kernel I see that there is a dma_addr_t definition. Is this type used for MMIO addresses?
In general, a DMA operation just refers to a device other than the CPU accessing memory. On x86, there are not separate MMIO and RAM address spaces -- everything is unified. Some examples of typical DMA operations:
A network card might receive a packet from the network and use DMA to write the packet contents into the system's RAM.
A SATA controller might get a write command and use DMA to read the data to send to the hard disk from system RAM.
A graphics card might use DMA to read texture data from system RAM into its own video memory. The video memory is visible to the system CPU through a PCI BAR (MMIO), but that's not really relevant here.
The dma_addr_t type holds a "bus address" in Linux. The address at which, for example, a PCI device (a NIC, SATA controller, or GPU) sees a given part of memory can be different from the address the CPU uses. So Linux has the "DMA mapping" abstraction to handle this difference.
In the first example above, the network stack would allocate a buffer in RAM, and then pass it to a dma_map function to get a bus address that it hands to the NIC. The NIC would use that address to write the packet into memory.
In older x86 systems, there wasn't really any difference between the physical address that the CPU used and the bus address that external devices used, and the dma_map functions were pretty much no-ops. However, with modern technologies like VT-d, the bus address that a PCI device uses might be completely different from the CPU's physical address, so it is important to do the DMA mapping and use a dma_addr_t for all addresses that are handed to external DMA devices.
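The streaming-mapping flow from the NIC example can be sketched in kernel C. This is a fragment, not a buildable module; RX_BUF_SIZE and the surrounding driver state are hypothetical, and error paths are abbreviated:

```c
/* Sketch: preparing a receive buffer for a hypothetical NIC.
 * `dev` is the struct device of the PCI function. */
void *buf = kmalloc(RX_BUF_SIZE, GFP_KERNEL);

/* Translate the CPU-visible buffer into a bus address the device can use.
 * With an IOMMU (e.g. VT-d) this may differ from the physical address. */
dma_addr_t bus_addr = dma_map_single(dev, buf, RX_BUF_SIZE, DMA_FROM_DEVICE);
if (dma_mapping_error(dev, bus_addr))
        goto err;

/* Hand bus_addr (not buf!) to the NIC's receive descriptor ring. */

/* After the device has written the packet, give ownership back to the
 * CPU; only then may the kernel safely read the buffer contents. */
dma_unmap_single(dev, bus_addr, RX_BUF_SIZE, DMA_FROM_DEVICE);
```

The map/unmap pair is what lets the DMA API insert IOMMU programming and any cache maintenance the platform needs, which is why passing a raw physical address to the device is not portable.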
