I have been trying to find information on this for a long time and still haven't found anything solid. What I have learned so far is that the IOMMU translates the IOVA supplied in a device's DMA request into a physical address, and the read or write then goes to memory. My questions are as follows:
1) Does the IOMMU store a different memory map for every single device? Does each device see an address range starting from zero in its own virtual address space?
2) Where are these IOMMU memory maps stored?
3) How does the IOMMU know which device a request is coming from, if every device sees addresses starting from zero in its own virtual address space?
4) Does the device also transmit some kind of device-specific ID that the IOMMU recognizes and uses to resolve the IOVA, and to keep this device from reading or writing other memory addresses?
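For context on questions 1 and 2, here is a minimal sketch using the Linux IOMMU API (exact signatures vary between kernel versions, so treat this as illustrative only): translations live in per-domain page tables, and a device is attached to a domain, so different devices or device groups can each get their own IOVA-to-physical mapping, each of which may start at zero.

#include <linux/iommu.h>

static int iommu_sketch(struct device *dev, phys_addr_t paddr)
{
    struct iommu_domain *dom;
    int ret;

    dom = iommu_domain_alloc(dev->bus);      /* new, empty translation table */
    if (!dom)
        return -ENOMEM;

    ret = iommu_attach_device(dom, dev);     /* this device now uses dom's table */
    if (ret)
        goto out_free;

    /* IOVA 0x0 in *this* domain maps to paddr; a different domain can map
     * IOVA 0x0 somewhere else entirely. */
    ret = iommu_map(dom, 0x0, paddr, PAGE_SIZE, IOMMU_READ | IOMMU_WRITE);
    if (ret)
        goto out_detach;

    return 0;

out_detach:
    iommu_detach_device(dom, dev);
out_free:
    iommu_domain_free(dom);
    return ret;
}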
I have a PCI device which has some memory address inside BAR0. I assumed this memory address is just an OS virtual address which points to some physical memory of the device, but the question is where it points. Reading the documentation of the device as well as the firmware source code, I noticed that this device has a register responsible for setting so-called memory windows. I was hoping that BAR0 would point exactly to them; however, this is not the case, and it looks like this:
BAR0 address -> Some unknown memory -> + 0x80000 My memory window
So why is my memory window offset by 0x80000 from where BAR0 points? Where is this BAR0 pointing us to, and how is it set and by whom?
Thanks
No. The address in a BAR is the physical address of the beginning of the BAR. That is how the device knows when and how to respond to a memory read or write request. For example, let's say the BAR (BAR0) is of length 128K and has a base address of 0xb840 0000, then the device will respond to a memory read or write to any of these addresses:
0xb840 0000
0xb840 0080
0xb840 1184
0xb841 fffc
but NOT to any of these addresses:
0x5844 0000 (Below BAR)
0xb83f 0000 (Below)
0xb83f fffc (Below)
0xb842 0000 (Above BAR)
0xe022 0000 (Above)
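To make the decode concrete, here is a small sketch of the comparison the device effectively performs, using the example base and size above; with a power-of-two BAR size it reduces to a mask compare:

#include <stdbool.h>
#include <stdint.h>

#define BAR0_BASE 0xb8400000u   /* example base from above */
#define BAR0_SIZE 0x20000u      /* 128 KiB */

/* Does a memory request at 'addr' fall inside BAR0? */
static bool bar0_claims(uint32_t addr)
{
    return (addr & ~(BAR0_SIZE - 1u)) == BAR0_BASE;
}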
This was more significant in the original PCI where the bus was actually a shared medium and devices might see requests for addresses belonging to other devices. With PCI-Express' point to point architecture, only PCI "bridges" will ever see requests for memory addresses they do not own. But it still functions in exactly the same way. And the low bits of the address space still allow the device to designate different functions / operations to different parts of the space (as in your device, creating the separate memory window you're attempting to access).
Now, how you as a programmer access the BAR memory space is a different question. For virtually all modern systems, all memory accesses made by programs are to virtual addresses. So, in order for your memory access to reach a device, there must be a mapping from the virtual address to the physical address. That is generally done through page tables (though some architectures, like MIPS, have a dedicated area of virtual address space that is permanently mapped to part of the physical address space).
The exact mechanism for allocating virtual address space, and setting up the page tables to map from that space to the BAR physical address space is processor- and OS-dependent. So you will need to allocate some virtual address space, then create page tables mapping from the start of the allocated space to (BAR0) + 0x80000 in order to work with your window. (I'm describing this as two steps, but your OS probably provides a single function call to allocate virtual address space and map it to a physical range in one fell swoop.)
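For example, in a Linux kernel driver the two steps above typically collapse into a single ioremap() call. This is only a sketch: the window offset of 0x80000 is taken from the question, and WINDOW_SIZE is a placeholder you would replace with the real size of your window.

#include <linux/io.h>
#include <linux/pci.h>

#define WINDOW_OFFSET 0x80000   /* offset from the question */
#define WINDOW_SIZE   0x1000    /* assumed size, device-specific */

static void __iomem *map_window(struct pci_dev *pdev)
{
    resource_size_t bar0 = pci_resource_start(pdev, 0); /* physical BAR0 base */

    /* Allocate kernel virtual address space and create the page-table
     * mapping to BAR0 + 0x80000 in one call. */
    return ioremap(bar0 + WINDOW_OFFSET, WINDOW_SIZE);
}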
Now, the process of assigning physical address space to the device (that is, actually sticking an address into the BAR) is generally done very early in system initialization by the system BIOS or an analogous early-boot mechanism while it's enumerating all the PCI devices installed in the system. The desired address space size is determined by querying the device, then the base address of a large enough physical address region is written into the BAR.
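For reference, that size query works by writing all ones to the BAR and reading back which bits stuck. A rough sketch follows; pci_cfg_read32/pci_cfg_write32 stand in for whatever raw config-space accessors the firmware uses, and this ignores 64-bit BARs, I/O BARs, and disabling decode during sizing.

#include <stdint.h>

/* Hypothetical config-space accessors (real firmware would use ports
 * 0xCF8/0xCFC or memory-mapped ECAM). */
uint32_t pci_cfg_read32(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t off);
void pci_cfg_write32(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t off, uint32_t val);

/* Returns the size of a 32-bit memory BAR0 in bytes. */
static uint32_t bar0_size(uint8_t bus, uint8_t dev, uint8_t fn)
{
    const uint8_t bar0_off = 0x10;
    uint32_t orig = pci_cfg_read32(bus, dev, fn, bar0_off);

    pci_cfg_write32(bus, dev, fn, bar0_off, 0xFFFFFFFFu);           /* write all ones */
    uint32_t mask = pci_cfg_read32(bus, dev, fn, bar0_off) & ~0xFu; /* drop flag bits */
    pci_cfg_write32(bus, dev, fn, bar0_off, orig);                  /* restore */

    return ~mask + 1u;   /* lowest writable address bit gives the size */
}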
The final question (why your memory window is at an offset of 0x80000 within the device's address space) is completely device-specific and cannot be answered more generally.
I have a number of memory buffers allocated by dma_pool_alloc. For each of them, I have the DMA bus address (dma_addr_t), the kernel virtual address (void *), and the length. I have no problem accessing them as a single entity via the read/write operations provided by a character device, but this approach is quite slow (many system calls). I would like to map them all into a single virtual address range in userspace. Functions like remap_pfn_range accept only a single address, which doesn't really cut it. Is there a way to do it?
P.S. Yeah, there are a number of similar questions here on SO, but none of them cover mapping more than a single physical address.
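One possible direction, as a rough sketch only: remap_pfn_range() handles one physically contiguous range per call, but an mmap handler can call it several times at successive offsets inside the same VMA, stitching the pool buffers into one contiguous userspace mapping. This assumes the buffers' kernel virtual addresses sit in the direct mapping so virt_to_phys() is valid (often true on x86, not guaranteed in general), and that each length is page-aligned; priv, nbufs, cpu_addr[] and len[] are hypothetical driver bookkeeping.

#include <linux/fs.h>
#include <linux/mm.h>
#include <asm/io.h>           /* virt_to_phys() */

struct my_dev {               /* hypothetical driver state */
    int nbufs;
    void **cpu_addr;          /* kernel virtual addresses from dma_pool_alloc */
    size_t *len;              /* length of each buffer, page-aligned */
};

static int my_mmap(struct file *filp, struct vm_area_struct *vma)
{
    struct my_dev *priv = filp->private_data;
    unsigned long uaddr = vma->vm_start;
    int i, ret;

    for (i = 0; i < priv->nbufs; i++) {
        unsigned long pfn = virt_to_phys(priv->cpu_addr[i]) >> PAGE_SHIFT;

        ret = remap_pfn_range(vma, uaddr, pfn, priv->len[i],
                              vma->vm_page_prot);
        if (ret)
            return ret;
        uaddr += priv->len[i];   /* place the next buffer right after */
    }
    return 0;
}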
I'm writing a PCI device driver and I need to allocate some memory for DMA.
I'm using this function:
void *dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t flag);
I pass dma_handle to the device.
The return value of this function is a virtual address I can use in the kernel; the thing is, I don't want to save this address for each memory allocation I'm doing.
Is there a way to translate the physical address dma_handle into an address I can use in the kernel?
Something like one of these functions/macros:
virt_to_page(kaddr)
page_to_pfn(page)
Is there a phy_to_kvirt macro/function, or any other way to translate a physical address to a kernel virtual address?
Thanks
No, there isn't, and dma_handle isn't just any physical address. It is a physical address from the point of view of the specific device. Different devices on different buses may have entirely different views of the main memory. In addition to that, the returned virtual address may be a dynamically mapped page instead of having a fixed relation with physical mapping of main memory.
There may be enough information in kernel structures to piece this together on certain buses and architectures, but there are no guarantees, and don't expect it to be fast. The kernel's own dma_free_coherent() requires you to supply everything (virtual address, device, and dma_handle) to do its work, because that is the only way it can work universally across architectures and buses.
Just to reiterate: A dma_handle is meaningless on its own. Multiple devices could have the exact same dma_handle that still refers to different memory locations.
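In practice, the usual pattern is simply to keep the pieces together, since dma_free_coherent() needs all of them anyway. A minimal sketch (the struct name is just an example):

#include <linux/dma-mapping.h>

/* Keep everything dma_free_coherent() will need alongside the handle
 * you hand to the device. */
struct coherent_buf {
    void       *cpu_addr;   /* kernel virtual address */
    dma_addr_t  dma_handle; /* device-visible bus address */
    size_t      size;
};

static int alloc_coherent_buf(struct device *dev, struct coherent_buf *b, size_t size)
{
    b->size = size;
    b->cpu_addr = dma_alloc_coherent(dev, size, &b->dma_handle, GFP_KERNEL);
    return b->cpu_addr ? 0 : -ENOMEM;
}

static void free_coherent_buf(struct device *dev, struct coherent_buf *b)
{
    dma_free_coherent(dev, b->size, b->cpu_addr, b->dma_handle);
}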
I understand that PCI and PCIe devices can be configured by the CPU (via code in the BIOS or OS) to respond to certain physical memory addresses by writing to specific areas of the device's configuration space.
In fact, the Linux kernel has quite a complicated algorithm for doing this, taking into account many requirements of the device (memory alignment, DMA capabilities, etc.).
Seeing that software seems to be in control of whether, when, and where this memory is mapped, my question is: how can a piece of software control the mapping of physical memory?
After this configuration, the PCI device will know to respond to the given address range, but how does the CPU know that it should go on the PCI bus for those specific addresses that were just dynamically decided?
The northbridge is programmed with the address range(s) that are to be routed to the memory controller(s).
All other addresses go to the external bus.
It is based on the address-mapping information the CPU (more precisely, the CPU and chipset) has been programmed with.
A 64-bit processor has a 64-bit address space, far larger than the installed RAM. If a machine has, say, 16 GB of RAM (2^34 bytes), then the MMIO regions of all the devices the CPU can reach (including legacy PCI and PCIe devices) and their configuration space can be mapped at addresses above that physical RAM range.
Any access to this space is forwarded to the respective device.
When the CPU finds that the config space it wants to access belongs to a PCI or PCIe device, it forwards the access to the host bridge (run lspci on a box and you will see the host bridge at BDF 00:00.0).
Once the host bridge determines that the target device is behind it, the access (which can be an I/O or a memory access) is converted into the appropriate TLP request.
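As an illustration of the config-space path, here is a rough userspace-style sketch of the legacy x86 "configuration mechanism #1": the CPU writes the bus/device/function/register to I/O port 0xCF8 and then accesses port 0xCFC, and the host bridge turns that into a configuration TLP on PCIe. Modern systems also expose config space via memory-mapped ECAM; this is just the classic mechanism.

#include <stdint.h>
#include <sys/io.h>     /* outl()/inl(); requires iopl(3) privileges on Linux */

#define PCI_CONFIG_ADDRESS 0xCF8
#define PCI_CONFIG_DATA    0xCFC

/* Read a 32-bit register from the config space of bus:dev.fn. */
static uint32_t pci_cfg_read32(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t off)
{
    uint32_t addr = (1u << 31)              /* enable bit */
                  | ((uint32_t)bus << 16)
                  | ((uint32_t)dev << 11)
                  | ((uint32_t)fn  << 8)
                  | (off & 0xFCu);          /* dword-aligned register offset */

    outl(addr, PCI_CONFIG_ADDRESS);
    return inl(PCI_CONFIG_DATA);
}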
On modern x86/x86_64 platforms, given the MMIO mechanism, are DMA operations used to move data between the MMIO address space and the memory address space? In the Linux kernel, I see that there is a dma_addr_t definition. Is this type used for MMIO addresses?
In general, a DMA operation just refers to a device other than the CPU accessing memory. On x86, there are not separate MMIO and RAM address spaces -- everything is unified. Some examples of typical DMA operations:
A network card might receive a packet from the network and use DMA to write the packet contents into the system's RAM.
A SATA controller might get a write command and use DMA to read the data to send to the hard disk from system RAM.
A graphics card might use DMA to read texture data from system RAM into its own video memory. The video memory is visible to the system CPU through a PCI BAR (MMIO), but that's not really relevant here.
The dma_addr_t type holds a "bus address" in Linux. The address that, for example, a PCI device (like a NIC / SATA controller / GPU) sees a given part of memory mapped at can be different than the address the CPU uses. So Linux has the abstraction of "DMA mapping" to handle this difference.
In the first example above, the network stack would allocate a buffer in RAM, and then pass it to a dma_map function to get a bus address that it hands to the NIC. The NIC would use that address to write the packet into memory.
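For that receive example, the dance looks roughly like this in a Linux driver; the buffer size and direction here are illustrative only.

#include <linux/dma-mapping.h>
#include <linux/slab.h>

#define RX_BUF_SIZE 2048   /* illustrative receive-buffer size */

static dma_addr_t setup_rx_buffer(struct device *dev, void **cpu_buf)
{
    dma_addr_t bus_addr;

    *cpu_buf = kmalloc(RX_BUF_SIZE, GFP_KERNEL);        /* buffer in RAM */
    if (!*cpu_buf)
        return 0;

    /* Get the bus address the NIC should DMA into; with an IOMMU/VT-d
     * this can differ from the CPU physical address. */
    bus_addr = dma_map_single(dev, *cpu_buf, RX_BUF_SIZE, DMA_FROM_DEVICE);
    if (dma_mapping_error(dev, bus_addr)) {
        kfree(*cpu_buf);
        return 0;
    }
    return bus_addr;   /* this value goes into the NIC's RX descriptor */
}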
In older x86 systems, there wasn't really any difference between the physical address that the CPU used and the bus address that external devices used, and the dma_map functions were pretty much NOPs. However, with modern technologies like VT-d, the bus address that a PCI device uses might be completely different than the CPU's physical address, and so it is important to do the DMA mapping and use a dma_addr_t for all addresses that are used by external DMA devices.