PCIe DMA problems in ARM Machines - linux-kernel

I'm trying to write a PCIe driver for an ARM machine (Cavium ThunderX2). I'm working with Xilinx Alveo FPGAs. Our work involves migrating pages between heterogeneous nodes (x86 and ARM), and the driver handles the DMA between the host and the FPGA as well as the device interrupts.
The DMA doesn't work in either direction (from device / to device), and I get "ARM SMMU v3.x 0x10 event occurred" errors. I tried disabling the SMMU (recommended by some threads in the NVIDIA community - https://forums.developer.nvidia.com/t/how-dma-works-in-arm-the-dma-stopped-working-with-our-pci-driver/53699), but that leads to a protection issue ("RAS Controller stopped") and the system hangs.
I use the dma_map_single API from dma-mapping.h to convert the virtual address to a DMA-capable bus address. Would dma_alloc_coherent make a difference? (Of course, I'll try this out.)
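Roughly, the mapping path in the driver looks like the simplified sketch below (dev, buf and len stand in for the real driver objects; this is not the exact code):

    /* Simplified sketch of the streaming-DMA path (not the actual driver code).
     * dev is the PCIe function's struct device, buf/len describe the page being
     * migrated, and the direction depends on the transfer. */
    #include <linux/dma-mapping.h>

    static int map_for_dma(struct device *dev, void *buf, size_t len,
                           dma_addr_t *out)
    {
        dma_addr_t handle;

        /* Declare the device's addressing capability up front. */
        if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64)))
            return -EIO;

        handle = dma_map_single(dev, buf, len, DMA_BIDIRECTIONAL);
        if (dma_mapping_error(dev, handle))
            return -ENOMEM;

        *out = handle; /* the address programmed into the FPGA's DMA engine */
        return 0;
    }

(dma_alloc_coherent would instead return both a kernel virtual address and a device-visible address for a long-lived coherent buffer, which is the alternative I plan to try.)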
I'm unable to figure out the problem. Is this a PCIe driver issue or an issue with the device, or is there a fix/patch available for ARM PCIe DMA ops? Any help would be appreciated!!
Thanks,
Narayan

Related

Emulate I2C on QEMU Aarch64

I have read the post "How to emulate an i2c device on QEMU x86?", which describes a solution for configuring an I2C device under QEMU emulating x86_64.
I am trying to do the same thing for ARM. Currently I have a simple I2C user-space program (along the lines of the sketch below) that times out: although QEMU lists an I2C device in /dev, nothing actually simulates the device and returns an ACK. Could someone provide more detail on how I might implement the solution from that post? I am not very experienced in that area and the answer there is pretty sparse on detail.
I am also wondering how peripheral devices other than USB, such as ones using CAN and SPI, are emulated when using QEMU.
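For context, the user-space program is essentially the following sketch (the /dev/i2c-0 bus node and the 0x50 slave address are placeholders):

    /* Minimal sketch of the user-space I2C access that times out under QEMU.
     * The device node and the 0x50 slave address are placeholders. */
    #include <fcntl.h>
    #include <linux/i2c-dev.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    int main(void)
    {
        unsigned char reg = 0x00, val;

        int fd = open("/dev/i2c-0", O_RDWR);
        if (fd < 0) { perror("open"); return 1; }

        if (ioctl(fd, I2C_SLAVE, 0x50) < 0) {   /* select the slave address */
            perror("I2C_SLAVE");
            return 1;
        }

        if (write(fd, &reg, 1) != 1 || read(fd, &val, 1) != 1) {
            perror("transfer");                 /* times out with no emulated slave */
            return 1;
        }
        printf("reg 0x%02x = 0x%02x\n", reg, val);
        close(fd);
        return 0;
    }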
Devices are supported on a device-by-device basis. QEMU's Zynq-7000 emulation can emulate certain EEPROM and flash devices and read from and write to them over the I2C bus. Device support is listed here: Xilinx QEMU peripheral device support: http://www.wiki.xilinx.com/QEMU+-+Zynq-7000
I assume that support for other machine types is also on a device-by-device basis, and hopefully it is as well documented as the peripheral support for Xilinx's QEMU machines.
The wiki linked above has other pages with examples of adding peripheral devices to the device tree. When you specify a device tree at QEMU invocation, QEMU reads it and begins emulating the devices for which it has support.

What does "PSH kernel" in Intel Edison mean? Is it the name of the primary bootloader present inside the ROM?

I was going through the logs after booting up the Intel Edison and came across the term. Is it the name of the BIOS? Does it do some security verification like key matching/checking?
The Intel Edison board, more precisely the Intel Tangier SoC, has a Minute IA (i486+, also known as the Pentium ISA microarchitecture) based MCU (the Intel Quark D2000 SoC has one as well, as far as I know) which is part of the so-called Platform Services Hub (PSH). The PSH has its own page cache (to hold the RTOS and its applications) and LAPIC. Peripherals such as DMA and I2C are shared with the System Controller Unit (SCU), and the SCU actually controls the PSH.
When the system starts, the MCU boots first. It runs a Viper RTOS with some modifications, i.e. it has a library to support sensors.
There is no information available from Intel regarding the use of an open-source RTOS, such as Zephyr, on the PSH.

NVMe PCIe Hard Disk on Freescale LS2080A not recognised

I have a Freescale LS2080 box for which I am developing a custom Linux 4.1.8 kernel using the Freescale Yocto project.
I have an NVMe hard disk attached to the LS2080 via a PCIe card, but the disk is not recognised when I boot up the board with my custom linux kernel.
I plugged the same combination of NVMe disk and PCIe card into a Linux 3.16.7 desktop PC and it was detected and mounted without problems.
When building the LS2080 kernel using the Yocto project, I have enabled the NVMe block device driver and I have verified that this module is present in the kernel when booting on the board.
The PCIe slot on the board is working fine because I have tried it with a PCIe Ethernet card and a PCIe SATA disk.
I suspect that I am missing something in the kernel configuration or device tree, but I'm not sure what. When I add the NVMe driver to the kernel using menuconfig, the driver's dependencies should be resolved automatically.
Can anyone provide insight into what I am missing?
First, make sure the PCIe device is recognized using lspci.
If the device does not show up in the lspci list, this is an enumeration problem; to track down the error, use a PCIe analyzer.
If the device is shown in the list, then simply add the device's vendor ID and device ID to the NVMe driver's PCI ID table and recompile so that the driver binds to your device, roughly as sketched below.
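For illustration, a PCI ID table and the binding logic look roughly like this sketch. The 0x1234/0x5678 IDs are placeholders (use the values reported by lspci -nn), and the real table lives in the kernel's own NVMe driver source rather than in a demo module like this; recent NVMe drivers also match by class code, so an explicit ID should only be needed if that match does not cover your card.

    /* Sketch of how a PCI driver's ID table controls which devices it binds to.
     * This is a stand-alone demo, not the in-tree NVMe driver. */
    #include <linux/module.h>
    #include <linux/pci.h>

    static const struct pci_device_id demo_nvme_ids[] = {
        { PCI_DEVICE_CLASS(PCI_CLASS_STORAGE_EXPRESS, 0xffffff) }, /* match by NVMe class */
        { PCI_DEVICE(0x1234, 0x5678) },                            /* placeholder IDs     */
        { 0, }
    };
    MODULE_DEVICE_TABLE(pci, demo_nvme_ids);

    static int demo_nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
    {
        dev_info(&pdev->dev, "bound to %04x:%04x\n", pdev->vendor, pdev->device);
        return 0;
    }

    static void demo_nvme_remove(struct pci_dev *pdev)
    {
    }

    static struct pci_driver demo_nvme_driver = {
        .name     = "demo-nvme",
        .id_table = demo_nvme_ids,
        .probe    = demo_nvme_probe,
        .remove   = demo_nvme_remove,
    };
    module_pci_driver(demo_nvme_driver);
    MODULE_LICENSE("GPL");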

Debugging processor registers when running an MMU

I am currently trying to access the registers of the ARM Cortex-A9 core on the Zynq Z702 SoC using Xilinx's XMD tool provided as part of the SDK. When I try to read a part of memory, I get an MMU-related error. Do I need some specialized hardware to read and write these registers while running an OS on the processor? Please help.
P.S.: I am running a Linux 3.x.x kernel on the processor, and the memory regions I am trying to access are memory-mapped regions that are not occupied by any kernel or user-space code.
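One way to inspect such memory-mapped regions while Linux is running, without going through the debugger, is to map the physical addresses from user space via /dev/mem. This is only a minimal sketch under assumptions not taken from the question: the 0x43C00000 base is a placeholder, and it requires root plus a kernel whose CONFIG_STRICT_DEVMEM settings allow access to that range.

    /* Minimal sketch: read one 32-bit register at a physical address via /dev/mem.
     * REG_PHYS is a placeholder; adjust it to the region you want to inspect. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define REG_PHYS 0x43C00000UL   /* placeholder physical base address */

    int main(void)
    {
        long page = sysconf(_SC_PAGESIZE);

        int fd = open("/dev/mem", O_RDONLY | O_SYNC);
        if (fd < 0) { perror("open /dev/mem"); return 1; }

        void *map = mmap(NULL, page, PROT_READ, MAP_SHARED, fd,
                         REG_PHYS & ~(page - 1));
        if (map == MAP_FAILED) { perror("mmap"); return 1; }

        volatile uint32_t *reg =
            (volatile uint32_t *)((char *)map + (REG_PHYS & (page - 1)));
        printf("0x%08lx = 0x%08x\n", (unsigned long)REG_PHYS, (unsigned int)*reg);

        munmap(map, page);
        close(fd);
        return 0;
    }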

Writing a Windows 64-bit device driver for a 32-bit PCI device

I'm evaluating porting a device driver I wrote several years ago from 32 to 64 bits. The physical device is a 32-bit PCI card. That is, the device is 32 bits but I need to access it from Win7 x64. The device presents some registers to the Windows world and then performs heavy bus-master data transfers into a chunk of driver-allocated memory.
I've read in the Microsoft documentation that you can signal whether the driver supports 64-bit DMA or not. If it doesn't, then the DMA is double buffered. However, I'm not sure if this is the case. My driver would/could be a full 64-bit one, so it could support 64-bit addresses in the processor address space, but the actual physical device WON'T support it. In fact, the device BARs must be mapped under 4 GB and the device must get a PC RAM address below 4 GB to perform bus mastering. Does this mean that my driver will always go through double buffering? This is a very performance-sensitive process and the double buffering could prevent the whole system from working.
Of course, designing a new 64-bit PCI (or PCI-E) board is out of question.
Anybody could give me some resources for this process (apart from MS pages)?
Thanks a lot!
This is an old post; I hope the answer is still relevant...
There are two parts here: PCI target access and PCI master access.
PCI target access: the driver maps the PCI BARs into 64-bit virtual address space and simply reads/writes through a pointer.
PCI master access: you need to create a DmaAdapter object by calling IoGetDmaAdapter(). When creating it, you also describe your device as 32-bit (see the DEVICE_DESCRIPTION parameter). Then you call the DmaAdapter::AllocateCommonBuffer() method to allocate a contiguous DMA buffer in PC RAM.
I am not sure about double buffering though. In my experience, double buffering is not used; instead, DmaAdapter::AllocateCommonBuffer() simply fails if it cannot allocate a buffer that satisfies the DEVICE_DESCRIPTION (in your case, 32-bit DMA addressing).
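Roughly, the sequence described above looks like the following sketch (WDM, in C; the sizes are placeholders and error handling is trimmed):

    /* Sketch: declare a 32-bit bus-master device and allocate a common buffer.
     * pdo is the device's physical device object; sizes are placeholders. */
    #include <wdm.h>

    NTSTATUS SetupDma(PDEVICE_OBJECT pdo, PDMA_ADAPTER *AdapterOut,
                      PVOID *VaOut, PHYSICAL_ADDRESS *LaOut)
    {
        DEVICE_DESCRIPTION dd;
        PDMA_ADAPTER adapter;
        ULONG mapRegs = 0;

        RtlZeroMemory(&dd, sizeof(dd));
        dd.Version           = DEVICE_DESCRIPTION_VERSION;
        dd.Master            = TRUE;          /* bus-master device           */
        dd.ScatterGather     = TRUE;
        dd.Dma32BitAddresses = TRUE;          /* device only drives 32 bits  */
        dd.Dma64BitAddresses = FALSE;
        dd.InterfaceType     = PCIBus;
        dd.MaximumLength     = 1024 * 1024;   /* placeholder max transfer    */

        adapter = IoGetDmaAdapter(pdo, &dd, &mapRegs);
        if (adapter == NULL)
            return STATUS_INSUFFICIENT_RESOURCES;

        /* Contiguous buffer guaranteed to be reachable by the 32-bit device. */
        *VaOut = adapter->DmaOperations->AllocateCommonBuffer(
                     adapter, 64 * 1024, LaOut, FALSE);
        if (*VaOut == NULL)
            return STATUS_INSUFFICIENT_RESOURCES;

        *AdapterOut = adapter;
        return STATUS_SUCCESS;
    }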
There's no problem writing a 64-bit driver for a device only capable of 32-bit PCI addressing. As Alexey pointed out, the DMA adapter object you create specifies the HW addressing capabilities of your device. As you allocate DMA buffers, the OS takes this into account and makes sure to allocate them within your HW's accessible region. Linux drivers behave similarly: your driver must supply a DMA address mask to associate with your device, which the DMA functions used later will refer to.
The performance hit you could run into is if your application allocates a buffer that you need to DMA to/from. This buffer could be scattered throughout memory, with some pages above 4 GB. If your driver plans to DMA to these, it will need to lock the buffer pages in RAM during the DMA and build an SGL for your DMA engine based on the page locations. The problem is that, for the pages above 4 GB, the OS would then have to copy/move them to pages under 4 GB so that your DMA engine can access them. That is where the potential performance hit lies.
