PCIe driver error for enabling device and allocating memory - linux-kernel

I'm using the PCIe bus on a Freescale MPC8308 (as root complex), and the endpoint device is an ASIC with just one 256 MB memory region and just one BAR register. The device configuration space registers are readily accessible through the "pciutils" package. At first I tried to access the memory region using mmap(), but it didn't work. So as the next step, I wrote a device driver for the PCIe endpoint device: a kernel module that I load into the kernel after Linux boots.
In my driver the endpoint device is identified from the device ID table, but when I try to enable the device with pci_enable_device(), I see this error:
driver-pci 0000:00:00.0: device not available because of BAR 0 [0x000000-0xfffffff] collisions
Also, when I try to claim the memory region for the PCIe device with pci_request_region(), it fails.
Here is the part of the driver code that is not working:
pci_enable_result = pci_enable_device(pdev);
if (pci_enable_result)
{
	printk(KERN_INFO "PCI enable encountered a problem\n");
	return pci_enable_result;
}
else
{
	printk(KERN_INFO "PCI enable was successful\n");
}
And here is the result in "dmesg":
driver-pci 0000:00:00.0: device not available because of BAR 0 [0x000000-0xfffffff] collisions
PCI enable encountered a problem
driver-pci: probe of 0000:00:00.0 failed with error -22
It is worth noting that in the driver I can read and write configuration registers correctly using functions like pci_read_config_dword() and pci_write_config_dword().
What do you think the problem is? Is it possible that it happens because the kernel initializes the device before my kernel module is loaded? What should I do to prevent this from occurring?
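For reference, the config-space accesses mentioned above, which do work, look roughly like this (a minimal sketch with illustrative names; the offset macro is the standard one from linux/pci.h, not something from my driver):

#include <linux/pci.h>

/* Sketch: dump the raw BAR 0 value as it appears in configuration space. */
static void dump_bar0(struct pci_dev *pdev)
{
	u32 bar0;

	/* PCI_BASE_ADDRESS_0 is offset 0x10 in the standard config header */
	pci_read_config_dword(pdev, PCI_BASE_ADDRESS_0, &bar0);
	printk(KERN_INFO "BAR0 raw value: 0x%08x\n", bar0);
}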

BAR regions are generally small. Your BAR0 size seems to be too large. Try with a smaller region (less than 1 MB); it should work.
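If it helps to confirm from the driver whether the kernel actually assigned an address to BAR 0 at all, something like the sketch below could be used (the helpers are standard kernel functions; the wrapper and its conclusions are illustrative only):

#include <linux/pci.h>

/* Sketch: inspect what the kernel assigned to BAR 0 before enabling the device. */
static void check_bar0(struct pci_dev *pdev)
{
	resource_size_t start = pci_resource_start(pdev, 0);
	resource_size_t len   = pci_resource_len(pdev, 0);

	/* A start of 0 with a non-zero length suggests no address was assigned,
	 * which would match the "BAR 0 [0x000000-0xfffffff] collisions" message. */
	printk(KERN_INFO "BAR0: start=%pa len=%pa\n", &start, &len);
}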

Related

Accessing a page from device memory in userspace using linux

I have device memory mapped to a kernel virtual address via ioremap(). Userspace needs to access a page at offset x within this device memory.
The way I can achieve it right now is by using mmap() in userspace and writing a small memory-mapping routine on the driver side.
Is there any way to use the offset (let's assume the kernel passes the offset to userspace) and achieve the same thing without making any mapping on the driver side?
Can the ioremapped kernel virtual addresses be used here?

How can my PCI device driver remap PCI memory to userspace?

I am trying to implement a PCI device driver for a virtual PCI device on QEMU. The device defines a BAR region as RAM, and the driver can ioremap() this region and access it without any issues. The next step is to assign this region (or a fraction of it) to a user application.
To do this, I have also implemented an .mmap function as part of my driver's file operations. This mmap simply uses remap_pfn_range(), but it passes the pfn of the memory pointer returned by the earlier ioremap().
However, upon running the user space application, the mmap succeeds, but when the app tries to access the memory it gets killed, and I get the following dmesg errors:
"
a.out: Corrupted page table at address 7f66248b8000
..Some page table info..
Bad pagetable: 000f [#2] SMP NOPTI
..and the core dump..
"
Does anyone know what I have done wrong? Did I miss a step? Or could it be an error specific to QEMU?
I am running x86_softmmu as my QEMU configuration, and my kernel is 4.14.
I've solved this issue and managed to map PCI memory to user space via the driver. As @IanAbbott implied, I changed the pfn input of the remap_pfn_range() function I was using in my custom ->mmap().
The original was:
io_remap_pfn_range(vma, vma->vm_start, pfn, vma->vm_end - vma->vm_start, vma->vm_page_prot);
where the pfn was derived from the buffer pointer returned by ioremap(). I changed the pfn to:
pfn = pci_resource_start(pdev, BAR) >> PAGE_SHIFT;
That basically points to the actual starting address of the BAR region. My working call is now:
io_remap_pfn_range(vma, vma->vm_start, pci_resource_start(pdev, BAR) >> PAGE_SHIFT, vma->vm_end - vma->vm_start, vma->vm_page_prot);
I confirmed that it works by doing some dummy writes to the buffer pointer in my driver, then picking up the reads and doing some writes in my user space application.
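Put together, a minimal ->mmap along those lines might look like the sketch below. BAR is whatever index the driver uses, the private_data plumbing is assumed to be set up in open(), and the non-cached page protection is an addition commonly recommended for device memory, not something from the original post:

#include <linux/pci.h>
#include <linux/mm.h>

/* Sketch: map the BAR's physical range straight into the calling process. */
static int my_pci_mmap(struct file *filp, struct vm_area_struct *vma)
{
	struct pci_dev *pdev = filp->private_data;   /* assumed: stored in open() */
	unsigned long pfn = pci_resource_start(pdev, BAR) >> PAGE_SHIFT;
	unsigned long size = vma->vm_end - vma->vm_start;

	if (size > pci_resource_len(pdev, BAR))
		return -EINVAL;

	/* Device memory: disable caching for the user mapping */
	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

	return io_remap_pfn_range(vma, vma->vm_start, pfn, size, vma->vm_page_prot);
}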

How is PCIe endpoint device memory mapped into the system's memory map (MMIO)?

How does the Linux kernel or BIOS map PCIe endpoint device memory into the system's MMIO space? Is there any API to achieve this?
Let's assume we're writing a Linux device driver for a PCIe endpoint device. How can we map the PCIe device memory into MMIO space? Or is it true that the device is already mapped into MMIO by the BIOS during enumeration, and all I need to do is remap the device MMIO into the kernel virtual address space using ioremap()?
Platform: Linux on x86
There are two parts to this answer.
Role of the BIOS
The BIOS (typically UEFI-based) will do some sort of depth-first search (DFS) and enumerate all the children, since PCIe is a self-enumerating bus. Because it has a view of the whole system (devices, buses, processors), it will write an address into the BAR registers (could be BAR0, or multiple of them). This is the address the system will use, and the hardware will route requests for it from the Host Agent (HA on x86/Intel platforms) to the Root Port, through any PCIe switches, all the way to the endpoint.
Each of these elements tracks which address ranges belong to itself or to one of its child devices (for example, a switch may be the child of a Root Port).
Role of the Device Driver
The OS/kernel provides a toolkit of helper routines that driver authors use to access the device registers. Typically a driver may follow these steps.
This is some sample driver pseudo-code, just to help illustrate the idea (a fuller sketch follows the note below):
1. pci_resource_flags(pdev, 0) & IORESOURCE_MEM
Check that the resource region is valid; here we check BAR 0.
2. pci_request_regions(pdev, "region")
Take ownership of the resource/region.
3. drv->registers = pci_iomap(pdev, 0, SIZE_YOU_WANT_TO_MAP)
This gives you a kernel-virtual-address mapping of the device registers.
Note: in case the BIOS does not enumerate the device, you can rescan the PCIe tree from Linux to see whether the device appears.
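As a rough illustration of those three steps inside a probe routine (a sketch only; the driver name and error handling are illustrative, not part of the answer above):

#include <linux/pci.h>

static int my_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	void __iomem *regs;
	int err;

	err = pci_enable_device(pdev);
	if (err)
		return err;

	/* 1. Check that BAR 0 is a memory resource */
	if (!(pci_resource_flags(pdev, 0) & IORESOURCE_MEM))
		return -ENODEV;

	/* 2. Take ownership of all BAR regions */
	err = pci_request_regions(pdev, "my_driver");
	if (err)
		return err;

	/* 3. Map BAR 0 into kernel virtual address space */
	regs = pci_iomap(pdev, 0, pci_resource_len(pdev, 0));
	if (!regs) {
		pci_release_regions(pdev);
		return -ENOMEM;
	}

	/* Registers can now be accessed with ioread32(regs + offset) etc. */
	pci_set_drvdata(pdev, (void __force *)regs);
	return 0;
}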

Linux PCI Device Driver - Bus v. Kernel IRQ

I am writing a device driver for a PCIe card in Linux. I am trying to use interrupts in my driver.
Reading the "IRQ Line" section of the PCI configuration register (offset 0x3C) reports that the assigned IRQ line for the device is 11. lspci -b -vv also reports that my device's interrupt number is 11.
Here's where it gets weird... cat /sys/bus/pci/devices/<my_device>/irq reports that the interrupt number is 19. lspci -vv also reports that the interrupt number is 19.
Requesting 11 in my driver does not work. If I request 19 in the driver, I catch interrupts just fine.
What gives?
Thanks!!!
I believe that it has to do with the difference between "physical" and "virtual" IRQ lines. Because the processor has a limited number of physical IRQ lines, it assigns virtual IRQ lines to allow the total number of PCI devices to exceed the number of physical lines.
In this instance, 19 is your virtual IRQ line (as recognized by the processor) while 11 is the physical line (as recognized by the PCI device).
By the way, you should really get the IRQ number from the struct pci_dev for that device, since these numbers are assigned dynamically.
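In code, that means using pdev->irq rather than the value read from the config-space Interrupt Line register (a sketch; the handler and the device name string are placeholders):

#include <linux/pci.h>
#include <linux/interrupt.h>

static irqreturn_t my_isr(int irq, void *dev_id)
{
	/* acknowledge and handle the device interrupt here */
	return IRQ_HANDLED;
}

static int setup_device_irq(struct pci_dev *pdev)
{
	/* pdev->irq is the kernel's IRQ number (19 in the question),
	 * not the 11 found in the config-space Interrupt Line register. */
	return request_irq(pdev->irq, my_isr, IRQF_SHARED, "my_pci_dev", pdev);
}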
Sean's answer is easy to understand. However, here I will try to make it more complete.
The CPU's IRQ pins are almost never connected directly to a peripheral device; instead, the connection goes through a programmable interrupt controller (PIC, e.g. the Intel 8259A). This helps handle large device fan-out and also heterogeneous interrupt formats (pin-based vs. message-based, as in PCIe).
If you run a recent version of lspci, it would print information like
Interrupt: pin A routed to IRQ 26
Here, pin A (the 11 in the OP) is the physical pin. This is something stored by the PCI device and used by the hardware to signal the interrupt controller. From the LDP:
The PCI set up code writes the pin number of the interrupt controller
into the PCI configuration header for each device. It determines the
interrupt pin (or IRQ) number using its knowledge of the PCI interrupt
routing topology together with the devices PCI slot number and which
PCI interrupt pin that it is using. The interrupt pin that a device
uses is fixed and is kept in a field in the PCI configuration header
for this device. It writes this information into the interrupt line
field that is reserved for this purpose. When the device driver runs,
it reads this information and uses it to request control of the
interrupt from the Linux kernel.
IRQ 26 (the 19 in the OP) is what the kernel code and the CPU deal with. According to the Linux Documentation/IRQ.txt:
An IRQ number is a kernel identifier used to talk about a hardware
interrupt source. Typically this is an index into the global irq_desc
array, but except for what linux/interrupt.h implements the details
are architecture specific.
So the interrupt controller first receives interrupts from the device, translates the interrupt source into an IRQ number, and informs the CPU. The CPU uses the IRQ number to look into the Interrupt Descriptor Table (IDT) and find the correct software handler.
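To see both numbers from a driver, one could compare the config-space values with what the kernel reports (a sketch for illustration; the wrapper function is not from the original answer):

#include <linux/pci.h>

static void show_irq_numbers(struct pci_dev *pdev)
{
	u8 line, pin;

	/* Values written by firmware/PCI setup code into the config header */
	pci_read_config_byte(pdev, PCI_INTERRUPT_LINE, &line);
	pci_read_config_byte(pdev, PCI_INTERRUPT_PIN, &pin);

	/* pdev->irq is the kernel IRQ number actually used with request_irq() */
	printk(KERN_INFO "config line=%u pin=%u, kernel irq=%u\n",
	       line, pin, pdev->irq);
}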
Ref:
http://www.tldp.org/LDP/tlk/dd/interrupts.html
http://www.brokenthorn.com/Resources/OSDevPic.html

Looking for an explanation of kernel driver I/O interface capability

I am looking at ways of interfacing to specific hardware I/O addresses from various Windows versions, from 32-bit XP up to 64-bit Win7 and beyond. There seem to be various solutions published with varying degrees of capability under different Windows versions, and I am trying to understand the possibilities for creating my own kernel driver. The most basic kernel I/O read/write capability seems to be the direct I/O operations such as READ_PORT_UCHAR and WRITE_PORT_UCHAR (and their word and long derivatives). I have also seen the technique below, which I don't understand; it appears to be some memory-mapping capability of which I have no experience and for which I can find little readable documentation. Could someone comment on the suitability and compatibility of READ_PORT_UCHAR / WRITE_PORT_UCHAR versus this mapping technique, which I reproduce below?
Thanks in advance.
case IOCTL_PHYMEM_MAP:
    if (dwInBufLen == sizeof(PHYMEM_MEM) && dwOutBufLen == sizeof(PVOID))
    {
        PHYSICAL_ADDRESS phyAddr;
        PVOID pvk, pvu;

        phyAddr.QuadPart = (ULONGLONG)pMem->pvAddr;

        //get mapped kernel address
        pvk = MmMapIoSpace(phyAddr, pMem->dwSize, MmNonCached);

        if (pvk)
        {
            //allocate mdl for the mapped kernel address
            PMDL pMdl = IoAllocateMdl(pvk, pMem->dwSize, FALSE, FALSE, NULL);
            if (pMdl)
            {
                PMAPINFO pMapInfo;

                //build mdl and map to user space
                MmBuildMdlForNonPagedPool(pMdl);
                pvu = MmMapLockedPages(pMdl, UserMode);

                //insert mapped information into the list
                pMapInfo = (PMAPINFO)ExAllocatePool(NonPagedPool, sizeof(MAPINFO));
                pMapInfo->pMdl = pMdl;
                pMapInfo->pvk = pvk;
                pMapInfo->pvu = pvu;
                pMapInfo->memSize = pMem->dwSize;
                PushEntryList(&lstMapInfo, &pMapInfo->link);

                DebugPrint("Map physical 0x%x to virtual 0x%x, size %u",
                           pMem->pvAddr, pvu, pMem->dwSize);

                RtlCopyMemory(pSysBuf, &pvu, sizeof(PVOID));
                irp->IoStatus.Information = sizeof(PVOID);
            }
            else
            {
                //allocate mdl error, unmap the mapped physical memory
                MmUnmapIoSpace(pvk, pMem->dwSize);
                irp->IoStatus.Status = STATUS_INSUFFICIENT_RESOURCES;
            }
        }
        else
            irp->IoStatus.Status = STATUS_INSUFFICIENT_RESOURCES;
    }
    else
        irp->IoStatus.Status = STATUS_INVALID_PARAMETER;

    break;
What are these I/O ports that you're trying to access? It's generally a Really Bad Idea to go partying on ports that you don't own because you have no way of synchronizing access to those ports with the driver that owns them, the O/S, or the BIOS (it's possible to take an SMI and have the BIOS start talking to ports that it thinks it owns).
The code snippet provided is also a horribly bad idea and should be burned. Basically, all it's doing is mapping a device register into a kernel virtual address (MmMapIoSpace) and then doing the work to map that device register into user mode (MmMapLockedPages). There are two obvious problems with it:
1) You don't know the caching attributes of the memory, so randomly specifying MmNonCached can hang the system
2) Same as with I/O ports, you can't just arbitrarily access a device's registers. You can't properly synchronize yourself with the driver that owns them, so you're doomed to eventually borking your system.
-scott
