Who assigns PCI/PCIe device BAR base addresses, and when? - linux-kernel

I'm trying to understand how the kernel performs PCI/PCIe enumeration and BAR assignment.
I thought the kernel assigned the BAR base addresses at start-up, but when I tried pci=earlydump (which runs before the kernel initializes the PCI subsystem) to see the BAR values, I found that all base addresses were already assigned. Does that mean the BIOS does the PCI/PCIe enumeration and BAR assignment? How does the BIOS know each PCI device's base address and assign them without conflicts?
And if so, how does an embedded system (without a BIOS) enumerate PCI/PCIe devices and assign their base addresses?
By the way, my PC runs Ubuntu 14.04.1 LTS (x86_64).

On all IBM PC-compatible machines, BARs are assigned by the BIOS. Linux simply scans the buses and records the BAR values.
Some MIPS boards adopt a similar approach, where BARs are assigned by firmware. However, the quality of firmware BAR assignment varies quite a bit. Some firmware assigns BARs only to on-board PCI devices and ignores all add-on PCI cards. In that case, Linux cannot rely solely on the firmware's assignment.
There is another issue with depending on the firmware assignment: you need to stick with the address range set up by the firmware. In other words, if the firmware assigns PCI memory space from 0x10000000 to 0x14000000, you cannot easily move it to a different address range in Linux. Source: Bar asssignment in Linux
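To see the values Linux recorded, you can read the per-device `resource` file that sysfs exposes under /sys/bus/pci/devices/. Each line holds a start address, an end address and a flags word in hex. The following is a minimal sketch of a parser for that text; the `Region` type, the `parse_resource` name and the sample contents are made up for illustration.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    index: int   # line number in the file (0-5 correspond to BAR0-BAR5)
    start: int
    end: int
    flags: int

    @property
    def size(self) -> int:
        return self.end - self.start + 1


def parse_resource(text: str) -> List[Region]:
    """Parse the text of a sysfs PCI 'resource' file into regions."""
    regions = []
    for index, line in enumerate(text.splitlines()):
        fields = line.split()
        if len(fields) != 3:
            continue
        start, end, flags = (int(field, 16) for field in fields)
        if start == 0 and end == 0:
            continue  # unused entry
        regions.append(Region(index, start, end, flags))
    return regions


# Made-up sample in the sysfs format, not taken from a real machine.
SAMPLE = """\
0x00000000f7c00000 0x00000000f7c1ffff 0x0000000000040200
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x000000000000e000 0x000000000000e01f 0x0000000000040101
"""

for region in parse_resource(SAMPLE):
    print(f"region {region.index}: base={region.start:#x} size={region.size:#x}")
```

On a real system you would pass in the contents of the `resource` file of the device you care about instead of `SAMPLE`.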

Related

Fix PCI/PCIe BAR address assignment on x86

I read "who and when to assign PCI/PCIe device BARs base address?" and "Bar asssignment in Linux". The second link says:
On all IBM PC-compatible machines, BARs are assigned by the BIOS.
Linux simply scans through the buses and records the BAR values. Some
MIPS boards adopt similar approaches, where BARs are assigned by
firmware. However, the quality of BAR assignment by firmware varies
quite a bit. Some firmware simply assigns BARs to on-board PCI devices
and ignores all add-on PCI cards. In that case, Linux cannot solely
rely on the firmware's assignment.
There is another issue of depending on the firmware assignment. You
need to stick with the address range setup by the firmware. In other
words, if the firmware assigns PCI memory space from 0x10000000 to
0x14000000, you cannot easily move it to a different address space
somewhere else in Linux. There are three ways to possibly fix this:
. . .
The second way is to do a complete PCI resource assignment before
Linux starts PCI bus scanning. In other words, we discard any PCI
resource assignment done in firmware, if there is any, and do a new
assignment by ourselves. This approach gives us complete control over
the address range and resource allocation. . . .
The firmware on my x86-based system doesn't set BAR values that work for me.
Is it possible to set BAR values manually on an x86-based system before the Linux kernel starts? Or does the Linux kernel have specific PCI boot options for this?
P.S. None of the pci= kernel boot options helped.
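For illustration only, the "complete PCI resource assignment" idea quoted above boils down to an allocator that hands out naturally aligned, non-overlapping ranges from a fixed window. The sketch below is a toy model under stated assumptions: it is not what Linux or any firmware actually runs, the device names are hypothetical, and the window is the 0x10000000 to 0x14000000 example from the quote (end treated as exclusive).

```python
from typing import Dict


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of a power-of-two alignment."""
    return (value + alignment - 1) & ~(alignment - 1)


def assign_bars(sizes: Dict[str, int], window_start: int, window_end: int) -> Dict[str, int]:
    """Assign a base address to every BAR, largest first.

    BAR sizes are powers of two and each base must be aligned to its size,
    so placing the largest BARs first keeps alignment padding small.
    """
    assigned = {}
    cursor = window_start
    for name, size in sorted(sizes.items(), key=lambda item: item[1], reverse=True):
        if size <= 0 or size & (size - 1):
            raise ValueError(f"{name}: size {size:#x} is not a power of two")
        base = align_up(cursor, size)
        if base + size > window_end:
            raise ValueError(f"{name}: does not fit in the window")
        assigned[name] = base
        cursor = base + size
    return assigned


# Hypothetical devices and BAR sizes.
layout = assign_bars(
    {"nic.bar0": 0x20000, "gpu.bar0": 0x1000000, "usb.bar0": 0x1000},
    0x10000000,
    0x14000000,
)
for name, base in layout.items():
    print(f"{name}: {base:#x}")
```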

How are memory mapped devices allocated an address and how does the CPU know what it is

I gather that the main ways for a CPU to address devices are "port-mapped" and "memory-mapped".
In both of these:
How are devices dynamically assigned an address - who assigns it and how?
How does the CPU then know that a device exists, that it has been assigned an address, and what the address is, particularly for the programs it is running? (How does this work both when the computer is on and when it is off?)
How do interrupts work with these devices?
What's the distinction between what the OS does and what the hardware does?
Is it fair to say that memory-mapped is the dominant approach in modern systems?
I realise this might be a lot in one go, but thanks in advance!
In general, the CPU does not know that a specific address is memory-mapped.
It is software's responsibility (mainly the BIOS and drivers) to mark the address range as uncacheable, so that each read or write goes through to the device instead of being held internally until write-back (WB). Outside the core, some mapping redirects specific addresses to a device rather than to DDR (memory).
Short answers to some of your bullets (I'm not sure I understand all the questions):
How are devices dynamically assigned an address - who assigns it and how?
The BIOS defines such ranges (the driver reports a new device to the BIOS, and the BIOS reserves some addresses for plug-and-play devices).
How does the CPU then know that a device exists, that it has been assigned an address, and what the address is, particularly for the programs it is running? (How does this work both when the computer is on and when it is off?)
The CPU doesn't know that; these addresses are treated as normal uncacheable addresses.
Is it fair to say that memory-mapped is the dominant approach in modern systems?
Yes, it's easier to treat a device as just another place in memory (it is also a bit faster).
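On Linux, the physical address map that decides "device or RAM" is visible in /proc/iomem, where nested entries are indented by two spaces. As a rough sketch, the lookup from an address to its owner can be modelled like this; the function names and the sample map are made up, and real contents differ per machine (addresses read as zeros without root).

```python
from typing import List, Optional, Tuple

Entry = Tuple[int, int, int, str]  # (start, end, nesting depth, name)


def parse_iomem(text: str) -> List[Entry]:
    """Parse /proc/iomem-style text: '<start>-<end> : <name>' per line."""
    entries = []
    for line in text.splitlines():
        if " : " not in line:
            continue
        span, name = line.split(" : ", 1)
        depth = (len(span) - len(span.lstrip())) // 2
        start_text, end_text = span.strip().split("-")
        entries.append((int(start_text, 16), int(end_text, 16), depth, name.strip()))
    return entries


def owner_of(entries: List[Entry], addr: int) -> Optional[str]:
    """Return the most deeply nested entry that contains addr, if any."""
    best = None
    for start, end, depth, name in entries:
        if start <= addr <= end and (best is None or depth > best[0]):
            best = (depth, name)
    return best[1] if best else None


# Made-up sample in the /proc/iomem format.
SAMPLE_IOMEM = """\
00000000-00000fff : Reserved
00100000-bfffffff : System RAM
e0000000-efffffff : PCI Bus 0000:00
  e0000000-e001ffff : 0000:00:19.0
    e0000000-e001ffff : e1000e
"""

entries = parse_iomem(SAMPLE_IOMEM)
print(owner_of(entries, 0x00200000))
print(owner_of(entries, 0xE0000010))
```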

How does x86 assign an interrupt number to a PCI device in Linux?

My understanding is that the BIOS or EFI detects the hardware during boot and determines the interrupt number, then passes it to Linux once the kernel is up and running. Based on my research, the lower the interrupt number, the higher its priority.
My question is: how does the BIOS/EFI decide which hardware should have higher priority than another? Is it configurable, or is it hardcoded by the BIOS/EFI?
Kind of.
When using the legacy 8259A PIC chip, one of the priority modes is based on the IRQ number - with lower IRQs having more priority.
However, with the IO APIC and MSI(-X), the IRQ priority is handled in the LAPIC and is configurable by the OS.
In the legacy scenario, devices have fixed IRQs (not configurable).
The priority was assigned so that important/frequent tasks could interrupt less important/frequent ones.
Today those devices are emulated and their IRQs can be reassigned if needed (in some cases; it depends on the chipset/Super I/O/embedded controller), but that could cause compatibility issues.
So every device that impersonates a legacy one (e.g. an HDD) is usually assigned its legacy IRQ number.
A different topic is PCI interrupts for non-legacy devices (e.g. a NIC); note that PCIe deprecated the INTx# lines in favour of MSI.
Those were (and are) the truly programmable IRQs: each PCI-to-PCI bridge remaps its four PIRQA-PIRQD input pins to its four INTA#-INTD# output pins (which are connected to the PIRQA-PIRQD pins of the bridge's parent in a tangled fashion).
The Host-to-PCI bridge's INTA#-INTD# connect (conceptually) to the 8259A and the IO-APIC.
The mapping is configurable with some chipset registers (e.g. see Chapter 29 of the Intel Series 200 PCH datasheet Volume 2).
So the firmware is free to remap at least the PCI interrupts for non-legacy devices. I think the algorithm used is simply to assign the lowest free IRQ to the most "important" device.
However, as said above, as soon as the OS switches away from 8259A mode these priorities stop mattering.
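The "tangled" routing across PCI-to-PCI bridges mentioned above usually follows the conventional swizzle: a device with device number D that asserts pin P shows up on the upstream side of its bridge as pin (P + D) mod 4. Here is a small sketch of that rule. The function names are made up, and a given board may be wired differently, in which case the firmware's routing tables are what counts.

```python
from typing import Iterable

PINS = ("INTA#", "INTB#", "INTC#", "INTD#")


def swizzle(device_number: int, pin: str) -> str:
    """Pin seen upstream of a bridge for a device asserting `pin` behind it."""
    return PINS[(PINS.index(pin) + device_number) % 4]


def route_to_root(device_numbers: Iterable[int], pin: str) -> str:
    """Apply the swizzle once per bridge crossed on the way to the root bus.

    device_numbers lists, from the endpoint upward, the device number of
    whatever sits on the secondary bus of each bridge being crossed
    (first the endpoint itself, then each intermediate bridge).
    """
    for device_number in device_numbers:
        pin = swizzle(device_number, pin)
    return pin


# An endpoint at device 1 behind a bridge that itself sits at device 2
# behind another bridge: INTA# -> INTB# -> INTD#.
print(route_to_root([1, 2], "INTA#"))
```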

Where is PCI BAR0 pointing to?

I have a PCI device which has some memory address in BAR0. I suppose this memory address is just an OS virtual address which points to some physical memory of the device. The question is: where does it point? Reading the device documentation as well as the firmware source code, I noticed that the device has some registers responsible for setting up so-called memory windows. I was hoping that BAR0 would point exactly to them; however, this is not the case, and the layout looks like this:
BAR0 address -> Some unknown memory -> + 0x80000 My memory window
So why is my memory window offset by 0x80000 from where BAR0 points, where is BAR0 pointing, and how is it set and by whom?
Thanks
No. The address in a BAR is the physical address of the beginning of the BAR. That is how the device knows when and how to respond to a memory read or write request. For example, let's say the BAR (BAR0) is of length 128K and has a base address of 0xb840 0000, then the device will respond to a memory read or write to any of these addresses:
0xb840 0000
0xb840 0080
0xb840 1184
0xb841 fffc
but NOT to any of these addresses:
0x5844 0000 (Below BAR)
0xb83f 0000 (Below)
0xb83f fffc (Below)
0xb842 0000 (Above BAR)
0xe022 0000 (Above)
This was more significant in the original PCI where the bus was actually a shared medium and devices might see requests for addresses belonging to other devices. With PCI-Express' point to point architecture, only PCI "bridges" will ever see requests for memory addresses they do not own. But it still functions in exactly the same way. And the low bits of the address space still allow the device to designate different functions / operations to different parts of the space (as in your device, creating the separate memory window you're attempting to access).
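The address lists above can be checked mechanically. Because a BAR's size is a power of two and its base is aligned to that size, the device only has to compare the high address bits against the BAR. A minimal sketch using the 128K example (the function name is made up):

```python
BAR_BASE = 0xB8400000
BAR_SIZE = 128 * 1024  # 0x20000


def bar_claims(base: int, size: int, addr: int) -> bool:
    """True if addr falls inside a naturally aligned, power-of-two sized BAR."""
    return (addr & ~(size - 1)) == base


inside = [0xB8400000, 0xB8400080, 0xB8401184, 0xB841FFFC]
outside = [0x58440000, 0xB83F0000, 0xB83FFFFC, 0xB8420000, 0xE0220000]

for addr in inside + outside:
    print(f"{addr:#010x}: {'claimed' if bar_claims(BAR_BASE, BAR_SIZE, addr) else 'ignored'}")
```

The low 17 bits that the comparison ignores are exactly the offset within the BAR, which the device is free to interpret however it likes.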
Now, how you as a programmer access the BAR memory space is a different question. For virtually all modern systems, all memory accesses made by programs are to virtual addresses. So, in order for your memory access to reach a device, there must be a mapping from the virtual address to the physical address. That is generally done through page tables (though some architectures, like MIPS, have a dedicated area of virtual address space that is permanently mapped to part of the physical address space).
The exact mechanism for allocating virtual address space, and setting up the page tables to map from that space to the BAR physical address space is processor- and OS-dependent. So you will need to allocate some virtual address space, then create page tables mapping from the start of the allocated space to (BAR0) + 0x80000 in order to work with your window. (I'm describing this as two steps, but your OS probably provides a single function call to allocate virtual address space and map it to a physical range in one fell swoop.)
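On Linux, one way to get such a mapping from user space is to mmap the `resource0` file that sysfs provides for the device (root is normally required). The sketch below maps just enough of the file to read one 32-bit value at (BAR0) + 0x80000 + a register offset. The function name is made up and the device path in the comment is hypothetical. Note also that a Python slice does not guarantee a single 32-bit bus access, so treat this as a way to poke around rather than as driver code.

```python
import mmap
import os
import struct


def read_window_u32(resource_path: str, window_offset: int, reg_offset: int) -> int:
    """Read a little-endian 32-bit value at window_offset + reg_offset."""
    fd = os.open(resource_path, os.O_RDONLY)
    try:
        target = window_offset + reg_offset
        # mmap offsets must be a multiple of the allocation granularity.
        map_start = target - (target % mmap.ALLOCATIONGRANULARITY)
        delta = target - map_start
        with mmap.mmap(fd, delta + 4, access=mmap.ACCESS_READ, offset=map_start) as view:
            return struct.unpack("<I", view[delta:delta + 4])[0]
    finally:
        os.close(fd)


# Hypothetical usage (the device address is a placeholder):
# value = read_window_u32("/sys/bus/pci/devices/0000:01:00.0/resource0", 0x80000, 0x0)
```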
Now, the process of assigning physical address space to the device (that is, actually sticking an address into the BAR) is generally done very early in system initialization by the system BIOS or an analogous early-boot mechanism while it's enumerating all the PCI devices installed in the system. The desired address space size is determined by querying the device, then the base address of a large enough physical address region is written into the BAR.
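The "querying the device" step is the standard PCI sizing probe: save the BAR, write all ones, read it back, restore it, and see which address bits stuck. The bits that stay zero are hardwired by the device and encode the size. Here is a minimal simulation for a 32-bit memory BAR; `FakeBar` is a made-up stand-in for real config-space accesses.

```python
class FakeBar:
    """Model of a 32-bit memory BAR: address bits below the size read as zero."""

    def __init__(self, size: int, flags: int = 0x0):
        self._mask = ~(size - 1) & 0xFFFFFFF0  # writable address bits
        self._flags = flags & 0xF              # read-only type/prefetch bits
        self._value = 0

    def read(self) -> int:
        return self._value | self._flags

    def write(self, value: int) -> None:
        self._value = value & self._mask


def probe_size(bar: FakeBar) -> int:
    """Return the size the BAR requests, leaving its contents unchanged."""
    saved = bar.read()
    bar.write(0xFFFFFFFF)
    readback = bar.read()
    bar.write(saved)
    mask = readback & 0xFFFFFFF0  # drop the low flag bits of a memory BAR
    if mask == 0:
        return 0  # BAR not implemented
    return (~mask & 0xFFFFFFFF) + 1


bar0 = FakeBar(128 * 1024)
print(hex(probe_size(bar0)))
bar0.write(0xB8400000)  # what the firmware does once it has picked an address
print(hex(bar0.read()))
```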
The final question: why your memory window is at an offset of 0x80000 within the device's address space is completely device-specific and cannot be answered more generally.

PCI Root Complex BAR usage

I want to understand the usage of BARs in the PCIe Root Complex.
The PCIe Root Complex is already a part of the CPU (as a peripheral to it).
And the CPU register space is easily accessible. The CPU has registers to access its various peripherals, like the PCIe controller, DIMM controller, USB controllers, etc.
So in this case, what is the use of the BARs inside the PCIe RC config space?
Secondly, I want to understand how the PCIe RC is set up during enumeration with the proper memory windows. For example, let's say I have a PCIe device (EP) directly connected to the RC, and the EP's config space is programmed with some address 'X' and some size 's'. So basically, any read/write from the CPU to the window between 'X' and 'X + s' should go to the PCIe EP. But this has to go through the PCIe RC.
Now, how does the RC know that it should translate a CPU read/write to that memory window into a PCIe transaction to the EP? How is the RC configured to do that? Are there any standardized registers in the PCIe RC where this information is kept?
/SG
The BAR (base address register) serves two purposes:
Before enumeration, it holds the requested size of the memory to be mapped.
After enumeration, it holds the base address (starting address) of the memory block.
A PCI endpoint (EP) can have up to six 32-bit BARs. Two adjacent BARs can be combined into one 64-bit BAR.
During enumeration, the BIOS or the kernel traverses the PCI tree, reads the BARs and assigns the new base addresses.
The PCH (Platform Controller Hub / former north bridge) uses the BAR information to route data accesses to main memory, PCI EPs or whatever else.
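To make the 32-bit versus 64-bit point concrete: the low bits of each BAR say what kind of BAR it is. Bit 0 distinguishes I/O from memory, bits 2:1 give the memory type (0b10 means 64-bit, which consumes the next BAR as the upper half), and bit 3 marks prefetchable memory. The following sketch decodes six raw register values; the function name and the sample values are made up.

```python
from typing import Dict, List


def decode_bars(raw: List[int]) -> List[Dict]:
    """Decode raw 32-bit BAR register values into typed base addresses."""
    decoded = []
    i = 0
    while i < len(raw):
        value = raw[i]
        if value == 0:  # unimplemented or unassigned BAR
            i += 1
            continue
        if value & 0x1:  # I/O space BAR
            decoded.append({"index": i, "kind": "io", "base": value & 0xFFFFFFFC})
            i += 1
            continue
        prefetchable = bool(value & 0x8)
        base = value & 0xFFFFFFF0
        if (value >> 1) & 0x3 == 0x2 and i + 1 < len(raw):
            base |= raw[i + 1] << 32  # next BAR holds address bits 63:32
            decoded.append({"index": i, "kind": "mem64", "base": base,
                            "prefetchable": prefetchable})
            i += 2
        else:
            decoded.append({"index": i, "kind": "mem32", "base": base,
                            "prefetchable": prefetchable})
            i += 1
    return decoded


# Made-up register values: a 32-bit memory BAR, an I/O BAR, and a
# prefetchable 64-bit memory BAR occupying BAR2 and BAR3.
for bar in decode_bars([0xF7C00000, 0x0000E001, 0xD000000C, 0x00000001, 0, 0]):
    print(bar["index"], bar["kind"], hex(bar["base"]))
```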
Books on PCI Express:
MindShare Press - PCI Express Technology 3.0
