How is PCIe communication using BARs defined?

I'm a total beginner in PCIe and have to develop a simple PCIe driver.
If I do have a PCIe device with a memory of 1kByte, what does the BAR contain? The addresses for the 1kByte space?
And what does it mean that the BAR is "mapped" into the memory or i/o address space?
Tried to find the answers in different books, without success...
Best regards
Thomas

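The BAR initially describes what the device is requesting. Bit 0 gives the type of the requested space (0 = MEM, 1 = I/O). For a MEM BAR, bits 2:1 say whether it is a 32-bit or a 64-bit region, and bit 3 says whether the region is prefetchable (reads have no side effects). These attributes occupy the 4 LSBs of the BAR. The rest of the BAR encodes the size requested for allocation: the address bits below the region size are read-only bits with a value of 0, so the lowest writable bit gives the size in bytes. For example, a 1 KB memory region (2^10 bytes) is represented by a BAR in which bits 4-9 are read-only zeros and bits 3-0 hold the attributes. Bits 10 and up are writable, and the firmware or OS later writes the base address there. That assignment is what "mapping" the BAR into memory or I/O address space means: the device then responds to accesses in that address range.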
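As a rough illustration of the bit layout described above, here is a minimal Python sketch that decodes the value read back from a 32-bit memory BAR after all 1s have been written to it. The function name `decode_mem_bar` and the returned keys are made up for this example, and the size calculation assumes the region is smaller than 4 GB (a 64-bit BAR would also need its upper dword).

```python
def decode_mem_bar(readback: int) -> dict:
    """Decode a memory BAR value read back after writing 0xFFFFFFFF to it.

    Hypothetical helper for illustration only; it is not part of any driver API.
    """
    if readback & 0x1:
        raise ValueError("bit 0 is set: this is an I/O BAR, not a memory BAR")
    mem_type = (readback >> 1) & 0x3       # bits 2:1: 0b00 = 32-bit, 0b10 = 64-bit
    mask = readback & 0xFFFFFFF0           # drop the attribute bits 3:0
    return {
        "is_64bit": mem_type == 0b10,
        "prefetchable": bool(readback & 0x8),   # bit 3
        # Two's complement of the mask: the lowest writable bit is the size.
        "size": (~mask + 1) & 0xFFFFFFFF,
    }


# A 1 KB, 32-bit, non-prefetchable region: bits 9:4 hardwired to 0,
# bits 31:10 writable, attribute bits all 0.
info = decode_mem_bar(0xFFFFFC00)
print(info["size"])  # 1024
```

The same value with bit 3 set (0xFFFFFC08) would decode as a prefetchable 1 KB region.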

Related

What is the maximum physical memory? [closed]

Consider a system with 32-bit addresses. 6 bits are used for the segment, so we have 2^6 = 64 segments. 14 bits are used for paging, so we have 2^14 = 16K pages. 12 bits are used for the offset, so the page size is 2^12 = 4 KB. My question is: what is the maximum physical memory that the system can support? A solution I am considering is that if a page table entry is 32 bits long, it can give 32 bits to use as the high part of the physical address. So the maximum physical memory that can be supported would be 2^32*2^14=2^46, but I have no idea if that's correct. I mean segments don't play

Physical address size is not uniquely determined by virtual address size and page size.
Instead the upper limit of physical memory size for an ISA is determined by the page size and the number of physical page-address bits in a page-table entry.
For example, x86-64 (and x86 32-bit with PAE) have PTEs with room for 52-bit physical page-frame addresses.
The PTE itself has 40 of those bits, and the low 12 have to be 0 (page frames are naturally aligned). x86 / x86-64 uses 4k pages = 12 bits for the byte-within-page part of physical and virtual addresses. The question "Why in 64bit the virtual address are 4 bits short (48bit long) compared with the physical address (52 bit long)?" has diagrams of the format and some nice explanation.
The architects of that page-table format chose to align the page-number bitfield so it starts at bit #12, with bits 11:0 holding flags. So the position of the top of the field is the physical address width. If they had more or fewer flags than page-offset bits, that wouldn't be the case.
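To make the bitfield layout concrete, here is a small Python sketch of pulling the physical page-frame address out of an x86-64 style PTE, assuming the field occupies bits 51:12 as described above. The helper names and the example PTE value are invented for illustration.

```python
PAGE_SHIFT = 12        # 4k pages: low 12 bits are the byte-within-page offset
ARCH_PHYS_BITS = 52    # architectural limit of the x86-64 PTE format


def pte_phys_addr(pte: int, phys_bits: int = ARCH_PHYS_BITS) -> int:
    """Return the page-frame address stored in bits (phys_bits-1):12 of a PTE."""
    field_mask = ((1 << phys_bits) - 1) & ~((1 << PAGE_SHIFT) - 1)
    return pte & field_mask


def max_phys_bytes(phys_bits: int) -> int:
    """Upper limit of addressable physical memory for a given address width."""
    return 1 << phys_bits


# Made-up PTE: bit 63 set, flag bits 0x067 in the low 12 bits.
example_pte = 0x8000000123456067
print(hex(pte_phys_addr(example_pte)))   # 0x123456000
print(max_phys_bytes(52) // 2**50)       # 4   (PiB, architectural limit)
print(max_phys_bytes(39) // 2**30)       # 512 (GiB, a CPU reporting 39 bits)
```

Because the field starts at bit 12, masking is enough; no shift is needed to turn the stored bits into a physical address.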
In practice real hardware might only implement some lower number of physical bits. For example, my i7-6700k desktop Skylake reports (via CPUID) that it implements 39-bit physical addresses (and 48-bit virtual). In that case the higher bits above 39 in a page-table entry are reserved.
(Fewer physical bits supported means smaller cache tags, and smaller TLB entries, among other things.)
Fun fact: PML5 extends x86-64's paging scheme from 4 levels (48-bit virtual) to 5-level (57-bit virtual) with no change in physical address width. That's another good reminder that physical and virtual address width are independent.
Also note that not having enough virtual address space to map all the RAM makes it really inconvenient to write an OS. Linus Torvalds wrote an entertaining and informative rant about PAE (wide physical addresses for 32-bit virtual addresses on 32-bit x86), quoted on someone's blog.
Your 32-bit virtual space for 44-bit physical would be really hard for an OS to use.

Relation between size of address bus and memory size; memory Segmentation in 8086

My question is related to memory segmentation in 8086. I learnt that,
The 8086 has a 20-bit address bus, so it can address 2^20 different addresses, which means it has a memory size of 2^20, i.e., 1 MB.
I have a few doubts:
What I understand from the fact that 8086 has a 20 bit address bus is that it could have 2^20 different combinations of 0s and 1s, each of which represents one physical address. What I don't understand is that how does 2^20 different address locations mean 1 MB of addressable memory? How is total number of different addresses locations related to memory size (in Megabytes)?
Also, correct me if I'm wrong, the 16-bit segment registers in the 8086 hold the starting address of the different segments in memory (Code, Stack, Data, Extra). My question is, aren't the addresses in memory 20 bits wide? Then how can a 16-bit register hold a 20-bit address? If it contains the upper 16 bits of the 20-bit address, how does the processor work out which exact address location it has to point to?
P.S.: I am a beginner in microprocessors and totally reliant on self-study, so kindly excuse me if my questions seem a bit silly.
Thanks in advance.

For this question, it's important to remember there is a difference between the number of possible memory addresses and the amount of actual memory (RAM) installed in the system. For the 8086, memory addresses are 20 bits long as you note, so there are 2^20 possible memory addresses. Each address names one byte, so the address space is exactly 1 MiB in size (1 MiB is 1024 or 2^10 KiB, and 1 KiB is 1024 or 2^10 bytes). This does NOT necessarily mean the system has 1 MiB worth of RAM. It very likely has less, but the most the 8086 could possibly address is 1 MiB, so if nothing but RAM were in the address space, the most RAM it could possibly have is 1 MiB. Frequently there are gaps in the address space not filled with anything, and some of the address space is used for ROM or other peripherals. So the size of the address space is 1 MiB, but that does not mean there is 1 MiB of RAM/memory in the system.
Correct, the segment registers are all 16 bits wide on the 8086. A memory address is created by combining the appropriate segment register with the offset (the result of whatever addressing mode the instruction uses): the offset is added to the segment register's value shifted left by 4 bits. So, for example, if ss is 0x1111 and you perform a push ax instruction that leaves sp at 0x2222 (push decrements sp by 2 before storing), the 20-bit address to which the value is pushed is (ss << 4) + sp, or 0x11110 + 0x02222 = 0x13332. More information can be found on Wikipedia under the Real Mode section: https://en.wikipedia.org/wiki/X86_memory_segmentation
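The arithmetic above can be checked with a few lines of Python. This is only a sketch: `real_mode_address` is a name made up for this example, and the wrap-around at 20 bits reflects the fact that the 8086 has only 20 address lines.

```python
ADDRESS_BITS = 20


def real_mode_address(segment: int, offset: int) -> int:
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address."""
    return ((segment << 4) + offset) & ((1 << ADDRESS_BITS) - 1)


# One byte per address, so 2^20 addresses cover exactly 1 MiB.
print((1 << ADDRESS_BITS) == 1024 * 1024)        # True

# The ss:sp example from the answer.
print(hex(real_mode_address(0x1111, 0x2222)))    # 0x13332

# Different segment:offset pairs can name the same physical byte.
print(hex(real_mode_address(0x1333, 0x0002)))    # 0x13332
```

The last line shows why a segment register is not "the upper 16 bits" of the address: segments start every 16 bytes and overlap heavily.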

How do Windows Device Manager resources correspond to the six BARs in PCI config space?

After Windows boots up, I use WinDbg to check the PCI config space and take a look at the six BARs.
3: kd> !pci 100 3 0 0
10: BAR0 d000000c 11010000000000000000000000001100
14: BAR1 00000000
18: BAR2 e000000c 11100000000000000000000000001100
1c: BAR3 00000000
20: BAR4 0000e001 1110000000000001
24: BAR5 efd00000 11101111110100000000000000000000
From this I can tell there are two 64-bit memory locations and one 32-bit memory location.
At the same time I open Device Manager --> device properties --> Resources
and I get this:
Memory Range 00000000D0000000 - 00000000DFFFFFFF
Memory Range 00000000E0000000 - 00000000E01FFFFF
Memory Range 00000000EFD00000 - 00000000EFD7FFFF
Memory Range 00000000000A0000 - 00000000000BFFFF
This list has four memory ranges, while the PCI config space indicates three. Why?
Also, how does Device Manager know the size of each memory range?

I can't answer the portion about where memory range A0000 came from, but I can answer some of your questions.
"How does Device Manager know the size of each memory range?"
This is actually fairly easy to answer. Sadly I cannot link to a direct source since I lack access to the spec, but I can explain it. Before the host OS (Windows) assigns the addresses that it places in the BAR registers (configuration space), it asks the device how much space each BAR needs: it writes all 1s to the BAR and reads back the device's requested memory size. The low address bits are hardwired to 0 by the device, so the value read back encodes the length to allocate for that BAR. See "What is the Base Address Register (BAR) in PCIe?" and https://en.wikipedia.org/wiki/PCI_configuration_space
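Here is a minimal Python sketch of that sizing sequence, run against a toy model of a BAR rather than real hardware. `FakeBar` and `probe_size` are hypothetical names for this example, and the 512 KB size is taken from the EFD00000 - EFD7FFFF range that Device Manager shows for BAR5.

```python
class FakeBar:
    """Toy model of one 32-bit memory BAR (not a real driver or OS API)."""

    def __init__(self, size: int, attrs: int = 0x0):
        self.size = size           # region size: a power of two, at least 16
        self.attrs = attrs & 0xF   # read-only attribute bits 3:0
        self.addr = 0

    def write(self, value: int) -> None:
        # Address bits below the region size are hardwired to 0.
        self.addr = value & 0xFFFFFFFF & ~(self.size - 1)

    def read(self) -> int:
        return self.addr | self.attrs


def probe_size(bar: FakeBar) -> int:
    """Save the BAR, write all 1s, read back, restore, and decode the size."""
    original = bar.read()
    bar.write(0xFFFFFFFF)
    readback = bar.read()
    bar.write(original)
    return (~(readback & 0xFFFFFFF0) + 1) & 0xFFFFFFFF


bar5 = FakeBar(size=0x80000)     # 512 KB, 32-bit, non-prefetchable
bar5.write(0xEFD00000)           # the address assigned by the OS
print(hex(probe_size(bar5)))     # 0x80000
print(hex(bar5.read()))          # 0xefd00000 (restored after the probe)
```

A real implementation would also disable decoding in the command register while probing; that step is left out of this sketch.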
Now with that said I do notice some strange things with your BARs so please reference this (https://wiki.osdev.org/PCI) for what I'm saying.
Your first 2 BARs (BAR0 and BAR2) are indeed 64-bit memory BARs, and specifically they are prefetchable. BAR1 and BAR3 are the upper halves of those 64-bit BARs, which are zero here; it is not terribly uncommon for 64-bit BARs to leave that space unused. Next, on BAR4 we see that bit 0 is set, implying this is I/O space and not memory space. As for the last one, BAR5, to me this looks like a 32-bit non-prefetchable memory space BAR. In my experience systems set BARs to 00000000 when they are unused, not to an address like this.
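The reading of the six registers above can be reproduced with a short Python sketch that walks the dwords from the WinDbg dump and pairs up the 64-bit BARs. The function name `describe_bars` and the kind labels are made up for this example.

```python
def describe_bars(bars):
    """Decode six raw BAR dwords into (index, kind, base address) tuples."""
    result = []
    i = 0
    while i < len(bars):
        value = bars[i]
        if value & 0x1:                               # bit 0 set: I/O space
            result.append((i, "io", value & 0xFFFFFFFC))
            i += 1
        elif (value >> 1) & 0x3 == 0b10:              # bits 2:1 = 10: 64-bit memory
            high = bars[i + 1] if i + 1 < len(bars) else 0
            kind = "mem64 prefetchable" if value & 0x8 else "mem64"
            result.append((i, kind, (high << 32) | (value & 0xFFFFFFF0)))
            i += 2                                    # next dword is the upper half
        elif value != 0:                              # 32-bit memory BAR
            kind = "mem32 prefetchable" if value & 0x8 else "mem32"
            result.append((i, kind, value & 0xFFFFFFF0))
            i += 1
        else:                                         # reads as 0: unused
            i += 1
    return result


# Values from the !pci output in the question.
dump = [0xD000000C, 0x00000000, 0xE000000C, 0x00000000, 0x0000E001, 0xEFD00000]
for index, kind, base in describe_bars(dump):
    print(f"BAR{index}: {kind} at {base:#x}")
# BAR0: mem64 prefetchable at 0xd0000000
# BAR2: mem64 prefetchable at 0xe0000000
# BAR4: io at 0xe000
# BAR5: mem32 at 0xefd00000
```

The three memory bases line up with the first three memory ranges listed by Device Manager.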
Now, looking at what Windows is reporting, we see 4 memory ranges, each with a size that was determined when the BAR was first probed (so you have no way to verify it at this point; that value has already been overwritten and is long gone). Three of them match BAR0, BAR2 and BAR5. Strangely, the fourth range, 000A0000 - 000BFFFF, does not match any BAR, and the I/O BAR at 0000e000 would not be listed as a memory range. That range coincides with the legacy VGA memory window, so it may be a fixed legacy resource rather than a BAR, but I honestly have no experience with I/O space BARs. Hopefully the other info helps.

Virtually indexed physically tagged cache Synonym

I am not able to entirely grasp the concept of synonyms or aliasing in VIPT caches.
Consider the address split as:-
Here, suppose we have 2 pages with different VA's mapped to same physical address(or frame no).
The pageno part of VA (bits 13-39) which are different gets translated to PFN of PA(bits 12-35) and the PFN remains same for both the VA's as they are mapped to same physical frame.
Now the pageoffset part(bits 0-13) of both the VA's are same as the data which they want to access from a particular frame no is same.
As the pageoffset part of both VA's are same, bits (5-13) will also be same, so the index or set no is the same and hence there should be no aliasing as only single set or index no is mapped to a physical frame no.
How is bit 12 as shown in the diagram, responsible for aliasing ? I am not able to understand that.
It would be great if someone could give an example with the help of addresses.
Note: this diagram has a minor error that doesn't affect the question: 36 - 12 = 24-bit tags for 36-bit physical addresses, not 28. MIPS64 R4x00 CPUs do in fact have 40-bit virtual, 36-bit physical addresses, and 24-bit tags, according to chapters 4 and 11 of the manual.
This diagram is from http://www.cse.unsw.edu.au/~cs9242/02/lectures/03-cache/node8.html which does label it as being for MIPS R4x00.

The page offset is bits 0-11, not 0-13. Look at your bottom diagram: the page offset is the low 12 bits, so you have 4k pages (like x86 and other common architectures).
If any of the index bits come from above the page offset, VIPT no longer behaves like a PIPT with free translation for the index bits. That's the case here.
A process can have the same physical page (frame) mapped to 2 different virtual pages.
Your claim that "the pageno part of VA (bits 13-39) which are different gets translated to PFN of PA (bits 12-35) and the PFN remains same for both the VA's" is totally bogus. The page number starts at bit 12, so translation can change bit #12. So one of the index bits really is virtual and not also physical, and two entries for the same physical line can go in different sets.
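Since the question asked for an example with addresses, here is a small Python sketch. The geometry is an assumption made for the example: 4k pages, 32-byte lines (offset bits 4:0), and 256 sets indexed by virtual bits 12:5, so that bit 12 is the one index bit above the page offset. The page table contents are invented.

```python
PAGE_SHIFT = 12    # 4k pages
LINE_BITS = 5      # assumed 32-byte cache lines
INDEX_BITS = 8     # assumed 256 sets: index = address bits 12:5

# Toy page table: two different virtual pages mapped to the same frame.
page_table = {0x1: 0x5, 0x2: 0x5}      # virtual page number -> frame number


def translate(va: int) -> int:
    frame = page_table[va >> PAGE_SHIFT]
    return (frame << PAGE_SHIFT) | (va & ((1 << PAGE_SHIFT) - 1))


def set_index(va: int) -> int:
    """Set selected by the virtual address, before translation finishes."""
    return (va >> LINE_BITS) & ((1 << INDEX_BITS) - 1)


def phys_tag(pa: int) -> int:
    return pa >> PAGE_SHIFT


va_a, va_b = 0x1040, 0x2040            # same page offset, bit 12 differs
print(hex(translate(va_a)), hex(translate(va_b)))   # 0x5040 0x5040
print(set_index(va_a), set_index(va_b))             # 130 2
print(phys_tag(translate(va_a)) == phys_tag(translate(va_b)))  # True
```

Both virtual addresses name the same physical byte and produce the same tag, but they select different sets, so the line can be cached twice. That is the synonym problem.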
I think my main confusion is regarding the page offset range. Is it the same for both PA and VA (that is 0-11) or is it 0-12 for VA and 0-11 for PA? Will they always be same?
It's always the same for PA and VA. The page offset isn't marked on the VA part of your diagram, only the range of bits used as the index.
It wouldn't make sense for it to be any different: virtual and physical memory are both byte-addressable (or word-addressable). And of course a page frame (physical page) is the same size as a virtual page. Right or left shifting an address during translation from virtual to physical would make no sense.
As discussed in comments:
I did eventually find http://www.cse.unsw.edu.au/~cs9242/02/lectures/03-cache/node8.html (which includes the diagram in the question!). It says the same thing: physical tagging does solve the cache homonym problem as an alternative to flushing on context switch.
But not the synonym problem. For that, you can have the OS ensure that bit 12 of every VA = bit 12 of every PA. This is called page coloring.
Page coloring would also solve the homonym problem without the hardware doing overlapping tag bits, because it gives 1 more bit that's the same between physical and virtual address. phys idx = virt idx. (But then the HW would be relying on software to be correct, if it wanted to depend on this invariant.)
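A page-coloring check of the kind an OS would have to enforce could look roughly like this in Python. The single color bit (bit 12) follows the discussion above; the function name and the example addresses are made up.

```python
PAGE_SHIFT = 12
INDEX_TOP_BIT = 12   # highest address bit used by the cache index (assumed)

# Index bits that lie above the page offset: here just bit 12.
COLOR_MASK = ((1 << (INDEX_TOP_BIT + 1)) - 1) & ~((1 << PAGE_SHIFT) - 1)


def same_color(va: int, pa: int) -> bool:
    """True if the mapping va -> pa keeps the virtual index equal to the physical one."""
    return (va & COLOR_MASK) == (pa & COLOR_MASK)


print(hex(COLOR_MASK))             # 0x1000
print(same_color(0x1040, 0x5040))  # True:  bit 12 is 1 in both
print(same_color(0x2040, 0x5040))  # False: bit 12 differs, mapping must be refused
```

If the OS only ever creates mappings for which `same_color` holds, every virtual alias of a frame indexes the same set.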
Another reason for having the tag overlap the index is write-back during eviction:
Outer caches are almost always PIPT, and memory itself obviously needs the physical address. So you need the physical address of a line when you send it out the memory hierarchy.
A write-back cache needs to be able to evict dirty lines (send them to L2 or to physical RAM) long after the TLB check for the store was done. Unlike a load, you don't still have the TLB result floating around unless you stored it somewhere. See "How does the VIPT to PIPT conversion work on L1->L2 eviction".
Having the tag include all the physical address bits above the page offset solves this problem: given the page-offset index bits and the tag, you can construct the full physical address.
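As a sketch of that reconstruction, assuming the same made-up geometry as before (4k pages, 32-byte lines, index taken from bits 12:5, tag holding every physical bit from bit 12 up), the physical address of a line being evicted can be rebuilt like this. The function name is hypothetical.

```python
PAGE_SHIFT = 12
LINE_BITS = 5      # assumed 32-byte lines


def evicted_line_address(tag: int, set_idx: int) -> int:
    """Rebuild the physical line address from the tag and the set index.

    Only the index bits inside the page offset (bits 11:5) are physical;
    the virtual bit 12 of the index is ignored because the tag already
    supplies physical bit 12.
    """
    in_page_bits = PAGE_SHIFT - LINE_BITS               # 7 bits: 11:5
    in_page_index = set_idx & ((1 << in_page_bits) - 1)
    return (tag << PAGE_SHIFT) | (in_page_index << LINE_BITS)


# The physical line 0x5040 could sit in set 130 or in set 2, depending on
# which virtual alias brought it in. Either way the same address comes out.
print(hex(evicted_line_address(tag=0x5, set_idx=130)))  # 0x5040
print(hex(evicted_line_address(tag=0x5, set_idx=2)))    # 0x5040
```

If the tag did not overlap the index at bit 12, the two calls above could not agree, because bit 12 of the set index is a virtual bit.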
(Another solution would be a write-through cache, so you do always have the physical address from the TLB to send with the data, even if it's not reconstructable from the cache tag+index. Or for read-only caches, e.g. instruction caches, there is no write-back; eviction = drop.)

Does paging let us use physical memory that is larger than what can be addressed by the CPU’s address pointer length?

I was reading the dinosaur book on Operating System about memory management. I assume this is one of the best books but there's something about paging written in the book which I don't get.
The book says, "A 32-bit CPU uses 32-bit addresses, meaning that a given process space can only be 2^32 bytes (4 GB). Therefore, paging lets us use physical memory that is larger than what can be addressed by the CPU’s address pointer length."
I don't quite get this part because if the CPU can only refer to 2^32 different physical addresses, if there were 2^32+1 physical addresses, the last address won't be able to be reached by the CPU. So how can paging help with this?
Also, earlier the book says "Frequently, on a 32-bit CPU , each page-table entry is 4 bytes long, but that size can vary as well. A 32-bit entry can point to one of 2^32 physical page frames. If frame size is 4 KB (2^12 ), then a system with 4-byte entries can address 2^44 bytes (or 16 TB ) of physical memory."
I don't see how that is even possible in ideal/theoretical situations, because as I understand it, part of the virtual address will refer to an entry of the page table while the other part of the virtual address will refer to the offset of a particular byte in that page. So in the above-mentioned situation put forward by the book, even if the CPU could point to 2^32 different page entries, it won't be able to read any particular byte within that page because it doesn't specify the offset.
Maybe I've misunderstood the book or there is some part that I missed out. I much appreciate your help! Thanks a lot!

It sounds like you need to burn your book. It's useless.
"[P]aging lets us use physical memory that is larger than what can be addressed by the CPU’s address pointer length" is complete nonsense (unless the book is assigning two different meanings to the term "paging," in which case it is still useless).
Let's start with logical addressing. A logical address is composed of a page selector and an offset into the page. Some number (P) of bits will be assigned to the page selector and the remainder will be assigned to the offset. If pages are 2^9 bytes, there are 23 bits in the page selector and 9 bits for the byte offset within the page.
Note that the 9/23 split is an arbitrary pick on my part. Most systems these days use larger pages, but these values have been used in the past.
The 23 bits in the page selector are indices into the process page table.
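The 23/9 split described above is just a shift and a mask. A minimal Python sketch, with an arbitrary example address and a made-up function name:

```python
OFFSET_BITS = 9                    # 2^9-byte pages, as in the example above
SELECTOR_BITS = 32 - OFFSET_BITS   # 23 bits of page selector


def split_logical(addr: int):
    """Split a 32-bit logical address into (page selector, byte offset)."""
    page = addr >> OFFSET_BITS
    offset = addr & ((1 << OFFSET_BITS) - 1)
    return page, offset


page, offset = split_logical(0x12345678)
print(hex(page), hex(offset))                        # 0x91a2b 0x78
print(((page << OFFSET_BITS) | offset) == 0x12345678)  # True
```

The page selector is what indexes the process page table; the offset passes through translation unchanged.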
The size of an entry in the page table is going to be a power of 2 (and I have never seen one smaller than 4 bytes). For our purposes let's say that each entry is 8 bytes long.
The bits in the page table entry are divided between those that index physical page frames and control bits. Let's make the arbitrary choice that 32 bits index page frames and 32 bits are used for control.
That means the system can theoretically MANAGE 2^32 pages that are 2^9 bytes large, or a total of 2^41 bytes. If we were to increase the page size from 2^9 to 2^20, the system could theoretically MANAGE 2^52 (32+20) bytes of memory.
Note that each process can still only ACCESS 2^32 bytes. But in my 9-bit page system, 2^9 processes could each access 2^32 bytes simultaneously on a system with 2^41 physical bytes of memory (ignoring the need for a shared system address space in this gross oversimplification).
Note that if I change my page table entries to 32 bits and assign 9 of those bits to control and 23 to page frame selection, the system can only MANAGE 2^32 bytes of memory (and that was more common than managing more than 2^32 bytes).
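The numbers in the last few paragraphs all come from one formula: manageable bytes = 2^(frame-index bits + page-offset bits). A short Python sketch, with a function name made up for this example:

```python
def manageable_bytes(frame_bits: int, page_offset_bits: int) -> int:
    """Physical memory a page table format can describe."""
    return 1 << (frame_bits + page_offset_bits)


PER_PROCESS = 1 << 32    # a 32-bit logical address space

# 8-byte entries, 32 frame bits, 2^9-byte pages
print(manageable_bytes(32, 9) == 2**41)      # True
# same entries, 2^20-byte pages
print(manageable_bytes(32, 20) == 2**52)     # True
# 4-byte entries with 23 frame bits and 9 control bits, 2^9-byte pages
print(manageable_bytes(23, 9) == 2**32)      # True

# How many full 2^32-byte process spaces fit in 2^41 bytes at once
print(manageable_bytes(32, 9) // PER_PROCESS)   # 512
```

The logical address width never appears in the formula, which is the point: it limits what one process can ACCESS, not what the system can MANAGE.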
You quote: "Frequently, on a 32-bit CPU , each page-table entry is 4 bytes long, but that size can vary as well. A 32-bit entry can point to one of 2^32 physical page frames. If frame size is 4 KB (2^12 ), then a system with 4-byte entries can address 2^44 bytes (or 16 TB ) of physical memory."
This is theoretical BS. A system that used all 32 bits of the page table entry as an index to page frames could not function. There would have to be some control bits in the page table.
The quotes you are taking from this book are highly misleading. Few (any?) 32-bit processors could even access 2^32 bytes of memory due to address line limitations.
While it is possible that the use of logical pages could allow a processor to manage more memory than the logical address size suggests, that was not the purpose of managing memory in pages.
The purpose of paging—which in its normal and customary usage refers to the movement of virtual memory pages between physical page frames and secondary storage—is to allow processes to access more virtual memory than there was physical memory on the system.
There is an additional system of memory management that is (thankfully) dying out: segments. Segments also provided a means for systems to manage more physical memory than the logical address space would allow.
