x86 address space calculation PAE to 36 bits - memory-management

I'm having some hard time understanding PAE. I know it creates a 3rd level of indirection via the PDPT, so that the address translation goes from CR3 -> PDPT(4 entries) -> PD(512 entries) -> PT (512 entries) -> PAGE (4096). But the address is still 32 bits, how do you get 36 bit addresses from this scheme? I'd appreciate an example. How does adding another table "increases" the address space?

PAE changes nothing about 32-bit virtual addresses, only the size of physical address they're mapped to. (Which sucks a lot, nowhere near enough virtual address space to map all those physical pages at once. Linus Torvalds wrote a nice rant about PAE: https://cl4ssic4l.wordpress.com/2011/05/24/linus-torvalds-about-pae/ originally posted on https://www.realworldtech.com/forum/?threadid=76912&curpostid=76973 / https://www.realworldtech.com/forum/?threadid=76912&curpostid=76980)
It also widens a PTE (Page Table Entry) from 4 bytes to 8 bytes, which means 2 levels aren't enough anymore; that's where the small extra level comes to translate the top 2 bits of virtual addresses via those 4 entries.
36-bit only happened to be the supported physical address size in the first generation of CPUs that implemented PAE, Pentium Pro There is no inherent 36-bit limit to PAE.
x86-64 adopted the PTE format, which has room for up to 52-bit physical addresses. Current x86-64 CPUs support the same physical address-size in legacy mode with PAE as they do in 64-bit mode. (As reported by CPUID). That limit is a design choice that saves bits in cache tags, TLB entries, store-buffer entries, etc. and in comparators involved with them. It's normally chosen to be more than the amount of RAM that a real system could actually use, given the commercially available DIMM sizes and number of memory controllers even in multi-socket systems, and still leave room for some I/O address space.
x86-64 came soon after PAE, or soon enough for desktop use to be relevant, so it's a common misconception that PAE is only 36 bits. (Because 64-bit mode is a vastly better way to address more memory, allowing a single process to use more than 2G or 3G depending on user/kernel split.)

Related

Why 4-level paging can only cover 64 TiB of physical address

There are the words in linux/Documentation/x86/x86_64/5level-paging.rst
Original x86-64 was limited by 4-level paging to 256 TiB of virtual address space and 64 TiB of physical address space.
I know that the limit of virtual address is 256TB because 2^48 = 256TB. But I don't know why its limit of physical is only 64TB.
Suppose we set the size of each page to 4k. Thus a linear address has 12 bits of offset, 9 bits indicate the index in each four level, which means 512 entries per level. A linear address can cover 512^4 pages, 512^4 * 4k = 256TB of space.
This is my understanding of the calculation of space limit. I'm wondering what's wrong with it.
The x86-64 ISA's physical address space limit is unchanged by PML5, remaining at 52-bit. Real CPUs implement some narrower number of physical address bits, saving bits in cache tags and TLB entries, among other places.
The 64 TiB limit is not imposed by x86-64 itself, but by the way Linux requires more virtual address space than physical for its own convenience and efficiency. See x86_64/mm.txt for the actual layout of Linux's virtual address space on x86-64 with PML4 paging, and note the 64 TB "direct mapping of all physical memory (page_offset_base)"
x86-64 Linux doesn't do HIGHMEM / LOWMEM
Linux can't actually use more than phys mem = 1/4 virtual address space, without nasty HIGHMEM / LOWMEM stuff like in the bad old days of 32-bit kernels on machines with more than 1 GiB of RAM (vm/highmem.html). (With a 3:1 user:kernel split of address space, letting user-space have 3GiB, but with the kernel having to map pages in/out of its own space if not accessing them via the current process's user-space addresses.)
Linus's rant about 32-bit PAE expands on why it's nice for an OS to have enough virtual address space to keep everything mapped, with the usual assertion that people who don't agree with him are morons. :P I tend to agree with him on this, that there are obvious efficiency advantages and that PAE is a huge PITA for the kernel. Probably even moreso on an SMP system.
If anyone had proposed a patch to add highmem support for x86-64 to allow using more than 64 TiB of physical memory with the existing PML4 format, I'd expect Linus would tell them 1995 called and wants its bad idea back. He wouldn't consider merging such a patch unless much RAM became common for servers, but hardware vendors still hadn't provided an extension for wider virtual addresses.
Fortunately that didn't happen: probably no CPU has supported wider than 46-bit phys addrs without supporting PML5. Vendors know that supporting more RAM than mainline Linux can use wouldn't be a selling point. But as the doc said, commercial systems were getting up to a max capacity of 64 TiB.
x86-64's page-table format has room for 52-bit physical addresses
The x86-64 page-table format itself has always had that much room: Why in x86-64 the virtual address are 4 bits shorter than physical (48 bits vs. 52 long)? has diagrams from AMD's manuals. Of course early CPUs had narrower physical addresses so you couldn't for example have a PCIe device put its device memory way up high in physical address space.
Your calculation has nothing to do with physical address limits, which is set by the number of bits in each page-table entry that can be used for that.
In x86-64 (and PAE), the page table format reserves bits up to bit #51 for use as physical-address bits, so OSes must zero them for forward compatibility with future CPUs. The low 12 bits are used for other things, but the physical address is formed by zeroing out the bits other than the phys-address bits in the PTE, so those low 12 bits become the low zero bits in an aligned physical-page address.
x86 terminology note: logical addresses are seg:off, and segment_base + offset gives you a linear address. With paging enabled (as required in long mode), linear addresses are virtual, and are what's used as a search key for the page tables (effectively a radix tree cached by the TLB).
Your calculation is just correctly reiterating the 256 TiB size of virtual address space, based on 4-level page tables with 4k pages. That's how much memory can be simultaneously mapped with PML4.
A physical page has to be the same size as a virtual page, and in x86-64 yes that's 4 KiB. (Or 2M largepage or 1G hugepage).
Fun fact: the x86-64 page-table-entry format is the same as PAE, so modern CPUs can also access large amounts of memory 32-bit mode. But of course not map it all at once. It's probably not a coincidence that AMD chose to use an existing well-designed format when making AMD64, so their CPUs would only need two different modes for hardware page-table walker: legacy x86 with 4-byte PTEs (10 bits per level) and PAE/AMD64 with 8-byte PTEs (9 bits per level).

What is the maximum physical memory? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 2 years ago.
Improve this question
Consider a system where with 32 bits for adress .6 bits are used for segment so we have 2^6=254 segments.14 bits are used for paging= so we have 2^14= 16K pages.12 bits are used offset= so we have 2^12=4KB page size.My question is what is the maximum physical memory that can be supported by the system? A solution i am considering is that If a page table entry is 32-bit long it can give 32 bits to use as the high part of the physical address. So the maximum phyiscal memory that can be supported will be 2^32*2^14=2^46 but i have no idea if thats correct i mean segments don't play
Phys address size is not uniquely determined by virtual address size and page size.
Instead the upper limit of physical memory size for an ISA is determined by the page size and the number of physical page-address bits in a page-table entry.
For example, x86-64 (and x86 32-bit with PAE) have PTEs with room for 52-bit physical page-frame addresses.
The PTE itself has 40 of those bits, and the low 12 have to be 0 (page-frames are naturally aligned). x86 / x86-64 uses 4k pages = 12 bits for the byte-within-page part of physical and virtual addresse. Why in 64bit the virtual address are 4 bits short (48bit long) compared with the physical address (52 bit long)? has diagrams of the format and some nice explanation.
The architects of that page-table format chose to align the page-number bitfield so it starts at bit #12, with bits 11:0 holding flags. So the position of the top of the field is the physical address width. If they had more or fewer flags than page-offset bits, that wouldn't be the case.
In practice real hardware might only implement some lower number of physical bits. For example, my i7-6700k desktop Skylake reports (via CPUID) that it implement 39-bit physical addresses (and 48-bit virtual). In that case the higher bits above 39 in a page-table entry are reserved.
(Fewer physical bits supported means smaller cache tags, and smaller TLB entries, among other things.)
Fun fact: PML5 extends x86-64's paging scheme from 4 levels (48-bit virtual) to 5-level (57-bit virtual) with no change in physical address width. That's another good reminder that physical and virtual address width are independent.
Also note that not having enough virtual address space to map all the RAM makes it really inconvenient to write an OS. Linus Torvalds wrote an entertaining and informative rant about PAE (wide physical addresses for 32-bit virtual addresses on 32-bit x86), quoted on someone's blog.
Your 32-bit virtual space for 44-bit physical would be really hard for an OS to use.

How can Windows split its virtual memory space asymmetrically?

According to the AMD64 Architecture Programmer's Manual Volume 2 (system programming), a logical address is valid only if the bits 48-63 are all the same as bit 47:
5.3.1 Canonical Address Form
The AMD64 architecture requires implementations supporting fewer than the full 64-bit virtual address to ensure that those addresses are in canonical form. An address is in canonical form if the address bits from the most-significant implemented bit up to bit 63 are all ones or all zeros. If the addresses of all bytes in a virtual-memory reference are not in canonical form, the processor generates a general-protection exception (#GP) or a stack fault (#SS) as appropriate.
So it seems the only valid address ranges are 0x0000_0000_0000_0000 ~ 0x0000_7FFF_FFFF_FFFF and 0xFFFF_8000_0000_0000 ~ 0xFFFF_FFFF_FFFF_FFFF, that is, the lower 128 TiB and higher 128 TiB. However, according to MSDN, the addresses used by Windows x64 kernel don't seem to be the case.
In 64-bit Windows, the theoretical amount of virtual address space is 2^64 bytes (16 exabytes), but only a small portion of the 16-exabyte range is actually used. The 8-terabyte range from 0x000'00000000 through 0x7FF'FFFFFFFF is used for user space, and portions of the 248-terabyte range from 0xFFFF0800'00000000 through 0xFFFFFFFF'FFFFFFFF are used for system space.
So, how can Windows split the virtual address space into lower 8 TiB and higher 248 TiB, despite the hardware specification? I'd like to know why it doesn't cause any problems with the hardware that checks whether the addresses are canonical.
**UPDATE: ** Seems like Microsoft fixed this discrepancy in Windows 8.1. See https://www.facebook.com/codemachineinc/posts/567137303353192 for details.
You're right; current x86-64 hardware with 48-bit virtual address support requires that the high 16 bits be the sign-extension of the low 48 (i.e. bit 47 matches bits [63:48]). That means about half of the 0xFFFF0800'00000000 to 0xFFFFFFFF'FFFFFFFF range is non-canonical on current x86-64 hardware.
Windows is just describing how it carves up the full 64-bit virtual address space, not which parts of that are actually in use on current x86-64 hardware. It can of course only use the 128 TiB that is canonical, from 0xFFFF8000'00000000 to -1. (Note the position of the 8; there's no gap between it and the high 16 bytes that are all-ones, unlike in the theoretical Windows range.)
Top-end servers can be built with 6TiB of RAM or maybe even more. (Xeon Platinum Scalable Processors are apparently available with up to 1.5TiB per socket, and up to 8-way, e.g. the 8180M).
Intel has proposed an extension for larger physical and virtual addressing that adds another level of page tables, https://software.intel.com/sites/default/files/managed/2b/80/5-level_paging_white_paper.pdf, so OSes will hopefully not be stuck without enough virtual address space to map all the RAM (like in the bad old days of PAE on 32-bit-only systems) before we have systems that have more than 128TiB of physical RAM.

Does paging let us use physical memory that is larger than what can be addressed by the CPU’s address pointer length?

I was reading the dinosaur book on Operating System about memory management. I assume this is one of the best books but there's something about paging written in the book which I don't get.
The book says, "A 32-bit CPU uses 32-bit addresses, meaning that a given process space can only be 2^32 bytes (4 TB ). Therefore, paging lets us use physical memory that is larger than what can be addressed by the CPU’s address pointer length."
I don't quite get this part because if the CPU can only refer to 2^32 different physical addresses, if there were 2^32+1 physical addresses, the last address won't be able to be reached by the CPU. So how can paging help with this?
Also, earlier the book says "Frequently, on a 32-bit CPU , each page-table entry is 4 bytes long, but that size can vary as well. A 32-bit entry can point to one of 2^32 physical page frames. If frame size is 4 KB (2^12 ), then a system with 4-byte entries can address 2^44 bytes (or 16 TB ) of physical memory."
I don't see how that is even possible in ideal/theoretical situations, cuz as I understand it, part of the virtual address will refer to an entry of the page table while the other part of the virtual address will refer to the off-set of that particular type in that page. So in the above-mentioned situation put forward by the book, even if the CPU could point to 2^32 different page entries, it won't be able to read any particular byte within that page cuz it doesn't specify the office.
Maybe I've misunderstood the book or there is some part that I missed out. I much appreciate your help! Thanks a lot!
It sounds like you need to burn your book. It's useless.
"[P]aging lets us use physical memory that is larger than what can be addressed by the CPU’s address pointer length" is complete nonsense (unless the book is assigning two different meanings to the term "paging," in which it is still useless).
Let's start with logical addressing. A logical address is composed of a page selector and and offset into the page. Some number (P) of bits will be assigned to the page selector and the remained will be assigned to the offset. If pages are 2^9 bits, there are 23 bits in the page selector and 9 bits for the byte offset within the page.
Note that the 9/23 pick are arbitrary on my part. Most systems these days use larger pages but these are values have been used in the past.
The 23 bits in the page selector are indices into the process page table.
The size of entries in the page table are going to be a power of 2 (and I have never seen one less than 4). For our purposes let's say that each entry is 8-bytes long.
The bits in the page table entry are divided between those that index physical page frames and control bits. let's make the arbitrary choice that 32 bits index page frames and 32 bits are used for control.
That means the system can theoretically MANAGE 2^32 pages that are 2^9 bytes large or a total of 2^41 bytes. If we were to increase the page size from 2^9 to 2^20, the system could theoretically MANAGE 2^52 (32+20) bytes of memory.
Note that each process can still only ACCESS 2^32 bytes. But in my 9-bit page system, 2^9 processes could each access 2^32 pages simultaneously on a system with 2^41 physical bytes of memory (ignoring the need for a shared system address space in this gross oversimplification).
Note that if I change my page table to 32-bits and assign 9 of those bits to control and and 23 to page frame selection, the system can only MANAGE 2^32 bytes of memory (and that was more common than managing greater than 2^32 bytes).
You quote: "Frequently, on a 32-bit CPU , each page-table entry is 4 bytes long, but that size can vary as well. A 32-bit entry can point to one of 2^32 physical page frames. If frame size is 4 KB (2^12 ), then a system with 4-byte entries can address 2^44 bytes (or 16 TB ) of physical memory."
This is theoretical BS. A system that used all 32 bites of the page table entry as an index to page frames could not function. There would have to be some control bits in the page table.
The quotes you are taking from this book are highly misleading. Few (any?) 32-bit processors could even access 2^32 bytes of memory due to address line limitations.
While it is possible that the use of logical pages could allow a processor to manage more memory that the logical address size suggests, that was not the purpose of managing memory in pages.
The purpose of paging—which in its normal and customary usage refers to the movement of virtual memory pages between physical page frames and secondary storage—is to allow processes to access more virtual memory than there was physical memory on the system.
There is an additional system of memory management that is (thankfully) dying out: segments. Segments also provided a means for systems to manage more physical memory than the logical address space would allow.

Why does the 8086 use an extra register to address 1MB of memory?

I heard that the 8086 has 16-bit registers which allow it to only address 64K of memory. Yet it is still able to address 1MB of memory which would require 20-bit registers. It does this by using another register to to hold another 16 bits, and then adds the value in the 16-bit registers to the value in this other register to be able to generate numbers which can address up to 1MB of memory. Is that right?
Why is it done this way? It seems that there are 32-bit registers, which is more than sufficient to address 1MB of memory.
Actually this has nothing to do with number of registers. Its the size of the register which matters. A 16 bit register can hold up to 2^16 values so it can address 64K bytes of memory.
To address 1M, you need 20 bits (2^20 = 1M), so you need to use another register for the the additional 4 bits.
The segment registers in an 8086 are also sixteen bits wide. However, the segment number is shifted left by four bits before being added to the base address. This gives you the 20 bits.
the 8088 (and by extension, 8086) is instruction compatible with its ancestor, the 8008, including the way it uses its registers and handles memory addressing. the 8008 was a purely 16 bit architecture, which really couldn't address more than 64K of ram. At the time the 8008 was created, that was adequate for most of its intended uses, but by the time the 8088 was being designed, it was clear that more was needed.
Instead of making a new way for addressing more ram, intel chose to keep the 8088 as similar as possible to the 8008, and that included using 16 bit addressing. To allow newer programs to take advantage of more ram, intel devised a scheme using some additional registers that were not present on the 8008 that would be combined with the normal registers. these "segment" registers would not affect programs that were targeted at the 8008; they just wouldn't use those extra registers, and would only 'see' 16 addres bits, the 64k of ram. Applications targeting the newer 8088 could instead 'see' 20 address bits, which gave them access to 1MB of ram
I heard that the 8086 has 16 registers which allow it to only address 64K of memory. Yet it is still able to address 1MB of memory which would require 20 registers.
You're misunderstanding the number of registers and the registers' width. 8086 has eight 16-bit "general purpose" registers (that can be used for addressing) along with four segment registers. 16-bit addressing means that it can only support 216 B = 64 KB of memory. By getting 4 more bits from the segment registers we'll have 20 bits that can be used to address a total of 24*64KB = 1MB of memory
Why is it done this way? It seems that there are 32 registers, which is more than sufficient to address 1MB of memory.
As said, the 8086 doesn't have 32 registers. Even x86-64 nowadays don't have 32 general purpose registers. And the number of registers isn't relevant to how much memory a machine can address. Only the address bus width determines the amount of addressable memory
At the time of 8086, memory is extremely expensive and 640 KB is an enormous amount that people didn't think that would be reached in the near future. Even with a lot of money one may not be able to get that large amount of RAM. So there's no need to use the full 32-bit address
Besides, it's not easy to produce a 32-bit CPU with the contemporary technology. Even 64-bit CPUs today aren't designed to use all 64-bit address lines
Why can't OS use entire 64-bits for addressing? Why only the 48-bits?
Why do x86-64 systems have only a 48 bit virtual address space?
It'll takes more wires, registers, silicons... and much more human effort to design, debug... a CPU with wider address space. With the limited transistor size of the technology in the 70s-80s that may not even come into reality.
8086 doesn't have any 32-bit integer registers; that came years later in 386 which had a much higher transistor budget.
8086's segmentation design made sense for a 16-bit-only CPU that wanted to be able to use 20-bit linear addresses.
Segment registers could have been only 8-bit or something with a larger shift, but apparently there are some advantages to fine-grained segmentation where a segment start address can be any 16-byte aligned linear address. (A linear address is computed from (seg << 4) + off.)

Resources