How is the 34 bit physical address space accessed in a RISC-V 32 bit system when virtual memory is disabled?

In the RISC-V 32-bit ISA, the physical address space is 34 bits wide while the virtual address space is 32 bits. When virtual memory is enabled in supervisor mode, a 32-bit virtual address is translated through the page tables, yielding a 34-bit physical address. When virtual memory is disabled, however, the 32-bit addresses must still somehow map onto 34-bit physical addresses. Section 4.1.12 of the RISC-V privileged ISA specification states:
When MODE=Bare, supervisor virtual addresses are equal to supervisor physical addresses
So, my question is: does this mean that only the low 4 GiB (the bottom 32 bits) of memory can be accessed in supervisor mode with virtual memory disabled? If so, how is the remaining portion of the 16 GiB (34-bit) physical memory supposed to be accessed in supervisor mode when virtual memory is disabled?
SV32 Virtual and Physical Addressing

Someone asked a similar question in an issue on the GitHub repo for the ISA manual. It appears that when running with MODE=Bare on RV32, you can only access the bottom 4 GiB of the 34-bit physical address space, and the top 12 GiB are inaccessible. The 32-bit register values are zero-extended into 34-bit physical addresses.
While this isn't explicitly stated in the manual, it does say in the caption for Figure 4.17 in the Privileged ISA spec that “when mapping between narrower and wider addresses, RISC-V usually zero-extends a narrower address to a wider size.”
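To make the zero-extension concrete, here is a minimal C sketch (my own illustration, not pseudocode from the spec): in Bare mode the 32-bit effective address simply becomes the low 32 bits of the 34-bit physical address, so bits 33:32 are always zero and only the bottom 4 GiB is reachable.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustration only: Bare-mode "translation" on RV32 is just zero-extension
 * of the 32-bit effective address into the 34-bit physical address space. */
static uint64_t bare_mode_phys(uint32_t vaddr)
{
    return (uint64_t)vaddr;              /* bits 33:32 are always zero */
}

int main(void)
{
    uint32_t va = 0xFFFFFFFFu;           /* highest possible 32-bit address */
    printf("PA = 0x%09llX\n", (unsigned long long)bare_mode_phys(va));
    /* Prints 0x0FFFFFFFF: physical addresses 0x100000000-0x3FFFFFFFF are
     * unreachable while translation is off. */
    return 0;
}
```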

Related

Why 4-level paging can only cover 64 TiB of physical address

The following words appear in linux/Documentation/x86/x86_64/5level-paging.rst:
Original x86-64 was limited by 4-level paging to 256 TiB of virtual address space and 64 TiB of physical address space.
I understand that the virtual address limit is 256 TiB because 2^48 = 256 TiB, but I don't see why the physical limit is only 64 TiB.
Suppose each page is 4 KiB. A linear address then has 12 bits of offset and 9 index bits at each of the four levels, which means 512 entries per level. A linear address can therefore cover 512^4 pages, and 512^4 * 4 KiB = 256 TiB of space.
This is my understanding of the calculation of space limit. I'm wondering what's wrong with it.
The x86-64 ISA's physical address space limit is unchanged by PML5, remaining at 52 bits. Real CPUs implement some narrower number of physical address bits, saving bits in cache tags and TLB entries, among other places.
The 64 TiB limit is not imposed by x86-64 itself, but by the way Linux requires more virtual address space than physical for its own convenience and efficiency. See x86_64/mm.txt for the actual layout of Linux's virtual address space on x86-64 with PML4 paging, and note the 64 TiB "direct mapping of all physical memory (page_offset_base)" entry.
x86-64 Linux doesn't do HIGHMEM / LOWMEM
Linux can't actually use more physical memory than 1/4 of the virtual address space without nasty HIGHMEM / LOWMEM stuff like in the bad old days of 32-bit kernels on machines with more than 1 GiB of RAM (vm/highmem.html). (With a 3:1 user:kernel split of address space, letting user-space have 3 GiB, but with the kernel having to map pages in/out of its own space if not accessing them via the current process's user-space addresses.)
Linus's rant about 32-bit PAE expands on why it's nice for an OS to have enough virtual address space to keep everything mapped, with the usual assertion that people who don't agree with him are morons. :P I tend to agree with him on this, that there are obvious efficiency advantages and that PAE is a huge PITA for the kernel. Probably even moreso on an SMP system.
If anyone had proposed a patch to add highmem support for x86-64 to allow using more than 64 TiB of physical memory with the existing PML4 format, I'd expect Linus would tell them 1995 called and wants its bad idea back. He wouldn't consider merging such a patch unless that much RAM became common for servers while hardware vendors still hadn't provided an extension for wider virtual addresses.
Fortunately that didn't happen: probably no CPU has supported wider than 46-bit phys addrs without supporting PML5. Vendors know that supporting more RAM than mainline Linux can use wouldn't be a selling point. But as the doc said, commercial systems were getting up to a max capacity of 64 TiB.
x86-64's page-table format has room for 52-bit physical addresses
The x86-64 page-table format itself has always had that much room: Why in x86-64 the virtual address are 4 bits shorter than physical (48 bits vs. 52 long)? has diagrams from AMD's manuals. Of course early CPUs had narrower physical addresses, so you couldn't, for example, have a PCIe device put its device memory way up high in physical address space.
Your calculation has nothing to do with physical address limits, which are set by the number of bits in each page-table entry that can be used for a physical address.
In x86-64 (and PAE), the page table format reserves bits up to bit #51 for use as physical-address bits, so OSes must zero them for forward compatibility with future CPUs. The low 12 bits are used for other things, but the physical address is formed by zeroing out the bits other than the phys-address bits in the PTE, so those low 12 bits become the low zero bits in an aligned physical-page address.
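For illustration, here's roughly what that masking looks like in C (the helper and constant names are mine; the bit layout is the one documented in the AMD/Intel manuals):

```c
#include <stdint.h>
#include <stdio.h>

#define PTE_ADDR_MASK 0x000FFFFFFFFFF000ull     /* bits 51:12 of a PTE */

/* Sketch: assemble a physical address from a 4 KiB-page x86-64/PAE PTE.
 * Bits 51:12 hold the physical frame; the low 12 bits are flags (Present,
 * R/W, ...) and bit 63 is NX, all of which get masked away. */
static uint64_t pte_to_phys(uint64_t pte, uint64_t vaddr)
{
    uint64_t frame = pte & PTE_ADDR_MASK;       /* physical page, 4 KiB aligned */
    return frame | (vaddr & 0xFFFull);          /* plus the 12-bit page offset  */
}

int main(void)
{
    /* Made-up PTE: frame at physical 0x1'2345'6000, Present|RW|NX set. */
    uint64_t pte = 0x8000000123456003ull;
    printf("0x%llx\n", (unsigned long long)pte_to_phys(pte, 0xDEADBEEF));
    /* prints 0x123456eef (frame 0x123456000 plus offset 0xeef) */
    return 0;
}
```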
x86 terminology note: logical addresses are seg:off, and segment_base + offset gives you a linear address. With paging enabled (as required in long mode), linear addresses are virtual, and are what's used as a search key for the page tables (effectively a radix tree cached by the TLB).
Your calculation is just correctly reiterating the 256 TiB size of virtual address space, based on 4-level page tables with 4k pages. That's how much memory can be simultaneously mapped with PML4.
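As a sanity check, here is the question's arithmetic spelled out in C; it only reiterates the virtual-space size, and says nothing about physical limits:

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t entries = 512;                        /* 2^9 per level          */
    uint64_t page    = 4096;                       /* 2^12-byte pages        */
    uint64_t pages   = entries * entries * entries * entries;   /* 512^4     */
    uint64_t bytes   = pages * page;               /* 2^48 bytes             */
    printf("%llu TiB of virtual address space\n",
           (unsigned long long)(bytes >> 40));     /* prints 256 */
    return 0;
}
```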
A physical page has to be the same size as a virtual page, and in x86-64 yes that's 4 KiB. (Or 2M largepage or 1G hugepage).
Fun fact: the x86-64 page-table-entry format is the same as PAE, so modern CPUs can also access large amounts of memory in 32-bit mode, though of course not map it all at once. It's probably not a coincidence that AMD chose to reuse an existing well-designed format when creating AMD64, so their CPUs only need two different modes for the hardware page-table walker: legacy x86 with 4-byte PTEs (10 bits per level) and PAE/AMD64 with 8-byte PTEs (9 bits per level).

x86 address space calculation PAE to 36 bits

I'm having a hard time understanding PAE. I know it creates a 3rd level of indirection via the PDPT, so that the address translation goes from CR3 -> PDPT (4 entries) -> PD (512 entries) -> PT (512 entries) -> page (4096 bytes). But the address is still 32 bits, so how do you get 36-bit addresses from this scheme? I'd appreciate an example. How does adding another table "increase" the address space?
PAE changes nothing about 32-bit virtual addresses, only the size of physical address they're mapped to. (Which sucks a lot, nowhere near enough virtual address space to map all those physical pages at once. Linus Torvalds wrote a nice rant about PAE: https://cl4ssic4l.wordpress.com/2011/05/24/linus-torvalds-about-pae/ originally posted on https://www.realworldtech.com/forum/?threadid=76912&curpostid=76973 / https://www.realworldtech.com/forum/?threadid=76912&curpostid=76980)
It also widens a PTE (Page Table Entry) from 4 bytes to 8 bytes, which means 2 levels aren't enough anymore; that's where the small extra level comes in, translating the top 2 bits of the virtual address via those 4 PDPT entries.
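Here is a rough sketch of how a 32-bit virtual address is carved up under PAE (the variable names are mine): the widening happens in the 64-bit PTE, whose frame field can point above 4 GiB, not in the virtual address itself.

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t va = 0xC0ABCDEFu;             /* arbitrary example address */
    unsigned pdpt_i = (va >> 30) & 0x3;    /* 2 bits  -> 4 PDPT entries  */
    unsigned pd_i   = (va >> 21) & 0x1FF;  /* 9 bits  -> 512 PD entries  */
    unsigned pt_i   = (va >> 12) & 0x1FF;  /* 9 bits  -> 512 PT entries  */
    unsigned offset =  va        & 0xFFF;  /* 12 bits -> 4 KiB page      */
    printf("PDPT=%u PD=%u PT=%u offset=0x%x\n", pdpt_i, pd_i, pt_i, offset);
    /* The selected 64-bit PTE supplies a physical frame number that can be
     * wider than 32 bits -- that's the entire "extension" in PAE. */
    return 0;
}
```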
36 bits only happened to be the supported physical address size in the first generation of CPUs that implemented PAE, the Pentium Pro. There is no inherent 36-bit limit to PAE.
x86-64 adopted the PTE format, which has room for up to 52-bit physical addresses. Current x86-64 CPUs support the same physical address size in legacy mode with PAE as they do in 64-bit mode (as reported by CPUID). That limit is a design choice that saves bits in cache tags, TLB entries, store-buffer entries, etc., and in the comparators involved with them. It's normally chosen to be more than the amount of RAM a real system could actually use, given the commercially available DIMM sizes and the number of memory controllers even in multi-socket systems, while still leaving room for some I/O address space.
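If you want to see what your own CPU reports, CPUID leaf 0x80000008 returns the implemented physical address width in EAX[7:0] and the linear (virtual) width in EAX[15:8]. A quick sketch using GCC/Clang's <cpuid.h> wrapper (MSVC has its own __cpuid intrinsic):

```c
#include <stdio.h>
#include <cpuid.h>

int main(void)
{
    unsigned eax, ebx, ecx, edx;
    if (__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx)) {
        /* Same numbers that /proc/cpuinfo prints as "address sizes". */
        printf("physical: %u bits, virtual: %u bits\n",
               eax & 0xFF, (eax >> 8) & 0xFF);
    }
    return 0;
}
```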
x86-64 came along soon after PAE, or at least soon enough that PAE was never very relevant for desktop use, so it's a common misconception that PAE itself is limited to 36 bits. (64-bit mode is also a vastly better way to address more memory, since it lets a single process use more than 2 or 3 GiB, depending on the user/kernel split.)

How can Windows split its virtual memory space asymmetrically?

According to the AMD64 Architecture Programmer's Manual Volume 2 (system programming), a logical address is valid only if the bits 48-63 are all the same as bit 47:
5.3.1 Canonical Address Form
The AMD64 architecture requires implementations supporting fewer than the full 64-bit virtual address to ensure that those addresses are in canonical form. An address is in canonical form if the address bits from the most-significant implemented bit up to bit 63 are all ones or all zeros. If the addresses of all bytes in a virtual-memory reference are not in canonical form, the processor generates a general-protection exception (#GP) or a stack fault (#SS) as appropriate.
So it seems the only valid address ranges are 0x0000_0000_0000_0000 ~ 0x0000_7FFF_FFFF_FFFF and 0xFFFF_8000_0000_0000 ~ 0xFFFF_FFFF_FFFF_FFFF, that is, the lower 128 TiB and higher 128 TiB. However, according to MSDN, the addresses used by Windows x64 kernel don't seem to be the case.
In 64-bit Windows, the theoretical amount of virtual address space is 2^64 bytes (16 exabytes), but only a small portion of the 16-exabyte range is actually used. The 8-terabyte range from 0x000'00000000 through 0x7FF'FFFFFFFF is used for user space, and portions of the 248-terabyte range from 0xFFFF0800'00000000 through 0xFFFFFFFF'FFFFFFFF are used for system space.
So, how can Windows split the virtual address space into lower 8 TiB and higher 248 TiB, despite the hardware specification? I'd like to know why it doesn't cause any problems with the hardware that checks whether the addresses are canonical.
**UPDATE:** Seems like Microsoft fixed this discrepancy in Windows 8.1. See https://www.facebook.com/codemachineinc/posts/567137303353192 for details.
You're right; current x86-64 hardware with 48-bit virtual address support requires that the high 16 bits be the sign-extension of the low 48 (i.e. bit 47 matches bits [63:48]). That means about half of the 0xFFFF0800'00000000 to 0xFFFFFFFF'FFFFFFFF range is non-canonical on current x86-64 hardware.
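The canonical check itself is just a sign-extension test; here's a hedged C sketch for the 48-bit case (the helper name is mine, not a hardware or OS API):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* An address is canonical on 48-bit hardware iff bits 63:48 equal bit 47,
 * i.e. it survives sign-extension from bit 47 unchanged. */
static bool is_canonical_48(uint64_t va)
{
    return (uint64_t)(((int64_t)(va << 16)) >> 16) == va;
}

int main(void)
{
    printf("%d\n", is_canonical_48(0x00007FFFFFFFFFFFull)); /* 1: top of lower half    */
    printf("%d\n", is_canonical_48(0xFFFF800000000000ull)); /* 1: bottom of upper half */
    printf("%d\n", is_canonical_48(0xFFFF080000000000ull)); /* 0: inside the documented
                                                               Windows system range, but
                                                               non-canonical on 48-bit
                                                               hardware                 */
    return 0;
}
```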
Windows is just describing how it carves up the full 64-bit virtual address space, not which parts of that are actually in use on current x86-64 hardware. It can of course only use the 128 TiB that is canonical, from 0xFFFF8000'00000000 to -1. (Note the position of the 8; there's no gap between it and the high 16 bits that are all-ones, unlike in the theoretical Windows range.)
Top-end servers can be built with 6TiB of RAM or maybe even more. (Xeon Platinum Scalable Processors are apparently available with up to 1.5TiB per socket, and up to 8-way, e.g. the 8180M).
Intel has proposed an extension for larger physical and virtual addressing that adds another level of page tables, https://software.intel.com/sites/default/files/managed/2b/80/5-level_paging_white_paper.pdf, so OSes will hopefully not be stuck without enough virtual address space to map all the RAM (like in the bad old days of PAE on 32-bit-only systems) before we have systems that have more than 128TiB of physical RAM.

Address sizes in Intel i5

My cpuinfo file says that my processor's address sizes are 39 bits physical and 48 bits virtual. This has got me into some confusion.
Mine is a 64-bit machine. From what I understand, this is the word size of my machine; that is, it will fetch data from memory in chunks of 8 bytes. Also, a 64-bit machine means that the CPU can address 2^64 byte-addressable locations, which is a lot. So manufacturers cut down some of these lines.
Here are the questions:
If the CPU only generates logical addresses, then what is the need for having a 39-bit physical address size?
When we say that the CPU can access 2^64 bytes, do we mean physical memory or virtual memory?
I read somewhere that a 64 bit machine has size of its registers as 64 bits, and a 32 bit machine has 32 bit registers. Is this the case?
I think I have confused myself terribly, and need some help. Any other information on this would be appreciated. Thanks!
I can see why people are puzzled considering the number of academic questions posed on this board that suggest there is some mathematical relationship among address sizes.
The processor word size, physical address size, logical address size, and bus size are all independent to some degree.
If the CPU only generates logical addresses, then what is the need for having a 39-bit physical address size?
The CPU translates logical addresses to physical addresses.
When we say that the CPU can access 2^64 bytes, do we mean physical memory or virtual memory?
It could be either.
I read somewhere that a 64 bit machine has size of its registers as 64 bits, and a 32 bit machine has 32 bit registers. Is this the case?
Generally this is true for general-purpose registers, but special-purpose registers may be a different size (e.g., floating-point or control registers).
There have been many occasions when a processor does not use all available bits for the generation of addresses.
In ancient times, the old MC68000 had 32 bit registers but only a 24 bit address bus.
For the i5, consider that a 64-bit address would control a mind-boggling memory space of 17,179,869,184 gigabytes. A stupendously huge number, even compared to the storage at Google or the NSA or the planet Earth.
The i5 designers trim this insane number down to a more manageable 512 gigabytes of physical address space and 262,144 gigabytes of virtual address space.
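Those two figures are just powers of two; here's the arithmetic in C for the record (my own illustration, using the address sizes from the question's cpuinfo):

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t phys = 1ull << 39;     /* 39 physical address bits */
    uint64_t virt = 1ull << 48;     /* 48 virtual address bits  */
    printf("physical: %llu GiB\n", (unsigned long long)(phys >> 30)); /* 512    */
    printf("virtual:  %llu GiB\n", (unsigned long long)(virt >> 30)); /* 262144 */
    return 0;
}
```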

addressability vs address space vs address bus

How do you determine addressability based on address space? How do you determine the size of the address bus based on the addressability? For example: the addressability of a machine is 32 bits; what is the size of the address bus?
The address bus connects the CPU with the main memory. So if the address bus has 32 bits, the max size of the main memory is 2^32 bytes, i.e. 4 GB.
The address bus transfers a physical address, and thus the physical address space in this example is 4 GB.
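In other words, the rule being applied is simply that an N-bit address (or address bus) can name 2^N byte locations; a tiny C illustration (the helper name and example are mine):

```c
#include <stdint.h>
#include <stdio.h>

static uint64_t addressable_bytes(unsigned address_bits)
{
    return 1ull << address_bits;    /* 2^N byte locations */
}

int main(void)
{
    printf("32-bit address bus: %llu GiB\n",
           (unsigned long long)(addressable_bytes(32) >> 30));   /* 4 GiB */
    return 0;
}
```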
However, the CPU generates virtual addresses, and these make up the virtual address space. The virtual addresses have to be mapped to physical addresses by a memory management unit.
In principle, one can map a small virtual address space to a large physical one (as was done earlier, e.g. in the PDP-11 computers), but nowadays it is usually a larger virtual address space that is mapped to a smaller physical one, e.g. from a 64-bit CPU with a 2^64-byte virtual address space to a physical memory with a 32-bit address bus, which is thus 4 GB large.
So if you have a primitive system without memory management, and you want every address that the CPU can generate to be an existing main-memory address, then your address bus must have the same number of bits as the CPU uses for addressing, e.g. 32 bits.
But in a real system the virtual CPU addresses are essentially independent from the physical memory addresses.

Resources