When truncating a 64 bit address to a 32 bit address in windows, why do we need to guarantee that the high 33 bits are 0, and not the high 32 bits? - windows

I've been reading Windows via C/C++ by Jeffrey Richter and came across the following snippet in the chapter about Windows' memory architecture related to porting 32 bit applications to a 64 bit environment.
If the system could somehow guarantee that no memory allocations would every be made above 0x00000000'7FFFFFFF, the application would work fine. Truncating a 64 bit address to a 32 bit address when the high 33 bits are 0 causes no problem whatsoever.
I'm having some trouble understanding why the system needs to guarantee that no memory allocations are made above 0x00000000'7FFFFFFF and not 0x00000000'FFFFFFFF. Shouldn't it be okay to truncate the address so long as the high 32 bits are 0? I'm probably missing something and would really appreciate it if someone with more knowledge about windows than me could explain why this is the case.

Not all 32bit systems/languages use unsigned values for memory addresses, so the 32th bit might have different meaning in some contexts. By limiting the address space to 31 bits, you don't run into that problem. And also, Windows limits a 32bit app from accessing addresses higher than 2 GB without the use of special extensions to extend that, so most apps would not need the 32th bit anyway.

Related

How can Windows split its virtual memory space asymmetrically?

According to the AMD64 Architecture Programmer's Manual Volume 2 (system programming), a logical address is valid only if the bits 48-63 are all the same as bit 47:
5.3.1 Canonical Address Form
The AMD64 architecture requires implementations supporting fewer than the full 64-bit virtual address to ensure that those addresses are in canonical form. An address is in canonical form if the address bits from the most-significant implemented bit up to bit 63 are all ones or all zeros. If the addresses of all bytes in a virtual-memory reference are not in canonical form, the processor generates a general-protection exception (#GP) or a stack fault (#SS) as appropriate.
So it seems the only valid address ranges are 0x0000_0000_0000_0000 ~ 0x0000_7FFF_FFFF_FFFF and 0xFFFF_8000_0000_0000 ~ 0xFFFF_FFFF_FFFF_FFFF, that is, the lower 128 TiB and higher 128 TiB. However, according to MSDN, the addresses used by Windows x64 kernel don't seem to be the case.
In 64-bit Windows, the theoretical amount of virtual address space is 2^64 bytes (16 exabytes), but only a small portion of the 16-exabyte range is actually used. The 8-terabyte range from 0x000'00000000 through 0x7FF'FFFFFFFF is used for user space, and portions of the 248-terabyte range from 0xFFFF0800'00000000 through 0xFFFFFFFF'FFFFFFFF are used for system space.
So, how can Windows split the virtual address space into lower 8 TiB and higher 248 TiB, despite the hardware specification? I'd like to know why it doesn't cause any problems with the hardware that checks whether the addresses are canonical.
**UPDATE: ** Seems like Microsoft fixed this discrepancy in Windows 8.1. See https://www.facebook.com/codemachineinc/posts/567137303353192 for details.
You're right; current x86-64 hardware with 48-bit virtual address support requires that the high 16 bits be the sign-extension of the low 48 (i.e. bit 47 matches bits [63:48]). That means about half of the 0xFFFF0800'00000000 to 0xFFFFFFFF'FFFFFFFF range is non-canonical on current x86-64 hardware.
Windows is just describing how it carves up the full 64-bit virtual address space, not which parts of that are actually in use on current x86-64 hardware. It can of course only use the 128 TiB that is canonical, from 0xFFFF8000'00000000 to -1. (Note the position of the 8; there's no gap between it and the high 16 bytes that are all-ones, unlike in the theoretical Windows range.)
Top-end servers can be built with 6TiB of RAM or maybe even more. (Xeon Platinum Scalable Processors are apparently available with up to 1.5TiB per socket, and up to 8-way, e.g. the 8180M).
Intel has proposed an extension for larger physical and virtual addressing that adds another level of page tables, https://software.intel.com/sites/default/files/managed/2b/80/5-level_paging_white_paper.pdf, so OSes will hopefully not be stuck without enough virtual address space to map all the RAM (like in the bad old days of PAE on 32-bit-only systems) before we have systems that have more than 128TiB of physical RAM.

Address sizes in Intel i5

My cpuinfo file says that my processor has address sizes as 39 bits physical, and 48 bits virtual. This has got me into some confusion.
Mine is a 64 bit machine. From what I understand, this is the word size of my machine. That is, it will fetch data from memory in chunks of 8 bytes. Also, a 64 bit machine means that the CPU can address 2^64 byte addressable locations, which is a lot. So, manufacturers cut-down some of these lines.
Here are the questions:
If the CPU only generates logical addresses, then what is the need for having 39 bits physical address size.
When we say that the CPU can access 2^64 bytes, do we mean Physical memory or the virtual memory?
I read somewhere that a 64 bit machine has size of its registers as 64 bits, and a 32 bit machine has 32 bit registers. Is this the case?
I think I have confused myself terribly, and need some help. Any other information on this would be appreciated. Thanks!
I can see why people are puzzled considering the number of academic questions posed on this board that suggest there is some mathematical relationship among address sizes.
The processor word size, physical address size, logical address size, and bus size are all independent to some degrees.
If the CPU only generates logical addresses, then what is the need for having 39 bits physical address size.
The CPU translates logical addresses to physical addresses.
When we say that the CPU can access 2^64 bytes, do we mean Physical memory or the virtual memory?
I could be either.
I read somewhere that a 64 bit machine has size of its registers as 64 bits, and a 32 bit machine has 32 bit registers. Is this the case?
Generally this is true for general registers but special purpose registers may be a different size (e.g., floating point, control registers)
There have been many occasions when a processor does not use all available bits for the generation of addresses.
In ancient times, the old MC68000 had 32 bit registers but only a 24 bit address bus.
For the i5 consider that a 64 bit address would control a mind boggling memory space of 17,179,869,184 gigabytes. A stupendously huge number even compared to the storage at Google or the NSA or the planet Earth.
The i5 designers, trim this insane number down to a more manageable 512 gigabytes of physical address space and 262,144 gigabytes of virtual address space.

Memory, Stack and 64 bit

On a x86 system a memory location can hold 4 bytes (32 / 8) of data, therefore a single memory address in a 64 bit system can hold 8 bytes per memory address. When examining the stack in GDB though this doesn't appear to be the case, example:
0x7fff5fbffa20: 0x00007fff5fbffa48 0x0000000000000000
0x7fff5fbffa30: 0x00007fff5fbffa48 0x00007fff857917e1
If I have this right then each hexadecimal pair (48) is a byte, thus the first memory address
0x7fff5fbffa20: is actually holding 16 bytes of data and not 8.
This has had me really confused and has for a while, so absolutely any input is vastly appreciated.
Short answer: on both x86 and x64 the minimum addressable entity is a byte: each "memory location" contains one byte, in each case. What you are seeing from GDB is only formatting: it is dumping 16 contiguous bytes, as the address increasing from ....20 to ....30, (on the left) indicates.
Long answer: 32bit or 64bit is used to indicate many things, in an architecture: almost always, is the addressable size (how many bits are in an address = how much memory you can directly address - again, bytes of memory). It also usually indicates the dimension of registers, and also (but not always) the native word size.
That means that usually, even if you can address a single byte, the machine works "better" using data of different (longer) size. What "better" means is beyond the question; a little background, however, is good to understand some misconceptions about word size in the question.

8 and 16 bit architecture

I'm a bit confused about bit architectures. I just cant find a good article that answers my questions, so I figured I'd ask SO.
Question 1:
When speaking of a 16 bit architecture, does it mean each ram address is 16 bits long? So if I create an int (32 bit) in C++ the variable would take up 2 addresses?
Question 2:
in a 16 bit architecture there are only 2^16 (65536) amount of addresses inside the RAM. Why can't they add more? Is this because 16 bit can't represent a higher value and therefore can't reference to adresses above 65535?
When speaking of a 16 bit architecture, does it mean each ram address is 16 bits long? So if I create an int (32 bit) in C++ the variable would take up 2 addresses?
You'd have to ask whoever was speaking of a 16-bit architecture what they meant by it. They could mean addresses are 16-bits long. They could mean general-purpose CPU registers are 16-bits long. They could mean something else. But there's no way we could know what some hypothetical person might mean. There is no universal definition of what makes something a "16-bit architecture".
For example, the 8032 is an 8-bit architecture with 8-bit general purpose registers. But it has a 16-bit pointer register that can be used to address 65,536 bytes of storage.
Regardless of bitness, almost all systems use byte addresses. So a 32-bit variable will take up 4 addresses on a machine of any bitness.
in a 16 bit architecture there are only 2^16 (65536) amount of addresses inside the RAM. Why can't they add more? Is this because 16 bit can't represent a higher value and therefore can't reference to adresses above 65535?
With 16-bits, there are only 65,536 possible ways those bits can be set. So a 16-bit register has 65,536 possible values.
Yes. Note, though that int on 16-bit architectures is usually just 16 bits wide.
Also note that it doesn't make sense to say that a variable "takes up" two addresses. The correct thing to say is that a 32-bit variable is as wide as two pointers on a 16-bit platform.
It will still occupy four bytes of space, no matter what architecture.
Yes; that's exactly what 16-bit addresses mean.
Note that each of these addresses points to a single byte of memory.
Depends on your definitions of 8-bit and 16-bit architecture.
The 6502 was considered an 8-bit CPU, because it operated on 8-bit values (the register size), yet had 16-bit addresses.
The 68000 was considered a 16-bit CPU, yet had 32-bit registers and addresses.
With x86, it is generally the address size that defines the architecture.
Also, '64-bit' CPUs don't always have a full 64-bit external address bus. They might internally handle addresses of that size, so the virtual address space can be large, but it doesn't mean they can have that much external memory.
Example From Wikipedia - All internal registers, as well as internal and external data buses, were 16 bits wide, firmly establishing the "16-bit microprocessor" identity of the 8086. A 20-bit external address bus gave a 1 MB physical address space (2^20 = 1,048,576). This address space was addressed by means of internal 'segmentation'. The data bus was multiplexed with the address bus in order to fit a standard 40-pin dual in-line package. 16-bit I/O addresses meant 64 KB of separate I/O space (2^16 = 65,536). The maximum linear address space was limited to 64 KB, simply because internal registers were only 16 bits wide. Programming over 64 KB boundaries involved adjusting segment registers (see below) and remained so until the 80386 introduced wider (32 bits) registers (and more advanced memory management hardware).
So you can see that there are no fixed rules that a 16 bit architecture will have 16 address lines only. Don't mix up two things, though it's intuitive to believe so.

Why does the 8086 use an extra register to address 1MB of memory?

I heard that the 8086 has 16-bit registers which allow it to only address 64K of memory. Yet it is still able to address 1MB of memory which would require 20-bit registers. It does this by using another register to to hold another 16 bits, and then adds the value in the 16-bit registers to the value in this other register to be able to generate numbers which can address up to 1MB of memory. Is that right?
Why is it done this way? It seems that there are 32-bit registers, which is more than sufficient to address 1MB of memory.
Actually this has nothing to do with number of registers. Its the size of the register which matters. A 16 bit register can hold up to 2^16 values so it can address 64K bytes of memory.
To address 1M, you need 20 bits (2^20 = 1M), so you need to use another register for the the additional 4 bits.
The segment registers in an 8086 are also sixteen bits wide. However, the segment number is shifted left by four bits before being added to the base address. This gives you the 20 bits.
the 8088 (and by extension, 8086) is instruction compatible with its ancestor, the 8008, including the way it uses its registers and handles memory addressing. the 8008 was a purely 16 bit architecture, which really couldn't address more than 64K of ram. At the time the 8008 was created, that was adequate for most of its intended uses, but by the time the 8088 was being designed, it was clear that more was needed.
Instead of making a new way for addressing more ram, intel chose to keep the 8088 as similar as possible to the 8008, and that included using 16 bit addressing. To allow newer programs to take advantage of more ram, intel devised a scheme using some additional registers that were not present on the 8008 that would be combined with the normal registers. these "segment" registers would not affect programs that were targeted at the 8008; they just wouldn't use those extra registers, and would only 'see' 16 addres bits, the 64k of ram. Applications targeting the newer 8088 could instead 'see' 20 address bits, which gave them access to 1MB of ram
I heard that the 8086 has 16 registers which allow it to only address 64K of memory. Yet it is still able to address 1MB of memory which would require 20 registers.
You're misunderstanding the number of registers and the registers' width. 8086 has eight 16-bit "general purpose" registers (that can be used for addressing) along with four segment registers. 16-bit addressing means that it can only support 216 B = 64 KB of memory. By getting 4 more bits from the segment registers we'll have 20 bits that can be used to address a total of 24*64KB = 1MB of memory
Why is it done this way? It seems that there are 32 registers, which is more than sufficient to address 1MB of memory.
As said, the 8086 doesn't have 32 registers. Even x86-64 nowadays don't have 32 general purpose registers. And the number of registers isn't relevant to how much memory a machine can address. Only the address bus width determines the amount of addressable memory
At the time of 8086, memory is extremely expensive and 640 KB is an enormous amount that people didn't think that would be reached in the near future. Even with a lot of money one may not be able to get that large amount of RAM. So there's no need to use the full 32-bit address
Besides, it's not easy to produce a 32-bit CPU with the contemporary technology. Even 64-bit CPUs today aren't designed to use all 64-bit address lines
Why can't OS use entire 64-bits for addressing? Why only the 48-bits?
Why do x86-64 systems have only a 48 bit virtual address space?
It'll takes more wires, registers, silicons... and much more human effort to design, debug... a CPU with wider address space. With the limited transistor size of the technology in the 70s-80s that may not even come into reality.
8086 doesn't have any 32-bit integer registers; that came years later in 386 which had a much higher transistor budget.
8086's segmentation design made sense for a 16-bit-only CPU that wanted to be able to use 20-bit linear addresses.
Segment registers could have been only 8-bit or something with a larger shift, but apparently there are some advantages to fine-grained segmentation where a segment start address can be any 16-byte aligned linear address. (A linear address is computed from (seg << 4) + off.)

Resources