What does the D flag in the code segment descriptor do for x86-64 instructions?

I'm trying to understand the workings of the D flag in the code segment descriptor when used in x86-64 code. It is stored in the D/B flag (bit 22 of the high dword) of the code segment descriptor.
The Intel documentation (from section 3.4.5 Segment Descriptors) states the following:
D/B (default operation size/default stack pointer size and/or upper bound) flag
Performs different functions depending on whether the segment descriptor is an executable code segment, an expand-down data segment, or a stack segment. (This flag should always be set to 1 for 32-bit code and data segments and to 0 for 16-bit code and data segments.)
• Executable code segment. The flag is called the D flag and it indicates the default length for effective addresses and operands referenced by instructions in the segment. If the flag is set, 32-bit addresses and 32-bit or 8-bit operands are assumed; if it is clear, 16-bit addresses and 16-bit or 8-bit operands are assumed. The instruction prefix 66H can be used to select an operand size other than the default, and the prefix 67H can be used to select an address size other than the default.
So I'm trying to understand: which x86-64 instructions does it affect, and how?
PS: When I try to run some tests (in the Windows kernel) by setting that bit, the OS immediately triple-faults.

If L (long mode) is set for a code segment descriptor, D must be clear. The L=1 / D=1 combination is currently meaningless / reserved. Intel documents this nearby in the same document you were looking at.
If L is clear, then D selects between 16-bit and 32-bit mode (i.e. the default operand/address size). And yes, 16-bit protected mode exists, but no, nobody uses it.
There are only 3 possibilities for default address/operand-size:
16-bit modes (real, vm86, protected): default address and operand-size = 16-bit
32-bit protected mode: default address and operand-size = 32-bit
64-bit mode: default address size = 64-bit, default operand-size = 32-bit
There's no option to have the 16 64-bit registers with a default operand size of 16-bit or 64-bit, nor a default address size of 32-bit overridable to 64-bit.
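To make the bit positions concrete, here is a minimal C sketch (my own illustration, not from the answer above) that pulls the L and D bits out of a raw 8-byte code segment descriptor and reports the defaults they imply; the example descriptor value is made up but has the typical shape of a 64-bit code descriptor:

    #include <stdint.h>
    #include <stdio.h>

    /* L lives at bit 53 and D/B at bit 54 of the 8-byte descriptor
     * (bits 21 and 22 of the high dword in the SDM diagrams). */
    static void describe_code_segment(uint64_t desc)
    {
        int l = (int)((desc >> 53) & 1);   /* L: 64-bit code segment           */
        int d = (int)((desc >> 54) & 1);   /* D: default operand/address size  */

        if (l && d)
            puts("L=1, D=1: reserved/meaningless combination");
        else if (l)
            puts("64-bit mode: default address size 64, default operand size 32");
        else if (d)
            puts("32-bit protected mode: default address and operand size 32");
        else
            puts("16-bit segment: default address and operand size 16");
    }

    int main(void)
    {
        describe_code_segment(0x00209A0000000000ULL);  /* L=1, D=0 */
        return 0;
    }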

Related

How to detect architecture of an LE (linear executable) file?

What instruction set architecture does an LE (linear executable) file have? The linked article says: mixed 16/32 bit.
Does it mean that the same LE file can contain 16-bit and 32-bit code?
How do I detect whether it contains 16-bit (8086) code?
How do I detect whether it contains 32-bit (i386) code?
Please note that I'm aware of the CPU type field (see here) which can distinguish between 80286 and i386 (80386). However, I interpret this as a CPU type requirement, so this doesn't specify the architecture, e.g. the hex 40 is valid in both: it means inc ax in 16-bit code and it means inc eax in 32-bit code, and both can be executed by a 80386 CPU. I'm interested in what hex 40 means in the code of an LE file.
I was able to find https://www.program-transformation.org/Transform/PcExeFormat, based on which the answer is the following.
If object flag bit 13 in the object table entry is set, then it's 32-bit, otherwise it's 16-bit.
In the executable entry table, LE flag bit 1 of LX bundle type byte distinguishes between 16-bit and 32-bit.
Since there can be multiple entries in these tables, a single LE or LX file may contain both 16-bit and 32-bit code.
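A rough C sketch of the object-table check, assuming the usual LE/LX object table entry layout (24-byte entries, with the object flags as a 32-bit little-endian field at offset 8); verify the offsets and the bit's meaning against the format documentation before relying on this:

    #include <stdint.h>

    #define LE_OBJ_FLAG_BIG 0x2000u   /* bit 13: "big"/32-bit object */

    /* 'entry' points at one 24-byte object table entry inside the file image. */
    static int object_is_32bit(const uint8_t *entry)
    {
        uint32_t flags = (uint32_t)entry[8]
                       | (uint32_t)entry[9]  << 8
                       | (uint32_t)entry[10] << 16
                       | (uint32_t)entry[11] << 24;
        return (flags & LE_OBJ_FLAG_BIG) != 0;
    }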

Does the CS and DS registers still affect instructions in x64 intel

AFAIK they are never used and CS=DS=SS nowadays. However, if I were to set these values, would anything change, or does the processor ignore them? I've found really conflicting information on this question, and I don't understand why they would still be there if they are ignored. Help pls
Yes, the segment registers do still affect code execution.
The question and some of the comments don't seem to distinguish between the selector value and the base address. To clearly understand some of the apparently conflicting information you're reading on this topic, you need to make sure you recognize which one is being discussed.
The CS selector cannot be 0. It must refer to a valid code segment descriptor in the GDT or LDT. The L bit of the code segment descriptor controls whether the current process runs in 64-bit mode or 32-bit compatibility mode.
CS (the selector) cannot be equal to DS or SS: CS must refer to a code segment, whereas DS and SS must refer to data segments (possibly the same one). The DS and SS selectors are allowed to be 0 (which would cause a GP fault in 32-bit mode).
The main aspects of segment registers that no longer have an effect are the base address and the segment limit: the base addresses of CS, DS, ES, and SS are all treated as 0, and there are no segment limit checks in 64-bit code.
This is the reason you see people saying that they are ignored.
As Margaret mentioned, the current privilege level (CPL) is in the low 2 bits of the CS and SS selector registers and also in the DPL bits of the descriptors in the GDT. These bits should be either 0 or 3, since no current operating systems use rings 1 and 2, as far as I know.
One other minor point is that certain faults caused by memory accesses are reported as stack faults instead of GP faults, if the memory access is performed using the SS segment (because RBP or RSP is used as a base register in the instruction operand).
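For reference, a tiny C sketch of how a selector value breaks down (this is just the architectural selector layout, not code from the answer above):

    #include <stdint.h>

    /* A selector is 16 bits: requested privilege level in bits 0-1,
     * table indicator (0 = GDT, 1 = LDT) in bit 2, descriptor index in bits 3-15.
     * For the loaded CS and SS, the low two bits are the CPL. */
    static unsigned selector_rpl(uint16_t sel)      { return sel & 3; }
    static unsigned selector_uses_ldt(uint16_t sel) { return (sel >> 2) & 1; }
    static unsigned selector_index(uint16_t sel)    { return sel >> 3; }

For example, the 64-bit user-mode CS value commonly reported on Windows x64 is 0x33, i.e. GDT index 6 with RPL 3, while kernel mode runs with CS = 0x10 (index 2, RPL 0).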

How can Windows split its virtual memory space asymmetrically?

According to the AMD64 Architecture Programmer's Manual Volume 2 (system programming), a logical address is valid only if the bits 48-63 are all the same as bit 47:
5.3.1 Canonical Address Form
The AMD64 architecture requires implementations supporting fewer than the full 64-bit virtual address to ensure that those addresses are in canonical form. An address is in canonical form if the address bits from the most-significant implemented bit up to bit 63 are all ones or all zeros. If the addresses of all bytes in a virtual-memory reference are not in canonical form, the processor generates a general-protection exception (#GP) or a stack fault (#SS) as appropriate.
So it seems the only valid address ranges are 0x0000_0000_0000_0000 ~ 0x0000_7FFF_FFFF_FFFF and 0xFFFF_8000_0000_0000 ~ 0xFFFF_FFFF_FFFF_FFFF, that is, the lower 128 TiB and higher 128 TiB. However, according to MSDN, the addresses used by Windows x64 kernel don't seem to be the case.
In 64-bit Windows, the theoretical amount of virtual address space is 2^64 bytes (16 exabytes), but only a small portion of the 16-exabyte range is actually used. The 8-terabyte range from 0x000'00000000 through 0x7FF'FFFFFFFF is used for user space, and portions of the 248-terabyte range from 0xFFFF0800'00000000 through 0xFFFFFFFF'FFFFFFFF are used for system space.
So, how can Windows split the virtual address space into lower 8 TiB and higher 248 TiB, despite the hardware specification? I'd like to know why it doesn't cause any problems with the hardware that checks whether the addresses are canonical.
UPDATE: Seems like Microsoft fixed this discrepancy in Windows 8.1. See https://www.facebook.com/codemachineinc/posts/567137303353192 for details.
You're right; current x86-64 hardware with 48-bit virtual address support requires that the high 16 bits be the sign-extension of the low 48 (i.e. bit 47 matches bits [63:48]). That means about half of the 0xFFFF0800'00000000 to 0xFFFFFFFF'FFFFFFFF range is non-canonical on current x86-64 hardware.
Windows is just describing how it carves up the full 64-bit virtual address space, not which parts of that are actually in use on current x86-64 hardware. It can of course only use the 128 TiB that is canonical, from 0xFFFF8000'00000000 to -1. (Note the position of the 8; there's no gap between it and the high 16 bits that are all-ones, unlike in the theoretical Windows range.)
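As a sketch, the canonicality rule for 48-bit implementations can be written down like this in C (my own illustration, not part of the original answer):

    #include <stdbool.h>
    #include <stdint.h>

    /* Canonical means bits 63:48 all equal bit 47, i.e. sign-extending the
     * low 48 bits reproduces the original address. */
    static bool is_canonical_48(uint64_t va)
    {
        int64_t sext = ((int64_t)(va << 16)) >> 16;   /* replicate bit 47 upward */
        return (uint64_t)sext == va;
    }

    /* is_canonical_48(0x00007FFFFFFFFFFF) -> true   (top of the low half)
     * is_canonical_48(0xFFFF080000000000) -> false  (start of the documented
     *                                                Windows "system space" range)
     * is_canonical_48(0xFFFF800000000000) -> true   (bottom of the high half) */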
Top-end servers can be built with 6TiB of RAM or maybe even more. (Xeon Platinum Scalable Processors are apparently available with up to 1.5TiB per socket, and up to 8-way, e.g. the 8180M).
Intel has proposed an extension for larger physical and virtual addressing that adds another level of page tables (https://software.intel.com/sites/default/files/managed/2b/80/5-level_paging_white_paper.pdf), so OSes will hopefully not be stuck without enough virtual address space to map all the RAM (like in the bad old days of PAE on 32-bit-only systems) before we have systems with more than 128TiB of physical RAM.

When are logical addresses created?

As always, I refer to x86 (Linux).
Are logical addresses created during the generation of a binary?
If yes, are they inside the binary?
Thanks
The LINKER defines the initial layout of the process's user address space. The linker thereby defines the range of logical addresses and their page attributes (read or read/write, execute or no-execute).
The user area of the logical address space gets set up by the program loader when the executable is run.
The answer to your question
Are logical addresses created during the generation of a binary?
then depends upon whether by "created" you mean when the logical address space is defined (linker) or when it is set up (program loader).
In x86, a logical address (also called a far pointer) consists of a 16-bit segment selector and a 16/32/64-bit offset (also called a near pointer). The size of the offset depends on the operating mode, the code segment descriptor, and the address size prefix. The segment selector is then used to obtain the segment base address (in practice it comes from the segment descriptor cache; in 64-bit mode the base is treated as zero for all segments except FS and GS), which is added to the offset to form a virtual address. The x86 ISA offers no way to completely skip that process, so any x86 instruction must specify the two parts that constitute the logical address separately (implicitly or explicitly).
Are logical addresses created during the generation of a binary?
An x86 binary contains x86 instructions. Each instruction specifies which segment register to use and how to calculate the offset (using stuff like base, index, scale, and displacement). At run-time, when an instruction is being executed, the offset is calculated and the segment selector value is determined. So, technically, x86 instructions only tell the CPU where to get the segment selector from and how to calculate the offset, but it is the CPU that generates the logical address. Generally, the compiler and the OS determine the values of offsets, but only the OS controls the values of the segment selectors.
If yes, are their inside the binary?
x86 instructions may specify the offset as an immediate value (constant). The segment part can either be specified as an immediate value (far call or far jump), fetched from a segment register, or fetched from memory (far return). So the value of the offset might be in the binary, encoded with the instruction that uses it, but the value of the segment selector might not.
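Here is a toy C model of the translation described above; segment_base_of is a hypothetical stand-in for the descriptor(-cache) lookup the CPU performs:

    #include <stdint.h>

    struct logical_address {
        uint16_t selector;   /* which segment descriptor to use */
        uint64_t offset;     /* the "near pointer" part         */
    };

    /* hypothetical helper: a real lookup would read the GDT/LDT entry
     * (or the hidden descriptor cache) selected by 'selector'. */
    static uint64_t segment_base_of(uint16_t selector)
    {
        (void)selector;
        return 0;
    }

    /* In 64-bit mode the base is forced to 0 for everything except FS and GS. */
    static uint64_t to_linear(struct logical_address la, int long_mode, int is_fs_or_gs)
    {
        uint64_t base = (long_mode && !is_fs_or_gs) ? 0 : segment_base_of(la.selector);
        return base + la.offset;
    }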

Is x86 32-bit assembly code valid x86 64-bit assembly code?

Is all x86 32-bit assembly code valid x86 64-bit assembly code?
I've wondered whether 32-bit assembly code is a subset of 64-bit assembly code, i.e. whether every piece of 32-bit assembly code can be run in a 64-bit environment?
I guess the answer is yes, because 64-bit Windows is capable of executing 32-bit programs, but then I've seen that 64-bit processors support a 32-bit compatibility mode?
If not, please provide a small example of 32-bit assembly code that isn't valid 64-bit assembly code and explain how the 64-bit processor executes the 32-bit assembly code.
A modern x86 CPU has three main operation modes (this description is simplified):
In real mode, the CPU executes 16 bit code with paging and segmentation disabled. Memory addresses in your code refer to physical addresses; the content of a segment register is shifted left by four bits and added to the offset to form the physical address.
In protected mode, the CPU executes 16 bit or 32 bit code depending on the segment selector in the CS (code segment) register. Segmentation is enabled, paging can (and usually is) enabled. Programs can switch between 16 bit and 32 bit code by far jumping to an appropriate segment. The CPU can enter the submode virtual 8086 mode to emulate real mode for individual processes from inside a protected mode operating system.
In long mode, the CPU executes 64 bit code. Segmentation is mostly disabled, paging is enabled. The CPU can enter the sub-mode compatibility mode to execute 16 bit and 32 bit protected mode code from within an operating system written for long mode. Compatibility mode is entered by far-jumping to a CS selector with the appropriate bits set. Virtual 8086 mode is unavailable.
Wikipedia has a nice table of x86-64 operating modes including legacy and real modes, and all 3 sub-modes of long mode. Under a mainstream x86-64 OS, after booting the CPU cores will always all be in long mode, switching between different sub-modes depending on 32 or 64-bit user-space. (Not counting System Management Mode interrupts...)
Now what is the difference between 16 bit, 32 bit, and 64 bit mode?
16-bit and 32-bit mode are basically the same thing except for the following differences:
In 16 bit mode, the default address and operand width is 16 bit. You can change these to 32 bit for a single instruction using the 0x67 and 0x66 prefixes, respectively. In 32 bit mode, it's the other way round (see the sketch after this list).
In 16 bit mode, the instruction pointer is truncated to 16 bit; jumping to addresses higher than 65536 can lead to weird results.
VEX/EVEX encoded instructions (including those of the AVX, AVX2, BMI, BMI2 and AVX512 instruction sets) aren't decoded in real or Virtual 8086 mode (though they are available in 16 bit protected mode).
16 bit mode has fewer addressing modes than 32 bit mode, though it is possible to override to a 32 bit addressing mode on a per-instruction basis if the need arises.
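As mentioned in the prefix item above, here is a small sketch of the effect; the C arrays just hold the encodings as data, and the comments give my reading of how each sequence decodes:

    #include <stdint.h>

    /* Opcode B8 is "mov (e)ax, imm": the immediate's width follows the mode's
     * default operand size, and a 0x66 prefix flips it for one instruction. */
    static const uint8_t mov_ax_16bit_mode[]  = { 0xB8, 0x34, 0x12 };
        /* in 16-bit mode: mov ax, 0x1234 */
    static const uint8_t mov_eax_16bit_mode[] = { 0x66, 0xB8, 0x78, 0x56, 0x34, 0x12 };
        /* in 16-bit mode: mov eax, 0x12345678 (0x66 selects a 32-bit operand) */
    static const uint8_t mov_eax_32bit_mode[] = { 0xB8, 0x78, 0x56, 0x34, 0x12 };
        /* in 32-bit mode: mov eax, 0x12345678 (same opcode, larger default) */
    static const uint8_t mov_ax_32bit_mode[]  = { 0x66, 0xB8, 0x34, 0x12 };
        /* in 32-bit mode: mov ax, 0x1234 (0x66 now shrinks the operand) */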
Now, 64 bit mode is somewhat different. Most instructions behave just like in 32 bit mode, with the following differences:
There are eight additional registers named r8, r9, ..., r15. Each register can be used as a byte, word, dword, or qword register. The family of REX prefixes (0x40 to 0x4f) encodes whether an operand refers to an old or a new register (see the sketch after this list). Eight additional SSE/AVX registers xmm8, xmm9, ..., xmm15 are also available.
you can only push/pop 64 bit and 16 bit quantities (though you shouldn't do the latter); 32 bit quantities cannot be pushed or popped.
The single-byte inc reg and dec reg instructions are unavailable; their instruction space has been repurposed for the REX prefixes. The two-byte inc r/m and dec r/m encodings are still available, so inc reg and dec reg can still be encoded.
A new instruction-pointer relative addressing mode exists, using the shorter of the 2 redundant ways 32-bit mode had to encode a [disp32] absolute address.
The default address width is 64 bit, a 32 bit address width can be selected through the 0x67 prefix. 16 bit addressing is unavailable.
The default operand width is 32 bit. A width of 16 bit can be selected through the 0x66 prefix, a 64 bit width can be selected through an appropriate REX prefix independently of which registers you use.
It is not possible to use ah, bh, ch, and dh in an instruction that requires a REX prefix. A REX prefix causes those register numbers to refer instead to the low 8 bits of rsp, rbp, rsi, and rdi (spl, bpl, sil, dil).
writing to the low 32 bits of a 64 bit register clears the upper 32 bits, avoiding false dependencies for out-of-order execution. (Writing 8 or 16-bit partial registers still merges with the 64-bit old value.)
as segmentation is nonfunctional, segment overrides are meaningless no-ops except for the fs and gs overrides (0x64, 0x65) which serve to support thread-local storage (TLS).
also, many instructions that specifically deal with segmentation are unavailable. These are: push/pop seg (except push/pop fs/gs), arpl, call far (only the 0xff encoding is valid), les, lds, and jmp far (only the 0xff encoding is valid).
instructions that deal with decimal arithmetic are unavailable; these are: daa, das, aaa, aas, aam, and aad.
additionally, the following instructions are unavailable: bound (rarely used), pusha/popa (not useful with the additional registers), and salc (undocumented).
the 0x82 instruction alias for 0x80 is invalid.
on early amd64 CPUs, lahf and sahf are unavailable.
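As referenced in the register item above, here is a small C sketch of how a REX prefix byte breaks down (the standard 0100WRXB layout; my own illustration, not code from the answer):

    #include <stdbool.h>
    #include <stdint.h>

    struct rex { bool w, r, x, b; };

    /* Returns true if 'byte' is a REX prefix (0x40-0x4F). W selects 64-bit
     * operand size; R, X, and B extend the ModRM.reg, SIB.index, and
     * ModRM.rm/SIB.base fields, which is how r8-r15 and xmm8-xmm15 are reached.
     * (In 32-bit mode these same bytes decode as inc/dec instead.) */
    static bool decode_rex(uint8_t byte, struct rex *out)
    {
        if ((byte & 0xF0) != 0x40)
            return false;
        out->w = byte & 0x08;
        out->r = byte & 0x04;
        out->x = byte & 0x02;
        out->b = byte & 0x01;
        return true;
    }

    /* Example: 48 89 C8 is "mov rax, rcx" -- 0x48 has only W set, promoting
     * the mov's operand size to 64 bits. */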
And that's basically all of it!
No, it isn't.
While there is a large amount of overlap, 64-bit assembly code is not a superset of 32-bit assembly code and so 32-bit assembly is not in general valid in 64-bit mode.
This applies both to the mnemonic assembly source (which is assembled into binary format by an assembler) and to the binary machine-code format itself.
This question covers in some detail instructions that were removed, but there are also many encoding forms whose meanings were changed.
For example, Jester in the comments gives the example of push eax not being valid in 64-bit code. Based on this reference you can see that the 32-bit push is marked N.E. meaning not encodable. In 64-bit mode, the encoding is used to represent push rax (an 8-byte push) instead. So the same sequence of bytes has a different meaning in 32-bit mode versus 64-bit mode.
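To make that concrete, here are a few encodings written out as C data, with my reading of how each decodes in the two modes:

    #include <stdint.h>

    static const uint8_t push_a[]     = { 0x50 };
        /* 32-bit mode: push eax (pushes 4 bytes)
         * 64-bit mode: push rax (pushes 8 bytes) -- same byte, different meaning */
    static const uint8_t push_imm[]   = { 0x68, 0x78, 0x56, 0x34, 0x12 };
        /* 32-bit mode: push dword 0x12345678
         * 64-bit mode: pushes the same immediate sign-extended to 64 bits */
    static const uint8_t inc_or_rex[] = { 0x40 };
        /* 32-bit mode: inc eax
         * 64-bit mode: not an instruction on its own -- it's a REX prefix */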
In general, you can browse the list of instructions on that site and find many which are listed as invalid or not encodable in 64-bit.
If not, please provide a small example of 32-bit assembly code that
isn't valid 64-bit assembly code and explain how the 64-bit processor
executes the 32-bit assembly code.
As above, push eax is one such example. I think what is missing is that 64-bit CPUs support directly running 32-bit binaries. They don't do it via compatibility between 32-bit and 64-bit instructions at the machine language level, but simply by having a 32-bit mode where the decoders (in particular) interpret the instruction stream as 32-bit x86 rather than x86-64, as well as the so-called long mode for running 64-bit instructions. When such 64-bit chips were first released, it was common to run a 32-bit operating system, which pretty much means the chip is permanently in this mode (never goes into 64-bit mode).
More recently, it is typical to run a 64-bit operating system, which is aware of the modes, and which will put the CPU into 32-bit mode when the user launches a 32-bit process (which are still very common: until very recently my browser was still 32-bit).
All the details and proper terminology for the modes can be found in fuz's answer, which is really the one you should read.
