saving general purpose registers in switch_to() in linux 2.6 - linux-kernel

I saw the code of switch_to in the article "Evolution of the x86 context switch in Linux" in the link https://www.maizure.org/projects/evolution_x86_context_switch_linux/
Most versions of switch_to only save/restore ESP/RSP and/or EBP/RBP, not other call-preserved registers in the inline asm. But the Linux 2.2.0 version does save them in this function, because it uses software context switching instead of relying on hardware TSS stuff. Later Linux versions still do software context switching, but don't have these push / pop instructions.
Are the registers are saved in other function (maybe in the schedule() function)? Or is there no need to save these registers in the kernel context?
(I know that those registers of the user context are saved in the kernel stack when the system enters kernel mode).

Linux versions before 2.2.0 use hardware task switching, where the TSS saves/restores registers for you. That's what the "ljmp %0\n\t" is doing. (ljmp is AT&T syntax for a far jmp, presumably to a task gate). I'm not really familiar with hardware TSS stuff because it's not very relevant; it's still used in modern kernels for getting RSP pointing to the kernel stack for interrupt handlers, but not for context switching between tasks.
Hardware task switching is slow, so later kernels avoid it. Linux 2.2 does save/restore the call-preserved registers manually, with push/pop before/after swapping stacks. EAX, EDX, and ECX are declared as dummy outputs ("=a" (eax), "=d" (edx), "=c" (ecx)) so the compiler knows that the old values of those registers are no longer available.
This is a sensible choice because switch_to is probably used inside a non-inline function. The caller will make a function call that eventually returns (after running another task for a while) with the call-preserved registers restored, and the call-clobbered registers clobbered, just like a regular function call. (So compiler code-gen for the function that uses the switch_to macro doesn't need to emit save/restore code outside of the inline asm). If you think about writing a whole context switch function in asm (not inline asm), you'd get this clobbering of volatile registers for free because callers expect that.
So how do later kernels avoid saving/restoring those registers in inline asm?
Linux 2.4 uses "=b" (last) as an output operand, so the compiler has to save/restore EBX in a function that uses this asm. The asm still saves/restores ESI, EDI, and EBP (as well as ESP). The text of the article notes this:
The 2.4 kernel context switch brings a few minor changes: EBX is no longer pushed/popped, but it is now included in the output of the inline assembly. We have a new input argument.
I don't see where they tell the compiler about EAX, ECX, and EDX not surviving, so that's odd. It might be a bug that they get away with by making the function noinline or something?
Linux 2.6 on i386 uses more output operands that get the compiler to handle the save/restore.
But Linux 2.6 for x86-64 introduces the trick that hands off the save/restore to the compiler easily: #define __EXTRA_CLOBBER ,"rcx","rbx","rdx","r8","r9","r10", "r11","r12","r13","r14","r15"
Notice the clobbers declaration: : "memory", "cc" __EXTRA_CLOBBER
This tells the compiler that the inline asm destroys all those registers, so the compiler will emit instructions to save/restore these registers at the start/end of whatever function switch_to ultimately inlines into.
Telling the compiler that all the registers are destroyed after a context switch solves the same problem as manually saving/restoring them with inline asm. The compiler will still make a function that obeys the calling convention.
The context-switch swaps to the new task's stack, so the compiler-generated save/restore code is always running with the appropriate stack pointer. Notice that the explicit push/pop instructions inside the inline asm int Linux 2.2 and 2.4 are before / after everything else.

Related

Why does Windows use RCX, RDX for pointers in a fresh x64 process, different from EAX, EBX in a newly created 32-bit process?

When I create a Windows x86 process in a suspended state (CREATE_SUSPENDED) its CONTEXT contains:
Virtual Address of Entry Point in Eax register;
Virtual Address of Process Environment Block structure in Ebx register.
But when I do the same for x86_64 process then CONTEXT contains:
Virtual Address of Entry Point in Rcx register (why not Rax?)
Virtual Address of PEB structure in Rdx register (why not Rbx?)
It seems logical to me to take Rax in x64 in place of Eax in x86 and Rbx in x64 in place of Ebx in x86 .
But instead of Eax→Rax and Ebx→Rbx we see Eax→Rcx and Ebx→Rdx.
Also, I see that 64-bit Cheat Engine is aware of this when opening the 32-bit process (notice the migration of the values eax↔ecx and ebx↔edx:
What was the reason to move from *ax register to *cx and from *bx to *dx in 64-bit processes?
Is it somehow connected to calling conventions?
Is it related to Windows only or do other OSes also have this kind of register repurposing?
Update:
Screenshots of just created x64 process in a suspended state:
It seems logical to me to take Rax in x64 in place of Eax in x86 and Rbx in x64 in place of Ebx in x86.
I don't see why it would be logical to assume so.
Even if, at MS, they had defined an internal ABI documenting the context of a just-created 32-bit process, the 64-bit version of would have been designed anew, so there is no reason to assume it carries anything over from the old 32-bit ABI.
If Windows uses sysret to return to user space, a process created with a suspended state may leak the target address in rcx.
Returning via other mechanisms (e.g. iret/retf), as could be the case for 32-bit code, will of course leak different data in different registers.
What you are seeing is probably an artifact of how Windows returns to user mode. I don't know exactly what the Windows kernel code to return to user mode is, but it is reasonable to assume that MS kept the same interface for 32-bit processes and that this interface was designed before sysret was widely used.
Note that at the PE entry-point rcx contains a pointer to the PEB and rdx to the entry-point (not the other way around). The former appears to be an undocumented parameter passed to the entry-point function, the latter may be just an artifact of how the entry-point is called.
In fact, a 32-bit process will find a pointer to the PEB in the stack, as the first parameter for the PE entry-point code.
Regarding other OSes, anything that is not documented to be stable is free to change at any time (including what's left in the registers). This is true in general.
As far as stability goes, passing from a 32-bit to a 64-bit implementation is a pretty big step and, again, there is no reason to keep using a very old interface (but with wider registers) instead of improving it with all the recent knowledge.
You can easily see that, for example, Linux "repurposed" the registers in the 64-bit system call ABI.

HW register value (R15) not saved during a function call to external library

My code is written in C++, and compiled with gcc version 4.7.2.
It's linked with 3rd party library, which is written in C, and compiled with gcc 4.5.2.
My code calls a function initStuff(). During the debug I found out that the value of R15 register before the call to initStuff() is not the same as the value upon return from that function.
As a quick hack I did:
asm(" mov %%r15, %0" : "=r" ( saveR15 ) );
initStuff();
asm(" mov %0, %%r15;" : : "r" (saveR15) );
which seems to work for now.
Who is to blame here? How can I find if it's a compiler issue, or maybe compatibility issue?
gcc on x86-64 follows the System V ABI, which defines r15 as a callee-saved register; any function which uses this register is supposed to save and restore it.
So if this third-party function is not doing so, it is failing to conform to the ABI, and unless this is documented, it is to blame. AFAIK this part of the ABI has been stable forever, so if compiler-generated code (with default options) is failing to save and restore r15, that would be a compiler bug. More likely some part of the third-party code uses assembly language and is buggy, or conceivably it was built with non-standard compiler options.
You can either dig into it, or as a workaround, write a wrapper around it that saves and restores r15. Your current workaround is not really safe, since the compiler might reorder your asm statements with respect to surrounding code. You should instead put the call to initStuff inside a single asm block with the save-and-restore (declaring it as clobbering all caller-saved registers), or write a "naked" assembly wrapper which does the save/restore and call, and call it instead. (Make sure to preserve stack alignment.)

Changing calling convention in gcc/g++ abi

How could I enforce gcc/g++ to not use registers but only stack in x86_64
to pass arguments to functions,
like it was in 32-bit version
(and possibly take the function result this way).
I know it breaks official ABI and both the caller side and the called side
must be compiled this way so it works. I don't care if push/pop or mov/sub way
is used. I expect there should be a flag to compiler that could enforce it
but I couldn't find it.
It seems you can't without hacking GCC's source code.
There is no standard x86-64 calling convention that uses inefficient stack args.
GCC only knows how to use standard calling conventions, in this case x86-64 SysV and MS Windows fastcall and vectorcall. (e.g. __attribute__((ms_abi)) or vectorcall). Normally nobody wants this; MS's calling convention is already friendly enough for wrappers or variadic functions. You can use that for some functions (controlled by __attribute__) even when compiling for Linux, MacOS, *BSD, etc., if that helps. Hard to imagine a use-case for pure stack args.
GCC lets you specify a register as fixed (never touched by GCC, like -ffixed-rdi), call-clobbered, or call-preserved. But using those with arg-passing registers just creates wrong code, not what you want.
e.g.
int foo(int a, int b, int c);
void caller(int x) {
foo(1,2,3);
//foo(4,x,6);
}
compiled by gcc9.2 -O3 -fcall-saved-rdi
caller:
push rdi
mov edx, 3
mov esi, 2
pop rdi
jmp foo
It saves/restores RDI but doesn't put a 1 in it before calling foo.
And it doesn't leave RDI out of the arg-passing sequence and bump other args later. (I was thinking you might be able to invent a calling convention where all the arg-passing registers were fixed or call-saved, maybe getting GCC to fall back to stack args. But nope.)

Can I use a register as a loop counter?

Since the calling convention of a function states which registers are preserved, can a register be used as a loop counter?
I first thought that the ecx register is used as a loop counter, but after finding out that an stdcall function I have used has not preserved the value of ecx, I thought otherwise.
Is there a register that is guaranteed (by mostly used calling conventions at least) to be preserved?
Note: I don't have a problem in using a stack variable as a loop counter, I just want to make sure that it is the only way.
You can use any general-purpose register, and occasionally others, as the loop counter (just not the stack pointer of course ☺).
Either you use one to loop manually, i.e. replace…
loop label
… with…
dec ebp
jnz label
… which is faster anyway (because AMD (and later Intel, when they caught up, MHz-wise) artificially slowed down the loop instruction as otherwise, Windows® and some Turbo Pascal compiled software crashed).
Or you just save the counter in between:
label:
push ecx
call func
pop ecx
loop label
Both are standard strategies.
Is there a register that is guaranteed (by mostly used calling conventions at least) to be preserved?
You can choose any free register in your own code if your loop code will not call any external entity.
If your loop code will call an external entity where the only guaranteed contract is the ABI and calling convention then you must save/restore your registers and make the register choice case-by-case.
Quoting Agner Fog's excellent paper Calling conventions for different C++ compilers and operating systems:
6 Register usage
The rules for register usage depend on the operating system, as shown in table 4. Scratch registers are registers that can be used for temporary storage without restrictions (also called caller-save or volatile registers). Callee-save registers are registers that you have to save before using them and restore after using them (also called non-volatile registers). You can rely on these registers having the same value after a call as before the call...
...
See also:
Wikipedia: x86 calling conventions

inline assembly error: can't find a register in class 'GENERAL_REGS' while reloading 'asm'

I have an inline AT&T style assembly block, which works with XMM registers and there are no problems in Release configuration of my XCode project, however I've stumbled upon this strange error (which is supposedly a GCC bug) in Debug configuration... Can I fix it somehow? There is nothing special in assembly code, but I am using a lot of memory constraints (12 constraints), can this cause this problem?
Not a complete answer, sorry, but the comments section is too short for this ...
Can you post a sample asm("..." :::) line that demonstrates the problem ?
The use of XMM registers is not the issue, the error message indicates that GCC wanted to create code like, say:
movdqa (%rax),%xmm0
i.e. memory loads/stores through pointers held in general registers, and you specified more memory locations than available general-purpose regs (it's probably 12 in debug mode because because RBP, RSP are used for frame/stackpointer and likely RBX for the global offset table and RAX reserved for returns) without realizing register re-use potential.
You might be able to eek things out by doing something like:
void *all_mem_args_tbl[16] = { memarg1, memarg2, ... };
void *trashme;
asm ("movq (%0), %1\n\t"
"movdqa (%1), %xmm0\n\t"
"movq 8(%0), %1\n\t"
"movdqa (%1), %xmm1\n\t"
...
: "r"all_mem_args_tbl : "r"(trashme) : ...);
i.e. put all the mem locations into a table that you pass as operand, and then manage the actual general-purpose register use on your own. It might be two pointer accesses through the indirection table, but whether that makes a difference is hard to say without knowing your complete assembler code piece.
The Debug configuration uses -O0 by default. Since this flag disables optimisations, the compiler is probably not being able to allocate registers given the constraints specified by your inline assembly code, resulting in register starvation.
One solution is to specify a different optimisation level, e.g. -Os, which is the one used by default in the Release configuration.

Resources