Hooking Windows Kernel Dispatcher for System Calls

I'm trying to hook the SYSENTER dispatch function in the kernel. Over the past few days I have been studying what happens when a program executes SYSENTER to enter the kernel, and I learned that IA32_SYSENTER_EIP and IA32_SYSENTER_ESP supply the kernel RIP and RSP after SYSENTER.
Yesterday I read the Intel Software Developer's Manual entry on SWAPGS:
SWAPGS exchanges the current GS base register value with the value contained in MSR address C0000102H (IA32_KERNEL_GS_BASE). The SWAPGS instruction is a privileged instruction intended for use by system software.
When using SYSCALL to implement system calls, there is no kernel stack
at the OS entry point. Neither is there a straightforward method to
obtain a pointer to kernel structures from which the kernel stack
pointer could be read. Thus, the kernel cannot save general purpose
registers or reference memory.
The second paragraph says there is no kernel stack at the OS entry point, which suggests that the kernel executes SWAPGS to set GS and then reads the kernel stack pointer through it. But as I understand it, on SYSENTER the kernel RIP (EIP) and RSP (ESP) are loaded from IA32_SYSENTER_EIP and IA32_SYSENTER_ESP, so the kernel already has its stack pointer in IA32_SYSENTER_ESP!
My questions:
If the kernel stack address comes from GS, then what is the purpose of IA32_SYSENTER_ESP?
What are the differences between AMD's LSTAR (0xC0000082) and IA32_SYSENTER_EIP? I ask because I saw Windows set 0xC0000082 on my Intel processor.
Is there any particular problem with hooking the kernel's SYSENTER dispatcher? I ask because whenever I put a breakpoint in the Windows function responsible for dispatching SYSENTER calls (KiSystemCall64Shadow) on a remote debugging machine (not a VM), it causes a BSOD with UNEXPECTED_KERNEL_MODE_TRAP.
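For reference while reading the questions above, here is a minimal toy model of the architectural behaviour involved: SYSENTER loads both the instruction pointer and the stack pointer from MSRs, SYSCALL loads only the instruction pointer (from LSTAR) and leaves RSP untouched, and SWAPGS exchanges the GS base with IA32_KERNEL_GS_BASE. The MSR addresses are the architectural ones; the `ToyCpu` class, its method names, and all register values are hypothetical and only illustrate the data flow. This says nothing about what the Windows entry code actually does.

```python
# Architectural MSR addresses (Intel SDM / AMD APM).
MSR = {
    "IA32_SYSENTER_CS": 0x174,
    "IA32_SYSENTER_ESP": 0x175,
    "IA32_SYSENTER_EIP": 0x176,
    "IA32_STAR": 0xC0000081,
    "IA32_LSTAR": 0xC0000082,
    "IA32_GS_BASE": 0xC0000101,
    "IA32_KERNEL_GS_BASE": 0xC0000102,
}


class ToyCpu:
    """Hypothetical model holding only the state these instructions touch."""

    def __init__(self, rip, rsp, rflags):
        self.rip, self.rsp, self.rflags = rip, rsp, rflags
        self.rcx = 0
        self.r11 = 0
        self.msrs = {address: 0 for address in MSR.values()}

    def wrmsr(self, name, value):
        self.msrs[MSR[name]] = value

    def sysenter(self):
        # SYSENTER: instruction pointer AND stack pointer come from MSRs.
        self.rip = self.msrs[MSR["IA32_SYSENTER_EIP"]]
        self.rsp = self.msrs[MSR["IA32_SYSENTER_ESP"]]

    def syscall(self, next_rip):
        # SYSCALL: return address goes to RCX, RFLAGS to R11,
        # RIP comes from LSTAR. RSP is NOT changed.
        self.rcx = next_rip
        self.r11 = self.rflags
        self.rip = self.msrs[MSR["IA32_LSTAR"]]

    def swapgs(self):
        # SWAPGS: exchange the GS base with IA32_KERNEL_GS_BASE.
        gs, kernel_gs = MSR["IA32_GS_BASE"], MSR["IA32_KERNEL_GS_BASE"]
        self.msrs[gs], self.msrs[kernel_gs] = self.msrs[kernel_gs], self.msrs[gs]
```

In this model, after `syscall()` the stack pointer still holds the user-mode value, which is the situation the quoted manual text describes; after `sysenter()` it holds whatever was written to IA32_SYSENTER_ESP.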

Related

Why does Windows use RCX, RDX for pointers in a fresh x64 process, different from EAX, EBX in a newly created 32-bit process?

When I create a Windows x86 process in a suspended state (CREATE_SUSPENDED) its CONTEXT contains:
Virtual Address of Entry Point in Eax register;
Virtual Address of Process Environment Block structure in Ebx register.
But when I do the same for an x86_64 process, the CONTEXT contains:
Virtual Address of Entry Point in Rcx register (why not Rax?)
Virtual Address of PEB structure in Rdx register (why not Rbx?)
It seems logical to me to take Rax in x64 in place of Eax in x86 and Rbx in x64 in place of Ebx in x86.
But instead of Eax→Rax and Ebx→Rbx we see Eax→Rcx and Ebx→Rdx.
Also, I see that 64-bit Cheat Engine is aware of this when opening the 32-bit process (notice the migration of the values eax↔ecx and ebx↔edx).
What was the reason to move from *ax register to *cx and from *bx to *dx in 64-bit processes?
Is it somehow connected to calling conventions?
Is it related to Windows only or do other OSes also have this kind of register repurposing?
Update: [screenshots of a newly created x64 process in a suspended state]
"It seems logical to me to take Rax in x64 in place of Eax in x86 and Rbx in x64 in place of Ebx in x86."
I don't see why it would be logical to assume so.
Even if MS had defined an internal ABI documenting the context of a just-created 32-bit process, the 64-bit version of it would have been designed anew, so there is no reason to assume it carries anything over from the old 32-bit ABI.
If Windows uses sysret to return to user space, a process created in a suspended state may leak the target address in rcx.
Returning via other mechanisms (e.g. iret/retf), as could be the case for 32-bit code, will of course leak different data in different registers.
What you are seeing is probably an artifact of how Windows returns to user mode. I don't know exactly what the Windows kernel code to return to user mode is, but it is reasonable to assume that MS kept the same interface for 32-bit processes and that this interface was designed before sysret was widely used.
Note that at the PE entry-point rcx contains a pointer to the PEB and rdx to the entry-point (not the other way around). The former appears to be an undocumented parameter passed to the entry-point function, the latter may be just an artifact of how the entry-point is called.
In fact, a 32-bit process will find a pointer to the PEB in the stack, as the first parameter for the PE entry-point code.
Regarding other OSes, anything that is not documented to be stable is free to change at any time (including what's left in the registers). This is true in general.
As far as stability goes, passing from a 32-bit to a 64-bit implementation is a pretty big step and, again, there is no reason to keep using a very old interface (but with wider registers) instead of improving it with all the recent knowledge.
You can easily see that, for example, Linux "repurposed" the registers in the 64-bit system call ABI.
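As a small illustration of that last point, the Linux system call register conventions for `int 0x80` on i386 and `syscall` on x86-64 can be written down side by side. The register assignments are the documented Linux conventions; the helper names (`widened`, `repurposed_slots`) are made up for this sketch.

```python
# Linux system call register conventions (number, arguments in order).
LINUX_SYSCALL_ABI = {
    "i386 (int 0x80)": {
        "number": "eax",
        "args": ["ebx", "ecx", "edx", "esi", "edi", "ebp"],
    },
    "x86-64 (syscall)": {
        "number": "rax",
        "args": ["rdi", "rsi", "rdx", "r10", "r8", "r9"],
        # The syscall instruction itself overwrites these two registers
        # (return RIP and saved RFLAGS), so they cannot carry arguments.
        "clobbered": ["rcx", "r11"],
    },
}


def widened(reg32):
    """Name the 64-bit register a naive 'just widen it' port would use."""
    return "r" + reg32[1:]


def repurposed_slots():
    """Argument positions where x86-64 does NOT simply widen the i386 register."""
    old = LINUX_SYSCALL_ABI["i386 (int 0x80)"]["args"]
    new = LINUX_SYSCALL_ABI["x86-64 (syscall)"]["args"]
    return [i for i, (o, n) in enumerate(zip(old, new)) if widened(o) != n]
```

Only the third argument (edx/rdx) keeps "the same" register; every other argument slot moved, and rcx could not be kept as an argument register because `syscall` overwrites it.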

When is the kernel stack's ESP stored in the TSS for interrupt return (iret)?

In Intel's x86 programmer's manual, I see the following description of interrupts and interrupt returns with stack switching:
On interrupt:
If a stack switch does occur, the processor does the following:
Temporarily saves (internally) the current contents of the SS, ESP, EFLAGS, CS, and EIP registers.
Loads the segment selector and stack pointer for the new stack (that is, the stack for the privilege level being called) from the TSS into the SS and ESP registers and switches to the new stack.
Pushes the temporarily saved SS, ESP, EFLAGS, CS, and EIP values for the interrupted procedure’s stack onto the new stack.
Pushes an error code on the new stack (if appropriate).
Loads the segment selector for the new code segment and the new instruction pointer (from the interrupt gate or trap gate) into the CS and EIP registers, respectively.
If the call is through an interrupt gate, clears the IF flag in the EFLAGS register.
Begins execution of the handler procedure at the new privilege level.
On return:
Performs a privilege check.
Restores the CS and EIP registers to their values prior to the interrupt or exception.
Restores the EFLAGS register.
Restores the SS and ESP registers to their values prior to the interrupt or exception, resulting in a stack switch back to the stack of the interrupted procedure.
Resumes execution of the interrupted procedure.
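The two sequences quoted above can be sketched as a small simulation. This is only a model of the steps as the manual lists them, assuming 32-bit protected mode and 4-byte stack slots; the `Tss` and `Cpu` classes, the selector values, and the addresses are all hypothetical. Note that neither function ever writes to the TSS.

```python
from dataclasses import dataclass, field

IF_FLAG = 0x200  # interrupt-enable bit in EFLAGS


@dataclass
class Tss:
    ss0: int   # ring-0 stack segment selector
    esp0: int  # ring-0 stack pointer


@dataclass
class Cpu:
    ss: int
    esp: int
    eflags: int
    cs: int
    eip: int
    memory: dict = field(default_factory=dict)  # address -> 32-bit value

    def push(self, value):
        self.esp -= 4
        self.memory[self.esp] = value

    def pop(self):
        value = self.memory[self.esp]
        self.esp += 4
        return value


def interrupt(cpu, tss, handler_cs, handler_eip, error_code=None, interrupt_gate=True):
    # Temporarily save the interrupted context.
    saved = (cpu.ss, cpu.esp, cpu.eflags, cpu.cs, cpu.eip)
    # Load the new stack from the TSS (read-only access to the TSS).
    cpu.ss, cpu.esp = tss.ss0, tss.esp0
    # Push SS, ESP, EFLAGS, CS, EIP onto the new stack, in that order.
    for value in saved:
        cpu.push(value)
    if error_code is not None:
        cpu.push(error_code)
    cpu.cs, cpu.eip = handler_cs, handler_eip
    if interrupt_gate:
        cpu.eflags &= ~IF_FLAG


def iret(cpu):
    # Pop in reverse order; any error code must already have been removed.
    eip, cs, eflags = cpu.pop(), cpu.pop(), cpu.pop()
    esp, ss = cpu.pop(), cpu.pop()
    kernel_esp_after_pops = cpu.esp  # where the kernel stack ends up
    cpu.eip, cpu.cs, cpu.eflags, cpu.esp, cpu.ss = eip, cs, eflags, esp, ss
    return kernel_esp_after_pops
```

In this model, once `iret` has popped the frame, the kernel stack pointer is back at exactly the value held in `tss.esp0`, so the same TSS entry is valid for the next entry without being rewritten.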
For example, consider a Linux process P:
1. It is initially in kernel mode.
2. It returns to user mode with iret. According to the manual, this makes no change to the TSS.
3. It traps into the kernel with int. Here it needs to find the kernel stack from ESP and SS in the TSS. How is this kernel stack value set up, since it is not stored to the TSS in step 2?
Once the kernel returns to user-space for a given task, it's done with that task's kernel stack until the next interrupt / exception. There's no useful data on it, so the TSS can hold a fixed SS:[ER]SP value that points to the top of the virtual page[s] allocated as the kernel stack for the current task.
Kernel state doesn't live on the kernel stack between entries into the kernel; it's kept elsewhere in a process control block. (Context switches between tasks actually happen in the kernel, switching kernel stacks to the formerly-sleeping task's kernel stack, so eventually returning to user-space means returning up the call-chain of whatever that task was doing in the kernel first).
BTW, unless the kernel pushes a new CS:EIP / EFLAGS / SS:ESP for iret to pop, the stuff it pops will be the stuff pushed by hardware at the address specified in the TSS. So even if there was some desire to re-enter the kernel with the stack as you left it, that would normally be at the TSS location anyway. But this is irrelevant because Linux doesn't keep stuff on a task's kernel stack while user-space is running, except for a pointer to per-task stuff at the bottom of the region where the kernel can find it with [ER]SP & -16384.
(I think this is right; I've looked at a few bits of Linux kernel code but haven't really gotten my hands dirty experimenting with things. I think this is how Linux works, and a consistent viable design.)
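The `[ER]SP & -16384` trick mentioned above is plain alignment arithmetic: if each kernel stack is a 16 KiB region aligned to 16 KiB, masking off the low 14 bits of any stack pointer inside it yields the lowest address of the region. A minimal sketch, where the 16 KiB size is taken from the answer, and the function names and example address are made up:

```python
THREAD_SIZE = 16384  # 16 KiB kernel stack, as in the "& -16384" above


def stack_base(sp, thread_size=THREAD_SIZE):
    """Lowest address of the aligned stack region that contains sp."""
    # -thread_size has all bits set except the low log2(thread_size) bits,
    # so the AND clears the offset within the region.
    return sp & -thread_size


def stack_top(sp, thread_size=THREAD_SIZE):
    """One past the highest address of the region (where an empty stack starts)."""
    return stack_base(sp, thread_size) + thread_size
```

For a hypothetical stack pointer `0xFFFFC90000123F58`, `stack_base` gives `0xFFFFC90000120000` and `stack_top` gives `0xFFFFC90000124000`. This only works if the region is aligned to its own size.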

How to get current process in Linux-4.9 (and above) through registers in X86_64?

Since Linux 4.9 on x86, the kernel changed the kernel stack layout: thread_info was moved into task_struct, and the current process pointer is kept in the per-CPU section.
So it is no longer possible to find the current process in the kernel from the x86 SP register.
I am curious whether there is still a way to get the current process through CPU registers instead of using the get_current() macro.

Getting stack pointer x86_64 linux syscall

I have implemented a syscall on x86_64 Linux 3.0, and would like to know how to get the calling process's stack pointer (%rsp). My syscall is a plain vanilla syscall...
I'm used to using task_pt_regs to get the stack frame of the calling process, but in arch/x86/include/asm/ptrace.h the comments in struct pt_regs note that non-tracing syscalls don't read all registers: ip, cs, flags, sp and ss are not set when the CPU syscall instruction is invoked and my actual syscall is called. In other words, in my syscall task_pt_regs(current)->ss is garbage.
For calls like sys_fork, a special macro in arch/x86/kernel/entry_64.S (PTREGSCALL) sets up the sys_fork function to be called with a proper pt_regs stack frame.
How can I extract values like IP and SS in my syscall without forcing an extra argument onto my custom system call like sys_fork with PTREGSCALL?
If I understand correctly, when a syscall is invoked the CPU jumps to kernel code (a privilege-level change), and at that moment the CPU fills the stack with the CS, RIP, RSP, and EFLAGS registers so that it can return to user code when the handler executes an IRET (interrupt return).
This means that you may find the RSP and RIP of the calling process just by looking at the stack when the syscall is executed.
You may get more information in the "AMD64 Architecture, Programmer’s Manual, Volume 2: System Programming", page 292. It's called "Long-Mode Stack After Interrupt—Higher Privilege".
In the answer above, I've ignored a few details of how the Linux kernel handles syscalls, but they don't change the answer.
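The frame layout that manual figure describes can be sketched as a small parser. This assumes the long-mode layout for an interrupt or exception that changes privilege level: five 8-byte slots (RIP, CS, RFLAGS, RSP, SS from low to high address), preceded by an 8-byte error code for the exceptions that push one. The function name and the sample values are hypothetical. Keep in mind that the `syscall` instruction itself does not push this frame (it places the return RIP in RCX and RFLAGS in R11), so whether a syscall handler finds these values on the stack depends on what the kernel's entry code saved.

```python
import struct

# Order of the slots from the lowest address (top of stack) upward.
FRAME_FIELDS = ("rip", "cs", "rflags", "rsp", "ss")


def parse_interrupt_frame(raw, has_error_code=False):
    """Decode a long-mode interrupt stack frame from raw little-endian bytes.

    raw starts at the stack pointer value seen on handler entry.
    """
    offset = 8 if has_error_code else 0  # skip the error code slot if present
    values = struct.unpack_from("<5Q", raw, offset)
    return dict(zip(FRAME_FIELDS, values))
```

For example, `parse_interrupt_frame(frame_bytes)["rsp"]` would give the interrupted code's stack pointer.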

User to kernel mode big picture?

I have to implement a char device as an LKM.
I know some basics about OS, but I feel I don't have the big picture.
In a C program, when I call a syscall, what I think happens is that the CPU switches to ring 0, goes to the syscall vector, and jumps to a function in kernel memory space that handles it. (I think it does int 0x80 with the offset into the syscall vector in eax, but I'm not sure.)
Then I'm in the syscall itself, but I guess that for the kernel it is the same process as before, only now in kernel mode; I mean, the current PCB is that of the process that called the syscall.
So far, so good? Correct me if something is wrong.
Other questions: how can I read/write process memory?
If in the syscall handler I refer to an address, say 0xbfffffff, what does that address mean? Is it a physical one? Some virtual kernel one?
To read/write memory from the kernel, you need to use function calls such as get_user or __copy_to_user.
See the User Space Memory Access API of the Linux Kernel.
You can never get to ring0 from a regular process.
You'll have to write a kernel module to get to ring0.
And you never have to deal with physical addresses: 0xbfffffff represents an address in the virtual address space of your process.
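To make the 0xbfffffff example concrete: on 32-bit x86 Linux with the default 3 GiB / 1 GiB split, user space ends at 0xC0000000, so 0xbfffffff is the very last user-space byte. Helpers such as get_user validate a user pointer before touching it; the sketch below models that kind of range check in the spirit of the kernel's access_ok(). The `TASK_SIZE` value assumes the default split, and `range_is_user` is a made-up name, not a kernel API.

```python
# Assumption: default 3G/1G split on 32-bit x86 Linux.
TASK_SIZE = 0xC0000000  # first address that is NOT user space


def range_is_user(addr, size, task_size=TASK_SIZE):
    """True if the byte range [addr, addr + size) lies entirely in user space."""
    return addr >= 0 and size >= 0 and addr + size <= task_size
```

With these assumptions, a 1-byte access at 0xbfffffff passes the check, while a 2-byte access at the same address would spill into the kernel half and is rejected.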
Big picture:
Everything happens in assembly. In Intel assembly, there is a set of privileged instructions that can only be executed in Ring0 mode (http://en.wikipedia.org/wiki/Privilege_level). To make the transition into Ring0 mode, you can use the "Int" or "Sysenter" instruction:
what all happens in sysenter instruction is used in linux?
Then, inside Ring0 mode (which is your kernel mode), accessing memory requires the privilege levels to match, which is checked via the DPL/CPL/RPL attribute bits associated with the segment registers:
http://duartes.org/gustavo/blog/post/cpu-rings-privilege-and-protection/
You may ask how the CPU initializes memory and registers in the first place: at boot, the x86 CPU runs in real mode, unprotected (no ring concept), so everything is possible and a lot of setup work is done.
As for virtual vs. physical memory addresses: just remember that anything in a register used for memory addressing is always a virtual address (if the MMU is set up and protected mode is enabled). Look at the picture here (notice that anything coming from the CPU is a virtual address; only the memory bus sees physical addresses):
http://en.wikipedia.org/wiki/Memory_management_unit
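As a rough sketch of what the MMU does with such a virtual address, assume classic 32-bit x86 paging without PAE and with 4 KiB pages: the address splits into a 10-bit page-directory index, a 10-bit page-table index, and a 12-bit offset. The function names and the nested-dict "page tables" below are made up for illustration; real entries also carry permission and present bits, which are omitted here.

```python
def split_linear_address(addr):
    """Split a 32-bit address into (directory index, table index, page offset)."""
    directory = (addr >> 22) & 0x3FF  # top 10 bits
    table = (addr >> 12) & 0x3FF      # next 10 bits
    offset = addr & 0xFFF             # low 12 bits
    return directory, table, offset


def translate(addr, page_directory):
    """Walk a toy two-level table: page_directory[dir][table] -> frame number."""
    directory, table, offset = split_linear_address(addr)
    frame = page_directory[directory][table]
    return (frame << 12) | offset
```

For 0xbfffffff the split is directory 0x2FF, table 0x3FF, offset 0xFFF; with a hypothetical mapping of that page to frame 0x1234, `translate` returns physical address 0x1234FFF.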
As for memory separation between userspace and kernel, you can read here:
http://www.inf.fu-berlin.de/lehre/SS01/OS/Lectures/Lecture14.pdf
