Trace from user space code to kernel space - linux-kernel

I recently set up my system for kernel debugging using qemu+gdb. At present I can set breakpoints at, for example, __do_page_fault() and trace the call via gdb (with the win command). Now I want to do the following: take a simple C program with a "hello world" printf statement and trace the call sequence from user space down to the write() system call (or whatever in kernel space is invoked during the execution of that particular user-space program). I want to learn how a user-space program traps into a system call, specifically in the Linux kernel.
My doubt is where to set the breakpoint: we have the kernel code as well as the C code of the program. How do I go about this? Please give an explanation with an example.
Thank you!

The easiest way, in my opinion, is to separate this into two pieces:
1. Place a breakpoint in the guest kernel using the host gdb.
2. Place a breakpoint in the user code before the trap instruction, using the in-guest (target) gdb. When it is hit, print the stack using the target (in-qemu) gdb: that gives you the user-space stack trace.
3. Continue execution in the guest gdb.
4. The in-kernel breakpoint (set in step 1) will be hit in the host gdb; print the kernel stack trace there.
P.S.
If your kernel continuously hits the breakpoint (the write syscall, for example, is certainly used widely), you can use a conditional breakpoint so that it only triggers when certain parameters are passed, as in the sketch below.
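A sketch of the whole flow. The kernel-side symbol is an assumption here: depending on kernel version and architecture, the write entry point may be sys_write, ksys_write or __x64_sys_write, so check your vmlinux. On the host, with gdb attached to QEMU's gdbstub (typically port 1234) and vmlinux loaded:

(gdb) target remote :1234
(gdb) break ksys_write if fd == 1
(gdb) continue

Inside the guest, with gdb running on the hello-world binary:

(gdb) break write
(gdb) run
(gdb) bt
(gdb) continue

bt in the guest prints the user-space half of the trace; after continue, the conditional breakpoint fires in the host gdb, where bt prints the kernel half.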

Related

Wrong line number in stack-trace in debugging Linux kernel using kgdb

I am trying to debug a driver for an Ethernet-MAC in the Linux kernel using kgdb over serial.
I halt the execution by making a call to "kgdb_breakpoint()" at the desired location in the code and recompile the kernel.
But after the code halts, as you can see in the screenshot, the backtrace shows the correct function graph and source filenames, but for some reason the corresponding line numbers are not correct.
Please note: I have compiled this kernel with "CONFIG_FRAME_POINTER" set and "nokaslr" in the boot-args.
Is there a way I can see a stack trace with correct line numbers here?
(I used QtCreator for this screenshot, although the behavior is similar with gdb on the command line or in TUI mode.)
Edit: No matter which function I put the `kgdb_breakpoint()` call in (inside the driver source), the line number in the stack trace always shows the same line number for the halted function.
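For reference, the halt is planted like this; the function name below is hypothetical, and the real call is just dropped into the driver source wherever execution should stop:

#include <linux/kgdb.h>
#include <linux/platform_device.h>

/* hypothetical probe function of the Ethernet-MAC driver */
static int my_ethmac_probe(struct platform_device *pdev)
{
        /* ... driver setup ... */
        kgdb_breakpoint();      /* traps into the attached kgdb session here */
        /* ... */
        return 0;
}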

how to force gdb to stop right after the start of program execution?

I've tried to set a breakpoint on every function that makes any sense, but the program exits before reaching any of them. Is there a way to make the program run in step-by-step mode from the start so I can see what's going on?
I'm trying to debug /usr/bin/id, if that's important (we have a custom plugin for it and it's misbehaving).
P.S. The start command doesn't work for me here (this should be a comment, but I don't have enough rep for it).
Get the program entry point address and insert a breakpoint at that address.
One way to do this is to do info files, which gives you, for example, "Entry point: 0x4045a4". Then do break *0x4045a4. After running the program, it will immediately stop.
From here on you can use single stepping instructions (like step or stepi) to proceed.
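Put together, the session looks like this (the entry point address is of course specific to the binary being debugged):

(gdb) info files
    ...
    Entry point: 0x4045a4
    ...
(gdb) break *0x4045a4
Breakpoint 1 at 0x4045a4
(gdb) run
Breakpoint 1, 0x00000000004045a4 in ?? ()
(gdb) stepi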
You did not say what system you are trying to debug. If the code is in read-only memory you may need to use hardware breakpoints (hbreak), if they are supported by that system.
Use the start command
The ‘start’ command does the equivalent of setting a temporary breakpoint at the beginning of the main procedure and then invoking the ‘run’ command.
e.g.
a program built with debug info, named main, and invoked like this: main arg1 arg2
gdb main
(gdb) start arg1 arg2
Use starti. Unlike start, this stops at the actual first instruction, not at main().
You can type record full right after running the program. This will record all instructions and make it possible to replay them or step backwards.
For the main function, you'd need to type this before reaching the breakpoint, so set an earlier one with break _start; _start is a function that always runs before the standard main function (apparently this applies to gcc-compiled binaries or similar).
Then continue to the main breakpoint and do reverse-stepi to go back exactly one instruction, as in the session below.
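Putting those together, a sketch (main and _start assume an ordinary gcc-built binary):

(gdb) break _start
(gdb) run
(gdb) record full
(gdb) break main
(gdb) continue
(gdb) reverse-stepi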
For more info about recording look here: link

How does a debugger set breakpoints if the image is in read-only memory?

How does a debugger set breakpoints if the image is in read-only memory? I know there are hardware breakpoints, but in the debugger I use (OllyDbg) those have to be set specially using a different dialog than normal breakpoints.
Explanation:
Here is a routine in a debugger that is comparing itself to a copy of itself. EDX points to the running image, EBX points to the known-good copy of the image. The breakpoint at 4010CE is only reached if there is a mismatch. The character being compared is in the AL register. As you can see, the debugger shows EB F6 at 10CE, but this is false. 10CE actually has CC in it, as you can see by looking at the AL register. This is because the debugger has secretly inserted the CC to implement the breakpoint.
The debugger first has to change the memory protection of the page it wants to write to, which can be done with VirtualProtectEx. After that it can write with WriteProcessMemory, and then it sets the protection back to the original value.
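In Win32 terms that sequence looks roughly like this; a minimal sketch, where hProcess and addr are assumed to come from the debugger's own bookkeeping and error handling is omitted:

#include <windows.h>

/* Plant an INT3 software breakpoint at addr in the target process and
   return the original byte so it can be restored when the breakpoint hits. */
BYTE set_sw_breakpoint(HANDLE hProcess, LPVOID addr)
{
    BYTE original, int3 = 0xCC;
    DWORD oldProtect;
    SIZE_T n;

    ReadProcessMemory(hProcess, addr, &original, 1, &n);  /* save the real byte */
    VirtualProtectEx(hProcess, addr, 1, PAGE_EXECUTE_READWRITE, &oldProtect);
    WriteProcessMemory(hProcess, addr, &int3, 1, &n);     /* overwrite with 0xCC */
    VirtualProtectEx(hProcess, addr, 1, oldProtect, &oldProtect);
    FlushInstructionCache(hProcess, addr, 1);             /* drop stale code bytes */
    return original;
}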
Let me preface this with a disclaimer that I'm not familiar with your particular toolset.
If you haven't enabled hardware breakpoints, the only remaining breakpoint type is a software breakpoint. On x86 (because that's what I'm most familiar with), these work by replacing the first byte of an instruction with a trap instruction, and they will only be routed through your OS's breakpoint mechanism to your debugger if the correct trap instruction for your OS is used and the debugger has already registered itself with the OS as a debugger for this process. To make the software breakpoint fire at the correct moment, the trap instruction must be written into your code segment over the first byte of the correct instruction.
The two answers that got here first explain the two scenarios which could get you here (at least, the only two I can think of):
The kernel always has write access everywhere, except for hardware-protected pages (i.e. on some sort of ROM), which your process' memory is almost certainly not. It has the ability to write the breakpoint instruction regardless of the permissions exposed to the user process being debugged.
The debugger must use some syscall to change the access rights on the memory of the target process before inserting the breakpoint.
Personally, I'm guessing the first thing is happening. The segment permissions are only in place to protect your target process from itself, not from a debugger process or from the kernel. Debugging mechanisms in operating systems pretty regularly violate "normal" permissions to allow the debugger to do whatever it wants to the target process. This, of course, is why some operating systems require you to enter a password before you're allowed to use the debugger in certain scenarios.
However, you can test if it's the second one by attempting to write to the code segment from inside the target process after a breakpoint has been set. If the write succeeds, you know the permissions have been lowered by the OS (to allow the process to be debugged). It would be pretty awkward for the OS to require a debugger to jump through this hoop since it can already insert arbitrary code into the writeable parts of memory and then force a jump to it by generating a stack frame overflow.
The debugger takes advantage of the WriteProcessMemory() function to alter the instruction in place. It keeps a copy of the original byte; when the breakpoint is hit, it writes the old byte value back and sets EIP back to the start of that instruction so the real instruction can execute.

How to switch from user mode to kernel mode?

I'm learning about the Linux kernel, but I don't understand how the switch from user mode to kernel mode happens in Linux. How does it work? Could you give me some advice, or point me to some link or book about this?
The only way a user-space application can explicitly initiate a switch to kernel mode during normal operation is by making a system call such as open, read, write etc.
Whenever a user application calls one of these system call APIs with appropriate parameters, a software interrupt/exception (SWI) is triggered.
As a result of this SWI, code execution jumps from the user application to a predefined location in the interrupt vector table (IVT) provided by the OS.
The IVT contains an address for the SWI exception handler routine, which performs all the steps required to switch the user application into kernel mode and start executing kernel instructions on behalf of the user process.
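To make the trigger concrete, here is a minimal user-space program that reaches the same kernel write handler three ways; each call ultimately executes the trap instruction described above:

#include <unistd.h>
#include <sys/syscall.h>
#include <stdio.h>

int main(void)
{
    printf("via stdio\n");                                       /* buffered; flushed with write() */
    write(STDOUT_FILENO, "via libc wrapper\n", 17);              /* thin wrapper over the syscall */
    syscall(SYS_write, STDOUT_FILENO, "via raw syscall\n", 16);  /* explicit trap into the kernel */
    return 0;
}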
To switch from user mode to kernel mode you need to perform a system call.
If you just want to see what is going on under the hood, TLDP is your new friend: see the code there (it is well documented; no additional knowledge is needed to understand the assembly).
You are interested in:
movl $len,%edx # third argument: message length
movl $msg,%ecx # second argument: pointer to message to write
movl $1,%ebx # first argument: file handle (stdout)
movl $4,%eax # system call number (sys_write)
int $0x80 # call kernel
As you can see, a system call is just a wrapper around assembly code that raises an interrupt (0x80); as a result, the handler for this system call will be invoked.
Let's cheat a bit and use the C preprocessor here to build an executable (foo.S is the file where you put the code shown above):
gcc -o foo -nostdlib foo.S
Run it via strace to ensure that we'll get what we write:
$ strace -t ./foo
09:38:28 execve("./foo", ["./foo"], 0x7ffeb5b771d8 /* 57 vars */) = 0
09:38:28 stat(NULL, Hello, world!
NULL) = 14
09:38:28 write(0, NULL, 14)
I just read through this, and it's a pretty good resource. It explains user mode and kernel mode, why changes happen, how expensive they are, and gives some interesting related reading.
https://blog.codinghorror.com/understanding-user-and-kernel-mode
Here's a short excerpt:
Kernel Mode
In Kernel mode, the executing code has complete and unrestricted access to the underlying hardware. It can execute any CPU instruction and reference any memory address. Kernel mode is generally reserved for the lowest-level, most trusted functions of the operating system. Crashes in kernel mode are catastrophic; they will halt the entire PC.
User Mode
In User mode, the executing code has no ability to directly access hardware or reference memory. Code running in user mode must delegate to system APIs to access hardware or memory. Due to the protection afforded by this sort of isolation, crashes in user mode are always recoverable. Most of the code running on your computer will execute in user mode.

What happens if TF(trap flag) is set to 0 in 8086 microprocessors?

Here is what I found when searching:
Trap Flag (T) – This flag is used for on-chip debugging. Setting the trap flag puts the microprocessor into single-step mode for debugging. In single-stepping, the microprocessor executes an instruction and enters the single-step ISR.
If the trap flag is set (1), the CPU automatically generates an internal interrupt after each instruction, allowing a program to be inspected as it executes, instruction by instruction.
If the trap flag is reset (0), no such function is performed.
https://en.wikipedia.org/wiki/Trap_flag
Now I am coding in emu8086. As explained above, TF must be set in order for single-step debugging to work.
Should I always set TF myself, or is it set automatically?
And if I somehow set TF to 0, will debuggers on the whole system stop working, or will just emu8086 fail to debug?
I've never used emu8086 but by looking at some screenshot of it and judging by its name it's probably an emulator - this means it is not running the code natively.
Each instruction is changing the state of a virtual 8086 CPU (represented as a data structure in memory) and not the state of your real CPU.
With this emulation, emu8086 doesn't need to rely on the TF flag to single-step your program, it just needs to stop after one step of emulation and wait for you to hit another button.
This is also why you can find a thing such as "Step back".
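A minimal sketch of that idea in C (all names here are made up for illustration; emu8086's internals are certainly different):

#include <stdint.h>
#include <stdio.h>

struct cpu8086 {
    uint16_t ax, bx, cx, dx, ip, flags;
    uint8_t  mem[1 << 20];             /* the 1 MiB real-mode address space */
};

/* Stub: a real emulator would decode mem[ip..] and update the state. */
static void execute_one_instruction(struct cpu8086 *cpu) { cpu->ip++; }

static void run_single_step(struct cpu8086 *cpu)
{
    for (;;) {
        execute_one_instruction(cpu);  /* touches only the virtual CPU state */
        printf("ip=%04x - press Enter to step\n", cpu->ip);
        if (getchar() == EOF)          /* the "step button": just pause the loop */
            break;
    }
}

int main(void)
{
    static struct cpu8086 cpu;         /* static: 1 MiB is too big for the stack */
    run_single_step(&cpu);
    return 0;
}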
If you were wondering what would happen if a debugged (not emulated) program sets the TF flag, the answer is: it depends on the debugger.
The correct behaviour is for the debuggee to receive the exceptions, but this is hard to handle correctly (since the debugger itself uses the TF flag).
Some debuggers just don't care and swallow the exception (i.e. they don't forward it to the program under debug), assuming that a well-written program doesn't need to use the TF flag.
Unfortunately, malware routinely uses a set of anti-debug techniques, including setting TF and checking it back / waiting for exceptions, to detect the presence of a debugger.
A truly transparent debugger has to handle the RFLAGS register carefully.
When debugging with breakpoints the TF is not set while the program is executing, so there is nothing to worry about.
However, when single-stepping, the TF is set during the next instruction; this is problematic during a pushfd/q, and the debugger must explicitly handle that case to avoid detection.
If the debuggee sets the TF, the debugger must pass the debug exception to the program. Under current OSes the TF won't last more than an instruction, because the OS will catch the exception, transform it into a signal, and dispatch it to the program while clearing the TF. So the debugger can simply do a check before stepping into a popfd/q instruction.
Where the TF doesn't get cleared by the OS, the debugger must effectively emulate RFLAGS with a copy.
The debugger sets TF according to what it needs to do. The code being debugged should not modify TF.
