I have compiled my own Kernel module and now I would like to be able to load it
into the GNU Debugger GDB. I did this once, a year ago or so to have a look
at the memory layout. It worked fine then, but of course I was too silly to
write down the single steps I took to accomplish this... Can anyone enlighten
me or point me to a good tutorial?
Thank you so much
For kernels > 2.6.26 (i.e. after May 2008), the preferred way is probably to use "kgdb light" (not to be confused with its ancestor kgdb, available as a set of kernel patches).
"kgdb light" is now part of the kernel (in by default in current Ubuntu kernels, for instance), and its capabilities are improving fast (Jason Wessel is working on it, which makes his name a possible Google search key).
Drawback: You need two machines, the one you're debugging and the development machine (host) where gdb runs. Currently, those two machines can only be linked through a serial link.
kgdb runs in the target machine where it handles the breakpoints, stepping, etc. and the remote debugging protocol used to talk with the development machine.
gdb runs in the development machine where it handles the user interface.
A USB-to-serial adapter works OK on the development machine, but currently, you need a real UART on the target machine - and that's not so frequent anymore on recent hardware.
The (terse) kgdb documentation is in the kernel sources, in
Documentation/DocBook
I suggest you google around for "kgdb light" for the complete story.
Again, don't confuse kgdb and kgdb light, they come together in google searches but are mostly different animals. In particular, info from linsyssoft.com relates to the "ancestor" kgdb, so try queries like:
kgdb module debugging -"linsyssoft.com" -site:linsyssoft.com
and discard articles prior to May 2008 / 2.6.26 kernel.
Finally, for module debugging, you need to manually load the module symbols in the dev machine for all the code and sections you are interested in. That's a bit too long to address here, but some clues there, there and there.
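To give a rough idea of that manual step: on the target, the load addresses of a module's sections can be read from /sys/module/<module>/sections/, and GDB's add-symbol-file command takes the .text address followed by "-s <section> <address>" pairs. The small helper below only builds that command string so you can paste it into gdb on the dev machine. The module name and the addresses are made-up placeholders, not values from a real system.

```python
def add_symbol_file_command(ko_path, section_addrs):
    """Build a GDB add-symbol-file command for a loaded kernel module.

    section_addrs maps section names to load addresses, as read on the
    target from /sys/module/<module>/sections/<section>.
    """
    if ".text" not in section_addrs:
        raise ValueError("the .text address is required")
    parts = ["add-symbol-file", ko_path, hex(section_addrs[".text"])]
    # Every other section is passed as a "-s <name> <address>" pair.
    for name in sorted(section_addrs):
        if name != ".text":
            parts += ["-s", name, hex(section_addrs[name])]
    return " ".join(parts)


# Hypothetical module and addresses, for illustration only.
example = add_symbol_file_command(
    "mymodule.ko",
    {
        ".text": 0xFFFFFFFFA0000000,
        ".data": 0xFFFFFFFFA0002000,
        ".bss": 0xFFFFFFFFA0002400,
    },
)
print(example)
```

You would repeat this for each module (and each section) you care about, after the module has been loaded on the target.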
Bottom line is, kgdb is a very welcome improvement but don't expect this trip to be as easy as running gdb in user mode. Yet. :)
It has been a while since I was actively developing drivers for Linux, so maybe my answer is a bit out of date. I would say you cannot use GDB. If at all, only to debug post mortem on dump files. To debug you should rather use a kernel debugger. Build the kernel with a kernel debugger enabled (there is one out-of-the-box debugger for 2.6, which was lacking at the time I was active). I used the kernel patches for KDB from SGI, ftp://oss.sgi.com/www/projects/kdb/download/, which I was quite happy with. A user space tool won't be of much use unless newer versions of gdb can somehow communicate with the internal kernel debugger (which you would have to activate anyway).
I hope this gives you at least some hints, while not being a detailed answer. Better than no answer at all. Regards.
I suspect what you did was
gdb /boot/vmlinux /proc/kcore
Of course you can't actually do any debugging, but it's certainly good enough to have a poke around the kernel.
From what I understand, on a high level, user mode debugging provides you with access to the private virtual address for a process. A debug session is limited to that process and it cannot overwrite or tamper w/ other process' virtual address space/data.
Kernel mode debug, I understand, provides access to other drivers and kernel processes that need full access to multiple resources, in addition to the original process address space.
From this, I get to thinking that kernel mode debugging seems more robust than user mode debugging. This raises the question for me: is there a time, when both options of debug mode are available, that it makes sense to choose user mode over a more robust kernel mode?
I'm still fairly new to the concept, so perhaps I am thinking of the two modes incorrectly. I'd appreciate any insight there, as well, to better understand anything I may be missing. I just seem to notice that a lot of people seem to try to avoid kernel debugging. I'm not entirely sure why, as it seems more robust.
The following is mainly from a Windows background, but I guess it should be fine for Linux too. The concepts are not so different.
Some inline answers first
From what I understand, on a high level, user mode debugging provides you with access to the private virtual address for a process.
Correct.
A debug session is limited to that process
No. You can attach to several processes at the same time, e.g. with WinDbg's .tlist/.attach command.
and it cannot overwrite or tamper w/ other process' virtual address space/data.
No. You can modify the memory, e.g. with WinDbg's ed command.
Kernel mode debug, I understand, provides access to other drivers and kernel processes that need full access to multiple resources,
Correct.
in addition to the original process address space.
As far as I know, you have access to physical RAM only. Some of the virtual address space may be swapped, so not the full address space is available.
From this, I get to thinking that kernel mode debugging seems more robust than user mode debugging.
I think the opposite. If you write incorrect values somewhere in kernel mode, the PC crashes with a blue screen. If you do that in user mode, it's only the application that crashes.
This raises the question for me: is there a time, when both options of debug mode are available, that it makes sense to choose user mode over a more robust kernel mode?
If you debug an application only and no drivers are involved, I prefer user mode debugging.
IMHO, kernel mode debugging is not more robust, it's more fragile - you can really break everything at the lowest level. User mode debugging provides the typical protection against crashes of the OS.
I just seem to notice that a lot of people seem to try to avoid kernel debugging
I observe the same. And usually it's not so difficult once they try it. In my debugging workshops, I explain processes and threads from kernel point of view and do it live in the kernel. And once people try kernel debugging, it's not such a mystery any more.
I'm not entirely sure why, as it seems more robust.
Well, you really can blow up everything in kernel mode.
User mode debugging
User mode debugging is the default that any IDE will do. The integration is usually good, in some IDEs it feels quite native.
During user mode debugging, things are easy. If you access memory that is paged out to disk, the OS is still running and will simply page it in, so you can read and write it.
You have access to everything that you know from application development. There are threads and you can suspend or resume them. The knowledge you have from application development will be sufficient to operate the debugger.
You can set breakpoints and inspect variables (as long as you have correct symbols).
Some kinds of debugging are only available in user mode. E.g. the SOS extension for WinDbg to debug .NET applications only works in user mode.
Kernel debugging
Kernel debugging is quite complex. Typically, you can't simply do local kernel debugging - if you stop somewhere in the kernel, how do you control the debugger? The system will just freeze. So, for kernel debugging, you need 2 PCs (or virtual PCs).
During kernel mode debugging, things are complex. While you are just inside an application, a millisecond later, some interrupt occurs and does something completely different. You don't only have threads, you also need to deal with call stacks that are outside your application, you'll see CPU register content, instruction pointers etc. That's all stuff a "normal" app developer does not want to care about.
You don't only have access to everything that you implemented. You also have access to everything that Microsoft, Intel, NVidia and lots of other companies developed.
You cannot simply access all memory, because some memory that is paged out to the swap file will first generate a page fault, then involve some disk driver to fetch the data, potentially page out some other data, etc.
There is so much going on in kernel mode and in order to not break it, you need to have a really professional understanding of all those topics.
Conclusion
Most developers just want to care about their source code. So if they are writing programs (aka. applications, scripts, tools, games), they just want user mode debugging. If "their code" is driver code, of course they want kernel debugging.
And of course Security Specialists and Crackers want kernel mode debugging because they want privileges.
I am using gdb attached to a serial port of a virtual machine to debug linux kernel.
I am wondering, if there is any patches/plugins which can make the gdb understand some of linux kernel's data structure and make it "thread aware"?
By that I mean under gdb I can see how many kernel threads are there, their status, and for each thread, their stack information.
libvmi
https://github.com/libvmi/libvmi
This project does "LibVMI: Simplified Virtual Machine Introspection" which sounds really close.
This project in particular https://github.com/Wenzel/pyvmidbg uses libvmi and features a demo video of debugging a Windows userland application from inside it, without memory conflicts.
As of May 2019, there are two limitations however, both of which could be overcome with some work: https://github.com/Wenzel/pyvmidbg/issues/24
Linux memory parsing is not yet complete
requires Xen
The developer of that project also answered further at: https://stackoverflow.com/a/56369454/895245
Implementing it with those libraries would be in my opinion the best way to achieve this goal today.
Linaro lkd-python
First, this Linaro page claims to have a working setup: https://wiki.linaro.org/LandingTeams/ST/GDB that allows you to do usual thread operations such as thread, bt, etc., but it relies on a GDB fork. I will test it out later. In 2016, https://youtu.be/pqn5hIrz3A8 says that the implementation was in C, not as Python scripts unfortunately, which would be better and avoid forking. The sketch for lkd-python can be found at: https://git.linaro.org/people/lee.jones/kieran.bingham/binutils-gdb.git/log/?h=lkd-python
Linux kernel in-tree GDB scripts + my brain
I then tried to see what I could do with the kernel in-tree Python scripts at v4.17 + some manual intervention as a prototype, but didn't quite get there yet.
I have tested using this highly automated QEMU + Buildroot setup.
First follow the procedure I described at: How to debug the Linux kernel with GDB and QEMU? to get GDB working.
Then, as described at: How to debug Linux kernel modules with QEMU? run GDB with:
gdb -ex add-auto-load-safe-path /full/path/to/linux/kernel
This loads the in-tree GDB Python scripts from scripts/gdb.
One of those scripts provides:
lx-ps
which lists all threads with format:
0xffff88000ed08000 1 init
0xffff88000ed08ac0 2 kthreadd
The first field is the address of the task_struct struct, so we can see the entire struct with:
p *(struct task_struct *)0xffff88000ed08000
which should in theory allow us to get any information we want about the process.
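As a small convenience sketch, the lx-ps listing can be turned into ready-to-paste print commands outside of GDB. This assumes the three-column "address pid comm" format shown above. The function names are my own, and nothing here talks to GDB itself.

```python
def parse_lx_ps(output):
    """Parse lx-ps lines of the form '<task_struct address> <pid> <comm>'."""
    tasks = []
    for line in output.splitlines():
        fields = line.split(None, 2)
        # Skip blank lines, headers and anything that is not a task line.
        if len(fields) != 3:
            continue
        if not fields[0].startswith("0x") or not fields[1].isdigit():
            continue
        tasks.append((int(fields[0], 16), int(fields[1]), fields[2].strip()))
    return tasks


def print_command(task_addr):
    """GDB command that dumps the task_struct stored at task_addr."""
    return "p *(struct task_struct *){:#x}".format(task_addr)


listing = "0xffff88000ed08000 1 init\n0xffff88000ed08ac0 2 kthreadd\n"
for addr, pid, comm in parse_lx_ps(listing):
    print(pid, comm, "->", print_command(addr))
```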
Now I wanted to find the PC. For ARM, I've seen: Find program counter of process in kernel and I tried:
task_pt_regs((struct thread_info *)((struct task_struct)*0xffffffc00e8f8000))->uregs[ARM_pc]
but task_pt_regs is a #define and GDB cannot see defines without -ggdb3: How do I print a #defined constant in GDB? which are apparently not set?
I don't think GDB understands kernel data structures, that would make them version dependent. GDB uses ptrace for gathering information on any running process.
That's all I know :(
pyvmidbg developer here.
I will add some clarifications:
yes the goal of the project is indeed to have a cross-platform, guest-aware GDB stub.
Most of the implementation is already done for Windows, where we are aware of processes and their threads context.
It's possible to intercept a specific process (cmd.exe in the demo) and singlestep its execution (this is limited to 1 process with 1 thread for now), as well as attaching to a new process's entrypoint.
Regarding Linux, I looked at the internals and the resources that I could find, but I'm lacking the whole picture to figure out how I can:
- intercept a task when it's being scheduled (core/sched.c:switch_to() ?)
- read the task state (Windows's KTRAP_FRAME equivalent for Linux ?)
I asked a question on SO, but nobody answered :/
Linux context switch internals: how does a process goes back to userland after the switch?
If you can help with this, I can guide you through the implementation :)
Regarding the hypervisor support, only Xen is fully supported in the Libvmi interface at the moment.
I added a section in the README to describe where we are in terms of VMI APIs with other hypervisors.
Thanks !
I understand how powerful windbg can be at debugging, but when is an appropriate or best time to use it to debug an issue? Is it more issue specific, or just experience, intuition, and knowing that using it can just get the job done best?
It's a little bit of all those things, and a lot of personal preference. Many WinDbg people only use WinDbg so that's what they are best at debugging with.
WinDbg also has some good extensions out there like SOS. So a particular extension might provide you with the specific piece of information that another debugger does not.
One reason to use a different debugger in certain circumstances is if you believe the debugger is incorrect. This is rare of course. For things like stack walking for instance, the debuggers use different methods, so you can confirm the stack is what you expect by using the other.
To sum up, for most issues it doesn't matter. It's whatever you are best at using. For some particular issues it's what you say, knowing which tool is the best for that particular issue.
While Windbg is also a fine tool for user-mode debugging, if you end up doing kernel-mode debugging it is really the only serious choice.
The kernel-mode debugging scenario often involves two machines, a debugger and a debuggee. You will be running Windbg on a debugger machine which is connected to the debuggee over a serial connection, Firewire or USB. In this scenario you can "freeze" the target machine and have full control over everything running on it. Often your debuggee (the target) will be a virtual machine running under VMWare or similar -- in this case the connection also typically uses virtual serial ports.
Here are instructions from VMWare on how to set up kernel debugging of a virtual machine:
http://www.vmware.com/support/ws5/doc/ws_devices_serial_advanced_example_debugging.html
You can also use VirtualKD which makes the setup easier and the connection much faster:
http://virtualkd.sysprogs.org/
You can also use Windbg for local kernel debugging. In this case, you only have a single machine where you connect Windbg to the running kernel. You cannot "freeze" the machine, as it would also freeze Windbg running on the same machine, but you can analyze the contents of memory and so on.
Good point. Another good solution for virtual kernel debugging is LiveKd from sysinternals.
http://technet.microsoft.com/en-us/sysinternals/bb897415.aspx
I'm just cross posting the same question I did on virtualbox.org. http://forums.virtualbox.org/viewtopic.php?f=9&t=26702&p=119139#p119139
If this doesn't break any rule, I'd appreciate learning more about it, since Stack Overflow promises to be more dynamic!
"Hi,
I did some searching and could not find any tool to debug a guest system from early boot in VirtualBox. However, I came across JCP, an x86 emulator in Java that is not so powerful or beautiful but has a debug mode where one can view the physical memory and the CPU registers, among other things. It also makes it possible to execute CPU instructions step by step and to set breakpoints, watchpoints and conditional ones. Is there such a thing in VirtualBox?
I think it would be amazing to have it and be able to inspect the system while it's running, either to learn about PC architecture or as a tool to develop a kernel.
In case you think it's a good idea (I think it is), how can it be achieved? I'm interested in developing this sort of thing and would like to know if it is feasible, if not already implemented somewhere."
EDIT: Are modern x86 CPUs able to interrupt execution after each instruction and pass control to other code that does just this? Yes, the trap flag can be set to put the processor in single-step mode. The x86 will then execute one instruction and raise a debug exception (INT 1).
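For reference, the trap flag (TF) is bit 8 of the EFLAGS register, so a debugger or emulator toggles single-stepping by setting or clearing mask 0x100 in the saved flags. Here is a minimal sketch of just that bit manipulation. The helper names are mine, and the code only works on plain integers; it does not touch a real CPU.

```python
TRAP_FLAG = 1 << 8  # TF is bit 8 of EFLAGS (mask 0x100)


def enable_single_step(eflags):
    """Return the flags value with the trap flag set."""
    return eflags | TRAP_FLAG


def disable_single_step(eflags):
    """Return the flags value with the trap flag cleared."""
    return eflags & ~TRAP_FLAG


def is_single_stepping(eflags):
    """True if the trap flag is set in the given flags value."""
    return bool(eflags & TRAP_FLAG)


flags = 0x202  # example value: interrupt flag set, trap flag clear
print(hex(enable_single_step(flags)))
```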
Contrary to what is stated above, VirtualBox now contains a (limited) debugger. Add --dbg to the command line when starting the VM. For more information consult:
12.1.3. The built-in VM debugger
The OSDev wiki has some useful information on debugging a guest operating system, though according to this page VirtualBox doesn't have a debugger at present. I've been using QEMU with the GDB stub and it works quite nicely, so you might like to give that a go instead.
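A rough sketch of what the QEMU + GDB stub setup looks like, written as two helpers that only build the command lines. QEMU's -S option keeps the CPU stopped at startup, and -gdb tcp::PORT opens the stub (-s is shorthand for port 1234). On the GDB side you connect with "target remote". The disk image name, the port and the choice of qemu-system-i386 are placeholders for illustration.

```python
def qemu_command(disk_image, gdb_port=1234, binary="qemu-system-i386"):
    """QEMU invocation that waits for a debugger before running the guest."""
    return [
        binary,
        "-S",                                 # do not start the CPU at startup
        "-gdb", "tcp::{}".format(gdb_port),   # listen for GDB on this TCP port
        "-hda", disk_image,
    ]


def gdb_command(gdb_port=1234):
    """GDB invocation that attaches to the stub opened by qemu_command()."""
    return ["gdb", "-ex", "target remote localhost:{}".format(gdb_port)]


# Hypothetical image name, for illustration only.
print(" ".join(qemu_command("disk.img")))
print(" ".join(gdb_command()))
```

Because the guest is frozen until GDB tells it to continue, this lets you set breakpoints before the very first instruction of your boot code runs.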
I have a big system that makes my machine crash hard. When I boot up again, I don't even have a coredump. If I log every line that gets executed until my system goes down, I will find that evil code.
Can I log every source code line in GDB to a file?
UPDATE:
OK, I found the bug. It was nasty. The application I started did not take the system down. After learning about coredump inspection with mdb, and some gdb stepping, I found out that the system call causing the dump was not implemented. Updating the system to the latest kernel will fix my problem. Thanks to all of you.
MY LESSON:
make sure you know what process causes the coredump. It's not always the one you started.
Sounds like a tricky little problem.
I often try to eliminate as many possible suspects as I can by commenting out large chunks of code, configuring the system to not run certain pieces (if it allows you to do that) etc. This amounts to doing an ad-hoc binary search on the problem, and is a surprisingly effective way of zooming in on offending code relatively quickly.
A potential problem with logging is that the log might not hit the disk before the system locks up - if you don't get a core dump, you might not get the log.
Speaking of core dumps, make sure you don't have a limit on your core dump size (man ulimit.)
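If you would rather check this from a script than from the shell, the same limit is visible through Python's resource module (Unix only). The sketch below reads RLIMIT_CORE and raises the soft limit up to the hard limit, which needs no special privileges. Like ulimit, it only affects the current process and its children. The helper names are mine.

```python
import resource


def core_limit():
    """Return the (soft, hard) core dump size limits of this process."""
    return resource.getrlimit(resource.RLIMIT_CORE)


def raise_core_limit():
    """Raise the soft core dump limit to the hard limit and return both."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def describe(limit):
    """Human-readable form of a single limit value."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return "{} bytes".format(limit)


soft, hard = core_limit()
print("soft:", describe(soft), "hard:", describe(hard))
```

A soft limit of 0 bytes means no core file will be written at all, which is a common default.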
You could try to obtain a list of all the functions in your code using objdump, process it a little bit and create a bunch of GDB trace statements on those functions - basically creating a GDB script automatically. If that turns out to be overkill, then a binary search on the code using tracepoints can also help you zoom in on the problem.
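A minimal sketch of that idea, with some assumptions: it takes the text printed by "objdump -t" on your binary, keeps the symbols flagged as functions in .text, and emits a GDB command file. Note that I use plain breakpoints with attached commands (print the function name, then continue) rather than real tracepoints, since those work against an ordinary local process. The log file name and the sample symbol table are made up, and the parsing expects the usual "address flags section size name" layout.

```python
def function_names(objdump_symtab):
    """Extract function symbols in .text from 'objdump -t' output."""
    names = []
    for line in objdump_symtab.splitlines():
        fields = line.split()
        if len(fields) < 5 or ".text" not in fields:
            continue
        sec = fields.index(".text")
        flags = fields[1:sec]
        # Function symbols carry an 'F' flag; expect exactly size and name after the section.
        if any("F" in flag for flag in flags) and len(fields) == sec + 3:
            names.append(fields[-1])
    return names


def gdb_trace_script(names, log_file="trace.log"):
    """Build a GDB command file that logs every entry into the given functions."""
    lines = [
        "set pagination off",
        "set logging file " + log_file,
        "set logging on",  # newer GDB versions prefer "set logging enabled on"
    ]
    for name in names:
        lines += [
            "break " + name,
            "commands",
            "silent",
            "echo enter " + name + "\\n",
            "continue",
            "end",
        ]
    return "\n".join(lines) + "\n"


# Made-up symbol table excerpt, for illustration only.
sample = (
    "0000000000001139 g     F .text\t0000000000000017              main\n"
    "0000000000004010 g     O .data\t0000000000000004              counter\n"
)
print(gdb_trace_script(function_names(sample)))
```

You would save the result to a file and start gdb with -x on it. Expect it to be slow if you break on every function, so narrowing the list down is part of the binary search.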
And don't panic. You're smarter than the bug - you'll find it.
You can not reasonably track every line of your source using GDB (too slow). Besides, a system crash is most likely a result of a system call, and libc is probably doing the system call on your behalf. Even if you find the line of the application that caused OS crash, you still don't really know anything.
You should start by clarifying which OS is crashing. For Linux, you can try the following approaches:
strace -fo trace.out /path/to/app
After reboot, trace.out will contain syscalls the application was doing just before the crash. If you are lucky, you'll see the last syscall-of-death, but I wouldn't count on it.
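To pull the last recorded syscall out of such a file, something like the sketch below should do. It assumes the format strace writes with -f and -o, where every line starts with the PID, and it also accepts "<... name resumed>" lines and a final line that was cut off by the crash. The function name and the sample lines are mine.

```python
import re

# With "strace -f -o", each line starts with the PID of the traced process.
_CALL = re.compile(r"^(\d+)\s+(\w+)\(")
_RESUMED = re.compile(r"^(\d+)\s+<\.\.\.\s+(\w+)\s+resumed>")


def last_syscall(trace_text):
    """Return (pid, syscall name) for the last syscall line, or None."""
    for line in reversed(trace_text.splitlines()):
        match = _CALL.match(line) or _RESUMED.match(line)
        if match:
            return int(match.group(1)), match.group(2)
    return None


# Made-up trace whose last line is truncated, as it might be after a crash.
sample = (
    '1234  openat(AT_FDCWD, "/dev/foo", O_RDWR) = 3\n'
    "1234  ioctl(3, 0x1234"
)
print(last_syscall(sample))
```

Keep in mind that the last lines may never have reached the disk, so what you find is the last syscall that was logged, not necessarily the one that killed the system.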
Alternatively, try to reproduce the crash on the user-mode Linux, or on kernel with KGDB compiled in.
These will tell you where the problem in the kernel is. Finding the matching system call in your application will likely be trivial.
Please clarify your problem: What part of the system is crashing?
Is it an application?
If so, which application? Is this an application which you have written yourself? Is this an application you have obtained from elsewhere? Can you obtain a clean interrupt if you use a debugger? Can you obtain a backtrace showing which functions are calling the section of code which crashes?
Is it a new hardware driver?
Is it based on an older driver? If so, what has changed? Is it based on a manufacturer's data sheet? Is that data sheet the latest and most correct?
Is it somewhere in the kernel? Which kernel?
What is the OS? I assume it is linux, seeing that you are using the GNU debugger. But of course, that is not necessarily so.
You say you have no coredump. Have you enabled coredumps on your machine? Most systems these days do not have coredumps enabled by default.
Regarding logging GDB output, you may have some success, but it depends where the problem is whether or not you will have the right output logged before the system crashes. There is plenty of delay in writing to disk. You may not catch it in time.
I'm not familiar with the gdb way of doing this, but with windbg the way to go is to have a debugger attached to the kernel and control the debugger remotely over a serial cable (or firewire) from a second debugger. I'm pretty sure gdb has similar capabilities, I could quickly find some hints here: http://www.digipedia.pl/man/gdb.4.html