I am trying to debug a driver for an Ethernet-MAC in the Linux kernel using kgdb over serial.
I halt the execution by making a call to "kgdb_breakpoint()" at the desired location in the code and recompile the kernel.
But after the code halts, as you may notice in the screenshot the backtrace shows correct function-graph and source filenames but, for some reason corresponding line numbers are not correct.
Please note: I have compiled this kernel with "CONFIG_FRAME_POINTER" set and "nokaslr" in the boot-args.
Is there a way I can see a stack trace with correct line number here ?
(I have used QtCreator during this screenshot, although behavior is similar with gdb over command-line or TUI)
Edit: No matter, whichever function I put the `kgdb_breakpoint()`
in, (inside the driver source) , line number in the stacktrace always say the same line number for the halted function.
Related
Consider the following Linux kernel dump stack trace; e.g., you can trigger a panic from the kernel source code by calling panic("debugging a Linux kernel panic");:
[<001360ac>] (unwind_backtrace+0x0/0xf8) from [<00147b7c>] (warn_slowpath_common+0x50/0x60)
[<00147b7c>] (warn_slowpath_common+0x50/0x60) from [<00147c40>] (warn_slowpath_null+0x1c/0x24)
[<00147c40>] (warn_slowpath_null+0x1c/0x24) from [<0014de44>] (local_bh_enable_ip+0xa0/0xac)
[<0014de44>] (local_bh_enable_ip+0xa0/0xac) from [<0019594c>] (bdi_register+0xec/0x150)
In unwind_backtrace+0x0/0xf8 what does +0x0/0xf8 stand for?
How can I see the C code of unwind_backtrace+0x0/0xf8?
How to interpret the panic's content?
It's just an ordinary backtrace, those functions are called in reverse order (first one called was called by the previous one and so on):
unwind_backtrace+0x0/0xf8
warn_slowpath_common+0x50/0x60
warn_slowpath_null+0x1c/0x24
ocal_bh_enable_ip+0xa0/0xac
bdi_register+0xec/0x150
The bdi_register+0xec/0x150 is the symbol + the offset/length there's more information about that in Understanding a Kernel Oops and how you can debug a kernel oops. Also there's this excellent tutorial on Debugging the Kernel
Note: as suggested below by Eugene, you may want to try addr2line first, it still needs an image with debugging symbols though, for example
addr2line -e vmlinux_with_debug_info 0019594c(+offset)
Here are two alternatives for addr2line. Assuming you have the proper target's toolchain, you can do one of the following:
Use objdump:
locate your vmlinux or the .ko file under the kernel root directory, then disassemble the object file :
objdump -dS vmlinux > /tmp/kernel.s
Open the generated assembly file, /tmp/kernel.s. with a text editor such as vim. Go to
unwind_backtrace+0x0/0xf8, i.e. search for the address of unwind_backtrace + the offset. Finally, you have located the problematic part in your source code.
Use gdb:
IMO, an even more elegant option is to use the one and only gdb. Assuming you have the suitable toolchain on your host machine:
Run gdb <path-to-vmlinux>.
Execute in gdb's prompt: list *(unwind_backtrace+0x10).
For additional information, you may checkout the following resources:
Kernel Debugging Tricks.
Debugging The Linux Kernel Using Gdb
In unwind_backtrace+0x0/0xf8 what the +0x0/0xf8 stands for?
The first number (+0x0) is the offset from the beginning of the function (unwind_backtrace in this case). The second number (0xf8) is the total length of the function. Given these two pieces of information, if you already have a hunch about where the fault occurred this might be enough to confirm your suspicion (you can tell (roughly) how far along in the function you were).
To get the exact source line of the corresponding instruction (generally better than hunches), use addr2line or the other methods in other answers.
When I was debugging my .c file using lldb on terminal for Mac, I some how cannot find the location of the segmentation fault. I have debugged the code numerous of times and it is still producing the same error. Can someone help me on why I can find the location of segmentation fault. enter image description here
Use the bt command in lldb to see the call stack. You've called a libc function like scanf() and are most likely passing an invalid argument to it. When you see the call stack, you will see a stack frame with your own code on it, say it is frame #3. You can select that frame with f 3, and you can look at variables with the v command to understand what arguments were passed to the libc function that led to a crash.
Without knowing what your code is doing, I would suggest using a tool like valgrind instead of just a normal debugger. It's designed to look for memory issues for lower-level languages like C/C++/FORTRAN. For example, it will tell you if you're trying to use an index that is too large for an array.
From the quick start guide, try valgrind --leak-check=yes myprog arg1 arg2
I recently set up my system for kernel debug using qemu+gdb. At present, I can set breakpoints at, for example, __do_page_fault() and trace the call via gdb (with win command). Now I want the following task: A simple C program having a "hello world" printfstatement. Trace the call sequence starting from the userspace down to the write() system call ( or anything in the kernel space that is invoked during the execution of that particular userspace program). I want to learn how userspace program traps into system call w.r.t Linux kernel specifically.
Now my doubt is where to set the breakpoint? We have kernel code as well as the C code of the program. How to go about this situation ? Please give us an explanation with example.
Thank You !
The most easiest way in my opinion is to separate this into two pieces.
Place breakpoint in guest kernel using host gdb.
Place breakpoint in user code before trap instruction, using in-guest target gdb, when hit - print stack using target (in-qemu) gdb. You will get user space stack trace.
Continue execution in guest gdb
In-kernel breakpoint (we have set it at stage 1) will be hit in host gdb. Print kernel stack trace.
P.S.
If your kernel will continuously hit breakpoint (f.e. write syscall is definitely used widely), you can use a conditional breakpoint to hit a breakpoint only with a certain parameters passed.
My program is panicking so I followed its advice to run RUST_BACKTRACE=1 and I get this (just a little snippet).
1: 0x800c05b5 - std::sys::imp::backtrace::tracing::imp::write::hf33ae72d0baa11ed
at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:42
2: 0x800c22ed - std::panicking::default_hook::{{closure}}::h59672b733cc6a455
at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/panicking.rs:351
If the program panics it stops the whole program, so where can I figure out at which line it's panicking on?
Is this line telling me there is a problem at line 42 and line 351?
The whole backtrace is on this image, I felt it would be to messy to copy and paste it here.
I've never heard of a stack trace or a back trace. I'm compiling with warnings, but I don't know what debugging symbols are.
What is a stack trace?
If your program panics, you encountered a bug and would like to fix it; a stack trace wants to help you here. When the panic happens, you would like to know the cause of the panic (the function in which the panic was triggered). But the function directly triggering the panic is usually not enough to really see what's going on. Therefore we also print the function that called the previous function... and so on. We trace back all function calls leading to the panic up to main() which is (pretty much) the first function being called.
What are debug symbols?
When the compiler generates the machine code, it pretty much only needs to emit instructions for the CPU. The problem is that it's virtually impossible to quickly see from which Rust-function a set of instructions came. Therefore the compiler can insert additional information into the executable that is ignored by the CPU, but is used by debugging tools.
One important part are file locations: the compiler annotates which instruction came from which file at which line. This also means that we can later see where a specific function is defined. If we don't have debug symbols, we can't.
In your stack trace you can see a few file locations:
1: 0x800c05b5 - std::sys::imp::backtrace::tracing::imp::write::hf33ae72d0baa11ed
at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:42
The Rust standard library is shipped with debug symbols. As such, we can see where the function is defined (gcc_s.rs line 42).
If you compile in debug mode (rustc or cargo build), debug symbols are activated by default. If you, however, compile in release mode (rustc -O or cargo build --release), debug symbols are disabled by default as they increase the executable size and... usually aren't important for the end user. You can tweak whether or not you want debug symbols in your Cargo.toml in a specific profile section with the debug key.
What are all these strange functions?!
When you first look at a stack trace you might be confused by all the strange function names you're seeing. Don't worry, this is normal! You are interested in what part of your code triggered the panic, but the stack trace shows all functions somehow involved. In your example, you can ignore the first 9 entries: those are just functions handling the panic and generating the exact message you are seeing.
Entry 10 is still not your code, but might be interesting as well: the panic was triggered in the index() function of Vec<T> which is called when you use the [] operator. And finally, entry 11 shows a function you defined. But you might have noticed that this entry is missing a file location... the above section describes how to fix that.
What do to with a stack trace? (tl;dr)
Activate debug symbols if you haven't already (e.g. just compile in debug mode).
Ignore any functions from std and core at the top of the stack trace.
Look at the first function you defined, find the corresponding location in your file and fix the bug.
If you haven't already, change all camelCase function and method names to snake_case to stick to the community wide style guide.
list commands prints a set of lines, but I need one single line, where I am and where an error has probably occurred.
The 'frame' command will give you what you are looking for. (This can be abbreviated just 'f'). Here is an example:
(gdb) frame
\#0 zmq::xsub_t::xrecv (this=0x617180, msg_=0x7ffff00008e0) at xsub.cpp:139
139 int rc = fq.recv (msg_);
(gdb)
Without an argument, 'frame' just tells you where you are at (with an argument it changes the frame). More information on the frame command can be found here.
Command where or frame can be used. where command will give more info with the function name
I do get the same information while debugging. Though not while I am checking the stacktrace. Most probably you would have used the optimization flag I think. Check this link - something related.
Try compiling with -g3 remove any optimization flag.
Then it might work.
HTH!
Keep in mind that gdb is a powerful command -capable of low level instructions- so is tied to assembly concepts.
What you are looking for is called de instruction pointer, i.e:
The instruction pointer register points to the memory address which the processor will next attempt to execute. The instruction pointer is called ip in 16-bit mode, eip in 32-bit mode,and rip in 64-bit mode.
more detail here
all registers available on gdb execution can be shown with:
(gdb) info registers
with it you can find which mode your program is running (looking which of these registers exist)
then (here using most common register rip nowadays, replace with eip or very rarely ip if needed):
(gdb)info line *$rip
will show you line number and file source
(gdb) list *$rip
will show you that line with a few before and after
but probably
(gdb) frame
should be enough in many cases.
All the answers above are correct, What I prefer is to use tui mode (ctrl+X A or 'tui enable') which shows your location and the function in a separate window which is very helpful for the users.
Hope that helps too.