Force GDB to use frame-pointer based unwinding - debugging

I have a process where one .o file is built without any .eh_frame or .debug_frame section (via an assembler) but with other types of debug info such as .debug_info. Apparently this causes GDB to stop using frame-pointer (rbp) based unwinding for any functions from that object, and it produces invalid backtraces (it isn't clear how it is trying to unwind the stack at all).
Now, the functions in this binary set up the stack frame properly (i.e., rbp points correctly to the base of the frame), and if GDB were just to use that to unwind, everything would be great. Is there some way I can tell it to ignore the DWARF2 info and use frame-pointer based unwinding?

if gcc were just to use that to unwind, everything would be great.
You mean GDB.
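I use the following routine in my ~/.gdbinit to walk the $rbp frame chain:
define xbt
  set $xbp = (void **)$arg0
  while $xbp != 0
    x/2a $xbp
    set $xbp = (void **)$xbp[0]
  end
end
The loop stops at a null saved $rbp, which normally marks the outermost frame; on a corrupted chain it still ends as soon as x fails to read memory.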
Call it with the initial base pointer address you want to start from, e.g., xbt $rbp to use the current base pointer.
This isn't as good as allowing GDB to do it (no access to parameters or locals), but it does get at least the call trace.
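The xbt loop relies on a fixed layout: with the usual push rbp; mov rbp, rsp prologue, $rbp points at the caller's saved $rbp and the return address is the next word up. The C sketch below models that chain with a hypothetical frame_record struct (not a real GDB or ABI type) so the walk can be followed without a live process. It assumes the chain ends with a null saved rbp, as the x86-64 System V ABI recommends for the deepest frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an x86-64 frame record as laid out by
   "push rbp; mov rbp, rsp": rbp points at the saved caller rbp,
   and the return address sits one word above it. This is the pair
   that "x/2a $xbp" prints on each iteration. */
typedef struct frame_record {
    const struct frame_record *saved_rbp; /* caller's frame record; NULL ends the chain */
    uintptr_t return_address;             /* address the caller resumes at */
} frame_record;

/* Follow the saved-rbp chain starting at `rbp`, recording up to `max`
   return addresses in `out`. Returns how many were recorded. */
size_t walk_frame_chain(const frame_record *rbp, uintptr_t *out, size_t max)
{
    assert(out != NULL || max == 0);
    size_t n = 0;
    while (rbp != NULL && n < max) {
        out[n++] = rbp->return_address;
        rbp = rbp->saved_rbp; /* same step as: set $xbp = (void **)$xbp[0] */
    }
    return n;
}
```

Linking three records innermost to outermost and calling walk_frame_chain on the innermost one yields their return addresses in call-stack order, which is the same sequence xbt prints.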
To make GDB ignore existing DWARF unwind info, you'll have to patch that support out and build your own GDB.
P.S. Using --strip-dwo will not help.
Update:
Why isn't stripping feasible?
Well, --strip-dwo only strips .dwo sections, and that's not where unwind info is (it's in .eh_frame and .debug_frame sections).
That said, you should try stripping .debug_frame with strip -g bad.o: if your file only has a bad .debug_frame but a correct (or missing) .eh_frame, then removing .debug_frame should work.
strip doesn't remove .eh_frame because that info is usually required for unwinding.
If .eh_frame is also bad, you may be able to remove it with objcopy (e.g. objcopy --remove-section=.eh_frame bad.o).
Some more info on unwinding here.

I've found a very simple hack that was good enough for my purposes.
In my case there was a single function that didn't work with the up command.
Here are the steps:
set $rip = *((void**)$rbp+ 1)
set $rbp = *((void**)$rbp)
The first line manually patches the instruction pointer. This seems similar to calling up in GDB, but function arguments are still broken. The second line sets rbp to its value from the caller; this fixes the arguments for me.
It's probably OK to run these commands multiple times to go up multiple functions. In my case, after a single iteration of these commands, up and frame started to work. You might also need to set rsp.
Warning: there is no easy way to go back (down).
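The two set commands amount to one manual "up" step. Here is a minimal C sketch of that step, assuming the current function used the standard push rbp; mov rbp, rsp prologue and has already finished it. The regs struct, the read_word helper and the simulated stack buffer are hypothetical stand-ins for GDB's registers and target memory. It also shows the rsp adjustment hinted at above: once the return address is popped, the caller's rsp is the old rbp plus 16.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register snapshot: only what the frame-pointer step touches. */
typedef struct regs {
    uint64_t rip, rsp, rbp;
} regs;

/* Read one 8-byte word from a simulated stack that models memory starting
   at address `base`; stands in for GDB's *(void **)addr. */
static uint64_t read_word(const unsigned char *stack, uint64_t base, uint64_t addr)
{
    assert(addr >= base);
    uint64_t v;
    memcpy(&v, stack + (addr - base), sizeof v);
    return v;
}

/* One manual "up" step. Order matters: rip must be read through the
   old rbp before rbp itself is overwritten. */
void step_up(regs *r, const unsigned char *stack, uint64_t base)
{
    uint64_t old_rbp = r->rbp;
    r->rip = read_word(stack, base, old_rbp + 8); /* set $rip = *((void**)$rbp + 1) */
    r->rbp = read_word(stack, base, old_rbp);     /* set $rbp = *((void**)$rbp) */
    r->rsp = old_rbp + 16; /* caller's rsp after the return address is popped */
}
```

In GDB terms the last line would correspond to something like set $rsp = $rbp + 16, evaluated before $rbp is changed.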

How do symbols solve walking the stack with FPO in x86 debugging?

In this answer: https://stackoverflow.com/a/8646611/192359, it is explained that when debugging x86 code, symbols allow the debugger to display the call stack even when FPO (Frame Pointer Omission) is used.
The given explanation is:
On the x86 PDBs contain FPO information, which allows the debugger to reliably unwind a call stack.
My question is: what is this information? As far as I understand, just knowing whether a function uses FPO or not does not help you find the original value of the stack pointer, since that depends on runtime information.
What am I missing here?
Fundamentally, it is always possible to walk the stack with enough information [1], except in cases where the stack or execution context has been irrecoverably corrupted.
For example, even if rbp isn't used as the frame pointer, the return address is still on the stack somewhere, and you just need to know where. For a function that doesn't modify rsp (directly or indirectly) in its body, it would be at a simple fixed offset from rsp. For functions that do modify rsp in their body (i.e., that have a variable stack size), the offset from rsp might depend on the exact location in the function.
The PDB file simply contains this "side band" information, which allows someone to determine the return address for any instruction in the function. Hans linked a relevant in-memory structure above: since it knows the size of the local variables and so on, it can calculate the offset between rsp and the base of the frame, and hence get at the return address. It also knows how many instruction bytes are part of the prolog, which is important because if the IP is still in that region, different rules apply (i.e., the stack hasn't yet been adjusted to reflect the locals in this function).
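To make the "side band" idea concrete, here is a simplified C sketch of a table-driven lookup for 32-bit code without a frame pointer. The record is loosely modeled on fields of the Windows FPO_DATA structure (ulOffStart, cbProcSize, cdwLocals, cbRegs, cbProlog) but is not its real layout, and the sketch only handles functions whose body leaves esp unchanged. Inside the prolog it simply gives up, since that is exactly where different rules apply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified per-function record, loosely modeled on FPO_DATA fields.
   Not the real on-disk layout. */
typedef struct fpo_record {
    uint32_t ulOffStart;  /* offset of the function's first byte */
    uint32_t cbProcSize;  /* size of the function in bytes */
    uint32_t cdwLocals;   /* size of locals, in 4-byte units */
    uint32_t cbRegs;      /* number of callee-saved registers pushed */
    uint32_t cbProlog;    /* size of the prolog in bytes */
} fpo_record;

/* Find the record whose code range contains `eip`, or NULL.
   Unsigned wraparound makes eip < ulOffStart fail the range test. */
const fpo_record *find_fpo(const fpo_record *tab, size_t n, uint32_t eip)
{
    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (eip - tab[i].ulOffStart < tab[i].cbProcSize)
            return &tab[i];
    return NULL;
}

/* Address of the stack slot holding the return address, for a function
   past its prolog whose body does not move esp: locals and saved
   registers sit between esp and the return address. Returns 0 when eip
   is unknown or still inside the prolog (not handled by this sketch). */
uint32_t return_slot(const fpo_record *tab, size_t n, uint32_t eip, uint32_t esp)
{
    const fpo_record *f = find_fpo(tab, n, eip);
    if (f == NULL || eip - f->ulOffStart < f->cbProlog)
        return 0;
    return esp + 4u * f->cdwLocals + 4u * f->cbRegs;
}
```

For example, a function with 4 dwords of locals and 2 saved registers, stopped at esp = 0x8000 after its prolog, has its return address at 0x8000 + 16 + 8 = 0x8018. Reading that slot gives the caller's eip, and the caller's esp is the slot address plus 4 (plus any stdcall argument bytes), so the walk can continue.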
In 64-bit Windows, the exact function call ABI has been made a bit more concrete, and all functions generally have to provide unwind information: not in a .pdb but directly in a section included in the binary. So even without .pdb files you should be able to unwind a properly structured 64-bit Windows program. It allows any register to be used as the frame pointer, and still allows frame-pointer omission (with some restrictions). For details, start here.
[1] If this weren't true, ask yourself how the currently running function could ever return. Technically, you could design a program that clobbers or forgets the stack in a way that it cannot return, and either never exits or uses a method like exit() or abort() to terminate. This is highly unusual and not possible outside of assembly.

What is RUST_BACKTRACE supposed to tell me?

My program is panicking so I followed its advice to run RUST_BACKTRACE=1 and I get this (just a little snippet).
1: 0x800c05b5 - std::sys::imp::backtrace::tracing::imp::write::hf33ae72d0baa11ed
at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:42
2: 0x800c22ed - std::panicking::default_hook::{{closure}}::h59672b733cc6a455
at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/panicking.rs:351
If the program panics, it stops the whole program, so how can I figure out which line it's panicking on?
Is this line telling me there is a problem at line 42 and line 351?
The whole backtrace is in this image; I felt it would be too messy to copy and paste it here.
I've never heard of a stack trace or a back trace. I'm compiling with warnings, but I don't know what debugging symbols are.
What is a stack trace?
If your program panics, you encountered a bug and would like to fix it; a stack trace is there to help you. When the panic happens, you would like to know its cause (the function in which the panic was triggered). But the function directly triggering the panic is usually not enough to really see what's going on. Therefore we also print the function that called it, then the function that called that one, and so on. We trace back all function calls leading to the panic up to main(), which is (pretty much) the first function being called.
What are debug symbols?
When the compiler generates the machine code, it pretty much only needs to emit instructions for the CPU. The problem is that it's virtually impossible to quickly see which Rust function a set of instructions came from. Therefore the compiler can insert additional information into the executable that is ignored by the CPU but used by debugging tools.
One important part are file locations: the compiler annotates which instruction came from which file at which line. This also means that we can later see where a specific function is defined. If we don't have debug symbols, we can't.
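A rough C sketch of how a tool can turn an instruction address into a file:line pair once the compiler has emitted such annotations. The line_row table and lookup_line function are hypothetical simplifications: real formats such as DWARF's .debug_line are encoded much more compactly, and real tables also mark where each address range ends.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical line table row: code starting at `address` came from
   `file`:`line`. Rows are sorted by address. */
typedef struct line_row {
    uintptr_t address;
    const char *file;
    unsigned line;
} line_row;

/* Return the row covering `pc` (the last row with address <= pc),
   or NULL if pc lies before the first row. Binary search. */
const line_row *lookup_line(const line_row *rows, size_t n, uintptr_t pc)
{
    assert(rows != NULL || n == 0);
    size_t lo = 0, hi = n; /* search for the first row with address > pc */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (rows[mid].address <= pc)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo == 0 ? NULL : &rows[lo - 1];
}
```

With rows starting at 0x1000, 0x1008 and 0x1020, a pc of 0x100c resolves to the 0x1008 row, because each row covers addresses up to the next one. Without debug symbols there is no such table, which is why a stack trace entry can show a function name but no file location.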
In your stack trace you can see a few file locations:
1: 0x800c05b5 - std::sys::imp::backtrace::tracing::imp::write::hf33ae72d0baa11ed
at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:42
The Rust standard library is shipped with debug symbols. As such, we can see where the function is defined (gcc_s.rs line 42).
If you compile in debug mode with cargo build, debug symbols are enabled by default (with plain rustc you need to pass -g). If, however, you compile in release mode (rustc -O or cargo build --release), debug symbols are disabled by default, as they increase the executable size and usually aren't important for the end user. You can control whether you want debug symbols in a specific profile section of your Cargo.toml with the debug key (e.g. debug = true under [profile.release]).
What are all these strange functions?!
When you first look at a stack trace you might be confused by all the strange function names you're seeing. Don't worry, this is normal! You are interested in what part of your code triggered the panic, but the stack trace shows all functions somehow involved. In your example, you can ignore the first 9 entries: those are just functions handling the panic and generating the exact message you are seeing.
Entry 10 is still not your code, but might be interesting as well: the panic was triggered in the index() function of Vec<T> which is called when you use the [] operator. And finally, entry 11 shows a function you defined. But you might have noticed that this entry is missing a file location... the above section describes how to fix that.
What to do with a stack trace? (tl;dr)
Activate debug symbols if you haven't already (e.g. just compile in debug mode).
Ignore any functions from std and core at the top of the stack trace.
Look at the first function you defined, find the corresponding location in your file and fix the bug.
If you haven't already, change all camelCase function and method names to snake_case to stick to the community wide style guide.

GDB doesn't disassemble program running in RAM correctly

I have an application compiled using GCC for an STM32F407 ARM processor. The linker stores it in Flash, but is executed in RAM. A small bootstrap program copies the application from Flash to RAM and then branches to the application's ResetHandler.
memcpy(appRamStart, appFlashStart, appRamSize);
// run the application
__asm volatile (
"ldr r1, =_app_ram_start\n\t" // load a pointer to the application's vectors
"add r1, #4\n\t" // increment vector pointer to the second entry (ResetHandler pointer)
"ldr r2, [r1, #0x0]\n\t" // load the ResetHandler address via the vector pointer
// bit[0] must be 1 for THUMB instructions otherwise a bus error will occur.
"bx r2" // jump to the ResetHandler - does not return from here
);
This all works ok, except when I try to debug the application from RAM (using GDB from Eclipse) the disassembly is incorrect. The curious thing is the debugger gets the source code correct, and will accept and halt on breakpoints that I have set. I can single step the source code lines. However, when I single step the assembly instructions, they make no sense at all. It also contains numerous undefined instructions. I'm assuming it is some kind of alignment problem, but it all looks correct to me. Any suggestions?
It is possible that GDB relies on the symbol table to determine the instruction set mode, which can be ARM or Thumb(-2). When you move code to RAM, it probably can't find this information and falls back to ARM mode.
You can use set arm force-mode thumb in GDB to force it to decode instructions as Thumb.
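The bootstrap code in the question already relies on the rule that decides the mode at run time: for an interworking branch such as BX, bit 0 of the target address selects Thumb (1) or ARM (0), and the instruction itself starts at the address with bit 0 cleared. A small C sketch of that rule applied to a vector table follows; the function and enum names are hypothetical. On a Cortex-M4 such as the STM32F407 the core only executes Thumb code, so a clear bit 0 in the reset vector would indicate a bug, which is what the comment in the question warns about.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Instruction set implied by a BX/BLX branch target. */
typedef enum { ISA_ARM, ISA_THUMB } isa;

/* Bit 0 of an interworking branch target selects Thumb (1) or ARM (0). */
isa target_isa(uint32_t target)
{
    return (target & 1u) ? ISA_THUMB : ISA_ARM;
}

/* The first instruction lives at the target with bit 0 cleared. */
uint32_t target_address(uint32_t target)
{
    return target & ~1u;
}

/* Read the reset handler from a vector table (entry 1, after the initial
   stack pointer) and report where, and in which mode, decoding should start. */
isa reset_handler_isa(const uint32_t *vectors, uint32_t *code_addr)
{
    assert(vectors != NULL && code_addr != NULL);
    uint32_t reset = vectors[1];
    *code_addr = target_address(reset);
    return target_isa(reset);
}
```

A disassembler that only sees raw bytes at a RAM address has no such pointer to inspect, which is one reason forcing the mode by hand can be necessary.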
As a side note, if you see illegal instructions while debugging an ARM binary, a wrong instruction set mode is generally the cause, unless the output is complete nonsense because you are disassembling data rather than code.
I personally find it strange that tools don't try a heuristic approach when disassembling ARM binaries. In auto mode, it shouldn't be hard to try both modes and, as a last resort, count decoding errors to decide which mode to use.

Why does GCC use frame pointer when I call Win32 functions with arguments?

When I compile 32-bit C code with GCC and the -fomit-frame-pointer option, the frame pointer (ebp) is not used unless my function calls Windows API functions with stdcall and at least one parameter.
For example, if I only use GetCommandLine() from the Windows API, which has no parameters/arguments, GCC will omit the frame pointer and use ebp for other things, speeding up the code and not having that useless prologue.
But the moment I call a stdcall Win32 function that accepts at least one argument, GCC completely ignores -fomit-frame-pointer and uses the frame pointer anyway, and the code looks worse on inspection, since it can't use ebp for general-purpose things. Not to mention I find the frame pointer quite pointless. I mean, I want to compile for release and distribution, so why should I care about debugging? (If I want to debug, I'll just use a debug build after reproducing the bug.)
My stack most certainly does NOT contain dynamic allocation like alloca. So the stack has a defined structure, yet GCC chooses the dumb method despite my options? Is there something I'm missing to force it not to use the frame pointer?
My second gripe is that it refuses to use push instructions for Win32 functions. Every other compiler I tried used push instructions to put arguments on the stack, resulting in much better, more compact code, not to mention it is the most natural way to pass arguments for stdcall. Yet GCC stubbornly uses mov instructions to store each argument manually, at offsets relative to esp, because it wants to keep the stack pointer completely static. stdcall is designed to be easy on the caller, and yet GCC completely misses the point of stdcall, since it generates this crappy code when interfacing with it. What's worse, even though the stack pointer is static, it still uses a frame pointer. Just why?
I tried -mpush-args, it doesn't do anything.
I also noticed that if I make my stack frame bigger than a page (4096 bytes), GCC adds a prologue call to a function that does nothing but OR the stack with zero every 4096 bytes (which changes nothing). I assume it's for touching the stack so that reserved memory gets committed through page faults? Unfortunately, it does this even if I set the initial commit of the stack (not the reserve) high enough to hold my stack, not to mention this shouldn't even be needed in the first place. Redundant code at its best.
Are these bugs in GCC? Or something I'm missing in options? Should I use something else? Please tell me if I'm missing some options.
I seriously hope I won't have to make an inline asm macro just to call stdcall functions and use push instructions (and this will avoid frame pointer too I guess). That sounds really overkill for something so basic that should be in compilers of today. And yes I use GCC 4.8.1 so not an ancient version.
As an extra question, is it possible to force GCC not to save registers on the stack in the function prologue? I use my own direct entry point with the -nostartfiles argument, because it is a pure Windows application and it works just fine without the standard library startup. If I use __attribute__((noreturn)), it discards the epilogue that restores the registers, but it still pushes them on the stack in the prologue; I don't know if there's a way to force it not to save registers for this entry point function. Either way, it's not a big deal in the least; it would just feel more complete, I guess. Thanks!
See the answer Force GCC to push arguments on the stack before calling function (using PUSH instruction)
I.e. try -mpush-args -mno-accumulate-outgoing-args. It may also require -mno-stack-arg-probe if gcc complains.
It looks like supplying -mpush-args -mno-accumulate-outgoing-args -mno-stack-arg-probe works, specifically the last one. Now the code is cleaner and closer to what other compilers produce, and it uses PUSH for arguments, which even makes it easier to track in OllyDbg.
Unfortunately, this FORCES the stupid frame pointer to be used, even in small functions that absolutely do not need it at all. Seriously is there a way to absolutely force GCC to disable the frame pointer?!

gdb: how to print the current line or find the current line number?

The list command prints a range of lines, but I need one single line: where I am, and where an error has probably occurred.
The 'frame' command will give you what you are looking for. (This can be abbreviated just 'f'). Here is an example:
(gdb) frame
#0  zmq::xsub_t::xrecv (this=0x617180, msg_=0x7ffff00008e0) at xsub.cpp:139
139 int rc = fq.recv (msg_);
(gdb)
Without an argument, 'frame' just tells you where you are at (with an argument it changes the frame). More information on the frame command can be found here.
Either the where or the frame command can be used; where (an alias for backtrace) gives more info, showing the function names along the whole call chain.
I do get the same information while debugging, though not when I am checking the stack trace. Most probably you compiled with an optimization flag. Check this link for something related.
Try compiling with -g3 and without any optimization flag.
Then it might work.
HTH!
Keep in mind that gdb is a powerful tool, capable of working at the level of individual instructions, so it is tied to assembly concepts.
What you are looking for is called the instruction pointer, i.e.:
The instruction pointer register points to the memory address which the processor will next attempt to execute. The instruction pointer is called ip in 16-bit mode, eip in 32-bit mode, and rip in 64-bit mode.
More detail here.
All registers available during gdb execution can be shown with:
(gdb) info registers
From its output you can tell which mode your program is running in (by looking at which of these registers exist).
Then (here using rip, the most common register nowadays; replace it with eip, or very rarely ip, if needed, or use $pc, GDB's architecture-independent name for the program counter):
(gdb) info line *$rip
will show you the line number and source file.
(gdb) list *$rip
will show you that line with a few lines before and after it.
but probably
(gdb) frame
should be enough in many cases.
All the answers above are correct. What I prefer is to use TUI mode (Ctrl+X A or tui enable), which shows your location and the current function in a separate window; this is very helpful.
Hope that helps too.
