My distribution (Debian) ships debug files in separate packages. What often happens is that I run a program in gdb until it crashes, in order to obtain a usable backtrace for a bug report. But bt is nearly useless because the symbol information is missing: I did not install the corresponding -dbg package.
If I install the package now, is there a way to make gdb search for the symbol files again, without losing my current backtrace?
There is a trick you can use to make gdb try to read symbol files again:
(gdb) nosharedlibrary
(gdb) sharedlibrary
The first command tells gdb to discard the symbol information it has loaded for shared libraries, and the second tells it to load that information again, so it picks up the debug files you have just installed.
I'd suggest an alternative approach using gdb's gcore command; it may be suitable for you.
Here is the gcore description:
(gdb) help gcore
Save a core file with the current state of the debugged process.
Argument is optional filename. Default filename is 'core.<process_id>'
Suppose I have a program that crashes:
#include <ctime>
#include <iostream>
int f()
{
    time_t curr_ts = time(0);
    std::cout << "Before crash " << curr_ts << std::endl;
    int * ptr = 0; // deliberately null
    *ptr = *ptr +1 ;
    std::cout << "After crash " << curr_ts << std::endl;
    return *ptr;
}

int main()
{
    std::cout << "Before f() " << std::endl;
    // f() dereferences a null pointer and crashes with SIGSEGV
    f();
    std::cout << "After f() " << std::endl;
    return 0;
}
I compiled it with debug info, but I put the executable with debug info in an archive and use a stripped copy for testing.
The stripped copy crashes under gdb:
$ gdb ./a.out
Reading symbols from ./a.out...(no debugging symbols found)...done.
(gdb) r
Starting program: /home/crash/a.out
Before f()
Before crash 1435322344
Program received signal SIGSEGV, Segmentation fault.
0x000000000040097d in ?? ()
(gdb) bt
#0 0x000000000040097d in ?? ()
#1 0x00000000004009e0 in ?? ()
#2 0x000000314981ed1d in __libc_start_main () from /lib64/libc.so.6
#3 0x00000000004007f9 in ?? ()
#4 0x00007fffffffde58 in ?? ()
#5 0x000000000000001c in ?? ()
#6 0x0000000000000001 in ?? ()
#7 0x00007fffffffe1a9 in ?? ()
#8 0x0000000000000000 in ?? ()
(gdb) gcore crash2.core
Saved corefile crash2.core
I simply generate a core file with gcore and quit gdb. Then I retrieve the version with debug symbols from the archive, load it together with the core file, and all symbols are visible:
$ gdb ./a.out ./crash2.core
Reading symbols from ./a.out...done.
warning: exec file is newer than core file.
[New LWP 15215]
Core was generated by `/home/crash/a.out'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000000000040097d in f () at main.cpp:8
8 *ptr = *ptr +1 ;
(gdb) bt
#0 0x000000000040097d in f () at main.cpp:8
#1 0x00000000004009e0 in main () at main.cpp:17
(gdb) info locals
curr_ts = 1435322344
ptr = 0x0
Update
If you set backtrace past-main on, you will at least see __libc_start_main. Frames beyond __libc_start_main are not printed when you analyze only the core file saved with gcore (they may not even be saved in it):
$ gdb ./a.out crash2.core
Reading symbols from ./a.out...done.
warning: exec file is newer than core file.
[New LWP 15215]
Core was generated by `/home/crash/a.out'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000000000040097d in f () at main.cpp:8
8 *ptr = *ptr +1 ;
(gdb) set backtrace past-main on
(gdb) bt
#0 0x000000000040097d in f () at main.cpp:8
#1 0x00000000004009e0 in main () at main.cpp:17
#2 0x000000314981ed1d in __libc_start_main () from /lib64/libc.so.6
Backtrace stopped: Cannot access memory at address 0x4007d0
(gdb)
But if I reproduce the crash under gdb with the test program that contains debug info, I can see every frame (note set backtrace past-main on and set backtrace past-entry on):
$ gdb ./a.out
Reading symbols from ./a.out...done.
(gdb) r
Starting program: /home/crash/a.out
Before f()
Before crash 1435328858
Program received signal SIGSEGV, Segmentation fault.
0x000000000040097d in f () at main.cpp:8
8 *ptr = *ptr +1 ;
(gdb) bt
#0 0x000000000040097d in f () at main.cpp:8
#1 0x00000000004009e0 in main () at main.cpp:17
(gdb) set backtrace past-main on
(gdb) set backtrace past-entry on
(gdb) bt
#0 0x000000000040097d in f () at main.cpp:8
#1 0x00000000004009e0 in main () at main.cpp:17
#2 0x000000314981ed1d in __libc_start_main () from /lib64/libc.so.6
#3 0x00000000004007f9 in _start ()
#4 0x00007fffffffde58 in ?? ()
#5 0x000000000000001c in ?? ()
#6 0x0000000000000001 in ?? ()
#7 0x00007fffffffe1a9 in ?? ()
#8 0x0000000000000000 in ?? ()
(gdb)
Related
Per https://www.gnu.org/software/libc/manual/html_node/Backtraces.html:
Note that certain compiler optimizations may interfere with obtaining a valid backtrace. Function inlining causes the inlined function to not have a stack frame; tail call optimization replaces one stack frame with another; frame pointer elimination will stop backtrace from interpreting the stack contents correctly.
I want to know why this happens when tail call optimization is enabled, and how to avoid it.
Why does this happen when tail call optimization is enabled?
It happens because, after tail call optimization, a tail-recursive function is no longer recursive (the compiler turns the self-call into a jump), and a function whose last action is a call to another function is replaced on the stack by that callee, so it is missing from the call stack entirely.
Example:
// t.c
int foo() { return 42; }
int bar() { return foo(); }
int main() { return bar(); }
gcc -O2 -fno-inline t.c && gdb -q ./a.out
(gdb) b foo
Breakpoint 1 at 0x1140
(gdb) run
Starting program: /tmp/a.out
Breakpoint 1, 0x0000555555555140 in foo ()
(gdb) bt
#0 0x0000555555555140 in foo ()
#1 0x00007ffff7dfad0a in __libc_start_main (main=0x555555555040 <main>, argc=1, argv=0x7fffffffdc08, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffdbf8) at ../csu/libc-start.c:308
#2 0x000055555555507a in _start ()
Where did main() and bar() go? We know that main() must still be on the stack.
They were tail call optimized:
(gdb) disas main
Dump of assembler code for function main:
0x0000555555555040 <+0>: xor %eax,%eax
0x0000555555555042 <+2>: jmpq 0x555555555150 <bar>
End of assembler dump.
(gdb) disas bar
Dump of assembler code for function bar:
0x0000555555555150 <+0>: xor %eax,%eax
0x0000555555555152 <+2>: jmp 0x555555555140 <foo>
End of assembler dump.
how to avoid it.
The only way to avoid it is to disable tail call optimization (with -fno-optimize-sibling-calls):
gcc -O2 -fno-inline t.c -fno-optimize-sibling-calls && gdb -q ./a.out
(gdb) b foo
Breakpoint 1 at 0x1140
(gdb) run
Starting program: /tmp/a.out
Breakpoint 1, 0x0000555555555140 in foo ()
(gdb) bt
#0 0x0000555555555140 in foo ()
#1 0x0000555555555157 in bar ()
#2 0x0000555555555047 in main ()
Consider the following (broken) code:
#include <iostream>
#include <memory>
using namespace std;
class Test {
public:
unique_ptr<string> s;
Test() : s(NULL) {
}
void update(string& st) {
s = unique_ptr<string>(&(st));
}
};
void update(Test& t) {
string s("Hello to you");
t.update(s);
}
int main() {
Test t;
update(t);
cout << *t.s << endl;
}
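Here we have an error in Test::update(): it does not make a unique copy of the object. Instead, the unique_ptr takes ownership of the address of a local std::string, which ~Test() later tries to free. So when the program is run under macOS, you get: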
$ ./test
Hello t��E]�
test(44981,0x7fff99ba93c0) malloc: *** error for object 0x7fff5d45b690: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
[1] 44981 abort ./test
I was able to debug this case successfully with lldb, even without setting a breakpoint in malloc_error_break: I just ran the application until it stopped on SIGABRT.
lldb ./test
(lldb) target create "./test"
Current executable set to './test' (x86_64).
(lldb) run
Process 44993 launched: './test' (x86_64)
Hello t��_�
test(44993,0x7fff99ba93c0) malloc: *** error for object 0x7fff5fbff680: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Process 44993 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
frame #0: 0x00007fff90d6cd42 libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill:
-> 0x7fff90d6cd42 <+10>: jae 0x7fff90d6cd4c ; <+20>
0x7fff90d6cd44 <+12>: movq %rax, %rdi
0x7fff90d6cd47 <+15>: jmp 0x7fff90d65caf ; cerror_nocancel
0x7fff90d6cd4c <+20>: retq
Target 0: (test) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
* frame #0: 0x00007fff90d6cd42 libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00007fff90e5a457 libsystem_pthread.dylib`pthread_kill + 90
frame #2: 0x00007fff90cd2420 libsystem_c.dylib`abort + 129
frame #3: 0x00007fff90dc1fe7 libsystem_malloc.dylib`free + 530
frame #4: 0x0000000100001f7b test`Test::~Test() [inlined] std::__1::default_delete<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >::operator(this=0x00007fff5fbff730, __ptr="\a\x94\x99�\x7f\0\0��_�\x7f\0\0\x80�_�\x7f\0\00�_�\x7f\0\00�_�\x7f\0\00�_�\x7f\0\00�_�\x7f\0\00�_�\x7f\0\0��_�\x7f\0\0\x15\x1e\0\0\x01\0\0\0\x80�_�\x7f\0\n0")(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*) const at memory:2397
frame #5: 0x0000000100001f46 test`Test::~Test() [inlined] std::__1::unique_ptr<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::default_delete<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >::reset(this=0x00007fff5fbff730, __p="") at memory:2603
frame #6: 0x0000000100001ef3 test`Test::~Test() [inlined] std::__1::unique_ptr<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::default_delete<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >::~unique_ptr(this=0x00007fff5fbff730) at memory:2571
frame #7: 0x0000000100001ef3 test`Test::~Test() [inlined] std::__1::unique_ptr<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::default_delete<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >::~unique_ptr(this=0x00007fff5fbff730) at memory:2571
frame #8: 0x0000000100001ef3 test`Test::~Test(this=0x00007fff5fbff730) at main.cpp:6
frame #9: 0x0000000100001e15 test`Test::~Test(this=0x00007fff5fbff730) at main.cpp:6
frame #10: 0x0000000100001ab6 test`main at main.cpp:28
frame #11: 0x00007fff90c3e235 libdyld.dylib`start + 1
Now I can see that the problem is in the Test destructor, and from here it's a piece of cake.
Unfortunately, trying to debug this case with gdb under macOS was a total failure. Here is what I did:
$ gdb ./test
GNU gdb (GDB) 8.0.1
This GDB was configured as "x86_64-apple-darwin16.7.0".
Reading symbols from test...Reading symbols from /Users/bazhenov/Developer/linear-counter/tests/test/test.dSYM/Contents/Resources/DWARF/test...done.
done.
(gdb) run
Starting program: /Users/bazhenov/Developer/linear-counter/tests/test/test
[New Thread 0x1403 of process 45204]
warning: unhandled dyld version (15)
Hello tQ�_�
test(45204,0x7fff99ba93c0) malloc: *** error for object 0x7fff5fbff650: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Thread 2 received signal SIGABRT, Aborted.
0x00007fff90d6cd42 in ?? ()
(gdb) bt
#0 0x00007fff90d6cd42 in ?? ()
#1 0x00007fff90e5a457 in ?? ()
#2 0x00007fff5fbff590 in ?? ()
#3 0x0000030700000000 in ?? ()
#4 0x00007fff5fbff590 in ?? ()
#5 0x00007fff5fbff650 in ?? ()
#6 0x00007fff5fbff5a0 in ?? ()
#7 0x00007fff90cd2420 in ?? ()
#8 0xffffffff00000018 in ?? ()
#9 0x00007fff5fbff5b0 in ?? ()
#10 0x00007fffffffffdf in ?? ()
#11 0x00000001000c4000 in ?? ()
#12 0x00007fff5fbff5f0 in ?? ()
#13 0x00007fff90dc1fe7 in ?? ()
#14 0x378b45e65b700074 in ?? ()
#15 0x00007fff99ba00ac in ?? ()
#16 0x0000000000000000 in ?? ()
(gdb)
The question is: why does gdb fail to unwind the stack correctly, and what options do I have if I need a correct backtrace from gdb?
why gdb fails to unwind the stack correctly
There are known problems with gdb on macOS Sierra; see this post and the gdb bug report.
what options do I have if I need to get correct backtrace using gdb
You can try to downgrade macOS (I don't know whether that is possible) or apply the temporary hack patch from the bug report above.
I saw this nifty one-liner for gdb that dumps a backtrace of all threads from a core dump, so I quickly tried:
int main() {int* x = new int[5]; for(int i = 0; true; ++i) x[i] = i; }
to get a core dump and then ran this:
gdb --batch --quiet -ex "thread apply all bt full" -ex "quit" a.out core.box-name.a.out.27459.8515.11
And I get the output:
[New LWP 27459]
warning: Can't read pathname for load map: Input/output error.
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff9e503000
Core was generated by `./a.out'.
Program terminated with signal 11, Segmentation fault.
#0 0x00000000004005ca in main () at <stdin>:6
6 <stdin>: No such file or directory.
Thread 1 (LWP 27459):
#0 0x00000000004005ca in main () at <stdin>:6
i = 33788
x = 0x1a460010
I see a backtrace, which is nice, but I am wondering what the two warnings are about.
Judging from this link, it was a bug in gdb that has been fixed in recent releases:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=248898
I got this core dump when the application started:
Core was generated by `/opt/SURGE/bin/SIM.run 0 0 1'.
Program terminated with signal 7, Bus error.
#0 0xf79d7ddb in __gxx_personality_v0 () from /opt/SURGE/lib/libTsdThreadedInput_ix86-linux-sles9-mt.so
(gdb) bt
#0 0xf79d7ddb in __gxx_personality_v0 () from /opt/SURGE/lib/libTsdThreadedInput_ix86-linux-sles9-mt.so
#1 0x32709808 in ?? ()
#2 0xeecb6414 in ?? ()
#3 0xeecb6418 in ?? ()
#4 0x00000000 in ?? ()
(gdb)
What does it mean, and where does the problem happen?
It looks like it is trying to access memory at address 0x00000000, but it does not say which routine does that.
Is it possible to inspect the return value of a function in gdb assuming the return value is not assigned to a variable?
I imagine there are better ways to do it, but the finish command runs until the current stack frame is popped off and then prints the return value. Given the program
int fun() {
return 42;
}
int main( int argc, char *v[] ) {
fun();
return 0;
}
You can debug it like this (with a breakpoint already set on fun):
(gdb) r
Starting program: /usr/home/hark/a.out
Breakpoint 1, fun () at test.c:2
2 return 42;
(gdb) finish
Run till exit from #0 fun () at test.c:2
main () at test.c:7
7 return 0;
Value returned is $1 = 42
(gdb)
The finish command can be abbreviated as fin. Do NOT use f, which is the abbreviation of the frame command!
Yes, just examine the EAX register by typing print $eax. For most functions, the return value is stored in that register, even if it's not used.
The exceptions to this are functions returning types larger than 32 bits, specifically 64-bit integers (long long), doubles, and structs or classes.
The other exception is if you're not running on an Intel architecture. In that case, you'll have to figure out which register is used, if any.
Here's how to do this with no symbols.
gdb ls
This GDB was configured as "ppc64-yellowdog-linux-gnu"...
(no debugging symbols found)
Using host libthread_db library "/lib64/libthread_db.so.1".
(gdb) break __libc_start_main
Breakpoint 1 at 0x10013cb0
(gdb) r
Starting program: /bin/ls
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
Breakpoint 1 at 0xfdfed3c
(no debugging symbols found)
[Thread debugging using libthread_db enabled]
[New Thread 4160418656 (LWP 10650)]
(no debugging symbols found)
(no debugging symbols found)
[Switching to Thread 4160418656 (LWP 10650)]
Breakpoint 1, 0x0fdfed3c in __libc_start_main () from /lib/libc.so.6
(gdb) info frame
Stack level 0, frame at 0xffd719a0:
pc = 0xfdfed3c in __libc_start_main; saved pc 0x0
called by frame at 0x0
Arglist at 0xffd71970, args:
Locals at 0xffd71970, Previous frame's sp is 0xffd719a0
Saved registers:
r24 at 0xffd71980, r25 at 0xffd71984, r26 at 0xffd71988, r27 at 0xffd7198c,
r28 at 0xffd71990, r29 at 0xffd71994, r30 at 0xffd71998, r31 at 0xffd7199c,
pc at 0xffd719a4, lr at 0xffd719a4
(gdb) frame 0
#0 0x0fdfed3c in __libc_start_main () from /lib/libc.so.6
(gdb) info fr
Stack level 0, frame at 0xffd719a0:
pc = 0xfdfed3c in __libc_start_main; saved pc 0x0
called by frame at 0x0
Arglist at 0xffd71970, args:
Locals at 0xffd71970, Previous frame's sp is 0xffd719a0
Saved registers:
r24 at 0xffd71980, r25 at 0xffd71984, r26 at 0xffd71988, r27 at 0xffd7198c,
r28 at 0xffd71990, r29 at 0xffd71994, r30 at 0xffd71998, r31 at 0xffd7199c,
pc at 0xffd719a4, lr at 0xffd719a4
The formatting is a bit messed up there, but note the use of "info frame" to inspect a frame, and "frame #" to switch your context to another frame (up and down the stack).
bt also shows an abbreviated stack to help out.