Running a program with segmentation fault works well with gdb - debugging

I implemented the program that uses mmap() system call, but Segmentation Fault occurs during process runtime.
So, I ran this program with gdb, but when I did it, it worked well without segment fault.
I wonder if it is possible that running with gdb can affect segment fault.
Could you tell me about it?

if it is possible that running with gdb can affect segment fault.
One possibility: GDB disables address randomization (so as to make reproducing the bug easier). You can re-enable it with:
(gdb) set disable-randomization off
GDB may also affect timing of threads, but you didn't mention threads, so that's less likely.

You are probably invoking Undefined Behavior somewhere in your code that is breaking C or C++ rules. Try to run the program under Valgrind. It should give you more information if this is the case.

Related

Debugging a Fortran program using Emacs (on MacOS)

I am trying to debug a fortran program within Emacs using GDB. My compiler is intel fortran 2017.4. The problem is with a particular subroutine which inverts a matrix. The program runs without a problem when the size of the matrix is "small" i.e. 100x100. When I increase the size of the matrix to, for example 600x600, I get the following message: "Thread 3 received signal SIGSEGV, Segmentation fault."
Now, if I try to debug the program launching GDB from the terminal, everything works fine. I strongly prefer to debug the program from within Emacs since it would save me a lot of time. Any ideas about how can I fix this issue?
I already try to increase the stack size to the max (which is 65532 for MAC) and all the arrays are allocated on the heap.
Thanks for your help,
Now, if I try to debug the program launching GDB from the terminal, everything works fine.
It appears that your program does not crash when run from GDB (whether GDB is invoked from within Emacs or from a terminal), in which case your references to Emacs are superfluous.
Some of the reasons why a program may not crash under GDB are listed here.

Can't go through libhdfs.so in gdb env

my program is using libhdfs.so for hdfs read/write, I want to set a break point for debugging, but when this program runs to the point of hdfsConnect, it exits with a segmentation fault.
interesting thing is that when I run the program normally, segmentation fault does't happen at all.
what is likely the root cause? is there some runtime environment I should setup when debugging libhdfs.so?
it turns out to be a JNI problem anther than a libhdfs.so specific problem, the solution can be found here:
Strange sigsegv while calling java code from c++ through jni
what is likely the root cause?
The likely root cause is a bug in your program, which manifests itself as a crash under GDB, but remains hidden when run outside of GDB.
This makes the problem easier to debug: the opposite (crashes outside of GDB, works under GDB) is often harder.
Your first step should be to run the program under Valgrind and make sure it's clean.

slow down gdb to reproduce bug

I have a timing bug in my application which only occurs when I use valgrind, because valgrind slows down the process so much.
(it's actually a boost::weak_ptr-exception that I cannot localize)
Now I wonder how to reproduce the bug with gdb. I don't see a way to combine gdb + valgrind.
Thanks.
You can start gdb when an error is detected by valgrind (--db-attach=yes). Even if the exception doesn't trigger a memory error at the moment, it's easy to fake a bad memory access in that path.

In GDB, how to find out who malloc'ed an address on the heap?

I have a pointer in GDB, how can I find out where it was first allocated on the heap?
In WinDBG, this can be done by !heap -p -a <0x12345678> after turning on gflags /i <*exe> +ust
Since Valgrind can tell me where the memory is allocated (when it detects some leaks), I guess this is also possible?
(This is NOT about watchpoint. This is given the situation where I randomly break into the In GDB, application, look at a pointer and want to know "who created this piece of memory"?)
Using reverse debugging in GDB is a very novel way and probably the correct way to solve this problem. I encountered some problem with that approach with GDB 7.1 -- the latest stable version. Reverse debugging is a rather new feature in GDB so I needed to check out HEAD (7.2) to fix it.
It probably says something about the matureness of the GDB approach but I think it should definitely be used when it's more mature. (Awesome feature!)
Maybe reverse debugging will help here. Try to set watchpoint on memory address and reverse-continue until memory written.
(gdb) watch *0x12345678
(gdb) reverse-continue
Valgrind hijacks memory management calls, that's how heap checkers work. There's no facility in GDB itself to tell you where given address was returned by malloc(3). I suggest looking into mtrace and glibc allocation debugging.
record DOES run on a Hello World program.
Heck I use record to debug gdb itself!

I need to find the point in my userland code that crash my kernel

I have big system that make my system crash hard. When I boot up, I don't even have
a coredump. If I log every line that
get executed until my system goes down. I will find that evil code.
Can I log every source code line in GDB to a file?
UPDATE:
ok, I found the bug. It was nasty. The application I started did not
take the system down. After learning about coredump inspection with mdb, and some gdb stepping I found out that the systemcall causing the dump, was not implemented. Updating the system to latest kernel will fix my problem. Thanks to all of you.
MY LESSON:
make sure you know what process causes the coredump. It's not always the one you started.
Sounds like a tricky little problem.
I often try to eliminate as many possible suspects as I can by commenting out large chunks of code, configuring the system to not run certain pieces (if it allows you to do that) etc. This amounts to doing an ad-hoc binary search on the problem, and is a surprisingly effective way of zooming in on offending code relatively quickly.
A potential problem with logging is that the log might not hit the disk before the system locks up - if you don't get a core dump, you might not get the log.
Speaking of core dumps, make sure you don't have a limit on your core dump size (man ulimit.)
You could try to obtain a list of all the functions in your code using objdump, process it a little bit and create a bunch of GDB trace statements on those functions - basically creating a GDB script automatically. If that turns out to be overkill, then a binary search on the code using tracepoints can also help you zoom in on the problem.
And don't panic. You're smarter than the bug - you'll find it.
You can not reasonably track every line of your source using GDB (too slow). Besides, a system crash is most likely a result of a system call, and libc is probably doing the system call on your behalf. Even if you find the line of the application that caused OS crash, you still don't really know anything.
You should start by clarifying which OS is crashing. For Linux, you can try the following approaches:
strace -fo trace.out /path/to/app
After reboot, trace.out will contain syscalls the application was doing just before the crash. If you are lucky, you'll see the last syscall-of-death, but I wouldn't count on it.
Alternatively, try to reproduce the crash on the user-mode Linux, or on kernel with KGDB compiled in.
These will tell you where the problem in the kernel is. Finding the matching system call in your application will likely be trivial.
Please clarify your problem: What part of the system is crashing?
Is it an application?
If so, which application? Is this an application which you have written yourself? Is this an application you have obtained from elsewhere? Can you obtain a clean interrupt if you use a debugger? Can you obtain a backtrace showing which functions are calling the section of code which crashes?
Is it a new hardware driver?
Is it based on an older driver? If so, what has changed? Is it based on a manufacturer's data sheet? Is that data sheet the latest and most correct?
Is it somewhere in the kernel? Which kernel?
What is the OS? I assume it is linux, seeing that you are using the GNU debugger. But of course, that is not necessarily so.
You say you have no coredump. Have you enabled coredumps on your machine? Most systems these days do not have coredumps enabled by default.
Regarding logging GDB output, you may have some success, but it depends where the problem is whether or not you will have the right output logged before the system crashes. There is plenty of delay in writing to disk. You may not catch it in time.
I'm not familiar with the gdb way of doing this, but with windbg the way to go is to have a debugger attached to the kernel and control the debugger remotely over a serial cable (or firewire) from a second debugger. I'm pretty sure gdb has similar capabilities, I could quickly find some hints here: http://www.digipedia.pl/man/gdb.4.html

Resources