I'm debugging a tree of processes using gdb's very handy multiple-inferior support:
(gdb) set detach-on-fork off
(gdb) set schedule-multiple on
(gdb) set follow-fork-mode parent
(gdb) break PostgresMain
(gdb) break PostmasterMain
and now need to let things run until I hit one of the future breakpoints in some yet to be spawned inferior.
However, gdb seems to be "helpfully" pausing whenever an inferior exits normally, or at least blocking cleanup of the inferior so that its parent's wait() can return:
(gdb) c
[New process 16505]
process 16505 is executing new program: /home/craig/pg/bdr/bin/pg_config
Reading symbols from /home/craig/pg/bdr/bin/pg_config...done.
[Inferior 2 (process 16505) exited normally]
(gdb) info inferior
Num Description Executable
* 2 <null> /home/craig/pg/bdr/bin/pg_config
1 process 16501 /usr/bin/make
(gdb) inferior 1
[Switching to inferior 1 [process 16501] (/usr/bin/make)]
[Switching to thread 1 (process 16501)]
#0 0x0000003bc68bc502 in __libc_wait (stat_loc=0x7fffffffbc78) at ../sysdeps/unix/sysv/linux/wait.c:30
30 return INLINE_SYSCALL (wait4, 4, WAIT_ANY, stat_loc, 0,
(gdb)
so I have to endlessly:
(gdb) inferior 1
(gdb) c
to carry on. About 70 times, before I hit the desired breakpoint in a child of a child of a child.
I think what's happening is that gdb treats process exit as a stop event, and since non-stop is set to off (the default) it stops all threads in all inferiors when one thread stops. However, this inferior has terminated, it isn't a normal stop event, so you can't just cont it, you have to switch to another process first.
Is there some way to stop gdb pausing at each inferior exit? I would've expected follow-fork-mode parent with schedule-multiple on to do the trick, but gdb seems to still want to stop when an inferior exits.
I guess I'm looking for something like a "skip proc-exit", or a virtual signal I can change the handler policy on so it doesn't stop.
set non-stop on seems like it should be the right answer, but I suspect it's broken for multiple inferiors.
If I use non-stop on, then after the first exit trap, gdb's internal state indicates that inferior 1 is running:
(gdb) info inferior
Num Description Executable
* 1 process 20540 /usr/bin/make
(gdb) info thread
Id Target Id Frame
* 1 process 20540 "make" (running)
(gdb) cont
Continuing.
Cannot execute this command while the selected thread is running.
but the kernel sees it as blocked on ptrace_stop:
$ ps -o "cmd,wchan" -p 20540
CMD WCHAN
/usr/bin/make check ptrace_stop
... and it makes no progress until gdb is detached, or it's killed. Signals to the process are ignored, and interrupt in gdb has no effect.
I'm using GNU gdb (GDB) Fedora 7.7.1-18.fc20 on x86_64.
After stumbling on a post that references it in passing I found that the missing magic is set target-async on alongside set non-stop on.
non-stop mode, as expected, means gdb won't stop everything whenever an inferior exits. target-async is required to make it actually work correctly on gdb 7.7; it's the default on 7.8.
So the full incantation is:
set detach-on-fork off
set schedule-multiple on
set follow-fork-mode parent
set non-stop on
set target-async on
For 7.8, remove target-async on and, to reduce noise, add set print symbol-loading off.
The following Python extension to gdb will switch back to the first inferior and resume execution after each stop.
It feels like a total hack, but it works. When a process exits it sets a flag indicating that it stopped on an exit, then switches to the original process. gdb will then stop execution, delivering a stop event. We check to see if the stop was caused by our stop event and if so, we immediately continue.
The code also sets up the breakpoints I'm using and the multi-process settings, so I can just source thescript.py and run .
gdb.execute("set python print-stack full")
gdb.execute("set detach-on-fork off")
gdb.execute("set schedule-multiple on")
gdb.execute("set follow-fork-mode parent")
gdb.execute("set breakpoint pending on")
gdb.execute("break PostgresMain")
gdb.execute("break PostmasterMain")
gdb.execute("set breakpoint pending off")
def do_continue():
gdb.execute("continue")
def exit_handler(event):
global my_stop_request
has_threads = [ inferior.num for inferior in gdb.inferiors() if inferior.threads() ]
if has_threads:
has_threads.sort()
gdb.execute("inferior %d" % has_threads[0])
my_stop_request = True
gdb.events.exited.connect(exit_handler)
def stop_handler(event):
global my_stop_request
if isinstance(event, gdb.SignalEvent):
pass
elif isinstance(event, gdb.BreakpointEvent):
pass
elif my_stop_request:
my_stop_request = False
gdb.post_event(do_continue)
gdb.events.stop.connect(stop_handler)
There must be an easier way than this. It's ugly.
Related
Gdb, like any other program, isn't perfect, and every now and then I encounter bugs that render the current Gdb instance unusable. At this point, if I have a debugging session with a lot of valuable state in the inferior, I'd like to be able to just start a new Gdb session on it. That is, detach, quit Gdb and start a new Gdb instance to restart where I left off.
However, when detaching Gdb, it resumes the inferior so that it continues running where it was, which ruins the point of the whole exercise. Therefore, I'm wondering if it's possible to detach in such a state that the inferior is as if it had been sent a SIGSTOP, basically.
I've tried simply killing Gdb, but interestingly, that seems to take the inferior with it. Not sure how that works.
when detaching Gdb, it resumes the inferior
GDB doesn't, the kernel does (assuming Linux).
I've tried simply killing Gdb, but interestingly, that seems to take the inferior with it
The kernel sends it SIGHUP, which normally kills the inferior. You can prevent that with either SIG_IGN in the inferior, or simply (gdb) call signal(1, 1).
After that, you can detach and quit GDB, but the kernel will resume the inferior with SIGCONT (see Update below), so you are back to square one.
However, there is a solution. Consider the following program:
int main()
{
while (1) {
printf("."); fflush(0); sleep(1);
}
}
gdb -q ./a.out
(gdb) run
Starting program: /tmp/a.out
.....^C
Program received signal SIGINT, Interrupt.
0x00007ffff7ad5de0 in __nanosleep_nocancel () at ../sysdeps/unix/syscall-template.S:81
81 ../sysdeps/unix/syscall-template.S: No such file or directory.
We want the program to not run away on detach, so we send it SIGSTOP:
(gdb) signal SIGSTOP
Continuing with signal SIGSTOP.
Program received signal SIGSTOP, Stopped (signal).
0x00007ffff7ad5de0 in __nanosleep_nocancel () at ../sysdeps/unix/syscall-template.S:81
81 in ../sysdeps/unix/syscall-template.S
(gdb) detach
Detaching from program: /tmp/a.out, process 25382
Note that at this point, gdb is detached (but still alive), and the program is not running (stopped).
Now in a different terminal:
gdb -q -ex 'set prompt (gdb2) ' -p 25382
0x00007ffff7ad5de0 in __nanosleep_nocancel () at ../sysdeps/unix/syscall-template.S:81
81 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb2) c
Continuing.
Program received signal SIGSTOP, Stopped (signal).
0x00007ffff7ad5de0 in __nanosleep_nocancel () at ../sysdeps/unix/syscall-template.S:81
81 in ../sysdeps/unix/syscall-template.S
(gdb2) sig 0
Continuing with no signal.
The program continues running, printing dots in the first terminal.
Update:
SIGHUP -- Interesting. By what mechanism, though?
Good question. I didn't know, but this appears to be the answer:
From setpgid man page:
If the exit of the process causes a process group to become orphaned,
and if any member of the newly orphaned process group is stopped,
then a SIGHUP signal followed by a SIGCONT signal will be sent to
each process in the newly orphaned process group.
I have verified that if I detach and quit GDB without stopping the inferior, it doesn't get SIGHUP and continues running without dying.
If I do send it SIGSTOP and arrange for SIGHUP to be ignored, then I see both SIGHUP and SIGCONT being sent in strace, so that matches the man page exactly:
(gdb) detach
Detaching from program: /tmp/a.out, process 41699
In another window: strace -p 41699. Back to GDB:
(gdb) quit
strace output:
--- stopped by SIGSTOP ---
--- SIGHUP {si_signo=SIGHUP, si_code=SI_KERNEL} ---
--- SIGCONT {si_signo=SIGCONT, si_code=SI_KERNEL} ---
restart_syscall(<... resuming interrupted call ...>) = 0
write(1, ".", 1.) = 1
...
Is there a way to exit from gdb connnection without stopping / exiting running program ? I need that running program continues after gdb connection closed.
Is there a way to exit from gdb connnection without stopping / exiting running program ?
(gdb) help detach
Detach a process or file previously attached.
If a process, it is no longer traced, and it continues its execution. If
you were debugging a file, the file is closed and gdb no longer accesses it.
List of detach subcommands:
detach checkpoint -- Detach from a checkpoint (experimental)
detach inferiors -- Detach from inferior ID (or list of IDS)
Type "help detach" followed by detach subcommand name for full documentation.
Type "apropos word" to search for commands related to "word".
Command name abbreviations are allowed if unambiguous.
Since the accepted (only other) answer does not specifically address how to shut down gdb without stopping the program under test, I'm throwing my hat into the ring.
Option 1
Kill the server from the terminal in which it's running by holding Ctrl+c.
Option 2
Kill the gdb server and/or client from another terminal session.
$ ps -u username | grep gdb
667511 pts/6 00:00:00 gdbserver
667587 pts/7 00:00:00 gdbclient
$ kill 667587
$ kill 667511
These options are for a Linux environment. A similar approach (killing the process) would probably also work in Windows.
A legacy program most likely gets into an infinite loop on certain pathological inputs. I have >1000 such instances, however, I suspect that the vast majority of them trigger the same bug. Therefore, I would like to reduce the >1000 instances to the fundamentally different ones. The first step is to pause the application after, say, 10 seconds and collect the backtrace.
If I run:
gdb --batch --command=backtrace.txt --args ./legacy_program
with backtrace.txt
run
bt
and I hit Ctrl + C after 10 seconds in the same terminal I get exactly the backtrace I want.
Now, I would like to do that automatically. I have tried sending SIGINT (the expected equivalent of Ctrl + C) from another terminal but I do not get the backtrace anymore. Here are some of my failed attempts based on
GDB how to stop execution without a breakpoint?
Neither of these have any effect:
pkill -SIGINT gdb
kill -SIGINT 5717
where 5717 is the pid of the only gdb running. Sending SIGINT to the legacy_program the same way does kill it but then I do not get the backtrace:
Program received signal SIGINT, Interrupt.
Quit
How can I programmatically pause the execution of the legacy_program after 10 seconds and get a backtrace?
This post was motivated by my frustration not being able to find an answer to this question here at StackOverflow.
Also note that
[it is not merely OK to ask and answer your own question, it is explicitly encouraged.](https://blog.stackoverflow.com/2011/07/its-ok-to-ask-and-answer-your-own-questions/)
Apparently, it is a known (bug) feature in gdb, see
GDB is not trapping SIGINT. Ctrl+C terminates program when should break gdb. Try sending SIGSTOP instead from the other terminal:
pkill -STOP legacy_program
It works on my machine.
Note that you do not have to run the legacy_program in the debugger. Enable core dumps
ulimit -c unlimited
and send the program SIGTRAP to make it crash, then get the backtrace from the core dump. So, start the program:
./legacy_program
From another terminal:
pkill -TRAP legacy_program
The backtrace can be obtained like this:
gdb --batch -ex=bt ./legacy_program core
I'm trying to run binaries built with clang's address sanitizer under the control of ptrace, and I'm having a problem with spurious SIGTRAPs.
My program uses ptrace in the standard manner: child does ptrace(PT_TRACE_ME,...) then exec; parent waits for SIGTRAP in child that indicates the call to exec was made; parent does ptrace(PT_CONTINUE,...) to set the child running.
This all works fine for normal binaries. When running a binary built with the address sanitizer, on the other hand, after doing the PT_CONTINUE to resume the process, the child process receives an unexpected SIGTRAP straight away.
This can be demonstrated using gdb, which interacts with ptrace in a similar sort of way.
Build a simple test program:
$ echo 'int main(){return 50;}' | clang -fsanitize=address -o test -xc -
$ ./test
$ echo $?
50
Run it in gdb:
$ ggdb ./test
<<snip>>
(gdb) run
(Ignore messages about symbols.)
Note that the process has not exited with code 062, but has stopped with a SIGTRAP:
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00007fff5fc01000 in ?? ()
To run the process, continue it manually.
(gdb) continue
Continuing.
[Inferior 1 (process 14536) exited with code 062]
This is all very well for interactive use, but it's a bit tiresome for automated tests, because you need special handling for the address sanitizer builds. I'd much rather keep my test processes the same across all build types if possible.
Anybody know what is going on here?
I'm using clang-700.1.76 (from Xcode). (And gdb 7.9.1 (from MacPorts) - but this looks like a more general problem as my own code suffers from it too.)
I couldn't reproduce this in Linux (gcc 4.8.4/clang 3.8.0, gdb 7.7.1).
I am trying to run a app using gdb in emulator shell. I use following command
gdb <path of exe>
However, The app does not launch and I get following error
Starting program: <path of exe>
[Thread debugging using libthread_db enabled]
Program exited normally.
However, when I attach a running process to gdb, it works fine.
gdb -pid <process_id>
What could be the reason?
****<Update>****
On Employed Russian's advice, I did these steps
(gdb) b _start
Breakpoint 1 at 0xb40
(gdb) b main
Breakpoint 2 at 0xc43
(gdb) catch syscall exit
Catchpoint 3 (syscall 'exit' [1])
(gdb) catch syscall exit_group
Catchpoint 4 (syscall 'exit_group' [252])
(gdb) r
Starting program: <Exe Path>
[Thread debugging using libthread_db enabled]
Breakpoint 1, 0x80000b40 in _start ()
(gdb) c
Continuing.
Breakpoint 2, 0x80000c43 in main ()
(gdb) c
Continuing.
Catchpoint 4 (call to syscall 'exit_group'), 0xb7fe1424 in __kernel_vsyscall
()
(gdb) c
Continuing.
Program exited normally.
(gdb)
What does Catchpoint 4 (call to syscall 'exit_group'), 0xb7fe1424 in __kernel_vsyscall
this mean?
I probed further and i found this
Single stepping until exit from function main,
which has no line number information.
__libc_start_main (main=0xb6deb030 <main>, argc=1, ubp_av=0xbffffce4,
init=0x80037ab0 <__libc_csu_init>, fini=0x80037b10 <__libc_csu_fini>,
rtld_fini=0xb7ff1000 <_dl_fini>, stack_end=0xbffffcdc) at libc-start.c:258
258 libc-start.c: No such file or directory.
in libc-start.c
However, libc.so is present and i have exported its path also using
export LD_LIBRARY=$LD_LIBRARY:/lib
Why is not loading?
The app does not launch and I get following error
You are mistaken: the app does launch (and the output you get is not an error), and then immediately exits with 0 exit status.
Therefore, you should look at the problem with the app, not a problem with GDB. One way to look at the problem is to set a breakpoint on _start and main, and check whether either of these functions is reached.
If they are, using catch syscall exit or catch syscall exit_group may give you a clue for why the application exits.
Perhaps your application employs anti-reverse-engineering techniques, and detects that it is being debugged?
Update:
you've verified that the application in fact starts, reaches main, and then calls exit. Now all you have to do is figure out why it calls exit. The way to do that is to find out where exit_group system call is coming from.
And to do that, you should get to that system call (Catchpoint 4), issue GDB where command. That will tell you how your application decides to exit.
You also (apparently) built your application without debugging info (usually -g flag). You'll make your debugging easier if you build a debug version of the application.