Detaching Gdb without resuming the inferior - debugging

Gdb, like any other program, isn't perfect, and every now and then I encounter bugs that render the current Gdb instance unusable. At this point, if I have a debugging session with a lot of valuable state in the inferior, I'd like to be able to just start a new Gdb session on it. That is, detach, quit Gdb and start a new Gdb instance to restart where I left off.
However, when detaching Gdb, it resumes the inferior so that it continues running where it was, which ruins the point of the whole exercise. Therefore, I'm wondering if it's possible to detach in such a state that the inferior is as if it had been sent a SIGSTOP, basically.
I've tried simply killing Gdb, but interestingly, that seems to take the inferior with it. Not sure how that works.

when detaching Gdb, it resumes the inferior
GDB doesn't, the kernel does (assuming Linux).
I've tried simply killing Gdb, but interestingly, that seems to take the inferior with it
The kernel sends it SIGHUP, which normally kills the inferior. You can prevent that with either SIG_IGN in the inferior, or simply (gdb) call signal(1, 1).
After that, you can detach and quit GDB, but the kernel will resume the inferior with SIGCONT (see Update below), so you are back to square one.
However, there is a solution. Consider the following program:
int main()
{
while (1) {
printf("."); fflush(0); sleep(1);
}
}
gdb -q ./a.out
(gdb) run
Starting program: /tmp/a.out
.....^C
Program received signal SIGINT, Interrupt.
0x00007ffff7ad5de0 in __nanosleep_nocancel () at ../sysdeps/unix/syscall-template.S:81
81 ../sysdeps/unix/syscall-template.S: No such file or directory.
We want the program to not run away on detach, so we send it SIGSTOP:
(gdb) signal SIGSTOP
Continuing with signal SIGSTOP.
Program received signal SIGSTOP, Stopped (signal).
0x00007ffff7ad5de0 in __nanosleep_nocancel () at ../sysdeps/unix/syscall-template.S:81
81 in ../sysdeps/unix/syscall-template.S
(gdb) detach
Detaching from program: /tmp/a.out, process 25382
Note that at this point, gdb is detached (but still alive), and the program is not running (stopped).
Now in a different terminal:
gdb -q -ex 'set prompt (gdb2) ' -p 25382
0x00007ffff7ad5de0 in __nanosleep_nocancel () at ../sysdeps/unix/syscall-template.S:81
81 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb2) c
Continuing.
Program received signal SIGSTOP, Stopped (signal).
0x00007ffff7ad5de0 in __nanosleep_nocancel () at ../sysdeps/unix/syscall-template.S:81
81 in ../sysdeps/unix/syscall-template.S
(gdb2) sig 0
Continuing with no signal.
The program continues running, printing dots in the first terminal.
Update:
SIGHUP -- Interesting. By what mechanism, though?
Good question. I didn't know, but this appears to be the answer:
From setpgid man page:
If the exit of the process causes a process group to become orphaned,
and if any member of the newly orphaned process group is stopped,
then a SIGHUP signal followed by a SIGCONT signal will be sent to
each process in the newly orphaned process group.
I have verified that if I detach and quit GDB without stopping the inferior, it doesn't get SIGHUP and continues running without dying.
If I do send it SIGSTOP and arrange for SIGHUP to be ignored, then I see both SIGHUP and SIGCONT being sent in strace, so that matches the man page exactly:
(gdb) detach
Detaching from program: /tmp/a.out, process 41699
In another window: strace -p 41699. Back to GDB:
(gdb) quit
strace output:
--- stopped by SIGSTOP ---
--- SIGHUP {si_signo=SIGHUP, si_code=SI_KERNEL} ---
--- SIGCONT {si_signo=SIGCONT, si_code=SI_KERNEL} ---
restart_syscall(<... resuming interrupted call ...>) = 0
write(1, ".", 1.) = 1
...

Related

how to close gdb connection without stopping running program

Is there a way to exit from gdb connnection without stopping / exiting running program ? I need that running program continues after gdb connection closed.
Is there a way to exit from gdb connnection without stopping / exiting running program ?
(gdb) help detach
Detach a process or file previously attached.
If a process, it is no longer traced, and it continues its execution. If
you were debugging a file, the file is closed and gdb no longer accesses it.
List of detach subcommands:
detach checkpoint -- Detach from a checkpoint (experimental)
detach inferiors -- Detach from inferior ID (or list of IDS)
Type "help detach" followed by detach subcommand name for full documentation.
Type "apropos word" to search for commands related to "word".
Command name abbreviations are allowed if unambiguous.
Since the accepted (only other) answer does not specifically address how to shut down gdb without stopping the program under test, I'm throwing my hat into the ring.
Option 1
Kill the server from the terminal in which it's running by holding Ctrl+c.
Option 2
Kill the gdb server and/or client from another terminal session.
$ ps -u username | grep gdb
667511 pts/6 00:00:00 gdbserver
667587 pts/7 00:00:00 gdbclient
$ kill 667587
$ kill 667511
These options are for a Linux environment. A similar approach (killing the process) would probably also work in Windows.

How to pause the execution of a program after 10 seconds and get a backtrace?

A legacy program most likely gets into an infinite loop on certain pathological inputs. I have >1000 such instances, however, I suspect that the vast majority of them trigger the same bug. Therefore, I would like to reduce the >1000 instances to the fundamentally different ones. The first step is to pause the application after, say, 10 seconds and collect the backtrace.
If I run:
gdb --batch --command=backtrace.txt --args ./legacy_program
with backtrace.txt
run
bt
and I hit Ctrl + C after 10 seconds in the same terminal I get exactly the backtrace I want.
Now, I would like to do that automatically. I have tried sending SIGINT (the expected equivalent of Ctrl + C) from another terminal but I do not get the backtrace anymore. Here are some of my failed attempts based on
GDB how to stop execution without a breakpoint?
Neither of these have any effect:
pkill -SIGINT gdb
kill -SIGINT 5717
where 5717 is the pid of the only gdb running. Sending SIGINT to the legacy_program the same way does kill it but then I do not get the backtrace:
Program received signal SIGINT, Interrupt.
Quit
How can I programmatically pause the execution of the legacy_program after 10 seconds and get a backtrace?
This post was motivated by my frustration not being able to find an answer to this question here at StackOverflow.
Also note that
[it is not merely OK to ask and answer your own question, it is explicitly encouraged.](https://blog.stackoverflow.com/2011/07/its-ok-to-ask-and-answer-your-own-questions/)
Apparently, it is a known (bug) feature in gdb, see
GDB is not trapping SIGINT. Ctrl+C terminates program when should break gdb. Try sending SIGSTOP instead from the other terminal:
pkill -STOP legacy_program
It works on my machine.
Note that you do not have to run the legacy_program in the debugger. Enable core dumps
ulimit -c unlimited
and send the program SIGTRAP to make it crash, then get the backtrace from the core dump. So, start the program:
./legacy_program
From another terminal:
pkill -TRAP legacy_program
The backtrace can be obtained like this:
gdb --batch -ex=bt ./legacy_program core

Preventing debugging session from pausing after each inferior exits

I'm debugging a tree of processes using gdb's very handy multiple-inferior support:
(gdb) set detach-on-fork off
(gdb) set schedule-multiple on
(gdb) set follow-fork-mode parent
(gdb) break PostgresMain
(gdb) break PostmasterMain
and now need to let things run until I hit one of the future breakpoints in some yet to be spawned inferior.
However, gdb seems to be "helpfully" pausing whenever an inferior exits normally, or at least blocking cleanup of the inferior so that its parent's wait() can return:
(gdb) c
[New process 16505]
process 16505 is executing new program: /home/craig/pg/bdr/bin/pg_config
Reading symbols from /home/craig/pg/bdr/bin/pg_config...done.
[Inferior 2 (process 16505) exited normally]
(gdb) info inferior
Num Description Executable
* 2 <null> /home/craig/pg/bdr/bin/pg_config
1 process 16501 /usr/bin/make
(gdb) inferior 1
[Switching to inferior 1 [process 16501] (/usr/bin/make)]
[Switching to thread 1 (process 16501)]
#0 0x0000003bc68bc502 in __libc_wait (stat_loc=0x7fffffffbc78) at ../sysdeps/unix/sysv/linux/wait.c:30
30 return INLINE_SYSCALL (wait4, 4, WAIT_ANY, stat_loc, 0,
(gdb)
so I have to endlessly:
(gdb) inferior 1
(gdb) c
to carry on. About 70 times, before I hit the desired breakpoint in a child of a child of a child.
I think what's happening is that gdb treats process exit as a stop event, and since non-stop is set to off (the default) it stops all threads in all inferiors when one thread stops. However, this inferior has terminated, it isn't a normal stop event, so you can't just cont it, you have to switch to another process first.
Is there some way to stop gdb pausing at each inferior exit? I would've expected follow-fork-mode parent with schedule-multiple on to do the trick, but gdb seems to still want to stop when an inferior exits.
I guess I'm looking for something like a "skip proc-exit", or a virtual signal I can change the handler policy on so it doesn't stop.
set non-stop on seems like it should be the right answer, but I suspect it's broken for multiple inferiors.
If I use non-stop on, then after the first exit trap, gdb's internal state indicates that inferior 1 is running:
(gdb) info inferior
Num Description Executable
* 1 process 20540 /usr/bin/make
(gdb) info thread
Id Target Id Frame
* 1 process 20540 "make" (running)
(gdb) cont
Continuing.
Cannot execute this command while the selected thread is running.
but the kernel sees it as blocked on ptrace_stop:
$ ps -o "cmd,wchan" -p 20540
CMD WCHAN
/usr/bin/make check ptrace_stop
... and it makes no progress until gdb is detached, or it's killed. Signals to the process are ignored, and interrupt in gdb has no effect.
I'm using GNU gdb (GDB) Fedora 7.7.1-18.fc20 on x86_64.
After stumbling on a post that references it in passing I found that the missing magic is set target-async on alongside set non-stop on.
non-stop mode, as expected, means gdb won't stop everything whenever an inferior exits. target-async is required to make it actually work correctly on gdb 7.7; it's the default on 7.8.
So the full incantation is:
set detach-on-fork off
set schedule-multiple on
set follow-fork-mode parent
set non-stop on
set target-async on
For 7.8, remove target-async on and, to reduce noise, add set print symbol-loading off.
The following Python extension to gdb will switch back to the first inferior and resume execution after each stop.
It feels like a total hack, but it works. When a process exits it sets a flag indicating that it stopped on an exit, then switches to the original process. gdb will then stop execution, delivering a stop event. We check to see if the stop was caused by our stop event and if so, we immediately continue.
The code also sets up the breakpoints I'm using and the multi-process settings, so I can just source thescript.py and run .
gdb.execute("set python print-stack full")
gdb.execute("set detach-on-fork off")
gdb.execute("set schedule-multiple on")
gdb.execute("set follow-fork-mode parent")
gdb.execute("set breakpoint pending on")
gdb.execute("break PostgresMain")
gdb.execute("break PostmasterMain")
gdb.execute("set breakpoint pending off")
def do_continue():
gdb.execute("continue")
def exit_handler(event):
global my_stop_request
has_threads = [ inferior.num for inferior in gdb.inferiors() if inferior.threads() ]
if has_threads:
has_threads.sort()
gdb.execute("inferior %d" % has_threads[0])
my_stop_request = True
gdb.events.exited.connect(exit_handler)
def stop_handler(event):
global my_stop_request
if isinstance(event, gdb.SignalEvent):
pass
elif isinstance(event, gdb.BreakpointEvent):
pass
elif my_stop_request:
my_stop_request = False
gdb.post_event(do_continue)
gdb.events.stop.connect(stop_handler)
There must be an easier way than this. It's ugly.

Send Ctrl-C to app in LLDB

I have an CLI app that is seg faulting during termination (After sending a Ctrl-C)
Pressing Ctrl-C in lldb naturally pauses execution.
Then I try:
(lldb)process signal SIGINT
(lldb)process continue
But that doesn't actually seem to do anything to terminate the app.
Also tried:
(lldb)process signal 2
The debugger uses ^C for interrupting the target, so it assumes that you don't actually want the ^C propagated to the target. You can change this behavior by using the "process handle" command:
(lldb) process handle SIGINT -p true
telling lldb to "pass" SIGINT to the process.
If you had stopped the process in lldb by issuing a ^C, then when you change the process handle as shown here and continue, that SIGINT will be forwarded to the process.
If you stopped for some other reason, after specifying SIGINT to be passed to the process, you can generate a SIGINT by getting the PID of the debugee using the process status and send a SIGINT directly to said process using platform shell:
(lldb) process status
(lldb) platform shell kill -INT {{PID from previous step}}
(lldb) continue
The easiest way I found was just to send the process a SIGINT directly. Take the pid of the debuggee process (which process status will show you) and run
kill -INT <pid>
from another terminal.

shell script process termination issue

/bin/sh -version
GNU sh, version 1.14.7(1)
exitfn () {
# Resore signal handling for SIGINT
echo "exiting with trap" >> /tmp/logfile
rm -f /var/run/lockfile.pid # Growl at user,
exit # then exit script.
}
trap 'exitfn; exit' SIGINT SIGQUIT SIGTERM SIGKILL SIGHUP
The above is my function in shell script.
I want to call it in some special conditions...like
when:
"kill -9" fires on pid of this script
"ctrl + z" press while it is running on -x mode
server reboots while script is executing ..
In short, with any kind of interrupt in script, should do some action
eg. rm -f /var/run/lockfile.pid
but my above function is not working properly; it works only for terminal close or "ctrl + c"
Kindly don't suggest to upgrade "bash / sh" version.
SIGKILL cannot be trapped by the trap command, or by any process. It is a guarenteed kill signal, that by it's definition cannot be trapped. Thus upgrading you sh/bash will not work anyway.
You can't trap kill -9 that's the whole point of it, to destroy processes violently that don't respond to other signals (there's a workaround for this, see below).
The server reboot should first deliver a signal to your script which should be caught with what you have.
As to the CTRL-Z, that also gives you a signal, SIGSTOP from memory, so you may want to add that. Though that wouldn't normally be a reason to shut down your process since it may be then put into the background and restarted (with bg).
As to what do do for those situations where your process dies without a catchable signal (like the -9 case), the program should check for that on startup.
By that, I mean lockfile.pid should store the actual PID of the process that created it (by using echo $$ >/var/run/myprog_lockfile.pid for example) and, if you try to start your program, it should check for the existence of that process.
If the process doesn't exist, or it exists but isn't the right one (based on name usually), your new process should delete the pidfile and carry on as if it was never there. If the old process both exists and is the right one, your new process should log a message and exit.

Resources