Why does vmlinux get SIGKILL when I try to run it? - linux-kernel

I have unpacked my vmlinuz into a vmlinux and tried to execute it, just to see what would happen. However, the binary gets SIGKILL on startup!
Why does this happen?
I was expecting a SIGILL (kernel tries to do something that is not allowed in user space) or a SIGSEGV (trying to access kernel memory not allowed in user-mode not allowed to access), but not a SIGKILL!
Is the process sending SIGKILL to itself, or is it being killed? GDB doesn't help -- the message is During startup program terminated with signal SIGKILL, Killed.
The max resident memory is only 412kB per /bin/time so the OOM killer is not the culprit. In fact, the SIGKILL is sent even if I have the OOM killer disabled by echo 2 | sudo dd of=/proc/sys/vm/overcommit_memory.

Related

Detaching Gdb without resuming the inferior

Gdb, like any other program, isn't perfect, and every now and then I encounter bugs that render the current Gdb instance unusable. At this point, if I have a debugging session with a lot of valuable state in the inferior, I'd like to be able to just start a new Gdb session on it. That is, detach, quit Gdb and start a new Gdb instance to restart where I left off.
However, when detaching Gdb, it resumes the inferior so that it continues running where it was, which ruins the point of the whole exercise. Therefore, I'm wondering if it's possible to detach in such a state that the inferior is as if it had been sent a SIGSTOP, basically.
I've tried simply killing Gdb, but interestingly, that seems to take the inferior with it. Not sure how that works.
when detaching Gdb, it resumes the inferior
GDB doesn't, the kernel does (assuming Linux).
I've tried simply killing Gdb, but interestingly, that seems to take the inferior with it
The kernel sends it SIGHUP, which normally kills the inferior. You can prevent that with either SIG_IGN in the inferior, or simply (gdb) call signal(1, 1).
After that, you can detach and quit GDB, but the kernel will resume the inferior with SIGCONT (see Update below), so you are back to square one.
However, there is a solution. Consider the following program:
int main()
{
while (1) {
printf("."); fflush(0); sleep(1);
}
}
gdb -q ./a.out
(gdb) run
Starting program: /tmp/a.out
.....^C
Program received signal SIGINT, Interrupt.
0x00007ffff7ad5de0 in __nanosleep_nocancel () at ../sysdeps/unix/syscall-template.S:81
81 ../sysdeps/unix/syscall-template.S: No such file or directory.
We want the program to not run away on detach, so we send it SIGSTOP:
(gdb) signal SIGSTOP
Continuing with signal SIGSTOP.
Program received signal SIGSTOP, Stopped (signal).
0x00007ffff7ad5de0 in __nanosleep_nocancel () at ../sysdeps/unix/syscall-template.S:81
81 in ../sysdeps/unix/syscall-template.S
(gdb) detach
Detaching from program: /tmp/a.out, process 25382
Note that at this point, gdb is detached (but still alive), and the program is not running (stopped).
Now in a different terminal:
gdb -q -ex 'set prompt (gdb2) ' -p 25382
0x00007ffff7ad5de0 in __nanosleep_nocancel () at ../sysdeps/unix/syscall-template.S:81
81 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb2) c
Continuing.
Program received signal SIGSTOP, Stopped (signal).
0x00007ffff7ad5de0 in __nanosleep_nocancel () at ../sysdeps/unix/syscall-template.S:81
81 in ../sysdeps/unix/syscall-template.S
(gdb2) sig 0
Continuing with no signal.
The program continues running, printing dots in the first terminal.
Update:
SIGHUP -- Interesting. By what mechanism, though?
Good question. I didn't know, but this appears to be the answer:
From setpgid man page:
If the exit of the process causes a process group to become orphaned,
and if any member of the newly orphaned process group is stopped,
then a SIGHUP signal followed by a SIGCONT signal will be sent to
each process in the newly orphaned process group.
I have verified that if I detach and quit GDB without stopping the inferior, it doesn't get SIGHUP and continues running without dying.
If I do send it SIGSTOP and arrange for SIGHUP to be ignored, then I see both SIGHUP and SIGCONT being sent in strace, so that matches the man page exactly:
(gdb) detach
Detaching from program: /tmp/a.out, process 41699
In another window: strace -p 41699. Back to GDB:
(gdb) quit
strace output:
--- stopped by SIGSTOP ---
--- SIGHUP {si_signo=SIGHUP, si_code=SI_KERNEL} ---
--- SIGCONT {si_signo=SIGCONT, si_code=SI_KERNEL} ---
restart_syscall(<... resuming interrupted call ...>) = 0
write(1, ".", 1.) = 1
...

how to close gdb connection without stopping running program

Is there a way to exit from gdb connnection without stopping / exiting running program ? I need that running program continues after gdb connection closed.
Is there a way to exit from gdb connnection without stopping / exiting running program ?
(gdb) help detach
Detach a process or file previously attached.
If a process, it is no longer traced, and it continues its execution. If
you were debugging a file, the file is closed and gdb no longer accesses it.
List of detach subcommands:
detach checkpoint -- Detach from a checkpoint (experimental)
detach inferiors -- Detach from inferior ID (or list of IDS)
Type "help detach" followed by detach subcommand name for full documentation.
Type "apropos word" to search for commands related to "word".
Command name abbreviations are allowed if unambiguous.
Since the accepted (only other) answer does not specifically address how to shut down gdb without stopping the program under test, I'm throwing my hat into the ring.
Option 1
Kill the server from the terminal in which it's running by holding Ctrl+c.
Option 2
Kill the gdb server and/or client from another terminal session.
$ ps -u username | grep gdb
667511 pts/6 00:00:00 gdbserver
667587 pts/7 00:00:00 gdbclient
$ kill 667587
$ kill 667511
These options are for a Linux environment. A similar approach (killing the process) would probably also work in Windows.

How to pause the execution of a program after 10 seconds and get a backtrace?

A legacy program most likely gets into an infinite loop on certain pathological inputs. I have >1000 such instances, however, I suspect that the vast majority of them trigger the same bug. Therefore, I would like to reduce the >1000 instances to the fundamentally different ones. The first step is to pause the application after, say, 10 seconds and collect the backtrace.
If I run:
gdb --batch --command=backtrace.txt --args ./legacy_program
with backtrace.txt
run
bt
and I hit Ctrl + C after 10 seconds in the same terminal I get exactly the backtrace I want.
Now, I would like to do that automatically. I have tried sending SIGINT (the expected equivalent of Ctrl + C) from another terminal but I do not get the backtrace anymore. Here are some of my failed attempts based on
GDB how to stop execution without a breakpoint?
Neither of these have any effect:
pkill -SIGINT gdb
kill -SIGINT 5717
where 5717 is the pid of the only gdb running. Sending SIGINT to the legacy_program the same way does kill it but then I do not get the backtrace:
Program received signal SIGINT, Interrupt.
Quit
How can I programmatically pause the execution of the legacy_program after 10 seconds and get a backtrace?
This post was motivated by my frustration not being able to find an answer to this question here at StackOverflow.
Also note that
[it is not merely OK to ask and answer your own question, it is explicitly encouraged.](https://blog.stackoverflow.com/2011/07/its-ok-to-ask-and-answer-your-own-questions/)
Apparently, it is a known (bug) feature in gdb, see
GDB is not trapping SIGINT. Ctrl+C terminates program when should break gdb. Try sending SIGSTOP instead from the other terminal:
pkill -STOP legacy_program
It works on my machine.
Note that you do not have to run the legacy_program in the debugger. Enable core dumps
ulimit -c unlimited
and send the program SIGTRAP to make it crash, then get the backtrace from the core dump. So, start the program:
./legacy_program
From another terminal:
pkill -TRAP legacy_program
The backtrace can be obtained like this:
gdb --batch -ex=bt ./legacy_program core

Send Ctrl-C to app in LLDB

I have an CLI app that is seg faulting during termination (After sending a Ctrl-C)
Pressing Ctrl-C in lldb naturally pauses execution.
Then I try:
(lldb)process signal SIGINT
(lldb)process continue
But that doesn't actually seem to do anything to terminate the app.
Also tried:
(lldb)process signal 2
The debugger uses ^C for interrupting the target, so it assumes that you don't actually want the ^C propagated to the target. You can change this behavior by using the "process handle" command:
(lldb) process handle SIGINT -p true
telling lldb to "pass" SIGINT to the process.
If you had stopped the process in lldb by issuing a ^C, then when you change the process handle as shown here and continue, that SIGINT will be forwarded to the process.
If you stopped for some other reason, after specifying SIGINT to be passed to the process, you can generate a SIGINT by getting the PID of the debugee using the process status and send a SIGINT directly to said process using platform shell:
(lldb) process status
(lldb) platform shell kill -INT {{PID from previous step}}
(lldb) continue
The easiest way I found was just to send the process a SIGINT directly. Take the pid of the debuggee process (which process status will show you) and run
kill -INT <pid>
from another terminal.

device /dev/ttyusb0 lock failed: operation not permitted

I was playing around with a router earlier this evening using minicom and I must not have closed it cleanly. Here is the error message that I get when I try to open minicom:
device /dev/ttyusb0 lock failed: operation not permitted
I have two questions, 1) how would I go about getting out of this state, and 2) how do I exit minicom cleanly so that I can avoid this happening again.
I found I was able to fix the situation on my CentOS box by running minicom -S <device> -o and the do the normal exit key sequence (CTRL-a, x).
In your situation it would have been
sudo minicom -S ttyusb0 -o
This cleared the lock files minicom had placed in /var/lock/
Good luck
Ash
I ran into a similar issue with using gtkterm from a remote terminal. I had shutdown the terminal without explicitly terminating gtkterm. The result was that subsequent gtkterm sessions gave me the error:
Device /dev/ttyUSB0 is locked.
Checking the process list via ps did not show any gtkterm processes still running.
I corrected this by simply deleting /run/lock/LCK..ttyUSB0. After doing that, gtkterm was able to open ttyUSB0 successfully.
[root#edge-tc lock]# minicom'
Device /dev/ttyUSB0 lock failed: Operation not permitted.'
Solution:'
Check the process which have locked and kill the process'
[root#edge-tc lock]# fuser /dev/ttyUSB0'
/dev/ttyUSB0: 18328
[root#edge-tc lock]# kill -9 18328
[root#edge-tc lock]#'
[root#edge-tc lock]#'
[root#edge-tc lock]# minicom'
Welcome to minicom 2.1'
The canonical way is to use lockdev. This manages the lock files on a per-device basis in /run/lock/lockdev/ (at least under CentOS 7.x).
lockdev <device> can be used without being root, and returns non-zero if the device has already been locked, in which case it can be unlocked with lockdev -u <device>.
This is apparently obsolete these days, but minicom (at least as of version 2.6.2) still uses it.

Resources