SIGTERM signal handling confusion - bash

I am running a program which invokes a shell script (for discussion, sh1 with pid 100).
This script in turn invokes another script (for discussion, sh2 with pid 101) and waits for it to finish. sh2 (the child script) takes about 50 seconds to finish.
This is how I invoke sh2: /bin/sh2.sh
While waiting for the child to finish, I try to terminate sh1 (using kill -15 100). I have a handler function in sh1 to handle this signal. However, I observe that sh1 (the parent script) does not terminate until the child is done with its work (50 seconds), and only then is the signal handled.
I modified my child script to take 30 seconds to finish, and I observe that after sending SIGTERM to sh1, it takes around 30 seconds to terminate.
Is this the expected behavior when handling SIGTERM, that is, the parent stays blocked by the child process and only handles the signal afterwards? Doesn't the process get interrupted for signal handling?
Signal handling in the parent script:
function clean_up()
{
    # Do the cleanup
    :
}
trap "clean_up $$; exit 0" TERM

If sh1 invokes sh2 and waits for it to finish, then it doesn't run the trap for the signal until after sh2 finishes. That is, if this is sh1:
#!/bin/sh
trap 'echo caught signal delayed' TERM
sh2
then sh1 will catch the signal and do nothing until sh2 finishes, and then it will execute the trap. If you want the trap to fire as soon as the signal is sent, you can run sh2 asynchronously and wait for it explicitly:
#!/bin/sh
trap 'echo caught signal' TERM
sh2&
wait
Unfortunately, wait returns as soon as the trap fires and is not re-entered. If you need to continue waiting, there is no fully reliable way to do it, but you can get close with:
#!/bin/sh
trap 'echo caught signal' TERM
sh2&
(exit 129) # prime the loop (e.g., set $?) to simulate do/while
while test $? -gt 128; do wait; done
This isn't reliable because you can't tell the difference between catching a signal yourself and sh2 being terminated by a signal. If you need this to be reliable, you should re-write sh1 in a language which allows better control of signals.
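One way to narrow the ambiguity described above, offered as a rough sketch rather than a complete fix, is to have the trap set a flag so that the loop re-enters wait only when the shell's own handler actually ran. In the sketch below, sleep 1 stands in for sh2, the background subshell that sends TERM simulates an external kill -15, and the names self, got_term and interrupted are illustrative, not taken from the original scripts. If the child is itself terminated by a signal at nearly the same moment the shell receives one, the two cases can still be confused, so this does not make the approach fully reliable.

```shell
#!/bin/sh
# Sketch: use a flag set by the trap to tell "wait was interrupted by my
# own TERM handler" apart from "the child itself ended with status > 128".

self=${BASHPID:-$$}      # PID of the shell running this code
got_term=0
interrupted=0
trap 'got_term=1' TERM

sleep 1 &                # stand-in for sh2
child=$!

# Simulates an external `kill -15` arriving while we are waiting.
( sleep 0.2; kill -s TERM "$self" ) &

while :; do
    got_term=0
    wait "$child"
    status=$?
    # Status > 128 plus the flag means our own trap cut the wait short.
    if [ "$status" -gt 128 ] && [ "$got_term" -eq 1 ]; then
        interrupted=$((interrupted + 1))
        continue         # go back and keep waiting for the child
    fi
    break
done

echo "child exit status: $status (wait interrupted $interrupted time(s))"
```

Any real handling of the signal (cleanup, logging) would go in the trap or in the branch that currently just counts the interruption.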

Related

Why may USR1 signals sent from background jobs in a Bash script not be reliably received by the parent shell process waiting for their completion?

I have a Bash script running a bunch of background jobs in parallel.
Under certain conditions, before a background job completes, it sends
a USR1 signal to the spawning Bash process (say, to inform that
some process that was run as a part of the job had terminated with
a nonzero exit code).
In a simplified form, the script is equivalent to the one shown below.
Here, for simplicity, each background job always sends a USR1 signal
before completion, unconditionally (via the signalparent() function).
signalparent() { kill -USR1 $$; }
handlesignal() { echo 'USR1 signal caught' >&2; }
trap handlesignal USR1
for i in {1..10}; do
{
sleep 1
echo "job $i finished" >&2
signalparent
} &
done
wait
When I run the above script (using Bash 3.2.57 on macOS 11.1, at least),
I observe some behavior that I cannot explain, which makes me think
that there is something in the interplay of Bash job management and
signal trapping that I overlook.
Specifically, I would like to acquire an explanation for the following
behaviors.
Almost always, when I run the script, I see fewer “signal caught”
lines in the output (from the handlesignal() function) than there
are jobs started in the for-loop—most of the time it is one to
four of those lines that are printed for ten jobs being started.
Why is it that, by the time the wait call completes, there
are still background jobs whose signaling kill commands had
not been yet executed?
At the same time, every so often, in some invocations of the script,
I observe the kill command (from the signalparent() function)
report an error regarding the originating process running the script
(i.e., the one with the $$ PID) no longer being present—see the
output below.
How come there are jobs whose signaling kill commands are still
running while the parent shell process had already terminated?
It was my understanding that it is impossible for the parent
process to terminate before all background jobs do, due to the
wait call.
job 2 finished
job 3 finished
job 5 finished
job 4 finished
job 1 finished
job 6 finished
USR1 signal caught
USR1 signal caught
job 10 finished
job 7 finished
job 8 finished
job 9 finished
bash: line 3: kill: (19207) - No such process
bash: line 3: kill: (19207) - No such process
bash: line 3: kill: (19207) - No such process
bash: line 3: kill: (19207) - No such process
Both of these behaviors suggest to me the presence of a race condition
of some kind, whose origins I do not quite understand. I would
appreciate it if anyone could enlighten me on those, and perhaps even
suggest how the script could be changed to avoid such race conditions.
This is explained in the Bash Reference Manual as follows.
When bash is waiting for an asynchronous command via the wait builtin, the reception of a signal for which a trap has been set will cause the wait builtin to return immediately with an exit status greater than 128, immediately after which the trap is executed.
So, you need to repeat wait until it returns 0 to make sure all background jobs have terminated, e.g.:
until wait; do
:
done
It was my understanding that it is impossible for the parent process to terminate before all background jobs do, due to the wait call.
That is a misunderstanding; wait may return because a signal with a trap set was received while jobs are still running in the background, and that may lead to normal completion of the program, with the side effect of leaving those jobs orphaned.
Regarding ‘Almost always, when I run the script, I see fewer “signal caught” lines in the output’—
According to signal(7):
Standard signals do not queue. If multiple instances of a standard signal are generated while that signal is blocked, then only one instance of the signal is marked as pending (and the signal will be delivered just once when it is unblocked).
One way to change your script so that the signals do not arrive at the same time is as follows:
signalparent() {
kill -USR1 $$
}
ncaught=0
handlesignal() {
(( ++ncaught ))
echo "USR1 signal caught (#=$ncaught)" >&2
}
trap handlesignal USR1
for i in {1..10}; do
{
sleep $i
signalparent
} &
done
nwaited=0
while (( nwaited < 10 )); do
wait && (( ++nwaited ))
done
Here is the output of the modified script with Bash 5.1 on macOS 10.15:
USR1 signal caught (#=1)
USR1 signal caught (#=2)
USR1 signal caught (#=3)
USR1 signal caught (#=4)
USR1 signal caught (#=5)
USR1 signal caught (#=6)
USR1 signal caught (#=7)
USR1 signal caught (#=8)
USR1 signal caught (#=9)
USR1 signal caught (#=10)
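If the goal is only to learn which jobs failed, a different technique avoids signals entirely: record each job's PID and collect its exit status with wait. This is not what the script above does; it is a minimal sketch under the assumption that a job can report failure through its exit status. The rule that odd-numbered jobs fail is invented purely for illustration, as are the names pids and failed.

```shell
#!/bin/sh
# Sketch: report job failures through exit statuses instead of USR1.

pids=""
for i in 1 2 3 4 5; do
    (
        sleep 0.1
        # Hypothetical failure condition: odd-numbered jobs fail.
        [ $((i % 2)) -eq 0 ]
    ) &
    pids="$pids $!"      # remember the PID of each background job
done

failed=0
for pid in $pids; do
    # wait PID returns that job's exit status, even if it already exited.
    wait "$pid" || failed=$((failed + 1))
done

echo "$failed job(s) failed"
```

Because no trap is involved, each wait blocks until its job has finished, and no notification can be lost to signal coalescing.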

How can I send a signal without the shell waiting for the currently running program to finish?

If I send a signal using kill, the shell seems to wait until the current program (in this example sleep 1000) finishes running. When I instead send SIGINT by pressing Ctrl+C in the shell, it receives the interrupt immediately.
What I want is for the interrupt to be received immediately after sending the signal via kill. Also, why does it behave the way I want when I press Ctrl+C?
#!/usr/bin/env sh
int_after_a_while() {
local pid=$1
sleep 2
echo "Attempting to kill $pid with SIGINT"
# Here I want to kill the process immediately, but it waits until sleep finishes
kill -s INT $pid
}
trap "echo Interrupt received!" INT
int_after_a_while $$ &
sleep 1000
I would appreciate any help on this issue. Thanks in advance!
As noted in the referenced answer (https://unix.stackexchange.com/questions/282525/why-did-my-trap-not-trigger/282631#282631), the shell will normally wait for a utility to complete before running a trap. Some alternatives are:
Start the long-running process in the background, then wait for it using the wait builtin. When a trapped signal is received during such a wait, the wait is interrupted and the trap is taken. Unfortunately, the exit status of wait does not distinguish between the child process exiting on a signal and a trap occurring. For example:
sleep 1000 &
p=$!
wait "$p"
Send a signal to the whole process group via kill -s INT 0. The effect is much like if the user had pressed Ctrl+C, but may be more extreme than you want if your script is run from another script.
Use a shell such as zsh or FreeBSD sh that supports set -o trapsasync, which allows running traps while waiting for a foreground job.
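The ambiguity mentioned in the first alternative can be seen in a small experiment. This is a sketch, not part of the referenced answer: it uses TERM rather than INT, because background children of a non-interactive shell start with INT ignored and would not be terminated by it, and the variable names are illustrative. The value 143 assumes SIGTERM is signal 15, as it is on Linux, macOS and the BSDs.

```shell
#!/bin/sh
# Sketch: wait reports the same status whether the trap interrupted it
# or the child was terminated by the signal.

self=${BASHPID:-$$}      # PID of the shell running this code
trap 'echo "trap ran"' TERM

# Case A: the shell itself receives TERM while waiting on a healthy child.
sleep 2 &
p=$!
( sleep 0.2; kill -s TERM "$self" ) &   # simulates an external kill
wait "$p"
status_trap=$?
kill -s KILL "$p" 2>/dev/null           # clean up the still-running child
wait "$p" 2>/dev/null

# Case B: no signal reaches the shell; the child is terminated by TERM.
sleep 2 &
p=$!
sleep 0.2                               # give the child time to start
kill -s TERM "$p"
wait "$p"
status_child=$?

echo "wait interrupted by trap: $status_trap"
echo "child killed by TERM:     $status_child"
```

Both lines report 128 plus the signal number, so the status alone cannot tell the two cases apart.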

Handling signals in bash script when it started as a background process

Okay, I have a script like this:
trap 'echo "CTRL-C signal was caught!" ' SIGINT
for ((i=0; i<15; i++))
do
sleep 3
done
When I start my script in the usual way, it reacts to Ctrl-C immediately and prints "CTRL-C signal was caught!", even while a sleep 3 command is running. But when I run my script as a background process, it waits until the sleep 3 command has finished and only then prints "CTRL-C signal was caught!"
I do not understand this. I thought the trap should always wait until the previous command has finished before printing anything, as it does when the script is started as a background process.
Bash manual states:
Background processes (...) are immune to keyboard-generated signals.
If bash is waiting for a command to complete and receives a signal for which a trap has been set, the trap will not be executed until the command completes.
Consequently:
If your script runs in the foreground: when you press "Ctrl-C", SIGINT is sent to every process in the foreground process group, that is, to both the script and the currently running sleep command. sleep is terminated immediately, its exit status tells Bash that it was interrupted by SIGINT, and Bash runs your trap.
If your script runs in the background, then the backgrounded sleep does not receive the signal and the SIGINT trap is only executed once sleep has ended.

bash trap propagated to command with custom signal handler

In my script I'm trapping signals in the usual way.
function on_stop {
echo 'On Stop'
sleep 10
echo 'Signalling others to exit'
trap - TERM EXIT INT
kill -s INT "$$"
}
./executable_with_custom_signal_handling &
pid=$!
trap 'on_stop' TERM EXIT INT
wait
If sleep is used instead of ./executable_with_custom_signal_handling, everything works as expected. Otherwise, ./executable_with_custom_signal_handling receives the signal immediately, in parallel with on_stop.
Does this have something to do with the custom signal handling in the executable?
signal(SIGINT, handler)
Any workarounds known?
By default, the shell runs backgrounded commands with SIGINT (and SIGQUIT) ignored.
Your backgrounded sleep is not interrupted by the Ctrl-C SIGINT to the process group, then, because it never sees the signal. When your custom executable installs a new signal action, replacing SIG_IGN, that executable will receive the SIGINT.
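The effect described above can be observed with a short experiment, sketched here under the assumption that it runs as a non-interactive script (job control off). The variable names are illustrative. A background sleep is sent INT, which it ignores, and is then ended with TERM; the exit status reported by wait shows which signal actually terminated it.

```shell
#!/bin/sh
# Sketch: a background job of a non-interactive shell ignores SIGINT.

sleep 5 &
pid=$!

sleep 0.2                 # give the child time to start
kill -s INT "$pid"        # ignored: the job was started with SIGINT set to SIG_IGN
sleep 0.2
kill -s TERM "$pid"       # TERM still has its default action
wait "$pid"
status=$?

# 143 (128 + 15) means TERM ended the process, so INT had no effect;
# 130 (128 + 2) would mean INT terminated it.
case $status in
    143) echo "INT was ignored; TERM terminated the job" ;;
    130) echo "INT terminated the job" ;;
    *)   echo "unexpected status: $status" ;;
esac
```

A program that installs its own SIGINT handler, like the custom executable in the question, replaces the inherited SIG_IGN and would react to the first kill instead.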

How does trap / kill work in bash on Linux?

My sample file
traptest.sh:
#!/bin/bash
trap 'echo trapped' TERM
while :
do
sleep 1000
done
$ traptest.sh &
[1] 4280
$ kill %1 <-- kill by job number works
Terminated
trapped
$ traptest.sh &
[1] 4280
$ kill 4280 <-- kill by process id doesn't work?
(sound of crickets, process isn't killed)
If I remove the trap statement completely, kill process-id works again?
Running some RHEL 2.6.18-194.11.4.el5 at work. I am really confused by this behaviour, is it right?
kill [pid]
sends the TERM signal only to the specified PID.
kill %1
sends the TERM signal to the entire process group of job #1, in this case to the script's PID and its children (sleep).
I've verified that with strace on both the sleep process and the script process.
Anyway, someone got a similar problem here (but with SIGINT instead of SIGTERM): http://www.vidarholen.net/contents/blog/?p=34.
Quoting the most important sentence:
kill -INT %1 sends the signal to the job’s process group, not the backgrounded pid!
This is expected behavior. The default signal sent by kill is SIGTERM, which your trap catches. Consider this:
#!/bin/bash
# traptest.sh
trap "echo Booh!" SIGINT SIGTERM
echo "pid is $$"
while : # This is the same as "while true".
do
a=1
done
(sleep really creates a new process and the behavior is clearer with my example I guess).
So if you run traptest.sh in one terminal and kill TRAPTEST_PROCESS_ID from another terminal, output in the terminal running traptest will be Booh! as expected (and the process will NOT be killed). If you try sending kill -s HUP TRAPTEST_PROCESS_ID, it will kill the traptest process.
This should clear up the %1 confusion.
Note: the code example is taken from tldp
Davide Berra explained the difference between kill %<jobspec> and kill <PID>, but not how that difference results in what you observed. After all, Unix signal handlers should be called pretty much instantaneously, so why does sending a SIGTERM to the script alone not trigger its trap handler?
The bash man page explains why, in the last paragraph of the SIGNALS section:
If bash is waiting for a command to complete and receives a signal for
which a trap has been set, the trap will not be executed until the
command completes.
So, the signal was delivered immediately, but the handler execution was deferred until sleep exited.
Hence, with kill %<jobspec>:
Both the script and sleep received SIGTERM
bash registered the signal, noticed that a trap was set for it, and queued the handler for future execution
sleep exited immediately
bash noted sleep's exit, and ran the trap handler
whereas with kill <script_PID>:
Only the script received SIGTERM
bash registered the signal, noticed that a trap was set for it, and queued the handler for future execution
sleep exited after 1000 seconds
bash noted sleep's exit, and ran the trap handler
Obviously, you didn't wait long enough to see that last bit. :)
If you're interested in the gory details, download the bash source code and look in trap.c, specifically the trap_handler() and run_pending_traps() functions.
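The deferral can also be measured directly. The following is a minimal sketch, not taken from the answers above: a helper subshell sends TERM to the script about 0.2 seconds after it starts a foreground sleep 2, and the trap records when it actually runs. It assumes a date that supports +%s (GNU and BSD date do), and the names self, start and delay are illustrative.

```shell
#!/bin/sh
# Sketch: show that a trap is deferred until the foreground command exits.

self=${BASHPID:-$$}      # PID of the shell running this code
start=$(date +%s)
delay=unset
# Single quotes: the arithmetic is evaluated when the trap runs.
trap 'delay=$(( $(date +%s) - $start ))' TERM

( sleep 0.2; kill -s TERM "$self" ) &   # signal arrives early in the sleep

sleep 2      # foreground command: the pending trap runs only after it exits

echo "trap ran about $delay second(s) after start"
wait         # reap the helper subshell
```

Although the signal is delivered after roughly 0.2 seconds, the reported delay is at least the full duration of the foreground sleep.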

Resources