Prevent SIGINT from interrupting current task while still passing information about SIGINT (and preserve the exit code) - shell

I have quite a long shell script and I'm trying to add signal handling to it.
The main task of the script is to run various programs and then clean up their temporary files.
I want to trap SIGINT.
When the signal is caught, the script should wait for the current program to finish execution, then do the cleanup and exit.
Here is an MCVE:
#!/bin/sh
stop_this=0
trap 'stop_this=1' 2
while true ; do
result="$(sleep 2 ; echo success)" # run some program
echo "result: '$result'"
echo "Cleaning up..." # clean up temporary files
if [ $stop_this -ne 0 ] ; then
echo 'OK, time to stop this.'
break
fi
done
exit 0
The expected result:
Cleaning up...
result: 'success'
Cleaning up...
^Cresult: 'success'
Cleaning up...
OK, time to stop this.
The actual result:
Cleaning up...
result: 'success'
Cleaning up...
^Cresult: ''
Cleaning up...
OK, time to stop this.
The problem is that the currently running instruction (result="$(sleep 2 ; echo success)" in this case) is interrupted.
What can I do to make it behave more as if I had set trap '' 2?
I'm looking for either a POSIX solution or one that is supported by most of shell interpreters (BusyBox, dash, Cygwin...)
I have already seen the answers to "Prevent SIGINT from closing child process in bash script", but they don't really work for me. All of those solutions require modifying each line that shouldn't be interrupted. My real script is quite long and much more complicated than the example, so I would have to modify hundreds of lines.

You need to prevent the SIGINT from going to the echo in the first place (or rewrite the command that you are running in the variable assignment to ignore SIGINT). Also, you need to allow the variable assignment to happen, and it appears that the shell aborts the assignment when it receives the SIGINT. If you're only worried about user-generated SIGINT from the tty, you need to disassociate that command from the tty (e.g., get it out of the foreground process group) and prevent the SIGINT from aborting the assignment. You can (almost) accomplish both of those with:
#!/bin/sh
stop_this=0
while true ; do
trap 'stop_this=1' INT
{ sleep 1; echo success > tmpfile; } & # run some program
while ! wait; do : ; done
trap : INT
result=$(cat tmpfile& wait)
echo "result: '$result'"
echo "Cleaning up..." # clean up temporary files
if [ $stop_this -ne 0 ] ; then
echo 'OK, time to stop this.'
break
fi
done
exit 0
If you're worried about SIGINT from another source, you'll have to re-implement sleep (or whatever command sleep is a proxy for) to handle SIGINT the way you want. The key here is to run the command in the background and wait for it, which prevents the SIGINT from reaching it and terminating it early. Note that we've opened at least two new cans of worms here. By waiting in a loop, we're effectively ignoring any errors that the subcommand might raise (we're doing this to emulate SA_RESTART-style restarting of the interrupted wait), so the script may potentially hang. Also, if the SIGINT arrives during the cat, we have attempted to prevent the cat from aborting by running it in the background, but the variable assignment will still be terminated and you'll get your original behavior. Signal handling is not clean in the shell! But this gets you closer to your desired goal.

Sighandling in shell scripts can get clumsy. It's pretty much impossible to
do it "right" without the support of C.
The problem with:
result="$(sleep 2 ; echo success)" # run some program
is that $() creates a subshell, and in subshells non-ignored signals (trap '' SIGNAL is how you ignore SIGNAL)
are reset to their default dispositions, which for SIGINT is to terminate the process
($( ) gets its own process, though it will receive the signal too, because the terminal-generated SIGINT
is targeted at the whole process group).
To prevent this, you could do something like:
result="$(
trap '' INT #ignore; could get killed right before the trap command
sleep 2; echo success)"
or
result="$( trap : INT; #no-op handler; same problem
sleep 2; while ! echo success; do :; done)"
but as noted, there will be a small race-condition window between the start of the
subshell and the registration of the signal handler during which
the subshell could get killed by the reset-to-default SIGINT signal.
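One way to avoid that window, sketched here under the assumption that it is acceptable to lose the information that Ctrl+C was pressed while the command runs, is to ignore SIGINT in the parent before the subshell is created. Ignored dispositions are inherited by child processes, so the subshell never has the default disposition. The inner sh -c call is only a stand-in that simulates a SIGINT aimed at the subshell, and the sleep from the question is left out.

```sh
#!/bin/sh
# Sketch only: ignore SIGINT in the parent *before* the subshell is forked,
# so the ignored disposition is inherited and there is no window in which
# the subshell still has the default (terminating) disposition.
trap '' INT

# The inner "sh -c" only simulates a SIGINT aimed at the $( ) subshell,
# which is its parent process.
result="$(sh -c 'kill -INT "$PPID"'; echo success)"

# Restore the default disposition once the critical command is done.
trap - INT

echo "result: '$result'"
```

With the first trap line removed, the simulated SIGINT would terminate the subshell before echo runs and result would stay empty.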

Both answers, from @PSkocik and @WilliamPursell, have helped me get on the right track.
I have a fully working solution. It ain't pretty, because it needs an external file to indicate that the signal hasn't occurred, but besides that it should work reliably.
#!/bin/sh
touch ./continue
trap 'rm -f ./continue' 2
( # the whole main body of the script is in a separate background process
trap '' 2 # ignore SIGINT
while true ; do
result="$(sleep 2 ; echo success)" # run some program
echo "result: '$result'"
echo "Cleaning up..." # clean up temporary files
if [ ! -e ./continue ] ; then # exit the loop if file "./continue" is deleted
echo 'OK, time to stop this.'
break
fi
done
) & # end of the main body of the script
while ! wait ; do : ; done # wait for the background process to end (ignore signals)
wait $! # wait again to get the exit code
result=$? # exit code of the background process
rm -f ./continue # clean up if the background process ended without a signal
exit $result
EDIT: There are some problems with this code in Cygwin.
The main functionality regarding signals works.
However, it seems that the finished background process doesn't stay in the system as a zombie. This makes wait $! fail, and the script exits with the incorrect code 127.
A solution would be to remove the lines wait $! and result=$?, so that the script always returns 0.
It should also be possible to keep the proper exit code by using another layer of subshell and temporarily storing the exit code in a file.
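The idea of storing the exit code in a file could look roughly like the sketch below. The status file name and the main_body function are hypothetical stand-ins, and the file "./continue" handling from the script above is left out to keep the example short. A real script would finish with exit "$status".

```sh
#!/bin/sh
# Sketch only: the status file name is a hypothetical choice.
status_file="${TMPDIR:-/tmp}/main_body.status.$$"

main_body() {
    # Stand-in for the real body of the script; pretend it failed with 3.
    return 3
}

trap : INT                       # no-op handler: lets wait be interrupted
(
    trap '' INT                  # the body itself ignores SIGINT
    ( main_body )                # extra subshell layer runs the body...
    echo "$?" > "$status_file"   # ...and its exit code is stored in a file
) &
while ! wait ; do : ; done       # same waiting loop as in the script above

status="$(cat "$status_file")"   # read the code back instead of "wait $!"
rm -f "$status_file"
echo "background body exited with $status"
```

Because the code is read from the file, it no longer matters whether the finished background process can still be reaped with wait $!.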

For disallowing interrupting the program:
trap "" ERR HUP INT QUIT TERM TSTP TTIN TTOU
But if a sub-command handles traps by itself, and that command must really complete, you need to prevent passing signals to it.
For people on Linux that don't mind installing extra commands, you can just use:
waitFor [command]
Alternatively, you can adapt the latest source code of waitFor into your program as needed, or use the code from Gilles' answer, although that has the disadvantage of not benefiting from upstream updates.
Just mind that other terminals and the service manager can still terminate "command". If you want the service manager to be unable to close "command", it shall be run as a service with the appropriate kill mode and kill signal set.

You may want to adapt the following:
#!/bin/sh
tmpfile=".tmpfile"
rm -f $tmpfile
trap : INT
# put the action that should not be interrupted in the innermost brackets
# | |
( set -m; (sleep 10; echo success > $tmpfile) & wait ) &
wait # wait will be interrupted by Ctrl+c
while [ ! -r $tmpfile ]; do
echo "waiting for $tmpfile"
sleep 1
done
result=`cat $tmpfile`
echo "result: '$result'"
This seems also to work with programs that install their own SIGINT handler like mpirun and mpiexec and so on.

Related

Bash script: how to give an alert when current program is killed

I'm trying to write a program using bash script. I'd like to give an alert when this program is killed.
The desired action is like this:
#!/bin/bash
... # The original program
if killed ; then
echo "trying to kill the demo program ... "
sleep 5s
echo "demo program killed"
fi
If you expect the signal to be delivered only to the running program and not to the shell running your script, then the basic synopsis might be:
#!/bin/bash
set -euo pipefail
sleep 1 & # The original program
pid="$!"
kill -9 "$pid" # Pick your lethal signal
wait -n "$pid" && status=0 || status="$?"
((status > 128)) && echo "${pid} got signal $((status - 128))" 1>&2 || :
Here we presumably run the program in the background so that we can send it the kill signal from the same snippet. In practice you would probably run it in the foreground and then check its $? return status instead of the status from wait -n.
If the killing signal is delivered to your entire process group, including the shell running your script, that is a different story. For the signal KILL (9) in particular, there is no way to mask it or report it. When the shell gets it, it dies. For other signals you could set up a trap command (see man bash for its syntax) to handle the signal gracefully in the script while still being able to detect and report the child process’ death from the signal.
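A minimal sketch of that last case follows, assuming SIGTERM as the "other signal". The script signals both the child and itself to simulate delivery to the whole group, and the variable names are illustrative. The child is started before the trap is installed so that it begins with the default disposition.

```sh
#!/bin/sh
# Sketch only: SIGTERM stands in for "other signals"; names are illustrative.
sleep 30 &                        # the "original program"
pid=$!

got_signal=""
trap 'got_signal=TERM' TERM       # the script itself survives SIGTERM

# Simulate a signal delivered to both the child and the script.
kill -TERM "$pid"
kill -TERM "$$"

# A trapped signal can make wait return early, so keep waiting until the
# child is really gone; the last status then belongs to the child.
while : ; do
    wait "$pid"
    status=$?
    kill -0 "$pid" 2>/dev/null || break
done

if [ "$status" -gt 128 ]; then
    echo "child $pid died from signal $((status - 128))" >&2
fi
if [ -n "$got_signal" ]; then
    echo "script also received SIG$got_signal" >&2
fi
```

The loop around wait matters when the signal arrives while the script is already waiting: the first wait then reports the interruption, not the child's status.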

Why does an asynchronous child become a zombie although the parent waits for it?

I use the following code to start some long running task asynchronously but detect if it fails at the very beginning:
sleep 0.3 &
long_running &
wait -n
# [Error handling]
# Do other stuff.
# Wait for completion of 'long_running'.
wait -n
# [Error handling]
If I send SIGINT (using Ctrl+C) to the script while it is waiting for the long-running child, the long-running task just continues and becomes a zombie after completion.
Furthermore, the parent script consumes full CPU. I have to SIGKILL the parent to get rid of the processes.
I know that SIGINT is ignored by the child (which is probably the reason it continues until completion), but why does the parent get into such a confusing state?
It works (as expected) if I kill the child when SIGINT has been received (the commented-out trap below), but I want to understand why it does not work the other way.
Below is the complete script. Please refer also to https://gist.github.com/doak/08b69c500c91a7fade9f2c61882c93b4 for an even more complete example/try-out:
#!/usr/bin/env bash
count="count=100000" # Adapt that 'dd' lasts about 3s. Comment out to run forever.
#fail=YES # Demonstrates failure of background task.
# This would work.
#trap "jobs -p | xargs kill" SIGINT
echo executing long running asynchronous task ...
sleep 0.3 &
dd if=/dev/zero$fail of=/dev/null bs=1M $count &
wait -n
errcode=$?
if test $errcode -ne 0; then
echo "failed"
exit $errcode
fi
echo waiting for completion ...
wait -n
errcode=$?
echo finished
exit $errcode
It could be that my question is related to this C question, although it discusses the system call wait(): Possible for parent process to HANG on "wait" step if child process becomes a ZOMBIE or CRASHES?

Basic signal communication

I have a bash script, its contents are:
function foo {
echo "Foo!"
}
function clean {
echo "exiting"
}
trap clean EXIT
trap foo SIGTERM
echo "Starting process with PID: $$"
while :
do
sleep 60
done
I execute this on a terminal with:
./my_script
And then do this on another terminal
kill -SIGTERM my_script_pid # obviously the PID is the one echoed from my_script
I would expect to see the message "Foo!" from the other terminal, but it's not working. SIGKILL works and the EXIT code is also executed.
Using Ctrl-C on the terminal my_script is running on triggers foo normally, but somehow I can't send the signal SIGTERM from another terminal to this one.
Replacing SIGTERM with any other signal doesn't change a thing (besides Ctrl-C not triggering anything, it was actually mapped to SIGUSR1 in the beginning).
It may be worth mentioning that just the signal being trapped is not working, and any other signal is having the default behaviour.
So, what am I missing? Any clues?
EDIT: I also just checked it wasn't a privilege issue (that would be weird as I'm able to send SIGKILL anyway), but it doesn't seem to be that.
Bash runs the trap only after sleep returns.
To understand why, think in terms of C / Unix internals: while the signal is delivered to bash immediately, the signal handler that bash has set up only does something like received_sigterm = true.
Only when sleep returns, and the wait system call that bash issued after starting the sleep process returns as well, does bash resume its normal work and execute your trap (after noticing received_sigterm).
This is done for good reasons: only async-signal-safe functions may be called from a signal handler, so running arbitrary shell code directly inside the handler would result in undefined behaviour.
Apart from this technical reason, there is another reason why bash doesn't run the trap instantly: doing so would undermine the fundamental semantics of the shell. Jobs (this includes pipelines) are executed strictly sequentially unless you explicitly use background jobs.
The PID that you originally print is that of the bash instance executing your script, not of the sleep process that it is waiting on. While bash waits for sleep, the trap is not run; it is deferred until sleep returns.
If you want to see the effect that you are looking for, replace sleep with a shorter-lived process like ps.
function foo {
echo "Foo!"
}
function clean {
echo "exiting"
}
trap clean EXIT
trap foo SIGTERM
echo "Starting process with PID: $$"
while :
do
ps > /dev/null
done
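A common workaround that keeps the long sleep, sketched here as an assumption and not taken from the answers above, is to put sleep in the background and block in the wait builtin instead. Unlike a foreground command, wait returns as soon as a trapped signal arrives. The helper subshell below merely stands in for the kill typed in the other terminal.

```sh
#!/bin/sh
# Sketch only: the helper that sends the signal stands in for the
# "kill -SIGTERM <pid>" typed in the other terminal.
foo_calls=0
foo() {
    echo "Foo!"
    foo_calls=$((foo_calls + 1))
}
trap foo TERM

( sleep 1; kill -TERM "$$" ) &   # simulated sender, fires after ~1 second

sleep 60 &                       # background it instead of blocking on it
sleep_pid=$!
wait "$sleep_pid"                # wait returns as soon as the trap fires
wait_status=$?

kill "$sleep_pid" 2>/dev/null    # tidy up the leftover sleep
echo "wait returned $wait_status after $foo_calls trap call(s)"
```

In the original loop, the same pattern would be sleep 60 & followed by wait $! inside the while body.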

How do I stop a signal from killing my Bash script?

I want an infinite loop to keep on running, and only temporarily be interrupted by a kill signal. I've tried SIGINT, SIGUSR1, SIGUSR2. All of them seem to halt the loop. I even tried SIGINFO, but that wasn't supported by Linux.
#!/bin/bash
echo $$ > /tmp/pid # Save the pid
function do_something {
echo "I am doing stuff" #let's do this now, and go back to doing the thing that is to be done over and over again.
#exit
}
while :
do
echo "This should be done over and over again, but always wait for something else to be done in between"
trap do_something SIGINT
while `true`
do
sleep 1 #so we're waiting for that other thing.
done
done
My code runs the function once, after getting an INT signal from another script, but then never again. It halts.
EDIT: Although I accidentally put an exit at the end of the function here on Stack Overflow, I didn't in the actual code I used. Either way, it made no difference. The solution is SIGTERM as described by Tiago.
I believe you're looking for SIGTERM:
Example:
#! /bin/bash
trap -- '' SIGINT SIGTERM
while true; do
date +%F_%T
sleep 1
done
When running this example, neither Ctrl+C nor kill <pid> will stop it; you can, however, kill it with kill -9 <pid>.
If you don't want CTRL+Z to interrupt use: trap -- '' SIGINT SIGTERM SIGTSTP
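The effect of an ignoring trap can be checked with a short sketch like the following, in which the script sends SIGTERM to itself as a stand-in for an external kill <pid>. The variable name is illustrative only.

```sh
#!/bin/sh
# Sketch only: the script signals itself to stand in for an external kill.
trap -- '' TERM       # ignore SIGTERM from here on
kill -TERM "$$"       # would normally terminate the script
survived=yes
echo "still running after SIGTERM: $survived"
trap - TERM           # restore the default disposition
```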
Trap the signal, then either react to it appropriately in the function associated with the trap, or effectively ignore it, for example by associating : as the command to be executed when the signal occurs.
To trap signals, bash provides the trap command.
Reset a trap to its original action by executing trap - signal (bash also accepts trap with the signal name only).
Therefore you want to do the following (I think that's what you mean by "only temporarily be interrupted by a kill signal"):
Trap the signal at the beginning of your script: trap custom_action signal
Just before the phase in which the signal should be allowed to interrupt your script, execute: trap - signal
At the end of that phase, trap it again with: trap custom_action signal
To specify signals, you can also use their respective signal numbers. A list of signal names is printed with the command:
trap -l
The default signal sent by kill is SIGTERM (15), unless you specify a different signal after the kill command.
Don't exit in your do_something function. Simply let the function return to the section in your code where it was interrupted when the signal occurred.
The mentioned ":" command has another potential use in your script, if you feel thusly inclined:
while :
do
sleep 1
done
can be an alternative to "while true" - no backticks needed for that, btw.
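The three phases described in this answer (trap, reset, trap again) can be sketched as follows. SIGUSR1 and the handler name are illustrative choices, and the script signals itself to stand in for a kill from another process.

```sh
#!/bin/sh
# Sketch only: SIGUSR1 and the handler name are illustrative choices; the
# script signals itself to stand in for a "kill" from another process.
handled=0
custom_action() { handled=$((handled + 1)); }

trap custom_action USR1   # phase 1: signal is caught, script keeps running
kill -USR1 "$$"

trap - USR1               # phase 2: default action applies again
# (a SIGUSR1 arriving here would terminate the script)

trap custom_action USR1   # phase 3: protected again
kill -USR1 "$$"

echo "handler ran $handled times"
```

The handler returns normally, so execution continues with the command after the one that was interrupted.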
If you just want your script to keep running and not exit, without worrying about handling traps, you can ignore the command's exit status:
(my_command) || true
The parentheses execute that command in a subshell. The true is for compatibility with set -e, if you use it. It simply overrides the status to always report a success.
I found this question to be helpful:
How to run a command before a Bash script exits?

How to stop infinite loop in bash script gracefully?

I need to run an application every X seconds. Since cron does not work at a granularity of seconds, I wrote a bash script with an infinite loop that sleeps X seconds in each iteration.
When I have to stop the running script manually, I would like to do it the correct way: let the application finish and simply not enter the loop again.
Do you have any idea how this can be achieved?
I thought about passing arguments, but I could not find out how to pass an argument to a running script.
You could trap a signal, say SIGUSR1:
echo "My pid is: $$"
finish=0
trap 'finish=1' SIGUSR1
while (( finish != 1 ))
do
stuff
sleep 42
done
Then, when you want to exit the loop at the next iteration:
kill -SIGUSR1 pid
Where pid is the process ID of the script. If the signal arrives during the sleep, bash runs the trap only once sleep returns, so the loop ends at the next check of the condition rather than immediately.
You may pass some arguments by a file. On each iteration you may read this file and see if your running conditions have changed.
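A minimal sketch of the file-based approach is shown below, assuming a hypothetical control file whose mere existence means "stop after the current run". The path and the per-iteration work are placeholders, and the demo creates the file itself to simulate an operator doing so from another terminal.

```sh
#!/bin/sh
# Sketch only: the control-file path and the work done per iteration are
# hypothetical; the sleep between runs is omitted to keep the demo short.
stop_file="${TMPDIR:-/tmp}/myloop.stop.$$"
rm -f "$stop_file"

iterations=0
while [ ! -e "$stop_file" ]; do
    iterations=$((iterations + 1))
    echo "run $iterations"            # the application would run here

    # Demo only: pretend an operator ran 'touch "$stop_file"' from another
    # terminal while the third run was in progress.
    if [ "$iterations" -eq 3 ]; then
        touch "$stop_file"
    fi
done

rm -f "$stop_file"
echo "stopped cleanly after $iterations runs"
```

Since the file is checked only at the top of the loop, the application always completes its current run before the script stops.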
The following structure may be appropriate if you have to do some cleanup anyway. Use kill as shown in cdarke's answer.
#===============================================================
# FUNCTION DEFINITIONS
#===============================================================
# . . .
#=== FUNCTION ================================================
# NAME: cleanup
# DESCRIPTION:
#===============================================================
cleanup ()
{
# wait for active children, remove empty logfile, ..., exit
exit 0
} # ---------- end of function cleanup ----------
#===============================================================
# TRAPS
#===============================================================
trap cleanup SIGUSR1
#===============================================================
# MAIN SCRIPT
#===============================================================
echo -e "Script '$0' / PID ${$}"
while : ; do # infinite loop
# . . .
sleep 10
done
# . . .
#===============================================================
# STATISTICS / CLEANUP
#===============================================================
cleanup
The trap solution posted earlier is good for large loops, but cumbersome for the common case where you are just looping over a single or a few commands. In this case I recommend the following solution:
while whatever; do
command || break
done
This will exit the loop if command has a non-zero exit code, which will happen if it is interrupted. So unless command handles SIGINT, pressing ctrl-C will both stop the current command and exit the loop.
Edit: After reading your question more closely, I see that you want the current command to continue executing. In that case this solution does not apply.

Resources