How does trap / kill work in bash on Linux? - bash

My sample file
traptest.sh:
#!/bin/bash
trap 'echo trapped' TERM
while :
do
sleep 1000
done
$ traptest.sh &
[1] 4280
$ kill %1 <-- kill by job number works
Terminated
trapped
$ traptest.sh &
[1] 4280
$ kill 4280 <-- kill by process id doesn't work?
(sound of crickets, process isn't killed)
If I remove the trap statement completely, kill process-id works again?
Running some RHEL 2.6.18-194.11.4.el5 at work. I am really confused by this behaviour, is it right?

kill [pid]
send the TERM signal exclusively to the specified PID.
kill %1
send the TERM signal to the job #1's entire process group, in this case to the script pid + his children (sleep).
I've verified that with strace on sleep process and on script process
Anyway, someone got a similar problem here (but with SIGINT instead of SIGTERM): http://www.vidarholen.net/contents/blog/?p=34.
Quoting the most important sentence:
kill -INT %1 sends the signal to the job’s process group, not the backgrounded pid!

This is expected behavior. Default signal sent by kill is SIGTERM, which you are catching by your trap. Consider this:
#!/bin/bash
# traptest.sh
trap "echo Booh!" SIGINT SIGTERM
echo "pid is $$"
while : # This is the same as "while true".
do
a=1
done
(sleep really creates a new process and the behavior is clearer with my example I guess).
So if you run traptest.sh in one terminal and kill TRAPTEST_PROCESS_ID from another terminal, output in the terminal running traptest will be Booh! as expected (and the process will NOT be killed). If you try sending kill -s HUP TRAPTEST_PROCESS_ID, it will kill the traptest process.
This should clear up the %1 confusion.
Note: the code example is taken from tldp

Davide Berra explained the difference between kill %<jobspec> and kill <PID>, but not how that difference results in what you observed. After all, Unix signal handlers should be called pretty much instantaneously, so why does sending a SIGTERM to the script alone not trigger its trap handler?
The bash man page explains why, in the last paragraph of the SIGNALS section:
If bash is waiting for a command to complete and receives a signal for
which a trap has been set, the trap will not be executed until the
command completes.
So, the signal was delivered immediately, but the handler execution was deferred until sleep exited.
Hence, with kill %<jobspec>:
Both the script and sleep received SIGTERM
bash registered the signal, noticed that a trap was set for it, and queued the handler for future execution
sleep exited immediately
bash noted sleep's exit, and ran the trap handler
whereas with kill <script_PID>:
Only the script received SIGTERM
bash registered the signal, noticed that a trap was set for it, and queued the handler for future execution
sleep exited after 1000 seconds
bash noted sleep's exit, and ran the trap handler
Obviously, you didn't want long enough to see that last bit. :)
If you're interested in the gory details, download the bash source code and look in trap.c, specifically the trap_handler() and run_pending_traps() functions.

Related

How can I send a signal without the shell waiting for the currently running program to finish?

If I send a signal using kill, it seems to wait until the current program (in this example sleep 1000) finishes running. When I instead send SIGINT via pressing Ctrl+C in the shell, it receives the interrupt immediately however.
What I want, however, is for the interrupt to be received immediately after sending the signal via kill. Also, why does it behave like I would want it to when I press Ctrl+C?
#!/usr/bin/env sh
int_after_a_while() {
local pid=$1
sleep 2
echo "Attempting to kill $pid with SIGINT"
# Here I want to kill the process immediately, but it waits until sleep finishes
kill -s INT $pid
}
trap "echo Interrupt received!" INT
int_after_a_while $$ &
sleep 1000
I would appreciate any help on this issue. Thanks in advance!
As noted in the referenced answer https://unix.stackexchange.com/questions/282525/why-did-my-trap-not-trigger/282631#282631 the shell will normally wait for a utility to complete before running a trap. Some alternatives are:
Start the long running process in the background, then wait for it using the wait builtin. When a trapped signal is received during such a wait, the wait is interrupted and the trap is taken. Unfortunately, the exit status of wait does not distinguish between the child process exiting on a signal and a trap occurring. For example
sleep 1000 &
p=$!
wait "$p"
Send a signal to the whole process group via kill -s INT 0. The effect is much like if the user had pressed Ctrl+C, but may be more extreme than you want if your script is run from another script.
Use a shell such as zsh or FreeBSD sh that supports set -o trapsasync which allows running traps while waiting for a foreground job.

Forwarding signals in bash script which is submitted on the cluster

I have a launch.sh script which I submit on the cluster with
bsub $settings < launch.sh
This launch.sh bash script looks simplified as the following:
function trap_with_arg() {
func="$1" ; shift
for sig ; do
echo "$ES Installing trap for signal $sig"
trap "$func $sig" "$sig"
done
}
function signalHandler() {
# do stuff depending in what stage the script is
}
# Setup the Trap
trap_with_arg signalHandler SIGINT SIGTERM SIGUSR1 SIGUSR2
./start.sh
mpirun process.sh
./end.sh
Where process.sh calls two binaries (as an example) as
./binaryA
./binaryB
My question is the following:
The cluster already sends SIGUSR1 (approx. 10min before SIGTERM) to the process (I think this is the bash shell running my launch.sh script).
At the moment I catch this signal in the launch.sh script and call some signal handler. The problem is, this signal handler only gets executed (at least what I know) after a running command is finished (e.g. that might be mpirun process.sh or ./start.sh )
How can I forward these signals to make the commands/binaries exit gracefully. Forwarding for example to process.sh (mpirun, as I experienced, already forwards somehow these received signals (how does it do that?)
What is the proper way of forwarding signals, (e.g. also to the binaries binaryA, binaryB ?
I have no really good clue how to do this? Making the commands execute in background, creating a child process?
Thanks for some enlightenment :-)
From bash manual at http://www.gnu.org/software/bash/manual/html_node/Signals.html:
If Bash is waiting for a command to complete and receives a signal for which a trap has been set, the trap will not be executed until the command completes. When Bash is waiting for an asynchronous command via the wait builtin, the reception of a signal for which a trap has been set will cause the wait builtin to return immediately with an exit status greater than 128, immediately after which the trap is executed.
Thus, the solution seems to place commands in background and use "wait":
something &
wait

shell script process termination issue

/bin/sh -version
GNU sh, version 1.14.7(1)
exitfn () {
# Resore signal handling for SIGINT
echo "exiting with trap" >> /tmp/logfile
rm -f /var/run/lockfile.pid # Growl at user,
exit # then exit script.
}
trap 'exitfn; exit' SIGINT SIGQUIT SIGTERM SIGKILL SIGHUP
The above is my function in shell script.
I want to call it in some special conditions...like
when:
"kill -9" fires on pid of this script
"ctrl + z" press while it is running on -x mode
server reboots while script is executing ..
In short, with any kind of interrupt in script, should do some action
eg. rm -f /var/run/lockfile.pid
but my above function is not working properly; it works only for terminal close or "ctrl + c"
Kindly don't suggest to upgrade "bash / sh" version.
SIGKILL cannot be trapped by the trap command, or by any process. It is a guarenteed kill signal, that by it's definition cannot be trapped. Thus upgrading you sh/bash will not work anyway.
You can't trap kill -9 that's the whole point of it, to destroy processes violently that don't respond to other signals (there's a workaround for this, see below).
The server reboot should first deliver a signal to your script which should be caught with what you have.
As to the CTRL-Z, that also gives you a signal, SIGSTOP from memory, so you may want to add that. Though that wouldn't normally be a reason to shut down your process since it may be then put into the background and restarted (with bg).
As to what do do for those situations where your process dies without a catchable signal (like the -9 case), the program should check for that on startup.
By that, I mean lockfile.pid should store the actual PID of the process that created it (by using echo $$ >/var/run/myprog_lockfile.pid for example) and, if you try to start your program, it should check for the existence of that process.
If the process doesn't exist, or it exists but isn't the right one (based on name usually), your new process should delete the pidfile and carry on as if it was never there. If the old process both exists and is the right one, your new process should log a message and exit.

In Bash, how can I run multiple infinitely-running commands and cancel them all with ^C?

I would like to write a script that runs a few different infinitely running commands, e.g.
run_development_webserver.sh
watch_sass_files_and_compile_them.sh
watch_coffeescript_files_and_compile_them.sh
I'd like to run each of them in parallel, and kill them all by hitting ^C. Is this possible, and if so how can I do this?
I'll let Admiral Ackbar answer this one.
#!/bin/bash -e
run_development_webserver.sh &
PIDS[0]=$!
watch_sass_files_and_compile_them.sh &
PIDS[1]=$!
watch_coffeescript_files_and_compile_them.sh &
PIDS[2]=$!
trap "kill ${PIDS[*]}" SIGINT
wait
This starts each of your commands in the background (&), puts their process ids ($!) into an array (PIDS[x]=$!), tells bash to kill them all (${PIDS[*]) when your script gets a SIGINT signal (Ctrl+C), and then waits for all the processes to exit.
And I'll proactively mention that "kill ${PIDS[*]}" expands PIDS when you create the trap; if you change the double quotes (") to single quotes ('), it will be expanded when the trap is executed, which means you can add more processes to PIDS after you set the trap and it will kill them too.
If you have a stubborn process that doesn't want to quit after a Ctrl+C (SIGINT), you may need to send it a stronger kill signal - SIGTERM or even SIGKILL (use this as a last resort, it unconditionally kills the process without giving it a chance to clean up). First, try changing the trap line to this:
trap "kill -TERM ${PIDS[*]}" SIGINT
If it doesn't respond to the SIGTERM, save that process's pid separately, say in STUBBORN_PID, and use this:
trap "kill ${PIDS[*]}; kill -KILL $STUBBORN_PID" SIGINT
Remember, this one won't let the stubborn process clean up, but if it needs to die and isn't, you may need to use it anyway.

shell script to spawn processes, terminate children on SIGTERM

I want to write a shell script that spawns several long-running processes in the background, then hangs around. Upon receiving SIGTERM, I want all the subprocesses to terminate as well.
Basically, I want a "master process".
Here's what I got so far:
#!/bin/sh
sleep 600 &
PID1="$!"
sleep 600 &
PID2="$!"
# supposedly this should kill the child processes on SIGTERM.
trap "kill $PID1 $PID2" SIGTERM
wait
The above script fails with trap: 10: SIGTERM: bad trap.
Edit: I'm using Ubuntu 9.04
This works for me:
trap "trap - SIGTERM && kill -- -$$" SIGINT SIGTERM EXIT
kill -- -$$ sends a SIGTERM to the whole process group, thus killing also descendants.
Specifying signal EXIT is useful when using set -e (more details here).
Joe's answer put me on the right track.
I also found out I should trap more signals to cover my bases.
Final script looks like this:
#!/bin/sh
sleep 600 &
PID1="$!"
sleep 600 &
PID2="$!"
trap "kill $PID1 $PID2" exit INT TERM
wait
I suspect your /bin/sh is not a Bash (though you tagged the question as 'Bash').
From the message I guess it's a DASH. Check its manual or just fix your shebang if you need to write Bash code.
This script looks correct and works for me as expected.
How do you send the SIGTERM signal to the "master process"?
Maybe you should execute kill -l to check which signals are supported.
As the error message suggests you send signal "10" which your system doesn't seem to recognize.
And next time you should add operating system, shell version, kernel, ... for such a question

Resources