Consider the following bash script s:
#!/bin/bash
echo "Setting trap"
trap 'ctrl_c' INT
function ctrl_c() {
echo "CTRL+C pressed"
exit
}
sleep 1000
When calling it ./s, it will sleep. If you press Ctrl+C it prints the message and exits.
Now run it again, open another terminal, and send SIGINT to the corresponding bash PID with kill -INT.
Nothing happens. Why?
A plain kill (no flags) does terminate the script, but without calling the ctrl_c() function.
From the POSIX standard for the shell:
When a signal for which a trap has been set is received while the shell is waiting for the completion of a utility executing a foreground command, the trap associated with that signal shall not be executed until after the foreground command has completed.
To verify this, we can run strace on the shell executing your script. Upon sending the SIGINT from another terminal, bash just notes that the signal has been received and returns to waiting for its child, the sleep command, to finish:
rt_sigaction(SIGINT, {0x808a550, [], 0}, {0x80a4600, [], 0}, 8) = 0
waitpid(-1, 0xbfb56324, 0) = ? ERESTARTSYS
--- SIGINT {si_signo=SIGINT, si_code=SI_USER, si_pid=3556, si_uid=1000} ---
sigreturn({mask=[CHLD]}) = -1 EINTR (Interrupted system call)
waitpid(-1,
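The deferral can also be observed without strace. The following is a minimal sketch, not taken from the original post: the function name demo_deferred_trap is made up for illustration, and SIGUSR1 is used instead of SIGINT on the assumption that the child is started in the background from a non-interactive shell, where SIGINT is ignored.

```bash
#!/bin/bash
# Hypothetical demo: a trap is not run until the foreground child has exited.

demo_deferred_trap() {
    local start end child
    start=$(date +%s)
    # Child shell: sets a trap, then blocks on a foreground sleep.
    bash -c 'trap "echo trapped; exit 0" USR1; sleep 3; :' &
    child=$!
    sleep 1                  # give the child time to install its trap
    kill -USR1 "$child"      # signal only the shell, not its sleep
    wait "$child"
    end=$(date +%s)
    echo "elapsed=$((end - start))"
}
```

Calling demo_deferred_trap should print trapped only once the child's sleep 3 has finished, so the reported elapsed time is at least 3 seconds even though the signal was sent after about 1 second.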
To make kill have the same effect as Ctrl-C, you should send SIGINT to the process group. The shell by default will put every process of a new command into its own pgrp:
$ ps -f -o pid,ppid,pgid,tty,comm -t pts/1
PID PPID PGID TT COMMAND
3460 3447 3460 pts/1 bash
29087 3460 29087 pts/1 \_ foo.sh
29120 29087 29087 pts/1 \_ sleep
$ kill -2 -29087
And now the trap runs and the shell exits:
waitpid(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGINT}], 0) = 29120
--- SIGINT {si_signo=SIGINT, si_code=SI_USER, si_pid=3556, si_uid=1000} ---
sigreturn({mask=[CHLD]}) = 29120
...
write(1, "CTRL+C pressed\n", 15) = 15
...
exit_group(0) = ?
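The process-group case can be reproduced from a script as well. This is a rough sketch under two assumptions: the Linux setsid utility (util-linux) is available to give the child its own process group, and SIGTERM stands in for SIGINT because a background child of a non-interactive shell ignores SIGINT. The function name demo_group_kill is hypothetical.

```bash
#!/bin/bash
# Hypothetical demo: signalling the whole process group lets the trap run at once.

demo_group_kill() {
    local start end child
    start=$(date +%s)
    # setsid makes the child shell the leader of a new process group,
    # so its process group ID equals its PID.
    setsid bash -c 'trap "echo trapped; exit 0" TERM; sleep 30; :' &
    child=$!
    sleep 1                        # let the child install its trap
    kill -TERM -- "-$child"        # negative PID: signal every process in the group
    wait "$child"
    end=$(date +%s)
    echo "elapsed=$((end - start))"
}
```

Here both the child shell and its sleep receive the signal. sleep dies immediately, so the shell's deferred trap runs right away instead of after 30 seconds.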
When we use ps aux to find the PIDs, we see two processes: the script and sleep. As you say, the script process is not interrupted by kill -INT. However, when -2 or -INT (they mean the same) is sent to the sleep process, sleep is interrupted as expected, and the script then prints the message and exits. I did not understand why the main process did not stop, but I wanted to show that the script can be interrupted through the sleep process. I hope this helps.
sleep is not a shell built-in:
$ which sleep
/bin/sleep
This means that when you run sleep it runs as a separate child process. When a child process is created, various things are inherited from the parent, and these include the signal mask and the set of ignored signals. Note: only ignored signals are inherited, not handlers; a signal that the parent catches is reset to its default action when the new program is executed. So if you had set:
trap '' INT
then sleep would have inherited that: it means ignore the signal. However, in your case you expect sleep (which is probably written in C) to execute a bash function. It cannot do that: it is a different program, and it would have to run bash to execute the bash function.
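A small sketch of that inheritance rule, using a child bash instead of sleep so that the effect is visible. The function name demo_inheritance is made up for this example, and SIGUSR1 is an arbitrary choice of signal.

```bash
#!/bin/bash
# Hypothetical demo: "ignore" is inherited by child programs, a handler is not.

demo_inheritance() {
    (
        # Case 1: the parent ignores USR1, and so does the child.
        trap '' USR1
        bash -c 'kill -USR1 $$; echo "ignored: child survived"'

        # Case 2: the parent has a handler; the child starts with the
        # default action for USR1, which terminates it.
        trap 'echo "parent handler"' USR1
        bash -c 'kill -USR1 $$; echo "handler: child survived"'
        echo "second child status: $?"
    )
}
```

The first child should print its message. The second should be killed by its own signal before it can print anything, leaving an exit status above 128.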
Caveat - details concerning signal handling can vary across platforms.
It is possible to make sleep a builtin, but that's probably more trouble than it is worth.
I'm trying to write a program using bash script. I'd like to give an alert when this program is killed.
The desired action is like this:
#!/bin/bash
... # The original program
if killed ; then # pseudocode: "killed" stands for "the program was killed"
echo "trying to kill the demo program ... "
sleep 5s
echo "demo program killed"
fi
If you expect the signal to be delivered only to the running program and not to the shell running your script, then the basic synopsis might be:
#!/bin/bash
set -euo pipefail
sleep 1 & # The original program
pid="$!"
kill -9 "$pid" # Pick your lethal signal
wait -n "$pid" && status=0 || status="$?"
((status > 128)) && echo "${pid} got signal $((status - 128))" 1>&2 || :
In the snippet above we run the program in the background only so that we can send it the kill signal from the same snippet. In practice you would probably run it in the foreground and then check its $? return status instead of the status from wait -n.
If the killing signal is delivered to your entire process group, including the shell running your script, that is a different story. For the signal KILL (9) in particular, there is no way to mask it or report it. When the shell gets it, it dies. For other signals you could set up a trap command (see man bash for its syntax) to handle the signal gracefully in the script while still being able to detect and report the child process’ death from the signal.
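A minimal sketch of such a trap, assuming the signal is a catchable one such as SIGTERM. The names run_guarded, on_term and guarded_pid are invented for this example, and the messages are the ones from the question.

```bash
#!/bin/bash
# Hypothetical helper: run a command and report when this shell is told to terminate.

on_term() {
    echo "trying to kill the demo program ..."
    kill -TERM "$guarded_pid" 2>/dev/null   # pass the signal on to the program
    wait "$guarded_pid" 2>/dev/null         # reap it so nothing is left behind
    echo "demo program killed"
    exit 143                                # 128 + 15 (SIGTERM)
}

run_guarded() {
    trap on_term TERM
    "$@" &                   # background, so the shell blocks in wait, not in the program
    guarded_pid=$!
    wait "$guarded_pid"      # wait is interrupted as soon as a trapped signal arrives
}
```

With run_guarded sleep 30 at the end of a script, a kill sent to the script's PID from another terminal should produce both messages promptly. As noted above, nothing of the kind is possible for kill -9.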
Okay, I have a script like this:
trap 'echo "CTRL-C signal was caught!" ' SIGINT
for ((i=0; i<15; i++))
do
sleep 3
done
When I start my script in the usual way, it reacts to Ctrl-C immediately and echoes "CTRL-C signal was caught!", even while a sleep 3 command is running. But when I run my script as a background process, it waits until the sleep 3 command has finished, and only then echoes "CTRL-C signal was caught!"
I do not understand this. I would expect the trap always to wait until the previous command has finished and then echo something, as it does when the script is started as a background process.
Bash manual states:
Background processes (...) are immune to keyboard-generated signals.
If bash is waiting for a command to complete and receives a signal for which a trap has been set, the trap will not be executed until the command completes.
Consequently:
If your script runs in the foreground: when you press "Ctrl-C", a SIGINT is sent to the foreground process group, which includes both the script and the currently running sleep command. sleep is killed at once; its exit status tells Bash that it was interrupted by SIGINT, and Bash then calls your trap.
If your script runs in the background, then the backgrounded sleep does not receive the signal and the SIGINT trap is only executed once sleep has ended.
I have a launch.sh script which I submit on the cluster with
bsub $settings < launch.sh
This launch.sh bash script looks simplified as the following:
function trap_with_arg() {
func="$1" ; shift
for sig ; do
echo "$ES Installing trap for signal $sig"
trap "$func $sig" "$sig"
done
}
function signalHandler() {
# do stuff depending on what stage the script is in
: # placeholder so that the function body is not empty
}
# Setup the Trap
trap_with_arg signalHandler SIGINT SIGTERM SIGUSR1 SIGUSR2
./start.sh
mpirun process.sh
./end.sh
Where process.sh calls two binaries (as an example) as
./binaryA
./binaryB
My question is the following:
The cluster already sends SIGUSR1 (approx. 10 min before SIGTERM) to the process (I think this is the bash shell running my launch.sh script).
At the moment I catch this signal in the launch.sh script and call a signal handler. The problem is that, as far as I know, this handler only gets executed after the running command has finished (e.g. mpirun process.sh or ./start.sh).
How can I forward these signals so that the commands/binaries exit gracefully, for example to process.sh? (As I experienced, mpirun already forwards the signals it receives somehow. How does it do that?)
What is the proper way of forwarding signals, e.g. also to the binaries binaryA and binaryB?
I have no really good idea how to do this. Run the commands in the background? Create a child process?
Thanks for some enlightenment :-)
From bash manual at http://www.gnu.org/software/bash/manual/html_node/Signals.html:
If Bash is waiting for a command to complete and receives a signal for which a trap has been set, the trap will not be executed until the command completes. When Bash is waiting for an asynchronous command via the wait builtin, the reception of a signal for which a trap has been set will cause the wait builtin to return immediately with an exit status greater than 128, immediately after which the trap is executed.
Thus, the solution seems to be to place the commands in the background and use "wait":
something &
wait
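Expanded a little, this could look like the sketch below. It is one possible approach rather than an established recipe: the names run_and_forward and got_signal are invented here, and the list of forwarded signals (TERM, USR1, USR2) is an assumption based on the question.

```bash
#!/bin/bash
# Hypothetical helper: run a command in the background and relay signals to it.

run_and_forward() {
    local child sig status
    "$@" &
    child=$!
    for sig in TERM USR1 USR2; do
        # Double quotes: the signal name and PID are expanded now, so the
        # trap text is a literal "kill -SIG PID". got_signal is a global flag.
        trap "got_signal=1; kill -$sig $child 2>/dev/null" "$sig"
    done
    while :; do
        got_signal=0
        wait "$child"
        status=$?
        # wait returns early (status > 128) when a trapped signal arrives.
        # If no trap ran, it returned because the child really exited.
        [ "$got_signal" -eq 1 ] || break
    done
    trap - TERM USR1 USR2
    return "$status"
}
```

In launch.sh the call would then be something like run_and_forward mpirun process.sh. The handler runs as soon as the signal arrives, and the loop keeps waiting so that the command has time to shut down gracefully before the script moves on.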
I am executing a shell script in background from my tcl script. The tcl script ends execution after some time. At this point I assume the background shell script becomes orphan and is adopted by init.
set res [catch { exec sudo $script &}]
Now the problem is that I am not able to signal my (orphaned) background script. But why? OK, it now belongs to init, but why can't I signal it? Only SIGKILL seems to work, and that kills it. I need to trigger the signal handler I've written to handle SIGUSR2:
trap 'process' SIGUSR2
Why can't I signal my orphan background process? Is there no way this can be done? Or is there some workaround?
EDIT: Seems to work fine when the sleep is not involved. See sample code below:
trap 'kill `cat /var/run/sleep.pid`; foo' SIGUSR2;
foo(){ echo test; }
while true; do
echo -n .
sleep 100 &
echo ${!} > /var/run/sleep.pid
wait ${!}
done
Works fine when not orphaned - but in the case of orphan process I think the problem is the true pid of sleep gets overwritten and I'm not able to kill it when the trap arrives.
Let's run a small script like this:
bash -c '(trap foo SIGUSR2;foo(){ echo test; };while true; do echo -n .;sleep 1;done) & echo $!'; read
It will fork a background process which just runs and outputs some dots. It will also output the PID of the process, which you can use to check and signal it.
$ ps -f 19489
UID PID PPID C STIME TTY STAT TIME CMD
michas 19489 1 0 23:45 pts/8 S 0:00 bash -c (trap foo SIGUS...
Because the forking shell died directly after running the command in background, the process is now owned by init (PPID=1).
Now you can signal the process to call the handler:
kill -USR2 19489
If you do, you will notice the "test" output at the terminal printing the dots.
There should be no difference, whether you start a background process from shell or tcl. If it runs you can send it a signal and if there is a handler, it will be called.
If it really does not respond to signals, it might be blocked waiting for something, for example in a sleep or on some I/O.
My sample file
traptest.sh:
#!/bin/bash
trap 'echo trapped' TERM
while :
do
sleep 1000
done
$ traptest.sh &
[1] 4280
$ kill %1 <-- kill by job number works
Terminated
trapped
$ traptest.sh &
[1] 4280
$ kill 4280 <-- kill by process id doesn't work?
(sound of crickets, process isn't killed)
If I remove the trap statement completely, kill process-id works again?
Running some RHEL 2.6.18-194.11.4.el5 at work. I am really confused by this behaviour, is it right?
kill [pid]
send the TERM signal exclusively to the specified PID.
kill %1
send the TERM signal to job #1's entire process group, in this case to the script's PID plus its children (sleep).
I've verified that with strace on the sleep process and on the script process.
Anyway, someone got a similar problem here (but with SIGINT instead of SIGTERM): http://www.vidarholen.net/contents/blog/?p=34.
Quoting the most important sentence:
kill -INT %1 sends the signal to the job’s process group, not the backgrounded pid!
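From another terminal there is no job number to refer to, but the process group can be looked up from the PID. A small sketch, with a made-up function name kill_group_of; it assumes the target runs in its own process group, as a job started from an interactive shell does.

```bash
#!/bin/bash
# Hypothetical helper: signal the whole process group that a PID belongs to.
# usage: kill_group_of PID [SIGNAL]

kill_group_of() {
    local pid=$1 sig=${2:-TERM} pgid
    # "pgid=" asks ps for the process group ID without a header line.
    pgid=$(ps -o pgid= -p "$pid" | tr -d ' ')
    [ -n "$pgid" ] || return 1
    kill -s "$sig" -- "-$pgid"     # negative ID: deliver to every group member
}
```

Check the PGID column first if unsure: if the target shares a process group with other processes you care about, they will receive the signal too.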
This is expected behavior. Default signal sent by kill is SIGTERM, which you are catching by your trap. Consider this:
#!/bin/bash
# traptest.sh
trap "echo Booh!" SIGINT SIGTERM
echo "pid is $$"
while : # This is the same as "while true".
do
a=1
done
(sleep really creates a new process and the behavior is clearer with my example I guess).
So if you run traptest.sh in one terminal and kill TRAPTEST_PROCESS_ID from another terminal, output in the terminal running traptest will be Booh! as expected (and the process will NOT be killed). If you try sending kill -s HUP TRAPTEST_PROCESS_ID, it will kill the traptest process.
This should clear up the %1 confusion.
Note: the code example is taken from tldp
Davide Berra explained the difference between kill %<jobspec> and kill <PID>, but not how that difference results in what you observed. After all, Unix signal handlers should be called pretty much instantaneously, so why does sending a SIGTERM to the script alone not trigger its trap handler?
The bash man page explains why, in the last paragraph of the SIGNALS section:
If bash is waiting for a command to complete and receives a signal for
which a trap has been set, the trap will not be executed until the
command completes.
So, the signal was delivered immediately, but the handler execution was deferred until sleep exited.
Hence, with kill %<jobspec>:
Both the script and sleep received SIGTERM
bash registered the signal, noticed that a trap was set for it, and queued the handler for future execution
sleep exited immediately
bash noted sleep's exit, and ran the trap handler
whereas with kill <script_PID>:
Only the script received SIGTERM
bash registered the signal, noticed that a trap was set for it, and queued the handler for future execution
sleep exited after 1000 seconds
bash noted sleep's exit, and ran the trap handler
Obviously, you didn't wait long enough to see that last bit. :)
If you're interested in the gory details, download the bash source code and look in trap.c, specifically the trap_handler() and run_pending_traps() functions.