kill all subprocesses of a daemon - bash

I am writing an /etc/init.d/mydaemon:
# ...
source functions # LSB compliant
EXEC=/usr/local/bin/mydaemon
PROG=mydaemon
function start() {
daemon --pidfile=/var/run/mydaemon.pid ${EXEC}
}
function stop() {
killproc ${PROG}
}
# ...
my /usr/local/bin/mydaemon:
#!/bin/bash
trap "trap TERM ; kill 0" TERM
binary with some args
AFAIK, this should work because:
daemon records mydaemon's PID in /var/run/mydaemon.pid.
killproc reads that PID and sends SIGTERM to it.
mydaemon traps this signal, disables the trap, and sends SIGTERM to the entire process group, including the process running binary with some args.
However, this doesn't work. After stopping the service, mydaemon terminates, but binary is still running.
What am I missing, and what is the best practice for stopping the daemon and all its children?
BTW:
When my /usr/local/bin/mydaemon is:
#!/bin/bash
binary with some args &
echo $! $$ > /var/run/mydaemon.pid
wait
It works properly, but this seems less robust to me, and there are times when it is not appropriate (when the binary invocation is less straightforward, or it has its own children, etc.).

If you give the parent process's ID to pkill with -P, it will signal that process's direct children:
pkill -TERM -P parentID

You can set up a trap that takes care of the cleanup when SIGINT or SIGTERM is received. For example:
function cleanup { kill $CHILDPID; exit 0; }
trap cleanup SIGINT SIGTERM
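The snippet above does not show where CHILDPID comes from. A minimal sketch, assuming the child is started in the background and its PID is taken from $!; here sleep 30 is only a stand-in for the real binary, and the script signals itself to simulate the init script's SIGTERM. A real wrapper would typically end cleanup with exit, as in the answer, and wait on the child instead of signalling itself.

```bash
#!/bin/bash
# Hypothetical stand-in for the real long-running binary.
sleep 30 &
CHILDPID=$!

cleanup() {
    # Pass the termination request on to the child, then reap it.
    kill "$CHILDPID" 2>/dev/null
    wait "$CHILDPID" 2>/dev/null
}
trap cleanup INT TERM

# Simulate the init script's SIGTERM by signalling this shell.
kill -TERM $$

# Once the trap has run, the child should no longer exist.
if kill -0 "$CHILDPID" 2>/dev/null; then
    child_state=running
else
    child_state=stopped
fi
echo "child is $child_state"
```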

For the specific scenario presented in the question, it is also worth considering the following option for /usr/local/bin/mydaemon:
#!/bin/bash
exec binary with some args
Rather than being run in a subprocess with a new PID, binary will instead take over the shell process PID, and hence receive the signals from the init script directly.
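The PID reuse described above can be checked with a small experiment. This is only a sketch: the inner bash -c plays the role of the wrapper script, and sh -c plays the role of binary.

```bash
#!/bin/bash
# The inner shell prints its PID, then uses exec to replace itself with
# another sh process, which prints its own PID.
pids=$(bash -c 'echo $$; exec sh -c "echo \$\$"')

wrapper_pid=$(echo "$pids" | sed -n 1p)
binary_pid=$(echo "$pids" | sed -n 2p)
echo "wrapper PID: $wrapper_pid, PID after exec: $binary_pid"
```

Both numbers should be identical, which is why a signal sent to the PID recorded by the init script reaches the binary itself.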

Related

Running Child Process In Sequential Statement Before Exiting Parent?

I'm trying to write a Bash script that, when it receives a SIGINT signal, creates a copy of itself before exiting. So, when a user tries to kill this script using a SIGINT signal, a copy of the process reappears.
trap "echo Exiting...?; ./ghoul.sh; exit 1" SIGINT
while :
do
echo Process Number $$, with PPID $PPID!
sleep 1
done
However, whenever I suspend the process and check ps -f, there are multiple processes of the script (children and children of children). The exit command never seems to run since it's waiting for the children to exit. I want to find a way to run the script in the trap statement and exit afterward while maintaining the resulting child process. Is there any way to do this besides creating the child as a background process?
I find it much simpler to put the exit handling into a function. For example, your unquoted echo contains a bare ? which is a glob (file expansion) character. To avoid the parent killing the child you can use disown, and yes, you need to run it in the background.
Try this:
f_exit() {
echo 'Exiting...?'
./ghoul.sh &
disown -h %1
exit 1
}
trap "f_exit" SIGINT
while :
do
echo "Process Number $$, with PPID $PPID!"
sleep 1
done

Forwarding signals in bash script which is submitted on the cluster

I have a launch.sh script which I submit on the cluster with
bsub $settings < launch.sh
A simplified version of this launch.sh bash script looks like this:
function trap_with_arg() {
func="$1" ; shift
for sig ; do
echo "$ES Installing trap for signal $sig"
trap "$func $sig" "$sig"
done
}
function signalHandler() {
# do stuff depending on what stage the script is in
: # no-op placeholder; bash rejects a function body that contains only a comment
}
# Setup the Trap
trap_with_arg signalHandler SIGINT SIGTERM SIGUSR1 SIGUSR2
./start.sh
mpirun process.sh
./end.sh
Where process.sh calls two binaries (as an example) as
./binaryA
./binaryB
My question is the following:
The cluster already sends SIGUSR1 (approx. 10min before SIGTERM) to the process (I think this is the bash shell running my launch.sh script).
At the moment I catch this signal in the launch.sh script and call a signal handler. The problem is that, as far as I know, this handler only runs after the currently running command has finished (e.g. mpirun process.sh or ./start.sh).
How can I forward these signals so that the commands/binaries exit gracefully, for example to process.sh? (mpirun, as far as I have seen, already forwards the signals it receives somehow; how does it do that?)
What is the proper way of forwarding signals, e.g. also to the binaries binaryA and binaryB?
I have no real clue how to do this. Run the commands in the background? Create a child process?
Thanks for some enlightenment :-)
From the bash manual at http://www.gnu.org/software/bash/manual/html_node/Signals.html:
If Bash is waiting for a command to complete and receives a signal for which a trap has been set, the trap will not be executed until the command completes. When Bash is waiting for an asynchronous command via the wait builtin, the reception of a signal for which a trap has been set will cause the wait builtin to return immediately with an exit status greater than 128, immediately after which the trap is executed.
Thus, the solution seems to be to place the commands in the background and use wait:
something &
wait
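A slightly fuller sketch of that idea, under some assumptions: sleep 30 stands in for a long-running step such as mpirun process.sh, the forward function is a hypothetical handler, and the script sends SIGUSR1 to itself after one second to simulate the cluster.

```bash
#!/bin/bash
# Hypothetical stand-in for a long-running step such as "mpirun process.sh".
sleep 30 &
child=$!

forward() {
    echo "received $1, forwarding TERM to child $child"
    kill -TERM "$child" 2>/dev/null
}
trap 'forward USR1' USR1

# Simulate the cluster sending SIGUSR1 one second from now.
( sleep 1; kill -USR1 $$ ) &

# wait returns early (status > 128) when a trapped signal arrives,
# and the trap runs right afterwards.
wait "$child"
first_status=$?

# Wait again to collect the child's real exit status.
wait "$child"
child_status=$?
echo "wait was interrupted with $first_status; child exited with $child_status"
```

Because the first wait can return before the child has actually exited, the second wait is what tells the script that the child is really gone.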

Basic signal communication

I have a bash script, its contents are:
function foo {
echo "Foo!"
}
function clean {
echo "exiting"
}
trap clean EXIT
trap foo SIGTERM
echo "Starting process with PID: $$"
while :
do
sleep 60
done
I execute this on a terminal with:
./my_script
And then do this on another terminal
kill -SIGTERM my_script_pid # obviously the PID is the one echoed from my_script
I would expect to see the message "Foo!" from the other terminal, but it's not working. SIGKILL works and the EXIT code is also executed.
Using Ctrl-C on the terminal my_script is running on triggers foo normally, but somehow I can't send the signal SIGTERM from another terminal to this one.
Replacing SIGTERM with any other signal doesn't change a thing (besides Ctrl-C not triggering anything, it was actually mapped to SIGUSR1 in the beginning).
It may be worth mentioning that just the signal being trapped is not working, and any other signal is having the default behaviour.
So, what am I missing? Any clues?
EDIT: I also just checked it wasn't a privilege issue (that would be weird as I'm able to send SIGKILL anyway), but it doesn't seem to be that.
Bash runs the trap only after sleep returns.
To understand why, think in C / Unix internals: while the signal is dispatched instantly to bash, the corresponding signal handler that bash has set up only does something like received_sigterm = true.
Only when sleep returns, and the wait system call which bash issued after starting the sleep process returns also, bash resumes its normal work and executes your trap (after noticing received_sigterm).
This is done this way for good reasons: a signal handler may only safely call async-signal-safe functions, and doing ordinary I/O (stdio, memory allocation and so on) directly from a handler generally results in undefined behaviour.
Apart from this technical reason, there is another reason why bash doesn't run the trap instantly: This would actually undermine the fundamental semantics of the shell. Jobs (this includes pipelines) are executed strictly in a sequential manner unless you explicitly mess with background jobs.
The PID that you originally print is for the bash instance that executes your script, not for the sleep process that it is waiting on. While bash waits for sleep, the signal is not ignored, but the trap is deferred until sleep returns.
If you want to see the effect that you are looking for, replace sleep with a shorter-lived process like ps.
function foo {
echo "Foo!"
}
function clean {
echo "exiting"
}
trap clean EXIT
trap foo SIGTERM
echo "Starting process with PID: $$"
while :
do
ps > /dev/null
done
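Another option, if the long sleep has to stay, is to run it in the background and wait for it, since the wait builtin is interrupted by trapped signals. A minimal sketch, assuming the loop should end once the signal has been seen (the original loops forever); the got_term flag and the self-sent SIGTERM are additions for the demonstration only.

```bash
#!/bin/bash
foo() {
    echo "Foo!"
    got_term=yes
}
trap foo TERM

# Simulate "kill -TERM <pid>" from another terminal after one second.
( sleep 1; kill -TERM $$ ) &

start=$(date +%s)
while [ -z "$got_term" ]; do
    # Run sleep in the background and wait for it: wait is a builtin,
    # so a trapped signal interrupts it instead of being deferred.
    sleep 60 &
    sleep_pid=$!
    wait "$sleep_pid"
done
kill "$sleep_pid" 2>/dev/null   # do not leave the sleep behind
elapsed=$(( $(date +%s) - start ))
echo "trap ran after about $elapsed second(s)"
```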

How does trap / kill work in bash on Linux?

My sample file
traptest.sh:
#!/bin/bash
trap 'echo trapped' TERM
while :
do
sleep 1000
done
$ traptest.sh &
[1] 4280
$ kill %1 <-- kill by job number works
Terminated
trapped
$ traptest.sh &
[1] 4280
$ kill 4280 <-- kill by process id doesn't work?
(sound of crickets, process isn't killed)
If I remove the trap statement completely, kill process-id works again?
Running some RHEL 2.6.18-194.11.4.el5 at work. I am really confused by this behaviour, is it right?
kill [pid]
sends the TERM signal exclusively to the specified PID.
kill %1
sends the TERM signal to job #1's entire process group, in this case to the script's PID plus its children (sleep).
I've verified that with strace on the sleep process and on the script process.
Anyway, someone got a similar problem here (but with SIGINT instead of SIGTERM): http://www.vidarholen.net/contents/blog/?p=34.
Quoting the most important sentence:
kill -INT %1 sends the signal to the job’s process group, not the backgrounded pid!
This is expected behavior. Default signal sent by kill is SIGTERM, which you are catching by your trap. Consider this:
#!/bin/bash
# traptest.sh
trap "echo Booh!" SIGINT SIGTERM
echo "pid is $$"
while : # This is the same as "while true".
do
a=1
done
(sleep really creates a new process and the behavior is clearer with my example I guess).
So if you run traptest.sh in one terminal and kill TRAPTEST_PROCESS_ID from another terminal, output in the terminal running traptest will be Booh! as expected (and the process will NOT be killed). If you try sending kill -s HUP TRAPTEST_PROCESS_ID, it will kill the traptest process.
This should clear up the %1 confusion.
Note: the code example is taken from tldp
Davide Berra explained the difference between kill %<jobspec> and kill <PID>, but not how that difference results in what you observed. After all, Unix signal handlers should be called pretty much instantaneously, so why does sending a SIGTERM to the script alone not trigger its trap handler?
The bash man page explains why, in the last paragraph of the SIGNALS section:
If bash is waiting for a command to complete and receives a signal for
which a trap has been set, the trap will not be executed until the
command completes.
So, the signal was delivered immediately, but the handler execution was deferred until sleep exited.
Hence, with kill %<jobspec>:
Both the script and sleep received SIGTERM
bash registered the signal, noticed that a trap was set for it, and queued the handler for future execution
sleep exited immediately
bash noted sleep's exit, and ran the trap handler
whereas with kill <script_PID>:
Only the script received SIGTERM
bash registered the signal, noticed that a trap was set for it, and queued the handler for future execution
sleep exited after 1000 seconds
bash noted sleep's exit, and ran the trap handler
Obviously, you didn't wait long enough to see that last bit. :)
If you're interested in the gory details, download the bash source code and look in trap.c, specifically the trap_handler() and run_pending_traps() functions.
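The deferral can also be observed without reading the source. The following is a rough sketch with shortened timings (a 3-second foreground sleep instead of 1000 seconds); the script signals itself after one second and records when the trap actually runs.

```bash
#!/bin/bash
start=$(date +%s)
# Record how long after start the trap handler actually ran.
trap 'trap_delay=$(( $(date +%s) - start ))' TERM

# Send SIGTERM to this script only, one second from now.
( sleep 1; kill -TERM $$ ) &

# Foreground command: bash defers the trap until it finishes.
sleep 3

echo "signal sent after ~1s, trap ran after ~${trap_delay}s"
```

The reported delay should match the length of the foreground sleep rather than the moment the signal was sent.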

In Bash, how can I run multiple infinitely-running commands and cancel them all with ^C?

I would like to write a script that runs a few different infinitely running commands, e.g.
run_development_webserver.sh
watch_sass_files_and_compile_them.sh
watch_coffeescript_files_and_compile_them.sh
I'd like to run each of them in parallel, and kill them all by hitting ^C. Is this possible, and if so how can I do this?
I'll let Admiral Ackbar answer this one.
#!/bin/bash -e
run_development_webserver.sh &
PIDS[0]=$!
watch_sass_files_and_compile_them.sh &
PIDS[1]=$!
watch_coffeescript_files_and_compile_them.sh &
PIDS[2]=$!
trap "kill ${PIDS[*]}" SIGINT
wait
This starts each of your commands in the background (&), puts their process ids ($!) into an array (PIDS[x]=$!), tells bash to kill them all (${PIDS[*]}) when your script gets a SIGINT signal (Ctrl+C), and then waits for all the processes to exit.
And I'll proactively mention that "kill ${PIDS[*]}" expands PIDS when you create the trap; if you change the double quotes (") to single quotes ('), it will be expanded when the trap is executed, which means you can add more processes to PIDS after you set the trap and it will kill them too.
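The difference in expansion time can be seen in a small experiment. This is only an illustration: the PIDs are made-up numbers that are never signalled, and SIGUSR1/SIGUSR2 are used so that the script can trigger both traps itself.

```bash
#!/bin/bash
PIDS=(111)                       # made-up PIDs, never signalled

# Double quotes: ${PIDS[*]} is expanded now, while the trap is being set.
trap "early=\"${PIDS[*]}\"" USR1
# Single quotes: ${PIDS[*]} is expanded later, when the trap runs.
trap 'late="${PIDS[*]}"' USR2

PIDS+=(222)                      # added after the traps were set

kill -USR1 $$
kill -USR2 $$
echo "double-quoted trap saw: $early"   # prints 111
echo "single-quoted trap saw: $late"    # prints 111 222
```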
If you have a stubborn process that doesn't want to quit after a Ctrl+C (SIGINT), you may need to send it a stronger kill signal - SIGTERM or even SIGKILL (use this as a last resort, it unconditionally kills the process without giving it a chance to clean up). First, try changing the trap line to this:
trap "kill -TERM ${PIDS[*]}" SIGINT
If it doesn't respond to the SIGTERM, save that process's pid separately, say in STUBBORN_PID, and use this:
trap "kill ${PIDS[*]}; kill -KILL $STUBBORN_PID" SIGINT
Remember, this one won't let the stubborn process clean up, but if it needs to die and isn't, you may need to use it anyway.
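If you would rather not hard-code which process is stubborn, one possible approach is to try SIGTERM first and fall back to SIGKILL only after a timeout. The sketch below is an assumption-laden illustration, not part of the answer above: stop_with_timeout is a hypothetical helper, and the stubborn process is simulated by a shell that ignores SIGTERM.

```bash
#!/bin/bash
# stop_with_timeout PID SECONDS (hypothetical helper): send SIGTERM, wait up
# to SECONDS for the process to disappear, then fall back to SIGKILL.
# The signal that was finally needed is left in $stop_signal.
stop_with_timeout() {
    local pid=$1 timeout=$2 waited=0
    stop_signal=TERM
    kill -TERM "$pid" 2>/dev/null
    while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$timeout" ]; do
        sleep 1
        waited=$((waited + 1))
    done
    if kill -0 "$pid" 2>/dev/null; then
        stop_signal=KILL
        kill -KILL "$pid" 2>/dev/null
    fi
}

# Stand-in for a stubborn process: it ignores SIGTERM.
bash -c 'trap "" TERM; while :; do sleep 1; done' &
STUBBORN_PID=$!
sleep 1    # give it a moment to install its trap

stop_with_timeout "$STUBBORN_PID" 2
wait "$STUBBORN_PID" 2>/dev/null
echo "stubborn process needed SIG$stop_signal"
```

In the Ctrl+C script this could be called from the trap for each entry in PIDS, at the cost of the trap taking up to the timeout to finish.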

Resources