Shell script to spawn processes and terminate children on SIGTERM - bash

I want to write a shell script that spawns several long-running processes in the background, then hangs around. Upon receiving SIGTERM, I want all the subprocesses to terminate as well.
Basically, I want a "master process".
Here's what I have so far:
#!/bin/sh
sleep 600 &
PID1="$!"
sleep 600 &
PID2="$!"
# supposedly this should kill the child processes on SIGTERM.
trap "kill $PID1 $PID2" SIGTERM
wait
The above script fails with trap: 10: SIGTERM: bad trap.
Edit: I'm using Ubuntu 9.04

This works for me:
trap "trap - SIGTERM && kill -- -$$" SIGINT SIGTERM EXIT
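kill -- -$$ sends SIGTERM to the whole process group, thus also killing descendants. This assumes the script leads its own process group, which is the case when it is started as a job from an interactive shell. trap - SIGTERM first resets the TERM trap, so the signal the script sends to its own group does not run the handler again.
Specifying the EXIT pseudo-signal is useful when using set -e (more details here).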
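To see why EXIT matters under set -e, here is a minimal sketch (an illustration, not the answerer's code). The bash script under test is fed through a here-document only so the POSIX sh harness around it can check the result from outside; it starts one background child, sets an EXIT trap, then hits a failing command. The sleep 30 child and the temporary PID file are assumptions of the demo.

```shell
#!/bin/sh
# Demo harness (POSIX sh); the script under test is the bash here-document.
pidfile=$(mktemp)
status=0

bash -s "$pidfile" <<'EOF' || status=$?
set -e
sleep 30 &                       # stand-in for a long-running child
child=$!
# Runs however the script ends, including an abort caused by set -e.
trap 'kill "$child" 2>/dev/null; wait "$child"' EXIT
echo "$child" > "$1"             # tell the demo which PID to check
false                            # set -e aborts the script here...
wait                             # ...so this line is never reached
EOF

child=$(cat "$pidfile")
rm -f "$pidfile"
if kill -0 "$child" 2>/dev/null; then
  echo "child $child is still running"
else
  echo "child $child was stopped (script exit status: $status)"
fi
```

Without the EXIT trap, set -e would end the script at false and leave the sleep running.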

Joe's answer put me on the right track.
I also found out I should trap more signals to cover my bases.
Final script looks like this:
#!/bin/sh
sleep 600 &
PID1="$!"
sleep 600 &
PID2="$!"
trap "kill $PID1 $PID2" EXIT INT TERM
wait

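I suspect your /bin/sh is not Bash (though you tagged the question as 'bash').
From the message, I guess it's dash, which does not accept the SIG prefix in trap (hence "SIGTERM: bad trap"). Check its manual and write TERM instead of SIGTERM, or just fix your shebang (#!/bin/bash) if you need to write Bash code.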
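As a sketch of what dash does accept, the question's script can be written with the portable signal names (TERM, INT, EXIT). Two details are editorial choices rather than part of the original: the signal traps only call exit while a single EXIT trap kills and reaps the children, and a short demo at the end (with a hypothetical master function and a temporary PID file) sends SIGTERM so the script can be tried unattended.

```shell
#!/bin/sh
# dash-compatible "master process": signal names without the SIG prefix.
pidfile=$(mktemp)

master() {
  # The signal traps only exit; the EXIT trap kills and reaps the children.
  trap 'kill "$PID1" "$PID2" 2>/dev/null; wait' EXIT
  trap 'exit 143' TERM
  trap 'exit 130' INT
  sleep 600 &
  PID1=$!
  sleep 600 &
  PID2=$!
  # Publish the child PIDs via an atomic rename so the demo can check them.
  echo "$PID1 $PID2" > "$pidfile.tmp" && mv "$pidfile.tmp" "$pidfile"
  wait
}

# Demo: run the master in the background, then send it SIGTERM.
master &
master_pid=$!
while [ ! -s "$pidfile" ]; do sleep 0.1; done
kill -s TERM "$master_pid"
status=0
wait "$master_pid" || status=$?
read -r p1 p2 < "$pidfile"
rm -f "$pidfile"
echo "master exited with status $status; children were $p1 and $p2"
```

Keeping the cleanup in one EXIT trap means it runs once, whichever signal (or normal end) made the script exit.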

This script looks correct and works for me as expected.
How do you send the SIGTERM signal to the "master process"?
Maybe you should run kill -l to check which signals are supported.
As the error message suggests, your shell does not recognize the signal name you passed to trap (the 10 is most likely a line number, not a signal number).
And next time, please include the operating system, shell version, kernel, etc. for such a question.

Related

How can I send a signal without the shell waiting for the currently running program to finish?

If I send a signal using kill, it seems to wait until the current program (in this example sleep 1000) finishes running. When I instead send SIGINT via pressing Ctrl+C in the shell, it receives the interrupt immediately however.
What I want, however, is for the interrupt to be received immediately after sending the signal via kill. Also, why does it behave like I would want it to when I press Ctrl+C?
#!/usr/bin/env sh
int_after_a_while() {
local pid=$1
sleep 2
echo "Attempting to kill $pid with SIGINT"
# Here I want to kill the process immediately, but it waits until sleep finishes
kill -s INT $pid
}
trap "echo Interrupt received!" INT
int_after_a_while $$ &
sleep 1000
I would appreciate any help on this issue. Thanks in advance!
As noted in the referenced answer https://unix.stackexchange.com/questions/282525/why-did-my-trap-not-trigger/282631#282631 the shell will normally wait for a utility to complete before running a trap. Some alternatives are:
Start the long-running process in the background, then wait for it using the wait builtin. When a trapped signal is received during such a wait, the wait is interrupted and the trap is taken. Unfortunately, the exit status of wait does not distinguish between the child process exiting on a signal and a trap occurring. For example:
sleep 1000 &
p=$!
wait "$p"
Send a signal to the whole process group via kill -s INT 0. The effect is much like if the user had pressed Ctrl+C, but may be more extreme than you want if your script is run from another script.
Use a shell such as zsh or FreeBSD sh that supports set -o trapsasync, which allows running traps while waiting for a foreground job.
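The first alternative can be sketched in plain POSIX sh as follows. Two assumptions differ from the question: SIGUSR1 stands in for SIGINT, because a script that starts with SIGINT ignored (as background commands of a non-interactive shell do) cannot trap it, and the trap also kills the sleep so the demo finishes quickly. The helper subshell plays the role of int_after_a_while.

```shell
#!/bin/sh
start=$(date +%s)
# The trap kills the long job so the demo ends right after the signal.
trap 'echo "Signal received!"; kill "$p" 2>/dev/null' USR1

# Stand-in for int_after_a_while: inside the subshell, $$ is still the
# PID of the main script.
( sleep 2; kill -s USR1 "$$" ) &

sleep 1000 &
p=$!
status=0
wait "$p" || status=$?            # interrupted by the trap: status > 128
wait "$p" 2>/dev/null || true     # reap the sleep that the trap killed
elapsed=$(( $(date +%s) - start ))
echo "wait returned $status after about ${elapsed}s"
```

The trap runs about two seconds in, instead of after the full 1000 seconds.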

Trying to close all child processes when I interrupt my bash script

I have written a bash script to carry out some tests on my system. The tests run in the background and in parallel. The tests can take a long time and sometimes I may wish to abort the tests part way through.
If I press Control+C, it aborts the parent script but leaves the various children running. I want to be able to hit Control+C (or quit some other way) and have all the child processes running in the background killed. I have a bit of code that does the job if I'm running the background jobs directly from the terminal, but it doesn't work in my script.
I have a minimal working example.
I have tried using trap in combination with pgrep -P $$.
#!/bin/bash
trap 'kill -n 2 $(pgrep -P $$)' 2
sleep 10 &
wait
I was hoping that hitting Control+C (SIGINT) would kill everything that the script started, but it actually says:
./breakTest.sh: line 1: kill: (3220) - No such process
This number changes, but doesn't seem to apply to any running processes, so I don't know where it is coming from.
I guess if the contents of the trap command get evaluated where the trap command occurs then it might explain the outcome. The 3220 pid might be for pgrep itself.
I'd appreciate some insight here
Thanks
I have found a solution using pkill. This example also deals with many child processes.
#!/bin/bash
trap 'pkill -P $$' SIGINT SIGTERM
for i in {1..10}; do
sleep 10 &
done
wait
This appears to kill all the child processes cleanly, though I don't fully understand what the issue was with my original code, apart from sending the correct signal.
In bash, whenever you put & after a command, that command runs in the background as a job, and each job gets a job number. You can use the jobs command to list the background jobs that are running. To refer to a job, use % followed by its job number (a jobspec such as %1). The jobs command also accepts options such as jobs -p, which prints the process IDs of all jobs, and jobs -p %JOB_SPEC, which prints the process ID of that particular job.
#!/usr/bin/env bash
trap 'kill -9 %1' 2
sleep 10 &
wait
or
#!/usr/bin/env bash
trap 'kill -9 $(jobs -p %1)' 2
sleep 10 &
wait
I implemented something like this a few years back; you can take a look at it: async bash
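Background jobs started by a non-interactive script begin with SIGINT ignored, which is one reason why sending SIGINT (kill -n 2, or Ctrl+C) leaves them running while the default SIGTERM works. The jobspec idea also extends to any number of jobs by letting jobs -p list the PIDs when the trap fires. This is a sketch under assumptions, not code from either answer: the runner is a bash script fed through a here-document, sleep 30 stands in for the real tests, it sends SIGTERM instead of kill -9 so the jobs get a chance to clean up, and the demo sends SIGTERM instead of pressing Ctrl+C.

```shell
#!/bin/sh
pidfile=$(mktemp)

# The test runner (bash): three background jobs and a trap that kills
# whatever `jobs -p` lists when SIGINT or SIGTERM arrives. $(jobs -p) is
# left unquoted on purpose so that each PID becomes a separate argument.
bash -s "$pidfile" <<'EOF' &
trap 'kill $(jobs -p) 2>/dev/null; wait; exit 143' INT TERM
for i in 1 2 3; do
  sleep 30 &                      # stand-in for a long-running test
done
jobs -p > "$1.tmp" && mv "$1.tmp" "$1"   # report the PIDs to the demo
wait
EOF
runner=$!

# Demo: wait until the runner is set up, then send SIGTERM.
until [ -s "$pidfile" ]; do sleep 0.1; done
kill -s TERM "$runner"
status=0
wait "$runner" || status=$?
children=$(cat "$pidfile")
rm -f "$pidfile"
echo "runner exited with status $status; its jobs were:" $children
```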
You can try something like the following:
pkill -TERM -P <your_parent_id_here>

How does trap / kill work in bash on Linux?

My sample file
traptest.sh:
#!/bin/bash
trap 'echo trapped' TERM
while :
do
sleep 1000
done
$ traptest.sh &
[1] 4280
$ kill %1 <-- kill by job number works
Terminated
trapped
$ traptest.sh &
[1] 4280
$ kill 4280 <-- kill by process id doesn't work?
(sound of crickets, process isn't killed)
If I remove the trap statement completely, kill process-id works again?
I'm running RHEL (kernel 2.6.18-194.11.4.el5) at work. I am really confused by this behaviour; is it correct?
kill [pid]
sends the TERM signal only to the specified PID.
kill %1
sends the TERM signal to job #1's entire process group, in this case to the script's PID and its children (sleep).
I've verified that with strace on the sleep process and on the script process.
Someone else had a similar problem here (with SIGINT instead of SIGTERM): http://www.vidarholen.net/contents/blog/?p=34.
Quoting the most important sentence:
kill -INT %1 sends the signal to the job’s process group, not the backgrounded pid!
This is expected behavior. The default signal sent by kill is SIGTERM, which your trap catches. Consider this:
#!/bin/bash
# traptest.sh
trap "echo Booh!" SIGINT SIGTERM
echo "pid is $$"
while : # This is the same as "while true".
do
a=1
done
(sleep creates a separate process, so I think the behavior is clearer with my example.)
So if you run traptest.sh in one terminal and run kill TRAPTEST_PROCESS_ID from another terminal, the terminal running traptest will print Booh! as expected (and the process will NOT be killed). If you send kill -s HUP TRAPTEST_PROCESS_ID instead, it will kill the traptest process, because HUP is not trapped.
This should clear up the %1 confusion.
Note: the code example is taken from tldp
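The same experiment can be run without a second terminal. In this sketch (an illustration with assumed details, not the TLDP code), traptest runs as a bash here-document in the background, the busy loop is replaced by a bounded loop of short sleeps so the demo cannot spin forever, and a ready file keeps the demo from signalling the script before its trap is set.

```shell
#!/bin/sh
readyfile=$(mktemp)

# traptest (bash): catches INT and TERM; loops for at most about 10 s.
bash -s "$readyfile" <<'EOF' &
trap 'echo "Booh!"' SIGINT SIGTERM
echo ready > "$1"                 # the trap is in place from here on
i=0
while [ "$i" -lt 100 ]; do sleep 0.1; i=$((i + 1)); done
EOF
traptest=$!

until [ -s "$readyfile" ]; do sleep 0.1; done
rm -f "$readyfile"

kill "$traptest"                  # default signal is TERM, which is trapped
sleep 0.5
if kill -0 "$traptest" 2>/dev/null; then term_survived=yes; else term_survived=no; fi

kill -s HUP "$traptest"           # HUP is not trapped: default action, exit
status=0
wait "$traptest" || status=$?
echo "survived TERM: $term_survived; exit status after HUP: $status"
```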
Davide Berra explained the difference between kill %<jobspec> and kill <PID>, but not how that difference results in what you observed. After all, Unix signal handlers should be called pretty much instantaneously, so why does sending a SIGTERM to the script alone not trigger its trap handler?
The bash man page explains why, in the last paragraph of the SIGNALS section:
If bash is waiting for a command to complete and receives a signal for
which a trap has been set, the trap will not be executed until the
command completes.
So, the signal was delivered immediately, but the handler execution was deferred until sleep exited.
Hence, with kill %<jobspec>:
Both the script and sleep received SIGTERM
bash registered the signal, noticed that a trap was set for it, and queued the handler for future execution
sleep exited immediately
bash noted sleep's exit, and ran the trap handler
whereas with kill <script_PID>:
Only the script received SIGTERM
bash registered the signal, noticed that a trap was set for it, and queued the handler for future execution
sleep exited after 1000 seconds
bash noted sleep's exit, and ran the trap handler
Obviously, you didn't wait long enough to see that last bit. :)
If you're interested in the gory details, download the bash source code and look in trap.c, specifically the trap_handler() and run_pending_traps() functions.
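The kill <script_PID> sequence can be reproduced with a short POSIX sh sketch; the durations are shortened and the helper subshell is added for the demo. The helper sends SIGTERM to the shell alone after about one second, yet the trap only runs once the foreground sleep 3 has finished.

```shell
#!/bin/sh
start=$(date +%s)
# Record when the trap actually runs, relative to the start of the script.
trap 'trapped_at=$(( $(date +%s) - start )); echo "trap ran after ${trapped_at}s"' TERM

( sleep 1; kill -s TERM "$$" ) &  # signals only this shell, about 1 s in

sleep 3                           # foreground command: the trap waits for it
echo "sleep 3 finished after $(( $(date +%s) - start ))s"
```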

In Bash, how can I run multiple infinitely-running commands and cancel them all with ^C?

I would like to write a script that runs a few different infinitely running commands, e.g.
run_development_webserver.sh
watch_sass_files_and_compile_them.sh
watch_coffeescript_files_and_compile_them.sh
I'd like to run each of them in parallel, and kill them all by hitting ^C. Is this possible, and if so how can I do this?
I'll let Admiral Ackbar answer this one.
#!/bin/bash -e
run_development_webserver.sh &
PIDS[0]=$!
watch_sass_files_and_compile_them.sh &
PIDS[1]=$!
watch_coffeescript_files_and_compile_them.sh &
PIDS[2]=$!
trap "kill ${PIDS[*]}" SIGINT
wait
This starts each of your commands in the background (&), puts their process IDs ($!) into an array (PIDS[x]=$!), tells bash to kill them all (${PIDS[*]}) when your script gets a SIGINT signal (Ctrl+C), and then waits for all the processes to exit.
And I'll proactively mention that "kill ${PIDS[*]}" expands PIDS when you create the trap; if you change the double quotes (") to single quotes ('), it will be expanded when the trap is executed, which means you can add more processes to PIDS after you set the trap and it will kill them too.
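The single-quote variant can be checked with a sketch like this one, assuming bash for the master script. The long-running scripts are replaced by sleep 30, both children start after the trap is set, the trap also reaps them with wait, and the demo sends SIGTERM instead of pressing Ctrl+C, because a script started in the background by another non-interactive shell begins with SIGINT ignored and could not trap it.

```shell
#!/bin/sh
pidfile=$(mktemp)

# Master script (bash). With single quotes, "${PIDS[@]}" is expanded when
# the trap runs, so children added after `trap` are killed too.
bash -s "$pidfile" <<'EOF' &
PIDS=()
trap 'kill "${PIDS[@]}" 2>/dev/null; wait; exit 1' SIGINT SIGTERM
sleep 30 & PIDS+=("$!")           # both children start after the trap is set
sleep 30 & PIDS+=("$!")
echo "${PIDS[*]}" > "$1.tmp" && mv "$1.tmp" "$1"   # report PIDs to the demo
wait
EOF
master=$!

# Demo: SIGTERM stands in for Ctrl+C.
until [ -s "$pidfile" ]; do sleep 0.1; done
kill -s TERM "$master"
status=0
wait "$master" || status=$?
children=$(cat "$pidfile")
rm -f "$pidfile"
echo "master exited with status $status; children were:" $children
```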
If you have a stubborn process that doesn't want to quit after a Ctrl+C (SIGINT), you may need to send it a stronger signal: SIGTERM, or even SIGKILL (use this as a last resort; it unconditionally kills the process without giving it a chance to clean up). First, try changing the trap line to this:
trap "kill -TERM ${PIDS[*]}" SIGINT
If it doesn't respond to the SIGTERM, save that process's pid separately, say in STUBBORN_PID, and use this:
trap "kill ${PIDS[*]}; kill -KILL $STUBBORN_PID" SIGINT
Remember, this one won't let the stubborn process clean up, but if it needs to die and isn't, you may need to use it anyway.
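One way to combine these suggestions is to try SIGTERM first and escalate to SIGKILL only after a grace period. The helper stop_with_timeout is hypothetical (its name, the 0.1-second polling and the timeout argument are assumptions, not part of the answer), and the demo starts a child that ignores SIGTERM to play the stubborn process. kill -0 only checks that a PID still exists, so a child that has exited but not yet been reaped by its parent can still look alive until the timeout expires.

```shell
#!/bin/sh
# Hypothetical helper: stop_with_timeout PID SECONDS
# Sends SIGTERM, polls every 0.1 s for up to SECONDS, then sends SIGKILL.
stop_with_timeout() {
  swt_pid=$1
  swt_ticks=$(( $2 * 10 ))
  kill -s TERM "$swt_pid" 2>/dev/null || return 0   # already gone
  while [ "$swt_ticks" -gt 0 ]; do
    kill -0 "$swt_pid" 2>/dev/null || return 0      # exited after SIGTERM
    sleep 0.1
    swt_ticks=$(( swt_ticks - 1 ))
  done
  kill -s KILL "$swt_pid" 2>/dev/null               # last resort: no cleanup
  return 0
}

# Demo: a child that ignores SIGTERM (an ignored signal stays ignored
# across exec), playing the stubborn process.
sh -c 'trap "" TERM; exec sleep 600' &
stubborn=$!
sleep 0.5                         # give it time to install the ignore
start=$(date +%s)
stop_with_timeout "$stubborn" 2
status=0
wait "$stubborn" || status=$?
echo "stubborn child ended with status $status after $(( $(date +%s) - start ))s"
```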

Trapping SIGINT in a backgrounded process

I am trying to understand some sample code describing signal handling in bash. In Example 32-7 at http://tldp.org/LDP/abs/html/debugging.html, the writer's comments state that he is capturing a SIGINT, yet the trap is for EXIT.
{
trap "exit" SIGUSR1
sleep $interval; sleep $interval
while true; do
...
done; } & # Start a progress bar as a background process.
pid=$!
trap "echo !; kill -USR1 $pid; wait $pid" EXIT # To handle ^C.
Why does a trap on EXIT send the correct signal (SIGUSR1) to the background process on a SIGINT (Ctrl+C)?
Any help is appreciated explaining why this works.
EXIT is a special condition for trap in bash, not a signal; there is no exit signal. This trap is executed whenever the bash process terminates. So what this does is make sure that if the user kills the bash process, SIGUSR1 is sent to the background process, which traps it and runs exit. That way, if you kill the session, the background process does not live on forever but quits as well (which is probably what the comment is trying to explain).
Edit: I misread the question in my original response.
The EXIT pseudo-signal is raised both on normal exit and when the script is being interrupted.
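A stripped-down, runnable version of the pattern may help. This is a sketch, not the ABS example itself: the progress bar only prints dots, a two-second sleep stands in for the real work, and a temporary PID file lets the demo confirm that the bar has stopped. The script ends normally here, which is enough to run the EXIT trap.

```shell
#!/bin/sh
pidfile=$(mktemp)

# Parent script (bash): a background "progress bar" that exits on SIGUSR1,
# and an EXIT trap that sends it that signal and waits for it.
bash -s "$pidfile" <<'EOF'
{
  trap "exit" SIGUSR1
  while true; do printf '.' >&2; sleep 1; done
} &                               # start a progress bar as a background process
pid=$!
echo "$pid" > "$1"                # report the PID to the demo
trap 'echo "!" >&2; kill -USR1 $pid; wait $pid' EXIT
sleep 2                           # stand-in for the real work; then a normal exit
EOF

bar=$(cat "$pidfile")
rm -f "$pidfile"
if kill -0 "$bar" 2>/dev/null; then echo "progress bar still running"; else echo "progress bar stopped"; fi
```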
