Why doesn't a bash script wait for its child processes to finish before exiting when it receives SIGTERM?

trap exit_gracefully TERM
exit_gracefully() {
    echo "start.sh got SIGTERM"
    echo "Sending TERM to child_process_1_pid: ${child_process_1_pid}"
    echo "Sending TERM to child_process_2_pid: ${child_process_2_pid}"
    echo "Sending TERM to child_process_3_pid: ${child_process_3_pid}"
    kill -TERM ${child_process_1_pid} ${child_process_2_pid} ${child_process_3_pid}
}
consul watch -http-addr=${hostIP}:8500 -type=key -key=${consul_kv_key} /child_process_1.sh 2>&1 &
child_process_1_pid=$!
/child_process_2.sh &
child_process_2_pid=$!
/child_process_3.sh &
child_process_3_pid=$!
/healthcheck.sh &
/configure.sh
# sleep 36500d &
# wait $!
wait ${child_process_1_pid} ${child_process_2_pid} ${child_process_3_pid}
echo 'start.sh exiting'
start.sh is the parent script. When SIGTERM is trapped, it is forwarded to three of its child processes. If the two lines sleep 36500d & and wait $! are commented out (that is, removed from the code), start.sh does not wait for child_process_1.sh, child_process_2.sh and child_process_3.sh to receive SIGTERM, handle it and exit. Instead, start.sh exits immediately on receiving SIGTERM, before the child processes can handle it. But if I keep sleep 36500d & and wait $! uncommented, start.sh waits for child processes 1, 2 and 3 to receive and handle SIGTERM and exit before it exits itself.
Why does this difference exist even though I wait for 3 pids (of child processes) in either case? Why should I need sleep when I am waiting for 3 pids?

Receiving a signal will cause any wait command in progress to return.
This is because the purpose of a signal is to interrupt a process in whatever it's currently doing.
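All the effects you see are simply the result of the current wait returning, the handler running, and the script continuing from where the wait exited. With sleep 36500d & and wait $! in place, SIGTERM interrupts that first wait, the handler runs, and the script then reaches wait ${child_process_1_pid} ${child_process_2_pid} ${child_process_3_pid}, which blocks until the children have exited. Without them, SIGTERM interrupts the only wait there is, so after the handler returns the script goes straight on to echo 'start.sh exiting' and exits.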
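One way to make the parent wait for the children even when wait is interrupted is to call wait a second time after the handler has run. The following is a minimal sketch rather than the original start.sh: the child function and the self-sent SIGTERM are stand-ins invented for the demonstration, and the fractional sleep values assume a sleep that accepts them (GNU coreutils does).

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for child_process_N.sh: traps TERM, takes a moment
# to clean up, then exits.
child() {
    trap 'sleep 0.2; exit 0' TERM
    while :; do sleep 0.1; done
}

pids=()
for _ in 1 2 3; do
    child &
    pids+=("$!")
done

exit_gracefully() {
    echo "got SIGTERM, forwarding to: ${pids[*]}"
    kill -TERM "${pids[@]}" 2>/dev/null
}
trap exit_gracefully TERM

# Simulate an external `kill -TERM <parent pid>` arriving after half a second.
( sleep 0.5; kill -TERM "$$" ) &

# First wait: returns early with a status above 128 once the trapped signal arrives.
wait "${pids[@]}"
first_status=$?

# Second wait: the children are still shutting down, so wait for them again.
wait "${pids[@]}"
echo "first wait returned ${first_status}; children have now exited"
```

If further signals may arrive while the children shut down, the second wait can be interrupted as well, so a loop around wait may be needed in that case.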

Related

Bash: spawn child processes that quit when parent script quits

I'd like to spawn several child processes in Bash, but I'd like the parent script to remain running, such that signals sent to the parent script also affect the spawned child processes.
This doesn't do that:
parent.bash:
#!/usr/bin/bash
spawnedChildProcess1 &
spawnedChildProcess2 &
spawnedChildProcess3 &
parent.bash ends immediately, and the spawned processes continue running independently of it.
If you want your parent to not exit immediately after spawning its children, then as Barmar told you, use wait.
Now, if you want your child processes to die when the parent exits, then send them a SIGTERM (or any other) signal just before exiting:
kill 0
(0 is a special PID that means "every process in the parent's process group")
If the parent may exit unexpectedly (e.g. upon receiving a signal, because of a set -u or set -e, etc.) then you can use trap to send the TERM signal to the children just before exiting:
trap 'kill 0' EXIT
[edit] In conclusion, this is how you should write your parent process:
#!/usr/bin/bash
trap 'kill 0' EXIT
...
spawnedChildProcess1 &
spawnedChildProcess2 &
spawnedChildProcess3 &
...
wait
That way there is no need to send the signal to a negative process ID, an approach that does not cover every case in which the parent process may die.
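If signalling the whole process group is broader than you want (kill 0 also reaches the parent itself and anything else in the same group), a narrower variant of the same EXIT trap can target only the shell's own background jobs. This is a sketch under assumptions: the parent function, the sleep children and the pidfile are all made up for the demonstration.

```bash
#!/usr/bin/env bash
# Hypothetical parent, run in a subshell so the demo can inspect it afterwards.
# The two `sleep 30` jobs stand in for spawnedChildProcessN.
parent() (
    # On any exit, signal this shell's remaining background jobs, then reap them.
    trap 'kill $(jobs -p) 2>/dev/null; wait' EXIT

    sleep 30 &
    sleep 30 &
    jobs -p > "$1"   # record the child PIDs for the caller

    exit 1           # simulate the parent dying early with an error
)

pidfile=$(mktemp)
parent "$pidfile"
parent_status=$?

# Count children that are still alive after the parent has gone.
survivors=0
for pid in $(cat "$pidfile"); do
    kill -0 "$pid" 2>/dev/null && survivors=$((survivors + 1))
done
echo "parent exited with ${parent_status}; surviving children: ${survivors}"
rm -f "$pidfile"
```

The wait inside the trap is there so the parent does not exit until the signalled children have actually terminated.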
Use wait to have the parent process wait for all the children to exit.
#!/usr/bin/bash
spawnedChildProcess1 &
spawnedChildProcess2 &
spawnedChildProcess3 &
wait
Keyboard signals are sent to the entire process group, so typing Ctrl-C will kill the children and the parent.

Handling signals in bash script when it started as a background process

Okay, I have a script like this:
trap 'echo "CTRL-C signal was caught!" ' SIGINT
for ((i=0; i<15; i++))
do
    sleep 3
done
When I start my script in the usual way, it reacts immediately to Ctrl-C and echoes "CTRL-C signal was caught!", even in the middle of a sleep 3 command. But when I run my script as a background process, it waits until the sleep 3 command has finished and only then echoes "CTRL-C signal was caught!"
I do not understand this. I would expect trap to wait until the previous command has finished and only then echo something, as it does when the script is started as a background process.
Bash manual states:
Background processes (...) are immune to keyboard-generated signals.
If bash is waiting for a command to complete and receives a signal for which a trap has been set, the trap will not be executed until the command completes.
Consequently:
If your script runs in the foreground: when you press "Ctrl-C", a SIGINT is sent to the currently running process (i.e. the sleep command). The exit status of sleep tells Bash that it was interrupted by the SIGINT signal and bash calls your trap.
If your script runs in the background, then the backgrounded sleep does not receive the signal and the SIGINT trap is only executed once sleep has ended.
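If the trap should fire promptly wherever the script runs, a common workaround is to put the long command in the background and wait on it, because the wait builtin returns as soon as a trapped signal arrives. A sketch under assumptions: USR1 stands in for SIGINT so the script can signal itself without a terminal, and the 5-second sleep and the caught_after variable are illustrative only.

```bash
#!/usr/bin/env bash
caught_after=""
start=$SECONDS
trap 'caught_after=$((SECONDS - start)); echo "signal caught after about ${caught_after}s"' USR1

# Simulate a signal arriving one second into the 5-second sleep.
( sleep 1; kill -USR1 "$$" ) &

sleep 5 &            # run the long command asynchronously...
sleep_pid=$!
wait "$sleep_pid"    # ...so the interruptible wait builtin is what is running

# The sleep itself was not signalled and is still running; stop and reap it.
kill "$sleep_pid" 2>/dev/null
wait "$sleep_pid" 2>/dev/null
```

Applied to the loop in the question, each sleep 3 would become sleep 3 & followed by wait $!.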

Why does an asynchronous child become a zombie although the parent waits for it?

I use the following code to start some long running task asynchronously but detect if it fails at the very beginning:
sleep 0.3 &
long_running &
wait -n
# [Error handling]
# Do other stuff.
# Wait for completion of 'long_running'.
wait -n
# [Error handling]
If I send SIGINT (using Ctrl+C) to the script while it is waiting for the long-running child, the long-running task just continues and becomes a zombie after completion.
Furthermore, the parent script consumes full CPU. I have to SIGKILL the parent to get rid of the processes.
I know that SIGINT is ignored by the child (which is probably the reason it continues till completion), but why does the parent get into such a confusing state?
It works (as expected) if I kill the child when SIGINT has been received (the commented trap below), but I want to understand why it does not work the other way.
Below is the complete script. Please refer also to https://gist.github.com/doak/08b69c500c91a7fade9f2c61882c93b4 for an even more complete example/try-out:
#!/usr/bin/env bash
count="count=100000" # Adapt that 'dd' lasts about 3s. Comment out to run forever.
#fail=YES # Demonstrates failure of background task.
# This would work.
#trap "jobs -p | xargs kill" SIGINT
echo executing long running asynchronous task ...
sleep 0.3 &
dd if=/dev/zero$fail of=/dev/null bs=1M $count &
wait -n
errcode=$?
if test "$errcode" -ne 0; then
    echo "failed"
    exit $errcode
fi
echo waiting for completion ...
wait -n
errcode=$?
echo finished
exit $errcode
It could be that my question is related to this C question, although it discusses the system call wait(): Possible for parent process to HANG on "wait" step if child process becomes a ZOMBIE or CRASHES?

WAIT for "1 of many process" to finish

Is there any built in feature in bash to wait for 1 out of many processes to finish? And then kill remaining processes?
pids=""
# Run five concurrent processes
for i in {1..5}; do
    ( longprocess ) &
    # store PID of process
    pids+=" $!"
done
if [ "one of them finished" ]; then
    kill_rest_of_them;
fi
I'm looking for "one of them finished" command. Is there any?
bash 4.3 added a -n flag to the built-in wait command, which causes the script to wait for the next child to complete. The -p option to jobs also means you don't need to store the list of pids, as long as there aren't any background jobs that you don't want to wait on.
# Run five concurrent processes
for i in {1..5}; do
    ( longprocess ) &
done
wait -n
kill $(jobs -p)
Note that if there is another background job other than the 5 long processes that completes first, wait -n will exit when it completes. That would also mean you would still want to save the list of process ids to kill, rather than killing whatever jobs -p returns.
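Following that note, a version that saves the PIDs and kills only those might look like this. It is a sketch: sleep stands in for longprocess, and wait -n needs bash 4.3 or later.

```bash
#!/usr/bin/env bash
# Start five concurrent stand-in processes that finish after 1 to 5 seconds.
pids=()
for i in 1 2 3 4 5; do
    sleep "$i" &
    pids+=("$!")
done

# Block until any one background job finishes.
wait -n
first_status=$?

# Signal only the PIDs we started. The one that already exited just causes
# an error message, which is discarded.
kill "${pids[@]}" 2>/dev/null
wait "${pids[@]}" 2>/dev/null   # reap the killed processes
echo "first finisher exited with ${first_status}; the rest were killed"
```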
It's actually fairly easy:
#!/bin/bash
set -o monitor
killAll()
{
    # code to kill all child processes goes here
    :   # no-op placeholder: bash rejects a function body that contains only a comment
}
# call function to kill all children on SIGCHLD from the first one
trap killAll SIGCHLD
# start your child processes here
# now wait for them to finish
wait
You just have to be really careful in your script to use only bash built-in commands. You can't start any utilities that run as a separate process after you issue the trap command - any child process exiting will send SIGCHLD - and you can't tell where it came from.

How do you stop two concurrent processes?

In my web development workflow, I have two processes:
watching my folder for changes
previewing my site in the browser
I want to be able to run them and then later stop them both at the same time. I've seen everyone suggesting using the ampersand operator:
process_1 & process_2
But pressing Ctrl + C only stops the second one. I have to kill the first one manually. What am I missing in this approach?
You can have the foreground script explicitly kill the subprocesses in response to SIGINT:
#!/bin/sh
trap 'kill $pid1 $pid2' 2
cmd1 &
pid1=$!
cmd2 &
pid2=$!
wait
There is a race condition in this example: if you send SIGINT to the parent before pid1 is assigned, kill will emit a warning message and neither child will be terminated. If you send SIGINT before pid2 is assigned, only the process running cmd1 will be sent the signal. In either case, the parent will continue running and a second SIGINT can be sent. Some versions of kill allow you to avoid this race condition by sending a signal to the process group using kill -$$, but not all versions of kill support that usage. (Note that if either child process does not terminate in response to the signal, the parent will not exit but continue waiting.)
How about writing two scripts, one containing
./process_1 &
./process_2 &
and a second containing
killall process_1
killall process_2
Start both processes by running the first script, and end them by running the second script.
