Exit a bash script when one of the subprocesses exits - bash

I'm running several background processes in my script
run_gui()
{
exec ... # the real commands here
}
The functions run_ai1(), run_ai2 are analogous.
Then I run the functions and do the needed piping
run_gui &
run_ai1 &
run_ai2 &
while true; do
while true; do
read -u $ai1_outfd line || echo "Nothing read"
if [[ $line ]]; then
: # processing
fi
done
sleep $turndelay
while true; do
read -u $ai2_outfd line || echo "nothing read"
if [[ $line ]]; then
: # processing
fi
done
sleep $turndelay
done
If any of those three processes exits, I want to check their exit codes and terminate the rest of the processes. For example, if run_ai2 exits with exit code 3, then I want to stop the processes run_ai1 and run_gui and exit the main script with exit code 1. The correct exitcodes for the different backgrounds processes may differ.
The problem is: how can I detect it? There's the command wait but I don't know in advance which script will finish first. I could run wait as a background process - but it's becoming even more clumsy.
Can you help me please?

The following script monitors test child processes (in the example, sleep+false and sleep+true) and reports their PID and exit code:
#!/bin/bash
set -m
trap myhandler CHLD
myhandler() {
echo sigchld received
cat /tmp/foo
}
( sleep 5; false; echo "exit p1=$?" ) > /tmp/foo &
p1=$!
echo "p1=$p1"
( sleep 3; true; echo "exit p2=$?" ) > /tmp/foo &
p2=$!
echo "p2=$p2"
pstree -p $$
wait
The result is:
p1=3197
p2=3198
prueba(3196)─┬─prueba(3197)───sleep(3199)
├─prueba(3198)───sleep(3201)
└─pstree(3200)
sigchld received
sigchld received
exit p2=0
sigchld received
exit p1=1
It could be interesting to use SIGUSR1 instead of SIGCHLD; see here for an example: https://stackoverflow.com/a/12751700/4886927.
Also, inside the trap handler, it is posible to verify which child is still alive. Something like:
myhandler() {
if kill -0 $p1; then
echo "child1 is alive"
fi
if kill -0 $p2; then
echo "child2 is alive"
fi
}
or kill both childs when one of them dies:
myhandler() {
if kill -0 $p1 && kill -0 $p2; then
echo "all childs alive"
else
kill -9 $p1 $p2
fi
}

Related

How to check the return value of a certain process with its pid? [duplicate]

I have a command CMD called from my main bourne shell script that takes forever.
I want to modify the script as follows:
Run the command CMD in parallel as a background process (CMD &).
In the main script, have a loop to monitor the spawned command every few seconds. The loop also echoes some messages to stdout indicating progress of the script.
Exit the loop when the spawned command terminates.
Capture and report the exit code of the spawned process.
Can someone give me pointers to accomplish this?
1: In bash, $! holds the PID of the last background process that was executed. That will tell you what process to monitor, anyway.
4: wait <n> waits until the process with PID <n> is complete (it will block until the process completes, so you might not want to call this until you are sure the process is done), and then returns the exit code of the completed process.
2, 3: ps or ps | grep " $! " can tell you whether the process is still running. It is up to you how to understand the output and decide how close it is to finishing. (ps | grep isn't idiot-proof. If you have time you can come up with a more robust way to tell whether the process is still running).
Here's a skeleton script:
# simulate a long process that will have an identifiable exit code
(sleep 15 ; /bin/false) &
my_pid=$!
while ps | grep " $my_pid " # might also need | grep -v grep here
do
echo $my_pid is still in the ps output. Must still be running.
sleep 3
done
echo Oh, it looks like the process is done.
wait $my_pid
# The variable $? always holds the exit code of the last command to finish.
# Here it holds the exit code of $my_pid, since wait exits with that code.
my_status=$?
echo The exit status of the process was $my_status
This is how I solved it when I had a similar need:
# Some function that takes a long time to process
longprocess() {
# Sleep up to 14 seconds
sleep $((RANDOM % 15))
# Randomly exit with 0 or 1
exit $((RANDOM % 2))
}
pids=""
# Run five concurrent processes
for i in {1..5}; do
( longprocess ) &
# store PID of process
pids+=" $!"
done
# Wait for all processes to finish, will take max 14s
# as it waits in order of launch, not order of finishing
for p in $pids; do
if wait $p; then
echo "Process $p success"
else
echo "Process $p fail"
fi
done
The pid of a backgrounded child process is stored in $!.
You can store all child processes' pids into an array, e.g. PIDS[].
wait [-n] [jobspec or pid …]
Wait until the child process specified by each process ID pid or job specification jobspec exits and return the exit status of the last command waited for. If a job spec is given, all processes in the job are waited for. If no arguments are given, all currently active child processes are waited for, and the return status is zero. If the -n option is supplied, wait waits for any job to terminate and returns its exit status. If neither jobspec nor pid specifies an active child process of the shell, the return status is 127.
Use wait command you can wait for all child processes finish, meanwhile you can get exit status of each child processes via $? and store status into STATUS[]. Then you can do something depending by status.
I have tried the following 2 solutions and they run well. solution01 is
more concise, while solution02 is a little complicated.
solution01
#!/bin/bash
# start 3 child processes concurrently, and store each pid into array PIDS[].
process=(a.sh b.sh c.sh)
for app in ${process[#]}; do
./${app} &
PIDS+=($!)
done
# wait for all processes to finish, and store each process's exit code into array STATUS[].
for pid in ${PIDS[#]}; do
echo "pid=${pid}"
wait ${pid}
STATUS+=($?)
done
# after all processed finish, check their exit codes in STATUS[].
i=0
for st in ${STATUS[#]}; do
if [[ ${st} -ne 0 ]]; then
echo "$i failed"
else
echo "$i finish"
fi
((i+=1))
done
solution02
#!/bin/bash
# start 3 child processes concurrently, and store each pid into array PIDS[].
i=0
process=(a.sh b.sh c.sh)
for app in ${process[#]}; do
./${app} &
pid=$!
PIDS[$i]=${pid}
((i+=1))
done
# wait for all processes to finish, and store each process's exit code into array STATUS[].
i=0
for pid in ${PIDS[#]}; do
echo "pid=${pid}"
wait ${pid}
STATUS[$i]=$?
((i+=1))
done
# after all processed finish, check their exit codes in STATUS[].
i=0
for st in ${STATUS[#]}; do
if [[ ${st} -ne 0 ]]; then
echo "$i failed"
else
echo "$i finish"
fi
((i+=1))
done
As I see almost all answers use external utilities (mostly ps) to poll the state of the background process. There is a more unixesh solution, catching the SIGCHLD signal. In the signal handler it has to be checked which child process was stopped. It can be done by kill -0 <PID> built-in (universal) or checking the existence of /proc/<PID> directory (Linux specific) or using the jobs built-in (bash specific. jobs -l also reports the pid. In this case the 3rd field of the output can be Stopped|Running|Done|Exit . ).
Here is my example.
The launched process is called loop.sh. It accepts -x or a number as an argument. For -x is exits with exit code 1. For a number it waits num*5 seconds. In every 5 seconds it prints its PID.
The launcher process is called launch.sh:
#!/bin/bash
handle_chld() {
local tmp=()
for((i=0;i<${#pids[#]};++i)); do
if [ ! -d /proc/${pids[i]} ]; then
wait ${pids[i]}
echo "Stopped ${pids[i]}; exit code: $?"
else tmp+=(${pids[i]})
fi
done
pids=(${tmp[#]})
}
set -o monitor
trap "handle_chld" CHLD
# Start background processes
./loop.sh 3 &
pids+=($!)
./loop.sh 2 &
pids+=($!)
./loop.sh -x &
pids+=($!)
# Wait until all background processes are stopped
while [ ${#pids[#]} -gt 0 ]; do echo "WAITING FOR: ${pids[#]}"; sleep 2; done
echo STOPPED
For more explanation see: Starting a process from bash script failed
#/bin/bash
#pgm to monitor
tail -f /var/log/messages >> /tmp/log&
# background cmd pid
pid=$!
# loop to monitor running background cmd
while :
do
ps ax | grep $pid | grep -v grep
ret=$?
if test "$ret" != "0"
then
echo "Monitored pid ended"
break
fi
sleep 5
done
wait $pid
echo $?
I would change your approach slightly. Rather than checking every few seconds if the command is still alive and reporting a message, have another process that reports every few seconds that the command is still running and then kill that process when the command finishes. For example:
#!/bin/sh
cmd() { sleep 5; exit 24; }
cmd & # Run the long running process
pid=$! # Record the pid
# Spawn a process that coninually reports that the command is still running
while echo "$(date): $pid is still running"; do sleep 1; done &
echoer=$!
# Set a trap to kill the reporter when the process finishes
trap 'kill $echoer' 0
# Wait for the process to finish
if wait $pid; then
echo "cmd succeeded"
else
echo "cmd FAILED!! (returned $?)"
fi
Our team had the same need with a remote SSH-executed script which was timing out after 25 minutes of inactivity. Here is a solution with the monitoring loop checking the background process every second, but printing only every 10 minutes to suppress an inactivity timeout.
long_running.sh &
pid=$!
# Wait on a background job completion. Query status every 10 minutes.
declare -i elapsed=0
# `ps -p ${pid}` works on macOS and CentOS. On both OSes `ps ${pid}` works as well.
while ps -p ${pid} >/dev/null; do
sleep 1
if ((++elapsed % 600 == 0)); then
echo "Waiting for the completion of the main script. $((elapsed / 60))m and counting ..."
fi
done
# Return the exit code of the terminated background process. This works in Bash 4.4 despite what Bash docs say:
# "If neither jobspec nor pid specifies an active child process of the shell, the return status is 127."
wait ${pid}
A simple example, similar to the solutions above. This doesn't require monitoring any process output. The next example uses tail to follow output.
$ echo '#!/bin/bash' > tmp.sh
$ echo 'sleep 30; exit 5' >> tmp.sh
$ chmod +x tmp.sh
$ ./tmp.sh &
[1] 7454
$ pid=$!
$ wait $pid
[1]+ Exit 5 ./tmp.sh
$ echo $?
5
Use tail to follow process output and quit when the process is complete.
$ echo '#!/bin/bash' > tmp.sh
$ echo 'i=0; while let "$i < 10"; do sleep 5; echo "$i"; let i=$i+1; done; exit 5;' >> tmp.sh
$ chmod +x tmp.sh
$ ./tmp.sh
0
1
2
^C
$ ./tmp.sh > /tmp/tmp.log 2>&1 &
[1] 7673
$ pid=$!
$ tail -f --pid $pid /tmp/tmp.log
0
1
2
3
4
5
6
7
8
9
[1]+ Exit 5 ./tmp.sh > /tmp/tmp.log 2>&1
$ wait $pid
$ echo $?
5
Another solution is to monitor processes via the proc filesystem (safer than ps/grep combo); when you start a process it has a corresponding folder in /proc/$pid, so the solution could be
#!/bin/bash
....
doSomething &
local pid=$!
while [ -d /proc/$pid ]; do # While directory exists, the process is running
doSomethingElse
....
else # when directory is removed from /proc, process has ended
wait $pid
local exit_status=$?
done
....
Now you can use the $exit_status variable however you like.
With this method, your script doesnt have to wait for the background process, you will only have to monitor a temporary file for the exit status.
FUNCmyCmd() { sleep 3;return 6; };
export retFile=$(mktemp);
FUNCexecAndWait() { FUNCmyCmd;echo $? >$retFile; };
FUNCexecAndWait&
now, your script can do anything else while you just have to keep monitoring the contents of retFile (it can also contain any other information you want like the exit time).
PS.: btw, I coded thinking in bash
My solution was to use an anonymous pipe to pass the status to a monitoring loop. There are no temporary files used to exchange status so nothing to cleanup. If you were uncertain about the number of background jobs the break condition could be [ -z "$(jobs -p)" ].
#!/bin/bash
exec 3<> <(:)
{ sleep 15 ; echo "sleep/exit $?" >&3 ; } &
while read -u 3 -t 1 -r STAT CODE || STAT="timeout" ; do
echo "stat: ${STAT}; code: ${CODE}"
if [ "${STAT}" = "sleep/exit" ] ; then
break
fi
done
how about ...
# run your stuff
unset PID
for process in one two three four
do
( sleep $((RANDOM%20)); echo hello from process $process; exit $((RANDOM%3)); ) & 2>&1
PID+=($!)
done
# (optional) report on the status of that stuff as it exits
for pid in "${PID[#]}"
do
( wait "$pid"; echo "process $pid complemted with exit status $?") &
done
# (optional) while we wait, monitor that stuff
while ps --pid "${PID[*]}" --ppid "${PID[*]}" --format pid,ppid,command,pcpu
do
sleep 5
done | xargs -i date '+%x %X {}'
# return non-zero if any are non zero
SUCCESS=0
for pid in "${PID[#]}"
do
wait "$pid" && ((SUCCESS++)) && echo "$pid OK" || echo "$pid returned $?"
done
echo "success for $SUCCESS out of ${#PID} jobs"
exit $(( ${#PID} - SUCCESS ))
This may be extending beyond your question, however if you're concerned about the length of time processes are running for, you may be interested in checking the status of running background processes after an interval of time. It's easy enough to check which child PIDs are still running using pgrep -P $$, however I came up with the following solution to check the exit status of those PIDs that have already expired:
cmd1() { sleep 5; exit 24; }
cmd2() { sleep 10; exit 0; }
pids=()
cmd1 & pids+=("$!")
cmd2 & pids+=("$!")
lasttimeout=0
for timeout in 2 7 11; do
echo -n "interval-$timeout: "
sleep $((timeout-lasttimeout))
# you can only wait on a pid once
remainingpids=()
for pid in ${pids[*]}; do
if ! ps -p $pid >/dev/null ; then
wait $pid
echo -n "pid-$pid:exited($?); "
else
echo -n "pid-$pid:running; "
remainingpids+=("$pid")
fi
done
pids=( ${remainingpids[*]} )
lasttimeout=$timeout
echo
done
which outputs:
interval-2: pid-28083:running; pid-28084:running;
interval-7: pid-28083:exited(24); pid-28084:running;
interval-11: pid-28084:exited(0);
Note: You could change $pids to a string variable rather than array to simplify things if you like.

bash wait exit on error code

Let's consider this as a starting point:
#!/bin/bash
set -e
echo "Sleeping..."
sleep 2 &
wait
echo "Done"
exit 0
I would like wait to exit the whole script if the background process exited with error. Introducing an error as such:
#!/bin/bash
set -e
echo "Sleeping..."
sleep SOMETHING_STRANGE_AND_WRONG &
wait
echo "Done"
exit 0
does echo "Done". I was expecting wait to exit the script, because of set -e.
I know that I can save the pid of sleep and check the return value of the background process this way:
#!/bin/bash
set -e
echo "Sleeping..."
sleep SOMETHING_STRANGE_AND_WRONG &
pid=$!
if wait $pid; then
echo "Success"
else
echo "Failure!"
exit 1
fi
echo "Done"
exit 0
However, this gets cumbersome when I have several such "sync points" within my script, and several subprocesses to wait for at each of these points.
I am not very interested in the error codes themselves, only that they're not success.
Is there a less verbose way to make wait fail and exit (because of set -e) if any of the subprocesses it was waiting for did not succeed?
Edit: I am looking for a solution where wait fails and exit if any of the subprocesses fails:
#!/bin/bash
set -e
echo "Sleeping..."
sleep SOMETHING_STRANGE_AND_WRONG &
sleep 2 &
wait
echo "Done"
exit 0
which I currently solve this way (which I find cumbersome):
#!/bin/bash
set -e
echo "Sleeping..."
pids=""
sleep SOMETHING_STRANGE_AND_WRONG &
pids+=" $!"
sleep 2 &
pids+=" $!"
for p in $pids; do
if wait $p; then
echo "Success"
else
echo "Failure"
exit 1
fi
done
echo "Done"
exit 0
Because your shebang lines are Bash I will give a Bash-specific (non-POSIX-portable) answer first (and a less elegant portable version below that).
Bash has a concise/elegant/robust way which responds to each child as they finish, not in a hardcoded loop order. The POSIX-portable version has to use a hardcoded loop, and is about as good as you can do portably. For both versions with small tweaks they can either handle-and-exit as soon as they encounter the first failure or wait until all have finished then handle-and-exit, and either way can then wait for parent-exit after all children or not (which is useful when not doing so may lead to race-conditions or zombies in your larger program).
The pertinent points regarding Bash's non-portable "wait" used in its version are:
optflag -f: forces "wait" to wait for id to terminate before returning its status, instead of returning when it changes status (you may or may not want this, depending on your use-case)
optflag -n: waits for a single job from the list of ids or, if no ids are supplied, any job, to complete and returns its exit status
exit status 127: is returned if none of the supplied arguments is a child of the shell, or if no arguments are supplied and the shell has no unwaited-for children
Here are the key logic-snippets.
Bash version:
set -e
declare -i err=0 werr=0
while wait -fn || werr=$?; ((werr != 127)); do
err=$werr
done
POSIX portable shell version:
set -e
werr=0
err=0
for pid in $pids; do
wait $pid || werr=$?
! [ $werr = 127 ] || break
err=$werr
done
Both versions include handling, optionally straight away or after all children have exited, and optionally waiting for parent-exit or not too (see lines for uncommenting).
Bash version:
#!/usr/bin/env bash
sleep 2 &
sleep SOMETHING_STRANGE_AND_WRONG &
sleep 1 &
set -e
declare -i err=0 werr=0
while wait -fn || werr=$?; ((werr != 127)); do
err=$werr
## To handle *as soon as* first failure happens uncomment this:
#((err == 0)) || break
done
## If you want to still wait for children to finish before exiting
## parent (even if you handle the failed child early) uncomment this:
#trap 'wait || :' EXIT
if ((err == 0)); then
echo "Success"
else
echo "Failure!"
exit $err
fi
POSIX portable shell version:
#!/usr/bin/env sh
pids=''
sleep 2 & pids="$pids $!"
sleep SOMETHING_STRANGE_AND_WRONG & pids="$pids $!"
sleep 1 & pids="$pids $!"
set -e
werr=0
err=0
for pid in $pids; do
wait $pid || werr=$?
! [ $werr = 127 ] || break
err=$werr
## To handle *as soon as* first failure happens uncomment this:
#[ $err = 0 ] || break
done
## If you want to still wait for children to finish before exiting
## parent (even if you handle the failed child early) uncomment this:
#trap 'wait || :' EXIT
if [ $err = 0 ]; then
echo "Success"
else
echo "Failure!"
exit $err
fi
Shorter would be
wait || exit $?
Or if message needed, if not already logged by failing process
wait || { echo "background failed: $?" >&2; exit 1;}
or a function could be used instead
exit_fail() {
echo "$1" >&2
exit 1
}
...
wait || exit_fail "background failed: $?"
I implemented the functionality by hand in a function.
# Wait for subprocesses with pids passed as arguments, exit if any failed.
# $1 info_string: to show the command that the failure/success relates to.
# $* list of pids to wait for and check.
wait_and_check () {
info_string=$1
shift
for p in $*; do
if wait $p; then
echo "$info_string process $p success"
else
echo "$info_string process $p Failure!"
exit 1
fi
done
}
Usage:
pids=""
for p in $bunch_of_stuff ; do
stuff_function $p &
pids+=" $!"
done
wait_and_check "stuff" $pids
This feels like many should have had the need for something similar, so I'm surprised there isn't a ready-made solution for that.
One shortcoming is that it's hard to check for the error code of the processes.

bash - getting exit codes from 2 called programs that work asynchronously together [duplicate]

I have a command CMD called from my main bourne shell script that takes forever.
I want to modify the script as follows:
Run the command CMD in parallel as a background process (CMD &).
In the main script, have a loop to monitor the spawned command every few seconds. The loop also echoes some messages to stdout indicating progress of the script.
Exit the loop when the spawned command terminates.
Capture and report the exit code of the spawned process.
Can someone give me pointers to accomplish this?
1: In bash, $! holds the PID of the last background process that was executed. That will tell you what process to monitor, anyway.
4: wait <n> waits until the process with PID <n> is complete (it will block until the process completes, so you might not want to call this until you are sure the process is done), and then returns the exit code of the completed process.
2, 3: ps or ps | grep " $! " can tell you whether the process is still running. It is up to you how to understand the output and decide how close it is to finishing. (ps | grep isn't idiot-proof. If you have time you can come up with a more robust way to tell whether the process is still running).
Here's a skeleton script:
# simulate a long process that will have an identifiable exit code
(sleep 15 ; /bin/false) &
my_pid=$!
while ps | grep " $my_pid " # might also need | grep -v grep here
do
echo $my_pid is still in the ps output. Must still be running.
sleep 3
done
echo Oh, it looks like the process is done.
wait $my_pid
# The variable $? always holds the exit code of the last command to finish.
# Here it holds the exit code of $my_pid, since wait exits with that code.
my_status=$?
echo The exit status of the process was $my_status
This is how I solved it when I had a similar need:
# Some function that takes a long time to process
longprocess() {
# Sleep up to 14 seconds
sleep $((RANDOM % 15))
# Randomly exit with 0 or 1
exit $((RANDOM % 2))
}
pids=""
# Run five concurrent processes
for i in {1..5}; do
( longprocess ) &
# store PID of process
pids+=" $!"
done
# Wait for all processes to finish, will take max 14s
# as it waits in order of launch, not order of finishing
for p in $pids; do
if wait $p; then
echo "Process $p success"
else
echo "Process $p fail"
fi
done
The pid of a backgrounded child process is stored in $!.
You can store all child processes' pids into an array, e.g. PIDS[].
wait [-n] [jobspec or pid …]
Wait until the child process specified by each process ID pid or job specification jobspec exits and return the exit status of the last command waited for. If a job spec is given, all processes in the job are waited for. If no arguments are given, all currently active child processes are waited for, and the return status is zero. If the -n option is supplied, wait waits for any job to terminate and returns its exit status. If neither jobspec nor pid specifies an active child process of the shell, the return status is 127.
Use wait command you can wait for all child processes finish, meanwhile you can get exit status of each child processes via $? and store status into STATUS[]. Then you can do something depending by status.
I have tried the following 2 solutions and they run well. solution01 is
more concise, while solution02 is a little complicated.
solution01
#!/bin/bash
# start 3 child processes concurrently, and store each pid into array PIDS[].
process=(a.sh b.sh c.sh)
for app in ${process[#]}; do
./${app} &
PIDS+=($!)
done
# wait for all processes to finish, and store each process's exit code into array STATUS[].
for pid in ${PIDS[#]}; do
echo "pid=${pid}"
wait ${pid}
STATUS+=($?)
done
# after all processed finish, check their exit codes in STATUS[].
i=0
for st in ${STATUS[#]}; do
if [[ ${st} -ne 0 ]]; then
echo "$i failed"
else
echo "$i finish"
fi
((i+=1))
done
solution02
#!/bin/bash
# start 3 child processes concurrently, and store each pid into array PIDS[].
i=0
process=(a.sh b.sh c.sh)
for app in ${process[#]}; do
./${app} &
pid=$!
PIDS[$i]=${pid}
((i+=1))
done
# wait for all processes to finish, and store each process's exit code into array STATUS[].
i=0
for pid in ${PIDS[#]}; do
echo "pid=${pid}"
wait ${pid}
STATUS[$i]=$?
((i+=1))
done
# after all processed finish, check their exit codes in STATUS[].
i=0
for st in ${STATUS[#]}; do
if [[ ${st} -ne 0 ]]; then
echo "$i failed"
else
echo "$i finish"
fi
((i+=1))
done
As I see almost all answers use external utilities (mostly ps) to poll the state of the background process. There is a more unixesh solution, catching the SIGCHLD signal. In the signal handler it has to be checked which child process was stopped. It can be done by kill -0 <PID> built-in (universal) or checking the existence of /proc/<PID> directory (Linux specific) or using the jobs built-in (bash specific. jobs -l also reports the pid. In this case the 3rd field of the output can be Stopped|Running|Done|Exit . ).
Here is my example.
The launched process is called loop.sh. It accepts -x or a number as an argument. For -x is exits with exit code 1. For a number it waits num*5 seconds. In every 5 seconds it prints its PID.
The launcher process is called launch.sh:
#!/bin/bash
handle_chld() {
local tmp=()
for((i=0;i<${#pids[#]};++i)); do
if [ ! -d /proc/${pids[i]} ]; then
wait ${pids[i]}
echo "Stopped ${pids[i]}; exit code: $?"
else tmp+=(${pids[i]})
fi
done
pids=(${tmp[#]})
}
set -o monitor
trap "handle_chld" CHLD
# Start background processes
./loop.sh 3 &
pids+=($!)
./loop.sh 2 &
pids+=($!)
./loop.sh -x &
pids+=($!)
# Wait until all background processes are stopped
while [ ${#pids[#]} -gt 0 ]; do echo "WAITING FOR: ${pids[#]}"; sleep 2; done
echo STOPPED
For more explanation see: Starting a process from bash script failed
#/bin/bash
#pgm to monitor
tail -f /var/log/messages >> /tmp/log&
# background cmd pid
pid=$!
# loop to monitor running background cmd
while :
do
ps ax | grep $pid | grep -v grep
ret=$?
if test "$ret" != "0"
then
echo "Monitored pid ended"
break
fi
sleep 5
done
wait $pid
echo $?
I would change your approach slightly. Rather than checking every few seconds if the command is still alive and reporting a message, have another process that reports every few seconds that the command is still running and then kill that process when the command finishes. For example:
#!/bin/sh
cmd() { sleep 5; exit 24; }
cmd & # Run the long running process
pid=$! # Record the pid
# Spawn a process that coninually reports that the command is still running
while echo "$(date): $pid is still running"; do sleep 1; done &
echoer=$!
# Set a trap to kill the reporter when the process finishes
trap 'kill $echoer' 0
# Wait for the process to finish
if wait $pid; then
echo "cmd succeeded"
else
echo "cmd FAILED!! (returned $?)"
fi
Our team had the same need with a remote SSH-executed script which was timing out after 25 minutes of inactivity. Here is a solution with the monitoring loop checking the background process every second, but printing only every 10 minutes to suppress an inactivity timeout.
long_running.sh &
pid=$!
# Wait on a background job completion. Query status every 10 minutes.
declare -i elapsed=0
# `ps -p ${pid}` works on macOS and CentOS. On both OSes `ps ${pid}` works as well.
while ps -p ${pid} >/dev/null; do
sleep 1
if ((++elapsed % 600 == 0)); then
echo "Waiting for the completion of the main script. $((elapsed / 60))m and counting ..."
fi
done
# Return the exit code of the terminated background process. This works in Bash 4.4 despite what Bash docs say:
# "If neither jobspec nor pid specifies an active child process of the shell, the return status is 127."
wait ${pid}
A simple example, similar to the solutions above. This doesn't require monitoring any process output. The next example uses tail to follow output.
$ echo '#!/bin/bash' > tmp.sh
$ echo 'sleep 30; exit 5' >> tmp.sh
$ chmod +x tmp.sh
$ ./tmp.sh &
[1] 7454
$ pid=$!
$ wait $pid
[1]+ Exit 5 ./tmp.sh
$ echo $?
5
Use tail to follow process output and quit when the process is complete.
$ echo '#!/bin/bash' > tmp.sh
$ echo 'i=0; while let "$i < 10"; do sleep 5; echo "$i"; let i=$i+1; done; exit 5;' >> tmp.sh
$ chmod +x tmp.sh
$ ./tmp.sh
0
1
2
^C
$ ./tmp.sh > /tmp/tmp.log 2>&1 &
[1] 7673
$ pid=$!
$ tail -f --pid $pid /tmp/tmp.log
0
1
2
3
4
5
6
7
8
9
[1]+ Exit 5 ./tmp.sh > /tmp/tmp.log 2>&1
$ wait $pid
$ echo $?
5
Another solution is to monitor processes via the proc filesystem (safer than ps/grep combo); when you start a process it has a corresponding folder in /proc/$pid, so the solution could be
#!/bin/bash
....
doSomething &
local pid=$!
while [ -d /proc/$pid ]; do # While directory exists, the process is running
doSomethingElse
....
else # when directory is removed from /proc, process has ended
wait $pid
local exit_status=$?
done
....
Now you can use the $exit_status variable however you like.
With this method, your script doesnt have to wait for the background process, you will only have to monitor a temporary file for the exit status.
FUNCmyCmd() { sleep 3;return 6; };
export retFile=$(mktemp);
FUNCexecAndWait() { FUNCmyCmd;echo $? >$retFile; };
FUNCexecAndWait&
now, your script can do anything else while you just have to keep monitoring the contents of retFile (it can also contain any other information you want like the exit time).
PS.: btw, I coded thinking in bash
My solution was to use an anonymous pipe to pass the status to a monitoring loop. There are no temporary files used to exchange status so nothing to cleanup. If you were uncertain about the number of background jobs the break condition could be [ -z "$(jobs -p)" ].
#!/bin/bash
exec 3<> <(:)
{ sleep 15 ; echo "sleep/exit $?" >&3 ; } &
while read -u 3 -t 1 -r STAT CODE || STAT="timeout" ; do
echo "stat: ${STAT}; code: ${CODE}"
if [ "${STAT}" = "sleep/exit" ] ; then
break
fi
done
how about ...
# run your stuff
unset PID
for process in one two three four
do
( sleep $((RANDOM%20)); echo hello from process $process; exit $((RANDOM%3)); ) & 2>&1
PID+=($!)
done
# (optional) report on the status of that stuff as it exits
for pid in "${PID[#]}"
do
( wait "$pid"; echo "process $pid complemted with exit status $?") &
done
# (optional) while we wait, monitor that stuff
while ps --pid "${PID[*]}" --ppid "${PID[*]}" --format pid,ppid,command,pcpu
do
sleep 5
done | xargs -i date '+%x %X {}'
# return non-zero if any are non zero
SUCCESS=0
for pid in "${PID[#]}"
do
wait "$pid" && ((SUCCESS++)) && echo "$pid OK" || echo "$pid returned $?"
done
echo "success for $SUCCESS out of ${#PID} jobs"
exit $(( ${#PID} - SUCCESS ))
This may be extending beyond your question, however if you're concerned about the length of time processes are running for, you may be interested in checking the status of running background processes after an interval of time. It's easy enough to check which child PIDs are still running using pgrep -P $$, however I came up with the following solution to check the exit status of those PIDs that have already expired:
cmd1() { sleep 5; exit 24; }
cmd2() { sleep 10; exit 0; }
pids=()
cmd1 & pids+=("$!")
cmd2 & pids+=("$!")
lasttimeout=0
for timeout in 2 7 11; do
echo -n "interval-$timeout: "
sleep $((timeout-lasttimeout))
# you can only wait on a pid once
remainingpids=()
for pid in ${pids[*]}; do
if ! ps -p $pid >/dev/null ; then
wait $pid
echo -n "pid-$pid:exited($?); "
else
echo -n "pid-$pid:running; "
remainingpids+=("$pid")
fi
done
pids=( ${remainingpids[*]} )
lasttimeout=$timeout
echo
done
which outputs:
interval-2: pid-28083:running; pid-28084:running;
interval-7: pid-28083:exited(24); pid-28084:running;
interval-11: pid-28084:exited(0);
Note: You could change $pids to a string variable rather than array to simplify things if you like.

Calling exit doesn't work as expected

I need to run several child processes in background and pipe data between them. When the script exits, I want to kill any remaining of them, so I added
trap cleanup EXIT
cleanup()
{
echo "Cleaning up!"
pkill -TERM -P $$
}
Since I need to react if one of the processes reports an error, I created wrapper functions. Anything that ends with fd is a previously opened file descriptor, connected to a FIFO pipe.
run_gui()
{
"$GAME_BIN" $args <&$gui_infd >&$gui_outfd # redirecting IO to some file descriptors
if [[ $? == 0 ]]; then
echo exiting ok
exit $OK_EXITCODE
else
exit $ERROR_EXITCODE
fi
}
The functions run_ai1(), run_ai2 are analogous.
run_ai1()
{
"$ai1" <&$ai1_infd >&$ai1_outfd
if [[ $? == 0 || $? == 1 || $? == 2 ]]; then
exit $OK_EXITCODE
else
exit $ERROR_EXITCODE
fi
}
run_ai2()
{
"$ai2" <&$ai2_infd >&$ai2_outfd
if [[ $? == 0 || $? == 1 || $? == 2 ]]; then
exit $OK_EXITCODE
else
exit $ERROR_EXITCODE
fi
}
Then I run the functions and do the needed piping
printinit 1 >&$ai1_infd
printinit 2 >&$ai2_infd
run_gui &
run_ai1 &
run_ai2 &
while true; do
echo "Started the loop"
while true; do
read -u $ai1_outfd line || echo "Nothing read"
echo $line
if [[ $line ]]; then
echo "$line" >&$gui_infd
echo "$line" >&$ai2_infd
if [[ "$line" == "END_TURN" ]]; then
break
fi
fi
done
sleep $turndelay
while true; do
read -u $ai2_outfd line || echo "nothing read"
echo $line
if [[ $line ]]; then
echo "$line" >&$gui_infd
echo "$line" >&$ai1_infd
if [[ "$line" == "END_TURN" ]]; then
break
fi
fi
done
sleep $turndelay
done
When $GAME_BIN exits, i.e. the GUI is closed by the close button, I can see the exiting ok message on the stdout, but the cleanup function is not called at all. When I add a manual call to cleanup before calling exit $OK_EXITCODE, although the processes are killed:
./game.sh: line 309: 9193 Terminated run_gui
./game.sh: line 309: 9194 Terminated run_ai1
./game.sh: line 309: 9195 Terminated run_ai2
./game.sh: line 309: 9203 Terminated sleep $turndelay
the loop runs anyway and the script doesn't exit, as it should (exit $OK_EXITCODE). The AI scripts are simple:
#!/bin/sh
while true; do
echo END_TURN
done
There is no wait call anywhere in my script. What am I doing wrong?
What's interesting: when I call jobs -p right after run_ai2 &, then I get 3 pids listed. On the other hand, when I invoke this command from the cleanup function - the output is empty.
Besides, why is the sleep $turndelay process terminated? It's not a child invoked process.
An EXIT trap fires when the trapping script exits. Your toplevel script isn't exiting here.
The trap isn't inherited by the sub-shell that your run_* functions are running under (from being run in the background) so it never triggers when the sub-shell's exit.
What you want is most likely what you did manually (though slightly incorrectly it sounded like).
You want the cleanup function called from run_gui when $GAME has exited. Something like this.
run_gui() {
"$GAME_BIN" $args <&$gui_infd >&$gui_outfd # redirecting IO to some file descriptors
ret=$?
cleanup
exit $ret
}
Then you'll just need to make sure that cleanup gets the right value of $$ (Which in bash it will, for your usage, even in a sub-shell since $$ in a sub-shell is the parent process ID but you might want to make that more explicit by setting up a handler in your main script for a signal and signalling the main script when run_gui terminates instead.)
I'd guess you are getting some child processes kicked off by a child process. Do this: in another window do a ps -ft pts/1 or whatever your tty is. Verify.
Also change the pkill to a kill $(jobs -p) and see if that works.

bash trap interrupt command but should exit on end of loop

I´ve asked Bash trap - exit only at the end of loop and the submitted solution works but while pressing CTRL-C the running command in the script (mp3convert with lame) will be interrupt and than the complete for loop will running to the end. Let me show you the simple script:
#!/bin/bash
mp3convert () { lame -V0 file.wav file.mp3 }
PreTrap() { QUIT=1 }
CleanUp() {
if [ ! -z $QUIT ]; then
rm -f $TMPFILE1
rm -f $TMPFILE2
echo "... done!" && exit
fi }
trap PreTrap SIGINT SIGTERM SIGTSTP
trap CleanUp EXIT
case $1 in
write)
while [ -n "$line" ]
do
mp3convert
[SOMEMOREMAGIC]
CleanUp
done
;;
QUIT=1
If I press CTRL-C while function mp3convert is running the lame command will be interrupt and then [SOMEMOREMAGIC] will execute before CleanUp is running. I don´t understand why the lame command will be interrupt and how I could avoid them.
Try to simplify the discussion above, I wrap up an easier understandable version of show-case script below. This script also HANDLES the "double control-C problem":
(Double control-C problem: If you hit control C twice, or three times, depending on how many wait $PID you used, those clean up can not be done properly.)
#!/bin/bash
mp3convert () {
echo "mp3convert..."; sleep 5; echo "mp3convert done..."
}
PreTrap() {
echo "in trap"
QUIT=1
echo "exiting trap..."
}
CleanUp() {
### Since 'wait $PID' can be interrupted by ^C, we need to protected it
### by the 'kill' loop ==> double/triple control-C problem.
while kill -0 $PID >& /dev/null; do wait $PID; echo "check again"; done
### This won't work (A simple wait $PID is vulnerable to double control C)
# wait $PID
if [ ! -z $QUIT ]; then
echo "clean up..."
exit
fi
}
trap PreTrap SIGINT SIGTERM SIGTSTP
#trap CleanUp EXIT
for loop in 1 2 3; do
(
echo "loop #$loop"
mp3convert
echo magic 1
echo magic 2
echo magic 3
) &
PID=$!
CleanUp
echo "done loop #$loop"
done
The kill -0 trick can be found in a comment of this link
When you hit Ctrl-C in a terminal, SIGINT gets sent to all processes in the foreground process group of that terminal, as described in this Stack Exchange "Unix & Linux" answer: How Ctrl C works. (The other answers in that thread are well worth reading, too). And that's why your mp3convert function gets interrupted even though you have set a SIGINT trap.
But you can get around that by running the mp3convert function in the background, as mattias mentioned. Here's a variation of your script that demonstrates the technique.
#!/usr/bin/env bash
myfunc()
{
echo -n "Starting $1 :"
for i in {1..7}
do
echo -n " $i"
sleep 1
done
echo ". Finished $1"
}
PreTrap() { QUIT=1; echo -n " in trap "; }
CleanUp() {
#Don't start cleanup until current run of myfunc is completed.
wait $pid
[[ -n $QUIT ]] &&
{
QUIT=''
echo "Cleaning up"
sleep 1
echo "... done!" && exit
}
}
trap PreTrap SIGINT SIGTERM SIGTSTP
trap CleanUp EXIT
for i in {a..e}
do
#Run myfunc in background but wait until it completes.
myfunc "$i" &
pid=$!
wait $pid
CleanUp
done
QUIT=1
When you hit Ctrl-C while myfunc is in the middle of a run, PreTrap prints its message and sets the QUIT flag, but myfunc continues running and CleanUp doesn't commence until the current myfunc run has finished.
Note that my version of CleanUp resets the QUIT flag. This prevents CleanUp from running twice.
This version removes the CleanUp call from the main loop and puts it inside the PreTrap function. It uses wait with no ID argument in PreTrap, which means we don't need to bother saving the PID of each child process. This should be ok since if we're in the trap we do want to wait for all child processes to complete before proceeding.
#!/bin/bash
# Yet another Trap demo...
myfunc()
{
echo -n "Starting $1 :"
for i in {1..5}
do
echo -n " $i"
sleep 1
done
echo ". Finished $1"
}
PreTrap() { echo -n " in trap "; wait; CleanUp; }
CleanUp() {
[[ -n $CLEAN ]] && { echo bye; exit; }
echo "Cleaning up"
sleep 1
echo "... done!"
CLEAN=1
exit
}
trap PreTrap SIGINT SIGTERM SIGTSTP
trap "echo exittrap; CleanUp" EXIT
for i in {a..c}
do
#Run myfunc in background but wait until it completes.
myfunc "$i" & wait $!
done
We don't really need to do myfunc "$i" & wait $! in this script, it could be simplified even further to myfunc "$i" & wait. But generally it's better to wait for a specific PID just in case there's some other process running in the background that we don't want to wait for.
Note that pressing Ctrl-C while CleanUp itself is running will interrupt the current foreground process (probably sleep in this demo).
One way of doing this would be to simply disable the interrupt until your program is done.
Some pseudo code follows:
#!/bin/bash
# First, store your stty settings and disable the interrupt
STTY=$(stty -g)
stty intr undef
#run your program here
runMp3Convert()
#restore stty settings
stty ${STTY}
# eof
Another idea would be to run your bash script in the background (if possible).
mp3convert.sh &
or even,
nohup mp3convert.sh &

Resources