I have a command CMD, called from my main Bourne shell script, that takes forever.
I want to modify the script as follows:
1. Run the command CMD in parallel as a background process (CMD &).
2. In the main script, have a loop to monitor the spawned command every few seconds. The loop also echoes some messages to stdout indicating progress of the script.
3. Exit the loop when the spawned command terminates.
4. Capture and report the exit code of the spawned process.
Can someone give me pointers to accomplish this?
1: In bash, $! holds the PID of the last background process that was executed. That will tell you what process to monitor, anyway.
4: wait <n> waits until the process with PID <n> is complete (it will block until the process completes, so you might not want to call this until you are sure the process is done), and then returns the exit code of the completed process.
2, 3: ps or ps | grep " $! " can tell you whether the process is still running. It is up to you how to understand the output and decide how close it is to finishing. (ps | grep isn't idiot-proof. If you have time you can come up with a more robust way to tell whether the process is still running).
Here's a skeleton script:
# simulate a long process that will have an identifiable exit code
(sleep 15 ; /bin/false) &
my_pid=$!
while ps | grep " $my_pid " # might also need | grep -v grep here
do
echo $my_pid is still in the ps output. Must still be running.
sleep 3
done
echo Oh, it looks like the process is done.
wait $my_pid
# The variable $? always holds the exit code of the last command to finish.
# Here it holds the exit code of $my_pid, since wait exits with that code.
my_status=$?
echo The exit status of the process was $my_status
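As a more robust alternative to ps | grep, here is a minimal sketch that polls with kill -0 instead. It assumes bash, and the subshell that sleeps and exits with status 3 is only a hypothetical stand-in for CMD; the names cmd_pid and cmd_status are mine.

```shell
#!/bin/bash
# Hypothetical stand-in for CMD: runs for 2 seconds, then exits with status 3.
( sleep 2; exit 3 ) &
cmd_pid=$!

# kill -0 sends no signal; it only tests whether the PID can still be signalled.
while kill -0 "$cmd_pid" 2>/dev/null; do
    echo "pid $cmd_pid is still running"
    sleep 1
done

# The child has already exited, so wait returns at once with its exit status.
wait "$cmd_pid"
cmd_status=$?
echo "The exit status of the process was $cmd_status"
```

Unlike grepping ps output, this cannot match an unrelated line that happens to contain the same digits.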
This is how I solved it when I had a similar need:
# Some function that takes a long time to process
longprocess() {
# Sleep up to 14 seconds
sleep $((RANDOM % 15))
# Randomly exit with 0 or 1
exit $((RANDOM % 2))
}
pids=""
# Run five concurrent processes
for i in {1..5}; do
( longprocess ) &
# store PID of process
pids+=" $!"
done
# Wait for all processes to finish, will take max 14s
# as it waits in order of launch, not order of finishing
for p in $pids; do
if wait $p; then
echo "Process $p success"
else
echo "Process $p fail"
fi
done
The pid of a backgrounded child process is stored in $!.
You can store all child processes' pids into an array, e.g. PIDS[].
wait [-n] [jobspec or pid …]
Wait until the child process specified by each process ID pid or job specification jobspec exits and return the exit status of the last command waited for. If a job spec is given, all processes in the job are waited for. If no arguments are given, all currently active child processes are waited for, and the return status is zero. If the -n option is supplied, wait waits for any job to terminate and returns its exit status. If neither jobspec nor pid specifies an active child process of the shell, the return status is 127.
With the wait command you can wait for all child processes to finish, capture each child's exit status via $?, and store it in STATUS[]. You can then act on each status.
I have tried the following 2 solutions and they run well. solution01 is
more concise, while solution02 is a little complicated.
solution01
#!/bin/bash
# start 3 child processes concurrently, and store each pid into array PIDS[].
process=(a.sh b.sh c.sh)
for app in "${process[@]}"; do
./${app} &
PIDS+=($!)
done
# wait for all processes to finish, and store each process's exit code into array STATUS[].
for pid in "${PIDS[@]}"; do
echo "pid=${pid}"
wait ${pid}
STATUS+=($?)
done
# after all processes finish, check their exit codes in STATUS[].
i=0
for st in "${STATUS[@]}"; do
if [[ ${st} -ne 0 ]]; then
echo "$i failed"
else
echo "$i finish"
fi
((i+=1))
done
solution02
#!/bin/bash
# start 3 child processes concurrently, and store each pid into array PIDS[].
i=0
process=(a.sh b.sh c.sh)
for app in "${process[@]}"; do
./${app} &
pid=$!
PIDS[$i]=${pid}
((i+=1))
done
# wait for all processes to finish, and store each process's exit code into array STATUS[].
i=0
for pid in "${PIDS[@]}"; do
echo "pid=${pid}"
wait ${pid}
STATUS[$i]=$?
((i+=1))
done
# after all processes finish, check their exit codes in STATUS[].
i=0
for st in "${STATUS[@]}"; do
if [[ ${st} -ne 0 ]]; then
echo "$i failed"
else
echo "$i finish"
fi
((i+=1))
done
As far as I can see, almost all answers use external utilities (mostly ps) to poll the state of the background process. A more Unix-like solution is to catch the SIGCHLD signal. The signal handler then has to check which child process has stopped. This can be done with the kill -0 <PID> built-in (universal), by checking for the existence of the /proc/<PID> directory (Linux-specific), or with the jobs built-in (bash-specific; jobs -l also reports the PID, and the third field of its output can be Stopped|Running|Done|Exit).
Here is my example.
The launched process is called loop.sh. It accepts -x or a number as an argument. With -x it exits with exit code 1. With a number it waits num*5 seconds, printing its PID every 5 seconds.
The launcher process is called launch.sh:
#!/bin/bash
handle_chld() {
local tmp=()
for((i=0;i<${#pids[@]};++i)); do
if [ ! -d /proc/${pids[i]} ]; then
wait ${pids[i]}
echo "Stopped ${pids[i]}; exit code: $?"
else tmp+=(${pids[i]})
fi
done
pids=("${tmp[@]}")
}
set -o monitor
trap "handle_chld" CHLD
# Start background processes
./loop.sh 3 &
pids+=($!)
./loop.sh 2 &
pids+=($!)
./loop.sh -x &
pids+=($!)
# Wait until all background processes are stopped
while [ ${#pids[@]} -gt 0 ]; do echo "WAITING FOR: ${pids[@]}"; sleep 2; done
echo STOPPED
For more explanation see: Starting a process from bash script failed
#!/bin/bash
# program to monitor
tail -f /var/log/messages >> /tmp/log&
# background cmd pid
pid=$!
# loop to monitor running background cmd
while :
do
ps ax | grep $pid | grep -v grep
ret=$?
if test "$ret" != "0"
then
echo "Monitored pid ended"
break
fi
sleep 5
done
wait $pid
echo $?
I would change your approach slightly. Rather than checking every few seconds if the command is still alive and reporting a message, have another process that reports every few seconds that the command is still running and then kill that process when the command finishes. For example:
#!/bin/sh
cmd() { sleep 5; exit 24; }
cmd & # Run the long running process
pid=$! # Record the pid
# Spawn a process that continually reports that the command is still running
while echo "$(date): $pid is still running"; do sleep 1; done &
echoer=$!
# Set a trap to kill the reporter when the process finishes
trap 'kill $echoer' 0
# Wait for the process to finish
if wait $pid; then
echo "cmd succeeded"
else
echo "cmd FAILED!! (returned $?)"
fi
Our team had the same need with a remote SSH-executed script which was timing out after 25 minutes of inactivity. Here is a solution with the monitoring loop checking the background process every second, but printing only every 10 minutes to suppress an inactivity timeout.
long_running.sh &
pid=$!
# Wait on a background job completion. Query status every 10 minutes.
declare -i elapsed=0
# `ps -p ${pid}` works on macOS and CentOS. On both OSes `ps ${pid}` works as well.
while ps -p ${pid} >/dev/null; do
sleep 1
if ((++elapsed % 600 == 0)); then
echo "Waiting for the completion of the main script. $((elapsed / 60))m and counting ..."
fi
done
# Return the exit code of the terminated background process. This works in Bash 4.4 despite what Bash docs say:
# "If neither jobspec nor pid specifies an active child process of the shell, the return status is 127."
wait ${pid}
A simple example, similar to the solutions above. This doesn't require monitoring any process output. The next example uses tail to follow output.
$ echo '#!/bin/bash' > tmp.sh
$ echo 'sleep 30; exit 5' >> tmp.sh
$ chmod +x tmp.sh
$ ./tmp.sh &
[1] 7454
$ pid=$!
$ wait $pid
[1]+ Exit 5 ./tmp.sh
$ echo $?
5
Use tail to follow process output and quit when the process is complete.
$ echo '#!/bin/bash' > tmp.sh
$ echo 'i=0; while let "$i < 10"; do sleep 5; echo "$i"; let i=$i+1; done; exit 5;' >> tmp.sh
$ chmod +x tmp.sh
$ ./tmp.sh
0
1
2
^C
$ ./tmp.sh > /tmp/tmp.log 2>&1 &
[1] 7673
$ pid=$!
$ tail -f --pid $pid /tmp/tmp.log
0
1
2
3
4
5
6
7
8
9
[1]+ Exit 5 ./tmp.sh > /tmp/tmp.log 2>&1
$ wait $pid
$ echo $?
5
Another solution is to monitor processes via the proc filesystem (safer than ps/grep combo); when you start a process it has a corresponding folder in /proc/$pid, so the solution could be
#!/bin/bash
....
doSomething &
pid=$! # declare it with "local" only if this code lives inside a function
while [ -d "/proc/$pid" ]; do # While the directory exists, the process is running
doSomethingElse
....
done
# Once the directory is removed from /proc, the process has ended
wait "$pid"
exit_status=$?
....
Now you can use the $exit_status variable however you like.
With this method, your script doesn't have to wait for the background process; it only has to monitor a temporary file for the exit status.
FUNCmyCmd() { sleep 3;return 6; };
export retFile=$(mktemp);
FUNCexecAndWait() { FUNCmyCmd;echo $? >$retFile; };
FUNCexecAndWait&
Now your script can do anything else; it just has to keep monitoring the contents of retFile (which can also contain any other information you want, such as the exit time).
P.S. I wrote this with bash in mind.
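The monitoring side of this approach could look roughly like the sketch below. It assumes bash; my_cmd, ret_file and ret_code are hypothetical names standing in for the real command and variables.

```shell
#!/bin/bash
# Hypothetical stand-in for the long-running command: returns 6 after 2 seconds.
my_cmd() { sleep 2; return 6; }

ret_file=$(mktemp)

# The background group runs the command, then records its exit status in the file.
{ my_cmd; echo $? > "$ret_file"; } &

# [ -s FILE ] is true once the file exists and is non-empty,
# i.e. once the status has been written.
until [ -s "$ret_file" ]; do
    echo "still working..."
    sleep 1
done

read -r ret_code < "$ret_file"
rm -f "$ret_file"
echo "command returned $ret_code"
```

The main script never calls wait here; it only looks at the file, so the loop body is free to do other work between checks.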
My solution was to use an anonymous pipe to pass the status to a monitoring loop. There are no temporary files used to exchange status so nothing to cleanup. If you were uncertain about the number of background jobs the break condition could be [ -z "$(jobs -p)" ].
#!/bin/bash
exec 3<> <(:)
{ sleep 15 ; echo "sleep/exit $?" >&3 ; } &
while read -u 3 -t 1 -r STAT CODE || STAT="timeout" ; do
echo "stat: ${STAT}; code: ${CODE}"
if [ "${STAT}" = "sleep/exit" ] ; then
break
fi
done
how about ...
# run your stuff
unset PID
for process in one two three four
do
( sleep $((RANDOM%20)); echo hello from process $process; exit $((RANDOM%3)); ) 2>&1 &
PID+=($!)
done
# (optional) report on the status of that stuff as it exits
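for pid in "${PID[@]}"
do
# note: a background subshell is a sibling of $pid, not its parent, so in bash this wait returns 127
( wait "$pid"; echo "process $pid completed with exit status $?") &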
done
# (optional) while we wait, monitor that stuff
while ps --pid "${PID[*]}" --ppid "${PID[*]}" --format pid,ppid,command,pcpu
do
sleep 5
done | xargs -i date '+%x %X {}'
# return non-zero if any are non zero
SUCCESS=0
for pid in "${PID[@]}"
do
# pre-increment: ((SUCCESS++)) evaluates to 0 (false) on the first success
wait "$pid" && ((++SUCCESS)) && echo "$pid OK" || echo "$pid returned $?"
done
echo "success for $SUCCESS out of ${#PID[@]} jobs"
exit $(( ${#PID[@]} - SUCCESS ))
This may be extending beyond your question, however if you're concerned about the length of time processes are running for, you may be interested in checking the status of running background processes after an interval of time. It's easy enough to check which child PIDs are still running using pgrep -P $$, however I came up with the following solution to check the exit status of those PIDs that have already expired:
cmd1() { sleep 5; exit 24; }
cmd2() { sleep 10; exit 0; }
pids=()
cmd1 & pids+=("$!")
cmd2 & pids+=("$!")
lasttimeout=0
for timeout in 2 7 11; do
echo -n "interval-$timeout: "
sleep $((timeout-lasttimeout))
# you can only wait on a pid once
remainingpids=()
for pid in ${pids[*]}; do
if ! ps -p $pid >/dev/null ; then
wait $pid
echo -n "pid-$pid:exited($?); "
else
echo -n "pid-$pid:running; "
remainingpids+=("$pid")
fi
done
pids=( ${remainingpids[*]} )
lasttimeout=$timeout
echo
done
which outputs:
interval-2: pid-28083:running; pid-28084:running;
interval-7: pid-28083:exited(24); pid-28084:running;
interval-11: pid-28084:exited(0);
Note: You could change $pids to a string variable rather than array to simplify things if you like.
I have a shell script that calls some data processing functions. These functions can be long-running.
I want to update the script to interrupt the running process in case a certain "status" is seen from an external source. Otherwise the program should complete normally.
I have written a monitor_status function and called it asynchronously. The function sends a kill command to the main process if the status is found. I have the following questions about this:
If kill -15 $$ is invoked from the monitor_status function, the cleanup function gets called twice. How can I prevent that?
If the main process completes normally, how should I terminate the monitor_status function?
Also, please suggest whether there is a better way to handle such a scenario, and any other improvements I can make to the script.
The bash script looks something like this:
#!/bin/bash
function cleanup() {
echo "cleanup invoked"
if [[ -n "${child}" ]]; then
echo "Stopping the child"
kill "${child}" >/dev/null 2>&1 || true
fi
echo "cleanup done"
}
function main() {
echo "Starting main process"
# do long running data processing
echo "Exiting Main"
}
function monitor_status() {
while true; do
echo "checking status"
Status=$(get_status)
if [[ $Status = "Terminate" ]]; then
echo "Alert Alert!!!"
kill -15 $$
fi
sleep 5
done
}
monitor_status &
main &
trap cleanup SIGTERM EXIT
child=$!
wait "${child}"
I have a function in bash, call it "timer", that simply displays number of seconds elapsed. Presently, it runs in a separate process, and the parent process kills it when it is done.
I want the function to trap a signal somehow and exit gracefully, but I have no idea how. Here is an example script as it is now:
#!/bin/bash
function timer () {
t0=$(date +%s)
while true ; do
t=$(date +%s)
echo -en "\r$(($t - $t0))"
done
}
timer &
pid=$!
echo $pid
sleep 5 # do something while timer runs
echo "done"
kill -9 $pid
Two things:
Don't use kill -9 to kill it. SIGKILL is uncatchable. It doesn't let the target process do any cleanup. Just do a plain kill to send a SIGTERM signal.
You can trap on SIGTERM. You could also trap on SIGINT to catch Ctrl-C. Or best, trap on EXIT to do cleanup no matter how the script is killed.
function timer () {
trap 'echo -e "\ntimer stopped"' EXIT
t0=$(date +%s)
while true ; do
t=$(date +%s)
echo -en "\r$(($t - $t0))"
done
}
timer &
pid=$!
echo "$pid"
sleep 5 # do something while timer runs
echo "done"
kill "$pid"
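If you would rather trap SIGTERM specifically and have the background function choose its own exit status, a minimal sketch might look like this. It assumes bash; worker and log are hypothetical names, and the loop sleeps one second per iteration so that the trap gets a chance to run.

```shell
#!/bin/bash
log=$(mktemp)   # hypothetical scratch file to show that cleanup ran

worker() {
    # On SIGTERM, record the cleanup and leave with status 0.
    trap 'echo "worker: cleaning up" >> "$log"; exit 0' TERM
    while true; do
        sleep 1   # bash runs the pending trap once this sleep returns
    done
}

worker &
pid=$!
sleep 2           # do something while the worker runs
kill "$pid"       # plain kill sends SIGTERM, which the trap can catch
wait "$pid"
worker_status=$?
echo "worker exited with status $worker_status"
cat "$log"
```

Because the handler calls exit 0, the parent sees a normal exit status from wait rather than a signal-derived one.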
What I am trying to do is create a generic, asynchronous command runner that will allow me to run a command in the background and get its output and code without blocking the shell I am working in (think serial). For most commands, I could do something like:
FUNCwaitForCommand() {
wait "$1"
echo $? > "code.txt"
}
ls > "output.txt" &
pid=$!
FUNCwaitForCommand $pid &
however, this does not work for composed commands, e.g.
(cat < somefifo)
I can make it run the commands with something like:
FUNCwaitForCommand() {
wait $1
echo $? > code
}
eval "ls > output.txt &"
pid=$!
FUNCwaitForCommand $pid &
but the wait does not wait. I can make it wait to finish until the process finishes by doing:
while kill -0 "$1"; do wait "$1"; done
instead of just wait, but the code it gives me is 127 instead of the code of the command that gets run. If I put a wait directly after the pid collection
eval "ls > output.txt &"
pid=$!
wait $pid
it waits for the process just fine, but obviously it doesn't background and release the shell back to me.
I'm not great at bash, but it looks like inside of the function is not in the same sub shell as the eval, so it doesn't recognize the background process, though I don't know why it only acts that way when using eval, and not when using the normal execution method.
Explanation
Just as you can't use:
sleep 5 & pid=$!
wait $pid &
you can't put that wait inside a backgrounded function either. That is to say, you can run:
sleep 5 & pid=$!
waitForCommand() { wait "$@"; }
waitForCommand "$pid"
but you can't run:
sleep 5 & pid=$!
waitForCommand() { wait "$@"; }
waitForCommand "$pid" &
This is because processes can only wait() for their children. When you fork off a new child off the shell with &, you're no longer the parent -- instead, you're a sibling. As such, this isn't shell-specific behavior but general-purpose UNIX semantics -- you'd get an equivalent error in any language.
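The parent/sibling distinction can be seen in a small experiment. This is a sketch assuming bash; the variable names are mine, and the explicit ( ... ) subshell plays the role of the backgrounded function.

```shell
#!/bin/bash
sleep 1 &
pid=$!

# Inside the ( ... ) subshell, the sleep is a sibling, not a child,
# so bash's wait cannot find it and fails with status 127.
( wait "$pid" 2>/dev/null )
sibling_status=$?

# The original shell is the real parent, so this wait works
# and returns the exit status of sleep.
wait "$pid"
parent_status=$?

echo "from the subshell: $sibling_status, from the parent: $parent_status"
```

The same thing happens when the subshell is started with &, which is why the backgrounded function reports 127.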
Workaround
Ensure that exit-status recording is done by the direct parent of the process whose exit status is being recorded, even if that parent itself is in the background relative to the original shell.
tempdir_top=$(mktemp -t -d bgdir.XXXXXX)
declare -g -A tempdirs=( )
runBackgroundCommand() {
(( "$#" == 1 )) || { echo "Usage: runBackgroundCommand 'command'" >&2; return 1; }
local cmd tempdir
cmd=$1
tempdir=$(mktemp -d "$tempdir_top/proc.XXXXXX")
{
printf '%s\0' "$cmd" >"$tempdir/cmd"
eval "$cmd" >"$tempdir/stdout" 2>"$tempdir/stderr" & pid=$!
printf '%s\n' "$pid" >"$tempdir/pid"
wait "$pid"; retval=$?
printf '%s\n' "$retval" >"$tempdir/retval"
} &
tempdirs[$tempdir]=$!
}
# example usage
runBackgroundCommand "sleep 5"
runBackgroundCommand "sleep 10"
That way, in the parent process, you have a map of temporary directories to the top-level PID for each (easily used to check for completion), and can look inside that directory for more information on any of the processes involved.
I'm running several background processes in my script
run_gui()
{
exec ... # the real commands here
}
The functions run_ai1(), run_ai2 are analogous.
Then I run the functions and do the needed piping
run_gui &
run_ai1 &
run_ai2 &
while true; do
while true; do
read -u $ai1_outfd line || echo "Nothing read"
if [[ $line ]]; then
: # processing
fi
done
sleep $turndelay
while true; do
read -u $ai2_outfd line || echo "nothing read"
if [[ $line ]]; then
: # processing
fi
done
sleep $turndelay
done
If any of those three processes exits, I want to check its exit code and terminate the rest of the processes. For example, if run_ai2 exits with exit code 3, then I want to stop the processes run_ai1 and run_gui and exit the main script with exit code 1. The correct exit codes for the different background processes may differ.
The problem is: how can I detect it? There's the command wait but I don't know in advance which script will finish first. I could run wait as a background process - but it's becoming even more clumsy.
Can you help me please?
The following script monitors test child processes (in the example, sleep+false and sleep+true) and reports their PID and exit code:
#!/bin/bash
set -m
trap myhandler CHLD
myhandler() {
echo sigchld received
cat /tmp/foo
}
( sleep 5; false; echo "exit p1=$?" ) > /tmp/foo &
p1=$!
echo "p1=$p1"
( sleep 3; true; echo "exit p2=$?" ) > /tmp/foo &
p2=$!
echo "p2=$p2"
pstree -p $$
wait
The result is:
p1=3197
p2=3198
prueba(3196)─┬─prueba(3197)───sleep(3199)
├─prueba(3198)───sleep(3201)
└─pstree(3200)
sigchld received
sigchld received
exit p2=0
sigchld received
exit p1=1
It could be interesting to use SIGUSR1 instead of SIGCHLD; see here for an example: https://stackoverflow.com/a/12751700/4886927.
Also, inside the trap handler, it is possible to verify which child is still alive. Something like:
myhandler() {
if kill -0 $p1; then
echo "child1 is alive"
fi
if kill -0 $p2; then
echo "child2 is alive"
fi
}
or kill both children when one of them dies:
myhandler() {
if kill -0 $p1 && kill -0 $p2; then
echo "all childs alive"
else
kill -9 $p1 $p2
fi
}
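If a trap handler is not required, another option is wait -n, which returns as soon as any one child exits. The sketch below assumes bash 4.3 or newer; the sleep commands are hypothetical stand-ins for run_gui, run_ai1 and run_ai2, and main_status is a name I made up. Note that wait -n on its own reports the exit status but not which child it belonged to.

```shell
#!/bin/bash
# Hypothetical stand-ins for the three background processes.
sleep 30 & gui_pid=$!
sleep 30 & ai1_pid=$!
( sleep 1; exit 3 ) & ai2_pid=$!

# Block until the first child terminates and keep its exit status.
wait -n
first_status=$?

# Stop whatever is still running; errors for already-dead PIDs are ignored.
kill "$gui_pid" "$ai1_pid" "$ai2_pid" 2>/dev/null
wait   # reap the killed children

# Map the first child's status to the main script's exit code.
if [ "$first_status" -ne 0 ]; then
    main_status=1
else
    main_status=0
fi
echo "first child exited with $first_status; main script would exit with $main_status"
```

A real script would finish with exit "$main_status"; it is left out here so the snippet can be pasted into a larger script.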