Related
This answer to Command line command to auto-kill a command after a certain amount of time
proposes a 1-line method to timeout a long-running command from the bash command line:
( /path/to/slow command with options ) & sleep 5 ; kill $!
But it's possible that a given "long-running" command may finish earlier than the timeout.
(Let's call it a "typically-long-running-but-sometimes-fast" command, or tlrbsf for fun.)
So this nifty 1-liner approach has a couple of problems.
First, the sleep isn't conditional, so that sets an undesirable lower bound on the time taken for the sequence to finish. Consider 30s or 2m or even 5m for the sleep, when the tlrbsf command finishes in 2 seconds — highly undesirable.
Second, the kill is unconditional, so this sequence will attempt to kill a non-running process and whine about it.
So...
Is there a way to timeout a typically-long-running-but-sometimes-fast ("tlrbsf") command that
has a bash implementation (the other question already has Perl and C answers)
will terminate at the earlier of the two: tlrbsf program termination, or timeout elapsed
will not kill non-existing/non-running processes (or, optionally: will not complain about a bad kill)
doesn't have to be a 1-liner
can run under Cygwin or Linux
... and, for bonus points
runs the tlrbsf command in the foreground
any 'sleep' or extra process in the background
such that the stdin/stdout/stderr of the tlrbsf command can be redirected, same as if it had been run directly?
If so, please share your code. If not, please explain why.
I have spent a while trying to hack the aforementioned example but I'm hitting the limit of my bash skills.
You are probably looking for the timeout command in coreutils. Since it's a part of coreutils, it is technically a C solution, but it's still coreutils. info timeout for more details.
Here's an example:
timeout 5 /path/to/slow/command with options
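If the command might ignore SIGTERM, timeout can follow up with SIGKILL. A minimal sketch, assuming GNU coreutils timeout is on the PATH; the 1-second limit, the 2-second grace period and the sleep stand-in are placeholders, not values from the question:

```shell
#!/bin/bash
# Assumes GNU coreutils `timeout`. `sleep 5` stands in for the slow command.
# -k 2: if the command is still alive 2 seconds after SIGTERM, send SIGKILL.
timeout -k 2 1 sleep 5
status=$?
echo "exit status: $status"
```

Here sleep dies on the first SIGTERM, so the SIGKILL follow-up never fires; it only matters for commands that block or ignore SIGTERM.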
I think this is precisely what you are asking for:
http://www.bashcookbook.com/bashinfo/source/bash-4.0/examples/scripts/timeout3
#!/bin/bash
#
# The Bash shell script executes a command with a time-out.
# Upon time-out expiration SIGTERM (15) is sent to the process. If the signal
# is blocked, then the subsequent SIGKILL (9) terminates it.
#
# Based on the Bash documentation example.
# Hello Chet,
# please find attached a "little easier" :-) to comprehend
# time-out example. If you find it suitable, feel free to include
# anywhere: the very same logic as in the original examples/scripts, a
# little more transparent implementation to my taste.
#
# Dmitry V Golovashkin <Dmitry.Golovashkin@sas.com>
scriptName="${0##*/}"
declare -i DEFAULT_TIMEOUT=9
declare -i DEFAULT_INTERVAL=1
declare -i DEFAULT_DELAY=1
# Timeout.
declare -i timeout=DEFAULT_TIMEOUT
# Interval between checks if the process is still alive.
declare -i interval=DEFAULT_INTERVAL
# Delay between posting the SIGTERM signal and destroying the process by SIGKILL.
declare -i delay=DEFAULT_DELAY
function printUsage() {
cat <<EOF
Synopsis
$scriptName [-t timeout] [-i interval] [-d delay] command
Execute a command with a time-out.
Upon time-out expiration SIGTERM (15) is sent to the process. If SIGTERM
signal is blocked, then the subsequent SIGKILL (9) terminates it.
-t timeout
Number of seconds to wait for command completion.
Default value: $DEFAULT_TIMEOUT seconds.
-i interval
Interval between checks if the process is still alive.
Positive integer, default value: $DEFAULT_INTERVAL seconds.
-d delay
Delay between posting the SIGTERM signal and destroying the
process by SIGKILL. Default value: $DEFAULT_DELAY seconds.
As of today, Bash does not support floating point arithmetic (sleep does),
therefore all delay/time values must be integers.
EOF
}
# Options.
while getopts ":t:i:d:" option; do
case "$option" in
t) timeout=$OPTARG ;;
i) interval=$OPTARG ;;
d) delay=$OPTARG ;;
*) printUsage; exit 1 ;;
esac
done
shift $((OPTIND - 1))
# $# should be at least 1 (the command to execute), however it may be strictly
# greater than 1 if the command itself has options.
if (($# == 0 || interval <= 0)); then
printUsage
exit 1
fi
# kill -0 pid Exit code indicates if a signal may be sent to $pid process.
(
((t = timeout))
while ((t > 0)); do
sleep $interval
kill -0 $$ || exit 0
((t -= interval))
done
# Be nice, post SIGTERM first.
# The 'exit 0' below will be executed if any preceding command fails.
kill -s SIGTERM $$ && kill -0 $$ || exit 0
sleep $delay
kill -s SIGKILL $$
) 2> /dev/null &
exec "$@"
This solution works regardless of bash monitor mode. You can use the proper signal to terminate your_command
#!/bin/sh
( your_command ) & pid=$!
( sleep $TIMEOUT && kill -HUP $pid ) 2>/dev/null & watcher=$!
wait $pid 2>/dev/null && pkill -HUP -P $watcher
The watcher kills your_command after given timeout; the script waits for the slow task and terminates the watcher. Note that wait does not work with processes which are children of a different shell.
Examples:
your_command runs more than 2 seconds and was terminated
your_command interrupted
( sleep 20 ) & pid=$!
( sleep 2 && kill -HUP $pid ) 2>/dev/null & watcher=$!
if wait $pid 2>/dev/null; then
echo "your_command finished"
pkill -HUP -P $watcher
wait $watcher
else
echo "your_command interrupted"
fi
your_command finished before the timeout (20 seconds)
your_command finished
( sleep 2 ) & pid=$!
( sleep 20 && kill -HUP $pid ) 2>/dev/null & watcher=$!
if wait $pid 2>/dev/null; then
echo "your_command finished"
pkill -HUP -P $watcher
wait $watcher
else
echo "your_command interrupted"
fi
To timeout the slowcommand after 1 second:
timeout 1 slowcommand || echo "I failed, perhaps due to time out"
To determine whether the command timed out or failed for its own reasons, check whether the status code is 124:
# ping the address 8.8.8.8 for 3 seconds, but timeout after only 1 second
timeout 1 ping 8.8.8.8 -w3
EXIT_STATUS=$?
if [ $EXIT_STATUS -eq 124 ]
then
echo 'Process Timed Out!'
else
echo 'Process did not timeout. Something else went wrong.'
fi
exit $EXIT_STATUS
Note that when the exit status is 124, you don't know whether it timed out due to your timeout command, or whether the command itself terminated due to some internal timeout logic of its own and then returned 124. You can safely assume in either case, though, that a timeout of some kind happened.
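Beyond 124, GNU timeout documents a few other special statuses. A small sketch that maps them to messages; the helper name explain_timeout_status is made up for illustration, and as noted above a command that itself exits with one of these values is indistinguishable:

```shell
#!/bin/bash
# Hypothetical helper: maps the exit statuses documented for GNU timeout to text.
explain_timeout_status() {
    case "$1" in
        0)   echo "command succeeded" ;;
        124) echo "timed out" ;;
        125) echo "timeout itself failed" ;;
        126) echo "command found but could not be invoked" ;;
        127) echo "command not found" ;;
        137) echo "killed by SIGKILL (128+9)" ;;
        *)   echo "command failed with status $1" ;;
    esac
}

# `sleep 3` stands in for the slow command.
timeout 1 sleep 3
explain_timeout_status $?
```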
There you go:
timeout --signal=SIGINT 10 /path/to/slow command with options
you may change the SIGINT and 10 as you desire ;)
You can do this entirely with bash 4.3 and above:
_timeout() { ( set +b; sleep "$1" & "${@:2}" & wait -n; r=$?; kill -9 `jobs -p`; exit $r; ) }
Example: _timeout 5 longrunning_command args
Example: { _timeout 5 producer || echo KABOOM $?; } | consumer
Example: producer | { _timeout 5 consumer1; consumer2; }
Example: { while date; do sleep .3; done; } | _timeout 5 cat | less
Needs Bash 4.3 for wait -n
Gives 137 if the command was killed, else the return value of the command.
Works for pipes. (You do not need to go foreground here!)
Works with internal shell commands or functions, too.
Runs in a subshell, so no variable export into the current shell, sorry.
If you do not need the return code, this can be made even simpler:
_timeout() { ( set +b; sleep "$1" & "${@:2}" & wait -n; kill -9 `jobs -p`; ) }
Notes:
Strictly speaking you do not need the ; in ; ), however it makes things more consistent with the ; }-case. The set +b can probably be left out, too, but better safe than sorry.
Except for --foreground (probably), you can implement all variants timeout supports. --preserve-status is a bit difficult, though. This is left as an exercise for the reader ;)
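One possible variation, not the function above: record both PIDs so the function can tell explicitly which job ended first. This is only a sketch. The name _timeout_sketch is made up, and reporting a timeout as 124 is borrowed from coreutils timeout as an assumption. It still needs bash 4.3 for wait -n.

```shell
#!/bin/bash
# Hypothetical variant; requires bash >= 4.3 for `wait -n`.
_timeout_sketch() {
    (
        "${@:2}" & cmd=$!          # the command being limited
        sleep "$1" & timer=$!      # the timer
        wait -n                    # returns as soon as either job ends
        status=$?
        if kill -0 "$cmd" 2>/dev/null; then
            # The command is still alive, so the timer ended first.
            kill -TERM "$cmd" 2>/dev/null
            wait "$cmd" 2>/dev/null
            exit 124               # assumption: reuse the coreutils convention
        fi
        kill "$timer" 2>/dev/null  # the command ended first; stop the timer
        exit "$status"
    )
}

_timeout_sketch 1 sleep 5
echo "status: $?"
```

There is a small race if the command and the timer end at almost the same moment, so treat this as an illustration rather than a hardened tool.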
This recipe can be used "naturally" in the shell (as natural as for flock fd):
(
set +b
sleep 20 &
{
YOUR SHELL CODE HERE
} &
wait -n
kill `jobs -p`
)
However, as explained above, you cannot re-export environment variables into the enclosing shell this way naturally.
Edit:
Real world example: Time out __git_ps1 in case it takes too long (for things like slow SSHFS-Links):
eval "__orig$(declare -f __git_ps1)" && __git_ps1() { ( git() { _timeout 0.3 /usr/bin/git "$@"; }; _timeout 0.3 __orig__git_ps1 "$@"; ) }
Edit2: Bugfix. I noticed that exit 137 is not needed and makes _timeout unreliable at the same time.
Edit3: git is a die-hard, so it needs a double-trick to work satisfyingly.
Edit4: Forgot a _ in the first _timeout for the real world GIT example.
I prefer "timelimit", which has a package at least in debian.
http://devel.ringlet.net/sysutils/timelimit/
It is a bit nicer than the coreutils "timeout" because it prints something when killing the process, and it also sends SIGKILL after some time by default.
See also the http://www.pixelbeat.org/scripts/timeout script the functionality of which has been integrated into newer coreutils
timeout is probably the first approach to try. You may need notification or another command to execute if it times out. After quite a bit of searching and experimenting, I came up with this bash script:
if
timeout 20s COMMAND_YOU_WANT_TO_EXECUTE;
timeout 20s AS_MANY_COMMANDS_AS_YOU_WANT;
then
echo 'OK'; #if you want a positive response
else
echo 'Not OK';
AND_ALTERNATIVE_COMMANDS
fi
Kinda hacky, but it works. Doesn't work if you have other foreground processes (please help me fix this!)
sleep TIMEOUT & SPID=${!}; (YOUR COMMAND HERE; kill ${SPID}) & CPID=${!}; fg 1; kill ${CPID}
Actually, I think you can reverse it, meeting your 'bonus' criteria:
(YOUR COMMAND HERE & SPID=${!}; (sleep TIMEOUT; kill ${SPID}) & CPID=${!}; fg 1; kill ${CPID}) < asdf > fdsa
Simple script with code clarity. Save to /usr/local/bin/run:
#!/bin/bash
# run
# Run command with timeout $1 seconds.
# Timeout seconds
timeout_seconds="$1"
shift
# PID
pid=$$
# Start timeout
(
sleep "$timeout_seconds"
echo "Timed out after $timeout_seconds seconds"
kill -- -$pid &>/dev/null
) &
timeout_pid=$!
# Run
"$@"
# Stop timeout
kill $timeout_pid &>/dev/null
Times out a command that runs too long:
$ run 2 sleep 10
Timed out after 2 seconds
Terminated
$
Ends immediately for a command that completes:
$ run 10 sleep 2
$
If you already know the name of the program (let's assume program) to terminate after the timeout (as an example 3 seconds), I can contribute a simple and somewhat dirty alternative solution:
(sleep 3 && killall program) & ./program
This works perfectly if I call benchmark processes with system calls.
There's also cratimeout by Martin Cracauer (written in C for Unix and Linux systems).
# cf. http://www.cons.org/cracauer/software.html
# usage: cratimeout timeout_in_msec cmd args
cratimeout 5000 sleep 1
cratimeout 5000 sleep 600
cratimeout 5000 tail -f /dev/null
cratimeout 5000 sh -c 'while sleep 1; do date; done'
OS X doesn't use bash 4 yet, nor does it have /usr/bin/timeout, so here's a function that works on OS X without Homebrew or MacPorts and is similar to /usr/bin/timeout (based on Tino's answer). Parameter validation, help, usage, and support for other signals are left as an exercise for the reader.
# implement /usr/bin/timeout only if it doesn't exist
[ -n "$(type -p timeout 2>&1)" ] || function timeout { (
set -m +b
sleep "$1" &
SPID=${!}
("${@:2}"; RETVAL=$?; kill ${SPID}; exit $RETVAL) &
CPID=${!}
wait %1
SLEEPRETVAL=$?
if [ $SLEEPRETVAL -eq 0 ] && kill ${CPID} >/dev/null 2>&1 ; then
RETVAL=124
# When you need to make sure it dies
#(sleep 1; kill -9 ${CPID} >/dev/null 2>&1)&
wait %2
else
wait %2
RETVAL=$?
fi
return $RETVAL
) }
Here is a version that does not rely on spawning a child process - I needed a standalone script which embedded this functionality. It also does a fractional poll interval, so you can poll quicker. timeout would have been preferred - but I'm stuck on an old server
# wait_on_command <timeout> <poll interval> command
wait_on_command()
{
local timeout=$1; shift
local interval=$1; shift
$* &
local child=$!
loops=$(bc <<< "($timeout * (1 / $interval)) + 0.5" | sed 's/\..*//g')
((t = loops))
while ((t > 0)); do
sleep $interval
kill -0 $child &>/dev/null || return
((t -= 1))
done
kill $child &>/dev/null || kill -0 $child &>/dev/null || return
sleep $interval
kill -9 $child &>/dev/null
echo Timed out
}
slow_command()
{
sleep 2
echo Completed normally
}
# wait 1 sec in 0.1 sec increments
wait_on_command 1 0.1 slow_command
# or call an external command
wait_on_command 1 0.1 sleep 10
The timeout command itself has a --foreground option. This lets the command interact with the user "when not running timeout directly from a shell prompt".
timeout --foreground the_command its_options
I think the questioner must have been aware of the very obvious solution of the timeout command, but asked for an alternate solution for this reason. timeout did not work for me when I called it using popen, i.e. 'not directly from the shell'. However, let me not assume that this may have been the reason in the questioner's case. Take a look at its man page.
If you want to do it in your script, put this in there:
parent=$$
( sleep 5 && kill -HUP $parent ) 2>/dev/null &
I needed to preserve the shell context while still allowing timeouts. The only drawback is that it stops script execution on timeout, but that was fine for my needs:
#!/usr/bin/env bash
safe_kill()
{
ps aux | grep -v grep | grep $1 >/dev/null && kill ${2:-} $1
}
my_timeout()
{
typeset _my_timeout _waiter_pid _return
_my_timeout=$1
echo "Timeout($_my_timeout) running: $*"
shift
(
trap "return 0" USR1
sleep $_my_timeout
echo "Timeout($_my_timeout) reached for: $*"
safe_kill $$
) &
_waiter_pid=$!
"$@" || _return=$?
safe_kill $_waiter_pid -USR1
echo "Timeout($_my_timeout) ran: $*"
return ${_return:-0}
}
my_timeout 3 cd scripts
my_timeout 3 pwd
my_timeout 3 true && echo true || echo false
my_timeout 3 false && echo true || echo false
my_timeout 3 sleep 10
my_timeout 3 pwd
with the outputs:
Timeout(3) running: 3 cd scripts
Timeout(3) ran: cd scripts
Timeout(3) running: 3 pwd
/home/mpapis/projects/rvm/rvm/scripts
Timeout(3) ran: pwd
Timeout(3) running: 3 true
Timeout(3) ran: true
true
Timeout(3) running: 3 false
Timeout(3) ran: false
false
Timeout(3) running: 3 sleep 10
Timeout(3) reached for: sleep 10
Terminated
of course I assume there was a dir called scripts
#! /bin/bash
timeout=10
interval=1
delay=3
(
((t = timeout)) || :
while ((t > 0)); do
echo "$t"
sleep $interval
# Check if the process still exists.
kill -0 $$ 2> /dev/null || exit 0
((t -= interval)) || :
done
# Be nice, post SIGTERM first.
{ echo SIGTERM to $$ ; kill -s TERM $$ ; sleep $delay ; kill -0 $$ 2> /dev/null && { echo SIGKILL to $$ ; kill -s KILL $$ ; } ; }
) &
exec "$@"
My problem was maybe a bit different: I start a command via ssh on a remote machine and want to kill the shell and its children if the command hangs.
I now use the following:
ssh server '( sleep 60 && kill -9 0 ) 2>/dev/null & my_command; RC=$? ; sleep 1 ; pkill -P $! ; exit $RC'
This way the command returns 255 when there was a timeout, or the return code of the command on success.
Please note that killing processes from an ssh session is handled differently from an interactive shell. You can also use the -t option of ssh to allocate a pseudo-terminal, so that it acts like an interactive shell.
Building on @loup's answer...
If you want to timeout a process and silence the kill job/pid output, run:
( (sleep 1 && killall program 2>/dev/null) &) && program --version
This puts the backgrounded process into a subshell so you don't see the job output.
A very simplistic way:
# command & sleep 5; pkill -9 -x -f "command"
With pkill's -f option you can match the full command line, including arguments; add -n to target only the newest matching process and leave older ones alone.
In 99% of the cases the answer is NOT to implement any timeout logic. Timeout logic is in nearly any situation a red warning sign that something else is wrong and should be fixed instead.
Is your process hanging or breaking after n seconds sometimes? Then find out why and fix that instead.
As an aside, to do strager's solution right, you need to use wait "$SPID" instead of fg 1, since in scripts you don't have job control (and trying to turn it on is stupid). Moreover, fg 1 relies on the fact that you didn't start any other jobs previously in the script which is a bad assumption to make.
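The wait-based approach can be sketched roughly as follows. This is an illustration under assumptions, not strager's code: the function name run_with_limit and the watchdog layout are made up. The command runs as a background child, a watchdog subshell kills it after the limit, and wait collects the status without needing job control.

```shell
#!/bin/bash
# Hypothetical helper: run "$@" with a time limit, using `wait` instead of `fg`.
run_with_limit() {
    local limit=$1 cmd_pid watchdog status
    shift
    "$@" &
    cmd_pid=$!
    (
        # Watchdog: if it is told to stop early, it also stops its own sleep.
        trap 'kill "$nap" 2>/dev/null; exit 0' TERM
        sleep "$limit" & nap=$!
        wait "$nap"
        kill "$cmd_pid" 2>/dev/null
    ) >/dev/null 2>&1 &
    watchdog=$!
    wait "$cmd_pid"
    status=$?
    kill "$watchdog" 2>/dev/null   # harmless if the watchdog already finished
    wait "$watchdog" 2>/dev/null
    return "$status"
}

run_with_limit 1 sleep 5
echo "status: $?"   # a command killed by SIGTERM reports 128+15
```

Because the PIDs are stored in variables, this does not depend on how many other jobs the script started earlier.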
I have a cron job that calls a PHP script and, sometimes, it gets stuck on the PHP script. This solution was perfect for me.
I use:
scripttimeout -t 60 /script.php
I have a command CMD called from my main bourne shell script that takes forever.
I want to modify the script as follows:
Run the command CMD in parallel as a background process (CMD &).
In the main script, have a loop to monitor the spawned command every few seconds. The loop also echoes some messages to stdout indicating progress of the script.
Exit the loop when the spawned command terminates.
Capture and report the exit code of the spawned process.
Can someone give me pointers to accomplish this?
1: In bash, $! holds the PID of the last background process that was executed. That will tell you what process to monitor, anyway.
4: wait <n> waits until the process with PID <n> is complete (it will block until the process completes, so you might not want to call this until you are sure the process is done), and then returns the exit code of the completed process.
2, 3: ps or ps | grep " $! " can tell you whether the process is still running. It is up to you how to understand the output and decide how close it is to finishing. (ps | grep isn't idiot-proof. If you have time you can come up with a more robust way to tell whether the process is still running).
Here's a skeleton script:
# simulate a long process that will have an identifiable exit code
(sleep 15 ; /bin/false) &
my_pid=$!
while ps | grep " $my_pid " # might also need | grep -v grep here
do
echo $my_pid is still in the ps output. Must still be running.
sleep 3
done
echo Oh, it looks like the process is done.
wait $my_pid
# The variable $? always holds the exit code of the last command to finish.
# Here it holds the exit code of $my_pid, since wait exits with that code.
my_status=$?
echo The exit status of the process was $my_status
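Since ps | grep can match unrelated lines, here is a variant of the skeleton that polls with kill -0 instead. It is only a sketch, assuming bash (which reaps finished children on its own, so kill -0 starts failing once the child is gone); the 1-second stand-in command is arbitrary:

```shell
#!/bin/bash
# Stand-in for the long-running command: runs about 1 second, then fails.
( sleep 1; false ) &
my_pid=$!

# kill -0 sends no signal; it only reports whether the PID can be signalled.
while kill -0 "$my_pid" 2>/dev/null; do
    echo "$my_pid is still running"
    sleep 0.5
done

# wait returns the exit status of the finished child.
wait "$my_pid"
my_status=$?
echo "The exit status of the process was $my_status"
```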
This is how I solved it when I had a similar need:
# Some function that takes a long time to process
longprocess() {
# Sleep up to 14 seconds
sleep $((RANDOM % 15))
# Randomly exit with 0 or 1
exit $((RANDOM % 2))
}
pids=""
# Run five concurrent processes
for i in {1..5}; do
( longprocess ) &
# store PID of process
pids+=" $!"
done
# Wait for all processes to finish, will take max 14s
# as it waits in order of launch, not order of finishing
for p in $pids; do
if wait $p; then
echo "Process $p success"
else
echo "Process $p fail"
fi
done
The pid of a backgrounded child process is stored in $!.
You can store all child processes' pids into an array, e.g. PIDS[].
wait [-n] [jobspec or pid …]
Wait until the child process specified by each process ID pid or job specification jobspec exits and return the exit status of the last command waited for. If a job spec is given, all processes in the job are waited for. If no arguments are given, all currently active child processes are waited for, and the return status is zero. If the -n option is supplied, wait waits for any job to terminate and returns its exit status. If neither jobspec nor pid specifies an active child process of the shell, the return status is 127.
Using the wait command, you can wait for all child processes to finish; meanwhile you can get the exit status of each child process via $? and store it in STATUS[]. Then you can act on each status.
I have tried the following 2 solutions and they run well. solution01 is
more concise, while solution02 is a little complicated.
solution01
#!/bin/bash
# start 3 child processes concurrently, and store each pid into array PIDS[].
process=(a.sh b.sh c.sh)
for app in ${process[@]}; do
./${app} &
PIDS+=($!)
done
# wait for all processes to finish, and store each process's exit code into array STATUS[].
for pid in ${PIDS[@]}; do
echo "pid=${pid}"
wait ${pid}
STATUS+=($?)
done
# after all processes finish, check their exit codes in STATUS[].
i=0
for st in ${STATUS[@]}; do
if [[ ${st} -ne 0 ]]; then
echo "$i failed"
else
echo "$i finish"
fi
((i+=1))
done
solution02
#!/bin/bash
# start 3 child processes concurrently, and store each pid into array PIDS[].
i=0
process=(a.sh b.sh c.sh)
for app in ${process[@]}; do
./${app} &
pid=$!
PIDS[$i]=${pid}
((i+=1))
done
# wait for all processes to finish, and store each process's exit code into array STATUS[].
i=0
for pid in ${PIDS[@]}; do
echo "pid=${pid}"
wait ${pid}
STATUS[$i]=$?
((i+=1))
done
# after all processes finish, check their exit codes in STATUS[].
i=0
for st in ${STATUS[@]}; do
if [[ ${st} -ne 0 ]]; then
echo "$i failed"
else
echo "$i finish"
fi
((i+=1))
done
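A third variation on the same idea, in case you want names instead of array positions: keep the PID-to-name mapping in an associative array. This is a sketch assuming bash 4 or later; the helper start_job and the job names, durations and exit codes are made up for illustration.

```shell
#!/bin/bash
# Requires bash >= 4 for associative arrays. Job names and timings are made up.
declare -A name_of status_of

start_job() {   # usage: start_job NAME SECONDS EXIT_CODE
    ( sleep "$2"; exit "$3" ) &
    name_of[$!]=$1
}

start_job fast 0.2 0
start_job slow 0.5 3

# Wait for every recorded PID and file its status under the job's name.
for pid in "${!name_of[@]}"; do
    wait "$pid"
    status_of[${name_of[$pid]}]=$?
done

for name in "${!status_of[@]}"; do
    echo "$name exited with ${status_of[$name]}"
done
```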
As far as I can see, almost all answers use external utilities (mostly ps) to poll the state of the background process. A more Unix-like solution is to catch the SIGCHLD signal. The signal handler has to check which child process stopped. This can be done with the kill -0 <PID> built-in (universal), by checking for the existence of the /proc/<PID> directory (Linux-specific), or with the jobs built-in (bash-specific; jobs -l also reports the PID, and in this case the third field of the output can be Stopped|Running|Done|Exit).
Here is my example.
The launched process is called loop.sh. It accepts -x or a number as an argument. For -x it exits with exit code 1. For a number it waits num*5 seconds. Every 5 seconds it prints its PID.
The launcher process is called launch.sh:
#!/bin/bash
handle_chld() {
local tmp=()
for((i=0;i<${#pids[@]};++i)); do
if [ ! -d /proc/${pids[i]} ]; then
wait ${pids[i]}
echo "Stopped ${pids[i]}; exit code: $?"
else tmp+=(${pids[i]})
fi
done
pids=(${tmp[@]})
}
set -o monitor
trap "handle_chld" CHLD
# Start background processes
./loop.sh 3 &
pids+=($!)
./loop.sh 2 &
pids+=($!)
./loop.sh -x &
pids+=($!)
# Wait until all background processes are stopped
while [ ${#pids[@]} -gt 0 ]; do echo "WAITING FOR: ${pids[@]}"; sleep 2; done
echo STOPPED
For more explanation see: Starting a process from bash script failed
#!/bin/bash
#pgm to monitor
tail -f /var/log/messages >> /tmp/log&
# background cmd pid
pid=$!
# loop to monitor running background cmd
while :
do
ps ax | grep $pid | grep -v grep
ret=$?
if test "$ret" != "0"
then
echo "Monitored pid ended"
break
fi
sleep 5
done
wait $pid
echo $?
I would change your approach slightly. Rather than checking every few seconds if the command is still alive and reporting a message, have another process that reports every few seconds that the command is still running and then kill that process when the command finishes. For example:
#!/bin/sh
cmd() { sleep 5; exit 24; }
cmd & # Run the long running process
pid=$! # Record the pid
# Spawn a process that continually reports that the command is still running
while echo "$(date): $pid is still running"; do sleep 1; done &
echoer=$!
# Set a trap to kill the reporter when the process finishes
trap 'kill $echoer' 0
# Wait for the process to finish
if wait $pid; then
echo "cmd succeeded"
else
echo "cmd FAILED!! (returned $?)"
fi
Our team had the same need with a remote SSH-executed script which was timing out after 25 minutes of inactivity. Here is a solution with the monitoring loop checking the background process every second, but printing only every 10 minutes to suppress an inactivity timeout.
long_running.sh &
pid=$!
# Wait on a background job completion. Query status every 10 minutes.
declare -i elapsed=0
# `ps -p ${pid}` works on macOS and CentOS. On both OSes `ps ${pid}` works as well.
while ps -p ${pid} >/dev/null; do
sleep 1
if ((++elapsed % 600 == 0)); then
echo "Waiting for the completion of the main script. $((elapsed / 60))m and counting ..."
fi
done
# Return the exit code of the terminated background process. This works in Bash 4.4 despite what Bash docs say:
# "If neither jobspec nor pid specifies an active child process of the shell, the return status is 127."
wait ${pid}
A simple example, similar to the solutions above. This doesn't require monitoring any process output. The next example uses tail to follow output.
$ echo '#!/bin/bash' > tmp.sh
$ echo 'sleep 30; exit 5' >> tmp.sh
$ chmod +x tmp.sh
$ ./tmp.sh &
[1] 7454
$ pid=$!
$ wait $pid
[1]+ Exit 5 ./tmp.sh
$ echo $?
5
Use tail to follow process output and quit when the process is complete.
$ echo '#!/bin/bash' > tmp.sh
$ echo 'i=0; while let "$i < 10"; do sleep 5; echo "$i"; let i=$i+1; done; exit 5;' >> tmp.sh
$ chmod +x tmp.sh
$ ./tmp.sh
0
1
2
^C
$ ./tmp.sh > /tmp/tmp.log 2>&1 &
[1] 7673
$ pid=$!
$ tail -f --pid $pid /tmp/tmp.log
0
1
2
3
4
5
6
7
8
9
[1]+ Exit 5 ./tmp.sh > /tmp/tmp.log 2>&1
$ wait $pid
$ echo $?
5
Another solution is to monitor processes via the proc filesystem (safer than ps/grep combo); when you start a process it has a corresponding folder in /proc/$pid, so the solution could be
#!/bin/bash
....
doSomething &
local pid=$! # 'local' assumes this code runs inside a function
while [ -d "/proc/$pid" ]; do # While the directory exists, the process is running
doSomethingElse
....
done
# Once the directory is removed from /proc, the process has ended
wait "$pid"
local exit_status=$?
....
Now you can use the $exit_status variable however you like.
With this method, your script doesn't have to wait for the background process; you only have to monitor a temporary file for the exit status.
FUNCmyCmd() { sleep 3;return 6; };
export retFile=$(mktemp);
FUNCexecAndWait() { FUNCmyCmd;echo $? >$retFile; };
FUNCexecAndWait&
now, your script can do anything else while you just have to keep monitoring the contents of retFile (it can also contain any other information you want like the exit time).
PS.: btw, I coded thinking in bash
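The monitoring side could look something like this. It is only a sketch: my_cmd, status_file and the polling interval are made-up names and values, and the fractional sleep assumes a sleep that accepts decimals, as GNU sleep does.

```shell
#!/bin/bash
# Hypothetical names throughout; my_cmd stands in for the real work.
my_cmd() { sleep 1; return 6; }

status_file=$(mktemp)
# Run the command in the background and record its exit status in the file.
{ my_cmd; echo $? > "$status_file"; } &

# The main script is free to do other work; here it just polls the file.
until [ -s "$status_file" ]; do
    sleep 0.2
done

read -r status < "$status_file"
rm -f "$status_file"
echo "background command exited with $status"
```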
My solution was to use an anonymous pipe to pass the status to a monitoring loop. There are no temporary files used to exchange status so nothing to cleanup. If you were uncertain about the number of background jobs the break condition could be [ -z "$(jobs -p)" ].
#!/bin/bash
exec 3<> <(:)
{ sleep 15 ; echo "sleep/exit $?" >&3 ; } &
while read -u 3 -t 1 -r STAT CODE || STAT="timeout" ; do
echo "stat: ${STAT}; code: ${CODE}"
if [ "${STAT}" = "sleep/exit" ] ; then
break
fi
done
how about ...
# run your stuff
unset PID
for process in one two three four
do
( sleep $((RANDOM%20)); echo hello from process $process; exit $((RANDOM%3)); ) & 2>&1
PID+=($!)
done
# (optional) report on the status of that stuff as it exits
for pid in "${PID[@]}"
do
( wait "$pid"; echo "process $pid completed with exit status $?") &
done
# (optional) while we wait, monitor that stuff
while ps --pid "${PID[*]}" --ppid "${PID[*]}" --format pid,ppid,command,pcpu
do
sleep 5
done | xargs -i date '+%x %X {}'
# return non-zero if any are non zero
SUCCESS=0
for pid in "${PID[@]}"
do
if wait "$pid"; then
SUCCESS=$((SUCCESS + 1)); echo "$pid OK"
else
echo "$pid returned $?"
fi
done
echo "success for $SUCCESS out of ${#PID[@]} jobs"
exit $(( ${#PID[@]} - SUCCESS ))
This may be extending beyond your question, however if you're concerned about the length of time processes are running for, you may be interested in checking the status of running background processes after an interval of time. It's easy enough to check which child PIDs are still running using pgrep -P $$, however I came up with the following solution to check the exit status of those PIDs that have already expired:
cmd1() { sleep 5; exit 24; }
cmd2() { sleep 10; exit 0; }
pids=()
cmd1 & pids+=("$!")
cmd2 & pids+=("$!")
lasttimeout=0
for timeout in 2 7 11; do
echo -n "interval-$timeout: "
sleep $((timeout-lasttimeout))
# you can only wait on a pid once
remainingpids=()
for pid in ${pids[*]}; do
if ! ps -p $pid >/dev/null ; then
wait $pid
echo -n "pid-$pid:exited($?); "
else
echo -n "pid-$pid:running; "
remainingpids+=("$pid")
fi
done
pids=( ${remainingpids[*]} )
lasttimeout=$timeout
echo
done
which outputs:
interval-2: pid-28083:running; pid-28084:running;
interval-7: pid-28083:exited(24); pid-28084:running;
interval-11: pid-28084:exited(0);
Note: You could change $pids to a string variable rather than array to simplify things if you like.
What I am trying to do is create a generic, asynchronous command runner that will allow me to run a command in the background and get its output and code without blocking the shell I am working in (think serial). For most commands, I could do something like:
FUNCwaitForCommand() {
wait "$1"
echo $? > "code.txt"
}
ls > "output.txt" &
pid=$!
FUNCwaitForCommand $pid &
however, this does not work for composed commands, e.g.
(cat < somefifo)
I can make it run the commands with something like:
FUNCwaitForCommand() {
wait $1
echo $? > code
}
eval "ls > output.txt &"
pid=$!
FUNCwaitForCommand $pid &
but the wait does not wait. I can make it wait to finish until the process finishes by doing:
while kill -0 "$1"; do wait "$1"; done
instead of just wait, but the code it gives me is 127 instead of the code of the command that gets run. If I put a wait directly after the pid collection
eval "ls > output.txt &"
pid=$!
wait $pid
it waits for the process just fine, but obviously it doesn't background and release the shell back to me.
I'm not great at bash, but it looks like inside of the function is not in the same sub shell as the eval, so it doesn't recognize the background process, though I don't know why it only acts that way when using eval, and not when using the normal execution method.
Explanation
Just as you can't use:
sleep 5 & pid=$!
wait $pid &
you can't put that wait inside a backgrounded function either. That is to say, you can run:
sleep 5 & pid=$!
waitForCommand() { wait "$@"; }
waitForCommand "$pid"
but you can't run:
sleep 5 & pid=$!
waitForCommand() { wait "$@"; }
waitForCommand "$pid" &
This is because processes can only wait() for their children. When you fork off a new child off the shell with &, you're no longer the parent -- instead, you're a sibling. As such, this isn't shell-specific behavior but general-purpose UNIX semantics -- you'd get an equivalent error in any language.
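A quick way to see the parent/sibling difference for yourself. This is a sketch, assuming bash; the variable names and the exit code 7 are arbitrary. The same wait call is made once from a subshell, which is a sibling of the background process, and once from the original shell, which is its parent.

```shell
#!/bin/bash
( sleep 0.5; exit 7 ) &
pid=$!

# The ( ... ) subshell is a separate process, a sibling of $pid rather than
# its parent, so wait cannot collect the status there.
( wait "$pid" 2>/dev/null )
sibling_status=$?

# The original shell is the parent, so wait works here.
wait "$pid"
parent_status=$?

echo "sibling saw $sibling_status, parent saw $parent_status"
```

Going by the wait documentation, the subshell should report 127 (not an active child of that shell) while the parent gets the real status.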
Workaround
Ensure that exit-status recording is done by the direct parent of the process whose exit status is being recorded, even if that parent itself is in the background relative to the original shell.
tempdir_top=$(mktemp -t -d bgdir.XXXXXX)
declare -g -A tempdirs=( )
runBackgroundCommand() {
(( "$#" == 1 )) || { echo "Usage: runBackgroundCommand 'command'" >&2; return 1; }
local cmd tempdir
cmd=$1
tempdir=$(mktemp -d "$tempdir_top/proc.XXXXXX")
{
printf '%s\0' "$cmd" >"$tempdir/cmd"
eval "$cmd" >"$tempdir/stdout" 2>"$tempdir/stderr" & pid=$!
printf '%s\n' "$pid" >"$tempdir/pid"
wait "$pid"; retval=$?
printf '%s\n' "$retval" >"$tempdir/retval"
} &
tempdirs[$tempdir]=$!
}
# example usage
runBackgroundCommand "sleep 5"
runBackgroundCommand "sleep 10"
That way, in the parent process, you have a map of temporary directories to the top-level PID for each (easily used to check for completion), and can look inside that directory for more information on any of the processes involved.
I have a command CMD called from my main bourne shell script that takes forever.
I want to modify the script as follows:
Run the command CMD in parallel as a background process (CMD &).
In the main script, have a loop to monitor the spawned command every few seconds. The loop also echoes some messages to stdout indicating progress of the script.
Exit the loop when the spawned command terminates.
Capture and report the exit code of the spawned process.
Can someone give me pointers to accomplish this?
1: In bash, $! holds the PID of the last background process that was executed. That will tell you what process to monitor, anyway.
4: wait <n> waits until the process with PID <n> is complete (it will block until the process completes, so you might not want to call this until you are sure the process is done), and then returns the exit code of the completed process.
2, 3: ps or ps | grep " $! " can tell you whether the process is still running. It is up to you how to understand the output and decide how close it is to finishing. (ps | grep isn't idiot-proof. If you have time you can come up with a more robust way to tell whether the process is still running).
Here's a skeleton script:
# simulate a long process that will have an identifiable exit code
(sleep 15 ; /bin/false) &
my_pid=$!
while ps | grep " $my_pid " # might also need | grep -v grep here
do
echo $my_pid is still in the ps output. Must still be running.
sleep 3
done
echo Oh, it looks like the process is done.
wait $my_pid
# The variable $? always holds the exit code of the last command to finish.
# Here it holds the exit code of $my_pid, since wait exits with that code.
my_status=$?
echo The exit status of the process was $my_status
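One possible "more robust way" than ps | grep is kill -0, which sends no signal and only reports whether the PID can still be signalled. The following is a minimal sketch of the same skeleton using it; the 2-second job and exit code 7 are arbitrary stand-ins.

```shell
# Sketch: poll with kill -0 instead of parsing ps output.
(sleep 2; exit 7) &
my_pid=$!

# kill -0 delivers nothing; it succeeds only while the process exists.
while kill -0 "$my_pid" 2>/dev/null; do
  echo "$my_pid is still running"
  sleep 1
done

# The shell remembers the status of its own background child.
wait "$my_pid"
my_status=$?
echo "The exit status of the process was $my_status"
```

This avoids false matches on PIDs that merely contain the same digits, although like any PID-based check it is still subject to PID reuse over long intervals.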
This is how I solved it when I had a similar need:
# Some function that takes a long time to process
longprocess() {
# Sleep up to 14 seconds
sleep $((RANDOM % 15))
# Randomly exit with 0 or 1
exit $((RANDOM % 2))
}
pids=""
# Run five concurrent processes
for i in {1..5}; do
( longprocess ) &
# store PID of process
pids+=" $!"
done
# Wait for all processes to finish, will take max 14s
# as it waits in order of launch, not order of finishing
for p in $pids; do
if wait $p; then
echo "Process $p success"
else
echo "Process $p fail"
fi
done
The pid of a backgrounded child process is stored in $!.
You can store all child processes' pids into an array, e.g. PIDS[].
wait [-n] [jobspec or pid …]
Wait until the child process specified by each process ID pid or job specification jobspec exits and return the exit status of the last command waited for. If a job spec is given, all processes in the job are waited for. If no arguments are given, all currently active child processes are waited for, and the return status is zero. If the -n option is supplied, wait waits for any job to terminate and returns its exit status. If neither jobspec nor pid specifies an active child process of the shell, the return status is 127.
Using the wait command, you can wait for all child processes to finish, read each child's exit status from $?, and store it in STATUS[]. You can then act on each status.
I have tried the following two solutions and both work well. solution01 is more concise, while solution02 is a little more complicated.
solution01
#!/bin/bash
# start 3 child processes concurrently, and store each pid into array PIDS[].
process=(a.sh b.sh c.sh)
for app in "${process[@]}"; do
./${app} &
PIDS+=($!)
done
# wait for all processes to finish, and store each process's exit code into array STATUS[].
for pid in "${PIDS[@]}"; do
echo "pid=${pid}"
wait ${pid}
STATUS+=($?)
done
# after all processes finish, check their exit codes in STATUS[].
i=0
for st in "${STATUS[@]}"; do
if [[ ${st} -ne 0 ]]; then
echo "$i failed"
else
echo "$i finish"
fi
((i+=1))
done
solution02
#!/bin/bash
# start 3 child processes concurrently, and store each pid into array PIDS[].
i=0
process=(a.sh b.sh c.sh)
for app in "${process[@]}"; do
./${app} &
pid=$!
PIDS[$i]=${pid}
((i+=1))
done
# wait for all processes to finish, and store each process's exit code into array STATUS[].
i=0
for pid in "${PIDS[@]}"; do
echo "pid=${pid}"
wait ${pid}
STATUS[$i]=$?
((i+=1))
done
# after all processes finish, check their exit codes in STATUS[].
i=0
for st in "${STATUS[@]}"; do
if [[ ${st} -ne 0 ]]; then
echo "$i failed"
else
echo "$i finish"
fi
((i+=1))
done
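Both solutions wait in launch order. If completion order matters, the -n option quoted above may help. This is only a sketch: it assumes bash 4.3 or later, and job is a hypothetical stand-in for a.sh, b.sh and c.sh with made-up durations and exit codes.

```shell
#!/bin/bash
# Sketch: collect exit codes in completion order with wait -n (bash 4.3+).
# job <seconds> <code> is a hypothetical stand-in for the real scripts.
job() { sleep "$1"; return "$2"; }

job 3 0 &
job 1 4 &
job 2 0 &

STATUS=()
for _ in 1 2 3; do
  wait -n          # returns as soon as any one background job terminates
  STATUS+=($?)
done
echo "exit codes in completion order: ${STATUS[*]}"
```

Note that plain wait -n reports a status but not which PID it belongs to, so the two solutions above are still the simpler choice when you need to map each status back to a process.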
As far as I can see, almost all answers use external utilities (mostly ps) to poll the state of the background process. A more Unix-like solution is to catch the SIGCHLD signal. The signal handler has to check which child process has terminated. This can be done with the kill -0 <PID> built-in (universal), by checking for the existence of the /proc/<PID> directory (Linux-specific), or with the jobs built-in (bash-specific; jobs -l also reports the PID, and the third field of its output can be Stopped|Running|Done|Exit).
Here is my example.
The launched process is called loop.sh. It accepts -x or a number as an argument. With -x it exits with exit code 1. With a number it waits num*5 seconds, printing its PID every 5 seconds.
The launcher process is called launch.sh:
#!/bin/bash
handle_chld() {
local tmp=()
for ((i=0; i<${#pids[@]}; ++i)); do
if [ ! -d /proc/${pids[i]} ]; then
wait ${pids[i]}
echo "Stopped ${pids[i]}; exit code: $?"
else tmp+=(${pids[i]})
fi
done
pids=("${tmp[@]}")
}
set -o monitor
trap "handle_chld" CHLD
# Start background processes
./loop.sh 3 &
pids+=($!)
./loop.sh 2 &
pids+=($!)
./loop.sh -x &
pids+=($!)
# Wait until all background processes are stopped
while [ ${#pids[@]} -gt 0 ]; do echo "WAITING FOR: ${pids[@]}"; sleep 2; done
echo STOPPED
For more explanation see: Starting a process from bash script failed
#!/bin/bash
# program to monitor
tail -f /var/log/messages >> /tmp/log&
# background cmd pid
pid=$!
# loop to monitor running background cmd
while :
do
ps ax | grep $pid | grep -v grep
ret=$?
if test "$ret" != "0"
then
echo "Monitored pid ended"
break
fi
sleep 5
done
wait $pid
echo $?
I would change your approach slightly. Rather than checking every few seconds if the command is still alive and reporting a message, have another process that reports every few seconds that the command is still running and then kill that process when the command finishes. For example:
#!/bin/sh
cmd() { sleep 5; exit 24; }
cmd & # Run the long running process
pid=$! # Record the pid
# Spawn a process that continually reports that the command is still running
while echo "$(date): $pid is still running"; do sleep 1; done &
echoer=$!
# Set a trap to kill the reporter when the process finishes
trap 'kill $echoer' 0
# Wait for the process to finish
if wait $pid; then
echo "cmd succeeded"
else
echo "cmd FAILED!! (returned $?)"
fi
Our team had the same need with a remote SSH-executed script which was timing out after 25 minutes of inactivity. Here is a solution with the monitoring loop checking the background process every second, but printing only every 10 minutes to suppress an inactivity timeout.
long_running.sh &
pid=$!
# Wait on a background job completion. Query status every 10 minutes.
declare -i elapsed=0
# `ps -p ${pid}` works on macOS and CentOS. On both OSes `ps ${pid}` works as well.
while ps -p ${pid} >/dev/null; do
sleep 1
if ((++elapsed % 600 == 0)); then
echo "Waiting for the completion of the main script. $((elapsed / 60))m and counting ..."
fi
done
# Return the exit code of the terminated background process. This works in Bash 4.4 despite what Bash docs say:
# "If neither jobspec nor pid specifies an active child process of the shell, the return status is 127."
wait ${pid}
A simple example, similar to the solutions above. This doesn't require monitoring any process output. The next example uses tail to follow output.
$ echo '#!/bin/bash' > tmp.sh
$ echo 'sleep 30; exit 5' >> tmp.sh
$ chmod +x tmp.sh
$ ./tmp.sh &
[1] 7454
$ pid=$!
$ wait $pid
[1]+ Exit 5 ./tmp.sh
$ echo $?
5
Use tail to follow process output and quit when the process is complete.
$ echo '#!/bin/bash' > tmp.sh
$ echo 'i=0; while let "$i < 10"; do sleep 5; echo "$i"; let i=$i+1; done; exit 5;' >> tmp.sh
$ chmod +x tmp.sh
$ ./tmp.sh
0
1
2
^C
$ ./tmp.sh > /tmp/tmp.log 2>&1 &
[1] 7673
$ pid=$!
$ tail -f --pid $pid /tmp/tmp.log
0
1
2
3
4
5
6
7
8
9
[1]+ Exit 5 ./tmp.sh > /tmp/tmp.log 2>&1
$ wait $pid
$ echo $?
5
Another solution is to monitor processes via the proc filesystem (safer than ps/grep combo); when you start a process it has a corresponding folder in /proc/$pid, so the solution could be
#!/bin/bash
....
doSomething &
local pid=$!
while [ -d /proc/$pid ]; do # While the directory exists, the process is running
doSomethingElse
....
done
# Once the directory is removed from /proc, the process has ended
wait $pid
local exit_status=$?
....
Now you can use the $exit_status variable however you like.
With this method, your script doesn't have to wait for the background process; you only have to monitor a temporary file for the exit status.
FUNCmyCmd() { sleep 3;return 6; };
export retFile=$(mktemp);
FUNCexecAndWait() { FUNCmyCmd;echo $? >$retFile; };
FUNCexecAndWait&
Now your script can do anything else while you keep monitoring the contents of retFile (it can also contain any other information you want, such as the exit time).
P.S. I wrote this with bash in mind.
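The monitoring side is not shown above, so here is one way it might look. This is a sketch under assumptions: the 2-second FUNCmyCmd and the polling interval are arbitrary, and the status variable is introduced here for illustration.

```shell
# Sketch: poll the status file while the main script keeps working.
FUNCmyCmd() { sleep 2; return 6; }
retFile=$(mktemp)
FUNCexecAndWait() { FUNCmyCmd; echo $? >"$retFile"; }
FUNCexecAndWait &

# [ -s file ] is true once the file is non-empty, i.e. the status was written.
until [ -s "$retFile" ]; do
  echo "still working..."
  sleep 1
done

status=$(cat "$retFile")
rm -f "$retFile"
echo "background command returned $status"
```

Testing for a non-empty file rather than for mere existence matters here, because mktemp creates the file before the command has finished.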
My solution was to use an anonymous pipe to pass the status to a monitoring loop. No temporary files are used to exchange status, so there is nothing to clean up. If you were uncertain about the number of background jobs, the break condition could be [ -z "$(jobs -p)" ].
#!/bin/bash
exec 3<> <(:)
{ sleep 15 ; echo "sleep/exit $?" >&3 ; } &
while read -u 3 -t 1 -r STAT CODE || STAT="timeout" ; do
echo "stat: ${STAT}; code: ${CODE}"
if [ "${STAT}" = "sleep/exit" ] ; then
break
fi
done
how about ...
# run your stuff
unset PID
for process in one two three four
do
( sleep $((RANDOM%20)); echo hello from process $process; exit $((RANDOM%3)); ) 2>&1 &
PID+=($!)
done
# (optional) report on the status of that stuff as it exits
for pid in "${PID[@]}"
do
( wait "$pid"; echo "process $pid completed with exit status $?") &
done
# (optional) while we wait, monitor that stuff
while ps --pid "${PID[*]}" --ppid "${PID[*]}" --format pid,ppid,command,pcpu
do
sleep 5
done | xargs -i date '+%x %X {}'
# return non-zero if any are non zero
SUCCESS=0
for pid in "${PID[@]}"
do
wait "$pid" && ((++SUCCESS)) && echo "$pid OK" || echo "$pid returned $?"
done
echo "success for $SUCCESS out of ${#PID[@]} jobs"
exit $(( ${#PID[@]} - SUCCESS ))
This may be extending beyond your question, however if you're concerned about the length of time processes are running for, you may be interested in checking the status of running background processes after an interval of time. It's easy enough to check which child PIDs are still running using pgrep -P $$, however I came up with the following solution to check the exit status of those PIDs that have already expired:
cmd1() { sleep 5; exit 24; }
cmd2() { sleep 10; exit 0; }
pids=()
cmd1 & pids+=("$!")
cmd2 & pids+=("$!")
lasttimeout=0
for timeout in 2 7 11; do
echo -n "interval-$timeout: "
sleep $((timeout-lasttimeout))
# you can only wait on a pid once
remainingpids=()
for pid in ${pids[*]}; do
if ! ps -p $pid >/dev/null ; then
wait $pid
echo -n "pid-$pid:exited($?); "
else
echo -n "pid-$pid:running; "
remainingpids+=("$pid")
fi
done
pids=( ${remainingpids[*]} )
lasttimeout=$timeout
echo
done
which outputs:
interval-2: pid-28083:running; pid-28084:running;
interval-7: pid-28083:exited(24); pid-28084:running;
interval-11: pid-28084:exited(0);
Note: You could change $pids to a string variable rather than array to simplify things if you like.
This seems like a pretty trivial thing to do, but I'm very stuck.
To execute something in the background, use &:
>>> sleep 5 &
[1] 21763
>>> #hit enter
[1]+ Done sleep 5
But having a bashrc-sourced background script output job information is pretty frustrating, so you can do this to fix it:
>>> (sleep 5 &)
OK, so now I want to get the PID of sleep for wait or kill. Unfortunately it's running in a subshell, so the typical $! method doesn't work:
>>> echo $!
21763
>>> (sleep 5 &)
>>> echo $!
21763 #hasn't changed
So I thought, maybe I could get the subshell to print its PID in this way:
>>> sleep 5 & echo $!
[1] 21803 #annoying job-start message (stderr)
21803 #from the echo
But now when I throw that in the subshell no matter how I try to capture stdout of the subshell, it appears to block until sleep has finished.
>>> pid=$(sleep 5 & echo $!)
How can I run something in the background, get its PID and stop it from printing job information and "Done"?
Solution A
When summoning the process, redirect the shell's stderr to >/dev/null for that summoning instance. We can do this by duplicating fd 2 so we could still use the duplicate fd for the process. We do all of these inside a block to make redirection temporary:
{ sleep 5 2>&3 & pid=$!; } 3>&2 2>/dev/null
Now to prevent the "Done" message from being shown later, we exclude the process from the job table and this is done with the disown command:
{ sleep 5 2>&3 & disown; pid=$!; } 3>&2 2>/dev/null
It's not necessary if job control is not enabled. Job control can be disabled with set +m or shopt -u -o monitor.
Solution B
We can also use command substitution to summon the process. The only problem we had is that the process still hooks itself to the pipe created by $() that reads stdout but we can fix this by duplicating original stdout before it then using that file descriptor for the process:
{ pid=$( sleep 200s >&3 & echo $! ); } 3>&1
It may not be necessary if we redirect the process' output somewhere like /dev/null:
pid=$( sleep 200s >/dev/null & echo $! )
Similarly with process substitution:
{ read pid < <(sleep 200s >&3 & echo $!); } 3>&1
Some may say that redirection is not necessary for process substitution, but without it a process that writes to its stdout dies quickly: once read returns, the reading end of the pipe is closed, and the next write fails with SIGPIPE. For example:
$ function x { for A in {1..100}; do echo "$A"; sleep 1s; done }
$ read pid < <(x & echo $!)
$ kill -s 0 "$pid" &>/dev/null && echo "Process active." || echo "Process died."
Process died.
$ read pid < <(x > /dev/null & echo $!)
$ kill -s 0 "$pid" &>/dev/null && echo "Process active." || echo "Process died."
Process active.
Optionally you can just create a permanent duplicate fd with exec 3>&1 so you can just have pid=$( sleep 200s >&3 & echo $! ) on the next lines.
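A minimal sketch of that permanent-duplicate variant follows. The 30-second sleep is an arbitrary stand-in for the real background command, and the kill -0 check is only there to show that the captured PID is usable.

```shell
# Sketch: keep a permanent duplicate of stdout on fd 3.
exec 3>&1

# Returns at once: sleep writes to fd 3, so it does not hold the $() pipe open.
pid=$( sleep 30 >&3 & echo $! )

if kill -0 "$pid" 2>/dev/null; then
  echo "started background sleep with PID $pid"
fi

kill "$pid"
exec 3>&-    # close the duplicate once it is no longer needed
```

Because the process is started inside a command substitution, it is not a child of the current shell, so kill works on it but wait does not.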
You can use the read builtin to capture the output:
read -r pid < <(sleep 10 & echo $!)
Then:
ps -p $pid
PID TTY TIME CMD
78541 ttys001 0:00.00 sleep 10
set +m disables monitor mode in bash. In other words, it gets rid of the annoying Done message.
To enable it again, use set -m.
eg:
$ set +m
$ (sleep 5; echo some) &
[1] 23545 #still prints the job number
#after 5 secs
some
$ #no Done message...
Try this:
pid=$((sleep 5 & echo $!) | sed 1q)
I found a great way that needs no subshell and keeps the parent-child relationship.
Both [1] 21763 and [1]+ Done sleep 5 are written to stderr, which is fd 2 (&2).
We can redirect fd 2 to /dev/null. Here is the code:
exec 7>&2 2>/dev/null # Here backup 2 to 7, and redirect 2 to /dev/null
sleep 5 &
wait
exec 2>&7 7>&- # here restore 7 to 2, and delete 7.
See: Using exec