I am trying to implement a timed function. If the timer expires, the function/command should be killed. If the function/command finishes first, bash should not have to wait for the timer to expire.
(cmdpid=$BASHPID; \
( sleep 60; kill $cmdpid 2>/dev/null) & \
child_pid=$!; \
ssh remote_host /users/jj/test.sh; \
kill -9 $child_pid)
test.sh may or may not finish within 60 seconds. This worked fine.
But when I wanted to capture the output of test.sh, which echoes "SUCCESS" or "FAILURE", I tried:
result=$(cmdpid=$BASHPID; \
( sleep 60; kill $cmdpid 2>/dev/null) & \
child_pid=$!; \
ssh remote_host /users/jj/test.sh; \
kill -9 $child_pid)
Here it waits for the timer to expire. Using set -x, I can see that "kill -9 $child_pid" is executed, but the kill does not seem to actually stop the subshell.
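A likely reason for the wait: command substitution keeps reading until every process that holds the write end of its pipe has exited, and the background timer (including its sleep) inherits that pipe as stdout. kill -9 $child_pid removes the timer subshell, but the orphaned sleep still holds the pipe open. Below is a minimal sketch of one way around this, assuming the timer's output can be discarded. run_remote is a hypothetical stand-in for the ssh call, and the 5-second limit is only for illustration.

```shell
# Hypothetical stand-in for "ssh remote_host /users/jj/test.sh"; finishes quickly.
run_remote() { echo "SUCCESS"; }

start=$SECONDS
result=$(
    cmdpid=$BASHPID
    # The timer's stdout/stderr go to /dev/null, so neither the subshell nor
    # its sleep keeps the command-substitution pipe open.
    ( sleep 5; kill "$cmdpid" ) >/dev/null 2>&1 &
    child_pid=$!
    run_remote
    kill "$child_pid" 2>/dev/null
)
elapsed=$((SECONDS - start))
echo "result=$result elapsed=${elapsed}s"
```

With the redirection in place, the substitution should return as soon as the command finishes instead of after the full timer interval.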
One way to tackle this problem would be to run the timer in a separate script, say MyTimerTest, which is started from (say) MainScriptTest but runs independently; whichever script finishes first then "kills" the other. For example:
In MainScriptTest you could put this at the beginning:
nohup /folder/MyTimerTest > /dev/null 2>&1 &
In MainScriptTest you could put this at the very end:
killall MyTimerTest > /dev/null 2>&1
The MyTimerTest could be something like this:
#!/bin/bash
sleep 60
killall MainScriptTest > /dev/null 2>&1
exit 0
Note: the long script names with mixed capital and lowercase letters (e.g. MainScriptTest) are on purpose: killall is case sensitive, and that helps to keep it from killing something it should not. To be very safe, you might even want to add a token to the longer name, like MainScriptTest88888 or something like that.
Edit: Thanks to gilez, who suggested the use of the timeout command. If that is available to you on your system, one could do a quick one-liner like this:
timeout 60 bash -c "/folder/MainScriptTest"
Using timeout is convenient. However, if MainScriptTest starts independent child processes (for example by calling: nohup /folder/OtherScript &), then timeout may not kill those child processes, and the exit would not be clean.
The first solution I gave is longer, but it can be customized to kill those child processes (or any other processes you want) by adding them to MainScriptTest, for example:
killall OtherScript > /dev/null 2>&1
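For the timeout one-liner, the exit status shows whether the limit was hit: GNU timeout returns 124 when the command timed out, and otherwise passes through the command's own status. A small sketch, with sleep 5 standing in for /folder/MainScriptTest and a 1-second limit chosen only for illustration:

```shell
# "sleep 5" is a stand-in for the real script; the 1-second limit is arbitrary.
timeout 1 sleep 5
status=$?
if [ "$status" -eq 124 ]; then
    outcome="timed out"
else
    outcome="finished with status $status"
fi
echo "$outcome"
```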
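I found another way. The output goes through a temporary file, because an assignment inside a background job would not be visible to the parent shell.
tmp=$(mktemp)
ssh $remote_host /users/jj/test.sh >"$tmp" & mypid=$!
( sleep 10; kill -9 $mypid 2>/dev/null ) &
wait $mypid
result=$(<"$tmp"); rm -f "$tmp"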
I have a bash script that launches a child process that crashes (actually, hangs) from time to time and with no apparent reason (closed source, so there isn't much I can do about it). As a result, I would like to be able to launch this process for a given amount of time, and kill it if it did not return successfully after a given amount of time.
Is there a simple and robust way to achieve that using bash?
P.S.: tell me if this question is better suited to serverfault or superuser.
(As seen in:
BASH FAQ entry #68: "How do I run a command, and have it abort (timeout) after N seconds?")
If you don't mind installing something, use timeout. Most systems already have it as part of coreutils; otherwise use sudo apt-get install coreutils (or, on older systems, sudo apt-get install timeout). Use it like this:
timeout 10 ping www.goooooogle.com
If you don't want to install anything, you can emulate roughly what timeout does:
( cmdpid=$BASHPID; (sleep 10; kill $cmdpid) & exec ping www.goooooogle.com )
If you want a timeout for a longer piece of bash code, use the second option like this:
( cmdpid=$BASHPID;
(sleep 10; kill $cmdpid) \
& while ! ping -w 1 www.goooooogle.com
do
echo crap;
done )
# Spawn a child process:
(dosmth) & pid=$!
# in the background, sleep for 10 secs then kill that process
(sleep 10 && kill -9 $pid) &
or, to get the exit code as well:
# Spawn a child process:
(dosmth) & pid=$!
# in the background, sleep for 10 secs then kill that process
(sleep 10 && kill -9 $pid) & waiter=$!
# wait on our worker process and record its exit code
# (wait must run in this shell; inside $(...) it could not wait for $pid)
wait $pid
exitcode=$?
# kill the waiter subshell, if it still runs
kill -9 $waiter 2>/dev/null
# 0 if we killed the waiter, because that means the process finished before the waiter
finished_gracefully=$?
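As a rough illustration of how the recorded exit code can be read: bash reports 128 plus the signal number for a process that died from a signal, so a worker removed by kill -9 shows up as 137. The sketch below uses sleep as a stand-in for dosmth, and the short delays are assumptions for the demo.

```shell
# "sleep 5" stands in for the real worker (dosmth).
sleep 5 & pid=$!
# Timer: kill the worker after 1 second; its output is discarded.
( sleep 1 && kill -9 "$pid" ) >/dev/null 2>&1 & waiter=$!
wait "$pid"
exitcode=$?
# Stop the timer if it is still around (harmless if it already exited).
kill "$waiter" 2>/dev/null
if [ "$exitcode" -gt 128 ]; then
    echo "worker was killed by signal $((exitcode - 128))"
else
    echo "worker exited on its own with status $exitcode"
fi
```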
sleep 999&
t=$!
sleep 10
kill $t
I also had this question and found two more things very useful:
The SECONDS variable in bash.
The command "pgrep".
So I use something like this on the command line (OSX 10.9):
ping www.goooooogle.com & PING_PID=$(pgrep 'ping'); SECONDS=0; while pgrep -q 'ping'; do sleep 0.2; if [ $SECONDS -ge 10 ]; then kill $PING_PID; fi; done
As this is a loop I included a "sleep 0.2" to keep the CPU cool. ;-)
(BTW: ping is a bad example anyway; you would just use its built-in "-t" (timeout) option.)
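A variation on the same idea, sketched under the assumption that the command is started from the same shell: $! gives the exact PID, and kill -0 tests whether that process is still alive, so no name matching with pgrep is needed. The sleep 30 command and the 2-second limit are placeholders.

```shell
sleep 30 & pid=$!          # placeholder for the real long-running command
SECONDS=0
# kill -0 sends no signal; it only checks that the process still exists.
while kill -0 "$pid" 2>/dev/null; do
    if [ "$SECONDS" -ge 2 ]; then
        kill "$pid"
        break
    fi
    sleep 0.2              # keep the polling loop cheap
done
wait "$pid" 2>/dev/null
status=$?
echo "exit status: $status"
```

A status of 143 (128 + 15) here means the command was ended by the SIGTERM sent from the loop.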
Assuming you have (or can easily make) a pid file for tracking the child's pid, you could then create a script that checks the modtime of the pid file and kills/respawns the process as needed. Then just put the script in crontab to run at approximately the period you need.
Let me know if you need more details. If that doesn't sound like it'd suit your needs, what about upstart?
One way is to run the program in a subshell, and communicate with the subshell through a named pipe with the read command. This way you can check the exit status of the process being run and communicate this back through the pipe.
Here's an example of timing out the yes command after 3 seconds. It gets the PID of the process using pgrep (possibly only works on Linux). There is also a problem with using a pipe: a process opening a pipe for reading will hang until it is also opened for writing, and vice versa. So to prevent the read command from hanging, I've "wedged" the pipe open with a background subshell. (Another way to prevent a freeze is to open the pipe read-write, i.e. read -t 5 <>finished.pipe; however, that also may not work except on Linux.)
rm -f finished.pipe
mkfifo finished.pipe
{ yes >/dev/null; echo finished >finished.pipe ; } &
SUBSHELL=$!
# Get command PID
while : ; do
PID=$( pgrep -P $SUBSHELL yes )
test "$PID" = "" || break
sleep 1
done
# Open pipe for writing
{ exec 4>finished.pipe ; while : ; do sleep 1000; done } &
read -t 3 FINISHED <finished.pipe
if [ "$FINISHED" = finished ] ; then
echo 'Subprocess finished'
else
echo 'Subprocess timed out'
kill $PID
fi
rm finished.pipe
Here's an attempt which tries to avoid killing a process after it has already exited, which reduces the chance of killing another process with the same process ID (although it's probably impossible to avoid this kind of error completely).
run_with_timeout ()
{
t=$1
shift
echo "running \"$*\" with timeout $t"
(
# first, run process in background
(exec sh -c "$*") &
pid=$!
echo $pid
# the timeout shell
(sleep $t ; echo timeout) &
waiter=$!
echo $waiter
# finally, allow process to end naturally
wait $pid
echo $?
) \
| (read pid
read waiter
if test $waiter != timeout ; then
read status
else
status=timeout
fi
# if we timed out, kill the process
if test $status = timeout ; then
kill $pid
exit 99
else
# if the program exited normally, kill the waiting shell
kill $waiter
exit $status
fi
)
}
Use it like run_with_timeout 3 sleep 10000, which runs sleep 10000 but ends it after 3 seconds.
This is like other answers which use a background timeout process to kill the child process after a delay. I think this is almost the same as Dan's extended answer (https://stackoverflow.com/a/5161274/1351983), except the timeout shell will not be killed if it has already ended.
After this program has ended, there will still be a few lingering "sleep" processes running, but they should be harmless.
This may be a better solution than my other answer because it does not use the non-portable shell feature read -t and does not use pgrep.
Here's the third answer I've submitted here. This one handles signal interrupts and cleans up background processes when SIGINT is received. It uses the $BASHPID and exec trick used in the top answer to get the PID of a process (in this case $$ in a sh invocation). It uses a FIFO to communicate with a subshell that is responsible for killing and cleanup. (This is like the pipe in my second answer, but having a named pipe means that the signal handler can write into it too.)
run_with_timeout ()
{
t=$1 ; shift
trap cleanup 2
F=$$.fifo ; rm -f $F ; mkfifo $F
# first, run main process in background
"$@" & pid=$!
# sleeper process to time out
( sh -c "echo \$\$ >$F ; exec sleep $t" ; echo timeout >$F ) &
read sleeper <$F
# control shell. read from fifo.
# final input is "finished". after that
# we clean up. we can get a timeout or a
# signal first.
( exec 0<$F
while : ; do
read input
case $input in
finished)
test $sleeper != 0 && kill $sleeper
rm -f $F
exit 0
;;
timeout)
test $pid != 0 && kill $pid
sleeper=0
;;
signal)
test $pid != 0 && kill $pid
;;
esac
done
) &
# wait for process to end
wait $pid
status=$?
echo finished >$F
return $status
}
cleanup ()
{
echo signal >$$.fifo
}
I've tried to avoid race conditions as far as I can. However, one source of error I couldn't remove is when the process ends near the same time as the timeout. For example, run_with_timeout 2 sleep 2 or run_with_timeout 0 sleep 0. For me, the latter gives an error:
timeout.sh: line 250: kill: (23248) - No such process
as it is trying to kill a process that has already exited by itself.
#Kill command after 10 seconds
timeout 10 command
#If you don't have timeout installed, this is almost the same:
sh -c '(sleep 10; kill "$$") & command'
#The same as above, with muted duplicate messages:
sh -c '(sleep 10; kill "$$" 2>/dev/null) & command'
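One detail worth noting in the sh -c variant: $$ is the PID of the sh process, so the signal reaches the command itself only if the command replaces that shell. Some shells may do this automatically for the last command, but writing exec makes it explicit. A sketch with sleep 5 as a stand-in for command and a 1-second limit (both assumptions for the demo):

```shell
# exec replaces the sh process with the command, so kill "$$" hits the command itself.
sh -c '(sleep 1; kill "$$" 2>/dev/null) >/dev/null 2>&1 & exec sleep 5'
status=$?
# 143 = 128 + 15 (SIGTERM) when the timer fired before the command finished.
echo "exit status: $status"
```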
By referencing bash: silently kill background function process and Timeout a command in bash without unnecessary delay, I wrote my own script to set a timeout for a command, as well as silencing the kill message.
But I still am getting a "Terminated" message when my process gets killed. What's wrong with my code?
#!/bin/bash
silent_kill() {
kill $1 2>/dev/null
wait $1 2>/dev/null
}
timeout() {
limit=$1 #timeout limit
shift
command=$* #command to run
interval=1 #default interval between checks if the process is still alive
delay=1 #default delay between SIGTERM and SIGKILL
(
((t = limit))
while ((t > 0)); do
sleep $interval;
#kill -0 $$ || exit 0
((t -= interval))
done
silent_kill $$
#kill -s SIGTERM $$ && kill -0 $$ || exit 0
sleep $delay
#kill -s SIGKILL $$
) &> /dev/null &
exec $*
}
timeout 1 sleep 10
There's nothing wrong with your code; that "Terminated" message doesn't come from your script but from the invoking shell (the one you launch your script from).
You can deactivate it by disabling job control:
$ set +m
$ bash <your timeout script>
Perhaps bash has moved on in 4 years. I do know you can avoid getting Terminated by disowning a child process. You can no longer job control it though. E.g.:
$ sleep 100 &
[1] 15436
$ disown -r
$ kill -9 15436
help disown:
disown [-h] [-ar] [jobspec ...]
Remove jobs from current shell.
Removes each JOBSPEC argument from the table of active jobs. Without
any JOBSPECs, the shell uses its notion of the current job.
-a remove all jobs if JOBSPEC is not supplied
-h mark each JOBSPEC so that SIGHUP is not sent to the job if the shell receives a SIGHUP
-r remove only running jobs
Internally the shell maintains a list of the children it forked and wait()s for any of them to exit or be killed. When a child's exit status is collected, the shell prints a message. This is called monitoring in shell parlance.
It seems you want to turn off monitoring. Monitoring is managed with the m option; to turn it on, use set -m (the default for interactive shells). To turn it off, use set +m.
Note that turning monitoring off also disables messages for asynchronous jobs, e.g. no more messages like
$ sleep 5 &
[1] 59468
$
[1] + done sleep 5
$
Say I have this pseudocode in bash
#!/bin/bash
things
for i in {1..3}
do
nohup someScript[i] &
done
wait
for i in {4..6}
do
nohup someScript[i] &
done
wait
otherThings
and say that someScript[i] sometimes ends up hanging.
Is there a way I can take the process IDs (with $!) and check periodically whether a process has been running for more than a specified amount of time, after which I want to kill the hung processes with kill -9?
Unfortunately the answer from @Eugeniu did not work for me; timeout gave an error.
However, I found the following routine useful, so I'll post it here for anyone who has the same problem.
Create another script which goes like this:
#!/bin/bash
#monitor.sh
pid=$1
counter=10
while ps -p $pid > /dev/null
do
if [[ $counter -eq 0 ]] ; then
kill -9 $pid
#if it's still there then kill it
fi
counter=$((counter-1))
sleep 1
done
then in the main work you just put
things
for i in {1..3}
do
nohup someScript[i] &
./monitor.sh $! &
done
wait
This way, each someScript gets a parallel process that checks at every chosen interval whether it is still there (up to the maximum time set by the counter), and that quits by itself once the associated process dies (or gets killed).
One possible approach:
#!/bin/bash
# things
mypids=()
for i in {1..3}; do
# launch the script with timeout (3600s)
timeout 3600 nohup someScript[i] &
mypids[i]=$! # store the PID
done
wait "${mypids[@]}"
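To find out afterwards which jobs hit the limit, each PID can be waited on separately: GNU timeout exits with 124 when it had to stop the command. A minimal sketch, where sleep stands in for someScript and the durations and the 2-second limit are made-up values for the demo:

```shell
durations=(1 4)   # hypothetical job lengths in seconds
limit=2           # per-job time limit in seconds (assumption for the demo)
mypids=()
for i in "${!durations[@]}"; do
    timeout "$limit" sleep "${durations[i]}" &   # sleep stands in for someScript
    mypids[i]=$!
done
statuses=()
for i in "${!mypids[@]}"; do
    wait "${mypids[i]}"      # returns that job's exit status
    statuses[i]=$?
done
echo "${statuses[@]}"
```

Here the first job should report 0 and the second 124, since only the second one runs past the limit.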
I have a main script which runs all the scripts in a folder.
#!/bin/bash
for each in /some_folder/*.sh
do
bash $each
done;
I want to know if the execution of one of them lasts too long (more than N seconds). For example, the execution of a script such as:
#!/bin/bash
ping -c 10000 google.com
will last very long, and I want my main script to e-mail me after N seconds.
All I can do now is run all the scripts under timeout N, but that stops them!
Is it possible to e-mail me without stopping the execution of the script?
Try this :
#!/bin/bash
# max seconds before mail alert
MAX_SECONDS=3600
# running the command in the background and get the pid
command_that_takes_a_long_time & _pid=$!
sleep $MAX_SECONDS
# if the pid is alive...
if kill -0 $_pid &>/dev/null; then
mail -s "script $0 takes more than $MAX_SECONDS" user@domain.tld < /dev/null
fi
We run the command in the background, then sleep for MAX_SECONDS in parallel and alert by email if the process takes longer than permitted.
Finally, with your specific requirements :
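#!/bin/bash
MAX_SECONDS=3600
alerter(){
    bash "$1" & _pid=$!
    # watcher: sleep, then mail only if the script is still running
    ( sleep $MAX_SECONDS
      if kill -0 $_pid 2>/dev/null; then
          mail -s "$1 takes more than $MAX_SECONDS" user@domain.tld < /dev/null
      fi ) & _watcher=$!
    wait $_pid                  # the script itself is never stopped
    kill $_watcher 2>/dev/null  # script finished: cancel the pending alert
}
for each in /some_folder/*.sh; do
    alerter "$each" # append & (and a final wait) if you'd like to run all scripts in parallel
done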
You can do something like this:
( sleep 10 ; echo 'Takes a while' | sendmail myself@example.com ) &
email_pid=$!
bash $each
kill $email_pid
The first command is run in a subshell in the background. It first sleeps a while, then sends email. If the script $each finishes before the sleep expires, the subshell is killed without sending email.
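The same pattern can be wrapped in a function so that the command's exit status is kept and the command is never stopped. This is only a sketch: notify is a hypothetical stand-in that appends to a log file instead of sending mail, watch_runtime is a made-up name, and the 1-second limit is for demonstration.

```shell
alert_log=$(mktemp)
# Hypothetical stand-in for mail/sendmail: append the alert to a log file.
notify() { echo "ALERT: $*" >>"$alert_log"; }

# usage: watch_runtime LIMIT_SECONDS command [args...]
watch_runtime() {
    local limit=$1; shift
    "$@" & local pid=$!
    # Watcher: after the limit, alert only if the command is still running.
    ( sleep "$limit"
      kill -0 "$pid" 2>/dev/null && notify "$* still running after ${limit}s"
    ) >/dev/null 2>&1 &
    local watcher=$!
    wait "$pid"; local status=$?
    kill "$watcher" 2>/dev/null   # cancel the alert if it has not fired yet
    return "$status"
}

watch_runtime 1 sleep 2
cat "$alert_log"
```

In this run the command outlives the limit, so one alert line is logged while sleep 2 still runs to completion.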
I have a web server where I run some slow-starting programs as daemons. These sometimes need quick restarting (or stopping) when I recompile them or switch to another installation of them.
Inspired by http://mywiki.wooledge.org/ProcessManagement, I'm writing a script
called daemonise.sh that looks like
#!/bin/sh
while :; do
./myprogram lotsadata.xml
echo "Restarting server..." 1>&2
done
to keep a "daemon" running. Since I sometimes need to stop it, or just
restart it, I run that script in a screen session, like:
$ ./daemonise.sh & DPID=$!
$ screen -d
Then perhaps I recompile myprogram, install it to a new path, start
the new one up and want to kill the old one:
$ screen -r
$ kill $DPID
$ screen -d
This works fine when I'm the only maintainer, but now I want to let
someone else stop/restart the program, no matter who started it. And
to make things more complicated, the daemonise.sh script in fact
starts about 16 programs, making it a hassle to kill every single one
if you don't know their PIDs.
What would be the "best practices" way of letting another user
stop/restart the daemons?
I thought about shared screen sessions, but that just sounds hacky and
insecure. The best solution I've come up with for now is to wrap
starting and killing in a script that catches certain signals:
#!/bin/bash
DPID=
trap './daemonise.sh & DPID=$!' USR1
trap 'kill $DPID' USR2 EXIT
# Ensure trapper wrapper doesn't exit:
while :; do
sleep 10000 & wait $!
done
Now, should another user need to stop the daemons and I can't do it,
she just has to know the pid of the wrapper, and e.g. sudo kill -s
USR2 $wrapperpid. (Also, this makes it possible to run the daemons
on reboots, and still kill them cleanly.)
Is there a better solution? Are there obvious problems with this
solution that I'm not seeing?
(After reading Greg's Bash Wiki, I'd like to avoid any solution involving pgrep or PID-files …)
I recommend a PID-based init script. Anyone with sudo privileges for the script will be able to start and stop the server processes.
On improving your approach: wouldn't it be advisable to make sure that your sleep command in sleep 10000 & wait $! gets properly terminated if your pidwrapper script exits somehow?
Otherwise there would remain a dangling sleep process in the process table for quite some time.
Similarly, wouldn't it be cleaner to terminate myprogram in daemonise.sh properly on restart (i. e. if daemonise.sh receives a TERM signal)?
In addition, it is possible to suppress job notification messages and test for pid existence before killing.
#!/bin/sh
# cat daemonise.sh
# cf. "How to suppress Terminated message after killing in bash?",
# http://stackoverflow.com/q/81520
trap '
echo "server shut down..." 1>&2
kill $spid1 $spid2 $spid3 &&
wait $spid1 $spid2 $spid3 2>/dev/null
exit
' TERM
while :; do
echo "Starting server..." 1>&2
#./myprogram lotsadata.xml
sleep 100 &
spid1=${!}
sleep 100 &
spid2=${!}
sleep 100 &
spid3=${!}
wait
echo "Restarting server..." 1>&2
done
#------------------------------------------------------------
#!/bin/bash
# cat pidwrapper
DPID=
trap '
kill -0 ${!} 2>/dev/null && kill ${!} && wait ${!} 2>/dev/null
./daemonise.sh & DPID=${!}
' USR1
trap '
kill -0 ${!} 2>/dev/null && kill ${!} && wait ${!} 2>/dev/null
kill -0 $DPID 2>/dev/null && kill $DPID && wait ${DPID} 2>/dev/null
' USR2
trap '
trap - EXIT
kill -0 $DPID 2>/dev/null && kill $DPID && wait ${DPID} 2>/dev/null
kill -0 ${!} 2>/dev/null && kill ${!} && wait ${!} 2>/dev/null
exit 0
' EXIT
# Ensure trapper wrapper does not exit:
while :; do
sleep 10000 & wait $!
done
#------------------------------------------------------------
# test
{
wrapperpid="`exec sh -c './pidwrapper & echo ${!}' | head -1`"
echo "wrapperpid: $wrapperpid"
for n in 1 2 3 4 5; do
sleep 2
# start daemonise.sh
kill -s USR1 $wrapperpid
sleep 2
# kill daemonise.sh
kill -s USR2 $wrapperpid
done
sleep 2
echo kill $wrapperpid
kill $wrapperpid
}