Kill a process after a given time in bash?

I have a script that makes a DB connection using another program, and that program's own timeout (2.5 min) is too long. I want to add this functionality to the script:
If it takes longer than 5 seconds to connect, kill the process.
Otherwise, kill the sleep/kill watchdog process.
The issue I'm having is with how bash reports it when a process is killed, since the processes are background jobs of the same shell. Is there a better way to do this, or how can I silence the shell's messages for the kill commands?
DB_CONNECTION_PROGRAM > $CONNECTFILE &
pid=$!
(sleep 5; kill $pid) &
sleep_pid=$!
wait $pid
# If the DB failed to connect after 5 seconds and was killed
status=$? # kill returns 128+n (fatal error)
if [ $status -gt 128 ]; then
    no_connection="ERROR: Timeout while trying to connect to $dbserver"
else # it connected; kill the sleep and collect any errors
    kill $sleep_pid
    no_connection=`sed -n '/^ERROR:/,$p' $CONNECTFILE`
fi

There's a GNU coreutils utility called timeout: http://www.gnu.org/s/coreutils/manual/html_node/timeout-invocation.html
If you have it on your platform, you could do:
timeout 5 CONNECT_TO_DB
if [ $? -eq 124 ]; then
    : # Timeout occurred
else
    : # No hang
fi
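Applied to the original question, a minimal sketch (reusing the asker's DB_CONNECTION_PROGRAM, CONNECTFILE and dbserver names) might be:
timeout 5 DB_CONNECTION_PROGRAM > "$CONNECTFILE"
if [ $? -eq 124 ]; then
    # timed out after 5 seconds
    no_connection="ERROR: Timeout while trying to connect to $dbserver"
else
    # connected; collect any errors the program reported
    no_connection=$(sed -n '/^ERROR:/,$p' "$CONNECTFILE")
fi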

I don't know if your issue is identical, but I fixed a similar one a few years ago. However, I'm a programmer, not a Unix-like sysadmin, so take the following with a grain of salt, because my Bash-fu may not be that strong...
Basically I did fork, fork and fork : )
After digging up my old code (which, amazingly, I still use daily) because my memory wasn't good enough, in Bash it worked a bit like this:
commandThatMayHang.sh > /dev/null 2>&1 & # notice that last '&': we're forking
MAYBE_HUNG_PID=$!
sleepAndMaybeKill.sh $MAYBE_HUNG_PID > /dev/null 2>&1 & # we're forking again
SLEEP_AND_MAYBE_KILL_PID=$!
wait $MAYBE_HUNG_PID > /dev/null 2>&1
if [ $? -eq 0 ]; then
    # commandThatMayHang.sh did not hang; no need to monitor it anymore
    kill -9 $SLEEP_AND_MAYBE_KILL_PID > /dev/null 2>&1
fi
where sleepAndMaybeKill.sh sleeps for the amount of time you want and then kills commandThatMayHang.sh.
So basically the two scenarios are:
your command exits fine (before your 5-second timeout or whatever), so the wait stops as soon as your command exits (and kills the "killer", because it's not needed anymore);
the command locks up, and the killer ends up killing the command.
In any case you're guaranteed to either succeed as soon as the command is done or to fail after the timeout.
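For reference, the original sleepAndMaybeKill.sh isn't shown, but a minimal sketch of it might look like this (the timeout argument and its default are assumptions):
#!/bin/bash
# sleepAndMaybeKill.sh <pid> [timeout-seconds]
# Sleep for the timeout, then kill the target if it is still alive.
pid=$1
timeout=${2:-5}
sleep "$timeout"
kill "$pid" 2>/dev/null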

You can set a two-hour timeout and restart your javaScriptThatStalls up to 100 times this way in a loop (the -II tells xargs to run the command once per input line, discarding the number):
seq 100 | xargs -II timeout $((2 * 60 * 60)) javaScriptThatStalls

Do you mean you don't want the error message printed if the process isn't still running? Then you could just redirect stderr: kill $pid 2>/dev/null.
You could also check whether the process is still running:
if ps -p $pid >/dev/null; then kill $pid; fi

I found the bash script
timeout.sh
by Anthony Thyssen (on his website). It looks good.

Related

Using sleep and wait -n to implement simple timeout in bash, race condition or not?

If I do this in a bash script:
sleep 10 &
sleep_pid=$!
some_command &
cmd_pid=$!
wait -n
if kill -0 $sleep_pid 2> /dev/null; then
    # all ok
    kill $sleep_pid
else
    # some_command hung
    ...code to log diagnostics and then kill -9 $cmd_pid...
fi
where some_command is something that should be quick but can hang due to rare errors.
Is there then a risk that some_command finishes and is cleaned up before "wait -n" starts, so there is only the sleep left to wait for? Or does the '&' after one command guarantee that the shell won't call waitpid() on it until the next line of input has been handled?
It works in interactive shells. If you do:
sleep 10 &
sleep 0 &
wait -n
then the "wait -n" returns right away even if you wait a couple of seconds before running it. But I'm not sure if it can be trusted for non-interactive shells?
EDIT: Clarifying need for diagnostics + some grammar.
I believe you may be able to use the timeout command to do this.
http://man7.org/linux/man-pages/man1/timeout.1.html
timeout 10s command_to_run
You can check the exit status of the timeout command to know whether it timed out; it exits with status 124 when the command times out:
timeout 2s sleep 10
if [[ $? -eq 124 ]]; then
    echo "it timed out"
else
    echo "It was successful"
fi
By using the $! variable, we avoid relying on interactive job control features. Try this:
...long executing command... &
pid_long=$!
sleep 3 &
pid_sleep=$!
wait -n
kill -KILL $pid_long
The problem here is PID recycling. Unlikely to happen in 3 seconds, though.
In the case when the command finishes earlier than the sleep (and its PID has not been recycled to a new process) kill produces an error message; we could pipe that to /dev/null.
We should probably also kill the sleep in case it is the one that is lingering.
As @CharlesDuffy pointed out in comments, the answer is no, there is no race (provided it is run in a non-interactive shell).
Also there is no need (in non-interactive shells) to make sure the wait comes directly after the command, as non-interactive shells don't do automatic reaping of children.
But I guess one should wrap this in a sub-shell, so "wait -n" won't return early due to some previously started unrelated background job.
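Putting those pieces together, a minimal sketch of the subshell-wrapped version (assuming bash 4.3+ for wait -n) might be:
(
    some_command &
    cmd_pid=$!
    sleep 10 &
    sleep_pid=$!
    wait -n
    if kill -0 "$sleep_pid" 2>/dev/null; then
        # the sleep is still alive, so some_command finished first
        kill "$sleep_pid" 2>/dev/null
    else
        # the sleep won the race: some_command hung
        kill -9 "$cmd_pid" 2>/dev/null
    fi
)
The subshell ensures wait -n only sees these two jobs, and both kill calls are silenced in case the other process is already gone.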

Sleep seems to run first in bash script

I need to terminate a script if it exceeds a specific duration (10 mins)
examplescript.sh &
pid=$!
sleep 600
if ['pgrep $pid']
then
kill $pid
fi
When I tested it in my test environment, it seemed to work well: examplescript.sh runs first, and if it runs for more than 10 minutes, it is terminated. However, in our production environment it seems that sleep runs first: it waits 600 s before running examplescript.sh. Is there something wrong in the script?
There are multiple things you should correct in your code.
pgrep does a regex search on process names, not PIDs. You can use kill -0 pid to check whether a process with that PID is running.
[ (test) is a command[1] and should be treated as one. That means each argument should be separated by spaces, and when using [ the last argument should be ]:
[ arg1 arg2 ]
In your example you won't need [ at all, since kill -0 exits successfully if the process is still running:
if kill -0 pid; then
And to wrap it up:
examplescript.sh &
pid=$!
sleep 600
if kill -0 "$pid" 2> /dev/null; then
    kill "$pid"
fi
kill -0 will write an error to stderr if the process is not running anymore. So we redirect that to /dev/null.
[1] It's usually a builtin these days.
Another thing to note is that your script will always run for 600 seconds, even if examplescript.sh takes only a few seconds.
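If that matters, here is a hedged sketch of a polling alternative: check once a second and stop as soon as the script exits, up to the 600-second limit:
examplescript.sh &
pid=$!
for ((i = 0; i < 600; i++)); do
    kill -0 "$pid" 2>/dev/null || break # already finished
    sleep 1
done
if kill -0 "$pid" 2>/dev/null; then
    kill "$pid"
fi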
Are your production machines significantly faster? I don't have an example script to actually run this on my machine, but I think your problem might be solved if you take the code you mentioned above,
examplescript.sh &
pid=$!
sleep 600
if ['pgrep $pid']
then
kill $pid
fi
put it in a file called, say, monitor.sh, and run that file in the background, i.e.
monitor.sh &
Hope this helps.

shell script - how to stop "watch" command in the shell script [duplicate]

I have a bash script that launches a child process that crashes (actually, hangs) from time to time and with no apparent reason (closed source, so there isn't much I can do about it). As a result, I would like to be able to launch this process for a given amount of time, and kill it if it did not return successfully after a given amount of time.
Is there a simple and robust way to achieve that using bash?
P.S.: tell me if this question is better suited to serverfault or superuser.
(As seen in:
BASH FAQ entry #68: "How do I run a command, and have it abort (timeout) after N seconds?")
If you don't mind downloading something, use timeout; most systems have it installed already as part of GNU coreutils (otherwise, sudo apt-get install coreutils):
timeout 10 ping www.goooooogle.com
If you don't want to download something, do what timeout does internally:
( cmdpid=$BASHPID; (sleep 10; kill $cmdpid) & exec ping www.goooooogle.com )
In case that you want to do a timeout for longer bash code, use the second option as such:
( cmdpid=$BASHPID;
  (sleep 10; kill $cmdpid) &
  while ! ping -w 1 www.goooooogle.com
  do
      echo crap
  done )
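One caveat with this pattern: if the command finishes early, the watchdog lingers until its sleep expires and then fires a kill at a PID that may no longer exist. A sketch of a variant that also cancels the watchdog (giving up the exec trick in exchange):
(
    ping www.goooooogle.com &
    cmdpid=$!
    (sleep 10; kill "$cmdpid" 2>/dev/null) &
    watchdog=$!
    wait "$cmdpid"
    kill "$watchdog" 2>/dev/null # command finished in time; cancel the watchdog
)
Killing the watchdog subshell leaves its orphaned sleep running out its clock, but harmlessly: the kill it guarded never executes.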
# Spawn a child process:
(dosmth) & pid=$!
# in the background, sleep for 10 secs then kill that process
(sleep 10 && kill -9 $pid) &
or to get the exit codes as well:
# Spawn a child process:
(dosmth) & pid=$!
# in the background, sleep for 10 secs then kill that process
(sleep 10 && kill -9 $pid) & waiter=$!
# wait on our worker process and return the exitcode
exitcode=$(wait $pid && echo $?)
# kill the waiter subshell, if it still runs
kill -9 $waiter 2>/dev/null
# 0 if we killed the waiter, cause that means the process finished before the waiter
finished_gracefully=$?
sleep 999 &
t=$!
sleep 10
kill $t
I also had this question and found two more things very useful:
The SECONDS variable in bash.
The command "pgrep".
So I use something like this on the command line (OSX 10.9):
ping www.goooooogle.com & PING_PID=$(pgrep 'ping'); SECONDS=0; while pgrep -q 'ping'; do sleep 0.2; if [ $SECONDS = 10 ]; then kill $PING_PID; fi; done
As this is a loop I included a "sleep 0.2" to keep the CPU cool. ;-)
(BTW: ping is a bad example anyway; you would just use its built-in "-t" (timeout) option.)
Assuming you have (or can easily make) a pid file for tracking the child's pid, you could then create a script that checks the modtime of the pid file and kills/respawns the process as needed. Then just put the script in crontab to run at approximately the period you need.
Let me know if you need more details. If that doesn't sound like it'd suit your needs, what about upstart?
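A rough sketch of such a cron-driven check, with the pid-file path, staleness threshold, and respawn command all hypothetical (note that stat -c %Y is GNU stat):
#!/bin/bash
# watchdog.sh - run from cron; respawn the child if its pid file is stale
PIDFILE=/var/run/child.pid # hypothetical path
MAX_AGE=600                # seconds of staleness before we consider it hung

if [ -f "$PIDFILE" ]; then
    age=$(( $(date +%s) - $(stat -c %Y "$PIDFILE") ))
    if [ "$age" -gt "$MAX_AGE" ]; then
        kill "$(cat "$PIDFILE")" 2>/dev/null
        start_child.sh & # hypothetical respawn command
    fi
fi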
One way is to run the program in a subshell, and communicate with the subshell through a named pipe with the read command. This way you can check the exit status of the process being run and communicate this back through the pipe.
Here's an example of timing out the yes command after 3 seconds. It gets the PID of the process using pgrep (which possibly only works on Linux). There is also a problem with using a pipe: a process opening a pipe for reading will hang until it is also opened for writing, and vice versa. So to prevent the read command hanging, I've "wedged" open the pipe for reading with a background subshell. (Another way to prevent the freeze is to open the pipe read-write, i.e. read -t 5 <>finished.pipe; however, that also may not work except on Linux.)
rm -f finished.pipe
mkfifo finished.pipe
{ yes >/dev/null; echo finished >finished.pipe ; } &
SUBSHELL=$!
# Get command PID
while : ; do
PID=$( pgrep -P $SUBSHELL yes )
test "$PID" = "" || break
sleep 1
done
# Open pipe for writing
{ exec 4>finished.pipe ; while : ; do sleep 1000; done ; } &
read -t 3 FINISHED <finished.pipe
if [ "$FINISHED" = finished ] ; then
echo 'Subprocess finished'
else
echo 'Subprocess timed out'
kill $PID
fi
rm finished.pipe
Here's an attempt which tries to avoid killing a process after it has already exited, which reduces the chance of killing another process with the same process ID (although it's probably impossible to avoid this kind of error completely).
run_with_timeout ()
{
    t=$1
    shift
    echo "running \"$*\" with timeout $t"
    (
        # first, run the process in the background
        (exec sh -c "$*") &
        pid=$!
        echo $pid
        # the timeout shell
        (sleep $t ; echo timeout) &
        waiter=$!
        echo $waiter
        # finally, allow the process to end naturally
        wait $pid
        echo $?
    ) \
    | (
        read pid
        read waiter
        if test $waiter != timeout ; then
            read status
        else
            status=timeout
        fi
        # if we timed out, kill the process
        if test $status = timeout ; then
            kill $pid
            exit 99
        else
            # if the program exited normally, kill the waiting shell
            kill $waiter
            exit $status
        fi
    )
}
Use like run_with_timeout 3 sleep 10000, which runs sleep 10000 but ends it after 3 seconds.
This is like other answers which use a background timeout process to kill the child process after a delay. I think this is almost the same as Dan's extended answer (https://stackoverflow.com/a/5161274/1351983), except the timeout shell will not be killed if it has already ended.
After this program has ended, there will still be a few lingering "sleep" processes running, but they should be harmless.
This may be a better solution than my other answer because it does not use the non-portable shell feature read -t and does not use pgrep.
Here's the third answer I've submitted here. This one handles signal interrupts and cleans up background processes when SIGINT is received. It uses the $BASHPID and exec trick used in the top answer to get the PID of a process (in this case $$ in a sh invocation). It uses a FIFO to communicate with a subshell that is responsible for killing and cleanup. (This is like the pipe in my second answer, but having a named pipe means that the signal handler can write into it too.)
run_with_timeout ()
{
    t=$1 ; shift
    trap cleanup 2
    F=$$.fifo ; rm -f $F ; mkfifo $F
    # first, run the main process in the background
    "$@" & pid=$!
    # sleeper process to time out
    ( sh -c "echo \$\$ >$F ; exec sleep $t" ; echo timeout >$F ) &
    read sleeper <$F
    # control shell. read from fifo.
    # final input is "finished". after that
    # we clean up. we can get a timeout or a
    # signal first.
    ( exec 0<$F
      while : ; do
          read input
          case $input in
              finished)
                  test $sleeper != 0 && kill $sleeper
                  rm -f $F
                  exit 0
                  ;;
              timeout)
                  test $pid != 0 && kill $pid
                  sleeper=0
                  ;;
              signal)
                  test $pid != 0 && kill $pid
                  ;;
          esac
      done
    ) &
    # wait for the process to end
    wait $pid
    status=$?
    echo finished >$F
    return $status
}
cleanup ()
{
    echo signal >$$.fifo
}
I've tried to avoid race conditions as far as I can. However, one source of error I couldn't remove is when the process ends near the same time as the timeout. For example, run_with_timeout 2 sleep 2 or run_with_timeout 0 sleep 0. For me, the latter gives an error:
timeout.sh: line 250: kill: (23248) - No such process
as it is trying to kill a process that has already exited by itself.
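If that message matters, a minimal mitigation is to silence stderr on those kill calls, e.g.:
kill $pid 2>/dev/null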
# Kill command after 10 seconds
timeout 10 command
# If you don't have timeout installed, this is almost the same:
sh -c '(sleep 10; kill "$$") & command'
# The same as above, with the kill error message muted if the process already exited:
sh -c '(sleep 10; kill "$$" 2>/dev/null) & command'

Checking and killing hanged background processes in a bash script

Say I have this pseudocode in bash
#!/bin/bash
things
for i in {1..3}
do
nohup someScript[i] &
done
wait
for i in {4..6}
do
nohup someScript[i] &
done
wait
otherThings
and say this someScript[i] sometimes ends up hanging.
Is there a way I can take the process IDs (with $!)
and check periodically whether a process is taking more than a specified amount of time, after which I want to kill the hanging processes with kill -9?
Unfortunately the answer from @Eugeniu did not work for me; timeout gave an error.
However, I found the following routine useful, so I'll post it here so anyone with the same problem can take advantage of it.
Create another script which goes like this
#!/bin/bash
# monitor.sh
pid=$1
counter=10
while ps -p $pid > /dev/null
do
    if [[ $counter -le 0 ]] ; then
        kill -9 $pid # if it's still there by now, kill it
    fi
    counter=$((counter-1))
    sleep 1
done
then in the main script you just put
things
for i in {1..3}
do
    nohup someScript[i] &
    ./monitor.sh $! &
done
wait
In this way, for each of your someScripts you will have a parallel process that checks every second whether it's still there (up to the maximum time set by the counter), and that quits by itself if the associated process finishes (or gets killed).
One possible approach:
#!/bin/bash
# things
mypids=()
for i in {1..3}; do
    # launch the script with timeout (3600s)
    timeout 3600 nohup someScript[i] &
    mypids[i]=$! # store the PID
done
wait "${mypids[@]}"

How do I terminate all the subshell processes?

I have a bash script to test how a server performs under load.
num=1
if [ $# -gt 0 ]; then
num=$1
fi
for i in {1 .. $num}; do
(while true; do
{ time curl --silent 'http://localhost'; } 2>&1 | grep real
done) &
done
wait
When I hit Ctrl-C, the main process exits, but the background loops keep running. How do I make them all exit? Or is there a better way of spawning a configurable number of logic loops executing in parallel?
Here's a simpler solution -- just add the following line at the top of your script:
trap "kill 0" SIGINT
Killing 0 sends the signal to all processes in the current process group.
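Applied to the script in the question, a minimal sketch might be (with the broken brace expansion swapped for an arithmetic loop):
#!/bin/bash
trap "kill 0" SIGINT

num=${1:-1}
for ((i = 0; i < num; i++)); do
    (while true; do
        { time curl --silent 'http://localhost'; } 2>&1 | grep real
    done) &
done
wait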
One way to kill subshells, but not self:
kill $(jobs -p)
Bit of a late answer, but for me solutions like kill 0 or kill $(jobs -p) go too far (kill all child processes).
If you just want to make sure one specific child-process (and its own children) are tidied up then a better solution is to kill by process group (PGID) using the sub-process' PID, like so:
set -m
./some_child_script.sh &
some_pid=$!
kill -- -${some_pid}
Firstly, the set -m command enables job management (if it isn't already enabled). This is important: otherwise all commands and sub-shells are assigned to the same process group as your parent script (unlike when you run the commands manually in a terminal), and kill will just give a "no such process" error. It needs to be called before you run the background command you wish to manage as a group (or just call it at script start if you have several).
Secondly, note that the argument to kill is negative; this indicates that you want to kill an entire process group. By default the process group ID is the same as the PID of the first command in the group, so we can get it by simply adding a minus sign in front of the PID we fetched with $!. If you need the process group ID in a more complex case, use ps -o pgid= ${some_pid}, then add the minus sign to that.
Lastly, note the explicit end-of-options marker --. This is important, as otherwise the process group argument would be treated as an option (a signal number) and kill would complain it doesn't have enough arguments. You only need this if the process group argument is the first one you wish to terminate.
Here is a simplified example of a background timeout process, and how to cleanup as much as possible:
#!/bin/bash
# Use the overkill method in case we're terminated ourselves
trap 'kill $(jobs -p | xargs)' SIGINT SIGHUP SIGTERM EXIT
# Setup a simple timeout command (an echo)
set -m
{ sleep 3600; echo "Operation took longer than an hour"; } &
timeout_pid=$!
# Run our actual operation here
do_something
# Cancel our timeout
kill -- -${timeout_pid} >/dev/null 2>&1
wait -- -${timeout_pid} >/dev/null 2>&1
printf '' 2>&1
This should cleanly handle cancelling this simplistic timeout in all reasonable cases; the only case that can't be handled is the script being terminated immediately (kill -9), as it won't get a chance to cleanup.
I've also added a wait, followed by a no-op (printf ''), this is to suppress "terminated" messages that can be caused by the kill command, it's a bit of a hack, but is reliable enough in my experience.
You need to use job control, which, unfortunately, is a bit complicated. If these are the only background jobs that you expect will be running, you can run a command like this one:
jobs \
| perl -ne 'print "$1\n" if m/^\[(\d+)\][+-]? +Running/;' \
| while read -r ; do kill %"$REPLY" ; done
jobs prints a list of all active jobs (running jobs, plus recently finished or terminated jobs), in a format like this:
[1] Running sleep 10 &
[2] Running sleep 10 &
[3] Running sleep 10 &
[4] Running sleep 10 &
[5] Running sleep 10 &
[6] Running sleep 10 &
[7] Running sleep 10 &
[8] Running sleep 10 &
[9]- Running sleep 10 &
[10]+ Running sleep 10 &
(Those are jobs that I launched by running for i in {1..10} ; do sleep 10 & done.)
perl -ne ... is me using Perl to extract the job numbers of the running jobs; you can obviously use a different tool if you prefer. You may need to modify this command if your jobs has a different output format; but the above output is from bash on Cygwin, so it's very likely identical to yours.
read -r reads a "raw" line from standard input, and saves it into the variable $REPLY. kill %"$REPLY" will be something like kill %1, which "kills" (sends an interrupt signal to) job number 1. (Not to be confused with kill 1, which would kill process number 1.) Together, while read -r ; do kill %"$REPLY" ; done goes through each job number printed by the Perl script, and kills it.
By the way, your for i in {1 .. $num} won't do what you expect, since brace expansion is handled before parameter expansion, so what you have is equivalent to for i in "{1" .. "$num}". (And you can't have whitespace inside a brace expansion, anyway.) Unfortunately, I don't know of a clean alternative; I think you have to do something like for i in $(bash -c "echo {1..$num}"), or else switch to an arithmetic for-loop or whatnot.
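For example, an arithmetic for-loop sidesteps the brace-expansion problem entirely, since $num is expanded normally:
for ((i = 1; i <= num; i++)); do
    echo "$i" # loop body here
done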
Also by the way, you don't need to wrap your while-loop in parentheses; & already causes the job to be run in a subshell.
Here's my eventual solution. I'm keeping track of the subshell process IDs using an array variable, and trapping the Ctrl-C signal to kill them.
declare -a subs # array of subshell pids

function kill_subs() {
    for pid in "${subs[@]}"; do
        kill $pid
    done
    exit 0
}

num=1
if [ $# -gt 0 ]; then
    num=$1
fi

for ((i = 0; i < num; i++)); do
    while true; do
        { time curl --silent 'http://localhost'; } 2>&1 | grep real
    done &
    subs[$i]=$! # grab the pid of the subshell
done

trap kill_subs 1 2 15

wait
While this is not an answer, I would just like to point out something that invalidates the selected one: using jobs or kill 0 might have unexpected results. In my case it killed unintended processes, which was not an option.
This has been highlighted in some of the answers, but I'm afraid not with enough stress, or it has been overlooked:
"Bit of a late answer, but for me solutions like kill 0 or kill $(jobs -p) go too far (kill all child processes)."
"If these are the only background jobs that you expect will be running, you can run a command like this one:"
