Using sleep and wait -n to implement simple timeout in bash, race condition or not? - bash

If I do this in a bash script:
sleep 10 &
sleep_pid=$!
some_command &
wait -n
cmd_pid=$!
if kill -0 $sleep_pid 2> /dev/null; then
# all ok
kill $sleep_pid
else
# some_command hung
...code to log diagnostics and then kill -9 $cmd_pid...
fi
where some_command is something that should be quick but can hang due to rare errors.
Is there then a risk that some_command can be done and cleaned up before "wait -n" starts, so there is only the sleep to wait for? Or does the '&' after one command guarantee that the shell won't call waitpid() on it until the next line of input has been handled?
It works in interactive shells. If you do:
sleep 10 &
sleep 0 &
wait -n
then the "wait -n" returns right away even if you wait a couple of seconds before running it. But I'm not sure if it can be trusted for non-interactive shells?
EDIT: Clarifying need for diagnostics + some grammar.

I believe you may be able to use the timeout command to do this.
http://man7.org/linux/man-pages/man1/timeout.1.html
timeout 10s command_to_run
You can check the exit status of the timeout command to know if it timed out.
timeout 2s sleep 10
if [[ $? -gt 0 ]]; then
echo "it timed out"
else
echo "It was successful"
fi

By using the $! variable, we avoid relying on interactive job control features. Try this:
...long executing command... &
pid_long=$!
sleep 3 &
pid_sleep=$!
wait -n
kill -KILL $pid_long
The problem here is PID recycling. Unlikely to happen in 3 seconds, though.
In the case when the command finishes earlier than the sleep (and its PID has not been recycled to a new process) kill produces an error message; we could pipe that to /dev/null.
We should probably also kill the sleep in case it is the one that is lingering.

As #CharlesDuffy pointed out in comments, the answer is no, there is no race (provided it is run in a non-interactive shell).
Also there is no need (in non-interactive shells) to make sure the wait comes directly after the command, as non-interactive shells don't do automatic reaping of children.
But I guess one should wrap this in a sub-shell, so "wait -n" won't return early due to some previously started unrelated background job.

Related

Sleep seems to run first in bash script

I need to terminate a script if it exceeds a specific duration (10 mins)
examplescript.sh &
pid=$!
sleep 600
if ['pgrep $pid']
then
kill $pid
fi
When I tested it on my test environment, it seems working well. examplescript.sh runs first and if it runs for more than 10 mins, it will be terminated. However, when I tried in our production environment, it seems that sleep runs first. It waits 600s before running the examplescript.sh. Is there something wrong in the script?
There is multiply thing you should correct in your code.
pgrep will make a regex search on process names not pids. You can use kill -0 pid to check if a process with pid is running.
[ (test) is a command[1] and should be treated as one. That means each argument should be separated by spaces. When using [ the last argument should also be ]:
[ arg1 arg2 ]
In your example you wont need [ since kill -0 will exit truly if the process is still running:
if kill -0 pid; then
And to wrap it up:
examplescript.sh &
pid=$!
sleep 600
if kill -0 "$pid" 2> /dev/null; then
kill "$pid"
fi
kill -0 will write an error to stderr if the process is not running anymore. So we redirect that to /dev/null.
[1] It's usually a build-in these days.
Another thing to note is that your script will run for 600 seconds even though examplescript.sh will only take a few seconds to run.
Are your production machines significantly faster? I do not have example script to really run this on my machine, but I think your problem might be solved if you take the code you mention above
examplescript.sh &
pid=$!
sleep 600
if ['pgrep $pid']
then
kill $pid
fi
put it in a file called, say, monitor.sh and run that file in the background. i.e.
monitor.sh &
Hope this helps.

Wait for process to finish, or user input

I have a backgrounded process that I would like to wait for (in case it fails or dies), unless I receive user input. Said another way, the user input should interrupt my waiting.
Here's a simplified snippet of my code
#!/bin/bash
...
mplayer -noconsolecontrols "$media_url" &
sleep 10 # enough time for it to fail
ps -p $!
if [ $? -ne 0 ]
then
fallback
else
read
kill $!
fi
The line that I particularly dislike is sleep 10, which is bad because it could be too much time, or not enough time.
Is there a way to wait $! || read or the equivalent?
Use kill -0 to validate that the process is still there and read with a timeout of 0 to test for user input. Something like this?
pid=$!
while kill -0 $pid; do
read -t 0 && exit
sleep 1
done
Original
ps -p to check the process. read -t 1 to wait for user input.
pid=$!
got_input=142
while ps -p $pid > /dev/null; do
if read -t 1; then
got_input=$?
kill $pid
fi
done
This allows for branching based whether the process died, or was killed due to user input.
All credit to gubblebozer. The only reason I'm posting this answer is the claim by moderators that my edits to his post constituted altering his intent.
Anti Race-Condition
First off, a race condition involving pids is (very likely) not a concern if you're fairly quick, because they're reused on a cycle.
Even so, I guess anything is possible... Here's some code that handles that possibility, without breaking your head on traps.
got_input=142
while true; do
if read -t 1; then
got_input=$?
pkill --ns $$ name > /dev/null
break
elif ! pgrep --ns $$ name > /dev/null; then
break
fi
done
Now, we've accomplished our goal, while (probably) completely eliminating the race condition.
Any loop with a sleep or similar timeout in it, will introduce a race condition. It's better to actively wait for the process to die, or, in this case, to trap the signal that's sent when a child dies.
#!/bin/bash
set -o monitor
trap stop_process SIGCHLD
stop_process()
{
echo sigchld received
exit
}
# the background process: (this simulates a process that exits after 10 seconds)
sleep 10 &
procpid=$!
echo pid of process: $procpid
echo -n hit enter:
read
# not reached when SIGCHLD is received
echo killing pid $procpid
kill $procpid
I'm not 100% sure this eliminates any race condition, but it's a lot closer than a sleep loop.
edit: the shorter, less verbose version
#!/bin/bash
set -o monitor
trap exit SIGCHLD
sleep 5 &
read -p "hit enter: "
kill $!
edit 2: setting the trap before starting the background process prevents another race condition in which the process would die before the trap was installed

Checking and killing hanged background processes in a bash script

Say I have this pseudocode in bash
#!/bin/bash
things
for i in {1..3}
do
nohup someScript[i] &
done
wait
for i in {4..6}
do
nohup someScript[i] &
done
wait
otherThings
and say this someScript[i] sometimes end up hanging.
Is there a way I can take the process IDs (with $!)
and check periodically if the process is taking more than a specified amount of time after which I want to kill the hanged processes with kill -9 ?
Unfortunately the answer from #Eugeniu did not work for me, timeout gave an error.
However I found useful doing this routine, I'll post it here so anyone can take advantage of it if in my same problem.
Create another script which goes like this
#!/bin/bash
#monitor.sh
pid=$1
counter=10
while ps -p $pid > /dev/null
do
if [[ $counter -eq 0 ]] ; then
kill -9 $pid
#if it's still there then kill it
fi
counter=$((counter-1))
sleep 1
done
then in the main work you just put
things
for i in {1..3}
do
nohup someScript[i] &
./monitor.sh $! &
done
wait
In this way for any of your someScript you will have a parallel process that checks if it's still there every chosen interval (until maximum time decided by the counter) and that actually quit itself if the associated process dies (or gets killed)
One possible approach:
#!/bin/bash
# things
mypids=()
for i in {1..3}; do
# launch the script with timeout (3600s)
timeout 3600 nohup someScript[i] &
mypids[i]=$! # store the PID
done
wait "${mypids[#]}"

How do I terminate all the subshell processes?

I have a bash script to test how a server performs under load.
num=1
if [ $# -gt 0 ]; then
num=$1
fi
for i in {1 .. $num}; do
(while true; do
{ time curl --silent 'http://localhost'; } 2>&1 | grep real
done) &
done
wait
When I hit Ctrl-C, the main process exits, but the background loops keep running. How do I make them all exit? Or is there a better way of spawning a configurable number of logic loops executing in parallel?
Here's a simpler solution -- just add the following line at the top of your script:
trap "kill 0" SIGINT
Killing 0 sends the signal to all processes in the current process group.
One way to kill subshells, but not self:
kill $(jobs -p)
Bit of a late answer, but for me solutions like kill 0 or kill $(jobs -p) go too far (kill all child processes).
If you just want to make sure one specific child-process (and its own children) are tidied up then a better solution is to kill by process group (PGID) using the sub-process' PID, like so:
set -m
./some_child_script.sh &
some_pid=$!
kill -- -${some_pid}
Firstly, the set -m command will enable job management (if it isn't already), this is important, as otherwise all commands, sub-shells etc. will be assigned to the same process group as your parent script (unlike when you run the commands manually in a terminal), and kill will just give a "no such process" error. This needs to be called before you run the background command you wish to manage as a group (or just call it at script start if you have several).
Secondly, note that the argument to kill is negative, this indicates that you want to kill an entire process group. By default the process group ID is the same as the first command in the group, so we can get it by simply adding a minus sign in front of the PID we fetched with $!. If you need to get the process group ID in a more complex case, you will need to use ps -o pgid= ${some_pid}, then add the minus sign to that.
Lastly, note the use of the explicit end of options --, this is important, as otherwise the process group argument will be treated as an option (signal number), and kill will complain it doesn't have enough arguments. You only need this if the process group argument is the first one you wish to terminate.
Here is a simplified example of a background timeout process, and how to cleanup as much as possible:
#!/bin/bash
# Use the overkill method in case we're terminated ourselves
trap 'kill $(jobs -p | xargs)' SIGINT SIGHUP SIGTERM EXIT
# Setup a simple timeout command (an echo)
set -m
{ sleep 3600; echo "Operation took longer than an hour"; } &
timeout_pid=$!
# Run our actual operation here
do_something
# Cancel our timeout
kill -- -${timeout_pid} >/dev/null 2>&1
wait -- -${timeout_pid} >/dev/null 2>&1
printf '' 2>&1
This should cleanly handle cancelling this simplistic timeout in all reasonable cases; the only case that can't be handled is the script being terminated immediately (kill -9), as it won't get a chance to cleanup.
I've also added a wait, followed by a no-op (printf ''), this is to suppress "terminated" messages that can be caused by the kill command, it's a bit of a hack, but is reliable enough in my experience.
You need to use job control, which, unfortunately, is a bit complicated. If these are the only background jobs that you expect will be running, you can run a command like this one:
jobs \
| perl -ne 'print "$1\n" if m/^\[(\d+)\][+-]? +Running/;' \
| while read -r ; do kill %"$REPLY" ; done
jobs prints a list of all active jobs (running jobs, plus recently finished or terminated jobs), in a format like this:
[1] Running sleep 10 &
[2] Running sleep 10 &
[3] Running sleep 10 &
[4] Running sleep 10 &
[5] Running sleep 10 &
[6] Running sleep 10 &
[7] Running sleep 10 &
[8] Running sleep 10 &
[9]- Running sleep 10 &
[10]+ Running sleep 10 &
(Those are jobs that I launched by running for i in {1..10} ; do sleep 10 & done.)
perl -ne ... is me using Perl to extract the job numbers of the running jobs; you can obviously use a different tool if you prefer. You may need to modify this script if your jobs has a different output format; but the above output is also on Cygwin, so it's very likely identical to yours.
read -r reads a "raw" line from standard input, and saves it into the variable $REPLY. kill %"$REPLY" will be something like kill %1, which "kills" (sends an interrupt signal to) job number 1. (Not to be confused with kill 1, which would kill process number 1.) Together, while read -r ; do kill %"$REPLY" ; done goes through each job number printed by the Perl script, and kills it.
By the way, your for i in {1 .. $num} won't do what you expect, since brace expansion is handled before parameter expansion, so what you have is equivalent to for i in "{1" .. "$num}". (And you can't have white-space inside the brace expansion, anyway.) Unfortunately, I don't know of a clean alternative; I think you have to do something like for i in $(bash -c "{1..$num}"), or else switch to an arithmetic for-loop or whatnot.
Also by the way, you don't need to wrap your while-loop in parentheses; & already causes the job to be run in a subshell.
Here's my eventual solution. I'm keeping track of the subshell process IDs using an array variable, and trapping the Ctrl-C signal to kill them.
declare -a subs #array of subshell pids
function kill_subs() {
for pid in ${subs[#]}; do
kill $pid
done
exit 0
}
num=1 if [ $# -gt 0 ]; then
num=$1 fi
for ((i=0;i < $num; i++)); do
while true; do
{ time curl --silent 'http://localhost'; } 2>&1 | grep real
done &
subs[$i]=$! #grab the pid of the subshell
done
trap kill_subs 1 2 15
wait
While these is not an answer, I just would like to point out something which invalidates the selected one; using jobs or kill 0 might have unexpected results; in my case it killed unintended processes which in my case is not an option.
It has been highlighted somehow in some of the answers but I am afraid not with enough stress or it has been not considered:
"Bit of a late answer, but for me solutions like kill 0 or kill $(jobs -p) go too far (kill all child processes)."
"If these are the only background jobs that you expect will be running, you can run a command like this one:"

Kill process after a given time bash?

I have a script that tries to make a DB connection using another program and the timeout(2.5min) of the program is to long. I want to add this functionality to the script.
If it takes longer then 5 seconds to connect, kill the process
Else kill the sleep/kill process.
The issue I'm having is how bash reports when a process is killed, that's because the processes are in the same shell just the background. Is there a better way to do this or how can I silence the shell for the kill commands?
DB_CONNECTION_PROGRAM > $CONNECTFILE &
pid=$!
(sleep 5; kill $pid) &
sleep_pid=$!
wait $pid
# If the DB failed to connect after 5 seconds and was killed
status=$? #Kill returns 128+n (fatal error)
if [ $status -gt 128 ]; then
no_connection="ERROR: Timeout while trying to connect to $dbserver"
else # If it connected kill the sleep and any errors collect
kill $sleep_pid
no_connection=`sed -n '/^ERROR:/,$p' $CONNECTFILE`
fi
There's a GNU coreutils utility called timeout: http://www.gnu.org/s/coreutils/manual/html_node/timeout-invocation.html
If you have it on your platform, you could do:
timeout 5 CONNECT_TO_DB
if [ $? -eq 124 ]; then
# Timeout occurred
else
# No hang
fi
I don't know if it's identical but I did fix a similar issue a few years ago. However I'm a programmer, not a Unix-like sysadmin so take the following with a grain of salt because my Bash-fu may not be that strong...
Basically I did fork, fork and fork : )
Out of memory After founding back my old code (which I amazingly still use daily) because my memory wasn't good enough, in Bash it worked a bit like this:
commandThatMayHang.sh 2 > /dev/null 2>&1 & # notice that last '&', we're forking
MAYBE_HUNG_PID=$!
sleepAndMaybeKill.sh $MAYBE_HUNG_PID 2 > /dev/null 2>&1 & # we're forking again
SLEEP_AND_MAYBE_KILL_PID=$!
wait $MAYBE_HUNG_PID > /dev/null 2>&1
if [ $? -eq 0 ]
# commandThatMayHand.sh did not hang, fine, no need to monitor it anymore
kill -9 $SLEEP_AND_MAYBE_KILL 2> /dev/null 2>&1
fi
where sleepAndMaybeKill.sh sleeps the amount of time you want and then kills commandThatMayHand.sh.
So basically the two scenario are:
your command exits fine (before your 5 seconds timeout or whatever) and so the wait stop as soon as your command exits fine (and kills the "killer" because it's not needed anymore
the command locks up, the killer ends up killing the command
In any case you're guaranteed to either succeed as soon as the command is done or to fail after the timeout.
You can set a timeout after 2 hours and restart your javaScriptThatStalls 100 times this way in a loop
seq 100|xargs -II timeout $((2 * 60 * 60)) javaScriptThatStalls
Do you mean you don't want the error message printed if the process isn't still running? Then you could just redirect stderr: kill $pid 2>/dev/null.
You could also check whether the process is still running:
if ps -p $pid >/dev/null; then kill $pid; fi
I found this bash script
timeout.sh
by Anthony Thyssen (his web). Looks good.

Resources