Why do sleep & wait in bash?

I'm having trouble understanding the startup commands for the services in this docker-compose.yml. The two relevant lines from the .yml are:
command: "/bin/sh -c 'while :; do sleep 6h & wait $${!}; nginx -s reload; done & nginx -g \"daemon off;\"'"
and
entrypoint: "/bin/sh -c 'trap exit TERM; while :; do certbot renew; sleep 12h & wait $${!}; done;'"
Why send the sleep command to the background and then wait on it? Why not just do sleep 6h directly? Also, is the double dollar sign just escaping the dollar sign in ${!}?
I'm finding other places where sleep and wait are used in conjunction, but none seem to have any explanation of why:
http://www.masteringunixshell.net/qa17/bash-how-to-wait-seconds.html
https://stackoverflow.com/a/13301329/828584
https://superuser.com/a/753984/98583

It makes sense to sleep in background and then wait, when one wants to handle signals in a timely manner.
When bash is executing an external command in the foreground, it does
not handle any signals received until the foreground process
terminates
(detailed explanation here).
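A rough way to see the difference, assuming bash and one-second timestamp resolution. The helper name measure is made up for this sketch: it starts a child shell with a TERM trap, signals it after one second, and reports how long the child took to exit.

```shell
#!/usr/bin/env bash
# measure BODY: run BODY in a child bash that traps TERM, send TERM after
# one second, and print how many seconds the child needed to exit.
measure() {
    start=$(date +%s)
    bash -c "trap 'exit 0' TERM; $1" >/dev/null &
    pid=$!
    sleep 1
    kill -TERM "$pid"
    wait "$pid"
    echo $(( $(date +%s) - start ))
}

fg_secs=$(measure 'sleep 4')            # foreground sleep: trap runs only after sleep ends
bg_secs=$(measure 'sleep 4 & wait $!')  # background sleep + wait: trap runs right away
echo "foreground: ${fg_secs}s, background + wait: ${bg_secs}s"
```

The foreground variant should need about as long as the sleep itself, while the background variant should return roughly when the signal is sent. In the second case the orphaned sleep keeps running until it finishes on its own.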
While the second example implements a signal handler, for the first one it makes no difference whether the sleep is executed in foreground or not. There is no trap and the signal is not propagated to the nginx process.
To make it respond to the SIGTERM signal, the entrypoint should be something like this:
/bin/sh -c 'nginx -g \"daemon off;\" & trap exit TERM; while :; do sleep 6h & wait $${!}; nginx -s reload; done'
To test it:
docker run --name test --rm --entrypoint="/bin/sh" nginx -c 'nginx -g "daemon off;" & trap exit TERM; while :; do sleep 20 & wait ${!}; echo running; done'
Stop the container
docker stop test
or send the TERM signal (docker stop sends a TERM followed by KILL if the main process does not exit)
docker kill --signal=SIGTERM test
By doing this, the script exits immediately. If we remove the wait ${!}, the trap is executed only when sleep ends. All of this applies to the second example too.
Note: in both cases the intention is to check certificate renewal every 12h and reload the configuration every 6h as mentioned in the guide
The two commands do that just fine. IMHO the additional wait in the first example is just an oversight of the developers.
EDITED:
It seems the rationalization above, which was meant to give possible reasons behind the background sleep, might create some confusion.
(There is a related post Why use nginx with “daemon off” in background with docker?).
While the command suggested in the answer above is an improvement over the one in the question it is still flawed because, as mentioned in the linked post, the nginx server should be the main process and not a child. That can be easily achieved using the exec system call. The script becomes:
'while :; do sleep 6h; nginx -s reload; done & exec nginx -g "daemon off;"'
(More info in section Configure app as PID 1 in Docker best practices)
This, IMHO, is far better because not only is nginx monitored but it also handles signals. A configuration reload (nginx -s reload), for example, can also be done manually by simply sending the HUP signal to the docker container (see Controlling nginx).
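To illustrate what exec changes, here is a small sketch that uses plain sh instead of nginx (an assumption made so it runs anywhere). Each line prints the PID of the outer shell followed by the PID of the inner command.

```shell
# Without exec, the inner command runs as a child with a new PID.
# (The trailing ":" keeps the shell from exec-ing its last command on its own.)
without_exec=$(sh -c 'echo "$$"; sh -c "echo \$\$"; :')
# With exec, the inner command replaces the shell and keeps its PID.
with_exec=$(sh -c 'echo "$$"; exec sh -c "echo \$\$"')
echo "without exec:" $without_exec
echo "with exec:   " $with_exec
```

With exec the two PIDs match, which is why the exec'ed nginx ends up as the container's main process and receives the signals sent to it.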

The only reason I see:
If you killall -INT sleep, this won't affect main script.
Try this:
while true ;do sleep 12; echo yes;done
Then send an interrupt signal:
killall -INT sleep
This will break the job!
Try now
while true ;do sleep 12 & wait $! ; echo yes;done
Then again:
killall -INT sleep
Job won't break!
Sample output, hitting killall -INT sleep from another window:
user@myhost:~$ while true ;do sleep 12; echo yes;done
break
user@myhost:~$ while true ;do sleep 12 & wait $! ; echo yes;done
[1] 30632
[1]+ Interrupt sleep 12
yes
[1] 30636
[1]+ Interrupt sleep 12
yes
[1] 30638
[1]+ Interrupt sleep 12
yes
[1] 30640

Related

Using sleep and wait -n to implement simple timeout in bash, race condition or not?

If I do this in a bash script:
sleep 10 &
sleep_pid=$!
some_command &
wait -n
cmd_pid=$!
if kill -0 $sleep_pid 2> /dev/null; then
# all ok
kill $sleep_pid
else
# some_command hung
...code to log diagnostics and then kill -9 $cmd_pid...
fi
where some_command is something that should be quick but can hang due to rare errors.
Is there then a risk that some_command can be done and cleaned up before "wait -n" starts, so there is only the sleep to wait for? Or does the '&' after one command guarantee that the shell won't call waitpid() on it until the next line of input has been handled?
It works in interactive shells. If you do:
sleep 10 &
sleep 0 &
wait -n
then the "wait -n" returns right away even if you wait a couple of seconds before running it. But I'm not sure if it can be trusted for non-interactive shells?
EDIT: Clarifying need for diagnostics + some grammar.
I believe you may be able to use the timeout command to do this.
http://man7.org/linux/man-pages/man1/timeout.1.html
timeout 10s command_to_run
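You can check the exit status of the timeout command to know if it timed out: GNU timeout exits with status 124 when the time limit is hit.
timeout 2s sleep 10
status=$?
if [[ $status -eq 124 ]]; then
    echo "it timed out"
elif [[ $status -ne 0 ]]; then
    echo "it failed with status $status"
else
    echo "It was successful"
fi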
By using the $! variable, we avoid relying on interactive job control features. Try this:
...long executing command... &
pid_long=$!
sleep 3 &
pid_sleep=$!
wait -n
kill -KILL $pid_long
The problem here is PID recycling. Unlikely to happen in 3 seconds, though.
In the case when the command finishes earlier than the sleep (and its PID has not been recycled to a new process) kill produces an error message; we could redirect that to /dev/null.
We should probably also kill the sleep in case it is the one that is lingering.
As @CharlesDuffy pointed out in comments, the answer is no, there is no race (provided it is run in a non-interactive shell).
Also there is no need (in non-interactive shells) to make sure the wait comes directly after the command, as non-interactive shells don't do automatic reaping of children.
But I guess one should wrap this in a sub-shell, so "wait -n" won't return early due to some previously started unrelated background job.
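A minimal sketch of that sub-shell idea, assuming bash 4.3 or later (for wait -n). The helper name with_timeout and the exit status 124 for a timeout are choices made for this example, not something from the question.

```shell
#!/usr/bin/env bash
# with_timeout SECS CMD...: hypothetical helper. The function body is a
# subshell, so "wait -n" only sees the two children started inside it.
with_timeout() (
    secs=$1; shift
    "$@" &
    cmd_pid=$!
    sleep "$secs" &
    sleep_pid=$!
    wait -n                      # returns when the first of the two exits
    status=$?
    if kill -0 "$sleep_pid" 2>/dev/null; then
        kill "$sleep_pid"        # command finished first: stop the timer
        exit "$status"
    fi
    kill -KILL "$cmd_pid" 2>/dev/null   # timer expired first
    exit 124                     # same convention as timeout(1)
)

sleep 30 &                       # unrelated job in the caller, invisible to the helper
unrelated=$!
with_timeout 5 sh -c 'sleep 0.3; exit 3'; quick=$?
with_timeout 1 sleep 10;                  slow=$?
kill "$unrelated"
echo "quick=$quick slow=$slow"
```

The PID recycling caveat mentioned above still applies to the kill calls; the window is small but not zero.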

shell script - how to stop "watch" command in the shell script [duplicate]

I have a bash script that launches a child process that crashes (actually, hangs) from time to time and with no apparent reason (closed source, so there isn't much I can do about it). As a result, I would like to be able to launch this process for a given amount of time, and kill it if it did not return successfully after a given amount of time.
Is there a simple and robust way to achieve that using bash?
P.S.: tell me if this question is better suited to serverfault or superuser.
(As seen in:
BASH FAQ entry #68: "How do I run a command, and have it abort (timeout) after N seconds?")
If you don't mind downloading something, use timeout. Most systems already have it installed; otherwise install it with sudo apt-get install coreutils (or sudo apt-get install timeout). Use it like this:
timeout 10 ping www.goooooogle.com
If you don't want to download something, do what timeout does internally:
( cmdpid=$BASHPID; (sleep 10; kill $cmdpid) & exec ping www.goooooogle.com )
If you want to apply a timeout to a longer piece of bash code, use the second option like this:
( cmdpid=$BASHPID;
(sleep 10; kill $cmdpid) \
& while ! ping -w 1 www.goooooogle.com
do
echo crap;
done )
# Spawn a child process:
(dosmth) & pid=$!
# in the background, sleep for 10 secs then kill that process
(sleep 10 && kill -9 $pid) &
or to get the exit codes as well:
# Spawn a child process:
(dosmth) & pid=$!
# in the background, sleep for 10 secs then kill that process
(sleep 10 && kill -9 $pid) & waiter=$!
# wait on our worker process and record its exit code
# (wait must run in this shell; inside $(...) it could not see the child)
wait $pid
exitcode=$?
# kill the waiter subshell, if it still runs
kill -9 $waiter 2>/dev/null
# 0 if we killed the waiter, cause that means the process finished before the waiter
finished_gracefully=$?
sleep 999&
t=$!
sleep 10
kill $t
I also had this question and found two more things very useful:
The SECONDS variable in bash.
The command "pgrep".
So I use something like this on the command line (OSX 10.9):
ping www.goooooogle.com & PING_PID=$(pgrep 'ping'); SECONDS=0; while pgrep -q 'ping'; do sleep 0.2; if [ $SECONDS = 10 ]; then kill $PING_PID; fi; done
As this is a loop I included a "sleep 0.2" to keep the CPU cool. ;-)
(BTW: ping is a bad example anyway, you just would use the built-in "-t" (timeout) option.)
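A variation on the same polling idea, sketched under the assumption that the command is started from the same bash shell, so $! can replace pgrep. The sleep 30 stands in for the real command, and limit is a made-up variable name.

```shell
#!/usr/bin/env bash
sleep 30 &                 # hypothetical stand-in for the command that may hang
pid=$!
limit=2
SECONDS=0                  # bash increments this once per second
while kill -0 "$pid" 2>/dev/null; do
    if [ "$SECONDS" -ge "$limit" ]; then
        kill "$pid"
        break
    fi
    sleep 0.2
done
wait "$pid"                # reap the child and collect its status
status=$?
echo "status=$status after ${SECONDS}s"
```

Comparing with -ge instead of = means the limit still triggers if a slow iteration skips over the exact second. A status of 143 (128 + 15) indicates the command was ended by the TERM signal.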
Assuming you have (or can easily make) a pid file for tracking the child's pid, you could then create a script that checks the modtime of the pid file and kills/respawns the process as needed. Then just put the script in crontab to run at approximately the period you need.
Let me know if you need more details. If that doesn't sound like it'd suit your needs, what about upstart?
One way is to run the program in a subshell, and communicate with the subshell through a named pipe with the read command. This way you can check the exit status of the process being run and communicate this back through the pipe.
Here's an example of timing out the yes command after 3 seconds. It gets the PID of the process using pgrep (possibly only works on Linux). There is also a complication with using a pipe: a process opening a pipe for read will hang until it is also opened for write, and vice versa. So to prevent the read command hanging, I've "wedged" the pipe open with a background subshell that holds it open for writing. (Another way to prevent a freeze is to open the pipe read-write, i.e. read -t 5 <>finished.pipe - however, that also may not work except with Linux.)
rm -f finished.pipe
mkfifo finished.pipe
{ yes >/dev/null; echo finished >finished.pipe ; } &
SUBSHELL=$!
# Get command PID
while : ; do
PID=$( pgrep -P $SUBSHELL yes )
test "$PID" = "" || break
sleep 1
done
# Open pipe for writing
{ exec 4>finished.pipe ; while : ; do sleep 1000; done } &
read -t 3 FINISHED <finished.pipe
if [ "$FINISHED" = finished ] ; then
echo 'Subprocess finished'
else
echo 'Subprocess timed out'
kill $PID
fi
rm finished.pipe
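For comparison, a sketch of the read-write variant mentioned above. It assumes bash (for read -t) and a system where opening a FIFO with <> does not block, which holds on Linux but is not guaranteed elsewhere. The temporary directory and the five-second job are made up for the example.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
fifo=$dir/finished.pipe
mkfifo "$fifo"

# Hypothetical slow job: reports completion through the FIFO after 5 seconds.
{ sleep 5; echo finished >"$fifo"; } &
job=$!

# Opening read-write (<>) does not block, so read -t can enforce the limit.
if read -t 1 result <>"$fifo"; then
    outcome="finished: $result"
else
    outcome="timed out"
    kill "$job" 2>/dev/null
fi
rm -rf "$dir"
echo "$outcome"
```

Here no extra subshell is needed to hold the pipe open, at the cost of relying on behavior that POSIX leaves unspecified.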
Here's an attempt which tries to avoid killing a process after it has already exited, which reduces the chance of killing another process with the same process ID (although it's probably impossible to avoid this kind of error completely).
run_with_timeout ()
{
    t=$1
    shift
    echo "running \"$*\" with timeout $t"
    (
        # first, run process in background
        (exec sh -c "$*") &
        pid=$!
        echo $pid
        # the timeout shell
        (sleep $t ; echo timeout) &
        waiter=$!
        echo $waiter
        # finally, allow process to end naturally
        wait $pid
        echo $?
    ) \
    | (read pid
       read waiter
       if test $waiter != timeout ; then
           read status
       else
           status=timeout
       fi
       # if we timed out, kill the process
       if test $status = timeout ; then
           kill $pid
           exit 99
       else
           # if the program exited normally, kill the waiting shell
           kill $waiter
           exit $status
       fi
    )
}
Use like run_with_timeout 3 sleep 10000, which runs sleep 10000 but ends it after 3 seconds.
This is like other answers which use a background timeout process to kill the child process after a delay. I think this is almost the same as Dan's extended answer (https://stackoverflow.com/a/5161274/1351983), except the timeout shell will not be killed if it has already ended.
After this program has ended, there will still be a few lingering "sleep" processes running, but they should be harmless.
This may be a better solution than my other answer because it does not use the non-portable shell feature read -t and does not use pgrep.
Here's the third answer I've submitted here. This one handles signal interrupts and cleans up background processes when SIGINT is received. It uses the $BASHPID and exec trick used in the top answer to get the PID of a process (in this case $$ in a sh invocation). It uses a FIFO to communicate with a subshell that is responsible for killing and cleanup. (This is like the pipe in my second answer, but having a named pipe means that the signal handler can write into it too.)
run_with_timeout ()
{
    t=$1 ; shift
    trap cleanup 2
    F=$$.fifo ; rm -f $F ; mkfifo $F
    # first, run main process in background
    "$@" & pid=$!
    # sleeper process to time out
    ( sh -c "echo \$\$ >$F ; exec sleep $t" ; echo timeout >$F ) &
    read sleeper <$F
    # control shell. read from fifo.
    # final input is "finished". after that
    # we clean up. we can get a timeout or a
    # signal first.
    ( exec 0<$F
      while : ; do
          read input
          case $input in
              finished)
                  test $sleeper != 0 && kill $sleeper
                  rm -f $F
                  exit 0
                  ;;
              timeout)
                  test $pid != 0 && kill $pid
                  sleeper=0
                  ;;
              signal)
                  test $pid != 0 && kill $pid
                  ;;
          esac
      done
    ) &
    # wait for process to end
    wait $pid
    status=$?
    echo finished >$F
    return $status
}
cleanup ()
{
    echo signal >$$.fifo
}
I've tried to avoid race conditions as far as I can. However, one source of error I couldn't remove is when the process ends near the same time as the timeout. For example, run_with_timeout 2 sleep 2 or run_with_timeout 0 sleep 0. For me, the latter gives an error:
timeout.sh: line 250: kill: (23248) - No such process
as it is trying to kill a process that has already exited by itself.
#Kill command after 10 seconds
timeout 10 command
#If you don't have timeout installed, this is almost the same:
sh -c '(sleep 10; kill "$$") & command'
#The same as above, with muted duplicate messages:
sh -c '(sleep 10; kill "$$" 2>/dev/null) & command'
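One caveat with the sh -c variants: $$ is the PID of the sh process, so if the shell runs the command as a child, the kill ends the shell and the command may keep running. A sketch that sidesteps this by exec-ing the command, as the $BASHPID answer above does (sleep 5 is a hypothetical stand-in for the command):

```shell
# The timer subshell signals PID $$; after exec, that PID belongs to the command itself.
sh -c '(sleep 1; kill "$$" 2>/dev/null) & exec sleep 5'
status=$?
echo "exit status: $status"
```

A status of 143 (128 + 15) means the command was ended by the TERM signal. If the command finishes first, the timer subshell lingers until its sleep ends and then signals a PID that may no longer exist, or in rare cases has been reused.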

inconsistent signal behavior? Only works for the first signal?

Trying to have a script that is able to restart itself with exec (so it can pick up any "upgrade") given a specific signal (tried SIGHUP & SIGUSR1).
This seems to work the first time, but not the second, even though the trap registration does recur in the exec'ed instance (which still has the same PID).
#!/usr/bin/env bash
set -x
readonly PROGNAME="${0}"
function run_prog()
{
echo hi
sleep 2
echo ho
sleep 1000 &
wait $!
}
restart()
{
sleep 5
exec "${PROGNAME}"
}
trap restart USR1
echo -e "TRAPS:"
trap
echo
run_prog
This is how I run it:
./tst.sh & TSTPID=$! # Starts ok, see both "hi" & "ho" messages
sleep 10
kill -USR1 ${TSTPID} # Restarts ok, see both "hi" & "ho" messages
sleep 10
kill -USR1 ${TSTPID} # NOTHING HAPPENS
sleep 5
kill ${TSTPID}
Any idea why the second signal is ignored? (some code, like de-registering the trap in the cleanup may just be paranoia)
Maybe because you're execing from a signal handler, the signal code is continuing to run and continuing into oblivion, due to the exec, or preventing other cleanup code or daisy-chained handlers from executing.
Who knows what's going on in the blackbox of the OS signal handling code and bash's own layering over it that might be circumvented by exec. exec is a very draconian measure :-)
Also check out this cool bash site. I'm looking for the bash source code that handles signals. Just curious.
Your solution here is the right approach:
#!/usr/bin/env bash
set -x
readonly PROGNAME="${0}"
DO_RESTART=
function run_prog()
{
echo hi
sleep 2
echo ho
sleep 1000 &
SLEEPPID=$!
#builtin
wait ${SLEEPPID}
}
trap DO_RESTART=1 SIGUSR1
echo -e "TRAPS:"
trap -p
echo
run_prog
if [ -n "${DO_RESTART}" ]; then
sleep 5
kill ${SLEEPPID}
exec "${PROGNAME}"
fi
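The flag pattern in isolation, as a sketch that signals itself so it can run unattended. The variable names and the one-second self-signal are made up for the example, and the actual exec is left out.

```shell
#!/usr/bin/env bash
self=$BASHPID               # PID of this shell, captured before forking helpers
got_signal=
trap 'got_signal=1' USR1    # the handler only records that the signal arrived

sleep 5 &
sleep_pid=$!
( sleep 1; kill -USR1 "$self" ) &   # stand-in for an external "kill -USR1"

wait "$sleep_pid"           # interrupted by the trapped signal
wait_status=$?              # greater than 128 when a signal cut the wait short
kill "$sleep_pid" 2>/dev/null

if [ -n "$got_signal" ]; then
    echo "signal seen (wait returned $wait_status); the restart would go here"
fi
```

Because the restart runs from the main flow of the script and not from inside the handler, the exec'ed copy does not start while a signal handler is still active.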

Why does my bash script take so long to respond to kill when it runs in the background?

(Question revised, now that I understand more about what's actually happening):
I have a script that runs in the background, periodically doing some work and then sleeping for 30 seconds:
echo "background script PID: $$"
trap 'echo "Exiting..."' INT EXIT
while true; do
# check for stuff to do, do it
sleep 30
done &
If I try to kill this script via kill or kill -INT, it takes 30 seconds to respond to the signal.
I will answer this question below, since I found a good explanation online.
(My original, embarrassingly un-researched question)
This question is for a bash script that includes the following trap:
trap 'echo "Exiting...">&2; kill $childPID 2>/dev/null; exit 0' \
SIGALRM SIGHUP SIGINT SIGKILL SIGPIPE SIGPROF SIGTERM \
SIGUSR1 SIGUSR2 SIGVTALRM SIGSTKFLT
If I run the script in the foreground, and hit
CTRL-C, it gets the signal immediately and exits
(under one sec).
If I run the same script in the background (&), and kill it via
kill or kill -INT, it takes 30 seconds before getting the signal.
Why is that, and how can I fix it?
As explained in http://mywiki.wooledge.org/SignalTrap --
"When bash is executing an external command in the foreground, it does not handle any signals received until the foreground process terminates" - and since sleep is an external command, bash does not even see the signal until sleep finishes.
That page has a very good overview of signal processing in bash, and work-arounds to this issue. Briefly, one correct way of handling the situation is to send the signal to the process group instead of just the parent process:
kill -INT -123 # will kill the process group with the ID 123
Head over to the referenced page for a full explanation (no sense in my reproducing any more of it here).
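A sketch of the process-group approach, assuming bash and that set -m works in your environment (it normally does, even without a terminal). With job control enabled, a background job gets its own process group whose ID equals the PID in $!. The example uses TERM, since a background job started without job control ignores INT.

```shell
#!/usr/bin/env bash
set -m                       # job control: each background job gets its own process group
start=$(date +%s)
bash -c 'trap "exit 0" TERM; sleep 30' >/dev/null &
pgid=$!                      # with set -m, the job leader's PID is also the group ID
sleep 1
kill -TERM -- "-$pgid"       # negative PID: signal every process in the group
wait "$pgid"
elapsed=$(( $(date +%s) - start ))
echo "exited after ${elapsed}s"
```

Both the child shell and its foreground sleep receive the signal, so the sleep ends at once and the pending trap runs without waiting out the 30 seconds.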
Possible reason: signals issued while a process is sleeping are not delivered until wake-up of the process. When started via the command line, the process doesn't sleep, so the signal gets delivered immediately.
@RashaMatt, I was unable to get the read command to work as advertised on Greg's wiki. Sending a signal to the script simply did not interrupt the read. I needed to do this:
#!/bin/bash
bail() {
echo "exiting"
kill $readpid
rm -rf $TMPDIR
exit 0
}
sig2() {
echo "doing stuff"
}
echo Shell $$ started.
trap sig2 SIGUSR2
trap bail SIGUSR1 SIGHUP SIGINT SIGQUIT SIGTERM
trap -p
TMPDIR=$(mktemp -p /tmp -d .daemonXXXXXXX)
chmod 700 $TMPDIR
mkfifo $TMPDIR/fifo
chmod 400 $TMPDIR/fifo
while : ; do
read < $TMPDIR/fifo & readpid=$!
wait $readpid
done
...send the desired signal to the shell's pid displayed from the Shell $$ started line, and watch the excitement.
Waiting on a sleep is simpler, true, but some OSes don't have sleep infinity, and I wanted to see how Greg's read example would work (which it didn't).

Letting other users stop/restart simple bash daemons – use signals or what?

I have a web server where I run some slow-starting programs as daemons. These sometimes need quick restarting (or stopping) when I recompile them or switch to another installation of them.
Inspired by http://mywiki.wooledge.org/ProcessManagement, I'm writing a script
called daemonise.sh that looks like
#!/bin/sh
while :; do
./myprogram lotsadata.xml
echo "Restarting server..." 1>&2
done
to keep a "daemon" running. Since I sometimes need to stop it, or just
restart it, I run that script in a screen session, like:
$ ./daemonise.sh & DPID=$!
$ screen -d
Then perhaps I recompile myprogram, install it to a new path, start
the new one up and want to kill the old one:
$ screen -r
$ kill $DPID
$ screen -d
This works fine when I'm the only maintainer, but now I want to let
someone else stop/restart the program, no matter who started it. And
to make things more complicated, the daemonise.sh script in fact
starts about 16 programs, making it a hassle to kill every single one
if you don't know their PIDs.
What would be the "best practices" way of letting another user
stop/restart the daemons?
I thought about shared screen sessions, but that just sounds hacky and
insecure. The best solution I've come up with for now is to wrap
starting and killing in a script that catches certain signals:
#!/bin/bash
DPID=
trap './daemonise.sh & DPID=$!' USR1
trap 'kill $DPID' USR2 EXIT
# Ensure trapper wrapper doesn't exit:
while :; do
sleep 10000 & wait $!
done
Now, should another user need to stop the daemons and I can't do it,
she just has to know the pid of the wrapper, and e.g. sudo kill -s
USR2 $wrapperpid. (Also, this makes it possible to run the daemons
on reboots, and still kill them cleanly.)
Is there a better solution? Are there obvious problems with this
solution that I'm not seeing?
(After reading Greg's Bash Wiki, I'd like to avoid any solution involving pgrep or PID-files …)
I recommend a PID-based init script. Anyone with sudo privileges for the script will be able to start and stop the server processes.
On improving your approach: wouldn't it be advisable to make sure that your sleep command in sleep 10000 & wait $! gets properly terminated if your pidwrapper script exits somehow?
Otherwise there would remain a dangling sleep process in the process table for quite some time.
Similarly, wouldn't it be cleaner to terminate myprogram in daemonise.sh properly on restart (i. e. if daemonise.sh receives a TERM signal)?
In addition, it is possible to suppress job notification messages and test for pid existence before killing.
#!/bin/sh
# cat daemonise.sh
# cf. "How to suppress Terminated message after killing in bash?",
# http://stackoverflow.com/q/81520
trap '
echo "server shut down..." 1>&2
kill $spid1 $spid2 $spid3 &&
wait $spid1 $spid2 $spid3 2>/dev/null
exit
' TERM
while :; do
echo "Starting server..." 1>&2
#./myprogram lotsadata.xml
sleep 100 &
spid1=${!}
sleep 100 &
spid2=${!}
sleep 100 &
spid3=${!}
wait
echo "Restarting server..." 1>&2
done
#------------------------------------------------------------
#!/bin/bash
# cat pidwrapper
DPID=
trap '
kill -0 ${!} 2>/dev/null && kill ${!} && wait ${!} 2>/dev/null
./daemonise.sh & DPID=${!}
' USR1
trap '
kill -0 ${!} 2>/dev/null && kill ${!} && wait ${!} 2>/dev/null
kill -0 $DPID 2>/dev/null && kill $DPID && wait ${DPID} 2>/dev/null
' USR2
trap '
trap - EXIT
kill -0 $DPID 2>/dev/null && kill $DPID && wait ${DPID} 2>/dev/null
kill -0 ${!} 2>/dev/null && kill ${!} && wait ${!} 2>/dev/null
exit 0
' EXIT
# Ensure trapper wrapper does not exit:
while :; do
sleep 10000 & wait $!
done
#------------------------------------------------------------
# test
{
wrapperpid="`exec sh -c './pidwrapper & echo ${!}' | head -1`"
echo "wrapperpid: $wrapperpid"
for n in 1 2 3 4 5; do
sleep 2
# start daemonise.sh
kill -s USR1 $wrapperpid
sleep 2
# kill daemonise.sh
kill -s USR2 $wrapperpid
done
sleep 2
echo kill $wrapperpid
kill $wrapperpid
}
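The point about the dangling sleep can be checked in isolation. The sketch below is a reduced, hypothetical wrapper (not the pidwrapper script above) whose EXIT trap stops and reaps the pending sleep; the temporary file only exists to pass the sleep's PID to the test.

```shell
#!/usr/bin/env bash
pidfile=$(mktemp)            # hypothetical scratch file used to pass the sleep PID out

bash -c '
    cleanup() {
        kill "$sleep_pid" 2>/dev/null    # stop the pending sleep, if any
        wait "$sleep_pid" 2>/dev/null    # and reap it
    }
    trap cleanup EXIT
    trap "exit 0" TERM
    while :; do
        sleep 10000 & sleep_pid=$!
        echo "$sleep_pid" > "$1"
        wait "$sleep_pid"
    done
' _ "$pidfile" &
wrapper=$!

sleep 1
kill -TERM "$wrapper"
wait "$wrapper"

sleep_pid=$(cat "$pidfile")
rm -f "$pidfile"
if kill -0 "$sleep_pid" 2>/dev/null; then leftover=yes; else leftover=no; fi
echo "sleep still running: $leftover"
```

Without the EXIT trap, the sleep 10000 would outlive the wrapper and stay in the process table until it expired.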
