Docker container with a shell script ignores SIGTERM - bash

I have a very simple Docker container which runs a bash script:
# syntax=docker/dockerfile:1.4
FROM alpine:3
WORKDIR /app
RUN apk add --no-cache \
curl bash sed uuidgen
COPY demo.sh /app/demo.sh
RUN chmod +x /app/*.sh
CMD ["bash", "/app/demo.sh"]
#!/bin/bash
echo "Test 123.."
sleep 5m
echo "After sleep"
When running the container with docker run <image> the container cannot be stopped with docker stop <name>, it can only be killed.
I tried searching but everything with "bash" and "docker" leads me to managing docker on host with shell scripts.

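While bash is running an external command such as sleep in the foreground, it does not act on a signal until that command completes, so the script cannot react to the SIGTERM while sleep is running.
A common workaround is to run sleep in the background and immediately wait on it, so that the shell built-in wait is what's running when the signal arrives; wait is interruptible. Note that the script also needs a trap for TERM: a process running as PID 1 in a container ignores signals for which it has not installed a handler.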
echo "Test 123..."
sleep 5 & wait
echo "After sleep"
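A fuller version of this workaround might look like the sketch below. It is not the original demo.sh: the on_term handler, the variable names, and the self-sent TERM (standing in for docker stop so the script can be tried outside a container) are all assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical rewrite of demo.sh: react promptly when TERM arrives.

stopped=0
on_term() {
  stopped=1
  # Stop the background sleep so it does not outlive the script.
  if [ -n "${sleep_pid:-}" ]; then
    kill "$sleep_pid" 2>/dev/null
  fi
}
trap on_term TERM

echo "Test 123..."

# Demo only: send TERM to this script after one second, standing in for `docker stop`.
( sleep 1; kill -TERM "$$" ) &

sleep 300 &          # the long sleep runs in the background...
sleep_pid=$!
wait "$sleep_pid"    # ...while the shell sits in the interruptible builtin `wait`
wait_status=$?       # greater than 128 when wait was cut short by a trapped signal

if [ "$stopped" -eq 1 ]; then
  echo "Received TERM, shutting down"
else
  echo "After sleep"
fi
```

In a real image the self-sent TERM line would be dropped, and docker stop would deliver the signal instead.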

Can you try adding this before the sleep statement?
trap "echo Container received EXIT" EXIT
Or do docker stop -t 5 container for example.

Related

Docker bash shell script does not catch SIGINT or SIGTERM

I have the following two files in a directory:
Dockerfile
FROM debian
WORKDIR /app
COPY start.sh /app/
CMD ["/app/start.sh"]
start.sh (with permissions 755 using chmod +x start.sh)
#!/bin/bash
trap "echo SIGINT; exit" SIGINT
trap "echo SIGTERM; exit" SIGTERM
echo Starting script
sleep 100000
I then run the following commands:
$ docker build . -t tmp
$ docker run --name tmp tmp
I then expect that pressing Ctrl+C would send a SIGINT to the program, which would print SIGINT to the screen then exit, but that doesn't happen.
I also try running $ docker stop tmp, which I expect would send a SIGTERM to the program, but checking $ docker logs tmp after shows that SIGTERM was not caught.
Why are SIGINT and SIGTERM not being caught by the bash script?
Actually, your Dockerfile and start.sh entrypoint script work as is for me with Ctrl+C, provided you run the container with one of the following commands:
docker run --name tmp -it tmp
docker run --rm -it tmp
Documentation details
As specified in docker run --help:
the --interactive = -i CLI flag asks to keep STDIN open even if not attached
(typically useful for an interactive shell, or when also passing the --detach = -d CLI flag)
the --tty = -t CLI flag asks to allocate a pseudo-TTY
(which notably forwards signals to the shell entrypoint, especially useful for your use case)
Related remarks
For completeness, note that there are several related issues that can make docker stop take too much time and "fall back" to docker kill, which can arise when the shell entrypoint starts some other process(es):
First, when the last line of the shell entrypoint runs another, main program, don't forget to prepend this line with the exec builtin:
exec prog arg1 arg2 ...
But when the shell entrypoint is intended to run for a long time, trapping signals (at least INT / TERM, but not KILL) is very important;
{see also this SO question: Docker Run Script to catch interruption signal}
Otherwise, if the signals are not forwarded to the children processes, we run the risk of hitting the "PID 1 zombie reaping problem", for instance
{see also this SO question for details: Speed up docker-compose shutdown}
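When exec is not an option because the entrypoint has to keep running next to its child, the forwarding mentioned above can be sketched as follows. This is only an illustration: sleep 30 stands in for the real program, forward_term is a hypothetical helper name, and the self-sent TERM simulates docker stop.

```shell
#!/bin/bash
# Sketch: a shell entrypoint that forwards TERM/INT to its child and then reaps it.

sleep 30 &                      # stand-in for the real main program (assumption)
child=$!

forward_term() {
  # Pass the signal on to the child process.
  kill -TERM "$child" 2>/dev/null
}
trap forward_term TERM INT

# Demo only: simulate `docker stop` by sending TERM to this script after one second.
( sleep 1; kill -TERM "$$" ) &

wait "$child"                   # cut short when the trapped signal arrives
wait "$child"                   # wait again to collect the child's real exit status
child_status=$?

echo "child exited with status $child_status"
```

The second wait matters: the first one returns as soon as the trap fires, before the child has necessarily exited.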
CTRL+C sends a signal to docker running on that console.
To send a signal to the script you could use
docker exec -it <containerId> /bin/sh -c "pkill -INT -f 'start\.sh'"
Or include echo "my PID: $$" in your script and send
docker exec -it <containerId> /bin/sh -c "kill -INT <script pid>"
Some shell implementations in docker might ignore the signal.
This script will correctly react to pkill -15. Please note that signals are specified without the SIG prefix.
#!/bin/sh
trap "touch SIGINT.tmp; ls -l; exit" INT TERM
trap "echo 'really exiting'; exit" EXIT
echo Starting script
while true; do sleep 1; done
The long sleep command was replaced by an infinite loop of short ones because the shell runs a trap only after the current foreground command finishes; with sleep 1, the handler runs within about a second.
The solution I found was to just use the --init flag.
docker run --init [MORE OPTIONS] IMAGE [COMMAND] [ARG...]
Per their docs...

Linux script run with run-this-one doesn't work with docker

I'm experiencing an issue in which I run a command in a cronjob and want to make sure that it's not already being executed. I achieve that by running it as run-one [command] (man-page).
If I want to cancel the already running command and force the new command to run, I run as run-this-one [command].
At least this is what I expected, but if the command runs a docker container, the other process seems to be terminated (but isn't), the terminal shows Terminated, but continues to show the command output that is running in the container (but the commands after the container ends running are not executed). In this case, the command that runs run-this-one is not executed (not expected).
Example:
/path/to/file.sh
#!/bin/bash
set -eou pipefail
echo "sleep started..." >&2
docker run --rm alpine /bin/sh -c 'echo "sleep started inside..." && sleep 5 && echo "sleep ended inside..."'
echo "sleep ended..." >&2
If I run sudo run-one /path/to/file.sh in a terminal window, and then run sudo run-one /path/to/file.sh in another terminal (before the previous command finishes), the second command is not executed, as expected, and it ends successfully.
Terminal1:
user@host:/path$ sudo run-one /path/to/file.sh
sleep started...
sleep started inside...
sleep ended inside...
sleep ended...
user@host:/path$
Terminal2:
user@host:/path$ sudo run-one /path/to/file.sh
user@host:/path$
But if I run sudo run-one /path/to/file.sh in a terminal window and then, before it finishes, run sudo run-this-one /path/to/file.sh in another terminal, the new command is not executed, which is not expected. The first terminal prints Terminated and returns to the user@host:/path$ prompt, but the container output still appears there (the command is still running in the container created in the 1st terminal).
Terminal1:
user@host:/path$ sudo run-one /path/to/file.sh
sleep started...
sleep started inside...
Terminated
user@host:/path$ sleep ended inside...
# terminal doesn't show new input from the keyboard, but I can run commands after
Terminal2:
user@host:/path$ sudo run-this-one /path/to/file.sh
user@host:/path$
It works if the file is changed to:
/path/to/file.sh
#!/bin/bash
set -eou pipefail
echo "sleep started..." >&2
sleep 5
echo "sleep ended..." >&2
The above script file with docker was just an example, in my case it's different, but the problem is the same, and occurs independently of running the container with or without -it.
Does anyone know why this is happening? Is there a (not very complex and not very hackish) solution to this problem? I've executed the above commands in Ubuntu 20.04 inside a VirtualBox machine (with vagrant).
Update (2021-07-15)
Based on @ErikMD's comment and @DannyB's answer, I put a trap and a cleanup function to remove the container, as can be seen in the script below:
/path/to/test
#!/bin/bash
set -eou pipefail
trap 'echo "[error] ${BASH_SOURCE[0]}:$LINENO" >&2; exit 3;' ERR
RED='\033[0;31m'
NC='\033[0m' # No Color
function error {
msg="$(date '+%F %T') - ${BASH_SOURCE[0]}:${BASH_LINENO[0]}: ${*}"
>&2 echo -e "${RED}${msg}${NC}"
exit 2
}
file="${BASH_SOURCE[0]}"
command="${1:-}"
if [ -z "$command" ]; then
error "[error] no command entered"
fi
shift;
case "$command" in
"cmd1")
function cleanup {
echo "cleaning $command..."
sudo docker rm --force "test-container"
}
trap 'cleanup; exit 4;' ERR
args=( "$file" "cmd:unique" )
echo "$command: run-one ${args[*]}" >&2
run-one "${args[@]}"
;;
"cmd2")
function cleanup {
echo "cleaning $command..."
sudo docker rm --force "test-container"
}
trap 'cleanup; exit 4;' ERR
args=( "$file" "cmd:unique" )
echo "$command: run-this-one ${args[*]}" >&2
run-this-one "${args[@]}"
;;
"cmd:unique")
"$file" "cmd:container"
;;
"cmd:container")
echo "sleep started..." >&2
sudo docker run --rm --name "test-container" alpine \
/bin/sh -c 'echo "sleep started inside..." && sleep 5 && echo "sleep ended inside..."'
echo "sleep ended..." >&2
;;
*)
echo -e "${RED}[error] invalid command: $command${NC}"
exit 1
;;
esac
If I run /path/to/test cmd1 (run-one) and /path/to/test cmd2 (run-this-one) in another terminal, it works as expected (the cmd1 process is stopped and removes the container, and the cmd2 process runs successfully).
If I run /path/to/test cmd2 in 2 terminals, it also works as expected (the 1st cmd2 process is stopped and removes the container, and the 2nd cmd2 process runs successfully).
But not so good: in the 2 cases above, sometimes the 2nd process stops with an error before the 1st removes the container (this can occur intermittently, probably due to a race condition).
And it gets worse: if I run /path/to/test cmd1 in 2 terminals, both commands fail, although the 1st cmd1 should run successfully (it fails because the 2nd cmd1 removes the container in the cleanup).
I tried to put the cleanup in the cmd:unique command instead (removing from the other 2 places), so as to call only by the single process running, to avoid the problem above, but weirdly the cleanup is not called there, even if the trap is also defined there.
Just to simplify your question, I would use this command to reproduce the problem:
run-one docker run --rm -it alpine sleep 10
As can be seen, with both run-one and run-this-one, the behavior is definitely not the desired one.
Since the command creates a process managed by docker, I suspect that the run-one set of tools is not the right tool for the job, since docker containers should not be killed with pkill, but rather with docker kill.
One relatively easy solution is to embrace the way docker wants you to kill containers, and create your own short run-one scripts that handle docker properly.
run-one-docker.sh
#!/usr/bin/env bash
if [[ "$#" -lt 2 ]]; then
echo "Usage: ./run-one-docker.sh NAME COMMAND"
echo "Example: ./run-one-docker.sh temp alpine sleep 10"
exit 1
fi
name="$1"
command=("${@:2}")
container_is_running() {
[ "$( docker container inspect -f '{{.State.Running}}' "$1" 2> /dev/null)" == "true" ]
}
if container_is_running "$name"; then
echo "$name is already running, aborting"
exit 1
else
docker run --rm -it --name "$name" "${command[@]}"
fi
run-this-one-docker.sh
#!/usr/bin/env bash
if [[ "$#" -lt 2 ]]; then
echo "Usage: ./run-this-one-docker.sh NAME COMMAND"
echo "Example: ./run-this-one-docker.sh temp alpine sleep 10"
exit 1
fi
name="$1"
command=("${@:2}")
container_is_running() {
[ "$( docker container inspect -f '{{.State.Running}}' "$1" 2> /dev/null)" == "true" ]
}
if container_is_running "$name"; then
echo "killing old $name"
docker kill "$name" > /dev/null
fi
docker run --rm -it --name "$name" "${command[@]}"

exec as a pipeline component

For our application running inside a container, it is preferable that it receives a SIGTERM when the container is being (gracefully) shut down. At the same time, we want its output to go to a log file.
In the start script of our docker container, we had therefore been using bash's exec, similar to this:
exec command someParam >> stdout.log
That worked just fine, command replaced the shell that had been the container's root process and would receive the SIGTERM.
Since the application tends to log a lot, we decided to add log rotation by using Apache's rotatelogs tool, i.e.
exec command | rotatelogs -n 10 stdout.log 10M
Alas, it seems that by using the pipe, exec can no longer have command replace the shell. When looking at the processes in the running container with pstree -p, it now looks like this
mycontainer@/#pstree -p
start.sh(1)-+-command(118)
`-rotatelogs(119)
So bash remains the root process, and does not pass the SIGTERM on to command.
Before stumbling upon exec, I had found an approach that installs a signal handler into the bash script, which would then itself send a SIGTERM to the command process using kill. However, this became really convoluted, getting the PID was also not always straightforward, and I would like to preserve the convenience of exec when it comes to signal handling and get piping for log rotation.
Any idea how to accomplish this?
Perhaps you want
exec sh -c 'command | rotatelogs -n 10 stdout.log 10M'
I was able to get around this by using process substitution. For your specific case the following may work.
exec command > >(rotatelogs -n 10 stdout.log 10M)
To reproduce the scenario I built this simple Dockerfile
FROM perl
SHELL ["/bin/bash", "-c"]
# The following will gracefully terminate upon docker stop
CMD exec perl -e '$SIG{TERM} = sub { $|++; print "Caught a sigterm!\n"; sleep(5); die "is the end!" }; sleep(30);' 2>&1 > >(tee /my_log)
# The following won't gracefully terminate upon docker stop
#CMD exec perl -e '$SIG{TERM} = sub { $|++; print "Caught a sigterm!\n"; sleep(5); die "is the end!" }; sleep(30);' 2>&1 | tee /my_log
Build: docker build -f Dockerfile.meu -t test .
Run: docker run --name test --rm -ti test
Stop it: docker stop test
Output:
Caught a sigterm!
is the end! at -e line 1.
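The difference between the pipe and the process substitution can also be checked without Docker by comparing PIDs. The sketch below is an illustration only: sh -c 'echo $$' stands in for the real command, cat stands in for rotatelogs, and same_pid is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: does `exec` keep the PID of the shell that ran it?
# Each inner bash prints its own PID, then execs `sh`, which prints its PID.

# Pipe: the left side of the pipeline is a forked subshell, so exec only replaces that subshell.
piped=$(bash -c 'echo "$$"; exec sh -c "echo \$\$" | cat')

# Process substitution: exec replaces the inner bash itself.
substituted=$(bash -c 'echo "$$"; exec sh -c "echo \$\$" > >(cat)')

# Succeeds when the two lines of its argument hold the same PID.
same_pid() {
  local shell_pid cmd_pid
  { read -r shell_pid; read -r cmd_pid; } <<< "$1"
  [ -n "$shell_pid" ] && [ "$shell_pid" = "$cmd_pid" ]
}

if same_pid "$piped"; then echo "pipe: same PID"; else echo "pipe: different PID"; fi
if same_pid "$substituted"; then echo "process substitution: same PID"; else echo "process substitution: different PID"; fi
```

If the command keeps the shell's PID, it is the process that receives the SIGTERM sent to the container's root process.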

Why do sleep & wait in bash?

I'm having trouble understanding the startup commands for the services in this docker-compose.yml. The two relevant lines from the .yml are:
command: "/bin/sh -c 'while :; do sleep 6h & wait $${!}; nginx -s reload; done & nginx -g \"daemon off;\"'"
and
entrypoint: "/bin/sh -c 'trap exit TERM; while :; do certbot renew; sleep 12h & wait $${!}; done;'"
Why send the sleep command to the background and then wait on it? Why not just do sleep 6h directly? Also, is the double dollar sign just escaping the dollar sign in ${!}?
I'm finding other places where sleep and wait are used in conjunction, but none seem to have any explanation of why:
http://www.masteringunixshell.net/qa17/bash-how-to-wait-seconds.html
https://stackoverflow.com/a/13301329/828584
https://superuser.com/a/753984/98583
It makes sense to sleep in background and then wait, when one wants to handle signals in a timely manner.
When bash is executing an external command in the foreground, it does
not handle any signals received until the foreground process
terminates
(detailed explanation here).
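The behaviour described in this quote can be observed with a small timing sketch. It assumes bash (it relies on the SECONDS variable) and sends TERM to itself after one second in place of an external docker stop; the variable names are made up for the demonstration.

```shell
#!/bin/bash
# Sketch: how long does bash take to get past a 3-second sleep when TERM arrives after 1 second?
trap 'echo "TERM trap ran at ${SECONDS}s"' TERM

# Case 1: sleep in the foreground. The trap is deferred until sleep exits.
SECONDS=0
( sleep 1; kill -TERM "$$" ) &
sleep 3
fg_elapsed=$SECONDS

# Case 2: sleep in the background plus wait. wait returns as soon as TERM arrives.
SECONDS=0
( sleep 1; kill -TERM "$$" ) &
sleep 3 &
sleep_pid=$!
wait "$sleep_pid"
bg_elapsed=$SECONDS
kill "$sleep_pid" 2>/dev/null   # stop the leftover background sleep

echo "foreground: ${fg_elapsed}s, background + wait: ${bg_elapsed}s"
```

With a 6h or 12h sleep, the same deferral would hold up the handler for hours, which is why the background sleep plus wait pattern shows up in long-running loops.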
While the second example implements a signal handler, for the first one it makes no difference whether the sleep is executed in foreground or not. There is no trap and the signal is not propagated to the nginx process.
To make it respond to the SIGTERM signal, the entrypoint should be something like this:
/bin/sh -c 'nginx -g \"daemon off;\" & trap exit TERM; while :; do sleep 6h & wait $${!}; nginx -s reload; done'
To test it:
docker run --name test --rm --entrypoint="/bin/sh" nginx -c 'nginx -g "daemon off;" & trap exit TERM; while :; do sleep 20 & wait ${!}; echo running; done'
Stop the container
docker stop test
or send the TERM signal (docker stop sends a TERM followed by KILL if the main process does not exit)
docker kill --signal=SIGTERM test
By doing this, the script exits immediately. Now if we remove the wait ${!}, the trap is executed only when sleep ends. All of that works for the second example too.
Note: in both cases the intention is to check certificate renewal every 12h and reload the configuration every 6h as mentioned in the guide
The two commands do that just fine. IMHO the additional wait in the first example is just an oversight of the developers.
EDITED:
It seems the rationalization above, which was meant to give possible reasons behind the background sleep, might create some confusion.
(There is a related post Why use nginx with “daemon off” in background with docker?).
While the command suggested in the answer above is an improvement over the one in the question, it is still flawed because, as mentioned in the linked post, the nginx server should be the main process and not a child. That can be easily achieved using the exec builtin. The script becomes:
'while :; do sleep 6h; nginx -s reload; done & exec nginx -g "daemon off;"'
(More info in section Configure app as PID 1 in Docker best practices)
This, IMHO, is far better because not only is nginx monitored, but it also handles signals. A configuration reload (nginx -s reload), for example, can also be done manually by simply sending the HUP signal to the docker container (see Controlling nginx).
The only reason I see:
If you killall -INT sleep, this won't affect main script.
Try this:
while true ;do sleep 12; echo yes;done
Then send an interrupt signal:
killall -INT sleep
This will break the job!
Try now
while true ;do sleep 12 & wait $! ; echo yes;done
Then again:
killall -INT sleep
Job won't break!
Sample output, hitting killall -INT sleep from another window:
user@myhost:~$ while true ;do sleep 12; echo yes;done
break
user@myhost:~$ while true ;do sleep 12 & wait $! ; echo yes;done
[1] 30632
[1]+ Interrupt sleep 12
yes
[1] 30636
[1]+ Interrupt sleep 12
yes
[1] 30638
[1]+ Interrupt sleep 12
yes
[1] 30640

Creating a continuous background job during provisioning

During the provisioning of a VM I want to start a job which shall run in the background. This job shall continuously check whether certain files have been changed. In the vagrant file I reference a script which contains the following line (which does nothing but echo "x" every 3 seconds):
nohup sh -c 'while true; do sleep 3; echo x; done' &
If I run this directly in the command line a job is created, which I can check using jobs.
If I however run it from outside the VM using
vagrant ssh -c "nohup sh -c 'while true; do sleep 3; echo x; done' &"
or if it is executed as part of the provisioning nothing seems to happen. (There is no job & no nohup.out file was created.)
I tried the following two answers to questions which seem to address the same issue:
(1) This answer suggests to "properly daemonize" which didn't work for me. I tried the following:
vagrant ssh -c "nohup sh -c 'while true; do sleep 3; echo x; done' 0<&- &>/dev/null &"
(2) The second answer says to add "sleep 1" which didn't work either:
vagrant ssh -c "nohup sh -c 'while true; do sleep 3; echo x; done' & sleep 1"
For both attempts, executing the command directly on the command line worked just fine; however, executing it via vagrant ssh -c or by provisioning didn't seem to do anything.
This is how it works in my case
Vagrantfile provisioning
hub.vm.provision "shell", path: "script/run-test.sh", privileged: false, run: 'always', args: "#{selenium_version}"
I call a run-test script that runs as the vagrant user (privileged: false).
The interesting part of the script is
nohup java -jar /test/selenium-server-standalone-$1.jar -role hub &> /home/vagrant/nohup.grid.out&
In my case I start a java daemon and redirect the output of nohup to a specific file in my vagrant home. If I check, the job is running and is owned by the vagrant user.
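A common way to keep such a job independent of the session that started it is to redirect all three standard streams, not just the output. The sketch below is a hedged illustration of that pattern, not the provisioning script itself: the short loop stands in for the real daemon, and the temporary log file is an assumption.

```shell
#!/bin/bash
# Sketch: start a background job with stdin, stdout and stderr detached from the terminal.
log_file=$(mktemp)   # hypothetical log location; the answer above uses a file in /home/vagrant

# The loop is a stand-in for the real long-running command.
nohup sh -c 'for i in 1 2 3; do echo x; done' > "$log_file" 2>&1 < /dev/null &
job_pid=$!

wait "$job_pid"      # demo only: wait so the log can be inspected right away
line_count=$(wc -l < "$log_file")
echo "job $job_pid wrote $line_count lines to $log_file"
```

In a provisioning script the wait would be left out, so that the script returns while the job keeps running.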
What worked for me was running commands in screen, like:
screen -dm bash -c "my_cmd"
in provision shell scripts.
