Elasticsearch in Docker seems to ignore the SIGTERM sent by docker stop

I'm trying to use Elasticsearch in Docker for local dev. While I can find containers that work, when I run docker stop the containers hang for the default 10s, and then Docker forcibly kills the container. My assumption here is that ES is either not running as PID 1 or other services prevent it from shutting down immediately.
I'm curious if anyone can expand on this, or explain more accurately why this is happening. I'm running numerous tests, and 10s+ to shut down is just annoying when other containers shut down after 1-2s.

If you don't want to wait the 10 seconds, you can run a docker kill instead of a docker stop. You can also adjust the timeout on docker stop with the -t option, e.g. docker stop -t 2 $container_id to only wait 2 seconds instead of the default 10.
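As for why it seems to ignore the signal: docker stop sends SIGTERM first and only sends SIGKILL (which cannot be ignored) once the timeout expires, so it is the SIGTERM that is going unanswered. The cause may depend on which image you are running (there is more than one for Elasticsearch). If PID 1 is a shell like /bin/sh or /bin/bash, it will not pass signals through to its children. If PID 1 is the Elasticsearch process itself, it may ignore the signal, or 10 seconds may not be long enough for it to clean up and shut down fully.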
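To see the two-stage shutdown in isolation, the same sequence can be reproduced without Docker. The sketch below uses coreutils timeout with -k as a stand-in for docker stop (an assumption made for illustration, not how Docker is implemented): it sends SIGTERM after 1 second and SIGKILL 1 second after that. The sh -c child that ignores SIGTERM is a hypothetical stand-in for a PID 1 that never acts on the signal.

```shell
# Emulate "docker stop -t 1": SIGTERM first, SIGKILL after a grace period.
# "timeout -k GRACE LIMIT cmd" sends TERM after LIMIT seconds and
# KILL a further GRACE seconds later if the command is still alive.

# Case 1: the process acts on SIGTERM, so SIGKILL is never needed.
polite_status=0
timeout -k 1 1 sleep 30 || polite_status=$?

# Case 2: the process ignores SIGTERM (illustrative stand-in for a PID 1
# that does not handle the signal) and has to be killed.
stubborn_status=0
timeout -k 1 1 sh -c 'trap "" TERM; sleep 30' || stubborn_status=$?

echo "handles SIGTERM: exit status $polite_status"
echo "ignores SIGTERM: exit status $stubborn_status"
```

With GNU timeout, the first status should be 124 (timed out, SIGTERM was enough) and the second 137 (128 + 9, SIGKILL was needed). If the main process in the container behaves like the second case, docker stop will always take the full grace period.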

Related

Using timeout with docker run from within script

In my Travis CI, part of my verification is to start a docker container and verify that it doesn't fail within 10 seconds.
I have a yarn script docker:run:local that calls docker run -it <mytag> node app.js.
If I call the yarn script with timeout from a bash shell, it works fine:
$ timeout 10 yarn docker:run:local; test $? -eq 124 && echo "Container ran for 10 seconds without error"
This calls docker run, lets it run for 10 seconds, then kills it (if not already returned). If the exit code is 124, the timeout did expire, which means the container was still running. Exactly what I need to verify that my docker container is reasonably sane.
However, as soon as I run this same command from within a script, either in a test.sh file called from the shell, or if putting it in another yarn script and calling yarn test:docker, the behaviour is completely different. I get:
ERRO[0000] error waiting for container: context canceled
Then the command hangs forever: there is no 10-second timeout, and I have to Ctrl-Z it and then kill -9 the process. If I run top, I now have a docker process using all my CPU indefinitely. This does not happen when using timeout with any other command, such as sleep 20 && echo "Finished sleeping", so I suspect it has something to do with how docker works in interactive mode, but that's only a guess.
What's causing timeout docker:run to fail from a script but work fine from a shell and how do I make this work?
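Looks like running docker in interactive mode is causing the issue.
Remove the -it flags, since a script has no interactive terminal for docker to attach to. Either pass -d instead of -it to run the container detached, or drop the flags entirely to run it in the foreground without a TTY (docker run is not detached by default):
docker run -d <mytag> node app.js
or
docker run <mytag> node app.js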
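If the goal is only to confirm that the container is still alive after 10 seconds, a detached run can be checked without timeout at all. A minimal sketch, assuming the image needs no extra flags; verify_container_survives is a hypothetical helper name, not part of Docker or yarn:

```shell
# verify_container_survives IMAGE SECONDS [CMD...]
# Start the container detached, wait, then ask Docker whether it is still running.
verify_container_survives() {
    image=$1
    secs=$2
    shift 2
    # "docker run -d" prints the new container ID on stdout.
    cid=$(docker run -d "$image" "$@") || return 2
    sleep "$secs"
    # Prints "true" while the container's main process is alive.
    running=$(docker inspect -f '{{.State.Running}}' "$cid")
    # Clean up whether or not the check passed.
    docker rm -f "$cid" >/dev/null
    [ "$running" = "true" ]
}
```

Called as verify_container_survives <mytag> 10 node app.js, it succeeds only if the container was still running after 10 seconds, which is the same condition the exit code 124 check was testing for.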

ntpd -qg: Use with timeout

Working on a Pi 3.
Situation: only one server is given in /etc/ntp.conf, and that address is invalid (no NTP server is running on it).
Problem: ntpd -qg never ends, since it has no timeout option like ntpdate -t 60.
Question: Can one specify a timeout for ntpd? If not, how can you ensure the process ends after time x?
For now, on startup the Pi executes a bash script that tries to get the current time from the NTP server given in /etc/ntp.conf and then hangs, since there is no NTP server available at that address. So the process runs from startup and I can't call another ntpd until the initial ntpd process is killed.
Any workaround?
PS: I would prefer not to use ntpdate, since it is tagged as a retiring package.
EDIT:
The RPi3 is located in an isolated network. Online NTP-servers are no option in my case.
There is a timeout command, usually shipped with coreutils, that lets you set a timeout on any command (even one that does not support it on its own). E.g.
timeout 60 ntpd -qg
This runs ntpd -qg and has it time out after 60s. If the command finishes in time, you get its return value; if the timeout intervenes, you get 124.
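In a boot script it helps to tell these outcomes apart, so the script can carry on when the server is unreachable. A minimal sketch, assuming GNU coreutils timeout; try_with_deadline is a hypothetical helper name, and the -k 5 grace period is an arbitrary choice in case the command does not exit on SIGTERM:

```shell
# try_with_deadline SECONDS CMD...
# Run CMD, give up after SECONDS, and report what happened.
try_with_deadline() {
    secs=$1
    shift
    status=0
    # -k 5: if CMD is still alive 5 s after SIGTERM, send SIGKILL.
    timeout -k 5 "$secs" "$@" || status=$?
    case $status in
        0)   echo "finished" ;;
        124) echo "timed out" ;;
        137) echo "killed" ;;
        *)   echo "failed ($status)" ;;
    esac
    return "$status"
}

# Example call from a startup script (commented out so the sketch has no side effects):
# try_with_deadline 60 ntpd -qg || echo "continuing without time sync"
```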

Make container stop itself

Is there any native way to make a Docker container stop itself? I can't find anything in the documentation.
I have a container that does some stuff, and I want to hook into the completion of that.
One way I thought of doing this was blocking with docker wait until the container stops itself, and then I can restart it with a docker start and continue on to the subsequent commands that depend on those jobs being complete.
For instance:
docker run -d --name=my-container ...
# Wait for my-container to stop itself
docker wait my-container
# Once it stops itself, start it again.
docker start my-container
# Some other commands here that depend on my-container to finish its jobs...
But I can't find any way in the documentation to make a container stop itself.
There is docker stop to stop a container from outside. To stop a container from inside, you could kill the entrypoint process (the process specified in your docker run command, or the ENTRYPOINT or last CMD specified in the Dockerfile, etc.).
Alternatively, don't run the container in detached mode (remove the -d). It will then run in the foreground until the entrypoint/cmd exits.
You may need to use the pseudo-tty (-t) command-line option.
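Put together: the container stops on its own as soon as its main process exits, and docker wait then returns with that process's exit code. A minimal sketch, assuming the job is the container's main command; run_job_and_wait is a hypothetical helper name, and the alpine image in the example call is only an illustration:

```shell
# run_job_and_wait NAME IMAGE [CMD...]
# Start the job detached, block until the container stops itself,
# and succeed only if its main process exited with status 0.
run_job_and_wait() {
    name=$1
    shift
    docker run -d --name "$name" "$@" >/dev/null || return 1
    # "docker wait" blocks until the container stops and prints its exit code.
    exit_code=$(docker wait "$name")
    echo "$name exited with status $exit_code"
    [ "$exit_code" -eq 0 ]
}

# Example call (commented out; image and command are placeholders):
# run_job_and_wait my-container alpine sh -c 'echo working; sleep 2' && echo "job done"
```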

How can I gracefully recover from an attached Docker container terminating?

Say I run this Docker command in one Terminal window:
$ docker run --name stackoverflow --rm ubuntu /bin/bash -c "sleep 5"
And before it exits I run this in a second Terminal window:
$ docker run -it --rm --pid=container:stackoverflow terencewestphal/htop
I'll successfully see htop running in the second container, displaying the bash sleep process running. So far so good.
After 5 seconds, the first container will exit with code 0. All good.
At this time, the second container will exit with code 137 (128 + 9, i.e. SIGKILL). This also makes sense to me since the second container is just attached to the first one.
The problem is that this messes up macOS's Terminal.app's state:
The Terminal's cursor disappears.
Clicking the Terminal window causes mouse location characters to be entered as input.
I'm hoping to find a way to avoid messing up Terminal.app state. Any suggestions?
You can't avoid this behaviour: restoring the terminal state on exit is htop's job, and it cannot do that when it is terminated with SIGKILL. However, you can fix the terminal window yourself with the reset command, which is intended to initialize the terminal state.
About the "attached" container:
The --pid=container:<name> option means that the new container runs in the PID namespace of the first container, and as the pid_namespaces(7) man page says:
If the "init" process of a PID namespace terminates, the kernel
terminates all of the processes in the namespace via a SIGKILL signal.
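The difference between a catchable signal and SIGKILL can be seen with a small experiment outside Docker. In this sketch, the child shell's cleanup function stands in for htop restoring the terminal (an illustrative assumption; run_and_signal is a hypothetical helper name):

```shell
# run_and_signal SIGNAL
# Start a child that records "cleaned" when it gets SIGTERM, send it SIGNAL,
# and print whatever the child managed to record before it died.
run_and_signal() {
    marker=$(mktemp)
    sh -c 'cleanup() { echo cleaned > "$0"; exit 0; }
           trap cleanup TERM
           while :; do sleep 1; done' "$marker" &
    child=$!
    sleep 1                          # give the child time to install its trap
    kill -s "$1" "$child"
    wait "$child" 2>/dev/null || true
    cat "$marker"
    rm -f "$marker"
}

term_result=$(run_and_signal TERM)
kill_result=$(run_and_signal KILL)
echo "after SIGTERM: '$term_result'"
echo "after SIGKILL: '$kill_result'"
```

The cleanup only runs for SIGTERM. After SIGKILL nothing is recorded, which is why the terminal is left for you to fix with reset.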

start-stop-daemon weird behaviour

I'm creating a pallet crate for elasticsearch. I was stuck on the service not starting; however, after looking at the logs it seems that it doesn't really have anything to do with pallet. I am using the elasticsearch apt package for 1.0, which includes an init script. If I run sudo service elasticsearch start then ES starts with no problems. If pallet does this for me, then it records standard out as having started it successfully:
start elasticsearch
* Starting Elasticsearch Server
...done.
However it is not started.
sudo service elasticsearch status
* elasticsearch is not running
I messed around with the init script and I found if I added sleep 1 after starting the daemon then it works correctly with pallet.
start-stop-daemon --start -b --user "$ES_USER" -c "$ES_USER" --pidfile "$PID_FILE" --exec $DAEMON -- $DAEMON_OPTS
#this sleep will allow it to work
#sleep 1
log_end_msg $?
I don't understand what is going on.
I've seen issues like this before, too. It generally comes down to expecting something to have finished before the script finishes, which may not always happen with services since they fork off background tasks that may still get killed when the ssh connection is terminated.
For these kinds of things you should use Pallet's built-in code for running things under supervision. This also has the advantage of making it very easy to switch from plain init.d to runit or daemontools later, which is especially useful for Elasticsearch because it's a JVM process and nearly any JVM will eventually crash if you let it run long enough.
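If you stay with the init script for now, the fixed sleep 1 could be replaced by a bounded poll that checks whether the daemon is actually alive. This is a sketch under assumptions, not part of the Elasticsearch package or of Pallet: wait_for_pid is a hypothetical helper, it assumes the daemon writes its PID to the file passed in, and it assumes the script has permission to signal that process (as an init script run by root does). It only confirms that the process came up and may not address a daemon that is killed later when the ssh session ends.

```shell
# wait_for_pid PIDFILE MAX_SECONDS
# Poll once per second until PIDFILE names a live process, or give up.
wait_for_pid() {
    pidfile=$1
    tries=$2
    while [ "$tries" -gt 0 ]; do
        # kill -0 sends no signal; it only tests that the process exists.
        if [ -s "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
            return 0
        fi
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}

# Possible use in the init script, in place of "sleep 1" (commented out here):
# wait_for_pid "$PID_FILE" 10
# log_end_msg $?
```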
