I have just reviewed a PR with the following || operator in a Makefile:
docker-compose -f docker-compose.yml up -d --scale worker=3 \
    || (docker-compose -f docker-compose.yml down ; false)
The rationale is that if docker-compose up fails, docker-compose down will clean up any containers that were left running.
However, if docker-compose up is atomic, this is probably unnecessary.
Is docker-compose up atomic, meaning either all planned containers, or none, are running after its invocation?
No, it isn't atomic. Leftover containers may still be running.
A simple CTRL+C test settled the question: docker-compose up exited while some containers were still running.
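So the cleanup in the PR is justified. As a plain-shell sketch of the same idea (outside the Makefile, command and file names taken from the snippet above):
#!/bin/sh
# sketch: bring the stack up; on failure, show and remove whatever was left behind
if ! docker-compose -f docker-compose.yml up -d --scale worker=3; then
    docker-compose -f docker-compose.yml ps      # any partially started containers
    docker-compose -f docker-compose.yml down    # tear them down again
    exit 1                                       # keep the failure visible to the caller
fi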
I have some tests in the form of bash scripts that work like this:
Start services using docker-compose
Run test logic
Shut down services using docker-compose
#!/bin/bash
set -e
cd "$(dirname "${BASH_SOURCE[0]}")"
# 1
docker-compose up --detach
# 2
./tests.sh
# 3
docker-compose down --rmi all --remove-orphans
Step (2) might fail and abort the script.
How can I ensure that all Docker services from step (1) are shut-down in all cases?
Typically, with a trap:
trap_exit() {
    docker-compose down --rmi all --remove-orphans
}
trap trap_exit EXIT
./tests.sh
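Integrated into the script from the question, that might look like this (same commands, just rearranged around the trap):
#!/bin/bash
set -e
cd "$(dirname "${BASH_SOURCE[0]}")"

trap_exit() {
    docker-compose down --rmi all --remove-orphans
}
# run the cleanup whether tests.sh succeeds, fails, or set -e aborts the script
trap trap_exit EXIT

# 1
docker-compose up --detach
# 2
./tests.sh
# 3 is now handled by the EXIT trap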
I am running spark workers using docker, replicated using a docker-compose setup:
version: '2'
services:
  spark-worker:
    image: bitnami/spark:latest
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://1.1.1.1:7077
    deploy:
      mode: replicated
      replicas: 4
When I run docker-compose exec spark-worker ls, for example, it only runs on one replica (usually the first). Is there a way to broadcast such commands to all of the replicas?
docker-compose version 1.29.2, build 5becea4c
Docker version 20.10.7, build f0df350
There's no built-in facility for this, but you could construct something fairly easily. For example:
docker ps -q --filter label=com.docker.compose.service=spark-worker |
    xargs -ICID docker exec CID ls
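One caveat: that label filter matches the service name across every compose project on the host. If that could be ambiguous, the project label can be added as well (myproject is a placeholder for whatever your compose project is actually called):
docker ps -q \
    --filter label=com.docker.compose.project=myproject \
    --filter label=com.docker.compose.service=spark-worker |
    xargs -ICID docker exec CID ls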
I would propose a simpler command than what larsks proposed, one which leverages the docker-compose command itself.
SERVICE_NAME=spark-worker

for id in $(docker-compose ps -q "$SERVICE_NAME"); do
    docker exec -t "$id" echo hello
done
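If this comes up regularly, the loop can be wrapped in a small helper function (a sketch; compose_exec_all is not an existing command):
# run an arbitrary command in every container of a compose service
compose_exec_all() {
    local service="$1"; shift
    for id in $(docker-compose ps -q "$service"); do
        docker exec -t "$id" "$@"
    done
}

# usage: compose_exec_all spark-worker ls /tmp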
I am new to the big data domain, and this is my first time using Docker. I just found this amazing project: https://kiwenlau.com/2016/06/26/hadoop-cluster-docker-update-english/ which creates a Hadoop cluster composed of one master and two slaves using Docker.
After completing the installation, I ran the containers and they worked fine. There is a start-containers.sh file which lets me launch the cluster. I decided to install some tools like Sqoop to import my local relational database into HBase, and that worked fine. After that I stopped all Docker containers on my PC by typing
docker stop $(docker ps -a -q)
The next day, when I tried to relaunch the containers by running the same script ./start-container.sh, I got this error:
start hadoop-master container...
start hadoop-slave1 container...
start hadoop-slave2 container...
Error response from daemon: Container
e942e424a3b166452c9d2ea1925197d660014322416c869dc4a982fdae1fb0ad is
not running
Even when I launch this container, the containers of my cluster cannot connect to each other, and I can't access the data stored in HBase.
First, can anyone tell me why this container isn't running?
PS: in the start-container.sh file there is a line which removes containers if they already exist before creating them. I deleted this line, because if the containers are removed, I have to redo everything from the beginning every time.
After searching, I found that it is preferable to use docker-compose, which lets me launch all the containers together.
But I can't figure out how to translate my start-container.sh file into a docker-compose.yml file. Is this the best way to launch all my containers at the same time? This is the content of the start-containers.sh file:
#!/bin/bash
sudo docker network create --driver=bridge hadoop
# the default node number is 3
N=${1:-3}
# start hadoop master container
#sudo docker rm -f hadoop-master &> /dev/null
echo "start hadoop-master container..."
sudo docker run -itd \
    --net=hadoop \
    -p 50070:50070 \
    -p 8088:8088 \
    -p 7077:7077 \
    -p 16010:16010 \
    --name hadoop-master \
    --hostname hadoop-master \
    spark-hadoop:latest &> /dev/null
# sudo docker run -itd \
# --net=hadoop \
# -p 5432:5432 \
# --name postgres \
# --hostname hadoop-master \
# -e POSTGRES_PASSWORD=0000
# --volume /media/mobelite/0e5603b2-b1ad-4662-9869-8d0873b65f80/postgresDB/postgresql/10/main:/var/lib/postgresql/data \
# sameersbn/postgresql:10-2 &> /dev/null
# start hadoop slave container
i=1
while [ $i -lt $N ]
do
    # sudo docker rm -f hadoop-slave$i &> /dev/null
    echo "start hadoop-slave$i container..."
    port=$(( 8040 + $i ))
    sudo docker run -itd \
        -p $port:8042 \
        --net=hadoop \
        --name hadoop-slave$i \
        --hostname hadoop-slave$i \
        spark-hadoop:latest &> /dev/null
    i=$(( $i + 1 ))
done
# get into hadoop master container
sudo docker exec -it hadoop-master bash
Problems with restarting containers
I am not sure whether I have understood the mentioned problems with restarting containers correctly, so in the following I will concentrate on potential issues I can see from the script and the error messages:
When starting containers without --rm, they will remain in place after being stopped. If one afterwards tries to run a container with the same port mappings or the same name (both the case here!), that fails because the container already exists; effectively, no container is started in the process. To solve this problem, one should either re-create containers every time (and store all important state outside of the containers) or detect an existing container and start it if it exists. With names it can be as easy as doing:
if ! docker start hadoop-master; then
    docker run -itd \
        --net=hadoop \
        -p 50070:50070 \
        -p 8088:8088 \
        -p 7077:7077 \
        -p 16010:16010 \
        --name hadoop-master \
        --hostname hadoop-master \
        spark-hadoop:latest &> /dev/null
fi
and similarly for the other entries. Note that I do not understand why one would use the combination -itd (interactive, allocate a TTY, but run in the background) for a service container like this; I'd recommend going with just -d here. A sketch of the same pattern for the slave loop follows below.
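For the slave containers started in the loop, the same start-or-run pattern could look like this (a sketch reusing the variables from the original script, and using -d as recommended above):
i=1
while [ $i -lt $N ]
do
    port=$(( 8040 + $i ))
    if ! docker start hadoop-slave$i; then
        docker run -d \
            -p $port:8042 \
            --net=hadoop \
            --name hadoop-slave$i \
            --hostname hadoop-slave$i \
            spark-hadoop:latest &> /dev/null
    fi
    i=$(( $i + 1 ))
done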
Other general scripting advice: prefer bash -e (it causes the script to stop on unhandled errors).
Docker-Compose vs. Startup Scripts
The question contains some doubt whether docker-compose should be the way to go or if a startup script should be preferred. From my point of view, the most important differences are these:
Scripts are good for flexibility: whenever things need to be detected from the environment beyond environment variables, scripts provide the flexibility to execute commands and to behave in an environment-dependent way. One might argue that depending on the environment like this partially goes against the spirit of container isolation, but a lot of Docker environments are used for testing purposes where this is not the primary concern.
docker-compose provides a few distinct advantages "out of the box". There are the commands up and down (and even radical ones like down -v --rmi all) which allow environments to be created and destroyed quickly. When scripting, one needs to implement all these things separately, which will often result in less complete solutions. An often-overlooked advantage is portability: docker-compose exists for Windows as well. Another interesting feature (although not as "easy" as it sounds) is the ability to deploy docker-compose.yml files to Docker clusters. Finally, docker-compose also provides some additional isolation (e.g. all containers become part of a network specifically created for this docker-compose instance by default).
From Startup Script to Docker-Compose
The start script at hand is already in good shape for moving to a docker-compose.yml file instead. The basic idea is to define one service per docker run instruction and to transform the command-line arguments into their respective docker-compose.yml names. The documentation covers the options quite thoroughly.
The idea could be as follows:
version: "3.2"
services:
  hadoop-master:
    image: spark-hadoop:latest
    ports:
      - 50070:50070
      - 8088:8088
      - 7077:7077
      - 16010:16010
  hadoop-slave1:
    image: spark-hadoop:latest
    ports:
      - 8041:8042
  hadoop-slave2:
    image: spark-hadoop:latest
    ports:
      - 8042:8042
  hadoop-slave3:
    image: spark-hadoop:latest
    ports:
      - 8043:8042
Btw. I could not test the docker-compose.yml file because the image spark-hadoop:latest does not seem to be available through docker pull:
# docker pull spark-hadoop:latest
Error response from daemon: pull access denied for spark-hadoop, repository does not exist or may require 'docker login'
But the file above might be enough to get an idea.
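Assuming the file above is saved as docker-compose.yml next to the project, the day-to-day usage would then roughly replace the script like this (docker-compose creates a project network by default, so the manual docker network create line is no longer strictly needed):
docker-compose up -d                      # start hadoop-master and the slave containers together
docker-compose exec hadoop-master bash    # replaces 'sudo docker exec -it hadoop-master bash'
docker-compose down                       # stop and remove the whole cluster again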
I have a shell script (.sh) to stop and remove my running containers, which usually works very well. But about one time in ten, the docker rm command seems to be executed before docker stop has finished, so it raises an error that the container is still running. What can I do to prevent that error?
docker stop nostalgic_shirley
docker rm nostalgic_shirley
Output
nostalgic_shirley
nostalgic_shirley
Error response from daemon: You cannot remove a running container xxxx. Stop the container before attempting removal or force remove
If I then try docker rm nostalgic_shirley again, it works fine.
I should not use docker stop --force, as the shutdown should be graceful.
If you really care about stopping the container before deleting it:
docker stop nostalgic_shirley
while [ ! -z "$(docker ps | grep nostalgic_shirley)" ]; do sleep 2; done
docker rm nostalgic_shirley
If not, simply delete it with the "force" option (you can even skip the stop step):
docker rm -f nostalgic_shirley
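An alternative to the polling loop is docker wait, which blocks until the container has actually exited, so docker rm cannot run too early:
docker stop nostalgic_shirley
docker wait nostalgic_shirley    # prints the exit code once the container has really stopped
docker rm nostalgic_shirley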
Do you by any chance have some other process or user logged in to the container at the times when it does not stop?
Since, as you mentioned, this does not happen frequently: have you checked the state of the environment (CPU utilization, memory, etc.) when it happens?
Have you tried
a) stopping the container with an explicit timeout, i.e. the "-t" flag of the "docker stop" command:
-t, --time=10
Seconds to wait for stop before killing it
b) using the "-f" flag of the "docker rm" command:
-f, --force[=false]
Force the removal of a running container (uses SIGKILL)
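For example, for option (a) (30 is just an illustrative value; pick whatever your application needs to shut down gracefully):
docker stop -t 30 nostalgic_shirley    # wait up to 30 seconds before the container is killed
docker rm nostalgic_shirley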
I'm running Docker Toolbox on VirtualBox on Windows 10.
I'm having an annoying issue where, if I docker exec -it mycontainer sh into a container to inspect things, the shell will abruptly and randomly exit back to the host shell while I'm typing commands. Some experimenting reveals that pressing two letters at the same time (as is common when touch typing) causes the exit.
The container will still be running.
Any ideas what this is?
More details
Here's a minimal docker image I'm running inside. Essentially, I'm trying to deploy kubernetes clusters to AWS via kops, but because I'm on Windows, I have to use a container to run the kops commands.
FROM alpine:3.5
#install aws-cli
RUN apk add --no-cache \
    bind-tools \
    python \
    python-dev \
    py-pip \
    curl
RUN pip install awscli
#install kubectl
RUN curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl
RUN chmod +x ./kubectl
RUN mv ./kubectl /usr/local/bin/kubectl
#install kops
RUN curl -LO https://github.com/kubernetes/kops/releases/download/$(curl -s https://api.github.com/repos/kubernetes/kops/releases/latest | grep tag_name | cut -d '"' -f 4)/kops-linux-amd64
RUN chmod +x kops-linux-amd64
RUN mv kops-linux-amd64 /usr/local/bin/kops
I build this image:
docker build -t mykube .
I run this in the working directory of the project I'm trying to deploy:
docker run -dit -v "${PWD}":/app mykube
I exec into the shell:
docker exec -it $containerid sh
Inside the shell, I start running AWS commands as per here.
Here's some example output:
##output of previous dig command
;; Query time: 343 msec
;; SERVER: 10.0.2.3#53(10.0.2.3)
;; WHEN: Wed Feb 14 21:32:16 UTC 2018
;; MSG SIZE rcvd: 188
##me entering a command
/ # aws s3 mb s3://clus
##shell exits abruptly to host shell while I'm writing
DavidJ#DavidJ-PC001 MINGW64 ~/git-workspace/webpack-react-express (master)
##container is still running
$ docker ps --all
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
37a341cfde83 mykube "/bin/sh" 5 minutes ago Up 3 minutes gifted_bhaskara
##nothing in docker logs
$ docker logs --details 37a341cfde83
A more useful update
Adding the -D flag gives an important clue:
$ docker -D exec -it 04eef8107e91 sh -x
DEBU[0000] Error resize: Error response from daemon: no such exec
/ #
/ #
/ #
/ #
/ # sdfsdfjskfdDEBU[0006] [hijack] End of stdin
DEBU[0006] [hijack] End of stdout
Also, I've ascertained that what specifically is causing the issue is pressing two letters at the same time (which is quite common when I'm touch typing).
There appears to be a GitHub issue for this here, though that one is for Docker for Windows, not Docker Toolbox.
This issue appears to be a bug with Docker and Windows. See the GitHub issue here.
As a workaround, prefix your docker exec command with winpty, which comes with Git Bash.
eg.
winpty docker exec -it mycontainer sh
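If you exec into containers often, a small wrapper function in your Git Bash profile saves typing winpty each time (just a convenience sketch; dsh is a made-up name):
# in ~/.bashrc of Git Bash
dsh() {
    winpty docker exec -it "$1" sh
}
# usage: dsh mycontainer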
Check the USER you are logged in as when doing a docker exec -it yourContainer sh.
Its .bashrc, .bash_profile or .profile might include a command which would explain why the session abruptly quits.
Check also the logs associated to that container (docker logs --details yourContainer) in order to see if that closed session generated anything in stderr.
Reasons I can think of for a process to be killed in your container include:
PID 1 exiting in the container. This would cause the container to go into a stopped state, but a restart policy could have restarted it. Check your docker container inspect output to see if this is happening (see the sketch after this list). This is the most common cause I've seen.
Out of memory on the OS, where the kernel would then kill processes. View your system logs and dmesg to see if this is happening.
Exceeding the container memory limit, where docker would kill the container, possibly restarting it depending on your policy. You would again view docker container inspect but the status will have different details.
Process being killed on the host, potentially by a security tool.
Perhaps a selinux or apparmor policy being violated.
Networking issues. Never encountered it myself, but since docker is a client / server design, there's a potential for a network disconnect to drop the exec session.
The server itself is failing, and you'd see various logs in syslog / dmesg indicating problems it can't recover from.
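A few commands that may help check the first points above (a sketch; yourContainer is a placeholder):
# did PID 1 exit, was the container OOM-killed, and has it been restarted?
docker container inspect \
    --format 'ExitCode={{.State.ExitCode}} OOMKilled={{.State.OOMKilled}} RestartCount={{.RestartCount}}' \
    yourContainer

# did the kernel kill anything on the host recently?
dmesg | grep -i -E 'killed process|out of memory'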