I'm new to Docker and still searching for a safe way to update production code without losing any valuable data.
So far the way we update our production machine is like this:
docker build the new code
docker push the image
docker pull the image (on the preferred machine)
docker stack rm && docker stack deploy
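Under stated assumptions (an example registry, image tag, stack name, and compose file, none of which appear in the original), the steps above can be sketched as:

```shell
# On the build machine: build the new code and push the image
docker build -t registry.example.com/myapp:v2 .
docker push registry.example.com/myapp:v2

# On the production machine: pull the new image and redeploy the stack
docker pull registry.example.com/myapp:v2
docker stack rm mystack
docker stack deploy -c docker-compose.yml mystack
```

Note that `docker stack deploy` over an existing stack also performs a rolling update, so the `rm` step is not always necessary.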
I've read countless guides about backups, but I still can't understand whether you lose anything, and what exactly you lose, if you don't back up and something goes wrong. So I have some questions:
When you docker stack rm the container, do you delete it? And if so, do I lose anything by doing that (e.g. volumes)?
Should I back up the container and its volumes (I still don't understand how to do that), or just the image? Or is it enough to create a new tag when I docker build my new code?
Thank you
When you docker rm a container, you delete the container filesystem, but you don't affect any volumes that might have been attached to that container. If you docker run a new container that mounts the same volumes, it will see their content.
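A quick way to see this behavior, assuming a local Docker daemon and the alpine image (the volume name is an example):

```shell
# Create a named volume and write into it from a throwaway container
docker volume create demo-data
docker run --rm -v demo-data:/data alpine sh -c 'echo hello > /data/greeting'

# The first container is gone (--rm removed it), but a new container
# mounting the same volume still sees the content
docker run --rm -v demo-data:/data alpine cat /data/greeting   # prints "hello"
```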
You'd never back up an entire container. You do need to back up the contents of volumes.
A good practice is to design your application to not store anything in local files at all: store absolutely everything in a database or other "remote" storage. The actual storage doesn't have to be in Docker. Then you can back up the database the same way you would any other database, and freely delete and create as many copies of the container as you need (possibly by adjusting replica counts in Swarm or Kubernetes).
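One common pattern for backing up a volume's contents is to mount it read-only into a throwaway container and archive it to the host. A sketch, where pgdata, my-postgres, and mydb are example names, not anything from your setup:

```shell
# Archive the contents of a named volume into the current host directory
docker run --rm \
  -v pgdata:/data:ro \
  -v "$PWD:/backup" \
  alpine tar czf /backup/pgdata-backup.tar.gz -C /data .

# For a database, a logical dump is usually preferable to copying raw files
docker exec my-postgres pg_dump -U postgres mydb > mydb.sql
```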
Related
I have read that there is a significant performance hit when mounting shared volumes on Windows. How does this compare to only having, say, the Postgres DB inside a Docker volume (not shared with the host OS), or to the rate of reading/writing flat files?
Has anyone found any concrete numbers around this? I think even a 4x slowdown would be acceptable for my use case if it only affects disk I/O performance... I get the impression that mounted, shared volumes are significantly slower on Windows, so I want to know whether forgoing the sharing component helps improve matters into an acceptable range.
Also, if I left Postgres on bare metal, could all of my Docker apps still access it that way? (That's probably preferred, I would imagine; I have seen reports of 4x faster reads/writes when staying on bare metal.) But I still need to know, because my apps also do a lot of copying, reading, and moving of flat files, so I need to know what is best for that.
For example, if shared volumes are really bad versus keeping the data only in the container, then I have the option of pushing files over the network to avoid a shared mounted volume becoming a bottleneck...
Thanks for any insights
You only pay this performance cost for bind-mounted host directories. Named Docker volumes or the Docker container filesystem will be much faster. The standard Docker Hub database images are configured to always use a volume for storage, so you should use a named volume for this case.
docker volume create pgdata
docker run -v pgdata:/var/lib/postgresql/data -p 5432:5432 postgres:12
You can also run PostgreSQL directly on the host. On systems using the Docker Desktop application you can access it via the special hostname host.docker.internal. This is discussed at length in From inside of a Docker container, how do I connect to the localhost of the machine?.
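A sketch of reaching a host-local PostgreSQL from a container, assuming the host server is running and accepts the connection; host.docker.internal works out of the box on Docker Desktop, and on Linux (Docker 20.10+) you can map it explicitly:

```shell
# Docker Desktop: the special hostname resolves to the host automatically
docker run --rm postgres:12 \
  psql -h host.docker.internal -U postgres -c 'SELECT 1'

# Linux: map the hostname to the host gateway yourself
docker run --rm --add-host host.docker.internal:host-gateway postgres:12 \
  psql -h host.docker.internal -U postgres -c 'SELECT 1'
```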
If you're using the Docker Desktop application, and you're using volumes for:
Opaque database storage, like the PostgreSQL data: use a named volume; it will be faster and you can't usefully directly access the data even if you did have it on the host
Injecting individual config files: use a bind mount; these are usually only read once at startup so there's not much of a performance cost
Exporting log files: use a bind mount; if there is enough log I/O to be a performance problem you're probably actively debugging
Your application source code: don't use a volume at all, run the code that's in the image, or use a native host development environment
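Putting the first three recommendations together (image names, paths, and config filenames below are illustrative, not from any particular setup):

```shell
# Opaque database storage: named volume
docker run -d --name db -v pgdata:/var/lib/postgresql/data postgres:12

# Injecting a single config file: read-only bind mount
docker run -d --name app \
  -v "$PWD/app.conf:/etc/myapp/app.conf:ro" myapp:latest

# Exporting log files: bind mount a log directory
docker run -d --name worker -v "$PWD/logs:/var/log/myapp" myapp:latest
```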
I want to understand better how Kubernetes works, and there are some doubts I haven't found answered in the documentation.
I have a simple Kubernetes Cluster, a Master, and 2 Workers.
I have created a Docker image of my app, which is stored in Docker Hub.
I created a deployment_file.yaml, where I state that I want to deploy my app container on worker 3, thanks to node affinity.
If imagePullPolicy is set to Always:
Who downloads the image from Docker Hub: the master itself, or the worker where this image will be deployed?
If it is the master that pulls the image, does it then transfer replicas of this image to the workers?
When the image is pulled, is it stored in any local folder in Kubernetes?
I would like to understand better how data is transferred. Thanks.
Each of the minions (workers) will pull the Docker image and store it locally. docker image ls will show the list of images on the minions.
To see where the images are stored, take a look at the SO answer here.
Who is downloading the image from dockerhub?
Kubernetes itself doesn't pull anything. Images are downloaded by the container runtime's client, e.g. Docker.
When the image is pulled, is it stored in any local folder in Kubernetes?
Again, this is the runtime's task. Depending on the runtime and its configuration, the location may differ. In the case of Docker, it works as described in Praveen Sripati's link.
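You can confirm this on any node by asking the Docker daemon directly; its data root is where image layers end up:

```shell
# List the images the local daemon has cached
docker image ls

# Where the daemon stores image layers, containers, and volumes
docker info --format '{{ .DockerRootDir }}'   # typically /var/lib/docker
```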
I'm on OS X, and I would like to know if it is possible to persist a container between OS reboots. I'm currently using my machine to host my code and to install platforms and languages like Node.js and Golang. I would like to create my environment inside a container, and also leave my code inside it, but without losing the container if my machine reboots. Is it possible? I didn't find anything related.
Your container is not removed when your system reboots, unless you started it with --rm, which removes the container when it stops.
Your container will restart automatically after a reboot if you start it with docker run -dit --restart always my_image.
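For example (container and image names are illustrative), you can verify the policy that was recorded with docker inspect:

```shell
# Start a container that restarts across daemon restarts and host reboots
docker run -d --name web --restart always nginx

# Confirm the restart policy recorded on the container
docker inspect -f '{{ .HostConfig.RestartPolicy.Name }}' web   # prints "always"
```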
As for "also leave my code inside it": there are two solutions to avoid losing data, code, or any other configuration.
You lose data because
It is possible to store data within the writable layer of a container,
but there are some downsides:
The data doesn’t persist when that container is no longer running, and
it can be difficult to get the data out of the container if another
process needs it.
https://docs.docker.com/storage/
So here is the solution.
Docker offers three different ways to mount data into a container from
the Docker host: volumes, bind mounts, or tmpfs mounts. When in
doubt, volumes are almost always the right choice. Keep reading for
more information about each mechanism for mounting data into
containers.
https://docs.docker.com/storage/#good-use-cases-for-tmpfs-mounts
Here is how you can persist the Node.js code and Golang code:
docker run -v /nodejs-data-host:/nodejs-container -v /go-data-host:/godata-container -dit your_image
As for the packages and runtimes (Node.js and Go): they persist even if your container is killed or stopped, because they are stored in the Docker image.
I pulled a standard docker ubuntu image and ran it like this:
docker run -i -t ubuntu bash -l
When I do an ls inside the container I see a proper filesystem and I can create files, etc. How is this different from a VM? Also, what is the limit on how big a file I can create on this container filesystem? And is there a way I can create a file inside the container filesystem that persists in the host filesystem after the container is stopped or killed?
How is this different from a VM?
A VM will lock and allocate resources (disk, CPU, memory) for its full stack, even if it does nothing.
A container isolates resources from the host (disk, CPU, memory), but won't actually use them unless it does something. You can launch many containers; if they are doing nothing, they won't use memory, CPU, or disk.
Regarding disk, containers launched from the same image share the same filesystem layers; through a copy-on-write (CoW) mechanism and UnionFS, a writable layer is added when you write inside the container.
That layer will be lost when the container exits and is removed.
To persist data written in a container, see "Manage data in a container"
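As a minimal sketch of the last question above (a file that survives the container), bind-mount a host directory and write into it; the path is illustrative:

```shell
# Files written under /out land in the host directory and outlive the container
mkdir -p "$HOME/container-out"
docker run --rm -v "$HOME/container-out:/out" ubuntu \
  bash -c 'echo saved > /out/file.txt'
cat "$HOME/container-out/file.txt"   # prints "saved"
```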
For more, read the insightful article from Jessie Frazelle "Setting the Record Straight: containers vs. Zones vs. Jails vs. VMs"
I am mounting a host machine directory as a volume into Docker to share files from the host.
It has a git repository that is used by all Docker instances. I want different Docker instances to check out different branches and use them in parallel, without copying the repository locally again, as that would take more time and storage.
Instead, I want a common place that initially shares the files into the Docker instances; each container should then work on them independently and consume space only for its incremental changes.
A Docker volume with read/write permission does not seem suitable for this use case. Is there any other way to achieve this?
Edited:
Currently I am using a volume, but it becomes a common place for all Docker instances. If docker-1 checks out branch A and docker-2 checks out branch B at the same time, that changes the files for docker-1.