I pulled a standard docker ubuntu image and ran it like this:
docker run -i -t ubuntu bash -l
When I do an ls inside the container I see a proper filesystem and I can create files, etc. How is this different from a VM? Also, what limits how big a file I can create on this container filesystem? And is there a way I can create a file inside the container filesystem that persists in the host filesystem after the container is stopped or killed?
How is this different from a VM?
A VM will lock and allocate resources (disk, CPU, memory) for its full stack, even if it does nothing.
A container isolates resources from the host (disk, CPU, memory), but won't actually use them unless it does something. You can launch many containers; if they are doing nothing, they won't use memory, CPU or disk.
Regarding the disk, containers launched from the same image share the same filesystem layers; through a copy-on-write (CoW) mechanism and UnionFS, a writable layer is added on top and records your changes when you write inside the container.
That layer will be lost when the container exits and is removed.
To persist data written in a container, see "Manage data in a container".
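To answer the last part of the question directly, a minimal sketch using a bind mount (the host directory name data is just an example):
docker run -i -t -v "$(pwd)/data:/data" ubuntu bash -l
touch /data/kept-after-exit.txt    # run inside the container; the file remains in ./data on the host after the container exits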
For more, read the insightful article from Jessie Frazelle "Setting the Record Straight: containers vs. Zones vs. Jails vs. VMs"
Related
I have read that there is a significant hit to performance when mounting shared volumes on Windows. How does this compare to only having, say, the Postgres DB inside a Docker volume (not shared with the host OS), or to the rate of reading/writing from/to flat files?
Has anyone found any concrete numbers around this? I think even a 4x slowdown would be acceptable for my use case if it only affects disk I/O performance... I get the impression that mounted + shared volumes are significantly slower on Windows, so I want to know whether foregoing the sharing component would help improve matters into an acceptable range.
Also, if I left Postgres on bare metal, can all of my Docker apps still access Postgres that way? (That's probably preferred, I would imagine; I have seen reports of 4x faster read/write staying on bare metal.) But I still need to know, because my apps also deal with lots of copying/reading/moving of flat files, so I need to know what is best for that as well.
For example, if shared volumes are really bad vs. keeping files only in the container, then I have the option to push files over the network to avoid a shared mounted volume becoming a bottleneck...
Thanks for any insights
You only pay this performance cost for bind-mounted host directories. Named Docker volumes or the Docker container filesystem will be much faster. The standard Docker Hub database images are configured to always use a volume for storage, so you should use a named volume for this case.
docker volume create pgdata
docker run -e POSTGRES_PASSWORD=change-me -v pgdata:/var/lib/postgresql/data -p 5432:5432 postgres:12
You can also run PostgreSQL directly on the host. On systems using the Docker Desktop application you can access it via the special hostname host.docker.internal. This is discussed at length in From inside of a Docker container, how do I connect to the localhost of the machine?.
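For example, a minimal sketch of pointing a containerized app at the host database (the image name my-app is a placeholder; PGHOST and PGPORT are the standard libpq client variables):
docker run -d -e PGHOST=host.docker.internal -e PGPORT=5432 my-app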
If you're using the Docker Desktop application, and you're using volumes for:
Opaque database storage, like the PostgreSQL data: use a named volume; it will be faster and you can't usefully directly access the data even if you did have it on the host
Injecting individual config files: use a bind mount; these are usually only read once at startup so there's not much of a performance cost (a combined sketch follows this list)
Exporting log files: use a bind mount; if there is enough log I/O to be a performance problem you're probably actively debugging
Your application source code: don't use a volume at all, run the code that's in the image, or use a native host development environment
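A combined sketch of the first two cases, using the pgdata volume from above (the custom config file name, its path inside the container, and the password value are assumptions; the official postgres image passes trailing arguments such as -c on to the server):
docker run -d \
  -v pgdata:/var/lib/postgresql/data \
  -v "$PWD/my-postgres.conf:/etc/postgresql/postgresql.conf:ro" \
  -e POSTGRES_PASSWORD=change-me \
  -p 5432:5432 \
  postgres:12 -c config_file=/etc/postgresql/postgresql.conf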
I'm on OSX, and I would like to know if it is possible to persist a container between OS reboots. I'm currently using my machine to host my code and to install platforms and languages like Node.js and Golang. I would like to create my environment inside a container, and also leave my code inside it, but without losing the container if my machine reboots. Is this possible? I haven't found anything related.
Your container is not deleted when your system reboots (it is only stopped), unless you started it with --rm, which removes the container when it stops.
Your container will restart automatically after a reboot if you start it with docker run -dit --restart always my_image (the last argument is the image name).
As per " also leave my codes inside it" this question is concern there is two solution to avoid loss of data or code and any other configuration.
You lose data because:
It is possible to store data within the writable layer of a container, but there are some downsides: the data doesn't persist when that container is no longer running, and it can be difficult to get the data out of the container if another process needs it.
https://docs.docker.com/storage/
So here is the solution.
Docker offers three different ways to mount data into a container from the Docker host: volumes, bind mounts, or tmpfs mounts. When in doubt, volumes are almost always the right choice. Keep reading for more information about each mechanism for mounting data into containers.
https://docs.docker.com/storage/#good-use-cases-for-tmpfs-mounts
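For a quick comparison, a minimal sketch of the three mechanisms (your_image and the host path are placeholders):
docker run -dit -v mydata:/data your_image        # named volume, managed by Docker, survives container removal
docker run -dit -v /data-host:/data your_image    # bind mount, a host directory you can also see on the host
docker run -dit --tmpfs /data your_image          # tmpfs mount, kept in memory only and lost when the container stops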
Here is how you can persist the Node.js code and the Go code:
docker run -v /nodejs-data-host:/nodejs-container -v /go-data-host:/godata-container -dit your_image
As for the packages and runtimes (Node.js and Go), they persist even if your container is killed or stopped, because they are stored in the Docker image.
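If you installed the runtimes by hand inside a running container rather than baking them into the image, one hedged option (the container and image names below are placeholders) is to commit the container to a new image so the installed packages survive:
docker commit my_container my_dev_image
docker run -dit -v /nodejs-data-host:/nodejs-container -v /go-data-host:/godata-container my_dev_image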
What are the implications of exporting /var/lib/docker over NFS? The idea is to store the Docker images on a server and export the directory to hosts that have limited local storage for storing and running containers. This would be useful to avoid having each host download and store its own library of Docker images. The hosts may make use of FS-Cache to limit the data transfer over the network.
The /var/lib/docker directory is designed to be exclusively accessed by a single daemon, and should never be shared with multiple daemons.
Having multiple daemons use the same /var/lib/docker can lead to many issues, and possible data corruption.
For example, the daemon keeps an in-memory state of which images are in use (by containers) and which are not; multiple daemons using the same images won't keep track of each other's state (an image may be in use by another daemon), and one daemon may remove an image while another is still using it.
Docker also stores various other files in /var/lib/docker, such as a key/value store for user-defined networks, which is not designed to be accessed concurrently by multiple daemons.
I ran into some issues with my EC2 micro instance and had to terminate it and create a new one in its place. But it seems even though the old instance is no longer visible in the list, it is still using up some space on my disk. My df -h is listed below:
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1      7.8G  7.0G  719M  91% /
When I go to the EC2 console I see there are 3 volumes, each 8 GB, in the list. One of them is attached (/dev/xvda) and is showing as "in-use". The other 2 are simply showing as "Available".
Is the terminated instance really using up my disk space? If yes, how to free it up?
I have just solved my problem by running this command:
sudo apt autoremove
and a lot of old packages were removed, for instance many files like linux-aws-headers-4.4.0-1028.
Amazon Elastic Block Storage (EBS) is a service that provides virtual disks for use with Amazon EC2. It is network-attached storage that persists even when an EC2 instance is stopped or terminated.
When launching an Amazon EC2 instance, a boot volume is automatically attached to the instance. The contents of the boot volume are copied from an Amazon Machine Image (AMI), which can be chosen from a pre-populated list (including the ability to create your own AMI).
When an Amazon EC2 instance is Stopped, all EBS volumes remain attached to the instance. This allows the instance to be Started with the same configuration as when it was stopped.
When an Amazon EC2 instance is Terminated, EBS volumes might or might not be deleted, based upon the Delete on Termination setting of each volume:
By default, boot volumes are deleted when an instance is terminated. This is because the volume was originally just a copy of an AMI, so there is unlikely to be any important data on the volume. (Hint: Don't store data on a boot volume.)
Additional volumes default to "do not delete on termination", on the assumption that they contain data that should be retained. When the instance is terminated, these volumes will remain in an Available state, ready to be attached to another instance.
So, if you do not require any content on your remaining EBS volumes, simply delete them. In future, when launching instances, keep an eye on the Delete on Termination setting to make the clean-up process simpler.
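A hedged AWS CLI sketch for finding and removing the leftover volumes (the volume ID is a placeholder):
aws ec2 describe-volumes --filters Name=status,Values=available
aws ec2 delete-volume --volume-id vol-UNUSED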
Please note that the df -h command only shows currently-attached volumes. It does not show the volumes in the Available state, since they are not visible to that instance. The concept of "Disk Space" typically refers to the space within an EBS volume, while "EBS Storage" refers to the volumes themselves. So, the 7.0G shown as used belongs to that specific (boot) volume.
If you are running out of space on an EBS volume, see: Expanding the Storage Space of an EBS Volume on Linux. Expanding the volume involves (a hedged CLI sketch follows this list):
Creating a snapshot
Creating a new (bigger) volume from the snapshot
Swapping the disks (requiring a Stop/Start if you are swapping a boot volume)
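Roughly, with the AWS CLI (all IDs, the size, and the availability zone below are placeholders; stop the instance first if you are swapping its boot volume):
aws ec2 create-snapshot --volume-id vol-OLD --description "before resize"
aws ec2 create-volume --snapshot-id snap-FROM-OLD --size 16 --availability-zone us-east-1a --volume-type gp2
aws ec2 detach-volume --volume-id vol-OLD
aws ec2 attach-volume --volume-id vol-NEW --instance-id i-INSTANCE --device /dev/xvda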
These 2 steps add an extra hard drive to your EC2 and format it for use (a minimal sketch follows the two links):
Attach an extra hard drive (EBS: Elastic Block Storage) to an EC2
Format an EBS drive attached to an EC2
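A minimal sketch of the formatting step, assuming the new volume shows up as /dev/xvdf (device names vary):
sudo mkfs -t ext4 /dev/xvdf
sudo mkdir /mnt/data
sudo mount /dev/xvdf /mnt/data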
Here's pricing info. Free Tier includes 30GB. Afterward it's $1.25/month for 10GB on a General Purpose SSD (gp2).
To see how much space you are using/need:
Check your current disk use/available in Linux with df -h.
Check the size of a directory in Linux with du -sh [path].
I need some sort of distributed file system running on a CoreOS cluster.
As such I'd like to run HDFS on CoreOS nodes. Is this possible?
I can see 2 options:
Expand CoreOS - Install HDFS directly onto CoreOS - not ideal as it breaks the whole concept of CoreOS's containerisation and would mean installing a lot of additional components
Somehow run HDFS in a Docker container on CoreOS and set affinities
Option 2 seems like the best approach; however, there are some potential blockers:
How do I reliably expose the physical disks to the Docker container running HDFS?
How do you scale container affinities?
How does this work with the NameNodes, etc.?
Cheers.
I'll try to provide two possibilities. I haven't tried either of these, so they are mostly suggestions. But could get you down the right path.
The first, if you want to do HDFS and it requires device access on the host, would be to run the HDFS daemons in a privileged container that had access to the required host devices (the disks directly). See https://docs.docker.com/reference/run/#runtime-privilege-linux-capabilities-and-lxc-configuration for information on the --privileged and --device flags.
In theory, you could pass the devices to the container that is handling access to the disks. Then you could use something like --link so the containers can talk to each other. The NameNode would store its metadata on the host using a volume (passed with -v). Though, given the little reading I have done about the NameNode, it seems there isn't a good solution yet for high availability anyway, and it remains a single point of failure.
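A purely hypothetical sketch of that idea (the image names, host path, and device are all assumptions, not a tested HDFS setup):
docker run -d --name namenode -v /hdfs-name-host:/hadoop/name my-hdfs-namenode-image
docker run -d --privileged --device=/dev/sdb:/dev/sdb --link namenode:namenode my-hdfs-datanode-image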
The second option to explore, if you are looking for a clustered file system and not HDFS in particular, would be to check out the recent Ceph FS support added to the kernel in CoreOS 471.1.0: https://coreos.com/releases/#471.1.0. You might then be able to use the same approach of privileged container to access host disks to build a Ceph FS cluster. Then you might have a 'data only' container that had Ceph tools installed to mount a directory on the Ceph FS cluster, and expose this as a volume for other containers to use.
Though both of these are only ideas and I haven't used HDFS or Ceph personally (though I am keeping an eye on Ceph and would like to try something like this soon as a proof of concept).