I am mounting a host machine directory into Docker containers as a volume to share files from the host.
It has a git repository that is used by all Docker instances. I want different Docker instances to check out different branches and work on them in parallel, without copying the repository locally again, since that would take more time and storage.
Instead, I want a common place that initially shares the files into all Docker instances; each instance should then process them differently and consume space only for its incremental changes.
A Docker volume with read/write permission doesn't seem suitable for this use case. Is there any other way to achieve this?
Edited:
Currently I am using a bind-mounted volume, but it becomes a single shared location for all Docker instances. If docker-1 checks out branch A and docker-2 checks out branch B at the same time, the second checkout changes the files docker-1 sees.
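For illustration, this is roughly the setup that causes the conflict (the image name and host path here are made up, and the image is assumed to contain git):

# both containers bind-mount the same host checkout
docker run -d --name docker-1 -v /srv/shared/repo:/repo my-image sleep infinity
docker run -d --name docker-2 -v /srv/shared/repo:/repo my-image sleep infinity

docker exec docker-1 git -C /repo checkout A
docker exec docker-2 git -C /repo checkout B   # this also switches the working tree that docker-1 sees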
I'm new to Docker and still searching for a safe way to update production code without losing any valuable data.
So far the way we update our production machine is like this:
docker build the new code
docker push the image
docker pull the image (on the preferred machine)
docker stack rm && docker stack deploy
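In concrete terms (the registry, image, and stack names below are just placeholders), that looks something like:

docker build -t registry.example.com/myapp:2.0 .
docker push registry.example.com/myapp:2.0

# on the production machine
docker pull registry.example.com/myapp:2.0
docker stack rm mystack
docker stack deploy -c docker-compose.yml mystack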
I've read countless guides about backups, but I still can't understand what, if anything, you lose when you don't back up and something goes wrong. So I have some questions:
When you docker stack rm the stack, do you delete its containers? And if so, do I lose anything by doing that (e.g. volumes)?
Should I back up the container and its volumes (which I still don't understand how to do), or just the image? Or is creating a new tag when I docker build my new code enough to keep me safe?
Thank you
When you docker rm a container, you delete the container filesystem, but you don't affect any volumes that might have been attached to that container. If you docker run a new container that mounts the same volumes, it will see their content.
You'd never back up an entire container. You do need to back up the contents of volumes.
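For example (the volume and container names below are arbitrary), you can see both points with a throwaway volume:

# data in a named volume survives removing the container
docker volume create appdata
docker run --name app1 -v appdata:/data busybox sh -c 'echo hello > /data/file'
docker rm app1
docker run --rm -v appdata:/data busybox cat /data/file   # still prints "hello"

# back up the volume's contents using a temporary container
docker run --rm -v appdata:/data -v "$PWD":/backup busybox \
  tar czf /backup/appdata-backup.tar.gz -C /data .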
A good practice is to design your application to not store anything in local files at all: store absolutely everything in a database or other "remote" storage. The actual storage doesn't have to be in Docker. Then you can back up the database the same way you would any other database, and freely delete and create as many copies of the container as you need (possibly by adjusting replica counts in Swarm or Kubernetes).
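For instance, if the data lives in a PostgreSQL container (the container and database names here are hypothetical), the backup is just an ordinary pg_dump:

docker exec mydb pg_dump -U postgres myapp > myapp-backup-$(date +%F).sql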
I have read that there is a significant hit to performance when mounting shared volumes on Windows. How does this compare to only having, say, the Postgres DB inside a Docker volume (not shared with the host OS), or to the rate of reading/writing flat files?
Has anyone found any concrete numbers around this? I think even a 4x slowdown would be acceptable for my use case if it only affects disk I/O performance... I get the impression that mounted, shared volumes are significantly slower on Windows, so I want to know whether forgoing the sharing component would bring things into an acceptable range.
Also, if I left Postgres on bare metal, could all of my Docker apps still access it that way? (That's probably preferred, I would imagine; I have seen reports of 4x faster read/write staying on bare metal.) But I still need to know, because my apps also deal with lots of copying/reading/moving of flat files, so I need to know what is best for those as well.
For example, if shared volumes really are that bad compared to keeping files only inside the container, then I have the option of pushing files over the network instead, to avoid a shared mounted volume becoming a bottleneck...
Thanks for any insights
You only pay this performance cost for bind-mounted host directories. Named Docker volumes or the Docker container filesystem will be much faster. The standard Docker Hub database images are configured to always use a volume for storage, so you should use a named volume for this case.
docker volume create pgdata
docker run -d -e POSTGRES_PASSWORD=secret -v pgdata:/var/lib/postgresql/data -p 5432:5432 postgres:12
You can also run PostgreSQL directly on the host. On systems using the Docker Desktop application you can access it via the special hostname host.docker.internal. This is discussed at length in From inside of a Docker container, how do I connect to the localhost of the machine?.
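For example (the password here is a placeholder), a quick connectivity check from a container to a host-local PostgreSQL on Docker Desktop could look like:

# run psql from the postgres image against the database on the host machine
docker run --rm -e PGPASSWORD=secret postgres:12 \
  psql -h host.docker.internal -U postgres -c 'SELECT 1'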
If you're using the Docker Desktop application, and you're using volumes for:
Opaque database storage, like the PostgreSQL data: use a named volume; it will be faster and you can't usefully directly access the data even if you did have it on the host
Injecting individual config files: use a bind mount; these are usually only read once at startup so there's not much of a performance cost
Exporting log files: use a bind mount; if there is enough log I/O to be a performance problem you're probably actively debugging
Your application source code: don't use a volume at all, run the code that's in the image, or use a native host development environment
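Putting those rules together, a hypothetical application container (image name and paths invented for illustration) might be started like this:

# appdata: named volume for opaque application data
# config.yml: a single config file bind-mounted read-only
# logs: bind mount so log files land on the host
# application code comes from the image itself
docker volume create appdata
docker run -d --name myapp \
  -v appdata:/var/lib/myapp \
  -v "$PWD/config.yml:/etc/myapp/config.yml:ro" \
  -v "$PWD/logs:/var/log/myapp" \
  my-app-image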
What are the implications of exporting /var/lib/docker over NFS? The idea is to store the Docker images on a server and export the directory to hosts that have limited local storage for storing and running containers. This would avoid having each host download and store its own library of Docker images. The hosts may make use of FS-Cache to limit the data transfer over the network.
The /var/lib/docker directory is designed to be exclusively accessed by a single daemon, and should never be shared with multiple daemons.
Having multiple daemons use the same /var/lib/docker can lead to many issues, and possible data corruption.
For example, the daemon keeps an in-memory state of which images are in use (by containers) and which are not; multiple daemons using the same images won't keep track of that (an image may be in use by another daemon) and may remove an image while it's still in use.
Docker also stores various other files in /var/lib/docker, such as a key/value store for user-defined networks, which is not designed to be accessed concurrently by multiple daemons.
I've been trying to get to grips with Amazon's AWS services for a client. As is evidenced by the very n00bish question(s) I'm about to ask, I'm having a little trouble wrapping my head around some very basic things:
a) I've played around with a few instances and managed to get LAMP working just fine. The problem I'm having is that the code I place in /var/www doesn't seem to be shared across those machines. What do I have to do to achieve this? I was thinking of a shared EBS volume and changing Apache's document root?
b) Furthermore what is the best way to upload code and assets to an EBS/S3 volume? Should I setup an instance to handle FTP to the aforementioned shared volume?
c) Finally I have a basic plan for the setup that I wanted to run by someone that actually knows what they are talking about:
DNS pointing to Load Balancer (AWS Elastic Beanstalk)
Load Balancer managing multiple AWS EC2 instances.
EC2 instances sharing code from a single EBS store.
An RDS instance to handle database queries.
CloudFront to serve assets directly to the user.
Thanks,
Rich.
Edit: My solution, for anyone that comes across this on Google.
Please note that my setup is not finished yet, and the bash scripts I'm providing in this explanation are probably not very good, as I have no experience of scripting in bash even though I'm very comfortable with the command line. However, it should at least show you how my setup works in theory.
All AMIs are Ubuntu Maverick i386 from Alestic.
I have two AMI snapshots:

Master

  Users:
    git - Very limited access; it runs git-shell, so it can't be used for an interactive SSH session, but it hosts a git repository that can be pushed to or pulled from.
    ubuntu - The default SSH account, used to administer the server and deploy code.

  Services:
    Simple git repository hosting via SSH.
    Apache and PHP; databases are hosted on Amazon RDS.

Slave

  Services:
    Apache and PHP; databases are hosted on Amazon RDS.
Right now (this will change), this is how I deploy code to my servers:

1. Merge changes into the master branch on my local machine.
2. Stop all slave instances.
3. Use Git to push the master branch to the master server.
4. Log in as the ubuntu user via SSH on the master server and run a script which:
     Exports (git archive) the code from the local repository to a folder.
     Compresses the folder and uploads a backup of the code to S3, with a timestamp attached to the file name.
     Replaces the code in /var/www/ with the folder's contents and sets appropriate permissions.
     Removes the exported folder from the home directory, but leaves the compressed file containing the latest code intact.
5. Start all slave instances. On startup they run a script:
     Apache does not start until it's triggered.
     Use scp (secure copy) to copy the latest compressed code from the master to /tmp/www.
     Extract the code, replace /var/www/, and set appropriate permissions.
     Start Apache.
I would provide full code examples, but my scripts are very incomplete and I need more time. I also want to get all my assets (css/js/img) automatically pushed to S3 so they can be distributed to clients via CloudFront.
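As a very rough sketch of the kind of script step 4 runs (the repository path, bucket name, and choice of S3 upload tool here are placeholders, not my exact setup):

#!/bin/bash
# Master-side deploy sketch: export, back up to S3, replace /var/www
set -e

STAMP=$(date +%Y%m%d%H%M%S)
EXPORT_DIR="$HOME/export"
ARCHIVE="$HOME/code-$STAMP.tar.gz"

# Export the master branch from the hosted repository into a folder
mkdir -p "$EXPORT_DIR"
git --git-dir=/home/git/repo.git archive master | tar -x -C "$EXPORT_DIR"

# Compress it and upload a timestamped backup to S3 (s3cmd is one option)
tar -czf "$ARCHIVE" -C "$EXPORT_DIR" .
s3cmd put "$ARCHIVE" s3://my-backup-bucket/code/

# Replace /var/www with the exported code and fix permissions
sudo rsync -a --delete "$EXPORT_DIR"/ /var/www/
sudo chown -R www-data:www-data /var/www

# Remove the exported folder but keep the latest compressed archive
rm -rf "$EXPORT_DIR"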
EBS is like a hard drive you can attach to one instance, basically a 1:1 mapping. S3 is the only shared storage option in AWS; otherwise you will need to set up an NFS server or similar.
What you can do is put all your PHP files on S3 and then sync them down to a new instance when you start it.
I would recommend bundling a custom AMI with everything you need installed (Apache, PHP, etc.) and setting up a cron job to sync the PHP files from S3 to your document root. Your workflow would be: upload files to S3, then let the server's cron job sync the files.
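For example (bucket name and schedule are just placeholders), using the AWS CLI as one option, the cron entry could be:

# pull updated PHP files from S3 every 5 minutes
*/5 * * * * aws s3 sync s3://my-code-bucket/www /var/www --delete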
The rest of your setup seems pretty standard.
I have a Scalr EC2 cluster, and want an easy way to synchronize files across all instances.
For example, I have a bunch of files in /var/www on one instance, and I want to be able to identify all of the other hosts and then rsync to each of those hosts to update their files.
ls /etc/aws/hosts/app/
returns the IP addresses of all of the other instances
10.1.2.3
10.1.33.2
10.166.23.1
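The brute-force loop I have in mind looks something like this (assuming root SSH access between instances):

# push /var/www to every app host listed in /etc/aws/hosts/app/
for ip in $(ls /etc/aws/hosts/app/); do
  rsync -az --delete /var/www/ "root@$ip:/var/www/"
done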
Ideas?
As Zach said you could use S3.
You could download one of the many clients out there for mapping drives to S3 (search for S3 and WebDAV).
If I were going to go this route, I would set up an S3 bucket with all my shared files and use JetS3t in a cron job to sync each node's local drive with the bucket (pulling down S3 bucket updates). Then, since I normally use Eclipse & Ant for building, I would create an Ant job for deploying updates to the S3 bucket (pushing updates up to the bucket).
From http://jets3t.s3.amazonaws.com/applications/synchronize.html
Usage: Synchronize [options] UP <S3Path> <File/Directory>
(...)
or: Synchronize [options] DOWN
UP : Synchronize the contents of the Local Directory with S3.
DOWN : Synchronize the contents of S3 with the Local Directory
...
I would recommend the above solution, if you don't need cross-node file locking. It's easy and every system can just pull data from a central location.
If you need more cross-node locking:
An ideal solution would be to use IBM's GPFS, but IBM doesn't just give it away (at least not yet). Even though it's designed for high-performance interconnects, it can also be used over slower connections. We used it as a replacement for NFS and it was amazingly fast (about 3 times faster than NFS). There may be something similar that is open source, but I don't know of one. EDIT: OpenAFS may work well for building a clustered filesystem over many EC2 instances.
Have you evaluated using NFS? Maybe you could dedicate one instance as an NFS host.