According to countless sources, Docker provides ultra-lightweight virtualization by sharing system resources across containers, instead of allocating copies of those resources per container.
I've even read articles where it is boasted that you could "run dozens, even hundreds of containers on the same VM."
But if my app requires 2GB RAM to run, and the underlying physical machine has only 8GB RAM on it, I would normally only be able to run 3 instances of my app on it (leaving ~2GB for system memory, utilities, etc.).
Does Docker do some kind of magic with RAM, allowing me to actually run dozens of containers, each one allocated 2GB RAM, but somehow sharing unused memory under the hood?
Or are those statements more media hype than anything else?
When people talk about running "dozens or hundreds of containers" they are normally thinking about microservices: small applications that each do a specific task. Each of these may have memory usage measured in KBs rather than MBs, and almost certainly not GBs, so there is no reason a decent machine couldn't run dozens or hundreds of them.
There is actually an ongoing competition (I believe) to get as many containers as possible running on a Raspberry Pi. The result currently stands at over a thousand, though admittedly those containers aren't running a real-life application.
Regarding memory, the answer is "it's complicated". If you're using the AUFS or Overlay driver, containers built from the same base image should be able to share memory pages, meaning shared libraries shouldn't need to be loaded twice for two containers. This isn't anything special, though; normal processes running on the host work the same way.
At the end of the day, containers are little more than isolated processes. We can easily run dozens or hundreds of processes on a host, so it's not unfeasible to run dozens or hundreds of containers.
A Docker container only consumes the resources it needs, as it needs them. So yes, you could literally run hundreds of containers on one box as long as they are not all actively consuming your resources. That is what makes Docker unique: a container will use what resources it can and then release them, making them available to another container on the same host. It is best practice to let the container and Docker handle allocating resources instead of hard-assigning them.
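For illustration, a minimal sketch (the image name my-app-image is made up; -m/--memory is the standard docker run flag). By default there is no memory limit, and even a hard cap is a ceiling rather than a reservation, so memory the container isn't using stays available to the rest of the host:
docker run -d my-app-image          # no limit: the container uses only what it actually needs
docker run -d -m 2g my-app-image    # optional hard cap of 2 GB (a ceiling, not a reservation)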
The alternative would be a virtual machine. Each virtual machine that you run has to run a full Linux kernel, and the host OS will hold a chunk of memory aside for the virtualized environment. This means that you can really only run a couple of VMs on all but the heaviest-duty hardware.
A container does NOT run a kernel; it just runs a single process (plus subprocesses). This means that you can run as many processes in containers as you could if you were running those same processes without containers: each thinks it is running on a separate machine, but they all just show up as processes on the host kernel.
There is no magic that will let you use RAM dozens of times over. But you can pack smaller processes together a LOT tighter than you could using virtual machines for separation.
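To see that a container really is just a host process, here is a quick sketch using the public busybox image (the container name is just for the demo):
docker run -d --name demo busybox sleep 1000
ps aux | grep "[s]leep 1000"    # the containerized sleep shows up in the host's process list
docker rm -f demo               # clean up the demo container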
I have a C# application that performs different kinds of processing, such as record insertion, text extraction, printing, etc.
Internally it runs different executables for the given modules.
I want to use this application efficiently, running it on a machine according to that machine's configuration.
Example: let's say the machine has 8 GB of RAM.
I can start multiple instances of a single application to improve processing speed.
My concern is: how can I decide the number of parallel instances to run per application based on the machine configuration?
Is there any functionality in C# that makes an exe run within a given memory limit?
If anyone can advise, thanks.
Applications running on Windows use virtual memory. For example, a process on a 32-bit system has a 2 GB virtual address space by default, while on 64-bit Windows the per-process limit is measured in terabytes (8 TB or more, depending on the version). How the process and its virtual memory map onto physical RAM is handled by the OS, which here is Windows, so you don't have control over how the operating system handles physical memory.
I suggest using the Parallel class for parallel processing in C#; the performance will depend on the computer's specifications.
I know there are lots of Docker experts around, but I have spent considerable time trying to find facts and figures about Docker's runtime performance and unfortunately could not get any concrete answers. Let me start by describing my system's configuration:
(a) Running CentOS 6.5 on a machine with 48 GB RAM, 1 TB of disk and 12 CPU cores.
(b) I have built a Docker image that is almost 6.5 GB in size.
Below are my questions; answers will benefit other readers as well:
(a) With the given configuration, how many containers can I run in parallel without breaking any functionality?
(b) Assume I have two images, each 3.5 GB in size. Is it better to run multiple containers from smaller images, or do I get good performance with one big image?
(c) What is the best filesystem option to use with Docker?
EDIT: more information
(d) I'm actually trying to put many compilers inside a container so that users can compile their code online. This tool is under development and will replace my existing website, compileonlone.com. Things are going fine: I have built two images with a few compilers in each, and I'm able to run around 250 containers successfully, after which I start getting "too many open files" errors. With 250 containers running, my RAM usage reaches somewhere around 40 GB and CPU utilization is around 50%.
The main problem I'm facing is removal of old containers. A user will come, compile his code and then go away, so I need to remove those containers after a certain period of time. But when I try to remove such stopped containers using docker rm -v, it slows down the main Docker process to the point of almost hanging; I mean the docker -d daemon that is listening on /var/run/docker.sock. I'm not sure whether there is another way to clean up these containers or whether I have hit a bug. Here are the details of my Docker setup:
# docker info
Containers: 1016
Images: 41
Storage Driver: devicemapper
Pool Name: docker-0:20-258-pool
Pool Blocksize: 64 Kb
Data file: /var/lib/docker/devicemapper/devicemapper/data
Metadata file: /var/lib/docker/devicemapper/devicemapper/metadata
Data Space Used: 17820.7 Mb
Data Space Total: 102400.0 Mb
Metadata Space Used: 102.4 Mb
Metadata Space Total: 2048.0 Mb
Execution Driver: native-0.2
Kernel Version: 3.17.2-1.el6.elrepo.x86_64
Operating System: <unknown>
WARNING: No swap limit support
If someone can tell me the fastest way to delete old containers, that would be great; a simple shell script isn't doing the job. I have already tried something like
# docker rm -v $(docker ps -a | grep Exited | awk '{print $1}')
but it completely slows down the main Docker process, and it is unable to create new containers while this removal is running.
Thanks for taking the time to answer these questions; it will help me as well as many others getting started with Docker.
a): A container is like a process. This question is like asking "how many processes can I run in parallel". It is not answerable without knowing what the processes are doing. Please add this information to your question.
b) Both 3.5GB and 6.5GB are very large for a Docker image. Best practice is to put one application in one container: if you have an application that size, then great. If not, maybe you have put your application's data into the image. This is not a good idea because the layered filesystem is slower than a regular filesystem, and you won't be wanting any of the features of layering or snapshotting on your transactional data.
The documentation on managing data explains how to mount regular disk so it is accessible from your containers.
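A minimal sketch of that (the paths and image name here are hypothetical): keep transactional data in a plain host directory and bind-mount it into the container with -v, so it never goes through the layered filesystem:
docker run -d -v /srv/myapp-data:/var/lib/myapp my-app-image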
Edit, after more information was supplied
d) Using up RAM implies the containers are still running. If there is some way within the logic of your site to know when a container is no longer needed you can docker kill it, then docker rm to remove the disk storage. Or docker rm -f does those two operations in one.
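As a rough sketch (reusing the same "grep Exited" filter from the question), the removals can also be spread out one container at a time instead of being issued in a single batch of a thousand, which may be gentler on the daemon:
for id in $(docker ps -a | grep Exited | awk '{print $1}'); do
    docker rm -v "$id"    # remove the stopped container and its volumes
    sleep 1               # brief pause so the daemon isn't flooded; tune as needed
done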
After a lot of R&D and discussion with many experts, I found a solution for deleting containers at lightning speed. It's simple: you have to run your Docker daemon with the dm.blkdiscard=false option, as follows.
docker -d --storage-opt dm.blkdiscard=false
By the way, here is what I have developed; it needs to create and delete containers at high speed:
http://codingground.tutorialspoint.com
Hope this will help many others.
I am running Cloudera Hadoop on my laptop in an Oracle VirtualBox VM.
I have given it 5.6 GB of my 8 GB of RAM, and six of my eight cores as well.
And still I am not able to keep it up and running.
Even without load, services will not stay up, and when I try a query at least Hive will be down within 20 minutes. Sometimes they go down like dominoes: one after another.
More memory seemed to help somewhat: with 3 GB and all services enabled, Hue was blinking red whenever Hue itself even managed to come up, and after a reboot it would take 30-60 minutes before I managed to get the system up enough to try running anything on it.
There have been two sensible notes (that I have managed to find):
- A warning about swapping.
- A crash note when the system had used 26 GB of virtual memory, which was not enough.
My dataset is less than one megabyte, so it is hard to understand why the system would go up to dozens of gigabytes; but whatever the reason was, it has passed: the system now runs more steadily around the 5.6 GB I have given it, after I shut down a few services (see my answer to myself below).
And still it is only more stable, not stable: right after that I got a swapping warning and Hive went down again. What could be the reason for more or less all Hadoop services going down when the VM starts to swap?
I don't have enough reputation to post the picture here, but when Hive went down again the VM was swapping 13 pages/second and using 5.9 GB of its 5.6 GB. So basically my system starts crashing more or less as soon as it starts to swap. "428 pages were swapped to disk in the previous 15 minute(s)"
I have used the default installation options as far as the hard drive is concerned.
The only addition is a shared folder between Windows and the VM. It works somewhat strangely, locking files all the time, so I use it like FTP, only for passing files from one system to the other. Thus I can go days without using it, but the systems still crash, so that is not the cause either.
Now that the system is mostly up, services still crash about twice a day: Service Monitor and Hive are about even in their crash frequency. After those come Activity Monitor and Event Server, which always seem to crash together. I believe YARN crashes as well, but it comes back up on its own. Last time Hive crashed first, followed by Service Monitor, Hive (a second time), Activity Monitor and Event Server.
As swap lives on disk, perhaps the problem is with the disk:
# cat /etc/fstab
# swapoff -a
# badblocks -v /dev/VolGroup/lv_swap
Checking blocks 0 to 8388607
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found.
# badblocks -vw /dev/VolGroup/lv_swap
Checking for bad blocks in read-write mode
From block 0 to 8388607
Testing with pattern 0xaa: done
Reading and comparing: done
Testing with pattern 0x55: done
Reading and comparing: done
Testing with pattern 0xff: done
Reading and comparing: done
Testing with pattern 0x00: done
Reading and comparing: done
Pass completed, 0 bad blocks found.
So there is nothing wrong with the swap disk, and I have not noticed any disk errors anywhere else either.
Note that you could check the file system from the Windows side as well, but I expect that if you let Windows "fix" your Linux file system you have a good chance of destroying that Linux install, so I kept my checks conservative and used only commands that, AFAIK, are safe to execute.
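Since the crashes line up with swapping rather than with bad blocks, it may also be worth watching swap activity live while the services run; these are standard Linux tools, shown here only as a sketch:
vmstat 5    # report every 5 seconds; the si/so columns show swap-in/swap-out activity
free -m     # show current RAM and swap usage in megabytes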
About half of the services kept going down, so giving all the specifics would be a long story.
I succeeded in making the system more stable by shutting down Flume, HBase, Impala, ks_indexer, Oozie, Spark and Sqoop, and by giving more memory to some of the remaining services that complained they had not been given enough.
I also fixed a couple of things on the Windows side; I am not sure which of these helped:
- MsMpEng.exe kept my hard drive busy. I didn't have permission to kill it, but I lowered its priority to the minimum.
- CcmExec.exe got stuck looping on my DVD and kept reading it forever. I solved this by taking the DVD out of the drive; later on I killed the process tree to keep it from interfering for a while.
I found these using the Windows resource monitor.
The VM requires 4GB: http://www.cloudera.com/content/cloudera-content/cloudera-docs/DemoVMs/Cloudera-QuickStart-VM/cloudera_quickstart_vm.html You should use that.
I am not clear whether you are using the QuickStart VM though. It's set up to run just the essential services and tuned to conserve memory rather than exploit lots of memory.
It sounds like you are running your own installation, on one virtual machine, on your Windows machine. You may be running an entire cluster's worth of services on one desktop machine. Each of these services has master processes, worker processes, monitoring processes, etc. You don't need most of them.
You have also probably left the memory settings at defaults suitable for a server-class machine with 16+ GB of RAM. Remember that these services usually run across many machines, not all on one.
Finally, you're clearly swapping, and that makes things incredibly slow. Remember this is all through a VM too!
Bottom line, use the QuickStart VM if you really want a 1-machine cluster tuned correctly. If you want a real cluster or more services, you need more hardware.
Also consider: cloudera.com/live contains a full CDH 5.1 cluster + sample data, running on demand on AWS. Of course, the advantage of the VM is that you can BYOD, but if you're simply looking for a hands-on Hadoop experience, Live is a great option.
Hypervisors and Memory Management
I have been using virtual machines for years and never really had any issues. I have primarily used VMware's free single-ESXi-host setup and had nothing but success. Because I have never had any issues, I have never delved much deeper. I have, however, always been very wary of loading the system up, and I keep a lot of spare resources handy.
I have recently purchased a new server and we have decided to give Hyper-V a try and see how that goes. We have a fairly small team but utilise lots of servers for testing etc.
My question relates to memory and how much I need to leave free or available for the host machine to run appropriately.
Setup: Dell server, 24 cores, 48 GB RAM.
When I run Task Manager (taskmgr) in the Windows host instance I see the following (values in MB):
Physical Memory: 49139
Cached: 14933
Available: 17743
Free: 2982
What exactly do these figures mean? What is the difference between free and available?
My server hardly ever uses much CPU and has 10 production servers running on it, without a single user complaint about the speed of the services.
Am I able to spin up another server with 2 GB of RAM, effectively leaving 982 MB free? Or am I starting to push my requirements a little?
Thanks for the help.
You shouldn’t use the host partition for anything other than Hyper-V (although you can run security and infrastructure software such as management agents, backup agents and firewalls). Therefore, that 2GB recommendation assumes you aren’t going to run any extra applications or server roles in the parent partition.
Hyper-V doesn’t let you allocate memory directly to the host partition. It essentially uses whatever memory is left over. Therefore, you have to remember to leave 2GB of your host server’s memory not allocated so it’s available for the parent partition.
Source
My understanding is that node.js is designed to scale by adding processes rather than by spawning threads within a process. In fact, from watching an awesome introductory video by Ryan Dahl, I got the idea that spawning threads is forbidden in node.js. I like the simplicity of this approach, but I am concerned that there might be a downside when running on Windows, since process creation is more expensive on Windows than on Linux.
Given modern hardware, and the fact that node.js processes can be expected to be relatively long-running, does process overhead still create a significant advantage for Linux when considering where to host node.js? To put it in concrete terms: if we assume an organization that uses the Windows stack only but is planning a big move onto node.js, is there a point in considering a new OS because of this issue?
No. Node.js runs in just one process and doesn't spawn processes during execution.
The reason you might have gotten the impression that Node uses processes to scale is that you can add a process per CPU core to let Node take advantage of your multicore computer (you'll need a load-balancer-like solution for this, though). Still: you don't spawn processes on the fly. So yes, you can run Node perfectly fine on Windows (or Azure) without much of a performance hit (if any).
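As a sketch of that per-core pattern (shown with a Linux shell for brevity, and assuming a hypothetical app.js that reads its port from the PORT environment variable; a reverse proxy or load balancer in front would then spread requests across the ports):
CORES=$(nproc)                         # number of CPU cores on the host
for i in $(seq 1 "$CORES"); do
    PORT=$((3000 + i)) node app.js &   # one long-lived Node process per core
done
wait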