I use Mesos for batch jobs. The jobs are run as Docker containers by the framework. There are 2 slaves running on each VM. The resources for each job were set to
CPUS - 0.1
MEM - 1G
It's a 4-core machine, but Mesos treated it as 8 cores because there are 2 slaves on each VM. So it overloaded the VM by submitting too many tasks, literally up to 80 jobs ((4+4)/0.1 = 80), and during peak load the VM used to crash.
I tried changing the CPU value to 0.5 so that the VM would not be overloaded ((4+4)/0.5 = 20). But CPU usage still goes up to 95%. The tasks are not CPU-intensive, so I'm not sure why they consume 95%.
Do tasks use the resource even if they don't actually need it? In other words, is a task allocated 0.5 CPUs by default, or only up to 0.5 when it actually requires them?
Having two agents on the same host/VM is more of an antipattern. If you want to oversubscribe resources, have a look at the Mesos oversubscription docs: http://mesos.apache.org/documentation/latest/oversubscription/
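If you consolidate to a single agent per VM, you can also declare the advertised resources explicitly so the host's cores are never double-counted. A rough sketch (the ZooKeeper address and the mem/disk numbers are placeholders, not taken from your setup):
# One Mesos agent per 4-core VM, advertising exactly the host's capacity
mesos-agent \
  --master=zk://zk-host:2181/mesos \
  --containerizers=docker,mesos \
  --resources="cpus:4;mem:14336;disk:100000" \
  --work_dir=/var/lib/mesos
With a single agent advertising cpus:4, a 0.1-CPU task definition caps you at 40 concurrent jobs per VM instead of 80.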
I have deployed opentelemetry-collector as a container by pulling the image from https://hub.docker.com/r/otel/opentelemetry-collector/tags. I checked the container's memory usage with the docker stats command and got MEM USAGE / LIMIT -> 15.3MiB / 7.667GiB.
Is there any way to reduce the memory usage of this default image to below 10MiB?
I want to reduce the opentelemetry-collector container's memory usage to below 10MiB.
You can add --memory=10m to your docker run command.
https://docs.docker.com/config/containers/resource_constraints/
Now, this isn't magic: if the process needs more than that to run, it will just crash.
If that is the case, you will need to look at changing the configuration of the service and/or possibly its source code.
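For instance, roughly (the tag is just whatever you pulled from the Docker Hub page above; pinning --memory-swap to the same value keeps the container from spilling into swap):
# Cap the otel collector container at 10 MiB of RAM
docker run -d --memory=10m --memory-swap=10m \
  otel/opentelemetry-collector:latest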
My system is running QNX 6.5 and it has 4 CPU cores, but I don't know which processes are running on which core. Is there any way to find this out in detail?
Thanks in advance
Processes typically run multiple threads (at least one, the main thread), so the thread is the actual running unit, not the process (and core affinity is settable per thread). Thus you need to know which core(s) the threads are running on.
There is a "%l" format option that tells you which CPU a particular thread is executing on:
# pidin -F "%b %50h %i %l" -p random
tid thread name cpu
1 1 0
Runmask : 0x0000007f
Inherit Mask: 0x0000007f
2 Timer Thread 1
Runmask : 0x0000007f
Inherit Mask: 0x0000007f
3 3 6
Runmask : 0x0000007f
Inherit Mask: 0x0000007f
Above we print the thread ID, thread name, and run/inherit CPU masks for the process called "random"; the rightmost column is the CPU each thread is currently running on.
The best tooling for analyzing the details of process scheduling in QNX is the "System Analysis Toolkit", which uses the instrumentation features of the QNX kernel to provide a log of every scheduling event and message pass.
For QNX 6.5, the documentation can be found here: http://www.qnx.com/developers/docs/6.5.0SP1.update/index.html#./com.qnx.doc.instr_en_instr/about.html
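As a rough sketch of that workflow (option details may differ on 6.5, so check the linked docs): capture a kernel trace with tracelogger, then open the resulting .kev file in the IDE's System Profiler to see the per-CPU scheduling of every thread.
# Capture one iteration of kernel trace events (writes a .kev file,
# by default under /dev/shmem), then analyze it in the System Profiler
tracelogger -n 1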
I got the details by using the command below:
pidin rmasks
which gives the pid, tid, and name of every thread.
From the runmask value we can identify which core(s) it is allowed to run on.
For me, thread-level details are fine.
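The runmask is just a bitmask of allowed CPUs (bit 0 = core 0, bit 1 = core 1, and so on). A quick shell sketch of decoding it, using a made-up mask value:
# Hypothetical runmask copied from pidin output: 0x0a = binary 1010 -> cores 1 and 3
runmask=0x0000000a
for core in 0 1 2 3; do
  if [ $(( (runmask >> core) & 1 )) -eq 1 ]; then
    echo "allowed to run on core $core"
  fi
done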
I'm porting an existing project with Grunt file watches to a Docker development container. The source files are bind-mounted into the container, and Grunt watches the files for changes (this can probably be optimized, but my current concern is: simply get the current setup working within Docker).
On the Mac, I'm experiencing enormous CPU usage, so I read the performance tuning guide for osxfs. The guide mentions the cached and delegated volume modes.
The description for delegated says:
the container’s view is authoritative
(permit delays before updates on the container appear in the host)
For cached:
[…] provides all the guarantees of the delegated configuration, and some additional guarantees around the visibility of writes performed by containers. As such, cached typically improves the performance of read-heavy workloads, at the cost of some temporary inconsistency between the host and the container.
In comparison to which setting does cached improve performance? Is "read-heavy workloads" seen from the container's perspective?
To cut a long story short: What's the optimal setting to reduce CPU usage for a development environment which uses file watches? cached or delegated?
OK, so I did some testing and here are my results. Setup:
MacBook Air 11", early 2014
macOS 10.12.6
Docker 17.06.0-ce-mac19 (18663)
watch task polling for ~ 1,000 files
The culprit processes eating up CPU cycles in the host are hyperkit and com.docker.osxfs. The following percentage values are the median CPU usage taken over five samples:
delegated: 18.7 % hyperkit + 0.0 % com.docker.osxfs = 18.7 %
cached: 24.3 % hyperkit + 0.1 % com.docker.osxfs = 24.4 %
default, a.k.a. consistent: 152.0 % hyperkit + 68.9 % com.docker.osxfs = 220.9 % (!)
Functionality-wise I didn't notice any difference. When changing a file outside the container, the changes were picked up virtually immediately by the watch in all three cases. So I'm going to use the delegated mode now.
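For reference, the mode is just a suffix on the bind mount; a sketch with made-up paths and image name:
# Bind-mount the sources in delegated mode; the container's view is authoritative
docker run --rm -v "$(pwd)/src:/app/src:delegated" my-grunt-image grunt watch
The same :cached / :delegated suffix works on volume entries in a docker-compose.yml.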
Recently we've encountered a problem with Ruby inside a Docker container. Despite quite low load, the application tends to consume huge amounts of memory, and after some time under the mentioned load it OOMs.
After some investigation we narrowed the problem down to the one-liner
docker run -ti -m 209715200 ruby:2.1 ruby -e 'while true do array = []; 3000000.times do array << "hey" end; puts array.length; end;'
On some machines it OOMed (was killed by the oom-killer for exceeding the limit) soon after the start, but on some it worked, though slowly, without OOMs. It seems (only seems, maybe it's not the case) that in some configurations Ruby is able to deduce the cgroup's limits and adjust its GC.
Configurations tested:
CentOS 7, Docker 1.9 — OOM
CentOS 7, Docker 1.12 — OOM
Ubuntu 14.10, Docker 1.9 — OOM
Ubuntu 14.10, Docker 1.12 — OOM
Mac OS X, Docker 1.12 — No OOM
Fedora 23, Docker 1.12 — No OOM
If you look at the memory consumption of the Ruby process, in all cases it behaved similarly to the picture below, either staying at the same level slightly below the limit, or crashing into the limit and being killed.
Memory consumption plot
We want to avoid OOMs at all costs, because they reduce resiliency and pose a risk of losing data. The memory actually needed by the application is way below the limit.
Do you have any suggestions as to what to do with Ruby to avoid OOMing, possibly at the cost of some performance?
We can't figure out what the significant differences between the tested installations are.
Edit: Changing the code or increasing the memory limit are not options. The first because we run fluentd with community plugins which we have no control over, the second because it won't guarantee that we won't face this issue again in the future.
You can try to tweak Ruby's garbage collection via environment variables (depending on your Ruby version):
RUBY_GC_MALLOC_LIMIT=4000100
RUBY_GC_MALLOC_LIMIT_MAX=16000100
RUBY_GC_MALLOC_LIMIT_GROWTH_FACTOR=1.1
Or call garbage collection manually via GC.start.
For your example, try
docker run -ti -m 209715200 ruby:2.1 ruby -e 'while true do array = []; 3000000.times do array << "hey" end; puts array.length; array = nil; end;'
to help the garbage collector.
Edit:
I don't have an environment comparable to yours. On my machine (14.04.5 LTS, Docker 1.12.3, 4 GB RAM, Intel(R) Core(TM) i5-3337U CPU @ 1.80GHz) the following looks quite promising.
docker run -ti -m 500MB -e "RUBY_GC_MALLOC_LIMIT_GROWTH_FACTOR=1" \
-e "RUBY_GC_MALLOC_LIMIT=5242880" \
-e "RUBY_GC_MALLOC_LIMIT_MAX=16000100" \
-e "RUBY_GC_HEAP_INIT_SLOTS=500000" \
ruby:2.1 ruby -e 'while true do array = []; 3000000.times do array << "hey" end; puts array.length; puts `ps -o rss -p #{Process::pid}`.chomp.split("\n").last.strip.to_i / 1024.0 / 1024 ; puts GC.stat; end;'
But every Ruby app needs a different setup for fine-tuning, and if you experience memory leaks, you're lost.
I don't think this is a docker issue. You're overusing the resources of the container and Ruby tends to not behave well once you hit memory thresholds. It can GC, but if another process tries to take some memory or Ruby attempts to allocate again while you are maxed out then the kernel will (usually) kill the process with the most memory. If you're worried about memory usage on a server, add some threshold alerts at 80% RAM and allocate the appropriately sized resources for the job. When you start hitting thresholds, allocate more RAM or look at the particular job parameters/allocations to see if it needs to be redesigned to have a lower footprint.
Another potential option if you really want to have a nice fixed memory band to GC against is to use JRuby and set the JVM max memory to leave a little wiggle room on the container memory. The JVM will manage OOM within its own context better as it isn't sharing those resources with other processes nor letting the kernel think the server is dying.
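A sketch of that approach (the image tag and the sizes are illustrative): cap the JVM heap a bit below the container limit so the JVM's own memory management kicks in before the kernel's oom-killer does.
# Container gets 256 MB; the JVM heap is capped below that via JRUBY_OPTS
docker run -ti -m 256m -e JRUBY_OPTS="-J-Xmx200m" jruby:9.1 \
  jruby -e 'while true do array = []; 3000000.times do array << "hey" end; puts array.length; end;'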
I had a similar issue with a few Java-based Docker containers that were running on a single Docker host. The problem was that each container saw the total available memory of the host machine and assumed it could use all of that memory for itself. The JVMs didn't run GC very often, and I ended up getting out-of-memory exceptions. I ended up manually limiting the amount of memory each container could use, and I no longer got OOMs. Within the container I also limited the memory of the JVM.
Not sure if this is the same issue you're seeing but it could be related.
https://docs.docker.com/engine/reference/run/#/runtime-constraints-on-resources
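A minimal sketch of that combination (the image name, jar path and sizes are placeholders): set the container limit and the JVM limit together, with -Xmx leaving headroom below the -m value for non-heap memory (metaspace, threads, native buffers).
docker run -d -m 512m my-java-app java -Xmx384m -jar /app/app.jar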
We are doing a PoC by installing Greenplum in an AWS environment. We have set up each of our segment servers as a d2.8xlarge instance type, which has 240 GB of RAM and no swap.
I am now trying to set gp_vmem_protect_limit using the formula mentioned in the GPDB documentation, and the value comes out to 25600MB.
But one of the Zendesk notes says that gp_vmem_protect_limit will be breached when "sessions executing on this segment are attempting together to use more than configured limit." Does "segment" in this text mean the segment host or an individual primary segment?
Also, with the Eager Free option set, I see that memory utilization is very poor when running the TPC-DS benchmark with 5 concurrent users. I would like to improve the memory utilization of the environment; below are the other memory configurations:
gpconfig -c gp_vmem_protect_limit -v 25600MB
gpconfig -c max_statement_mem -v 16384MB
gpconfig -c statement_mem -v 2400MB
Any suggestions?
Thanks,
Jayadeep
There is a calculator for it!
http://greenplum.org/calc/
You should also add a swap file or disk. It is pretty easy to do in Amazon too. I would add at least a 4GB swap file to each host when you have 240GB of RAM.
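For example, a 4 GB swap file on each segment host can be created roughly like this (a sketch; run as root and adjust the path/size):
fallocate -l 4G /swapfile          # or: dd if=/dev/zero of=/swapfile bs=1M count=4096
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
echo '/swapfile none swap sw 0 0' >> /etc/fstab   # persist across reboots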