Ruby OOM in container - ruby

Recently we've encountered a problem with Ruby inside a Docker container. Despite quite low load, the application tends to consume huge amounts of memory, and after some time under that load it gets OOM-killed.
After some investigation we narrowed the problem down to this one-liner:
docker run -ti -m 209715200 ruby:2.1 ruby -e 'while true do array = []; 3000000.times do array << "hey" end; puts array.length; end;'
On some machines it was OOM-killed (killed by the oom-killer for exceeding the limit) soon after start, but on others it ran, though slowly, without OOMs. It seems (only seems, this may not be the case) that in some configurations Ruby is able to detect the cgroup's limits and adjust its GC.
Configurations tested:
CentOS 7, Docker 1.9 — OOM
CentOS 7, Docker 1.12 — OOM
Ubuntu 14.10, Docker 1.9 — OOM
Ubuntu 14.10, Docker 1.12 — OOM
MacOS X Docker 1.12 — No OOM
Fedora 23 Docker 1.12 — No OOM
If you look at the memory consumption of ruby process, in all cases it behaved similar to this picture, staying on the same level slightly below the limit, or crashing into the limit and being killed.
Memory consumption plot
We want to avoid OOMs at all costs, because they reduce resiliency and pose a risk of losing data. The memory the application really needs is way below the limit.
Do you have any suggestions on what to do with Ruby to avoid OOMs, possibly at the cost of performance?
We can't figure out what the significant differences between the tested installations are.
Edit: Changing the code or increasing the memory limit are not options. The first because we run fluentd with community plugins that we have no control over, the second because it won't guarantee that we won't face this issue again in the future.

You can try to tweak Ruby's garbage collection via environment variables (depending on your Ruby version):
RUBY_GC_MALLOC_LIMIT=4000100
RUBY_GC_MALLOC_LIMIT_MAX=16000100
RUBY_GC_MALLOC_LIMIT_GROWTH_FACTOR=1.1
Or call garbage collection manually via GC.start.
For your example, try
docker run -ti -m 209715200 ruby:2.1 ruby -e 'while true do array = []; 3000000.times do array << "hey" end; puts array.length; array = nil; end;'
to help the garbage collector.
Edit:
I don't have an environment comparable to yours. On my machine (14.04.5 LTS, Docker 1.12.3, 4 GB RAM, Intel(R) Core(TM) i5-3337U CPU @ 1.80GHz) the following looks quite promising.
docker run -ti -m 500MB -e "RUBY_GC_MALLOC_LIMIT_GROWTH_FACTOR=1" \
-e "RUBY_GC_MALLOC_LIMIT=5242880" \
-e "RUBY_GC_MALLOC_LIMIT_MAX=16000100" \
-e "RUBY_GC_HEAP_INIT_SLOTS=500000" \
ruby:2.1 ruby -e 'while true do array = []; 3000000.times do array << "hey" end; puts array.length; puts `ps -o rss -p #{Process::pid}`.chomp.split("\n").last.strip.to_i / 1024.0 / 1024 ; puts GC.stat; end;'
But every Ruby app needs different settings for fine tuning, and if you have memory leaks, you're lost.
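As a rough illustration of calling GC.start manually, the sketch below triggers a full collection once the process's resident set size crosses a soft limit below the container limit. The helper names rss_bytes and gc_if_over and the 150 MB soft limit are made up for this example, and reading VmRSS from /proc/self/status assumes Linux. GC.start frees Ruby objects but does not guarantee that the memory goes back to the OS, so treat this as something to experiment with rather than a fix.

```ruby
# Hypothetical helpers, not part of Ruby or fluentd.

# Resident set size of this process in bytes, or nil if it cannot be read
# (for example on a non-Linux host where /proc is missing).
def rss_bytes(status_path = "/proc/self/status")
  line = File.foreach(status_path).find { |l| l.start_with?("VmRSS:") }
  line ? line.split[1].to_i * 1024 : nil # the kernel reports VmRSS in kB
rescue SystemCallError
  nil
end

# Run a full GC when RSS is at or above the soft limit.
# Returns true if a collection was triggered, false otherwise.
def gc_if_over(soft_limit_bytes, current_rss = rss_bytes)
  return false if current_rss.nil? || current_rss < soft_limit_bytes
  GC.start
  true
end

# Assumed value: 150 MB, chosen to sit below the 200 MB container limit.
SOFT_LIMIT = 150 * 1024 * 1024

array = []
300_000.times { array << "hey" }
puts "GC triggered: #{gc_if_over(SOFT_LIMIT)}"
```

In a long-running process one would call gc_if_over periodically, for example at the end of each batch of work.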

I don't think this is a docker issue. You're overusing the resources of the container and Ruby tends to not behave well once you hit memory thresholds. It can GC, but if another process tries to take some memory or Ruby attempts to allocate again while you are maxed out then the kernel will (usually) kill the process with the most memory. If you're worried about memory usage on a server, add some threshold alerts at 80% RAM and allocate the appropriately sized resources for the job. When you start hitting thresholds, allocate more RAM or look at the particular job parameters/allocations to see if it needs to be redesigned to have a lower footprint.
Another potential option if you really want to have a nice fixed memory band to GC against is to use JRuby and set the JVM max memory to leave a little wiggle room on the container memory. The JVM will manage OOM within its own context better as it isn't sharing those resources with other processes nor letting the kernel think the server is dying.

I had a similar issue with a few Java-based Docker containers that were running on a single Docker host. The problem was that each container saw the total available memory of the host machine and assumed it could use all of that memory for itself. It didn't run GC very often and I ended up getting out of memory exceptions. I ended up manually limiting the amount of memory each container could use and I no longer got OOMs. Within the container I also limited the memory of the JVM.
Not sure if this is the same issue you're seeing but it could be related.
https://docs.docker.com/engine/reference/run/#/runtime-constraints-on-resources
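To check whether a process inside the container can see the limit that Docker applied (as opposed to the host's total memory), one option is to read the cgroup files directly. This is a minimal sketch: the two paths are the usual cgroup v2 and v1 mount points and are assumed to be mounted there, the helper names are made up, and the 2**60 cut-off for "effectively unlimited" is a heuristic chosen for this example.

```ruby
# Usual locations of the memory limit; assumed to be mounted at these paths.
CGROUP_LIMIT_FILES = [
  "/sys/fs/cgroup/memory.max",                   # cgroup v2
  "/sys/fs/cgroup/memory/memory.limit_in_bytes"  # cgroup v1
].freeze

# Turn the file content into a byte count, or nil when no limit is set.
def parse_cgroup_limit(raw)
  text = raw.strip
  return nil if text == "max" # cgroup v2 spelling of "no limit"
  value = Integer(text, 10)
  # cgroup v1 reports a huge number when unlimited; 2**60 is a heuristic cut-off.
  value >= 2**60 ? nil : value
rescue ArgumentError
  nil
end

# First readable limit found, or nil if none is set or visible.
def container_memory_limit(files = CGROUP_LIMIT_FILES)
  files.each do |path|
    next unless File.readable?(path)
    limit = parse_cgroup_limit(File.read(path))
    return limit if limit
  end
  nil
end

limit = container_memory_limit
puts limit ? "cgroup memory limit: #{limit} bytes" : "no cgroup memory limit visible"
```

Run inside a container started with -m, this should print the configured limit; a runtime that ignores these files will size its heap from the host's memory instead.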

Related

memory usage grows until VM crashes while running Wildfly 9 with Java 8

We are having an issue with virtual servers (VMs) running out of native memory. These VMs are running:
Linux 7.2(Maipo)
Wildfly 9.0.1
Java 1.8.0_151 (different JVMs have different heap sizes, ranging from 0.5G to 2G)
The JVM args are:
-XX:+UseG1GC
-XX:SurvivorRatio=1
-XX:NewRatio=2
-XX:MaxTenuringThreshold=15
-XX:-UseAdaptiveSizePolicy
-XX:G1HeapRegionSize=16m
-XX:MaxMetaspaceSize=256m
-XX:CompressedClassSpaceSize=64m
-javaagent:/<path to new relic.jar>
After about a month, sometimes longer, the VMs start to use all of their swap space and then eventually the OOM-Killer notices that java is using too much memory and kills one of our JVMs.
The amount of memory being used by the java process is larger than heap + metaSpace + compressed as revealed by using -XX:NativeMemoryTracking=detail
Are there tools that could tell me what is in this native memory(like a heap dump but not for the heap)?
Are there any tools, other than jemalloc, that can map Java heap usage to native memory usage (outside the heap)? I have used jemalloc to try to achieve this, but the graph it draws contains only hex values and not human-readable class names, so I can't really get anything out of it. Maybe I'm doing something wrong, or perhaps I need another tool.
Any suggestions would be greatly appreciated.
You can use jcmd.
Start the application with -XX:NativeMemoryTracking=summary or -XX:NativeMemoryTracking=detail.
Use jcmd to query NMT (Native Memory Tracking):
jcmd <pid> VM.native_memory baseline (takes the baseline)
jcmd <pid> VM.native_memory detail.diff (shows how native memory has changed since the baseline; run it as often as you need)

Docker volume with Grunt file watch

I'm porting an existing project with Grunt file watches to a Docker development container. The source files are bind-mounted into the container, and Grunt watches the files for changes (this can probably be optimized, but my current concern is: simply get the current setup working within Docker).
On the Mac, I'm experiencing enormous CPU usage, so I read the performance tuning guide for osxfs. The guide mentions the cached and delegated volume modes.
The description for delegated says:
the container’s view is authoritative
(permit delays before updates on the container appear in the host)
For cached:
[…] provides all the guarantees of the delegated configuration, and some additional guarantees around the visibility of writes performed by containers. As such, cached typically improves the performance of read-heavy workloads, at the cost of some temporary inconsistency between the host and the container.
In comparison to which setting does cached improve performance? Is "read-heavy workloads" seen from the container's perspective?
To cut a long story short: What's the optimal setting to reduce CPU usage for a development environment which uses file watches? cached or delegated?
OK, so I did some testing and here are my results. Setup:
MacBook Air 11", early 2014
macOS 10.12.6
Docker 17.06.0-ce-mac19 (18663)
watch task polling for ~ 1,000 files
The culprit processes eating up CPU cycles in the host are hyperkit and com.docker.osxfs. The following percentage values are the median CPU usage taken over five samples:
delegated: 18.7 % hyperkit + 0.0 % com.docker.osxfs = 18.7 %
cached: 24.3 % hyperkit + 0.1 % com.docker.osxfs = 24.4 %
default aka. consistent: 152.0 % hyperkit + 68.9 % com.docker.osxfs = 220.9 % (!)
Functionality-wise I didn't notice any difference. When changing a file outside the container the changes were picked up virtually immediately by the watch in each of the three cases. So I'm going to use the delegated mode now.

Greenplum gp_vmem_protect_limit configuration

We are doing a PoC by installing Greenplum in an AWS environment. We have set up each of our segment servers as a d2.8xlarge instance, which has 240 GB of RAM and no swap.
I am now trying to set gp_vmem_protect_limit using the formula given in the gpdb documentation, and the value comes to 25600MB.
But in one of the Zendesk Notes it says that gp_vmem_protect_limit will be breached when "sessions executing on this segment are attempting together to use more than configured limit. " Does the segment in this text mean Segment Host or number of primary segments?
Also, with the Eager Free option being set I see that the memory utilization is very poor when running the TPC-DS benchmark with 5 concurrent users. I would like to improve the memory utilization of the environment and below are the other memory configurations
gpconfig -c gp_vmem_protect_limit -v 25600MB
gpconfig -c max_statement_mem -v 16384MB
gpconfig -c statement_mem -v 2400MB
Any suggestions?
Thanks,
Jayadeep
There is a calculator for it!
http://greenplum.org/calc/
You should also add a swap file or disk. It is pretty easy to do in Amazon too. I would add at least a 4GB swap file to each host when you have 240GB of RAM.
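For reference, the arithmetic behind such a calculator can be sketched as below. The formula is quoted from memory of the Greenplum documentation for hosts with less than 256 GB of RAM and should be checked against the docs for your version; the segment count of 5 is purely hypothetical, since the question does not say how many primary segments run per host. In this formula the limit is divided across the acting primary segments, which suggests it is a per-segment-instance value rather than a per-host one.

```ruby
# Assumed formula (from memory of the Greenplum docs, hosts under 256 GB RAM):
#   gp_vmem = ((SWAP + RAM) - (7.5 GB + 0.05 * RAM)) / 1.7
#   gp_vmem_protect_limit = gp_vmem / acting_primary_segments
def gp_vmem_gb(ram_gb, swap_gb)
  ((swap_gb + ram_gb) - (7.5 + 0.05 * ram_gb)) / 1.7
end

# Result in MB, rounded down, as the setting takes a whole number of MB.
def gp_vmem_protect_limit_mb(ram_gb, swap_gb, acting_primary_segments)
  (gp_vmem_gb(ram_gb, swap_gb) * 1024 / acting_primary_segments).floor
end

SEGMENTS = 5 # hypothetical: not stated in the question

puts "no swap:   #{gp_vmem_protect_limit_mb(240, 0, SEGMENTS)} MB"
puts "4 GB swap: #{gp_vmem_protect_limit_mb(240, 4, SEGMENTS)} MB"
```

Adding swap raises the computed limit only slightly; its main purpose in the suggestion above is to give the host some headroom.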

redis bgsave failed because fork Cannot allocate memory

all:
here is my server memory info with 'free -m'
             total       used       free     shared    buffers     cached
Mem:         64433      49259      15174          0          3         31
-/+ buffers/cache:      49224      15209
Swap:         8197        184       8012
My redis-server has used 46G of memory, and almost 15G is left free.
As far as I know, fork is copy-on-write, so it should not fail when there is 15G of free memory, which is enough to allocate the necessary kernel structures.
Besides, when redis-server used 42G of memory, bgsave was OK and fork was OK too.
Is there any VM parameter I can tune to make fork succeed?
More specifically, from the Redis FAQ
Redis background saving schema relies on the copy-on-write semantic of fork in modern operating systems: Redis forks (creates a child process) that is an exact copy of the parent. The child process dumps the DB on disk and finally exits. In theory the child should use as much memory as the parent being a copy, but actually thanks to the copy-on-write semantic implemented by most modern operating systems the parent and child process will share the common memory pages. A page will be duplicated only when it changes in the child or in the parent. Since in theory all the pages may change while the child process is saving, Linux can't tell in advance how much memory the child will take, so if the overcommit_memory setting is set to zero fork will fail unless there is as much free RAM as required to really duplicate all the parent memory pages, with the result that if you have a Redis dataset of 3 GB and just 2 GB of free memory it will fail.
Setting overcommit_memory to 1 says Linux to relax and perform the fork in a more optimistic allocation fashion, and this is indeed what you want for Redis.
Redis doesn't need as much memory to write to disk as the OS thinks it does, so the OS may pre-emptively fail the fork.
Modify /etc/sysctl.conf and add:
vm.overcommit_memory=1
Then restart sysctl with:
On FreeBSD:
sudo /etc/rc.d/sysctl reload
On Linux:
sudo sysctl -p /etc/sysctl.conf
From proc(5) man pages:
/proc/sys/vm/overcommit_memory
This file contains the kernel virtual memory accounting mode. Values are:
0: heuristic overcommit (this is the default)
1: always overcommit, never check
2: always check, never overcommit
In mode 0, calls of mmap(2) with MAP_NORESERVE set are not checked, and the default check is very weak, leading to the risk of getting a process "OOM-killed". Under Linux 2.4 any non-zero value implies mode 1. In mode 2 (available since Linux 2.6), the total virtual address space on the system is limited to (SS + RAM*(r/100)), where SS is the size of the swap space, and RAM is the size of the physical memory, and r is the contents of the file /proc/sys/vm/overcommit_ratio.
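To make the mode 2 formula concrete, here is a small worked example using the numbers from the free -m output in the question. It assumes overcommit_ratio is left at the kernel default of 50; the function name commit_limit_mb is made up for this sketch.

```ruby
# Commit limit in mode 2, per the proc(5) text: SS + RAM * (r / 100).
def commit_limit_mb(ram_mb, swap_mb, overcommit_ratio = 50)
  swap_mb + ram_mb * overcommit_ratio / 100.0
end

ram_mb   = 64_433    # "Mem: total" from the free -m output in the question
swap_mb  = 8_197     # "Swap: total" from the same output
redis_mb = 46 * 1024 # the 46G the question says redis-server uses

limit = commit_limit_mb(ram_mb, swap_mb)
puts "mode 2 commit limit: #{limit} MB"
puts "redis alone exceeds it: #{redis_mb > limit}"
```

With these inputs the limit comes out below what redis-server already uses, which is one way to see why mode 2 would be too strict here and why the answer above recommends mode 1 instead.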
Redis's fork-based snapshotting method can effectively double physical memory usage and easily trigger an OOM in cases like yours. Relying on Linux virtual memory for snapshotting is problematic, because Linux has no visibility into Redis data structures.
Recently a new redis-compatible project Dragonfly has been released. Among other things, it solves the OOM problem entirely. (disclosure - I am the author of this project).

memory limit in Node.js (and chrome V8)

In many places on the web, you will see:
What is the memory limit on a node process?
and the answer:
Currently, by default V8 has a memory limit of 512mb on 32-bit systems, and 1gb on 64-bit systems. The limit can be raised by setting --max-old-space-size to a maximum of ~1gb (32-bit) and ~1.7gb (64-bit), but it is recommended that you split your single process into several workers if you are hitting memory limits.
Can somebody confirm this is the case as Node.js seems to update frequently?
And more importantly, will it be the case in the near future?
I want to write JavaScript code which might have to deal with 4gb of javascript objects (and speed might not be an issue).
If I can't do it in Node, I will end up doing in java (on a 64bit machine) but I would rather not.
This has been a big concern for some Node.js users, and there is good news. The new memory limit for V8 is now unknown (not tested) on 64-bit, and in 32-bit environments it has been raised to as much as the 32-bit address space allows.
Read more here: http://code.google.com/p/v8/issues/detail?id=847
Starting nodejs app with a heap memory of 8 GB
node --max-old-space-size=8192 app.js
See node command line options documentation or run:
node --help --v8-options
I'm running a proc now on Ubuntu linux that has a definite memory leak and node 0.6.0 is pushing 8gb. Think it's handled :).
Memory Limit Max Value is 3049 for 32bit users
If you are running Node.js where os.arch() === 'ia32' is true, the maximum value you can set is 3049.
In my testing with Node v11.15.0 on Windows 10:
if you set it to 3050, then it will overflow and equal to be set to 1.
if you set it to 4000, then it will equal to be set to 51 (4000 - 3049)
Set Memory to Max for Node.js
node --max-old-space-size=3049
Set Memory to Max for Node.js with TypeScript
node -r ts-node/register --max-old-space-size=3049
See: https://github.com/TypeStrong/ts-node/issues/261#issuecomment-402093879
It looks like it's true. When I tried to allocate a 50 MB buffer:
var buf = new Buffer(50*1024*1024);
I've got an error:
FATAL ERROR: CALL_AND_RETRY_2 Allocation failed - process out of memory
Meanwhile, the process monitor showed about 457 MB of memory in use by Node.js.
