I am running a Go process under a cgroup with an OOM killer, and I want to print heap memory info, for example with the command "go tool pprof http://localhost:6060/debug/pprof/heap" or any other way that can identify the top memory consumers when the process is killed. (Ideally the Go process would print the heap info itself, e.g. by calling some method.) That way I can see the top memory consumers and fix them next time.
Dear Stackoverflowians,
I'm having a problem with a Spring Cloud Stream app that uses the Kafka Streams binder. The issue occurs only in our own Pivotal Cloud Foundry (CF) environment. I've kind of hit a wall at this point, so I turn to you and your wisdom!
When the application starts up, I see the following error:
<snip>
2019-08-07T15:17:58.36-0700 [APP/PROC/WEB/0]OUT current active tasks: [0_3, 1_3, 2_3, 3_3, 4_3, 0_7, 1_7, 5_3, 2_7, 3_7, 4_7, 0_11, 1_11, 5_7, 2_11, 3_11, 4_11, 0_15, 1_15, 5_11, 2_15, 3_15, 4_15, 0_19, 1_19, 5_15, 2_19, 3_19, 4_19, 0_23, 1_23, 5_19, 2_23, 3_23, 4_23, 5_23]
2019-08-07T15:17:58.36-0700 [APP/PROC/WEB/0]OUT current standby tasks: []
2019-08-07T15:17:58.36-0700 [APP/PROC/WEB/0]OUT previous active tasks: []
2019-08-07T15:18:02.67-0700 [API/0] OUT Updated app with guid 2db4a719-53ee-4d4a-9573-fe958fae1b4f ({"state"=>"STOPPED"})
2019-08-07T15:18:02.64-0700 [APP/PROC/WEB/0]ERR terminate called after throwing an instance of 'std::system_error'
2019-08-07T15:18:02.64-0700 [APP/PROC/WEB/0]ERR what(): Resource temporarily unavailable
2019-08-07T15:18:02.67-0700 [CELL/0] OUT Stopping instance 516eca4f-ea73-4684-7e48-e43c
2019-08-07T15:18:02.67-0700 [CELL/SSHD/0]OUT Exit status 0
2019-08-07T15:18:02.71-0700 [APP/PROC/WEB/0]OUT Exit status 134
2019-08-07T15:18:02.71-0700 [CELL/0] OUT Destroying container
2019-08-07T15:18:03.62-0700 [CELL/0] OUT Successfully destroyed container
The key here is the line
what(): Resource temporarily unavailable
The error is related to the number of partitions. If I set the partition count to 12 or fewer, things work. If I double it, the process fails to start with this error.
This doesn't happen on my local Windows dev machine. It also doesn't happen in my local Docker environment when I wrap this app in a Docker image and run it. But if I take the same image and push it to CF, or push the app as a Java app, I get this error.
Here is some information about the Kafka Streams app. We have an input topic with a number of partitions. The topic is the output of a Debezium connector; basically it's a change log of a bunch of database tables. The topology is not super complex, but it's not trivial either. Its job is to aggregate the table update information back into our aggregates. We end up with 17 local stores in the topology. I have a strong suspicion this issue has something to do with RocksDB and the resources available to the CF container the app is in. But I have not the faintest idea what the resource is that's "temporarily unavailable".
As I mentioned, I tried deploying it as a Docker container with various JDK 8 JVMs and different base images (CentOS, Debian). I tried various CF Java buildpacks, and I tried limiting the Java heap relative to the max container memory size (thinking that maybe it has something to do with native memory allocation), all to no avail.
I've also asked our ops folks to raise some limits on the containers, and the open files limit changed from the initial 16k to 500k+. I saw some file-lock-related errors like the one below, but they went away after this change.
2019-08-01T15:46:23.69-0700 [APP/PROC/WEB/0]ERR Caused by: org.rocksdb.RocksDBException: lock : /home/vcap/tmp/kafka-streams/cms-cdc/0_7/rocksdb/input/LOCK: No locks available
2019-08-01T15:46:23.69-0700 [APP/PROC/WEB/0]ERR at org.rocksdb.RocksDB.open(Native Method)
However, the what(): Resource temporarily unavailable error persists with a higher number of partitions.
ulimit -a on the container looks like this
~$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 1007531
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 524288
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
I really do need to understand the root of this error. It's hard to plan around it without knowing what limit we're hitting here.
Hope to hear your ideas. Thanks!
Edit:
Is there some way to get more verbose error messages from the RocksDB library, or perhaps a way to build it so it outputs more info?
Edit 2
I have also tried to customize the RocksDB memory settings using org.apache.kafka.streams.state.RocksDBConfigSetter.
The defaults are in org.apache.kafka.streams.state.internals.RocksDBStore#openDB(org.apache.kafka.streams.processor.ProcessorContext)
First, I made sure the Java heap settings were well below the container process size limit and left nothing to the memory calculator by setting:
JAVA_OPTS: -XX:MaxDirectMemorySize=100m -XX:ReservedCodeCacheSize=240m -XX:MaxMetaspaceSize=145m -Xmx1000m
With this I tried:
1. Lowering the write buffer size and the number of write buffers:
org.rocksdb.Options#setWriteBufferSize(long)
org.rocksdb.Options#setMaxWriteBufferNumber(int)
2. Setting max_open_files to half the limit for the container (as the total across all DB instances):
org.rocksdb.Options#setMaxOpenFiles(int)
3. Turning off the block cache altogether:
org.rocksdb.BlockBasedTableConfig#setNoBlockCache
4. Re-enabling the block cache and setting cache_index_and_filter_blocks = true:
https://github.com/facebook/rocksdb/wiki/Block-Cache#caching-index-and-filter-blocks
All to no avail. The above issue still happens when I set a higher number of partitions (24) on the input topic. Now that I have a RocksDBConfigSetter with logging in it, I can see that the error happens exactly when RocksDB is being configured.
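Roughly, the config setter looks like this (a trimmed-down sketch; the class name and the concrete values here are illustrative, not the exact ones from our app):

import java.util.Map;

import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Options;

public class CustomRocksDBConfig implements RocksDBConfigSetter {

    @Override
    public void setConfig(final String storeName, final Options options, final Map<String, Object> configs) {
        // Logging so I can see exactly when each store gets configured
        System.out.println("Configuring RocksDB for store " + storeName);

        // 1. Smaller and fewer write buffers (memtables) per store
        options.setWriteBufferSize(4 * 1024 * 1024L);
        options.setMaxWriteBufferNumber(2);

        // 2. Cap the number of open SST files per RocksDB instance
        options.setMaxOpenFiles(300);

        // 3./4. Block cache: either disable it via setNoBlockCache(true), or keep it
        // and have index and filter blocks cached inside it
        final BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
        tableConfig.setCacheIndexAndFilterBlocks(true);
        options.setTableFormatConfig(tableConfig);
    }
}

It is registered through the Kafka Streams property rocksdb.config.setter (StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG).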
Edit 3
I still haven't gotten to the bottom of this. I have asked the question on https://www.facebook.com/groups/rocksdb.dev and was advised to trace system calls with strace or similar, but I was not able to obtain the required permissions to do that in our environment.
It has eaten up so much time that I had to settle for a workaround for now. What I ended up doing is refactoring the topology to
1) minimize the number of materialized KTables (and the number of resulting RocksDB instances), and
2) break up the topology among multiple processes.
This allowed me to turn topology parts on and off in separate deployments with Spring profiles, and it has given me some limited way forward for now.
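As an illustration of the profile split (the class and profile names here are made up, not our real ones), each slice of the topology sits behind its own Spring profile:

import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Profile;

// Only instantiated when the "orders-aggregation" profile is active,
// e.g. SPRING_PROFILES_ACTIVE=orders-aggregation on that particular deployment.
@Configuration
@Profile("orders-aggregation")
public class OrdersAggregationTopology {
    // The KStream/KTable processing beans for this part of the topology go here.
}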
It seems that MRI duplicates the memory allocation for every new thread.
I use Ubuntu x64 and ruby-2.2.4 (rvm), and this is what I get:
Just after starting irb, pmap -d 1656 reports 59760K of mapped memory (allocated memory, or '[ stack ]' for the program stack; see man pmap(1)).
After creating a thread, pmap -d 1656 reports 127352K of mapped memory.
So I see the memory allocation roughly double: 59760K -> 127352K.
Such behavior is similar to the result of a fork() call, which, being used to create a new process, makes a copy of the calling process's data for the new process (copy-on-write is outside the scope of this question).
But a Thread is created in the same process and shares its data, so this looks strange...
In practice, it means that a Thread in Ruby has a memory-usage restriction similar to that of a Process: creating a new thread fails when the allocated memory gets close to the physical memory size.
I am curious, WHY?
UPDATE
It's not memory duplication but an additional allocation of ~50K for each thread.
Thanks @tadman for suggesting that it's overhead and not something like copying memory the way fork() does.
I am running systemd version 219.
root@EVOvPTX1_RE0-re0:/var/log# systemctl --version
systemd 219
+PAM -AUDIT -SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP -LIBCRYPTSETUP -GCRYPT +GNUTLS +ACL +XZ -LZ4 -SECCOMP +BLKID -ELFUTILS +KMOD -IDN
I have a service, let's call it foo.service, which has the following:
[Service]
MemoryLimit=1G
I have deliberately added code to allocate 1M of memory 4096 times, which causes a 4G memory allocation when a certain event is received. The idea is that after the process consumes 1G of address space, memory allocation would start failing. However, this does not seem to be the case. I am able to allocate 4G of memory without any issues. This tells me that the memory limit specified in the service file is not enforced.
Can anyone let me know what I am missing?
I looked at the proc file system, at the file named limits. This shows that Max address space is unlimited, which also confirms that the memory limit is not getting enforced.
The distinction is that you have allocated memory, but you haven't actually used it. In the output of top, this is the difference between the "VIRT" memory column (allocated) and the "RES" column (actually used).
Try modifying your experiment to assign values to elements of a large array instead of just allocating memory and see if you hit the memory limit that way.
Reference: Resident and Virtual memory on Linux: A short example
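The question doesn't say what language the service is written in, so purely as an illustration, here is a minimal sketch of that allocate-and-touch experiment in Java (in C the equivalent would be malloc() followed by writing to the buffer, e.g. with memset()):

import java.util.ArrayList;
import java.util.List;

// Assumes the JVM heap cap is set above the cgroup limit (e.g. -Xmx4g),
// so the cgroup's MemoryLimit is what gets hit first, not the Java heap.
public class MemoryLimitTest {
    public static void main(String[] args) {
        final int blockSize = 1024 * 1024;            // 1 MiB per block
        final List<byte[]> blocks = new ArrayList<>();
        for (int i = 0; i < 4096; i++) {              // up to 4 GiB in total
            final byte[] block = new byte[blockSize];
            // Write to every page so the memory really becomes resident (RES),
            // not just part of the mapped address space (VIRT).
            for (int off = 0; off < blockSize; off += 4096) {
                block[off] = 1;
            }
            blocks.add(block);                        // keep a reference so the GC can't reclaim it
            System.out.println("Touched " + (i + 1) + " MiB");
        }
    }
}

Note that when the limit is enforced, you typically won't see allocations fail; instead the kernel's OOM killer terminates the process once its resident memory exceeds the limit.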
My program, which I've run numerous times on different clusters, suddenly stops. The log:
15/04/20 19:19:59 INFO scheduler.TaskSetManager: Finished task 12.0 in stage 15.0 (TID 374) in 61 ms on ip-XXX.compute.internal (16/24)
15/04/20 19:19:59 INFO storage.BlockManagerInfo: Added rdd_44_14 in memory on ip-XXX.compute.internal:37999 (size: 16.0 B, free: 260.6 MB)
Killed
What does "Killed" mean and why does it occur? There's no other errors.
"Killed" usually means that the OS has terminated the process by sending a SIGKILL signal. This is an unblockable signal that terminates a process immediately. It's often used as an OOM (out-of-memory) process killer -- if the OS decides that memory resources are getting dangerously low, it can pick a process to kill to try to free some memory.
Without more information, it's impossible to tell whether your process was killed because of memory problems or for some other reason. The kind of information you might be able to provide to help diagnose what's going on includes: how long was the process running before it was killed? can you enable and provide more verbose debug output from the process? is the process termination associated with any particular pattern of communication or processing activity?
Try setting yarn.nodemanager.vmem-check-enabled to false in your program's Spark config, something like this:
val conf = new SparkConf().setAppName("YourProgramName").set("yarn.nodemanager.vmem-check-enabled","false")
val sc = new SparkContext(conf)
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-avoid-being-killed-by-YARN-node-manager-td22199.html
Maybe it's a virtual memory problem:
Ensure you have a swap partition.
Ensure vm.swappiness is not zero.