JVM buffer pool only grows - Elasticsearch

I'm running Elasticsearch in production and using Prometheus to scrape the metrics. Looking at the graphs, I could see the jvm_buffer_pool metric just grow until the process finally crashed.
As I understand it, the buffer pool lives outside the GC-managed heap, so how do I clean it up?

The JVM has direct ByteBuffers, which are small on-heap objects that proxy off-heap memory. The ByteBuffer object itself is only tens of bytes, even if the off-heap memory it wraps is 1 GB. When the GC collects this proxy object because it is no longer referenced, the off-heap memory is released as well.
If the off-heap memory isn't being released, it is because either:
its on-heap proxies are being retained, i.e. the memory is still needed, or
Elasticsearch is allocating off-heap memory directly and that code has a leak (unlikely).
I would try allowing more direct memory to see if that helps: -XX:MaxDirectMemorySize=64g, or whatever you can spare.
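For what it's worth, you can inspect the same pools the jvm_buffer_pool metric is scraped from through the standard java.lang.management API. A minimal sketch (the class name is mine) that prints the direct and mapped buffer pool counts and sizes:

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

public class DirectBufferStats {
    public static void main(String[] args) {
        // The "direct" and "mapped" pools are the ones exposed as
        // jvm_buffer_pool_* metrics by typical JVM exporters.
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            System.out.printf("%s: count=%d used=%d bytes capacity=%d bytes%n",
                    pool.getName(), pool.getCount(),
                    pool.getMemoryUsed(), pool.getTotalCapacity());
        }
    }
}

If the "direct" capacity keeps climbing while the heap stays flat, the proxies are being retained somewhere, which matches the first case above.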

Related

Cassandra Java process using more memory than its allocated max heap size (Xmx)

We have a Cassandra cluster running Apache Cassandra 3.11.4 on a set of 18 Unix hosts. Each host has 96G of RAM, and we have configured the heap size to -Xms=64G -Xmx=64G, but the top command (top -M) on the hosts shows the actual memory utilization is ~85G on average, i.e. much higher than the allocated heap (64G).
The trend of the memory usage is: during startup of the Cassandra daemon, top -M shows the process has already occupied ~75G, which is (75G-64G)=11G more than the allocated heap size. This memory utilization increases over time, reaches a maximum of ~85G within just 3-4 hours, and remains at that level from then on, while the heap utilization (~40-50%) is normal, GC activity is usual, and minor GC kicks in as usual.
We have confirmed that the total off-heap memory utilized by all the keyspaces is below 2G on each host.
We are unable to trace what else is consuming the RAM in addition to the allocated heap.
Besides the heap memory, Cassandra also uses off-heap memory, for example for keeping compression metadata, bloom filters, and some other things. From the documentation (1, 2):
Compression metadata is stored off-heap and scales with data on disk. This often requires 1-3GB of off-heap RAM per terabyte of data on disk, though the exact usage varies with chunk_length_in_kb and compression ratios.
Bloom filters are stored in RAM, but are stored offheap, so operators should not consider bloom filters when selecting the maximum heap size.
You can monitor heap & off-heap memory usage via JMX, for example. (I've seen setups where the bloom filters alone occupied ~40Gb of RAM, but that was heavily dependent on the number of unique partition keys.)
Very big heaps are usually not recommended because they can cause long GC pauses, etc. It of course depends on the workload, but you can try 31Gb or lower (or just use the default settings). Plus, you need to leave memory for the Linux page cache so that frequently used files stay cached. That is the reason why, by default, Cassandra allocates only 1/4 of system memory for the heap.
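As an illustration of the JMX route, here is a minimal sketch that connects to a node's JMX port and reads the JVM-tracked heap, non-heap, and direct-buffer usage (7199 is Cassandra's default JMX port; the host and the assumption of no JMX authentication are mine). Note that Cassandra's own off-heap structures such as bloom filters and compression metadata are reported separately through its org.apache.cassandra.metrics MBeans rather than through these standard beans:

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class CassandraMemoryProbe {
    public static void main(String[] args) throws Exception {
        // 7199 is Cassandra's default JMX port; adjust host/port for your cluster.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection conn = connector.getMBeanServerConnection();

            // Heap vs non-heap as seen by the JVM itself.
            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                    conn, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
            System.out.println("Heap:     " + memory.getHeapMemoryUsage());
            System.out.println("Non-heap: " + memory.getNonHeapMemoryUsage());

            // Direct (NIO) buffers account for another slice of the off-heap usage.
            BufferPoolMXBean direct = ManagementFactory.newPlatformMXBeanProxy(
                    conn, "java.nio:type=BufferPool,name=direct", BufferPoolMXBean.class);
            System.out.println("Direct buffers: " + direct.getMemoryUsed() + " bytes");
        } finally {
            connector.close();
        }
    }
}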

Memory Allocation of Apache Ignite vs Redis (jemalloc)?

How does Apache Ignite do memory allocation to avoid memory fragmentation? Specifically, I'm trying to compare Ignite's approach to Redis's (jemalloc) approach.
Apache Ignite uses Durable Memory instead of heap allocation. This means there are no fragmentation issues in the malloc sense. It splits memory into 4k pages and writes stored data to pages, recycling them as needed.
Even if it didn't, Ignite uses Java, which has a relocating GC and is thus not vulnerable to memory fragmentation - it can always compact its heap. But that may also result in GC pauses, which we avoid by having Durable Memory.
It is possible that the page memory itself will become fragmented, hence we have the fillFactor metric to track this.
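As a rough sketch of what that looks like in code (region name and sizes are arbitrary), you can enable metrics on a data region and read the page fill factor through Ignite's DataRegionMetrics:

import org.apache.ignite.DataRegionMetrics;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class DurableMemoryFillFactor {
    public static void main(String[] args) {
        // Define a data region with metrics enabled so the page fill factor
        // (how densely the pages are packed) can be observed.
        DataRegionConfiguration region = new DataRegionConfiguration()
                .setName("demo-region")
                .setInitialSize(256L * 1024 * 1024)
                .setMaxSize(1024L * 1024 * 1024)
                .setMetricsEnabled(true);

        DataStorageConfiguration storage = new DataStorageConfiguration()
                .setDataRegionConfigurations(region);

        IgniteConfiguration cfg = new IgniteConfiguration()
                .setDataStorageConfiguration(storage);

        try (Ignite ignite = Ignition.start(cfg)) {
            DataRegionMetrics metrics = ignite.dataRegionMetrics("demo-region");
            // Values close to 1.0 mean pages are densely packed; lower values
            // indicate fragmentation inside the page memory.
            System.out.println("Pages fill factor: " + metrics.getPagesFillFactor());
        }
    }
}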

Why is my Elasticsearch cluster's heap usage so high?

Why is my Elasticsearch cluster's heap usage so high while there are just a few requests?
Elasticsearch uses memory for a few different purposes, all in order to provide faster search times and high indexing throughput.
The heap usage you pasted doesn't look high. Elasticsearch runs as a Java process in the JVM. The JVM manages the heap and reclaims memory through garbage collection. Looking at a heap-usage graph, you will usually see the usage increasing steadily and then dropping suddenly - the drops happen when the GC (garbage collector) releases unused memory.
Do you have a specific issue with memory? It looks OK - ranging between 40% and 70% is perfectly normal.
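To make that sawtooth concrete, here is a small self-contained sketch (not Elasticsearch-specific, and the class name is mine) that samples the JVM's own heap usage and cumulative GC count the way a monitoring agent would; between samples the used value creeps up, and after a collection it drops back:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapSawtooth {
    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 5; i++) {
            MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
            long collections = ManagementFactory.getGarbageCollectorMXBeans().stream()
                    .mapToLong(GarbageCollectorMXBean::getCollectionCount).sum();
            System.out.printf("used=%d MB of max=%d MB, GC runs so far=%d%n",
                    heap.getUsed() / (1024 * 1024), heap.getMax() / (1024 * 1024),
                    collections);
            Thread.sleep(1000);
        }
    }
}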

Why the memory fragmentation ratio is less than 1 in Redis

Redis supports 3 memory allocators: libc, jemalloc, and tcmalloc. When I do a memory usage test, I find that mem_fragmentation_ratio in INFO MEMORY can be less than 1 with the libc allocator. With jemalloc or tcmalloc, this value is greater than or equal to 1, as it should be.
Could anyone explain why mem_fragmentation_ratio is less than 1 with libc?
Redis version: 2.6.12. CentOS 6.
Update:
I forgot to mention that one possible reason is that swapping happens, in which case mem_fragmentation_ratio will be < 1.
But when I do my test, I adjust swappiness, and even turn swap off. The result is the same. And my Redis instance actually does not use that much memory.
Generally, you will have less fragmentation with jemalloc or tcmalloc than with libc malloc. This is due to 4 factors:
more granular allocation classes for jemalloc and tcmalloc. This reduces internal fragmentation, especially when Redis has to allocate a lot of very small objects.
better algorithms and data structures to prevent external fragmentation (especially for jemalloc). Obviously, the gain depends on your long-term memory allocation patterns.
support of "malloc size". Some allocators offer an API to return the size of allocated memory. With glibc (Linux), malloc does not have this capability, so it is emulated by explicitly adding an extra prefix to each allocated memory block. This increases internal fragmentation. With jemalloc and tcmalloc (or with the BSD libc malloc), there is no such overhead.
jemalloc (and tcmalloc with some setting changes) can be more aggressive than glibc in releasing memory to the OS - but again, it depends on the allocation patterns.
Now, how is it possible to get inconsistent values for mem_fragmentation_ratio?
As stated in the INFO documentation, the mem_fragmentation_ratio value is calculated as the ratio between the resident set size of the process (RSS, measured by the OS) and the total number of bytes allocated by Redis using the allocator.
Now, if more memory is allocated with libc (compared to jemalloc or tcmalloc), or if more memory is used by other processes on your system during your benchmarks, Redis memory may be swapped out by the OS. This will reduce the RSS (since part of the Redis memory will no longer be in main memory). The resulting fragmentation ratio will be less than 1.
In other words, this ratio is only relevant if you are sure Redis memory has not been swapped out by the OS (if it is not the case, you will have performance issues anyway).
Other than swap, I know 2 ways to make the "memory fragmentation ratio" be less than 1:
Have a Redis instance with little or no data, but thousands of idling client connections. From my testing, it looks like Redis will have to allocate about 20 KB of memory for each client connection, but most of it won't actually be used (i.e. won't appear in RSS) until later.
Have a master-slave setup with, let's say, 8 GB of repl-backlog-size. The 8 GB will be allocated as soon as replication starts (on the master only for versions <4.0, on both master and slave otherwise), but the memory will only be used as we start writing to the master. So the ratio will be way below 1 initially, and then get closer and closer to 1 as the replication backlog gets filled.
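If you want to watch the ratio yourself during such a test, here is a minimal sketch that recomputes mem_fragmentation_ratio from the raw INFO fields. It assumes the Jedis Java client and a Redis instance on localhost:6379, which are not part of the original question:

import redis.clients.jedis.Jedis;

public class FragmentationCheck {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Parse used_memory (allocator bytes) and used_memory_rss (OS RSS)
            // from INFO memory and recompute the ratio.
            long used = 0, rss = 0;
            for (String line : jedis.info("memory").split("\r\n")) {
                if (line.startsWith("used_memory:")) {
                    used = Long.parseLong(line.split(":")[1].trim());
                } else if (line.startsWith("used_memory_rss:")) {
                    rss = Long.parseLong(line.split(":")[1].trim());
                }
            }
            System.out.printf("used_memory=%d rss=%d ratio=%.2f%n",
                    used, rss, (double) rss / used);
            // A ratio < 1 means the RSS is smaller than what the allocator
            // reports, which usually indicates swapping or pages that were
            // allocated but never touched yet.
        }
    }
}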

Why does Redis memory usage not decrease after deleting half of the keys?

I use Redis to save data, but it consumes a lot of memory - its memory usage is up to 52.5%.
I deleted half of the keys in Redis, and the return code of the delete operation was OK, but the memory usage did not decrease.
What's the reason? Thanks in advance.
My operation code is as below:
// save data
m_pReply = (redisReply *)redisCommand(m_pCntxt, "set %b %b", mykey.data(), mykey.size(), &myval, sizeof(myval));
// del data
m_pReply = (redisReply *)redisCommand(m_pCntxt, "del %b", mykey.data(), mykey.size());
The redis info:
redis 127.0.0.1:6979> info
redis_version:2.4.8
redis_git_sha1:00000000
redis_git_dirty:0
arch_bits:64
multiplexing_api:epoll
gcc_version:4.4.6
process_id:28799
uptime_in_seconds:1289592
uptime_in_days:14
lru_clock:127925
used_cpu_sys:148455.30
used_cpu_user:38023.92
used_cpu_sys_children:23187.60
used_cpu_user_children:123989.72
connected_clients:22
connected_slaves:0
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
used_memory:31903334872
used_memory_human:29.71G
used_memory_rss:34414981120
used_memory_peak:34015653264
used_memory_peak_human:31.68G
mem_fragmentation_ratio:1.08
mem_allocator:jemalloc-2.2.5
loading:0
aof_enabled:0
changes_since_last_save:177467
bgsave_in_progress:0
last_save_time:1343456339
bgrewriteaof_in_progress:0
total_connections_received:820
total_commands_processed:2412759064
expired_keys:0
evicted_keys:0
keyspace_hits:994257907
keyspace_misses:32760132
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:11672476
vm_enabled:0
role:slave
master_host:192.168.252.103
master_port:6479
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
db0:keys=66372158,expires=0
Please refer to the Memory allocation section at the following link:
http://redis.io/topics/memory-optimization
I quoted it here:
Redis will not always free up (return) memory to the OS when keys are
removed. This is not something special about Redis, but it is how most
malloc() implementations work. For example if you fill an instance
with 5GB worth of data, and then remove the equivalent of 2GB of data,
the Resident Set Size (also known as the RSS, which is the number of
memory pages consumed by the process) will probably still be around
5GB, even if Redis will claim that the user memory is around 3GB. This
happens because the underlying allocator can't easily release the
memory. For example often most of the removed keys were allocated in
the same pages as the other keys that still exist.
Since Redis 4.0.0 there's a command for this:
MEMORY PURGE
Should do the trick: https://redis.io/commands/memory-purge
Note however that command docs state:
This command is currently implemented only when using jemalloc as an allocator, and evaluates to a benign NOOP for all others.
And the README reminds us that:
Redis is compiled and linked against libc
malloc by default, with the exception of jemalloc being the default on Linux
systems. This default was picked because jemalloc has proven to have fewer
fragmentation problems than libc malloc.
A good starting point is to use the Redis CLI command: MEMORY DOCTOR.
It can give you very valuable information and point you to the potential issue.
Some useful links:
MEMORY DOCTOR command docs
What is defragmentation and what are the Redis defragmentation configs
Example:
Peak memory: In the past this instance used more than 150% the memory that is currently using. The allocator is normally not able to release memory after a peak, so you can expect to see a big fragmentation ratio, however this is actually harmless and is only due to the memory peak, and if the Redis instance Resident Set Size (RSS) is currently bigger than expected, the memory will be used as soon as you fill the Redis instance with more data. If the memory peak was only occasional and you want to try to reclaim memory, please try the MEMORY PURGE command, otherwise the only other option is to shutdown and restart the instance.
High total RSS: This instance has a memory fragmentation and RSS overhead greater than 1.4 (this means that the Resident Set Size of the Redis process is much larger than the sum of the logical allocations Redis performed). This problem is usually due either to a large peak memory (check if there is a peak memory entry above in the report) or may result from a workload that causes the allocator to fragment memory a lot. If the problem is a large peak memory, then there is no issue. Otherwise, make sure you are using the Jemalloc allocator and not the default libc malloc. Note: The currently used allocator is "jemalloc-5.1.0".
High allocator fragmentation: This instance has an allocator external fragmentation greater than 1.1. This problem is usually due either to a large peak memory (check if there is a peak memory entry above in the report) or may result from a workload that causes the allocator to fragment memory a lot. You can try enabling 'activedefrag' config option.
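For completeness, the 'activedefrag' option can be enabled at runtime. A minimal sketch (again assuming the Jedis client, a local instance, and a Redis build >= 4.0 compiled with jemalloc, none of which come from the question itself):

import redis.clients.jedis.Jedis;

public class EnableActiveDefrag {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // CONFIG SET activedefrag yes -- only honored when Redis was built
            // with jemalloc; other allocators reject the option.
            System.out.println(jedis.configSet("activedefrag", "yes"));
            System.out.println(jedis.configGet("activedefrag"));
        }
    }
}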
