How to configure maxEntriesLocalHeap in ehcache? - spring

The Ehcache docs (https://www.ehcache.org/documentation/2.8/configuration/cache-size.html) describe maxEntriesLocalHeap as:
The maximum number of cache entries or bytes a cache can use in local heap memory, or, when set at the CacheManager level (maxBytesLocalHeap only), a local pool available to all caches under that CacheManager. This setting is required for every cache or at the CacheManager level.
Does this mean that for this configuration:
<cache
name="myCache"
maxEntriesLocalHeap="5000"
eternal="false"
overflowToDisk="false"
timeToLiveSeconds="10000"
memoryStoreEvictionPolicy="FIFO" />
The maximum number of objects that can be added to the cache is 5000. These objects can contain multiple child objects, but only the top-level parent object is added as an entry. So under the hood the number of objects held could grow to 15000 if each entry references two other objects (and once the 5000-entry limit is reached, the oldest entry is evicted as new ones are added). Is this correct?

Yes.
The maxEntriesLocalHeap enforcement only counts the number of key/value pairs you store in the cache. It's up to you as the user to have a good understanding of the heap overhead of retaining each of these entries, and to ensure that your configured Java heap can cope with this load.
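As a rough illustration of that point, here is a sketch assuming the myCache configuration above and a made-up Parent/Child object graph (Ehcache 2.x API); only the key/value pair counts against maxEntriesLocalHeap, no matter how many objects the value retains:

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class EntryCountExample {

    // Hypothetical value types; only the Parent reference is put into the cache.
    static class Child { byte[] payload = new byte[1024]; }
    static class Parent { Child left = new Child(); Child right = new Child(); }

    public static void main(String[] args) {
        CacheManager cacheManager = CacheManager.newInstance(); // picks up ehcache.xml from the classpath
        Cache cache = cacheManager.getCache("myCache");

        // One key/value pair = one entry, regardless of the child objects hanging off the value.
        cache.put(new Element("parent-1", new Parent()));

        System.out.println(cache.getSize()); // 1, even though three objects are retained on the heap

        cacheManager.shutdown();
    }
}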

Related

Does the Ehcache memory limitation apply to collection values?

I'm using Ehcache 3.9. I have a cache with key -> List<Value>. I set the memory limit on the heap to 1 MB.
Let's say I add an entry to the cache: key1 -> emptyList. Then I add many Value objects to the list of key1. Would those Value objects contribute to the size of the cache as perceived by Ehcache? In other words, would Ehcache still be able to limit the cache to 1 MB if the cache grows by its values being extended rather than by entries being added?
Thanks.
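For reference, a minimal sketch of the setup being described, assuming the Ehcache 3 builder API (the Value class and the cache name are made up for illustration):

import java.util.ArrayList;
import java.util.List;

import org.ehcache.Cache;
import org.ehcache.CacheManager;
import org.ehcache.config.builders.CacheConfigurationBuilder;
import org.ehcache.config.builders.CacheManagerBuilder;
import org.ehcache.config.builders.ResourcePoolsBuilder;
import org.ehcache.config.units.MemoryUnit;

public class HeapByteLimitExample {

    // Hypothetical value type used only for illustration.
    static class Value {
        byte[] payload = new byte[1024];
    }

    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        Class<List<Value>> listClass = (Class<List<Value>>) (Class<?>) List.class;

        CacheManager cacheManager = CacheManagerBuilder.newCacheManagerBuilder()
                .withCache("listCache",
                        CacheConfigurationBuilder.newCacheConfigurationBuilder(
                                String.class, listClass,
                                // Heap tier sized in bytes rather than entries.
                                ResourcePoolsBuilder.newResourcePoolsBuilder().heap(1, MemoryUnit.MB)))
                .build(true);

        Cache<String, List<Value>> cache = cacheManager.getCache("listCache", String.class, listClass);

        // One entry whose value keeps growing after it was put, as in the question.
        List<Value> list = new ArrayList<>();
        cache.put("key1", list);
        for (int i = 0; i < 10_000; i++) {
            list.add(new Value());
        }

        cacheManager.close();
    }
}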

How can I configure an EhCache cache to use an LRU eviction strategy in version 3.8 of ehcache?

In version 3.8, how can I configure an EhCache cache to use an LRU eviction strategy?
I've looked at the EvictionAdvisor, but it only seems to get called for the most recently inserted item. So I can in essence say "yes" or "no" on evicting the most recently added item. But it is not useful in identifying other items that should be evicted.
I seem to recall that in EhCache 2.8 (it's been a while), I could provide information in the ehcache.xml configuration file to specify that the cache use an LRU eviction strategy.
These two pieces of documentation mention that Ehcache uses LRU as the default eviction strategy:
A cache eviction algorithm is a way of deciding which element to evict when the cache is full. In Ehcache, the MemoryStore may be limited in size (see How to Size Caches for more information). When the store gets full, elements are evicted. The eviction algorithms in Ehcache determine which elements are evicted. The default is LRU.
https://www.ehcache.org/documentation/2.8/apis/cache-eviction-algorithms.html
Ehcache uses Least Recently Used (LRU) as the default eviction strategy for the memory stores. The eviction strategy determines which cache entry is to be evicted when the cache is full.
https://springframework.guru/using-ehcache-3-in-spring-boot/
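For illustration, a minimal Ehcache 3 sketch assuming the builder API (cache name and types are made up): the cache is only given a heap capacity and eviction is left to Ehcache's built-in behaviour, which the pages above describe as LRU by default; the 3.x builders do not appear to expose a memoryStoreEvictionPolicy-style switch like the 2.x ehcache.xml did.

import org.ehcache.Cache;
import org.ehcache.CacheManager;
import org.ehcache.config.builders.CacheConfigurationBuilder;
import org.ehcache.config.builders.CacheManagerBuilder;
import org.ehcache.config.builders.ResourcePoolsBuilder;

public class DefaultEvictionExample {

    public static void main(String[] args) {
        CacheManager cacheManager = CacheManagerBuilder.newCacheManagerBuilder()
                .withCache("recentItems",
                        CacheConfigurationBuilder.newCacheConfigurationBuilder(
                                Long.class, String.class,
                                // Bounded by entry count; when full, Ehcache's default
                                // heap eviction decides which entry goes.
                                ResourcePoolsBuilder.heap(100)))
                .build(true);

        Cache<Long, String> cache = cacheManager.getCache("recentItems", Long.class, String.class);
        for (long i = 0; i < 1_000; i++) {
            cache.put(i, "value-" + i); // less recently used entries are candidates for eviction
        }
        cacheManager.close();
    }
}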

Understanding elasticsearch jvm heap usage

Folks,
I am trying to reduce the memory usage of my Elasticsearch deployment (single-node cluster).
I can see 3 GB of JVM heap space being used.
To optimize, I first need to understand the bottleneck.
I have a limited understanding of how the JVM heap usage is split.
Field data looks to consume 1.5 GB, and the filter cache and query cache combined consume less than 0.5 GB; that adds up to 2 GB at most.
Can someone help me understand where Elasticsearch eats up the rest of the 1 GB?
I can't speak to your exact setup, but in order to know what's going on in your heap, you can use the jvisualvm tool (bundled with the JDK) together with Marvel or the bigdesk plugin (my preference) and the _cat APIs to analyze what's going on.
As you've rightly noticed, the heap hosts three main caches, namely:
the fielddata cache: unbounded by default, but can be controlled with indices.fielddata.cache.size (in your case it seems to be around 50% of the heap, probably due to the fielddata circuit breaker)
the node query/filter cache: 10% of the heap
the shard request cache: 1% of the heap but disabled by default
There is a nice mind map available here (kudos to Igor KupczyƄski) that summarizes the roles of the caches. That leaves more or less ~30% of the heap (1 GB in your case) for all the other object instances that ES needs to create in order to function properly (see more about this later).
Here is how I proceeded on my local env. First, I started my node fresh (with Xmx1g) and waited for green status. Then I started jvisualvm and hooked it onto my elasticsearch process. I took a heap dump from the Sampler tab so I can compare it later on with another dump. My heap looks like this initially (only 1/3 of max heap allocated so far):
I also checked that my field data and filter caches were empty:
Just to make sure, I also ran /_cat/fielddata and as you can see there's no heap used by field data yet since the node just started.
$ curl 'localhost:9200/_cat/fielddata?bytes=b&v'
id host ip node total
TMVa3S2oTUWOElsBrgFhuw iMac.local 192.168.1.100 Tumbler 0
This is the initial situation. Now, we need to warm this all up a bit, so I started my back- and front-end apps to put some pressure on the local ES node.
After a while, my heap looks like this, so its size has more or less increased by 300 MB (139MB -> 452MB, not much but I ran this experiment on a small dataset)
My caches have also grown a bit to a few megabytes:
$ curl 'localhost:9200/_cat/fielddata?bytes=b&v'
id host ip node total
TMVa3S2oTUWOElsBrgFhuw iMac.local 192.168.1.100 Tumbler 9066424
At this point I took another heap dump to gain insight into how the heap had evolved; I computed the retained size of the objects and compared it with the first dump I took just after starting the node. The comparison looks like this:
Among the objects that increased in retained size, the usual suspects are maps, of course, and any cache-related entities. But we can also find the following classes:
NIOFSDirectory instances that are used to read Lucene segment files on the filesystem
A lot of interned strings in the form of char arrays or byte arrays
Doc values related classes
Bit sets
etc
As you can see, the heap hosts the three main caches, but it is also where all the other Java objects that the Elasticsearch process needs, and that are not necessarily cache-related, reside.
So if you want to control your heap usage, you obviously have no control over the internal objects that ES needs to function properly, but you can definitely influence the sizing of your caches. If you follow the links in the first bullet list, you'll get a precise idea of what settings you can tune.
Also, tuning the caches might not be the only option; maybe you need to rewrite some of your queries to be more memory-friendly, or change your analyzers or some field types in your mapping, etc. It's hard to tell in your case without more information, but this should give you some leads.
Go ahead and launch jvisualvm the same way I did here and watch how your heap grows while your app (searching + indexing) is hitting ES; you should quickly gain some insight into what's going on in there.
Marvel only plots some of the objects on the heap that need to be monitored, such as the caches in this case.
The caches represent only a portion of the total heap usage. There are many other objects occupying heap memory that may not be plotted directly on the Marvel interface.
Hence, not all of the heap occupied by ES is taken up by the caches.
In order to clearly understand the exact heap usage of the different objects, you should take a heap dump of the process and then analyze it using a memory analyzer tool, which can give you the exact picture.

LRU cache with objects that should not be removed

I use an LRU cache (the LruCache class from android.util) in my Android app, which is generally working fine.
Now I have a special requirement for this LRU cache: I would like some objects to never be removed. To explain the situation: I have an array of objects (call them mymetadata objects) that should never be removed, and I have a lot of other objects (call them dynamicdata objects) that should be removed following the LRU rule. I would like to store the mymetadata objects in the LRU cache because the array of objects can also grow, and using an LRU cache helps avoid running out of memory.
Is there any trick to guarantee that mymetadata objects are never removed from the LRU cache? Or should I simply access an object from the array so that it is marked as last used?
Is there any trick to guarantee that mymetadata is never removed from the LRU cache? Or should I simply access an object of the array so that it is marked as last used?
Besides regularly touching the objects you want to keep in the LRU cache (to force raising their ranks), I don't see what else could be done. One problem with this might be deciding when these objects should be touched and what the performance impact of that operation is.
A different approach would be to split the storage of your objects depending on their persistence. Keep a standard map for your persistent objects and an LRU cache for objects that can expire. This mix of two data structures can then be hidden behind a single interface similar to the ones of Map or LruCache (each query is directed to the right internal storage).
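A minimal sketch of that mixed-storage idea, assuming android.util.LruCache; the class and method names below are made up for illustration:

import java.util.HashMap;
import java.util.Map;

import android.util.LruCache;

// Persistent entries live in a plain map, evictable entries live in an LruCache;
// a single get() hides the split from callers.
public class MixedCache<K, V> {

    private final Map<K, V> persistent = new HashMap<>();
    private final LruCache<K, V> evictable;

    public MixedCache(int maxEvictableEntries) {
        this.evictable = new LruCache<>(maxEvictableEntries);
    }

    /** Entries stored this way are never evicted. */
    public void putPersistent(K key, V value) {
        persistent.put(key, value);
    }

    /** Entries stored this way follow the normal LRU rule. */
    public void putEvictable(K key, V value) {
        evictable.put(key, value);
    }

    public V get(K key) {
        V value = persistent.get(key);
        return value != null ? value : evictable.get(key);
    }
}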
I would like to put the mymetadata objects into the LRU cache, because the array of objects can also grow
This seems to be in conflict with your "never removed" requirement for some object. How do you decide when a persistent object is allowed to expire?
Anyway, yet another approach would consist in reimplementing the LRU cache data structure, keeping two separate ordered lists of objects instead of a single one: one for mymetadata objects and one for dynamicdata objects. Each query to this data structure is then directed to the right list, and both kinds of objects can expire independently (the size of the cache can also be chosen independently for each set of objects). But both kinds of objects are stored in the same hash table / map.
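A rough sketch of that idea, simplified here to two access-ordered LinkedHashMaps (one per kind of object, each with its own limit) instead of a single shared hash table with two ordered lists; names are made up:

import java.util.LinkedHashMap;
import java.util.Map;

public class DualLruCache<K, V> {

    // Access-ordered map that drops its least recently used entry once maxEntries is exceeded.
    private static <K, V> Map<K, V> boundedLru(final int maxEntries) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    private final Map<K, V> metadata;
    private final Map<K, V> dynamicData;

    public DualLruCache(int maxMetadataEntries, int maxDynamicEntries) {
        this.metadata = boundedLru(maxMetadataEntries);
        this.dynamicData = boundedLru(maxDynamicEntries);
    }

    public void putMetadata(K key, V value) { metadata.put(key, value); }
    public void putDynamic(K key, V value) { dynamicData.put(key, value); }

    public V get(K key) {
        V value = metadata.get(key);
        return value != null ? value : dynamicData.get(key);
    }
}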

Configuring redis to consistently evict older data first

I'm storing a bunch of realtime data in redis. I'm setting a TTL of 14400 seconds (4 hours) on all of the keys. I've set maxmemory to 10G, which currently is not enough space to fit 4 hours of data in memory, and I'm not using virtual memory, so redis is evicting data before it expires.
I'm okay with redis evicting the data, but I would like it to evict the oldest data first. So even if I don't have a full 4 hours of data, at least I can have some range of data (3 hours, 2 hours, etc) with no gaps in it. I tried to accomplish this by setting maxmemory-policy=volatile-ttl, thinking that the oldest keys would be evicted first since they all have the same TTL, but it's not working that way. It appears that redis is evicting data somewhat arbitrarily, so I end up with gaps in my data. For example, today the data from 2012-01-25T13:00 was evicted before the data from 2012-01-25T12:00.
Is it possible to configure redis to consistently evict the older data first?
Here are the relevant lines from my redis.conf file. Let me know if you want to see any more of the configuration:
maxmemory 10gb
maxmemory-policy volatile-ttl
vm-enabled no
AFAIK, it is not possible to configure Redis to consistently evict the older data first.
When the *-ttl or *-lru options are chosen in maxmemory-policy, Redis does not use an exact algorithm to pick the keys to be removed. An exact algorithm would require an extra list (for *-lru) or an extra heap (for *-ttl) in memory, cross-referenced with the normal Redis dictionary data structure. It would be expensive in terms of memory consumption.
With the current mechanism, evictions occur in the main event loop (i.e. potential evictions are checked at each loop iteration, before each command is executed). Until memory is back under the maxmemory limit, Redis randomly picks a sample of n keys and selects for eviction the most idle one (for *-lru) or the one closest to its expiration (for *-ttl). By default only 3 samples are considered. The result is non-deterministic.
One way to increase the accuracy of this algorithm and mitigate the problem is to increase the number of considered samples (maxmemory-samples parameter in the configuration file).
Do not set it too high, since it will consume some CPU. It is a tradeoff between eviction accuracy and CPU consumption.
Now if you really require a consistent behavior, one solution is to implement your own eviction mechanism on top of Redis. For instance, you could add a list (for non updatable keys) or a sorted set (for updatable keys) in order to track the keys that should be evicted first. Then, you add a daemon whose purpose is to periodically check (using INFO) the memory consumption and query the items of the list/sorted set to remove the relevant keys.
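A rough sketch of that idea, assuming the Jedis client; the key names, memory threshold, and INFO parsing below are illustrative only:

import redis.clients.jedis.Jedis;

// Keys are indexed in a sorted set by insertion time; a periodic task checks
// memory via INFO and deletes the oldest keys first while over the limit.
public class OldestFirstEvictor {

    private static final String INDEX = "eviction:index"; // sorted set: member = key, score = insertion time
    private static final long MEMORY_LIMIT_BYTES = 10L * 1024 * 1024 * 1024; // mirrors maxmemory 10gb; in practice pick a threshold below it

    private final Jedis jedis = new Jedis("localhost", 6379);

    /** Write a value and record its insertion time in the index. */
    public void put(String key, String value, int ttlSeconds) {
        jedis.setex(key, ttlSeconds, value);
        jedis.zadd(INDEX, System.currentTimeMillis(), key);
    }

    /** Called periodically by a daemon thread: evict the oldest keys while over the limit. */
    public void evictIfNeeded() {
        while (usedMemoryBytes() > MEMORY_LIMIT_BYTES) {
            boolean removed = false;
            // Oldest key = lowest score in the sorted set.
            for (String oldest : jedis.zrange(INDEX, 0, 0)) {
                jedis.del(oldest);
                jedis.zrem(INDEX, oldest);
                removed = true;
            }
            if (!removed) {
                break; // nothing left to evict
            }
        }
    }

    private long usedMemoryBytes() {
        // INFO memory returns lines such as "used_memory:123456".
        for (String line : jedis.info("memory").split("\r?\n")) {
            if (line.startsWith("used_memory:")) {
                return Long.parseLong(line.substring("used_memory:".length()).trim());
            }
        }
        return 0L;
    }
}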
Please note other caching systems have their own way to deal with this problem. For instance with memcached, there is one LRU structure per slab (which depends on the object size), so the eviction order is also not accurate (although more deterministic than with Redis in practice).
