Does the Ehcache memory limitation apply to collection values? - ehcache

I'm using Ehcache 3.9. I have a cache with key -> List<Value>. I set the memory limit on the heap to 1 MB.
Let's say I add an entry to the cache: key1 -> emptyList. Then I add many Value objects to the list stored under key1. Would those Value objects contribute to the size of the cache as perceived by Ehcache? In other words, would Ehcache still be able to limit the cache to 1 MB if the cache grows by the value objects getting larger, instead of by entries being added?
Thanks.
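
For reference, the setup being described corresponds to something like the following Ehcache 3 configuration (a rough sketch; the String key type, cache name, and use of a raw List value class are assumptions, not taken from the question):

import java.util.ArrayList;
import java.util.List;
import org.ehcache.Cache;
import org.ehcache.CacheManager;
import org.ehcache.config.builders.CacheConfigurationBuilder;
import org.ehcache.config.builders.CacheManagerBuilder;
import org.ehcache.config.builders.ResourcePoolsBuilder;
import org.ehcache.config.units.MemoryUnit;

public class HeapByBytesExample {
    public static void main(String[] args) {
        CacheManager cacheManager = CacheManagerBuilder.newCacheManagerBuilder()
            .withCache("valuesByKey",
                CacheConfigurationBuilder.newCacheConfigurationBuilder(
                    String.class, List.class,
                    // The heap tier is sized in bytes rather than entries.
                    ResourcePoolsBuilder.newResourcePoolsBuilder().heap(1, MemoryUnit.MB)))
            .build(true);

        Cache<String, List> cache = cacheManager.getCache("valuesByKey", String.class, List.class);

        // key1 -> empty list, as in the question; the list is then grown afterwards.
        cache.put("key1", new ArrayList<Object>());

        cacheManager.close();
    }
}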

Related

How can I configure an EhCache cache to use an LRU eviction strategy in version 3.8 of ehcache?

In version 3.8, how can I configure an EhCache cache to use an LRU eviction strategy?
I've looked at the EvictionAdvisor, but it only seems to get called for the most recently inserted item. So I can in essence say "yes" or "no" on evicting the most recently added item. But it is not useful in identifying other items that should be evicted.
I seem to recall that in EhCache 2.8 (it's been a while), I could provide information in the ehcache.xml configuration file to specify that the cache use an LRU eviction strategy.
These two pieces of documentation mention that Ehcache uses LRU as the default eviction strategy:
A cache eviction algorithm is a way of deciding which element to evict when the cache is full. In Ehcache, the MemoryStore may be limited in size (see How to Size Caches for more information). When the store gets full, elements are evicted. The eviction algorithms in Ehcache determine which elements are evicted. The default is LRU.
https://www.ehcache.org/documentation/2.8/apis/cache-eviction-algorithms.html
Ehcache uses Least Recently Used (LRU) as the default eviction strategy for the memory stores. The eviction strategy determines which cache entry is to be evicted when the cache is full.
https://springframework.guru/using-ehcache-3-in-spring-boot/
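
For reference, the EvictionAdvisor mentioned in the question is attached like this in Ehcache 3 (a sketch assuming a simple String/String cache sized by entry count; as noted, the advisor can only veto candidates the store proposes, it does not pick the eviction algorithm):

import org.ehcache.CacheManager;
import org.ehcache.config.EvictionAdvisor;
import org.ehcache.config.builders.CacheConfigurationBuilder;
import org.ehcache.config.builders.CacheManagerBuilder;
import org.ehcache.config.builders.ResourcePoolsBuilder;

public class EvictionAdvisorExample {
    public static void main(String[] args) {
        // Advise against evicting entries whose key starts with "pinned:"; all other
        // entries remain ordinary candidates for the default eviction behaviour.
        EvictionAdvisor<String, String> advisor = (key, value) -> key.startsWith("pinned:");

        CacheManager cacheManager = CacheManagerBuilder.newCacheManagerBuilder()
            .withCache("myCache",
                CacheConfigurationBuilder.newCacheConfigurationBuilder(
                        String.class, String.class, ResourcePoolsBuilder.heap(100))
                    .withEvictionAdvisor(advisor))
            .build(true);

        cacheManager.close();
    }
}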

HBase: Why are there evicted blocks before the max size of the BlockCache is reached?

I am currently using a stock configuration of Apache HBase, with RegionServer heap at 4G and BlockCache sizing at 40%, so around 1.6G. No L2/BucketCache configured.
Here are the BlockCache metrics after ~2K requests to RegionServer. As you can see, there were blocks evicted already, probably leading to some of the misses.
Why were they evicted when we aren't even close to the limit?
Size: 2.1 M - Current size of block cache in use (bytes)
Free: 1.5 G - Total free memory currently available to store more cache entries (bytes)
Count: 18 - Number of blocks in the block cache
Evicted: 14 - Total number of blocks evicted
Evictions: 1,645 - Total number of times an eviction has occurred
Mean: 10,984 - Mean age of blocks at eviction time (seconds)
StdDev: 5,853,922 - Standard deviation of block age at eviction time
Hits: 1,861 - Number of requests that were cache hits
Hits Caching: 1,854 - Cache-hit block requests, counting only requests set to cache the block on a miss
Misses: 58 - Block requests that were cache misses but set to cache missed blocks
Misses Caching: 58 - Block requests that were cache misses, counting only requests set to use the block cache
Hit Ratio: 96.98% - Hit count divided by total request count
What you are seeing is the effect of the LRU treating blocks with three levels of priority: single-access, multi-access, and in-memory. For the default L1 LruBlockCache class their share of the cache can be set with (default values in brackets):
hbase.lru.blockcache.single.percentage (25%)
hbase.lru.blockcache.multi.percentage (50%)
hbase.lru.blockcache.memory.percentage (25%)
For your example of a 4 GB heap with 40% set aside for the cache, you have 1.6 GB, which is further divided into 400 MB, 800 MB, and 400 MB for each priority level, based on the above percentages.
When a block is loaded from storage it is usually flagged as single-access, unless the column family it belongs to has been configured with IN_MEMORY = true, which sets its priority to in-memory. If another read request accesses a single-access block again, it is promoted to multi-access priority.
The LruBlockCache has an internal eviction thread that runs every 10 seconds and checks whether the blocks of each level together exceed their allowed percentage. Now, if you scan a larger table once, and assuming the cache was completely empty, all of the blocks are tagged single-access. If the table was 1 GB in size, you have loaded 1 GB into a 400 MB cache space, which the eviction thread is going to reduce in due course. In fact, depending on how long the scan takes, the eviction thread's 10-second interval elapses during the scan, and it will start to evict blocks once you exceed the 25% threshold.
The eviction will first evict blocks from the single-access area, then the multi-access area, and finally, if there is still pressure on the heap, from the in-memory area. That is also why you should make sure your working set for in-memory flagged column families is not exceeding the configured cache area.
What can you do? If you have mostly single-access blocks, you could tweak the above percentages to give more to the single-access area of the LRU.
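
If you want to experiment with that, the three properties above are ordinary HBase configuration keys. A sketch follows (the 50/30/20 split is purely illustrative; in a real deployment you would put these values in hbase-site.xml on the RegionServers and keep them summing to 1.0 like the defaults):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class BlockCacheTuning {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Give the single-access area a larger share of the 1.6 GB BlockCache.
        conf.setFloat("hbase.lru.blockcache.single.percentage", 0.50f);
        conf.setFloat("hbase.lru.blockcache.multi.percentage", 0.30f);
        conf.setFloat("hbase.lru.blockcache.memory.percentage", 0.20f);
    }
}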

Efficient way to search and sort data with elasticsearch as a datastore

We are using Elasticsearch as a primary data store to save data, and our indexing strategy is time based (for example, we create an index every 6 hours - configurable). The search-and-sort queries that come to our application contain a time range, and based on the input time range we calculate the indices that need to be used for searching the data.
Now, if the input time range is large - let's say 6 months - and we delegate the search-and-sort query to Elasticsearch, then Elasticsearch will load all the documents into memory, which could drastically increase the heap size (we have a limitation on the heap size).
One way to deal with the above problem is to get the data index by index and sort it in our application; indices are opened/closed accordingly, for example only the latest 4 indices are kept open all the time and the remaining indices are opened/closed based on need. I'm wondering if there is any better way to handle the problem at hand.
UPDATE
Option 1
Instead of opening and closing indexes you could experiment with limiting the field data cache size.
You could limit the field data cache to a percentage of the JVM heap size or a specific size, for example 10Gb. Once field data is loaded into the cache it is not removed unless you specifically limit the cache size. Putting a limit will evict the oldest data in the cache and so avoid an OutOfMemoryException.
You might not get great performance but then it might not be worse than opening and closing indexes and would remove a lot of complexity.
Take into account that Elasticsearch loads all of the documents in the index when it performs a sort so that means whatever limit you put should be big enough to load that index into memory.
See limiting field data cache size
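For example, the setting referred to above is a node-level one in elasticsearch.yml (the 10gb figure is just the value quoted in the answer):
indices.fielddata.cache.size: 10gb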
Option 2
Doc Values
This means writing the necessary metadata to disk at index time, so the "fielddata" required for sorting lives on disk and not in memory. It is not hugely slower than using in-memory fielddata, and it can in fact alleviate problems with garbage collection, as less data is loaded into memory. There are some limitations, such as string fields needing to be not_analyzed.
You could use a mixed approach and enable doc values on your older indexes and use faster and more flexible fielddata on current indexes (if you could classify your indexes in that way). That way you don't penalize the queries on "active" data.
See Doc Values documentation
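A field mapped for doc values in that generation of Elasticsearch looks roughly like this (a sketch; the field name is made up, and as noted above string fields must be not_analyzed):
"sort_field": {
  "type": "string",
  "index": "not_analyzed",
  "doc_values": true
}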

How to configure maxEntriesLocalHeap in ehcache?

The Ehcache docs (https://www.ehcache.org/documentation/2.8/configuration/cache-size.html) describe maxEntriesLocalHeap as:
The maximum number of cache entries or bytes a cache can use in local heap memory, or, when set at the CacheManager level (maxBytesLocalHeap only), a local pool available to all caches under that CacheManager. This setting is required for every cache or at the CacheManager level.
Does this mean that for this configuration :
<cache name="myCache"
       maxEntriesLocalHeap="5000"
       eternal="false"
       overflowToDisk="false"
       timeToLiveSeconds="10000"
       memoryStoreEvictionPolicy="FIFO" />
The maximum number of objects that can be added to the cache is 5000. These objects can contain multiple child objects, but just the top-level parent object is counted as an entry. So under the hood the total number of objects held could grow to 15000 (at which point the oldest entries are evicted as new ones are added) if each object has references to two other objects. Is this correct?
Yes.
The maxEntriesLocalHeap enforcement only counts the number of key/value pairs you store in the cache. It's up to you as the user to have a good understanding of the heap overhead of retaining each of these entries, and to ensure that your configured Java heap can cope with this load.
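
As a sketch of what that means in practice (the Parent/Child classes are invented for illustration, and CacheManager.create() is assumed to pick up the ehcache.xml shown above from the classpath):

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class EntryCountingExample {

    // Hypothetical value type: one parent holding references to two children.
    static class Child { }
    static class Parent {
        Child left = new Child();
        Child right = new Child();
    }

    public static void main(String[] args) {
        CacheManager cacheManager = CacheManager.create();
        Cache cache = cacheManager.getCache("myCache");

        // Each put() counts as exactly one entry toward maxEntriesLocalHeap="5000",
        // even though every Parent brings two extra Child objects onto the heap.
        for (int i = 0; i < 5000; i++) {
            cache.put(new Element("key-" + i, new Parent()));
        }

        cacheManager.shutdown();
    }
}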

Configuring redis to consistently evict older data first

I'm storing a bunch of realtime data in redis. I'm setting a TTL of 14400 seconds (4 hours) on all of the keys. I've set maxmemory to 10G, which currently is not enough space to fit 4 hours of data in memory, and I'm not using virtual memory, so redis is evicting data before it expires.
I'm okay with redis evicting the data, but I would like it to evict the oldest data first. So even if I don't have a full 4 hours of data, at least I can have some range of data (3 hours, 2 hours, etc) with no gaps in it. I tried to accomplish this by setting maxmemory-policy=volatile-ttl, thinking that the oldest keys would be evicted first since they all have the same TTL, but it's not working that way. It appears that redis is evicting data somewhat arbitrarily, so I end up with gaps in my data. For example, today the data from 2012-01-25T13:00 was evicted before the data from 2012-01-25T12:00.
Is it possible to configure redis to consistently evict the older data first?
Here are the relevant lines from my redis.cnf file. Let me know if you want to see any more of the configuration:
maxmemory 10gb
maxmemory-policy volatile-ttl
vm-enabled no
AFAIK, it is not possible to configure Redis to consistently evict the older data first.
When the *-ttl or *-lru options are chosen in maxmemory-policy, Redis does not use an exact algorithm to pick the keys to be removed. An exact algorithm would require an extra list (for *-lru) or an extra heap (for *-ttl) in memory, cross-referenced with the normal Redis dictionary data structure. It would be expensive in terms of memory consumption.
With the current mechanism, evictions occur in the main event loop (i.e. potential evictions are checked at each loop iteration before each command is executed). Until memory is back under the maxmemory limit, Redis randomly picks a sample of n keys and selects for eviction the most idle one (for *-lru) or the one closest to its expiration time (for *-ttl). By default only 3 samples are considered. The result is non-deterministic.
One way to increase the accuracy of this algorithm and mitigate the problem is to increase the number of considered samples (maxmemory-samples parameter in the configuration file).
Do not set it too high, since it will consume some CPU. It is a tradeoff between eviction accuracy and CPU consumption.
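For example, in redis.conf (the value 10 is only an illustration of raising it above the default of 3):
maxmemory-samples 10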
Now if you really require a consistent behavior, one solution is to implement your own eviction mechanism on top of Redis. For instance, you could add a list (for non updatable keys) or a sorted set (for updatable keys) in order to track the keys that should be evicted first. Then, you add a daemon whose purpose is to periodically check (using INFO) the memory consumption and query the items of the list/sorted set to remove the relevant keys.
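
A minimal sketch of such a daemon using the Jedis client (the index key name, the memory threshold, and the batch size are all assumptions; the application would also have to ZADD every data key into the sorted set, scored by insertion timestamp, at write time):

import redis.clients.jedis.Jedis;

public class OldestFirstEvictor {

    // Sorted set: member = data key, score = insertion timestamp (oldest first).
    private static final String INDEX_KEY = "eviction:index";
    // Evict proactively below the real maxmemory of 10gb.
    private static final long MEMORY_LIMIT_BYTES = 9L * 1024 * 1024 * 1024;

    public static void main(String[] args) throws InterruptedException {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            while (true) {
                if (usedMemoryBytes(jedis) > MEMORY_LIMIT_BYTES) {
                    // Lowest scores are the oldest entries, so delete from the front.
                    for (String key : jedis.zrange(INDEX_KEY, 0, 99)) {
                        jedis.del(key);
                        jedis.zrem(INDEX_KEY, key);
                    }
                } else {
                    Thread.sleep(5000);
                }
            }
        }
    }

    // Parse used_memory out of INFO memory, as suggested above.
    private static long usedMemoryBytes(Jedis jedis) {
        for (String line : jedis.info("memory").split("\r?\n")) {
            if (line.startsWith("used_memory:")) {
                return Long.parseLong(line.substring("used_memory:".length()).trim());
            }
        }
        return 0L;
    }
}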
Please note other caching systems have their own way to deal with this problem. For instance with memcached, there is one LRU structure per slab (which depends on the object size), so the eviction order is also not accurate (although more deterministic than with Redis in practice).
