When does overflowToDisk get activated in Ehcache?

I have some questions about the "overflowToDisk" attribute of the cache element:
1) I read at this URL that:
overflowToDisk sets whether elements can overflow to disk when the memory store has reached the maximum limit.
Does "memory" here refer to the JVM memory allocated to the Java process running Ehcache, or is there a parameter to specify the cache memory size?
2) When the process running Ehcache terminates for some reason, does the disk store get cleared so that everything in the cache vanishes?

Elements start to overflow to the disk when you have more than maxElementsInMemory of them in the memory store. The following example creates a cache that stores 1000 elements in memory, and, if you need to store more, up to 10000 on disk:
<cache name="cacheName"
maxElementsInMemory="1000"
maxElementsOnDisk="10000"
overflowToDisk="true"
timeToIdleSeconds="..."
timeToLiveSeconds="...">
</cache>
For the second question, have a look at the diskPersistent parameter. If it is set to true, Ehcache will keep your data stored on the disk when you stop the JVM. The following example demonstrates this:
<cache name="cacheName"
maxElementsInMemory="1000"
maxElementsOnDisk="10000"
overflowToDisk="true"
diskPersistent="true"
timeToIdleSeconds="..."
timeToLiveSeconds="...">
</cache>
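To make this concrete, here is a minimal sketch using the Ehcache 2.x Java API (the class name is made up; the cache name refers to the XML above, and the keys and values are just placeholders):

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class OverflowDemo {
    public static void main(String[] args) {
        // Loads ehcache.xml from the classpath; "cacheName" is the cache defined above
        CacheManager manager = CacheManager.newInstance();
        Cache cache = manager.getCache("cacheName");

        // Once more than maxElementsInMemory (1000) live elements exist,
        // the overflow is written to the disk store.
        for (int i = 0; i < 5000; i++) {
            cache.put(new Element("key-" + i, "value-" + i));
        }

        // With diskPersistent="true" the disk store survives this shutdown
        // and is reloaded the next time the CacheManager starts.
        manager.shutdown();
    }
}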

As of Ehcache 2.6, the storage model is no longer an overflow one but a tiered one. In the tiered storage model, all data will always be present in the lowest tier. Items will be present in the higher tiers based on their hotness.
Possible tiers for open source Ehcache are:
On-heap, which lives on the JVM heap
On-disk, which is the lowest tier
By definition, higher tiers have lower latency but less capacity than lower tiers.
So for an open source cache configured with overflowToDisk, all of the data will always be in the disk tier; the cache keeps the keys in memory and the data on disk.
Answer copied from this other question.
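For reference, a rough sketch of how the first example above might be written in the post-2.6 configuration style, assuming the open source localTempSwap persistence strategy (the persistence element is what replaced overflowToDisk in this style of configuration):

<cache name="cacheName"
       maxEntriesLocalHeap="1000"
       maxEntriesLocalDisk="10000">
    <persistence strategy="localTempSwap"/>
</cache>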

Related

Aerospike HDD/Memory usage

I'm exploring Aerospike as a key-value DB that stores data on disk for safety. Please confirm that I understand this correctly:
If in namespace configuration I set:
storage-engine device
memory-size 4G
file /opt/aerospike/data/namespace.dat
filesize 16G
data-in-memory false
-> all data will be on disk only, "memory-size" is for the indexes only (small usage), all data will be stored in multiple 16GB files (which will be created automatically), and most importantly, every read query will trigger reading data from disk?
If in namespace configuration I set:
storage-engine device
memory-size 4G
file /opt/aerospike/data/namespace.dat
filesize 16G
data-in-memory true
-> all data will be on disk and partly in memory, "memory-size" will act like a cache and contain 4GB of the most-used data, all data will be stored in multiple 16GB files (which will be created automatically), and most importantly, every read will first check memory and, if the data is missing, read it from disk and add it to memory? What data will be in memory: the most used or the most recently created?
If in namespace configuration I set:
storage-engine memory
memory-size 4G
data-in-memory true
-> all data will be in memory only, and I'm limited to 4GB of data and no more?
Aerospike doesn't shuffle data in and out of disk like first-generation NoSQL databases with a "cache-first" architecture do. Aerospike's hybrid memory architecture keeps the primary index (metadata) in memory at all times. Depending on the namespace configuration, the data is stored fully on disk or fully in memory; you define storage per namespace. If a namespace is in-memory, all of its data and metadata are fully in memory. If the namespace stores its data on a few devices (/dev/sdb, /dev/sdc), the primary index (metadata) is fully in memory and the data is fully on those SSDs.
(1) is data on HDD, and the configuration is correct. If you're using an SSD you probably want to use device instead of file. One thing in your question that isn't quite true: not every read goes to disk, because Aerospike first checks the post-write queue on a read.
Aerospike writes in blocks to work around the high-read / low-write performance characteristics of HDD and SSD. The size of the block is determined by the write-block-size config parameter (it should be 1MB for an HDD). Records are first loaded into a streaming write buffer of an equivalent size. After the buffer is flushed to a block on disk, Aerospike doesn't discard this in-memory copy immediately; it remains part of the post-write queue (FIFO). By default, 256 of those blocks are kept in the queue per device, or per file (you can define multiple file lines as the storage device). If your usage pattern is such that reads follow closely after writes, you'll get in-memory access instead of disk access. If your cache_read_pct metric is not in the single digits and you have DRAM to spare, you can probably benefit from raising the post-write-queue value (maximum of 2048 blocks per device).
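As an illustration, here is a hedged sketch of where those parameters sit in a namespace stanza (the namespace name and values are placeholders, not recommendations):

namespace demo {
    memory-size 4G

    storage-engine device {
        file /opt/aerospike/data/namespace.dat
        filesize 16G
        data-in-memory false
        write-block-size 1M     # block size flushed to disk (1MB suggested for HDD)
        post-write-queue 512    # blocks kept in memory per file/device (default 256)
    }
}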
(2) is an in-memory namespace, persisted to disk. For both (1) and (2) you can use either file (for filesystem based storage) or device (for raw device). Both the primary index (metadata) and storage (data) are in memory for (2). All reads and writes come out of memory, and a secondary write-through goes to the persistence device.
filesize reserves the size of the persistence layer on the filesystem (if you chose to use file and not device). You can have multiple file lines, each of which will be sized from the start to the number given as filesize. memory-size is the maximum amount of memory used by the namespace. This isn't pre-reserved. Aerospike will grow and shrink in memory usage over time, with the maximum for the namespace being its memory-size.
Take a look at What's New in 3.11, specifically the section that touches on in-memory performance improvements. Tuning partition-tree-sprigs and partition-tree-locks will likely boost the performance of your in-memory namespaces.
(3) is a purely in-memory namespace, usually intended to be a cache. The 4G limit affects things such as stop-writes-pct, high-water-memory-pct as those are defined as a percentage of that limit (see evictions, expirations, stop-writes).
There's also a (4) special-case for counters called data-in-index. See storage engine configuration recipes.

EhCache to put new element to disk if memory store full

I would like to use Ehcache with a combination of memory and disk caches. Ehcache should move new elements to disk when memory is full. For example, if I have 100 elements in the Ehcache memory store and try to put a 101st element, then, if memory is full, the 101st element should go to disk, not the 1st element.
Could you please let me know the cache configuration to achieve this?
Ehcache no longer works that way. The tiering model introduced in Ehcache 2.6, and used since then, always stores ALL mappings in the lower tier, disk in your case.
The reason is predictable latency. If Ehcache waited for the memory tier to be full before using the disk, you could see a latency increase, possibly at the worst time for your application. The model where all mappings are written to disk gives you an upper bound on write latency, while reads can be faster for hot values that are available directly in memory.
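So if the intent is roughly "100 hot elements on heap, everything on disk", a sketch along these lines (post-2.6 attributes; the cache name and disk limit are placeholders) gives you a 100-entry heap tier on top of the mandatory disk tier:

<cache name="myCache"
       maxEntriesLocalHeap="100"
       maxEntriesLocalDisk="10000">
    <persistence strategy="localTempSwap"/>
</cache>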

How will garbage collection affect the elements of my ehcache that are stored on the heap?

I’m using Hibernate 4.3.11.Final with the accompanying version of ehcache. I have a simple cache configuration which looks like the following:
<defaultCache maxElementsInMemory="10000"
              eternal="false"
              timeToIdleSeconds="86400"
              timeToLiveSeconds="86400"
              overflowToDisk="false"
              memoryStoreEvictionPolicy="LRU">
</defaultCache>
<cache name="main" />
My question is: because the memory store is part of the heap, and the heap gets garbage collected periodically, what happens when some of the entries in my cache get garbage collected? Is it the same as those entries being evicted from the cache?
Garbage collection (GC) will never collect entries from a cache to which there is a root path since they are referenced by the cache itself.
To answer the question around off-heap storage, let's say you decide to keep 500k mappings in your cache and each mapping is 10k bytes. That amounts to nearly 5GB of cached data, data that has an impact on the GC when it runs, since the GC needs to perform operations on it: marking, promotion, and compaction, depending on the GC implementation. Off-heap storage answers this problem by placing the objects outside the area where GC happens, so the application can run with a much smaller heap and thus reduced GC pauses.
None of this contradicts the fact that it is never the GC that removes a cache entry. It is the other way around: once evicted, expired, or replaced, a former mapping becomes free for GC, as long as there are no more root paths to it.
And this is what the explanation given in this answer says.
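A small sketch of that reasoning in code, using the "main" cache from the question (the key and value sizes are illustrative):

Cache cache = CacheManager.getInstance().getCache("main");

// While this Element is in the cache there is a root path to it
// (the cache itself references it), so the GC cannot reclaim it.
cache.put(new Element("someKey", new byte[10_000]));

// Only after the mapping is removed, evicted, expired or replaced does the
// value become eligible for GC (assuming nothing else references it).
cache.remove("someKey");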

Ehcache performance when using the disk store cache

We are using Ehcache in our application, with the following configuration:
<diskStore path="java.io.tmpdir" />
<cache name="service" maxElementsInMemory="50000" eternal="true" overflowToDisk="true"/>
Since we have configured eternal="true", are entries going to be cached forever? Is there a chance of running out of disk space?
What would be the performance impact of the disk store? It is definitely slower than the in-memory cache, but by how much?
If more entries are stored on disk, will that cause I/O issues from performing many file operations?
Please suggest best practices for a production-grade application. Consider that we have a 3 GB heap and 25000 concurrent users accessing the application, but no database is used in our application.
The application is deployed on WAS 8.5.5.
eternal=true means mappings will never expire.
overflowToDisk=true means that all mappings put in the cache will end up written on disk, from the first mapping put in the cache. The current Ehcache tiering model (since version 2.6.0) always makes use of the slower store - disk here - in order to give you predictable latency. When a mapping is accessed, it gets faulted into heap for faster retrieval. When too many mappings are faulted in heap, eviction from heap kicks in to keep the heap cache size according to maxElementsInMemory.
Given that you do not size the disk store by setting maxElementsLocalDisk, it defaults to 0 which means no limit. So yes, you may run out of disk space if you never explicitly remove cache entries.
It is quite hard to recommend proper cache size without knowing the details of your application. What I can recommend is that you measure both heap and disk usage and assess when the increased memory usage outweighs the performance gain.
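If you do want to cap the disk store, here is a sketch along the lines of the configuration above, using the maxElementsOnDisk attribute shown earlier on this page (the limit of 500000 is only a placeholder):

<diskStore path="java.io.tmpdir" />
<cache name="service"
       maxElementsInMemory="50000"
       maxElementsOnDisk="500000"
       eternal="true"
       overflowToDisk="true"/>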

Does ehcache reserve (allocate) heap memory set with maxBytesLocalHeap?

I am using Ehcache v. 2.8.
But I am not sure whether I understand the documentation correctly regarding the reservation of memory for the cache.
If the memory is set in ehcache.xml like this:
<ehcache maxBytesLocalHeap="256M">
(...)
</ehcache>
...will it actually be allocated at startup, so that this cache uses exactly 256MB of heap, or does it only mean (as the attribute's name suggests) that this cache can take at most 256MB of heap?
This means that this cache will do its best to contain 256MB or less of user data.
But note that the actual memory footprint of the cache can be somewhat larger due to internal data structures.
Also, when the cache operates at full capacity, it may temporarily go over size while eviction takes place.
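As an aside, here is a hedged sketch of how a CacheManager-level pool is typically combined with per-cache caps (the cache names are placeholders): a cache that declares its own maxBytesLocalHeap is capped individually and deducted from the pool, while the remaining caches share what is left of the 256M pool.

<ehcache maxBytesLocalHeap="256M">
    <!-- this cache gets its own 64M slice, deducted from the 256M pool -->
    <cache name="boundedCache" maxBytesLocalHeap="64M"/>
    <!-- caches without explicit sizing share the remainder of the pool -->
    <cache name="pooledCache"/>
</ehcache>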
