Read Cache Data from File system or diskpath - caching

If overflowToDisk is enabled and a disk path is configured, and data is not found in memory, will the cache automatically search the disk path?
Refer to the configuration below.
When does overflowToDisk get activated in Ehcache?
My case:
1) Cache warm-up from the DB before application start
2) Load data from the DB with a loader implementation
3) Initially the DB has 2000 entries, so we have 1000 in memory (ABC_007) and the remaining 1000 on disk.
Is this correct?
<cache name="ABC_007"
maxElementsInMemory="1000"
maxElementsOnDisk="10000"
overflowToDisk="true"
timeToIdleSeconds="..."
timeToLiveSeconds="...">
</cache>
If I search for data which is not in ABC_007's memory store, it will be retrieved from the disk path. Am I right on this one?
Now, if I implement cache read-through functionality, that is, if the data is not available in the cache (including the disk path), I should search in the DB.
Now I find the data. Does it repopulate the cache?
If ABC_007 still holds 1000 elements, where will it be stored: in ABC_007's memory or on disk?
Please correct my understanding.
For example, refer to the sample code:
Cache cache = manager.getCache("ABC_007");
for (int i = 0; i < 2000; i++) {
    String key = "keyInCache" + i;
    cache.put(new Element(key, "value1"));
}
Now, when I cross 1000 elements, then as per the configuration above, elements 1001 to 2000 will be stored on disk.
<cache name="ABC_007"
maxElementsInMemory="1000"
maxElementsOnDisk="10000"
overflowToDisk="true"
timeToIdleSeconds="..."
timeToLiveSeconds="...">
Am I right?
Now I want the value for the key keyInCache1700:
element = cache.get("keyInCache1700");
From where will I get the value?
My understanding: as the ABC_007 cache has maxElementsInMemory="1000", it can store up to 1000 key-value pairs in memory, and the value for the key keyInCache1700 will be retrieved from the disk.
Am I correct?

The answer depends on your version of Ehcache.
As of Ehcache 2.6, the storage model is no longer an overflow one but a tiered one.
In the tiered storage model, all data will always be present in the lowest tier.
Items will be present in the higher tiers based on their hotness.
Possible tiers for open source Ehcache are:
On-heap, that is, on the JVM heap (the highest tier)
On-disk, which is the lowest tier
By definition, higher tiers have lower latency but less capacity than lower tiers.
So for a cache configured with overflowToDisk, all the data will always be inside the disk tier. It will store the key in memory and the data on disk.
When looking for an entry inside the cache, the tiers are considered from highest to lowest.
In your example, the data will be retrieved as follows:
1) Search in memory; if found, return it.
2) Search on disk; if found, add it to the memory tier (hot data) and return it. This can cause another entry to be evicted from memory.
3) Use your cache loader to retrieve it from the DB; when found, add it to the cache and return it.
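As an illustration of that last step, Ehcache 2.x ships a cache decorator that performs the load on a miss. A minimal sketch using SelfPopulatingCache, where loadFromDatabase is a hypothetical DAO call standing in for your DB query:

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;
import net.sf.ehcache.constructs.blocking.CacheEntryFactory;
import net.sf.ehcache.constructs.blocking.SelfPopulatingCache;

Cache cache = CacheManager.getInstance().getCache("ABC_007");
// Decorate the cache so that misses are filled by the factory below.
SelfPopulatingCache readThrough = new SelfPopulatingCache(cache, new CacheEntryFactory() {
    public Object createEntry(Object key) throws Exception {
        return loadFromDatabase((String) key); // hypothetical DB lookup
    }
});
// Consults the memory tier, then the disk tier, then calls createEntry().
Element element = readThrough.get("keyInCache1700");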

I'm just going to outline/summarize my rough idea of how EHCache works:
It's essentially a HashMap.
It keeps the HashMap of keys in memory at all times.
The actual content of items can be stored either in memory, or (when there are enough items) overflow to disk.
Conclusion: My understanding is that EHCache knows what keys it has cached, and where the items are currently stored. This is a basic necessity for a cache to retrieve items quickly.
If an item is unknown to EHCache, I wouldn't expect it to go looking on disk for it.
You should definitely implement "read-thru to DB" logic around your use of the cache. Items not found in the cache must obviously be read from the DB. Adding them to the cache at that time would be expected to put them in memory, as they're currently hot (recently used).
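A minimal sketch of that read-through logic against the Ehcache 2.x API; loadFromDatabase is a hypothetical DB call you would replace with your own query:

import net.sf.ehcache.Cache;
import net.sf.ehcache.Element;

String getWithReadThrough(Cache cache, String key) {
    Element element = cache.get(key); // consults memory first, then disk
    if (element != null) {
        return (String) element.getObjectValue();
    }
    String value = loadFromDatabase(key); // hypothetical DB lookup on a miss
    if (value != null) {
        cache.put(new Element(key, value)); // re-enters the cache as hot data
    }
    return value;
}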

Related

Redis cache tags flush memory leak

We are using Laravel's Redis tagged cache to cache query results for models in this way:
cache()->tags([model1, model2...])->remember(sql_query_hash, result_callback)
If, for example, after some time and a user peak one of the tags (say model1) has 500k unique cached queries, and an update comes in so we need to do:
cache()->tags([model1])->flush();
my job exhausts its allowed memory; our workers have 128MB. Yes, I know that if I increased the workers' memory I could flush 1M keys and so on, but that is not the right way: our user base is growing exponentially and the project will grow, so some tags may hold 10M keys at a peak. How then am I supposed to flush the cache for a tag?
https://github.com/laravel/framework/blob/5.7/src/Illuminate/Cache/RedisTaggedCache.php#L174
This is how Laravel flushes a tag's keys: it retrieves them all into memory, then chunks them in memory again (this array_chunk doubles the memory usage after fetching all the keys), and then issues a Redis::del operation to remove the cached keys for the tag.
I don't know whether to call this a bug or not, but I need some options. Is anyone else dealing with this problem too, and does anyone have solutions?
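One memory-bounded alternative is to iterate the tag's key set with SSCAN and delete in fixed-size chunks, never materializing the full key list. A rough sketch, written in Java with the Jedis client purely for illustration; the set name tag:model1:keys is a hypothetical stand-in for wherever your store keeps the tag's members:

import java.util.List;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.ScanParams;
import redis.clients.jedis.ScanResult;

// Deletes every key referenced by a tag set in chunks of 500,
// so memory use stays bounded no matter how large the tag grows.
void flushTag(Jedis jedis, String tagSetKey) {
    String cursor = ScanParams.SCAN_POINTER_START;
    ScanParams params = new ScanParams().count(500);
    do {
        ScanResult<String> page = jedis.sscan(tagSetKey, cursor, params);
        List<String> chunk = page.getResult();
        if (!chunk.isEmpty()) {
            jedis.del(chunk.toArray(new String[0])); // delete only this chunk
        }
        cursor = page.getCursor();
    } while (!"0".equals(cursor));
    jedis.del(tagSetKey); // finally drop the tag set itself
}

// e.g. flushTag(new Jedis("localhost"), "tag:model1:keys");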

How Ignite stores the value on heap

I am thinking of using the data grid feature that Ignite provides, but I am not clear about one aspect of Apache Ignite.
I want to know: whenever I put an object into an Ignite cache (which stores entries only on heap, as off-heap is not enabled), does Ignite serialize the object and store that on the heap, or does it store the object as-is?
If I access the stored value from a process (using IgniteCache#get) running on the same JVM (on whose heap the Ignite value is stored), will Ignite first deserialize the value and then hand it to my process?
If the answer is yes, then I would like to know whether there is a workaround by which I can bypass the serialization overhead and improve the performance of my cache gets.
Ignite stores values in binary format, and by default the value is deserialized each time you read it.
If you set CacheConfiguration.copyOnRead to false, the deserialized value will be stored along with the binary form. This can increase read performance, but will also increase memory consumption. You should also avoid mutating objects read this way.
Another option to avoid deserialization is to use the withKeepBinary() flag. When it is set, the cache returns a BinaryObject instance instead of a deserialized object. Refer to this page for more details: https://apacheignite.readme.io/docs/binary-marshaller
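A minimal sketch of the withKeepBinary() approach; the cache name and the Person type are hypothetical:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.binary.BinaryObject;

public class KeepBinaryDemo {
    static class Person {
        String name;
        int age;
        Person(String name, int age) { this.name = name; this.age = age; }
    }

    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            IgniteCache<Integer, Person> cache = ignite.getOrCreateCache("person-cache");
            cache.put(1, new Person("Alice", 30));

            // withKeepBinary() returns a view that hands back BinaryObject
            // instances, skipping deserialization of the whole value.
            IgniteCache<Integer, BinaryObject> binaryCache = cache.withKeepBinary();
            BinaryObject bo = binaryCache.get(1);
            String name = bo.field("name"); // read a single field from the binary form
            System.out.println(name);
        }
    }
}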

Couchbase: Is it possible to have stale cold cache?

We are considering using Couchbase as a persistent cache layer. Since Couchbase writes cache items to memory first and syncs them to disk asynchronously, one concern we have is crash consistency. If some cache items were updated in memory and Couchbase crashed before committing them to disk, those items will be stale when Couchbase restarts.
My questions are:
Will Couchbase detect and report that those items are stale? If so, we can just discard them, since they are cache.
Is there any other Couchbase-specific way to deal with the stale cache problem?
I don't think there would be a way to detect whether a document is stale, since (in your scenario) it wasn't written to disk before the crash.
However, you can specify durability requirements when creating a document. By default, a write is considered successful once it makes it into memory. You can add additional constraints like "PersistTo" (the document must be persisted on N nodes before the write is considered successful).
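A minimal sketch of that constraint with the Couchbase Java SDK 2.x; the bucket and document names are hypothetical:

import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.PersistTo;
import com.couchbase.client.java.document.JsonDocument;
import com.couchbase.client.java.document.json.JsonObject;

Bucket bucket = CouchbaseCluster.create("localhost").openBucket("cache-bucket");
JsonDocument doc = JsonDocument.create("item::42",
        JsonObject.create().put("value", "fresh"));
// Blocks until the document has been persisted to disk on at least one
// node, instead of returning as soon as it is accepted into memory.
bucket.upsert(doc, PersistTo.ONE);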

Couchbase - Order of saving documents in memory and on disk

Does Couchbase store documents in memory first before moving the data to the file store? Is there any configuration available to specify how long the data stays in memory before it is flushed to the file store?
Couchbase's architecture is memory-first/cache-through.
You can't decide whether or not to use memory, and Couchbase writes the data to disk as soon as possible.
Part of that is that you need to have enough memory for the amount of data you have.
You do have some policies, like full or value eviction, but again you don't have that level of control.
But what you can do is, in the SDK, wait until the data is replicated/persisted to disk.
Couchbase stores data both on disk and in RAM. The default behavior is to write the document to disk at some arbitrary time (usually quickly) after storing it in RAM. This leaves a short window where node failure can result in loss of data. I can't find anything in the documentation for the current version of Couchbase, but it used to be that you could request that the "set" method only complete once the data had been persisted to disk (the default is RAM only).
In any case, after writing to RAM, the document will eventually be written to disk. Couchbase keeps a disk-write queue, which you can check on the metrics report page in the management console. Now, Couchbase does synchronize writes across the cluster, and I believe a write will be synchronized across the cluster before Couchbase acknowledges that it happened (i.e. before the write method returns to the caller). Again, this is hard to determine from the documentation, as prior versions' documentation was much more detailed.
If you have more documents than available RAM, only the most frequently accessed documents will be stored in RAM for quick retrieval, with all others being "evicted" to disk.

Store objects using ehcache +Spring

My application has internationalization for all tables, so every table has a companion table for different language support, keyed by a language code like 'en-us'. Every time a page hits the DB in order to render, the application gets slow, so we implemented it by extending the AbstractMessageSource class. I referred to the link http://forum.springsource.org/showthread.php?t=15223, but based on this approach all the messages are stored in memory; if the table size or number of tables grows, this message hash also grows, and a memory problem follows. So we have planned to keep it on disk using Ehcache. Please provide me a sample, and let me know whether this is a valid option for storing the objects.
Change the Map entries in DataSourceMessageSource into Ehcache caches:
1) the cache holding already generated MessageFormats per message code and Locale
2) the cache holding all messages (for all basenames) per locale
That will get you going. You also need an ehcache.xml with cache entries for each of the above. You should specify overflowToDisk=true.
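A minimal sketch of such an ehcache.xml; the cache names and sizes are placeholder assumptions:

<ehcache>
    <diskStore path="java.io.tmpdir"/>
    <cache name="messageFormats"
           maxElementsInMemory="1000"
           eternal="false"
           overflowToDisk="true"
           timeToLiveSeconds="3600"/>
    <cache name="messagesByLocale"
           maxElementsInMemory="1000"
           eternal="false"
           overflowToDisk="true"
           timeToLiveSeconds="3600"/>
</ehcache>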
Note that you will incur a deserialization cost. If you are seeing a high CPU cost doing that, it might be worth restructuring the code to return specifically what you want rather than a map.
Greg Luck
