How Ignite stores the value on heap - caching

I am considering the Data Grid feature that Ignite provides, but I am not clear about one aspect of Apache Ignite.
I want to know: whenever I put an object into an Ignite cache (which stores entries only on heap, as off-heap is not enabled), does Ignite serialize the object and store it on the heap, or does it store the object as-is?
If I access the stored value from a process (using IgniteCache#get) running on the same JVM (whose heap the value is stored in), will Ignite first deserialize the value and then hand it to my process?
If the answer is yes, is there a workaround to bypass the serialization overhead and improve the performance of my cache gets?

Ignite stores values in binary format, and by default the value is deserialized each time you read it.
If you set CacheConfiguration.copyOnRead to false, the deserialized value will be cached along with the binary form. This can increase read performance, but will also increase memory consumption. You should also avoid mutating objects read this way.
Another option to avoid deserialization is to use the withKeepBinary() flag. When it is set, the cache will return a BinaryObject instance instead of the deserialized object. Refer to this page for more details: https://apacheignite.readme.io/docs/binary-marshaller
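For illustration, a minimal sketch of the withKeepBinary() approach; the cache name "persons" and the Person class are assumptions:

IgniteCache<Integer, Person> cache = ignite.cache("persons");

// Same cache, but get() now returns the stored binary form
// instead of deserializing the whole value.
IgniteCache<Integer, BinaryObject> binaryCache = cache.withKeepBinary();

BinaryObject bo = binaryCache.get(1);

// Individual fields can be read straight from the binary object,
// so the full Person is never deserialized.
String name = bo.field("name");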

Related

How to write directly into a List value in an Ignite cache of type IgniteCache&lt;Integer, List&lt;Integer&gt;&gt;

I have an Ignite cache of the form IgniteCache<Integer, List<Integer>>, and I need to add a value to the list inside the cache directly. But when I try to add a value using cache.get(key).add(value), the value is not added to the list inside the cache.
Is there any way to add a value directly to the List inside the cache, without storing it in a separate instance and then putting that instance back into the cache?
No, you have to modify the object and save it back using the put method.
You can save some time on deserialization using withKeepBinary, but that doesn't make sense for Integers.
Consider translating your collection into a SQL model if possible.
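A minimal sketch of that read-modify-write pattern; the cache name "lists" is an assumption:

IgniteCache<Integer, List<Integer>> cache = ignite.cache("lists");

int key = 1;

// get() returns a deserialized copy of the stored list, so mutating it
// does not change the cache entry; the modified list must be put() back.
List<Integer> list = cache.get(key);
if (list == null)
    list = new ArrayList<>();
list.add(42);
cache.put(key, list);

If concurrent updates are a concern, IgniteCache#invoke with an EntryProcessor performs the read-modify-write atomically on the node that owns the entry.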

Data health check tool

I want to perform data health checks on a huge volume of data, which can be either in an RDBMS or in cloud file storage like Amazon S3. Which tool would be appropriate for performing a data health check that can give me the number of rows, rows not matching a given schema for data-type validation, average volume for a given time period, etc.?
I do not want to use a big-data platform like Qubole or Databricks because of the extra cost involved. I found Drools, which can perform similar operations, but it would need to read the full data set into memory and associate it with a POJO before validation. Alternatives where I do not have to load the full data into memory would be appreciated.
You can avoid loading the full data set into memory by using Drools' StatelessKieSession. A StatelessKieSession works only on the current event: it does not maintain state between events and does not keep objects in memory. Read more about StatelessKieSession here.
Alternatively, you can use a stateful KieSession and give events an expiry using the @expires declaration, which expires an event after the specified time. Read more about @expires here.
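A minimal sketch of this approach, streaming records through a StatelessKieSession one at a time so only the current record is held in memory. The session name "validation-session" and the Row POJO (with its parse method) are assumptions, not Drools API:

// Uses org.kie.api.KieServices, org.kie.api.runtime.StatelessKieSession,
// java.nio.file.*, java.util.stream.Stream.
KieServices ks = KieServices.Factory.get();
StatelessKieSession session =
    ks.getKieClasspathContainer().newStatelessKieSession("validation-session");

try (Stream<String> lines = Files.lines(Paths.get("data.csv"))) {
    lines.map(Row::parse)            // Row.parse is an assumed mapper from a CSV line to a POJO
         .forEach(session::execute); // evaluates all rules against one record at a time
} catch (IOException e) {
    throw new UncheckedIOException(e);
}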

Couchbase - Order of saving documents in memory and on disk

Does Couchbase store documents in memory first before moving the data to the file store? Is there any configuration available to specify how long the data has to be stored in memory before it can be flushed to the file store?
Couchbase's architecture is memory-first / cache-through.
You can't decide whether or not memory is used; Couchbase writes the data to disk as soon as possible.
Part of that is that you need to have enough memory for the amount of data you have.
You do have some policies, like full or value eviction, but again you don't control the timing.
What you can do is, in the SDK, wait until the data is replicated/persisted to disk.
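A minimal sketch with the Couchbase Java SDK 2.x durability options; the bucket name and document contents are placeholders:

Cluster cluster = CouchbaseCluster.create("localhost");
Bucket bucket = cluster.openBucket("default");

JsonDocument doc = JsonDocument.create("user::1",
    JsonObject.create().put("name", "Ann"));

// Blocks until the mutation is persisted to disk on the active node
// and replicated to at least one replica.
bucket.upsert(doc, PersistTo.MASTER, ReplicateTo.ONE);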
Couchbase stores data both on disk and in RAM. The default behavior is to write the document to disk at some arbitrary time (usually quickly) after storing it in RAM. This leaves a short window where node failure can result in loss of data. I can't find anything in the documentation for the current version of Couchbase, but it used to be that you could request that the "set" method only complete once the data had been persisted to disk (the default is RAM only).
In any case, after writing to RAM, the document will eventually be written to disk. Couchbase keeps a disk-write queue, which you can check on the metrics report page in the management console. Now, Couchbase does synchronize writes across the cluster, and I believe a write will be synchronized across the cluster before Couchbase acknowledges that the write happened (e.g. before the write method returns to the caller). Again, this is hard to pin down in the documentation, as prior versions' documentation was much more detailed.
If you have more documents than available RAM, only the most-frequently accessed documents will be stored in RAM for quick retrieval, with all others being "evicted" to disk.

Read Cache Data from File system or diskpath

If overflowToDisk is enabled and a disk path is configured, then if data is not found in memory, will it automatically be looked up on the disk path?
Refer to the configuration below.
When does overflowToDisk get activated in Ehcache?
My case:
1) The cache is warmed up from the DB before application start.
2) Data is loaded from the DB with a loader implementation.
3) Initially the DB has 2000 entries, so we have 1000 in memory (ABC_007) and the remaining 1000 on disk.
Is this correct?
<cache name="ABC_007"
       maxElementsInMemory="1000"
       maxElementsOnDisk="10000"
       overflowToDisk="true"
       timeToIdleSeconds="..."
       timeToLiveSeconds="...">
</cache>
If I search for data that is not in ABC_007's memory store, will it be retrieved from the disk path? Am I right on this one?
Now, suppose I implement cache read-through functionality, that is, if the data is not available in the cache (including the disk path), I search the DB.
If I then find the data, does it repopulate the cache?
If ABC_007 still holds 1000 elements, where will the data be stored: in memory or on disk?
Please correct my understanding.
For example, refer to the sample code:
Cache cache = manager.getCache("ABC_007");
Element element = null;
String key = null;
for (int i = 0; i < 2000; i++) {
    key = "keyInCache" + i;
    element = new Element(key, "value1");
    cache.put(element);
}
Now, when I cross 1000 elements, then as per the configuration above, elements 1001 to 2000 will be stored on disk.
Am I right?
Now I want the value for the key:
key = "keyInCache1700";
element = cache.get(key);
From where will I get the value?
My understanding: since the ABC_007 cache has maxElementsInMemory="1000", it can store up to 1000 key-value pairs in memory, and the value for keyInCache1700 will be retrieved from disk...
Am I correct?
The answer depends on your version of Ehcache.
As of Ehcache 2.6, the storage model is no longer an overflow one but a tiered one.
In the tiered storage model, all data will always be present in the lowest tier.
Items will be present in the higher tiers based on their hotness.
Possible tiers for open-source Ehcache are:
On-heap, that is, on the JVM heap
On-disk, which is the lowest one
By definition, higher tiers have lower latency but less capacity than lower tiers.
So for a cache configured with overflowToDisk, all the data will always be inside the disk tier; the keys are kept in memory and the data on disk.
When looking for an entry inside the cache, the tiers are considered from highest to lowest.
In your example, the data will be retrieved as follows:
1) Search in memory; if found, return it.
2) Search on disk; if found, promote the entry to the memory tier (hot data) and return it. This can cause another entry to be evicted from memory.
3) Use your cache loader to retrieve it from the DB; when found, add it to the cache and return it.
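A minimal sketch of that lookup flow around an Ehcache 2.x cache, reusing the cache and key from the sample above; loadFromDb() is a hypothetical DB lookup:

// get() transparently checks the memory tier first, then the disk tier.
Element element = cache.get(key);
if (element == null) {
    Object value = loadFromDb(key);    // hypothetical read-through to the DB
    element = new Element(key, value);
    cache.put(element);                // repopulates the cache; the entry lands in the memory tier
}
Object result = element.getObjectValue();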
I'm just going to outline/summarize my rough idea of how Ehcache works:
It's essentially a HashMap.
It keeps the map of keys in memory at all times.
The actual content of items can be stored either in memory or (when there are enough items) overflowed to disk.
Conclusion: my understanding is that Ehcache knows which keys it has cached and where the items are currently stored. This is a basic necessity for a cache to retrieve items quickly.
If an item is unknown to Ehcache, I wouldn't expect it to go looking on disk for it.
You should definitely implement read-through-to-DB logic around your use of the cache. Items not found in the cache must obviously be read from the DB. Adding them to the cache at that time would be expected to put them in memory, as they are currently hot (recently used).

Store objects using Ehcache + Spring

My application has internationalization for all tables, so every table has a companion table for different language support, keyed by language code such as 'en-us'. Hitting the DB every time a page is shown makes the application slow, so we implemented it by extending the AbstractMessageSource class. I referred to this link: http://forum.springsource.org/showthread.php?t=15223. But with that approach all the messages are stored in memory; if the table size or number of tables grows, the message hash also grows, and memory problems follow. So we have planned to keep it on disk using Ehcache. Please provide a sample, and let me know whether this is a valid option for storing the objects.
Change the Map entries in DataSourceMessageSource to:
/** Cache holding already generated MessageFormats per message code and Locale */
Map
/** All messages (for all basenames) per locale */
Map
That will get you going. You also need an ehcache.xml with cache entries for each of the above; you should specify overflowToDisk="true".
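A minimal sketch of those ehcache.xml entries; the cache names "messageFormats" and "messages" are placeholders for whatever you call the two maps:

<cache name="messageFormats"
       maxElementsInMemory="1000"
       overflowToDisk="true"
       eternal="true"/>
<cache name="messages"
       maxElementsInMemory="1000"
       overflowToDisk="true"
       eternal="true"/>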
Note that you will incur a deserialization cost. If you are seeing a high CPU cost from that, it might be worth restructuring the code to return specifically what you want rather than a whole map.
Greg Luck
