Does Custom CacheStore for Ignite automatically do write-behind?

I have implemented a custom CacheStore for Ignite to communicate with Elasticsearch. Now, if the Elasticsearch server is down for some time, will the data in the cache be uploaded to Elasticsearch once the ES server is back up?

Does Custom CacheStore for Ignite automatically do write-behind?
No. It is disabled by default: https://apacheignite.readme.io/docs/3rd-party-store#section-configuration
setWriteBehindEnabled(boolean) | Sets a flag indicating whether write-behind is enabled. | Default: false
Now, if the Elasticsearch server is down for some time, will the data in the cache be uploaded to Elasticsearch once the ES server is back up?
No. Ignite will not send that data again. This is also specified in the documentation:
Performance vs. Consistency
Enabling write-behind caching increases performance by performing
asynchronous updates, but this can lead to a potential drop in consistency as
some updates could be lost due to node failures or crashes.

If you enable writeBehind in your CacheConfiguration, Ignite will automatically add write-behind functionality to your Cache Store. It will wrap your Cache Store with GridCacheWriteBehindStore, which implements all the delaying and batching functionality.
So yes, Ignite will automatically do write-behind if you enable it in config.
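For illustration, here is a minimal sketch of such a configuration (the cache name, store class, and tuning values are assumptions, not taken from the question):

import javax.cache.configuration.FactoryBuilder;
import org.apache.ignite.configuration.CacheConfiguration;

CacheConfiguration<String, String> ccfg = new CacheConfiguration<>("esCache");

// Plug in the custom store; write-through must be on for the store to receive updates at all.
ccfg.setCacheStoreFactory(FactoryBuilder.factoryOf(MyElasticsearchCacheStore.class));
ccfg.setWriteThrough(true);

// Enable write-behind: updates are queued and flushed asynchronously in batches.
ccfg.setWriteBehindEnabled(true);
ccfg.setWriteBehindFlushSize(10240);     // further updates block once the queue reaches this size
ccfg.setWriteBehindFlushFrequency(5000); // flush interval in milliseconds
ccfg.setWriteBehindBatchSize(512);       // entries per batched store write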

There is a write-behind mode that doesn't write entries to the cache store as soon as they are written to Ignite, but postpones them for some time and then writes the accumulated entries in a batch. When the write-behind queue size reaches CacheConfiguration#writeBehindFlushSize, further operations are blocked until the queue is flushed. If the underlying database is unavailable for some time, write operations will be retried until they succeed.
Write-behind documentation: https://apacheignite.readme.io/docs/3rd-party-store#section-write-behind-caching
As opposed to this mode, there is a write-through mode, which propagates every operation to the cache store as soon as it is performed. If the cache store fails to save the entry, the operation itself is rolled back.
Write-through documentation: https://apacheignite.readme.io/docs/3rd-party-store#section-read-through-and-write-through

I think that since you are implementing a custom CacheStore, you need to handle this on your own.
At the same time, you may take a look at the Apache Ignite write-behind mechanism, which accumulates updates in a queue and sends them when required. In theory, if you have enough memory for a large queue, it can help you survive Elasticsearch server downtime. But bear in mind that with write-behind Ignite won't provide consistency guarantees.
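If you go that route, your store is the natural place for retry logic. A rough sketch of the shape of such a store; the EsClient class and its methods are hypothetical placeholders for whatever Elasticsearch client you actually use:

import javax.cache.Cache;
import javax.cache.integration.CacheLoaderException;
import javax.cache.integration.CacheWriterException;
import org.apache.ignite.cache.store.CacheStoreAdapter;

public class MyElasticsearchCacheStore extends CacheStoreAdapter<String, String> {
    // Hypothetical client, standing in for a real Elasticsearch client.
    private final EsClient es = new EsClient("http://localhost:9200");

    @Override
    public String load(String key) throws CacheLoaderException {
        return es.getDocument(key); // placeholder call
    }

    @Override
    public void write(Cache.Entry<? extends String, ? extends String> entry) throws CacheWriterException {
        // With write-behind enabled, throwing here makes Ignite retry the batch later.
        es.indexDocument(entry.getKey(), entry.getValue()); // placeholder call
    }

    @Override
    public void delete(Object key) throws CacheWriterException {
        es.deleteDocument(String.valueOf(key)); // placeholder call
    }
}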

Related

Difference between in-memory-store and managed-store in mule cache

What are the main differences between in-memory-store and managed-store in the Mule cache scope, and which gives the best performance?
What is the best way to configure caching in global scope?
We are currently using in-memory-store caching. We keep running into out-of-memory issues because we are using a server with limited hardware. We are using Mule 3.7.
Please provide your suggestions for configuring the cache in an optimized way.
We are also facing an issue with cache expiration with in-memory-store: cached data is not being expunged after the expiration time. But when we use "managed-store" it works as expected.
In-memory:
This stores the data in system memory. The data stored with in-memory-store is non-persistent, which means that in case of an API restart or crash the cached data will be lost.
Managed-store:
This stores the data in a place defined by a ListableObjectStore. The data stored with managed-store is persistent, which means that in case of an API restart or crash the cached data will not be lost.
Source (explained in detail with configuration difference):
http://www.tutorialsatoz.com/caching-in-mule-cache-scope/
One of my friends explained this difference to me as follows:
in-memory cache --> a temporary memory storage area where the data is stored. For example, when using a VM component in Mule, the data is stored in the VM as an in-memory queue.
managed-store --> we can store the data and use it in later stages, for example in an object store.
A cache mainly stores frequently used data; it reduces DB or HTTP calls by saving frequently used data or results in the cache scope.
But both are for temporary storage only, meaning they are valid for that particular session alone.
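For what it's worth, the same split shows up in Mule 3's Java API. A sketch of obtaining an in-memory versus a persistent (managed) store through the ObjectStoreManager; the store names are made up:

import org.mule.api.store.ObjectStore;
import org.mule.api.store.ObjectStoreManager;

// Sketch, assuming a Mule 3.x muleContext is available (e.g. injected into a component).
ObjectStoreManager osm = muleContext.getRegistry().lookupObject(ObjectStoreManager.class);

// Second argument: false = in-memory (lost on restart), true = persistent (survives restart).
ObjectStore<String> inMemory = osm.getObjectStore("myInMemoryStore", false);
ObjectStore<String> persistent = osm.getObjectStore("myManagedStore", true);

persistent.store("greeting", "hello"); // still there after a restart
inMemory.store("greeting", "hello");   // gone after a restart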

JBoss Data Grid library mode with multiple applications sharing caches - what is an efficient way?

What is the most efficient way of having Infinispan/JBoss Data Grid in library mode with several applications using the same caches?
I currently have JBoss Data Grid set up in library mode in EAP 6.3, with about 10 applications and 6 different caches configured.
Cache mode is Replication.
Each application has a cache manager which instantiates the caches that are required by the application. Each cache is used by at least 2 applications.
I hooked up hawtio and can see from JMX beans that multiple cache managers are created with duplicated cache instances.
From the logs, I see:
ISPN000094: Received new cluster view: [pcu-18926|10] (12) [pcu-18926, pcu-24741, pcu-57265, pcu-18397, pcu-26495, pcu-56892, pcu-59913, pcu-53108, pcu-34661, pcu-43165, pcu-32195, pcu-28641]
Is there a lot of overhead from the cache managers talking to each other all the time?
I eventually want to set up 4 cluster nodes with JBoss Data Grid in library mode, so how can I configure it so that all applications on one node share the same cache manager, hence reducing the noise?
I can't use JBoss Data Grid in server mode, which I am aware would fulfil my requirements.
Thanks for any advice.
First of all, I may misunderstand your setup: this log says that there are 12 'nodes'. How many servers do you actually use? If you use the cache to communicate between 10 applications on the same machine, it's a very suboptimal approach; you keep 10 copies of all data and use many RPCs to propagate writes between the caches. You should have a single local-mode cache and just retrieve a reference to it (probably through JNDI).
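For example, with the Infinispan subsystem in EAP you can define a single cache container and have every application look up the same instance through JNDI instead of instantiating its own cache manager. A sketch, where the container and cache names are made up:

import javax.annotation.Resource;
import org.infinispan.Cache;
import org.infinispan.manager.EmbeddedCacheManager;

public class SharedCacheClient {

    // Injects the container defined in the EAP/WildFly Infinispan subsystem;
    // "mycontainer" and "mycache" are illustrative names.
    @Resource(lookup = "java:jboss/infinispan/container/mycontainer")
    private EmbeddedCacheManager cacheManager;

    public Cache<String, Object> cache() {
        return cacheManager.getCache("mycache");
    }
}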
Cache managers don't talk to each other, and caches do so only when an operation is executed, or when a node is joining/leaving/crashing (then the caches have to rebalance).
It's the JGroups channel that maintains the cluster view and exchanges messages to detect whether the other nodes are alive, along with other synchronization messages, but these messages are sent only once every few seconds, so the overhead is very low.
On the other hand, each channel keeps several thread pools, and the cache manager has a thread pool as well, so there is some memory overhead. From a CPU point of view, there is a thread that iterates through the cache and purges expired entries (the task starts every minute), so even an idle cache full of entries consumes some cycles. If the cache is empty, the consumption is very low (there's not much to iterate through).

Enlisting an Infinispan Cache Store in a Cache Transaction?

I am using Infinispan 6.0.2 via the WildFly 8.2 subsystem. I have configured a transactional cache that uses a String-Based JDBC Cache Store to persist content placed in the Infinispan cache.
My concern, after reading the following in the Infinispan documentation, is that there is potential for the cache and cache store to become out of sync when putting/updating/removing multiple entries into the cache in the same transaction, due to the transaction committing/rolling back in the cache but only partially succeeding/failing in the cache store.
4.5. Cache Loaders and transactional caches
When a cache is transactional and a cache loader is present, the cache loader won't be enlisted in the transaction in which the cache is part. That means that it is possible to have inconsistencies at cache loader level: the transaction succeeds applying the in-memory state but (partially) fails applying the changes to the store. Manual recovery would not work with cache stores.
Could someone please clarify whether the above statement refers only to loading from a cache store, or whether it also refers to writing to a store?
If this is also the case when writing to a cache store, are there any recommended strategies/solutions for ensuring that a cache and its cache store remain in sync?
The driving factor behind this for me is that I am using Infinispan both for write-through and for overflow of business-critical data, and I need confidence that the cache store correctly represents the state of the data.
I have also asked this question on the Infinispan Forums
Many thanks in advance.
It applies to writes as well; a failure to write to the store does not affect the rest of the transaction.
The reason for this is that the actual persistence API is not transactional (edit: newer versions of Infinispan support transactional persistence, too). With two-phase commit (in the first phase, prepare, all locks are acquired; in the second, commit, the write is executed), the write to the store is executed in the second phase, so the failure cannot roll back changes on other nodes.
Although Infinispan is trying to get close to a strongly consistent in-memory database, given its guarantees it is still rather a cache. If you are more interested in the design limitations (some of them also theoretical limitations), I recommend reading the Infinispan wiki.
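To make the failure mode concrete, here is a sketch of a two-key transaction; the cache is assumed to be transactional with a JDBC store attached:

import javax.transaction.TransactionManager;
import org.infinispan.Cache;

void updateBoth(Cache<String, String> cache) throws Exception {
    TransactionManager tm = cache.getAdvancedCache().getTransactionManager();
    tm.begin();
    try {
        cache.put("k1", "v1");
        cache.put("k2", "v2");
        tm.commit(); // the in-memory writes commit or roll back together, but the
                     // corresponding store writes happen during the commit phase and
                     // are not enlisted, so they may (partially) fail without rolling
                     // the in-memory state back
    } catch (Exception e) {
        if (tm.getTransaction() != null) tm.rollback();
        throw e;
    }
}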

What happens when a new Ehcache CacheManager is created?

In my application I use Ehcache with several caches that are backed by a Terracotta server.
I noticed that there is a correlation between the size of the data saved on the server and the time it takes to create a cache manager instance in the client (the bigger the size, the longer it takes).
I couldn't find any info about what actually happens when the cache manager is created.
To my understanding, the data would only be pulled when it is actually requested and not when creating the manager, so what is the overhead?
Any thoughts or references to relevant reading would be much appreciated.
First of all, the CacheManager is not involved in any data pushing or pulling; it creates the caches, which contain the elements as name-value pairs and hold the data for put/get and other operations. The CacheManager handles the creation, access and removal of caches.
In fact, when you create a CacheManager whose caches participate in the Terracotta cluster, you might see a difference in the time it takes to load. The cache manager will establish a connection to the server specified in the config. Any pre-cache loaders, such as classes extending BootstrapCacheLoader, will affect the load time too. The cache consistency attribute of caches that participate in the cluster also has an impact on the load time. The Terracotta server by design will push the most-hit data to clients in order to reduce local cache misses, and will also push data for caches identified for pinning.
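In other words, the delay comes from connection setup and any bootstrap loading, not from the entry data itself, which is pulled lazily. A sketch of where each cost is incurred; the config path and cache name are made up:

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

// Connects to the Terracotta server named in ehcache.xml and runs any configured
// BootstrapCacheLoader; this is where the size-dependent delay shows up.
CacheManager manager = CacheManager.newInstance("ehcache.xml");

Cache cache = manager.getCache("myCache");
Element e = cache.get("someKey"); // entry data is pulled on access, not at manager creation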

Distributed, persistent cache using EHCache

I currently have a distributed cache using EHCache via RMI that works just fine. I was wondering if you can include persistence with the caches to create a distributed, persistent cache.
Alongside this, if the cache was persistent, would it load from the file store, then bootstrap from the cache cluster? Basically, what I want is:
Cache starts
Cache loads persistent objects from the file store
Cache joins the distributed cluster and bootstraps as normal
The use case behind this is having two identical components running on independent machines, distributing the cache to avoid losing data in the event that one of the components fails. The persistence would guard against losing all data on the rare occasion that both components fail.
Would moving to another distribution method (such as Terracotta) support this?
I would take a look at the write-through caching options in EHCache. As described in the link, combining a read-through and write-behind cache will provide persistence to a user-defined data store.
What Terracotta gives you is consistency (so you don't have to worry about resolving conflicts among cluster members). You have the option of defining an interface to your own store (through CacheLoader and CacheWriter), or of just letting Terracotta persist your data, but I have received mixed signals from Terracotta and its documentation on whether TC is appropriate for a system of record. If your data is transient and can be blown away at any time (like web sessions), it might be OK.
Add a bootstrapCacheLoaderFactory element along with a cacheEventListenerFactory to the cache (which needs to be bootstrapped from the other nodes when it comes back up, and replicates to the other nodes when it gets any updates). For example, with the RMI replication used here (the cache name and the factory properties are illustrative):

<cache name="myCache"
       maxElementsInMemory="10000"
       memoryStoreEvictionPolicy="LFU"
       diskPersistent="true"
       timeToLiveSeconds="86400"
       maxElementsOnDisk="1000">
    <cacheEventListenerFactory
        class="net.sf.ehcache.distribution.RMICacheReplicatorFactory"/>
    <bootstrapCacheLoaderFactory
        class="net.sf.ehcache.distribution.RMIBootstrapCacheLoaderFactory"
        properties="bootstrapAsynchronously=true"/>
</cache>
