I have a Spring Boot project that uses Infinispan in invalidation mode as a clustered cache.
The question is about Infinispan.
I read in the official documentation that "In invalidation, the caches on different nodes do not actually share any data", and that is the situation I am in now.
I use the provided method Cache.putForExternalRead(key, value), which solves the problem that when I put data into the cache on node A, node B invalidates it. But then I can't use the Spring Boot annotations, such as @Cacheable.
I also read "Invalidation mode can be used with a shared cache store." in the documentation, but I don't know how to do this and would appreciate some help.
My goal is that, in invalidation mode, when I put data into the cache on node A, node B receives a copy of that data from A. Can I do this with invalidation mode?
I tried invalidation mode with a ClusterLoader enabled, but there is a risk of getting an old value when a node fetches data from other nodes.
I use replicated mode now. However, "replication practically only performs well in small clusters (under 10 nodes)" and "Asynchronous replication is not recommended", so I can only use synchronous replication.
Which performs better: invalidation or synchronous replication?
Looking forward to your help. Thanks.
Spring annotations won't fully support INVALIDATION mode unless you use a ClusterLoader. Under the hood, the annotations use put. We might consider adding a feature to support putForExternalRead behavior in the future, but it won't be there very soon.
Annotations work well with LOCAL, REPL and DIST modes.
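import org.infinispan.configuration.cache.ConfigurationBuilder;

// On a local miss, ask the other cluster nodes for the entry (timeout in milliseconds).
ConfigurationBuilder b = new ConfigurationBuilder();
b.persistence()
    .addClusterLoader()
    .remoteCallTimeout(500);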
If you are afraid about getting stale values and not being performant enough with a replicated cache, you might consider using a distributed cache.
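To make the difference between put and putForExternalRead in invalidation mode concrete, here is a toy model in plain Java. It is not Infinispan code: the class InvalidationNode and its methods are hypothetical stand-ins, and the model only assumes what is described above, namely that put tells the other nodes to drop their copy while putForExternalRead stores locally without sending an invalidation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of invalidation mode; NOT Infinispan code. All names are hypothetical.
class InvalidationNode {
    private final Map<String, String> local = new HashMap<>();
    private final List<InvalidationNode> peers = new ArrayList<>();

    // Connects two nodes so that each one sees the other as a peer.
    void join(InvalidationNode other) {
        peers.add(other);
        other.peers.add(this);
    }

    // Models Cache.put: store locally, then tell every peer to drop its copy.
    void put(String key, String value) {
        local.put(key, value);
        for (InvalidationNode peer : peers) {
            peer.local.remove(key);
        }
    }

    // Models Cache.putForExternalRead: store locally if absent, send nothing to peers.
    void putForExternalRead(String key, String value) {
        local.putIfAbsent(key, value);
    }

    String get(String key) {
        return local.get(key);
    }
}
```

In this model, two nodes that each call putForExternalRead keep their own copies, while a later put on one node empties the entry on the other. That is why annotation-driven caching, which uses put, keeps evicting the entry on the other nodes.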
I am using the Titan/DynamoDB library to use AWS DynamoDB as a backend for my Titan DB graphs. My app is very read-heavy and I noticed Titan is mostly executing query requests against DynamoDB. I am using transaction- and instance-local caches and indexes to reduce my DynamoDB read units and the overall latency. I would like to introduce a cache layer that is consistent for all my EC2 instances: A read/write-through cache between DynamoDB and my application to store query results, vertices, and edges.
I see two solutions to this:
Implicit caching done directly by the Titan/DynamoDB library. Classes like the ParallelScanner could be changed to read from AWS ElastiCache first. The change would have to be applied to read & write operations to ensure consistency.
Explicit caching done by the application before even invoking the Titan/Gremlin API.
The first option seems to be the more fine-grained, cross-cutting, and generic.
Does something like this already exist? Maybe for other storage backends?
Is there a reason why this does not exist already? Graph DB applications seem to be very read-intensive, so cross-instance caching seems like a pretty significant feature to speed up queries.
First, ParallelScanner is not the only thing you would need to change. Most importantly, all the changes you need to make are in DynamoDBDelegate (that is the only class that makes low level DynamoDB API calls).
Regarding implicit caching, you could add a caching layer on top of DynamoDB. For example, you could implement a cache using API Gateway on top of DynamoDB, or you could use ElastiCache. Either way, you need to figure out a way to invalidate Query/Scan pages. Inserting or deleting items will cause page boundaries to change, so it requires some thought.
Explicit caching may be easier to do than implicit caching. The level of abstraction is higher, so based on your incoming writes it may be easier for you to decide at the application level whether a traversal that is cached needs to be invalidated. If you treat your graph application as another service, you could cache the results at the service level.
Something in between may also be possible (but requires some work). You could continue to use your vertex/database caches as provided by Titan, and use a low value for TTL that is consistent with how frequently you write columns. Or, you could take your caching approach a step further and do the following.
Enable DynamoDB Stream on edgestore.
Use a Lambda function to stream the edgestore updates to a Kinesis Stream.
Consume the Kinesis Stream with edgestore updates in the same JVM as the Gremlin Server on each of your Gremlin Server instances. You would need to instrument the database level cache in Titan to consume the Kinesis stream and invalidate the cached columns as appropriate, in each Titan instance.
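The two ideas above (a short TTL, plus invalidation driven by an update stream) can be sketched in plain Java. This is only an illustration under stated assumptions: StreamInvalidatedCache, onStoreUpdate, the loader function, and the injected clock are all hypothetical names, and nothing here touches Titan, DynamoDB Streams, or Kinesis. The stream consumer is assumed to call onStoreUpdate once per changed key.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical read-through cache; not part of Titan or the DynamoDB backend.
class StreamInvalidatedCache {
    private static final class Entry {
        final String value;
        final long expiresAtMillis;

        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final Function<String, String> loader; // stands in for a backend read
    private final long ttlMillis;
    private final LongSupplier clock;              // injected so expiry is testable

    StreamInvalidatedCache(Function<String, String> loader, long ttlMillis, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Returns the cached value, reloading it when it is missing or past its TTL.
    String get(String key) {
        long now = clock.getAsLong();
        Entry entry = entries.get(key);
        if (entry == null || entry.expiresAtMillis <= now) {
            entry = new Entry(loader.apply(key), now + ttlMillis);
            entries.put(key, entry);
        }
        return entry.value;
    }

    // Assumed to be called for every update record consumed from the stream.
    void onStoreUpdate(String key) {
        entries.remove(key);
    }
}
```

The TTL bounds staleness if an update record is lost or delayed, while the stream-driven removal keeps the common case fresh without waiting for expiry.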
Wouldn't it make sense to put an on-heap cache (e.g. a Guava cache) in front of a persistent off-heap Chronicle cache? That is, use the Chronicle cache when you get a Guava cache miss?
Thanks
When a Chronicle-Map is updated either in process or via replication, you will be called by the MapEventListener which you define when you build the map.
It is a known issue that you don't get events triggered when another process updates the map, although you will be able to get the updated value.
Note: if the cost of deserialization is high, there is often something you can do to reduce it, such as using BytesMarshallable or a generated DataValue reference to use the data in place, i.e. without deserializing it.
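The layering asked about in the question can be sketched with standard-library stand-ins. This is a rough illustration, not Chronicle Map or Guava code: TwoTierCache is a hypothetical class, a LinkedHashMap in access order plays the role of the on-heap cache, and any Map plays the role of the off-heap store. The onBackingUpdate hook is assumed to be called from whatever update listener the backing map offers. Given the known issue above, updates made by another process would not trigger it, so the on-heap tier could serve a stale value in that case.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical two-tier cache: a small on-heap LRU in front of a larger backing map.
class TwoTierCache<K, V> {
    private final Map<K, V> backing;          // stand-in for the off-heap map
    private final LinkedHashMap<K, V> front;  // stand-in for the on-heap cache

    TwoTierCache(Map<K, V> backing, final int frontCapacity) {
        this.backing = backing;
        // accessOrder = true makes the LinkedHashMap evict in least-recently-used order.
        this.front = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > frontCapacity;
            }
        };
    }

    // Reads the front tier first and falls through to the backing map on a miss.
    synchronized V get(K key) {
        V value = front.get(key);
        if (value == null) {
            value = backing.get(key);
            if (value != null) {
                front.put(key, value);
            }
        }
        return value;
    }

    // Writes through to both tiers.
    synchronized void put(K key, V value) {
        backing.put(key, value);
        front.put(key, value);
    }

    // Drops the on-heap copy; meant to be called when the backing map reports an update.
    synchronized void onBackingUpdate(K key) {
        front.remove(key);
    }

    synchronized boolean inFront(K key) {
        return front.containsKey(key);
    }
}
```

Whether this pays off depends on how expensive deserialization from the backing map is. If the data can be used in place, as the note above suggests, the extra tier may not be worth the staleness risk.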
I'm using Infinispan 6.0.0 in a 3-node setup (distributed caching with 2 replicas for each entry, no writes into a persistent store). I'm reading a file line by line and storing each line's contents into the cache. The speed seems a bit low to me (I can achieve more writes onto the SSD (persistent storage) than into RAM with Infinispan), but there isn't any obvious bottleneck in the test code (I'm using buffered input streams, and their limits certainly aren't reached). For now, I'm able to write 100K entries every ~45 seconds, and that doesn't satisfy me. Assume this simplified code snippet:
while ((s = reader.readLine()) != null) {
    cache.put(s.substring(0, 2), s.substring(2, 5));
}
And CacheManager is created as follows:
return new DefaultCacheManager(
    GlobalConfigurationBuilder.defaultClusteredBuilder()
        .transport().addProperty("configurationFile", "jgroups.xml").build(),
    new ConfigurationBuilder()
        .clustering().cacheMode(CacheMode.DIST_ASYNC).hash().numOwners(2)
        .transaction().transactionMode(TransactionMode.TRANSACTIONAL).lockingMode(LockingMode.OPTIMISTIC)
        .build());
What could I be possibly doing wrong?
I am not fully aware of all the peculiarities of asynchronous mode, but I'm afraid that something in the two-phase commit (Prepare and Commit) might force a blocking RPC, which means waiting for network latency and therefore a slowdown.
Do you need transactional behaviour? If not, switch them off. If you really need it, you may disable just the autocommit feature and load the cluster via non-transactional operations. Or, you may try one phase commits.
Another option could be mass loading via putAll (with tens or hundreds of entries, depends on your entry size), but routing of this message is not really smart. In transactional mode it could behave a bit better, I guess.
The last option if you just want to load the cluster fast and then operate on it could be transferring the bulk data to each node without Infinispan (using your own JGroups channel, or just with sockets), and loading all nodes with the CACHE_MODE_LOCAL flag.
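The putAll option mentioned above could look roughly like the following sketch. BatchLoader and its batchSize parameter are hypothetical, and the target is typed as a plain Map so the example runs without Infinispan. An Infinispan Cache implements the Map interface, so it could presumably be passed in directly. The key/value split is copied from the question's loop.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: loads lines into a map in putAll batches instead of single puts.
class BatchLoader {
    // Returns the number of lines read. Each line must have at least 5 characters.
    static int load(BufferedReader reader, Map<String, String> target, int batchSize) {
        Map<String, String> batch = new HashMap<>();
        int lines = 0;
        try {
            String s;
            while ((s = reader.readLine()) != null) {
                // Same key/value split as in the question.
                batch.put(s.substring(0, 2), s.substring(2, 5));
                lines++;
                if (batch.size() >= batchSize) {
                    target.putAll(batch); // one bulk write instead of batchSize single writes
                    batch = new HashMap<>();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (!batch.isEmpty()) {
            target.putAll(batch); // flush the final partial batch
        }
        return lines;
    }
}
```

A batch size in the tens or hundreds, as suggested above, is a starting point to measure against rather than a recommendation.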
By default Infinispan follows the Map.put() contract of returning the previous value, so even though you are using the DIST_ASYNC cache mode you're still implicitly performing a synchronous cache.get() for every put.
You can avoid this in two ways:
configurationBuilder.unsafe().unreliableReturnValues(true) will suppress the remote lookup for all the operations on the cache.
cache.getAdvancedCache().withFlags(Flag.IGNORE_RETURN_VALUES).put(k, v) will suppress the remote lookup for a single operation.
We are using Infinispan, and in our system we have a big object to which we have to push small changes per transaction. I have implemented the DeltaAware interface for this object, and also the Delta. The problem I am facing is that the changes are not propagated to other nodes; only the initial object state is propagated. Also, the delta and commit methods are not called on the big object that implements DeltaAware. Do I need to register this object somewhere, other than simply putting it in the cache?
Thanks
It's probably better if you simply use an AtomicHashMap, which is a construction within Infinispan. This allows you to group a series of key/value pairs as a single value. Infinispan can detect changes in this AtomicHashMap because it implements the DeltaAware interface. AHM is a higher level construct than DeltaAware, and one that probably suits you better.
To give you an example where AtomicHashMaps are used, they're heavily used by JBoss AS7 HTTP session replication, where each session id is mapped to an AtomicHashMap. This means that we can detect when individual session data changes and only replicate that.
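The idea of replicating only what changed can be shown with a toy model in plain Java. This is not Infinispan's DeltaAware, Delta, or AtomicHashMap API: DeltaTrackingMap, takeDelta, and applyDelta are hypothetical names used only to illustrate the principle of shipping a change set instead of the whole value.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of delta replication; it does not use Infinispan's classes.
class DeltaTrackingMap {
    private final Map<String, String> state = new HashMap<>();
    private final Map<String, String> pending = new HashMap<>();

    // Updates the local state and records the change for the next delta.
    void put(String key, String value) {
        state.put(key, value);
        pending.put(key, value);
    }

    // Returns the changes since the last call and resets the change log.
    Map<String, String> takeDelta() {
        Map<String, String> delta = new HashMap<>(pending);
        pending.clear();
        return delta;
    }

    // On the receiving node: merge the delta instead of replacing the whole object.
    void applyDelta(Map<String, String> delta) {
        state.putAll(delta);
    }

    String get(String key) {
        return state.get(key);
    }
}
```

After an initial sync, changing one key out of many produces a delta with a single entry, which is the saving the session replication example relies on.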
Cheers,
Galder
I'm new to MongoDB. I created a Java app that uses MongoDB as its database.
I configured 3 servers in a replica set.
My pseudocode:
{
createUser
getUser
updateUser
}
Here createUser creates the user successfully, but getUser sometimes fails to return that user.
When I analysed it, the cause turned out to be data replication latency.
How can I overcome this issue?
Is there any way to replicate data immediately when it is created?
Is there any other way to get the user without failure?
Thanks in advance!
If you are certain that the issue is due to replication latency, one thing you can do is make your writes safe by using the w flag. That way, MongoDB will wait until the data is replicated to at least n nodes before returning. You can do this from the client driver as well.
MongoDB getLastError
Are you reading with slaveOk=True ? If you read from the ReplicaSet Primary, this shouldn't be an issue either.
The slaveOk property is now known as ReadPreference (.SECONDARY in this case) in newer Mongo Java driver versions. This can be set at the Mongo/DB/Collection level. Note that when you set ReadPreference at these levels, it applies for all callers (i.e. these objects are shared across threads).
Another approach is to try the ReadPreference.SECONDARY and if it fails, try without it and go to the master. This logic can be isolated to your repository layer, so the service layer doesn't have to deal with it. If you are doing this, you may want to set the ReadPreference at the DBQuery object, which is on a per-use basis.
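The fallback described above, kept inside the repository layer, might be structured like this sketch. It deliberately avoids the MongoDB driver: FallbackReader is a hypothetical class, and the two functions are placeholders for a query issued with ReadPreference.SECONDARY and the same query issued against the primary.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical repository helper: try the secondary first, then fall back to the primary.
class FallbackReader<K, V> {
    private final Function<K, Optional<V>> secondaryRead; // placeholder for a secondary read
    private final Function<K, Optional<V>> primaryRead;   // placeholder for a primary read

    FallbackReader(Function<K, Optional<V>> secondaryRead, Function<K, Optional<V>> primaryRead) {
        this.secondaryRead = secondaryRead;
        this.primaryRead = primaryRead;
    }

    // Returns the secondary's answer when present, otherwise asks the primary.
    Optional<V> find(K key) {
        Optional<V> fromSecondary = secondaryRead.apply(key);
        if (fromSecondary.isPresent()) {
            return fromSecondary;
        }
        return primaryRead.apply(key);
    }
}
```

The service layer only calls find, so it never needs to know which member answered. Note that a miss on the secondary costs a second round trip, so this pattern suits workloads where most reads are for data that has already replicated.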
I am not familiar with the Java driver, but there are w and j options.
The w option confirms that a write operation has replicated to the specified number of replica set members, including the primary.
The j option confirms the write operation only after it has been written to the journal.
It looks like you need to use WriteConcern.