I was reading about the JanusGraph cache in the JanusGraph documentation.
I have a few doubts regarding the transaction cache. I'm using an embedded JanusGraph server in my application.
1. If I'm only running a read query, e.g. g.V().has("name","ABC") via the Gremlin HTTP endpoint, will this value be cached in the transaction cache or in the database-level cache? I'm not opening any transaction here.
2. If it is stored in the transaction cache, how will updated values be fetched for this vertex if I have a multi-node deployment?
Regarding question 1:
If not created explicitly, transactions are created automatically. From the JanusGraph reference docs:
Every graph operation in JanusGraph occurs within the context of a transaction. According to the TinkerPop’s transactional specification, each thread opens its own transaction against the graph database with the first operation (i.e. retrieval or mutation) on the graph.
A vertex retrieved during a transaction is stored in both the transaction cache and the database cache. After closing the transaction, the vertex is still in the database cache (but note that since janusgraph-0.5.x the database cache is disabled by default).
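As a minimal sketch of that implicit thread-bound transaction in an embedded graph (the graph variable and its backend are assumptions):

    import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
    import org.apache.tinkerpop.gremlin.structure.Vertex;

    // 'graph' is an already-opened embedded JanusGraph instance (assumption).
    GraphTraversalSource g = graph.traversal();
    // The first operation on g implicitly opens a thread-bound transaction;
    // the retrieved vertex lands in that transaction's cache.
    Vertex v = g.V().has("name", "ABC").next();
    // Closing the transaction discards this thread's transaction cache;
    // the vertex may remain in the database cache if that cache is enabled.
    graph.tx().rollback();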
Regarding question 2:
Indeed, a JanusGraph instance cannot know about modifications to vertices in the transaction caches of other instances. Only after those transactions have been closed and persisted to the storage and index backends can other instances read the modified vertices from the backends. This also means that caches in other JanusGraph instances can be out of date, so if you want to be sure that you have the latest data from the backends, you should start a new transaction and leave the database cache disabled (the default setting).
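If you want to be explicit about that setting rather than rely on the default, a sketch of how the database cache could be disabled when opening an embedded graph (the storage backend choice is an assumption):

    import org.janusgraph.core.JanusGraph;
    import org.janusgraph.core.JanusGraphFactory;

    // Open an embedded graph with the database-level cache explicitly disabled
    // (the default since janusgraph-0.5.x); the backend choice is an assumption.
    JanusGraph graph = JanusGraphFactory.build()
            .set("storage.backend", "cql")
            .set("cache.db-cache", false)
            .open();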
Additional comments (added Sept 12):
The vertex caches are private members of JanusGraph and are nowhere exposed to the user (not even in the debug logging). Cache hits in a traversal are only visible through a fast (sub-millisecond) return time.
If data consistency between transactions or JanusGraph instances matters to you, you can take a look at:
- https://docs.janusgraph.org/v0.4/advanced-topics/eventual-consistency/#data-consistency
- the new CacheVertex::refresh feature in janusgraph-0.6.0 (still undocumented).
My understanding of the Hibernate first-level cache is that it is scoped to sessions and transactions. Items remain in the cache during a transaction, but once the transaction is closed (i.e. the request is fulfilled), it cleans/evicts the items.
But I wondered if that is wrong: does the first-level cache keep items after a request has been fulfilled, so that subsequent GET API requests go to the cache? Is there a time limit after which it evicts objects from the cache?
This is in Spring Boot.
Your description of the first-level cache is correct. It's per session/transaction. After the transaction is finished, the objects are left to be garbage collected.
To cache entities across sessions, one needs to use the second-level cache.
Using this can become a bit tricky for applications with multiple instances; depending on how the application is built, one might need a distributed cache to keep the cache in sync across instances of the application.
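As a hedged sketch, opting an entity into the second-level cache typically looks like this (the entity and the concurrency strategy are assumptions):

    import javax.persistence.Cacheable;
    import javax.persistence.Entity;
    import javax.persistence.Id;
    import org.hibernate.annotations.Cache;
    import org.hibernate.annotations.CacheConcurrencyStrategy;

    @Entity
    @Cacheable                                           // opt this entity into the second-level cache
    @Cache(usage = CacheConcurrencyStrategy.READ_WRITE)  // the strategy choice is an assumption
    public class Customer {                              // hypothetical entity
        @Id
        private Long id;
        private String name;
        // getters/setters omitted
    }

A cache provider (e.g. Ehcache) still has to be on the classpath and enabled, typically via hibernate.cache.use_second_level_cache=true (in Spring Boot, under spring.jpa.properties).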
I have a persistent cache store.
When I put data into the cache, I see in the logs that the datastore put operation can be called on any cluster node.
Is this correct behavior?
I ask because I expected the put to the datastore to be called on the same node where the put into the cache occurred.
Yes, this is correct for ATOMIC caches, which update the persistence store from primary nodes. A TRANSACTIONAL cache, on the other hand, will update it from the node that runs the transaction. This makes it possible to maintain the underlying DB transaction and keep full consistency between the cache and the DB.
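For example, a sketch of an Ignite cache configuration with write-through persistence in TRANSACTIONAL mode (the cache name and store class are hypothetical):

    import javax.cache.configuration.FactoryBuilder;
    import org.apache.ignite.cache.CacheAtomicityMode;
    import org.apache.ignite.configuration.CacheConfiguration;

    // Cache name and store implementation are hypothetical.
    CacheConfiguration<Long, String> cfg = new CacheConfiguration<>("myCache");
    cfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL); // store updates run on the node owning the transaction
    cfg.setWriteThrough(true);                              // propagate puts through to the persistent store
    cfg.setCacheStoreFactory(FactoryBuilder.factoryOf(MyCacheStore.class)); // hypothetical CacheStore implementation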
We are using an Oracle DB and would like to use Redis as a cache. We add some subset of the DB data to the cache. Does it sync with the DB automatically when the data changes in the DB, or will we have to implement a sync strategy? If so, what is the best way to do it?
Does it sync with the DB automatically when the data changes in the DB?
No, it doesn't.
Will we have to implement a sync strategy? If so, what is the best way to do it?
This will depend on your particular case. Caches are usually kept in sync in two common ways:
1. Data cached with expiration. Once cached data has expired, a background process adds fresh data to the cache, and so on. Usually different data is refreshed at different intervals: every 10 minutes, every hour, once a day...
2. Data cached on demand. When a user requests some data, the request goes down the non-cached path and stores the result in the cache; subsequent requests then read the cached data directly while the cache entry is available. This approach can also fall back on #1 in terms of cache-invalidation interval (a sketch of this follows below).
Now I believe you have enough details to decide on the best strategy for your particular case!
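A minimal cache-aside sketch of approach #2 using the Jedis client (the key scheme, TTL, and loadFromOracle are assumptions):

    import redis.clients.jedis.Jedis;

    // Cache-aside read for approach #2, with an expiration so it also falls
    // back on #1. The key scheme, TTL, and loadFromOracle are assumptions.
    String getUser(Jedis jedis, String userId) {
        String key = "user:" + userId;
        String cached = jedis.get(key);
        if (cached != null) {
            return cached;                       // cache hit: serve directly
        }
        String fresh = loadFromOracle(userId);   // hypothetical Oracle lookup
        jedis.setex(key, 600, fresh);            // cache for 10 minutes
        return fresh;
    }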
In addition to what mathias wrote, you can look at the problem from a dynamic/static perspective:
Real-time approach: each time a process changes the DB data, you dispatch an event or a message to a queue, where a worker handles the corresponding reindexing of the cache. Some might even implement it as a DB trigger (which I don't like). A sketch follows after this list.
Static/delayed approach: once a day/hour/minute, depending on your needs, a process does a batch/whole reindexing of the DB data into the cache.
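A hedged sketch of the real-time approach, with a plain BlockingQueue standing in for the message queue (the queue feed and loadFromOracle are assumptions):

    import java.util.concurrent.BlockingQueue;
    import redis.clients.jedis.Jedis;

    // Drains change events (primary keys) from a queue and refreshes the
    // corresponding Redis entries; loadFromOracle is again hypothetical.
    void refreshWorker(BlockingQueue<String> changedIds, Jedis jedis) throws InterruptedException {
        while (true) {
            String id = changedIds.take();           // blocks until a change event arrives
            String fresh = loadFromOracle(id);       // re-read the changed row from the DB
            jedis.setex("user:" + id, 600, fresh);   // overwrite the stale cache entry
        }
    }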
If I have an ATG Nucleus repository item that is not cacheable (the ATG/Nucleus simple cache is disabled) AND I'm not in a transaction, the following code results in a DB query for every property, i.e. two queries to the database:
repositoryItem.getPropertyValue("columnA");
repositoryItem.getPropertyValue("columnB");
If debugging for the user entity is enabled, you would see the following log statements after each call:
repositoryItem.getPropertyValue("columnA");
DEBUG loadingPropertyFromDatabase(user:ID_1.columnA, column_a_value) property is not cacheable caching disabled for this transaction
DEBUG loadingPropertyFromDatabase(user:ID_1.columnB, column_b_value) property is not cacheable caching disabled for this transaction
DEBUG getPropertyValue(user:ID_1.columnA) -> "column_a_value" (value from database)
repositoryItem.getPropertyValue("columnB");
DEBUG loadingPropertyFromDatabase(user:ID_1.columnA, column_a_value) property is not cacheable caching disabled for this transaction
DEBUG loadingPropertyFromDatabase(user:ID_1.columnB, column_b_value) property is not cacheable caching disabled for this transaction
DEBUG getPropertyValue(user:ID_1.columnB) -> "column_b_value" (value from database)
We cannot enable caching because of how the object is accessed/updated by other systems.
I also do not want to create a transaction for a read-only query of the entity.
If I were using Hibernate, the Hibernate session would keep state within the session even if I was not in a transaction. That doesn't seem to be the case with ATG/Nucleus. Is there any way I can get this type of behavior or a thread-level cache?
Looking at the documentation and walking through the code via a debugger (which is difficult without source), I have had no luck finding a workaround.
Thanks!
You need to wrap the getPropertyValue calls with a transaction, which will save the results of the database queries into the temporary transaction cache. That will prevent the repository from going back to the database for every getPropertyValue call.
You also want to ensure that all the properties you are accessing are part of the same property group (as described here). The first load of the item from the database will pull in the properties in the same group as the ID property. Combined with the transaction cache, this will significantly reduce the number of database queries.
I also do not want to create a transaction for a read only query of the entity.
I don't understand why you wouldn't want to explicitly demarcate a transaction. Every getPropertyValue call will automatically create (and end) a transaction if one isn't already present. So in your example, two transactions are implicitly created for you. Why not just create one transaction explicitly?
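A sketch of explicit demarcation with ATG's TransactionDemarcation (how transactionManager is obtained, e.g. injected as a Nucleus component, is an assumption):

    import atg.dtm.TransactionDemarcation;
    import atg.dtm.TransactionDemarcationException;

    // 'transactionManager' is assumed to be injected as a Nucleus component.
    TransactionDemarcation td = new TransactionDemarcation();
    try {
        td.begin(transactionManager, TransactionDemarcation.REQUIRED);
        repositoryItem.getPropertyValue("columnA"); // loads the item and fills the transaction cache
        repositoryItem.getPropertyValue("columnB"); // served from the transaction cache
    } catch (TransactionDemarcationException tde) {
        // handle or rethrow (error handling is an assumption)
    } finally {
        try {
            td.end();
        } catch (TransactionDemarcationException tde) {
            // log and continue
        }
    }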
I am trying to evaluate the Terracotta distributed cache with Ehcache. I have the following question. There are 20+ apps which will use a TAS distributed cache. As I understand it, there will be an L1 cache in each of these apps and an L2 in the cluster. The cluster cache fronts a database which is updated by a different app that we do not have access to, so we only read from this DB. But the DB updates need to flow to the cache.
By way of DB triggers, the updated keys (keys alone) are stored in a temp table. At specific intervals, a job monitors this table and collects the keys that need to be expired from the cache. This is a separate batch job.
From here I need help. How do I tell the TAS L2 cache to expire/evict these keys? What options are there in Terracotta? Will this expiry event flow from L2 to all the individual apps? What is the time lag? I do not want to send the expiry keys to all the individual apps myself. Can this be accomplished?
Thanks for the help!
Maybe I am missing something, but I am not sure why you would want to expire/evict those keys instead of simply calling cache.removeAll(keys). This removal is automatically propagated to all L1 nodes that have those entries in their local cache.
The time lag depends on the consistency settings of the distributed cache.
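A sketch of the batch job calling that bulk removal with the Ehcache 2.x API (fetchChangedKeys is a hypothetical helper reading the trigger-populated temp table):

    import java.util.Collection;
    import net.sf.ehcache.Cache;

    // 'cache' is the Terracotta-clustered Ehcache instance and fetchChangedKeys()
    // is a hypothetical helper reading the trigger-populated temp table.
    Collection<Object> changedKeys = fetchChangedKeys();
    cache.removeAll(changedKeys); // the removal propagates to all L1 nodes holding these entries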