JCS - Dynamic update of cache from database - caching

I maintain an application that uses JCS to hold a cache in one JVM (JVM1). The data is loaded from a database when the JVM is started or restarted.
However, the database is also accessed from a different JVM (JVM2), which adds data to it.
To make sure these newly added records are loaded into the cache, we currently need to restart JVM1 after every addition to the database.
Is there a way to refresh or load the cache in JVM1 (only for newly added records) at regular intervals, without frequent database polling?
Thanks,
Jaya Krishna

Can you not simply have JVM1 first check the in-memory cache and then, if the item is absent there, check the database?
If, however, you need to list all existing items of a certain type and don't want to access the database, then JVM1 needs some way to know that there is a new item in the database. I suppose that either 1) JVM2 would have to send a network message to JVM1 telling it that there are new entries in the database, or 2) there could be a database trigger that fires when new data is inserted and sends a network message to JVM1. (Having the database send network messages to an application server feels rather weird, though.) Both of these approaches seem rather complicated to me.
Have you considered some kind of new-item-ids table that logs the IDs of items recently inserted into the database? It could be updated by a database trigger, or by JVM1 and JVM2 when they write to the database. Then JVM1 would only need to poll this single table, perhaps once per second, to get a list of new IDs, and then it could load the new items from the database.
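A minimal sketch of the new-item-ids polling idea, written in Python with an in-memory SQLite database purely for illustration (the same logic would live in JVM1). The table names items and new_item_ids, the seq column, the trigger, and the helper names are all assumptions; none of them come from JCS or from the original setup.

```python
import sqlite3


def make_demo_db():
    """Create a throwaway database with a trigger that logs every inserted item ID."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE items (id INTEGER PRIMARY KEY, value TEXT);
        CREATE TABLE new_item_ids (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,
            item_id INTEGER
        );
        CREATE TRIGGER log_new_item AFTER INSERT ON items
        BEGIN
            INSERT INTO new_item_ids (item_id) VALUES (NEW.id);
        END;
        """
    )
    return conn


def poll_new_items(conn, cache, last_seen):
    """Load items logged after last_seen into cache and return the new high-water mark."""
    rows = conn.execute(
        "SELECT n.seq, i.id, i.value "
        "FROM new_item_ids AS n JOIN items AS i ON i.id = n.item_id "
        "WHERE n.seq > ? ORDER BY n.seq",
        (last_seen,),
    ).fetchall()
    for seq, item_id, value in rows:
        cache[item_id] = value  # only the newly logged rows are touched
        last_seen = seq
    return last_seen
```

The reader process would call poll_new_items on a timer and keep the returned mark between calls, so each poll only reads rows logged since the previous one.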
Finally, have you considered a distributed cache? Both JVM1 and JVM2 would share the same cache, and each would write items to this cache when inserting them into the database. (This approach is somewhat similar to sending network messages between JVM1 and JVM2, but the distributed cache system sends the messages itself, so you don't need to write any new code.)

Related

How would Redis get to know if it has to return cached data or fresh data from DB

Say I'm fetching thousands of records from the DB using some long-running task and caching them in Redis. The next day, somebody changes a few records in the DB.
Next time, how would Redis know whether it should return the cached data or revisit all those thousands of records in the DB?
How is this synchronisation achieved?
Redis has no idea whether the data in the DB has been updated.
Normally, we use Redis to cache data as follows:
1) The client checks whether the data, e.g. a key-value pair, exists in Redis.
2) If the key exists, the client gets the corresponding value from Redis.
3) Otherwise, it gets the data from the DB and sets it in Redis. The client also sets an expiration, say 5 minutes, for the key-value pair in Redis.
4) Any subsequent requests for the same key are then served by Redis, although the data in Redis might be out of date.
5) After 5 minutes, however, the key is removed from Redis automatically.
6) Go to step 1.
So, to keep your data in Redis up to date, you can set a short expiration time. However, your DB then has to serve lots of requests.
If you want to greatly reduce requests to the DB, you can set a long expiration time, so that most of the time Redis serves the requests, possibly with stale data.
You should carefully consider the trade-off between performance and stale data.
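The check, load, and expire flow described above can be sketched roughly as follows. This is only an illustration: TtlCache is a hypothetical in-memory stand-in for Redis (with a real client, its two methods would correspond to GET and to SET with an expiry), and get_with_cache and load_from_db are assumed names.

```python
import time


class TtlCache:
    """Hypothetical in-memory stand-in for a Redis key-value store with expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave as if the key was never set
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)


def get_with_cache(cache, key, load_from_db, ttl_seconds=300):
    """Serve from the cache when possible; otherwise load from the DB and cache it."""
    value = cache.get(key)
    if value is not None:
        return value  # may be stale for up to ttl_seconds
    value = load_from_db(key)
    cache.set(key, value, ttl_seconds)
    return value
```

The ttl_seconds argument is where the trade-off shows up: a smaller value means fresher data and more DB reads, a larger value means fewer DB reads and staler data.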
Since the source of truth resides in your database and you push data from this DB to Redis, you always have to update from the DB to Redis, or at least create another process to sync the data.
My suggestion is to run a first full update from the DB to Redis and then use a sync process that, every time it notices an update, creation, or deletion in your database, pushes the change to Redis.
I don't know which Redis structure you are using to store database records, but I guess it could be a hash, probably keyed by your table index, so the sync operation is immediate: if a record is created in your database you issue an HSET, for a deletion an HDEL, and so on.
You could even omit the first full sync from the DB to Redis, and just clear Redis and start the sync process.
If you cannot do the above for some reason, you can create a sync daemon that constantly reads data from the database and compares it with the data stored in Redis. If an entry differs, you update it; if it exists on only one side, you create or delete the entry in Redis.
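As a rough illustration of both ideas (per-change sync and the fallback comparison daemon), here is a sketch in which a plain dict stands in for the Redis hash. The function names and the "insert"/"update"/"delete" operation labels are assumptions made for this example.

```python
def apply_change(redis_hash, op, record_id, record=None):
    """Mirror a single database change into the hash."""
    if op in ("insert", "update"):
        redis_hash[record_id] = record  # would be an HSET against Redis
    elif op == "delete":
        redis_hash.pop(record_id, None)  # would be an HDEL against Redis
    else:
        raise ValueError(f"unknown operation: {op}")


def reconcile(redis_hash, db_rows):
    """One pass of the comparison daemon: make the hash match db_rows.

    Returns the number of entries that had to be created, updated, or deleted.
    """
    changes = 0
    for record_id in list(redis_hash):
        if record_id not in db_rows:
            del redis_hash[record_id]  # exists only in the cache
            changes += 1
    for record_id, record in db_rows.items():
        if record_id not in redis_hash or redis_hash[record_id] != record:
            redis_hash[record_id] = record  # missing or different in the cache
            changes += 1
    return changes
```

The per-change path is cheap but relies on seeing every database change; the reconcile pass is more expensive but repairs the cache even when changes were missed.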
My solution is:
When you update, delete, or add data in the database, delete all the data in Redis. In your get route, check whether the data exists in Redis. If not, store all the data from the DB in Redis.
You may use @CacheEvict (the Spring cache annotation) on any update or delete applied to the DB. That clears the corresponding values from the cache, so the next query fetches them from the DB.
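The evict-on-write idea from the last two answers might look something like this. It is a language-neutral sketch in Python, not Spring code: Repository is a hypothetical class, and plain dicts stand in for the database and for Redis.

```python
class Repository:
    """Hypothetical data-access wrapper that evicts cached entries on every write."""

    def __init__(self, db, cache):
        self.db = db        # dict standing in for the database
        self.cache = cache  # dict standing in for Redis

    def get(self, key):
        # Read path: populate the cache on a miss, then serve from it.
        if key not in self.cache:
            self.cache[key] = self.db[key]
        return self.cache[key]

    def update(self, key, value):
        # Write path: change the database first, then evict the cached copy
        # so that the next read goes back to the database.
        self.db[key] = value
        self.cache.pop(key, None)

    def delete(self, key):
        self.db.pop(key, None)
        self.cache.pop(key, None)
```

Evicting only the affected key keeps the rest of the cache warm; clearing everything, as the first of the two answers suggests, is simpler but forces a full reload.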

Will Caching be useful when we need multiple items in one go

We are working on an e-commerce site, where an admin can store some configuration for a combination of Product-Category-Manufacturer or of Product-Category.
We have some reports that can return 10000 product transactions (with 100-1000 unique combinations of Product-Category-Manufacturer).
These reports also need to use the configuration.
One option would be to fetch the configurations for all unique Product-Category-Manufacturer combinations from the same stored procedure.
Another option would be to cache all these combinations in an out-of-process cache (like Redis). Once the transaction data is fetched from the stored procedure, the system would pull the data from the cache for all 1000 Product-Category-Feature combinations. But in this case we would have to query the cache 1000 times, and if some of the keys are not found in the cache, we would have to hit the database.
In fact, there can be combinations for which no data exists in the database. If we request such a combination, the system will not find it in the cache and will have to hit the database every time. To resolve this, we would have to build a set of all the Product-Category-Feature combinations for which data is available in the cache.
Could anybody suggest whether a cache would be useful in this case?
We use caching mainly on two occasions:
1) To reduce latency: the cache is closer to the client, so it takes less time for the resource to reach the client.
2) To reduce network traffic: often some resources are reusable but are always fetched from the original source, which is costly and creates unnecessary traffic. Adding a cache layer solves this.
So, to answer your question, "Will caching be useful when we need multiple items in one go?", you have to think about the two points above: how much you are reusing (the cache hit percentage), and the cost difference between a cache call and a call to the original source.
If your issue is getting 1000 items at once, Redis has no problem providing that. It will be much faster than the transactional DB. You can also keep a set of all the Product-Category-Feature combinations, which is better because we will then have no cache misses. However, think about the size of the Redis DB before you proceed.
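One way to handle both concerns from the question (many keys per report, and combinations that have no configuration at all) is a batched lookup that also caches the "nothing in the database" outcome. This is only a sketch: a dict stands in for Redis, and MISSING, get_configs, and load_many_from_db are hypothetical names. With Redis itself, the per-key loop could be a single MGET round trip.

```python
MISSING = "__none__"  # hypothetical sentinel: "no configuration exists in the DB"


def get_configs(cache, keys, load_many_from_db):
    """Return {key: config or None} for all keys, using one DB call for the misses."""
    results = {}
    misses = []
    for key in keys:
        if key in cache:
            results[key] = cache[key]
        else:
            misses.append(key)

    if misses:
        found = load_many_from_db(misses)  # a single query for every missing key
        for key in misses:
            # Cache the sentinel when the DB has nothing, so the next report
            # does not hit the DB again for the same empty combination.
            value = found.get(key, MISSING)
            cache[key] = value
            results[key] = value

    return {k: (None if v == MISSING else v) for k, v in results.items()}
```

Caching the sentinel plays the same role as the set of known combinations mentioned above: a repeated request for a combination without data is answered from the cache.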

Laravel Redis Caching

I'm now working on a big project, and we decided to use Redis as a cache in our system. When we put some data in the cache and the original data then changes, how can we know? And what is the best practice in this case: to delete the old data and replace it with the new data? Is there any mechanism to replace just the changed part?
A few things to keep in mind when caching for a large application using Redis:
1) Localise your cache as much as you can. For example, if you have 5 pieces of information per user that need to be cached, make a simple cache entry for each piece instead of accessing them all together.
2) Choose the right data structure. Use Redis sets, hashes, sorted sets, and bit operations wherever possible.
3) Make sure your system works even if Redis is not available (to survive downtime). That is, check Redis first and serve the value if it is there; if not, get it from the DB and populate the cache. That way, even if Redis is unavailable, you still get values from the DB.
To answer your question, you can do it in three ways:
1) Maintain the cache alongside your DB: when a transaction in the DB succeeds, update the cache, so that you do not lose any information. Implementing this is a bit difficult, though.
2) Whenever a transaction begins, drop the cache entries that belong to it. The values are removed from the cache and fetched from the DB on the next read request.
3) Maintain a last-accessed or created time in both the cache and the DB. On every read, compare them and decide. This is the most reliable solution.
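The third option could be sketched as below, assuming each record carries a cheap-to-read marker (a timestamp or version number) that changes whenever the record changes. The function name and the two callbacks are hypothetical; a dict stands in for Redis, and the example is in Python rather than PHP only to keep it self-contained.

```python
def read_with_version_check(cache, key, get_db_version, load_from_db):
    """Serve the cached value only if its marker still matches the database."""
    current = get_db_version(key)  # cheap query: fetch just the marker
    entry = cache.get(key)
    if entry is not None and entry[0] == current:
        return entry[1]  # cached copy is still current
    value = load_from_db(key)  # expensive query: fetch the full record
    cache[key] = (current, value)
    return value
```

Note that this still touches the database on every read, so it only pays off when fetching the marker is much cheaper than loading the full record.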

Terracotta L2 cache invalidation by messaging

I am trying to evaluate Terracotta Distributed Cache with Ehcache. I have the following query. There are 20+ apps which will use a TAS distributed cache. As I understand it, there will be an L1 cache in each of these apps and an L2 in the cluster. The cluster cache fronts a database which is updated by a different app that we do not have access to, so we only read from this DB. But the DB updates need to flow to the cache.
By way of DB triggers, the updated rows (keys alone) are stored in a temp table. At specific intervals, a job monitors this table and collects the keys in the cache that need to be expired. This is a separate batch job.
From here I need help. How do I inform the TAS L2 cache to expire/evict these keys? What options does Terracotta offer? Will this expiry event flow from the L2 to all the individual apps? What is the time lag? I do not want to send the expired keys to all the individual apps. Can this be accomplished?
Thanks for the help!
Maybe I am missing something, but I am not sure why you would want to expire/evict those keys instead of simply calling cache.removeAll(keys). This removal will be automatically propagated to all L1 nodes which have those entries in their local cache.
The time lag depends on the consistency settings of the distributed cache.

Caching strategy suggestions needed

We have a fantasy football application that uses memcached and the classic memcached-object-read-with-sql-server-fallback. This works fairly well, but recently I've been contemplating the overhead involved and whether or not this is the best approach.
Case in point: we need to generate a drop-down list of the user's teams, so we follow this pattern:
1) Get a list of the user's teams from memcached.
2) If not available, get the list from SQL Server and store it in memcached.
3) Do a multiget to get the team objects.
4) Fall back to loading the objects from SQL and store these.
This is all very well - each cached piece of data is relatively easily cached and invalidated, but there are two major downsides to this:
1) Because we are operating on objects we are incurring a rather large overhead - a single team occupies some hundred bytes in memcached and what we really just need for this case is a list of team names and ids - not all the other stuff in the team objects.
2) Due to the fallback to loading individual objects, the number of SQL queries generated on an empty cache or when the items expire can be massive:
1 x Memcached multiget (which misses and causes)
1 x SELECT ... FROM Team WHERE Id IN (...)
20 x Store in memcached
So that's 21 network requests just for this one query, and also the IN query is slower than a specific join.
Obviously we could just do a simple
SELECT Id, Name FROM Teams WHERE UserId = XYZ
And cache that result, but this would mean that this data would need to be specifically invalidated whenever the user creates a new team. In this case it might seem relatively simple, but we have many of these types of queries, and many of them operate on axes that are not easily invalidated (like a list of IDs and names of the teams that your friends have created in a specific game).
Sooo.. My question is - do any of you have ideas for resolving the mentioned drawbacks, or should I just accept that there is an overhead and that cache misses are bad, live with it?
First, cache what you need, maybe just those two fields, not a complete record.
Second, cache what you need again: break the result set into records and cache them separately.
About caching:
You generally use caching to offload the slower disk-based storage, in this case MySQL. The memory cache scales up rather easily; MySQL scales less easily.
Given that, even if you double the CPU/network/memory usage of the cache for splitting things up and putting them back together again, it will still offload the DB. Adding another Node.js instance or another memcached server is easy.
Back to your question:
You say it's a user's team. You could fetch it when the user logs in, and keep it updated in the cache while the user changes it throughout their session.
I presume the team members' names do not change. If so, you can load all team members by id and name and store those in the cache, or even locally in Node.js, using the same fallback strategy as you do now. Only steps 1, 2, and 4 will be left then.
Personally, I usually try to split the SQL results into smaller ready-made pieces and cache those, and keep the cache updated as long as possible, ultimately trying to use MySQL only as storage and never read from it.
Usually you will run some logic on the rows returned from MySQL anyway, so there's no need to keep repeating that.
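A small sketch of "cache only what you need and keep it updated": the cached entry holds just (id, name) pairs per user, and creating a team updates that entry in place instead of invalidating it. The key format, the function names, and the use of a list of dicts as the database are all assumptions for illustration; a dict stands in for memcached.

```python
def team_list_key(user_id):
    """Hypothetical cache key for one user's drop-down list."""
    return f"user:{user_id}:team-list"


def get_team_list(cache, db_teams, user_id):
    """Return [(id, name), ...] for the user, caching only those two fields."""
    key = team_list_key(user_id)
    if key not in cache:
        # Equivalent of: SELECT Id, Name FROM Teams WHERE UserId = ?
        cache[key] = [
            (team["id"], team["name"])
            for team in db_teams
            if team["user_id"] == user_id
        ]
    return cache[key]


def create_team(cache, db_teams, team):
    """Store the new team and keep the cached list current if one exists."""
    db_teams.append(team)
    key = team_list_key(team["user_id"])
    if key in cache:
        cache[key].append((team["id"], team["name"]))
```

This covers the simple per-user case; lists that depend on other users' actions (such as friends' teams) would still need their own update or expiry rule.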
