How is memcached updated? - caching

I have never used memcached before and I am confused about the following basic question.
Memcached is a cache, right? And I assume we cache data from a DB for faster access. So when the DB is updated, who is responsible for updating the cache? Our code, or does memcached "understand" when the DB has been updated?

Memcached is a cache, right? And I assume we cache data from a DB for faster access
Yes, it is a cache, but you have to understand that a cache speeds up access only when you access the same data often. If you access thousands of data objects that are all different from each other, a cache doesn't help.
To answer your question:
So when the DB is updated who is responsible to update the cache?
Always you, but you don't need to worry about getting it exactly right.
Our code, or does memcached "understand" when the DB has been updated?
memcached doesn't know anything about your database (actually, the client doesn't even know about the servers). So when you use an object from your database, check whether it is present in the cache; if it is not, put it in the cache, otherwise you are fine. That is all. When memcached needs the space, it frees the memory used by old data, or you can tell memcached to expire data after a time you choose (read the API for details).

You are responsible for updating the cache (or some plugin is).
What happens is that the query is reduced to a few key features, and these are hashed into a cache key. This key is looked up in the cache. If the value is in the cache, the data is returned directly from the cache. Otherwise, the query is executed, the result is stored in the cache, and it is returned to the user.
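In Python-like code, assuming `cache` is a client with memcached-style `get`/`set` methods and `execute` runs the SQL (both are placeholders):
```python
import hashlib

def cached_query(cache, execute, your_sql_query, time_to_live=300):
    # Reduce the query to a short, fixed-length cache key
    key = hashlib.sha1(your_sql_query.encode("utf-8")).hexdigest()
    results = cache.get(key)
    if results is not None:
        return results                     # cache hit
    results = execute(your_sql_query)      # cache miss: run the query
    cache.set(key, results, time_to_live)  # store it with an expiration
    return results
```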
The cache is cleared from time to time: if you give each key a time to live, your cached results are refreshed once it expires.
This is the simplest model, but it can cause some inconsistencies, because readers can see stale results until the key expires.

One strategy is that if your code is also the only app that updates data, then your code can also refresh memcached as a second step after it has updated the database. Or at least evict the stale data from memcached, so the next time an app wants to read it, it will be forced to re-query the current data from the database and restore that latest data to memcached.
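A minimal sketch of the evict-after-write strategy, assuming one dict stands in for the database and another for memcached (with a real client you would call its delete method instead of `dict.pop`). The helper names `read_user` and `update_user` are hypothetical.

```python
# Stand-ins (assumptions): one dict plays the database, another plays memcached.
db = {"user:1": {"name": "Ada"}}
cache = {}


def read_user(key):
    """Cache-aside read: serve from the cache, else load from the DB and cache it."""
    if key in cache:
        return cache[key]
    value = db[key]
    cache[key] = value
    return value


def update_user(key, value):
    """Write to the DB first, then evict the now-stale cache entry."""
    db[key] = value
    cache.pop(key, None)  # the next read_user() is forced back to the DB


read_user("user:1")                      # populates the cache with the old row
update_user("user:1", {"name": "Grace"})
print(read_user("user:1"))               # {'name': 'Grace'}
```

Evicting rather than writing the new value into the cache keeps the write path simple; the cost is one extra DB read after each update.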
Another strategy is to store data in memcached with an expiration time, so memcached automatically purges that data element after a certain time. You pick the expiration time, based on your knowledge of how frequently the data might be updated, and how tolerant your app is of reading stale data.
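memcached handles expiry itself (you pass an expiration time when you store a value). The toy in-process `TTLCache` below is a hypothetical class, not a memcached client; it only illustrates the semantics, with an injectable clock so the example is deterministic.

```python
import time


class TTLCache:
    """Toy in-process cache whose entries expire after a per-key TTL in seconds."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so expiry can be demonstrated deterministically
        self._data = {}       # key -> (value, expires_at)

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: drop it and report a miss
            return None
        return value


now = [0.0]                           # fake clock for the demo
cache = TTLCache(clock=lambda: now[0])
cache.set("price:42", 9.99, ttl=60)
now[0] = 30.0
print(cache.get("price:42"))          # 9.99 (still fresh)
now[0] = 61.0
print(cache.get("price:42"))          # None (expired, the caller must reload from the DB)
```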
So ultimately, you are the one responsible for putting data into memcached. Only you know what data is worth storing in the cache, what format you want to store it in, how frequently you expect to query it, and when to refresh it. You make this judgment on a case-by-case basis, because you know better than any automatic system the likely behavior of your data and your app.

Related

How would Redis get to know if it has to return cached data or fresh data from DB

Say I'm fetching thousands of records from the DB using a long-running task and caching them in Redis. The next day, somebody changes a few records in the DB.
The next time, how would Redis know whether it can return the cached data or has to revisit all those thousands of records in the DB?
How is this synchronisation achieved?
Redis has no idea whether the data in DB has been updated.
Normally, we use Redis to cache data as follows:
1. The client checks whether the data, e.g. a key-value pair, exists in Redis.
2. If the key exists, the client gets the corresponding value from Redis.
3. Otherwise, it gets the data from the DB and sets it in Redis. The client also sets an expiration, say 5 minutes, for the key-value pair in Redis.
4. Any subsequent requests for the same key are then served by Redis, although the data in Redis might be out of date.
5. After 5 minutes, the key is removed from Redis automatically.
6. Go to step 1.
So to keep your data in Redis up to date, you can set a short expiration time. However, your DB then has to serve more requests.
If you want to greatly reduce the number of requests to the DB, you can set a long expiration time, so that most of the time Redis serves the requests, possibly with stale data.
You should carefully consider the trade-off between performance and stale data.
Since the source of truth resides in your database and you push data from the DB to Redis, you always have to propagate updates from the DB to Redis yourself, at the very least by creating a separate process that syncs the data.
My suggestion is to run an initial full load from the DB into Redis and then use a sync process that pushes every insert, update, or delete you detect in your database to Redis.
I don't know which Redis structure you are using to store database records, but I guess it could be a Hash, probably keyed by your table's primary key, so the sync operation is immediate: if a record is created in your database, you run HSET; if it is deleted, HDEL; and so on.
You could even skip the initial full sync: just clear Redis and start the sync process.
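The per-record sync could look roughly like this. To keep it runnable without a server, a dict of dicts stands in for Redis hashes, and `hset`/`hdel` are local helpers named after the Redis commands (with redis-py you would call `r.hset(name, key, value)` and `r.hdel(name, key)` instead). `apply_change` is hypothetical, and how change events reach it (triggers, change data capture, application hooks) is left open.

```python
import json

# Stand-in for Redis: one hash per table, hash name -> {field: value}.
redis_hashes = {}


def hset(name, field, value):
    redis_hashes.setdefault(name, {})[field] = value


def hdel(name, field):
    redis_hashes.get(name, {}).pop(field, None)


def apply_change(table, op, row_id, row=None):
    """Mirror one DB change into the table's hash, keyed by primary key."""
    if op in ("insert", "update"):
        hset(table, str(row_id), json.dumps(row))
    elif op == "delete":
        hdel(table, str(row_id))
    else:
        raise ValueError(f"unknown operation: {op}")


apply_change("users", "insert", 1, {"name": "Ada"})
apply_change("users", "update", 1, {"name": "Ada L."})
apply_change("users", "insert", 2, {"name": "Alan"})
apply_change("users", "delete", 2)
print(redis_hashes)  # {'users': {'1': '{"name": "Ada L."}'}}
```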
If you cannot do the above for some reason, you can create a syncer daemon that continuously reads data from the database and compares it with the data stored in Redis: if an entry differs, you update it in Redis; if it exists on only one side, you create or delete the entry in Redis accordingly.
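One pass of such a syncer daemon could be sketched as below. `reconcile` is a hypothetical helper and both sides are plain dicts here; in practice you would read rows from the DB and entries from Redis. A real daemon would call it in a loop with a sleep, and a full comparison gets expensive as tables grow.

```python
def reconcile(db_rows, cache):
    """Make `cache` match `db_rows` (both dicts); return (created, updated, deleted)."""
    created = updated = deleted = 0
    for key, value in db_rows.items():
        if key not in cache:
            cache[key] = value        # only in the DB: create in the cache
            created += 1
        elif cache[key] != value:
            cache[key] = value        # differs: overwrite with the DB value
            updated += 1
    for key in list(cache):
        if key not in db_rows:
            del cache[key]            # only in the cache: delete it
            deleted += 1
    return created, updated, deleted


db_rows = {"a": 1, "b": 2, "c": 3}
cache = {"a": 1, "b": 20, "z": 99}
print(reconcile(db_rows, cache))  # (1, 1, 1)
print(cache == db_rows)           # True
```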
My solution is:
When you update, delete, or add data in the database, delete all the data in Redis. In your get route, check whether the data exists in Redis; if not, load all the data from the DB and store it in Redis.
You can use @CacheEvict (Spring) on any update or delete applied to the DB. That clears the corresponding values from the cache, so the next query reads from the DB.

how to keep caching up to date

When memcached or Redis is used for data-storage caching, how is the cache updated when the value changes?
For example, if I read key1 from the cache the first time and it misses, I pull value1 and put key1=value1 into the cache.
After that, the value of key1 changes to value2.
How is the value in the cache updated or invalidated?
Does that mean that whenever key1's value changes, either the application or the database has to check whether key1 is in the cache and update it?
Since you are using a cache, you have to tolerate data inconsistency, i.e. at some points in time the data in the cache differs from the data in the database.
You don't need to update the value in the cache whenever the value changes. Otherwise, the whole cache system becomes very complicated (e.g. you have to maintain a list of keys that have been cached), and it may also be unnecessary (e.g. a key-value pair might be used only once and never need updating).
How can we update the data in cache and keep the cache system simple?
Normally, besides setting or updating a key-value pair in the cache, we also set a timeout for each key. Until then, clients can get the key-value pair from the cache. Once a key reaches its timeout, the cache system removes the key-value pair; this is called key expiration. The next time a client tries to get that key from the cache, it gets nothing; this is called a cache miss. In this case, the client has to get the key-value pair from the database and write it back to the cache with a new timeout.
If the data has been updated in database, while the key has NOT been expired in cache, client will get inconsistent data. However, when the key has been expired, its value will be retrieved from database and inserted into cache by some client. After that, other clients will get updated data until the data has been changed again.
How to set the timeout?
Normally, there are two kinds of expiration policy:
- Expire in N seconds/minutes/hours...
- Expire at some future point in time, e.g. at 2017/7/30 00:00:00
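Redis accepts both forms (`EXPIRE key seconds` and `EXPIREAT key unix-timestamp`). The two hypothetical helpers below just convert an absolute deadline into either form with the standard library, using the example date above as the deadline.

```python
from datetime import datetime, timezone


def ttl_seconds(expire_at, now):
    """Relative policy: whole seconds left until `expire_at` (never negative)."""
    return max(0, int((expire_at - now).total_seconds()))


def unix_timestamp(expire_at):
    """Absolute policy: the deadline as a Unix timestamp (the form EXPIREAT takes)."""
    return int(expire_at.timestamp())


now = datetime(2017, 7, 29, 12, 0, 0, tzinfo=timezone.utc)
deadline = datetime(2017, 7, 30, 0, 0, 0, tzinfo=timezone.utc)
print(ttl_seconds(deadline, now))  # 43200 (12 hours)
print(unix_timestamp(deadline))    # 1501372800
```

Always use timezone-aware datetimes here; a naive datetime is interpreted in the local timezone by `timestamp()`.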
A large timeout can largely reduce the load of database, while the data might be out-of-date for a long time. A small timeout can keep the data up-to-date as much as possible, while the database will have a heavy load. So you have to balance the trade-off when designing the timeout.
How does Redis expire keys?
Redis has two ways to expire keys:
1. When a client tries to operate on a key, Redis checks whether the key has reached its timeout. If it has, Redis removes the key and acts as if the key doesn't exist. In this way, Redis ensures that clients never get expired data.
2. Redis also periodically samples keys that have an expiration set (by default 10 times per second, controlled by the hz setting) and removes those that have timed out. In this way, Redis reclaims memory from expired keys that are never accessed again.
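A toy model of those two paths, assuming nothing about Redis internals beyond what is described above: `get` performs the lazy check, and `active_expire_cycle` plays the periodic sampling. `ExpiringStore` is hypothetical, and the sample size of 20 is just a parameter here.

```python
import random


class ExpiringStore:
    """Toy model of lazy and active key expiry (not Redis's actual code)."""

    def __init__(self):
        self.data = {}      # key -> value
        self.expires = {}   # key -> absolute expiry time

    def set(self, key, value, now, ttl):
        self.data[key] = value
        self.expires[key] = now + ttl

    def get(self, key, now):
        # Lazy expiry: check the deadline only when the key is touched.
        if key in self.expires and now >= self.expires[key]:
            self._delete(key)
        return self.data.get(key)

    def active_expire_cycle(self, now, sample_size=20):
        # Active expiry: sample some keys that have a TTL, delete the expired ones.
        keys = random.sample(list(self.expires), min(sample_size, len(self.expires)))
        for key in keys:
            if now >= self.expires[key]:
                self._delete(key)

    def _delete(self, key):
        self.data.pop(key, None)
        self.expires.pop(key, None)


store = ExpiringStore()
store.set("session:1", "abc", now=0, ttl=10)
store.set("session:2", "def", now=0, ttl=100)
print(store.get("session:1", now=5))   # abc
print(store.get("session:1", now=11))  # None (removed lazily on access)
store.active_expire_cycle(now=200)
print(store.data)                      # {} (session:2 removed by sampling)
```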
You can simply delete that particular cache entry in the API function where the value is inserted or updated. The server will then fetch the updated value on the next request, because the cached value has already been cleared.
Here is a diagram which will make it easier for you to understand:
I had a similar issue with stale data, especially in two cases:
When I receive bulk messages/events:
In my use case, I write a score to the Redis cache and read it again in a subsequent call. With bulk messages, due to weak consistency in Redis, the data might not yet be replicated to all replicas when I read the same key again (which is generally only a few ms later, 1-2 ms).
Remediation:
I was getting stale data. To address that, I used a cache on top of the cache, i.e. a local loading TTL cache in front of the Redis cache. A read checks the local loading cache first; if the data is not there, it checks the Redis cache. Once done, both caches are updated.
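A minimal sketch of that two-level read path, assuming a dict stands in for Redis and a fixed fake clock is used for the demo. `TwoLevelCache` is a hypothetical name, not the author's implementation.

```python
import time


class TwoLevelCache:
    """Local TTL cache in front of a shared cache (a dict stands in for Redis)."""

    def __init__(self, shared, local_ttl=1.0, clock=time.monotonic):
        self.shared = shared        # assumption: any dict-like shared store
        self.local = {}             # key -> (value, expires_at)
        self.local_ttl = local_ttl
        self.clock = clock

    def get(self, key):
        entry = self.local.get(key)
        if entry is not None and self.clock() < entry[1]:
            return entry[0]                         # served by the local layer
        value = self.shared.get(key)                # fall back to the shared cache
        if value is not None:
            self.local[key] = (value, self.clock() + self.local_ttl)
        return value

    def set(self, key, value):
        # Write both layers, so an immediate re-read on this instance sees the new
        # value even if the shared cache's replicas have not caught up yet.
        self.shared[key] = value
        self.local[key] = (value, self.clock() + self.local_ttl)


redis_stand_in = {}
scores = TwoLevelCache(redis_stand_in, local_ttl=5.0, clock=lambda: 0.0)
scores.set("score:7", 42)
redis_stand_in.clear()           # simulate reading from a replica that lacks the write
print(scores.get("score:7"))     # 42 (the local layer answers within its TTL)
```

The local layer only helps reads on the same instance, which is why a second instance can still see stale data.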
In a distributed system (k8s) with multiple pods
(Kafka is used as the message broker):
When we adopted the above strategy, we hit another problem: what if data for a key previously served by, say, pod1 arrives at pod2? This has a bigger impact, as it leads to data inconsistencies.
Remediation:
We set the Kafka partition key to the same key that is used in Redis. This way, all subsequent messages for a given key go to the same pod. If a pod restarts, its cache is rebuilt.
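The routing idea can be illustrated in a few lines. `partition_for` is a hypothetical stand-in that uses CRC32; Kafka's default partitioner hashes the key with murmur2 instead, but the property that matters is the same: an equal key always maps to an equal partition.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Illustrative key-based partitioner (CRC32 here, not Kafka's murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Every message for the same cache key lands on the same partition, so the same
# consumer (pod) handles it and its local cache for that key stays consistent.
messages = ["user:1", "user:2", "user:1", "user:3", "user:1"]
assignments = [partition_for(k, 3) for k in messages]
print(assignments[0] == assignments[2] == assignments[4])  # True
```

This holds only while the partition count and the consumer-group assignment stay fixed; a rebalance moves partitions between pods, which is one more reason the local cache has to be rebuilt.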
This solved our problem.

what is the best strategy to sync data between DB and redis cache

We are using an Oracle DB and would like to use Redis as a cache. If we add a subset of the DB data to the cache, does it sync with the DB automatically when the data in the DB changes, or do we have to implement the sync strategy ourselves? If so, what is the best way to do it?
does it sync with the DB automatically when the data in the DB changes
No, it doesn't.
do we have to implement the sync strategy ourselves? If so, what is the best way to do it?
This will depend on your particular case. Usually caches are synced in one of two common ways:
1. Data cached with expiration. Once cached data has expired, a background process adds fresh data to the cache, and so on. Usually different data is refreshed at different intervals: 10 minutes, 1 hour, every day...
2. Data cached on demand. When a user requests some data, the request goes down the non-cached path and stores the result in the cache, and a limited number of subsequent requests read the cached data directly while it is available. This approach can also use the expiration intervals of #1 for cache invalidation.
Now I believe you have enough detail to work out the best strategy for your particular case!
In addition to what mathias wrote, you can look at the problem from a dynamic/static perspective:
- Real-time approach: each time a process changes the DB data, you dispatch an event or a message to a queue, where a worker updates the corresponding entries in the cache. Some might even implement it as a DB trigger (I don't like that).
- Static/delayed approach: once a day/hour/minute, depending on your needs, a process does a batch indexing of the whole DB data into the cache.
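The real-time approach can be sketched with the standard library: `queue.Queue` stands in for the message queue and a thread for the worker, while `db`, `cache`, and `update_product` are hypothetical stand-ins. The worker re-reads the DB rather than trusting an event payload, so it always writes the current value.

```python
import queue
import threading

db = {"product:1": "old"}      # stand-in for the database
cache = {"product:1": "old"}   # stand-in for Redis
events = queue.Queue()         # stand-in for the message queue


def update_product(key, value):
    """Write to the DB, then publish a change event; the cache is not touched here."""
    db[key] = value
    events.put(key)


def cache_worker():
    """Consume change events and refresh the cache from the source of truth."""
    while True:
        key = events.get()
        if key is None:            # sentinel: stop the worker
            break
        if key in db:
            cache[key] = db[key]   # re-read the DB instead of trusting the event
        else:
            cache.pop(key, None)   # the row was deleted


worker = threading.Thread(target=cache_worker)
worker.start()
update_product("product:1", "new")
events.put(None)                   # stop after the queued events are processed
worker.join()
print(cache["product:1"])          # new
```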

Laravel Redis Caching

I'm working on a big project, and we decided to use Redis as a cache in our system. When we put some data in the cache and the original data then changes, how can we know? What is the best practice in this case: delete the old data and replace it with the new? Is there any mechanism to replace just the changed part?
A few things to keep in mind when caching for a large application with Redis:
1) Localise your cache as much as you can. For example, if you have five pieces of information for every user that need to be cached, instead of caching them all together, make a separate, simple cache entry for each.
2) Choose the right data structure. Use Redis sets, hashes, sorted sets, and bit operations wherever possible.
3) Make sure your system works even if Redis is not available (to survive downtime). That is, check Redis first and serve the value if it is there; if not, get it from the DB and populate the cache. That way, even if Redis is unavailable, you still get values from the DB.
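Point 3 can be sketched as a read helper that treats the cache as optional. `CacheUnavailable` and `get_with_fallback` are hypothetical; with redis-py, the exception to catch would be `redis.exceptions.ConnectionError`.

```python
class CacheUnavailable(Exception):
    """Hypothetical error a cache client raises when the server is unreachable."""


def get_with_fallback(key, cache_get, cache_set, load_from_db):
    """Serve from the cache when possible, but never fail just because it is down."""
    try:
        value = cache_get(key)
    except CacheUnavailable:
        return load_from_db(key)      # cache down: go straight to the DB
    if value is not None:
        return value                  # cache hit
    value = load_from_db(key)         # cache miss: load and populate
    try:
        cache_set(key, value)
    except CacheUnavailable:
        pass                          # populating the cache is best effort only
    return value


def down(*args):
    raise CacheUnavailable("connection refused")


print(get_with_fallback("user:5", down, down, lambda k: {"id": 5}))  # {'id': 5}
```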
To answer your question, you can do it in three ways:
1) Maintain the cache alongside your DB: when a transaction in the DB succeeds, update the cache, so that you don't lose any information. Implementing this is a bit difficult, though.
2) Whenever a transaction begins, drop the cache entries that belong to it, so their values are removed from the cache and fetched from the DB on the next read request.
3) Maintain a last-accessed or created time in both the cache and the DB. On every read, compare them and decide. This is the most reliable solution.
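Option 3 could be sketched as below, assuming the stored timestamp (or a version counter) is bumped on every write. `read_with_version_check` and its two loader callables are hypothetical. Note that each read still costs a DB round trip for the timestamp lookup, so this buys freshness at the price of some of the cache's savings.

```python
def read_with_version_check(key, cache, db_last_modified, load_from_db):
    """Serve the cached copy only if it is at least as new as the DB row.

    cache maps key -> (value, stamp); db_last_modified(key) returns the row's
    current stamp; load_from_db(key) returns (value, stamp) for the full row.
    """
    current = db_last_modified(key)
    entry = cache.get(key)
    if entry is not None and entry[1] >= current:
        return entry[0]                   # cached copy is up to date
    value, stamp = load_from_db(key)      # stale or missing: reload
    cache[key] = (value, stamp)
    return value


rows = {"order:9": ("shipped", 2)}        # DB stand-in: (value, stamp)
cache = {"order:9": ("pending", 1)}       # the cached copy is older
value = read_with_version_check(
    "order:9", cache,
    db_last_modified=lambda k: rows[k][1],
    load_from_db=lambda k: rows[k],
)
print(value)  # shipped
```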

Is it normal to have a lot of records in Memcached with Laravel?

I have an instance of Laravel up and running with a load balancer in place. We've set up memcached (two server nodes) to handle session management. So far the site is running fine in our test environment. The site largely ties into a web-based API, so we only store a few values (other than user authentication data) in a user's session to work with the site.
After a short amount of usage by one or two users, there are about 3000 items in the cache. I don't have full access to the nodes, so I don't know exactly what the items are. However we don't appear to be maxing out the nodes with memory and the application functionality is good.
Is this to be expected? I understand that the cache management will clear out old records over time as they expire, so these could just be "remnant" data records, but this is my first time working with memcached so I want to verify that this is normal behavior.
It's quite normal for any caching solution to rack up a number of items. Especially for lots of small objects it's often more efficient for a cache to keep them beyond their expiry (but no longer serve them) and then clear them out in a big sweep periodically.
"Remnant records" pretty much describes it.
As long as your application performs as expected, I wouldn't worry. You should worry when you get a lot of cache misses for objects that were supposed to be in cache but kicked out before expiry due to lack of memory to store them all.
Yes.
It is normal to have lots of records in Memcache, but you need proper session management:
Store only a small number of values per session (data required by most of the APIs, like the user access token).
Cache expiration
The biggest challenge when using Memcache is avoiding cache staleness while still writing clean code. Most developers store data to Memcache and delete or update data when it changes. This strategy can get messy very quickly – Memcache code becomes riddled throughout an application. Rails’ Sweepers can help with this problem, but other languages and frameworks don’t have similar alternatives.
One simple strategy to avoid code complexity is to write data to Memcache with an expiration. Data with an expiration will automatically expire when the expiration is reached. Most applications can benefit from time-based cache expiration with infrequently changing content such as static assets, headers, footers, blog posts, etc.
List management
A simple list stored in Memcache can be useful for maintaining denormalized relationships.
For example, an e-commerce website may want to store a small table of recent purchases. Rather than keeping a serialized list in Memcache and recalculating it when a new purchase is made, append and prepend can be used to store denormalized data, avoiding a database query.
Note: Memcache only supports a maximum value size of 1 MB. Be careful when creating lists that may grow larger than the maximum allowed value size.
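memcached's `append` command adds bytes to the end of an existing value on the server. The toy below simulates that with a dict of bytes so it runs without a server; `append_purchase` and `recent_purchases` are hypothetical helpers, and the size check mirrors the 1 MB limit from the note.

```python
MAX_VALUE_SIZE = 1024 * 1024  # the 1 MB value limit from the note above

store = {}  # stand-in for memcached: key -> bytes


def append_purchase(key, purchase_id):
    """Append one comma-terminated ID, refusing to exceed the value size limit."""
    chunk = f"{purchase_id},".encode("ascii")
    current = store.get(key, b"")
    if len(current) + len(chunk) > MAX_VALUE_SIZE:
        return False              # caller should trim the list or start a new key
    # Real memcached: append only works on an existing key, so add an empty value first.
    store[key] = current + chunk
    return True


def recent_purchases(key):
    """Decode the denormalized list back into IDs."""
    raw = store.get(key, b"").decode("ascii")
    return [int(x) for x in raw.split(",") if x]


for pid in (101, 102, 103):
    append_purchase("recent_purchases", pid)
print(recent_purchases("recent_purchases"))  # [101, 102, 103]
```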
Also check these links:
https://cloud.google.com/appengine/docs/adminconsole/memcache
http://docs.oracle.com/cd/E17952_01/refman-5.6-en/ha-memcached-faq.html
http://symas.com/mdb/memcache/
