WATCH / UNWATCH Redis key outside a transaction - caching

I'm relatively new to Redis and trying to understand how I can use WATCH / UNWATCH to address a concurrency / race condition issue.
All of the documentation I've read describes WATCH in the context of a transaction, but can I WATCH and UNWATCH a key if no transaction occurs?
Here is the scenario:
I need to modify an object in the cache, but due to TTL I cannot guarantee that the object exists.
I don't want to fetch the object, check that it exists, and then call WATCH, because the TTL could expire between the fetch and the WATCH.
The solution would be to WATCH the key, fetch the object, check that it exists, and UNWATCH the key if the object doesn't exist.
Is this workflow possible, and is there a better way to achieve my goal?
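Yes, WATCH is legal outside of any MULTI/EXEC, and UNWATCH simply drops the watch. The workflow described above can be sketched as follows; to keep the sketch runnable without a server it uses a tiny in-memory stand-in rather than a real client. With redis-py the same shape uses `pipe.watch()`, `pipe.unwatch()`, `pipe.multi()`, and `pipe.execute()` (which raises `WatchError` when a watched key changed).

```python
class FakePipeline:
    """Minimal in-memory stand-in mimicking the WATCH/GET/UNWATCH/EXEC order."""
    def __init__(self, store):
        self.store = store
        self.watched = None
        self.snapshot = None

    def watch(self, key):
        # Remember the value at WATCH time so EXEC can detect changes.
        self.watched, self.snapshot = key, self.store.get(key)

    def get(self, key):
        return self.store.get(key)

    def unwatch(self):
        self.watched = None

    def execute_set(self, key, value):
        # EXEC succeeds only if the watched key is unchanged since WATCH.
        if self.store.get(self.watched) != self.snapshot:
            return False
        self.store[key] = value
        return True

def modify_if_present(pipe, key, update):
    pipe.watch(key)            # WATCH before reading, closing the TTL race
    value = pipe.get(key)
    if value is None:          # TTL expired, or the key never existed
        pipe.unwatch()         # perfectly legal: no MULTI/EXEC ever happens
        return False
    return pipe.execute_set(key, update(value))
```

`update` here is a hypothetical caller-supplied callback; the key point is only the command ordering (WATCH, then read, then either UNWATCH or transact).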


How to decline a request, if another one is already processed for the same user-id?

I am trying to implement some kind of sync-service.
Two clients with different user-agents may POST/PATCH to /sync/user/{user_id}/resource at the same time with the same user_id. sync should update data for user with id={user_id} in DB.
func (syncServer *SyncServer) Upload(w http.ResponseWriter, r *http.Request, ps httprouter.Params) {
    userID := ps.ByName("user_id")
    if isAlreadyProcessedForUser(userID) {
        w.WriteHeader(http.StatusConflict)
        return
    }
    ...
    syncServer.db.Update(userID, data)
    ...
}
The problem is that I have no idea how to correctly decline one Upload while another one is still processing a request for the same user_id. I think using mutex.Lock() is a bad idea, because this handler will run on many pods, and if Upload is called on different pods the mutex won't help. What synchronization method can I use to solve this? Should I use an additional field in the DB? Any ideas are welcome!
There are many ways to do this (distributed locking) in a distributed system; here are some I can come up with:
Use a Redis lock (or any similar service). You can lock each user_id on receiving the first request and reject other requests for the same user_id because they will fail to acquire the lock. Redis locks generally have an expiration time, so you won't deadlock. Ref: https://redis.io/docs/reference/patterns/distributed-locks/
Use a database lock. You should be careful with database locks, but a simple approach is a unique index: create an uploading record with a unique(user_id) constraint before the upload, and delete it afterwards. It's possible to forget or fail to delete the record and cause a deadlock, so you might want to add an expired_at field to the record and check & drop stale records before uploading.
(Specific to the question's scenario) Use a unique constraint on (user_id, upload_status) that only applies when upload_status = 'uploading'. This is called a partial index. Then you can create an uploading record on each request and reject the competing requests. Expiration is also needed here, so track the start_time of each upload and clean up long-running uploading records. If you don't need to reclaim the disk space, you can simply mark the record as failed; this also lets you track when and how uploads failed in the database.
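The first option above boils down to Redis's SET with the NX and EX flags: the SET either creates the lock key (lock acquired) or fails because another upload holds it, and the TTL guarantees a crashed pod cannot hold the lock forever. A minimal sketch, using an in-memory stand-in so it runs without a server (with redis-py the acquire call would be `r.set(key, token, nx=True, ex=30)`; the key name and TTL here are illustrative):

```python
import time

class FakeLockStore:
    """In-memory stand-in for Redis SET ... NX EX and DEL."""
    def __init__(self):
        self.data = {}  # key -> (value, expires_at)

    def set_nx_ex(self, key, value, ttl):
        entry = self.data.get(key)
        if entry and entry[1] > time.time():
            return False  # lock still held by someone else
        self.data[key] = (value, time.time() + ttl)
        return True

    def delete(self, key):
        self.data.pop(key, None)

def try_upload(store, user_id, do_update):
    lock_key = "lock:upload:%s" % user_id
    if not store.set_nx_ex(lock_key, "token", 30):
        return 409  # another pod is processing this user: decline
    try:
        do_update()
        return 200
    finally:
        store.delete(lock_key)  # release; the TTL covers crashed pods
```

A production version would also store a random token and only delete the key if the token still matches, so a pod that overran the TTL cannot release someone else's lock.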
CAUTION:
It seems that you're using Kubernetes, so any non-distributed lock should be used cautiously, depending on the level of consistency you want. Pods are volatile: they may be duplicated, killed, or re-scheduled to another machine, so it's hard to rely on local state and still achieve consistency. This also applies to any other platform with auto-scaling or scheduling mechanisms.
A syncing process between a server and several clients owned by one user needs to handle at least request ordering, request deduplication, and eventual consistency (e.g. Google Docs supports many people editing at the same time). There are generic algorithms for this (like operational transformation), but the right choice depends on your specific use case.

Ensure consistency when caching data in after_commit hooks

For a specific database table, we need an in-memory cache of this data that is always in sync with the database. My current attempt is to write the changes to the cache in an after_commit hook; this way we make sure not to write any changes to the cache that could get reverted later.
However, this strategy is vulnerable to the following scenario:
Thread A locks and updates record, stores value 1
Thread A commits the change
Thread B locks and updates record, stores value 2
Thread B commits the change
Thread B runs the after_commit hook, so the cache now has value 2
Thread A runs the after_commit hook, so the cache now has value 1 but should have value 2
Am I right about this problem and how would one solve this?
You are right about this problem.
There is an after_save callback that runs within the same transaction. You might want to use that instead of the after_commit hook, which runs after the transaction.
But then you will need to deal with rolled-back transactions yourself.
Or you might write your caching method so that it does not depend on a specific instance, but instead caches the latest version found in the database by reloading the record from the database first.
But even then: multithreaded systems are hard to keep in sync, and you cannot even be sure whether the first or the second update sent to your cache is the one that gets stored, because the caching system might be multithreaded too.
You might want to read about different consistency models.
The solution we came up with is to lock the cache for read / write before_commit and unlock it in the after_commit. This seems to do the trick.
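Another way to make the interleaving above harmless, regardless of which thread's hook runs first, is to stamp each cache write with a monotonically increasing version (e.g. a counter bumped on every update) and have the cache reject stale writes. A minimal sketch; the class and method names are illustrative, not from the original post:

```python
import threading

class VersionedCache:
    """Cache that drops writes carrying an older version than what it holds."""
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # key -> (version, value)

    def put(self, key, version, value):
        with self._lock:
            current = self._data.get(key)
            if current and current[0] >= version:
                return False  # a late-running hook with stale data: ignore it
            self._data[key] = (version, value)
            return True

    def get(self, key):
        entry = self._data.get(key)
        return entry[1] if entry else None
```

With this, the scenario from the question resolves correctly: even if Thread A's after_commit hook runs after Thread B's, A's write carries the older version and is rejected.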

Redis cache updating

EDIT2: Clarification: the code ALREADY has refresh-cache-on-miss logic. What I'm trying to do is reduce the number of cache misses.
I'm using Redis as a cache for an API. The idea is that when the API receives a call it first checks the cache and if the data isn't in cache the API will fetch it and cache it afterwards for next time.
At the moment the configuration is the following:
maxmemory 50mb
maxmemory-policy allkeys-lru
That is: use at most 50 MB of memory, keep storing keys, and when memory is full start deleting the least recently used keys (LRU).
Now I want to introduce a second category of keys. For this second category I'm going to set a certain expiry time, and I would like to set up a mechanism such that when these keys expire, it kicks in and refreshes them (and sets a new expiry).
How do I do this?
EDIT:
Some progress: it turns out that Redis has a pub/sub messaging system which can dispatch messages on events. One of these events is keys expiring, which can be enabled like so:
notify-keyspace-events Ex
I found code describing a blocking Python process that subscribes to Redis' messaging system. It can easily be changed to detect keys expiring and call the API when a key expires, so that the API refreshes the key.
def work(self, item):
    requests.get('http://apiurl/?q={param}'.format(param=item['data']))
So this does precisely what I was asking about.
Somehow, though, this feels way too dangerous and out of control. I can imagine a bunch of different situations in which this would very quickly fail.
So, what's a better solution?
http://redis.io/topics/notifications
Keyspace notifications allow clients to subscribe to Pub/Sub channels
in order to receive events affecting the Redis data set in some way.
Examples of the events that it is possible to receive are the following:
All the keys expiring in database 0.
...
EXPIRE generates an expire event when an expire is set to the key, or
an expired event every time setting an expire results in the key
being deleted (see the EXPIRE documentation for more info).
To expire keys, just use Redis' built-in expiry mechanism. You don't need to refresh the cache contents on expiry; the simplest approach is to do it when the code experiences a cache miss.
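The refresh-on-miss pattern recommended above is small enough to sketch in a few lines. Here a plain dict with per-entry deadlines stands in for Redis so the sketch runs anywhere, and `fetch_fresh` is a hypothetical loader; with redis-py you would use `r.get(key)` and `r.set(key, value, ex=ttl)` instead:

```python
import time

def get_with_refresh(cache, key, fetch_fresh, ttl=60):
    """Return the cached value; on a miss or expired entry, refetch and re-cache."""
    entry = cache.get(key)
    if entry and entry[1] > time.time():
        return entry[0]                      # cache hit, still fresh
    value = fetch_fresh(key)                 # miss (or expired): refetch
    cache[key] = (value, time.time() + ttl)  # re-cache with a new expiry
    return value
```

This avoids the out-of-control feeling of the pub/sub approach: nothing proactively hammers the API; a key is only refreshed when someone actually asks for it.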

Redis as cache - reset expiry

I am using Redis as a cache and would like to expire data in Redis that is not actively used. Currently, setting an expiry on an object deletes it after the expiry time has elapsed. However, I would like to retain the object in Redis if it is read at least once before it expires.
One way I can see is to store a separate expiry_key for every object and set the expiry on the expiry_key instead of the original object. Subscribe to del notifications on the expiry_key, and when a del notification is received, check whether the object was read at least once (via a separately maintained access log) during the expiry interval. If the object was not read, execute a del command on the original object. If it was read, recreate the expiry_key with the expiry interval.
This implementation requires additional systems to manage expiry; I would prefer to handle it locally with Redis.
Are there better solutions to solve this?
Resetting the expiry of the object on every read would increase the number of writes to Redis, so this is not an option.
Note the redis cache refresh is managed asynchronously via a change notification system.
You could just set the expiry on the key again after each read (setting a TTL on a key is O(1)).
It may make sense for your system to do this in a transaction:
MULTI
GET mykey
EXPIRE mykey 10
EXEC
You could also pipeline the commands.
This pattern is also described in the official documentation.
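The MULTI/GET/EXPIRE sequence above maps to a transactional pipeline in redis-py (`r.pipeline(transaction=True)` wraps the queued commands in MULTI/EXEC). To keep this sketch runnable without a server, the stand-in below only records the queued commands; a real pipeline would return one result per command:

```python
class FakePipe:
    """Stand-in pipeline that records queued commands instead of sending them."""
    def __init__(self):
        self.commands = []

    def get(self, key):
        self.commands.append(("GET", key))
        return self

    def expire(self, key, ttl):
        self.commands.append(("EXPIRE", key, ttl))
        return self

    def execute(self):
        # A real client sends MULTI ... EXEC here, in one round trip.
        return self.commands

def read_and_touch(pipe, key, ttl=10):
    # Queue GET and EXPIRE together so the read and the TTL reset
    # happen atomically and cost only one round trip.
    return pipe.get(key).expire(key, ttl).execute()
```

On Redis 6.2 and newer there is also the single command GETEX key EX seconds, which reads the value and resets the TTL in one step, making the transaction unnecessary.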
Refer to section "Configuring Redis as a cache" in http://redis.io/topics/config
We can set maxmemory-policy to allkeys-lru to clear inactive content from Redis. This would work for the use case I have stated.
Another way is to define a notification on the key and then reset its expiration;
see here

How to I set up a lock that will automatically time out if it does not get a keep alive signal?

I have a certain resource I want to limit access to. Basically, I am using a session-level lock. However, it is getting to be a pain writing JavaScript that covers every possible way a window can close.
Once the user leaves the page I would like to unlock the resource.
My basic idea is to use some sort of server-side timeout to unlock the resource. Basically, if I fail to unlock the resource, I want a timer to kick in and unlock it.
For example, after 30 seconds with no update from the client side, unlock the resource.
My basic question is: what sort of server-side trick can I use to do this? It is my understanding that I can't just create a thread in JSF, because it would be unmanaged.
I am sure other people do this kind of thing; what is the correct thing to use?
Thanks,
Grae
As BalusC rightfully asked, the big question is: at what level of granularity would you like to do this locking? Per logged-in user, for all users, or could you perhaps get away with locking per request?
Or, and this will be a tougher one, is the idea that a single page request grabs the lock and then that specific page is intended to keep the lock between requests? E.g. as a kind of reservation. I'm browsing a hotel page, and when I merely look at a room I have made an implicit reservation in the system for that room so it can't happen that somebody else reserves the room for real while I'm looking at it?
In the latter case, maybe the following scheme would work:
In application scope, define a global concurrent map.
Keys of the map represent the resources you want to protect.
Values of the map are a custom structure which hold a read write lock (e.g. ReentrantReadWriteLock), a token and a timestamp.
In application scope, there also is a single global lock (e.g. ReentrantLock)
Code in a request first grabs the global lock, and quickly checks if the entry in the map is there.
If the entry is there it is taken, otherwise it's created. Creation time should be very short. The global lock is quickly released.
If the entry was new, it's locked via its write lock and a new token and timestamp are created.
If the entry was not new, it's locked via its read lock.
If the code has the same token, it can go ahead and access the protected resource; otherwise it checks the timestamp.
If the timestamp has expired, it tries to grab the write lock.
The write lock has a time-out. When the time-out occurs give up and communicate something to the client. Otherwise a new token and timestamp are created.
This is just the general idea. In a Java EE application that I built, I used something similar (though not exactly the same) and it worked quite well.
Alternatively, you could use a Quartz job that periodically removes the stale entries. Yet another alternative is replacing the global concurrent map with e.g. a JBoss Cache or Infinispan instance. These allow you to define an eviction policy for their entries, which saves you from having to code this yourself. If you have never used those caches, though, learning how to set them up and configure them correctly can be more trouble than just building a simple Quartz job yourself.
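The core of the scheme above, condensed: an application-scoped map of resource id to (token, deadline), guarded by one global lock, where holding a valid token renews the lease (the keep-alive from the question title) and an expired deadline lets someone else take over. This sketch is in Python for brevity; the class and timeout values are illustrative, and the original suggests per-entry read/write locks on top of this:

```python
import threading
import time

class ResourceLocks:
    """Leased locks: a holder keeps the lock alive by re-acquiring with its token."""
    def __init__(self, timeout=30.0):
        self._global = threading.Lock()
        self._entries = {}  # resource -> (token, deadline)
        self._timeout = timeout

    def acquire(self, resource, token):
        """Return True if `token` may use the resource, renewing its lease."""
        with self._global:
            entry = self._entries.get(resource)
            now = time.time()
            if entry is None or entry[0] == token or entry[1] < now:
                # Free, re-entrant, or stale: (re)take it with a fresh deadline.
                self._entries[resource] = (token, now + self._timeout)
                return True
            return False  # held by someone else and not yet expired

    def release(self, resource, token):
        with self._global:
            if self._entries.get(resource, (None,))[0] == token:
                del self._entries[resource]
```

Each client request that carries the session's token acts as the keep-alive; if the browser window closes without unlocking, the deadline simply lapses and the next acquirer takes over.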
