I am developing a microservice. This MS will be deployed to docker containers and will be monitored by Kubernetes. I have to implement a caching solution using hazelcast distributed cache. My requirements are:
Preload the cache on startup of this microservice. For around 3000 stores I have to fetch two specific attributes and cache them.
Every 24 hours refresh the cache.
I implemented Spring #EventListener and on startup to make a database call for the 2 attributes and do a #CachePut and store them in Cache.
I also have a Spring scheduler with cron expression to refresh cache at every 6 AM in morning.
So far so good.
But what I did not realize that in clustered environment - 10-15 instances of my microservice will be in action and will try to do above 2 steps almost simultaneously - thus creating a stampede effect on my database and cache. Does anyone know what to do in this scenario? Is there any good design or even average one which I can follow?
Thanks.
You should be looking to use Hazelcast provided Loading and Storing Persistent Data mechanism that allows 2 options for writing: Write-through and write-behind and read-through for loading data into the cache.
Look for MapLoader and its methods, that will let you warm-up/preload your cluster and you have the freedom to do that with your own implementation.
Check for more details: https://docs.hazelcast.org/docs/3.11/manual/html-single/index.html#loading-and-storing-persistent-data
Related
I have implemented Caching in my Spring Boot REST Application. My policy includes a time based cache eviction strategy, and an update-based cache eviction strategy. I am worried that since I employ a stateless server, if there is a method called to update certain data, and this was handled by server instance A, then the corresponding caches in server instance B, C and D, are not updated as well.
Is this an issue I would face / is there a way to overcome this issue?
This is the oldest problem in software development - cache invalidation when you have multiple servers
One way to handle it is to move your cache out of the individual servers and move them to somewhere shared like another instance that holds the cache entries that every other app refers to or something like redis [centralized cache]
Second way is to do a broadcast message so that each server now knows to invalidate the entry once the data has been modified or deleted - here you run the risk of the message not being processed and thus a stale entry is left in some server[s]
Another option is to have some sort of write ahead log [like kafka or redis streams] which is processed by each server and thus they will all process the events deterministically and have the same cache state
Lmk if you need more help - we can setup some time outside of SO
Background
I am working on a spring boot application. In this application, we have two different caffeine caches - dealDetailsCache and trackingDetailsCache.
Both of these caffeine caches have single keys, let us say, X and Y respectively. We just keep updating the values for the same keys while refreshing.
Application Flow
In every 5 minutes, there is a scheduled job that runs.
This job fetches some data from an external source and upon successful retrieval, updates the 2 caffeine caches (mentioned above)
Currently, I am manually doing a put operation on each of the above caffeine caches to refresh the data for their respective keys:
put(X, <<data>>) for updating dealDetailsCache
put(Y, <<data>>) for updating trackingDetails
Expected QPS is about 50 per second.
What am I looking for
I am looking for a way to refresh the caffeine caches (just like buildAsync)
such that it does not impact the application and there should be no
downtime.
If put is not the right way to do this, then would someone please
suggest the right way to update the cache in such a way that there is
absolutely no downtime.
I read about CacheEvict, but there is a risk associated with it. It evicts and then refreshes. There could be some time between the two operations and any requests that come in during this period (after eviction and before new data is loaded) would fail.
Ultimate aim is that the requests should always find the data, even if it is old for the time being. Would someone please suggest a clean mechanism for manual cache refreshes?
If i add spring-session jdbc to my vaadin-spring-boot-application the application is very slow and does a full page reload after a few seconds. Everything else looks like it is working normally.
I do not notice the problem and I have been researching on this issue for a few days and got this Github issue and Vaadin microservices configuration But in these, I did not find a suitable solution to solve this problem, Any one can give me an true example to implemention Spring sessions on Vaadin?
Regards.
Session replication schemes like spring-session assumes that the session is relatively small and that the content isn't sensitive to concurrent modification from multiple request threads. Neither of those assumptions hold true for a typical Vaadin application.
The first problem is that there's typically between 100KB and 10MB of data in the session that needs to be fetched from the database, deserialized, updated and then again serialized and stored in the database for each request. The second problem is that Vaadin stores a lock instance in the session and uses that to ensure there aren't multiple request threads using the same session concurrently.
To serialize a session to persistent storage, you thus need to ensure your load balancer uses sticky sessions and typically also use a high performance solution such as Hazelcast rather than just deserializing and serializing individually for each request.
For more details, you can have a look at these two posts:
https://vaadin.com/learn/tutorials/hazelcast
https://vaadin.com/blog/session-replication-in-the-world-of-vaadin
I have been studying about Redis (no experience at all - just studied theory), and after doing some research, found out that its also being used as cache. e.g. StackOverfolow it self.
My question is, if I have an asp.net WebApi service, and I use output caching at the WebApi level to cache responses, I am basically storing kind of key/value (request/response) in server's memory to deliver cached responses.
Now as redis is an in memory database, how will it help me to substitute WebApi's output caching with redis cache?
Is there any advantage?
I tried to go through this answer redis-cache-vs-using-memory-directyly, but I guess I didn't got the key line in the answer:
"Basically, if you need your application to scale on several nodes sharing the same data, then something like Redis (or any other remote key/value store) will be required."
I am basically storing kind of key/value (request/response) in server's memory to deliver cached responses.
This means that after a server restart, the server will have to rebuild the cache . That won't be the case with Redis. So one advantage of Redis over a homemade in-memory solution is persistence (only if that's an issue for you and that you did not planned to write persistence yourself).
Then instead of coding your own expiring mechanism, you can use Redis EXPIRE or command EXPIREAT or even simply specifying the expire timestamp when putting the api output string into cache with SETEX.
if you need your application to scale on several nodes sharing the same data
What it means is that if you have multiple instances of the same api servers, putting the cache into redis will allow these servers to share the same cache, thus reducing, for instance, memory consumption (1 cache instead of 3 in-memory cache), and so on...
I am working on a webapp project and we are considering deploying it on multiple servers.
What solution do you advise for clustering/load-balancing with Spring?
What are the issues to take into account?
For example: How do singletons behave in a cluster of machines? What about session replication? Are there any other issues to take into account?
Here is the list of possible issues (not necessarily Spring-related):
stateful beans - if your beans have state, like collections accumulating something or counters, you need to think whether this state should be replicated or not. E.g. should this counter be local to one JVM or global in the whole cluster? In the latter case consider terracotta and hazelcast
filesystem - as long as all instances use the same database, everything is fine. But if one node writes to disk, other instance can't read it. Solutions? Either use database for all storage or distributed file system
HTTP sessions - either use sticky session or replicate sessions. If you go for replication, keep sessions as small as possible.
asynchronous jobs - if you have a job running every hour, should it run on every machine, or just on a dedicated one (or maybe on random)?