How to use Redis for multiple microservices? - caching

I am very new to Redis and have been investigating it for the past few days. I have read the documentation on cache management (LRU cache), commands, etc. I want to know how to implement caching for the data of multiple microservices.
I have a few questions:
Can all microservices' (cached) data be kept in a single Redis server instance?
Should every microservice have its own cache database in Redis?
How can cache data be refreshed without setting EXPIRE, since setting it would consume more memory?
Some more information on best practices for Redis with microservices would be helpful.

It's possible to use the same Redis instance for multiple microservices; just make sure to prefix your cache keys to avoid conflicts between the microservices.
You can use multiple databases in the same Redis instance (i.e. one for each microservice), but it's discouraged because Redis is single-threaded, so all of those databases still compete for the same thread.
The best way is to use one Redis instance per microservice; then you can easily flush one of them without touching the others.
From my personal experience with a Redis cache in production (with 2 million keys), there is no problem using EXPIRE. I encourage you to use it.
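The key-prefix and EXPIRE advice can be sketched as follows. `ServiceCache` and `InMemoryStore` are hypothetical names: `InMemoryStore` only mimics the `get` and `set(..., ex=...)` calls of a redis-py client so the sketch runs without a server, and a real `redis.Redis` client could be passed in its place.

```python
import time


class InMemoryStore:
    """Stand-in for a redis-py client; models only GET and SET with `ex` (seconds)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def set(self, name, value, ex=None):
        expires_at = None if ex is None else self._clock() + ex
        self._data[name] = (value, expires_at)
        return True

    def get(self, name):
        entry = self._data.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[name]  # drop the expired key when it is next read
            return None
        return value


class ServiceCache:
    """Prefixes every key with the owning service's name, e.g. "orders:42"."""

    def __init__(self, store, service, default_ttl=300):
        self._store = store
        self._prefix = service + ":"
        self._default_ttl = default_ttl

    def set(self, key, value, ttl=None):
        seconds = self._default_ttl if ttl is None else ttl
        return self._store.set(self._prefix + key, value, ex=seconds)

    def get(self, key):
        return self._store.get(self._prefix + key)
```

Two services can then share one store (`ServiceCache(shared, "orders")` and `ServiceCache(shared, "users")`) and both use the key `"42"` without colliding, while every entry still carries a TTL.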

Please find the answers to all your questions below:
Can all microservices' (cached) data be kept in a single Redis server instance? Ans - Yes, you can keep all the data in a single Redis instance; all you need to do is store each piece of data under a different key name, since Redis is basically a key-value database.
Should every microservice have its own cache database in Redis? Ans - Not required. Just use different keys for each microservice. Also note that you can use a colon (:) in key names to create a folder-like hierarchy, which makes it easy to identify different microservices in Redis Desktop Manager. (Redis itself has no folders; the colon is only a naming convention that such tools display as folders.)
Example - In the key name X:Y:Z, Z is placed in folder Y and Y is in folder X, so you get a folder-like structure. That is helpful for telling the microservices apart.
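A minimal sketch of the colon naming convention. `make_key` and `keys_matching` are hypothetical helpers; against a real server you would use `SCAN` with a `MATCH` pattern (redis-py exposes this as `scan_iter(match=...)`), and here Python's `fnmatchcase` approximates Redis's glob-style matching over a local list of keys.

```python
from fnmatch import fnmatchcase


def make_key(*parts):
    """Join key segments with ':', the conventional Redis namespace separator."""
    return ":".join(str(part) for part in parts)


def keys_matching(keys, pattern):
    """Local approximation of SCAN ... MATCH <pattern> over already-known keys."""
    return sorted(key for key in keys if fnmatchcase(key, pattern))
```

For example, `make_key("orders", "customer", 7)` gives `"orders:customer:7"`, and the pattern `"orders:*"` then selects everything that belongs to the orders microservice.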
How can cache data be refreshed without setting EXPIRE, since it would consume more memory? Ans - You can set the data again on the same key whenever the microservice response changes. The key's value is overwritten in that case.

Can all microservices' (cached) data be kept in a single Redis server instance?
In a microservice architecture, an "elastic scale SaaS" approach is preferable. You can think of your cache service as a microservice in itself (one that responds on demand), and then you have multiple options. The recommended practice for data storage is sharding: https://azure.microsoft.com/en-us/documentation/articles/best-practices-caching/#partitioning-a-redis-cache . See the diagram below, from the book Microservices, IoT and Azure.
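As one concrete form of sharding, Redis Cluster assigns every key to one of 16384 hash slots using a CRC16 of the key, and a `{...}` hash tag lets related keys share a slot. The sketch below reproduces that slot calculation, assuming that Python's `binascii.crc_hqx` with an initial value of 0 matches the CRC16 variant in the Redis Cluster specification (the check against the spec's test vector is included in the comments); `hash_slot` is a hypothetical name.

```python
import binascii

CLUSTER_SLOTS = 16384  # Redis Cluster splits the key space into 16384 hash slots


def hash_slot(key: str) -> int:
    """Hash slot for a key: CRC16 (XMODEM) of the key modulo 16384.

    If the key has a non-empty "{...}" hash tag, only the tag is hashed, so keys
    such as "{user1000}.following" and "{user1000}.followers" share a slot.
    The spec's test vector: CRC16 of "123456789" is 0x31C3.
    """
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end > start + 1:  # something between the first '{' and the next '}'
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % CLUSTER_SLOTS
```

Each node in the cluster owns a range of slots, so this function effectively decides which shard stores a key.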
Should every microservice have its own cache database in Redis? It's still possible to think in terms of "vertical partitioning", but you should consider "horizontal partitioning", so again, consider sharding. Additionally, it's not a bad idea to have a "local cache", especially to avoid DoS.
"Be careful not to introduce critical dependencies on the availability of a shared cache service into your solutions. An application should be able to continue functioning if the service that provides the shared cache is unavailable. The application should not hang or fail while waiting for the cache service to resume."
How can cache data be refreshed without setting EXPIRE, since it would consume more memory?
You can define your own synchronization policies; I think a cache is best suited to data that changes infrequently.
"It might also be appropriate to have a background process that periodically updates reference data in the cache to ensure it is up to date, or that refreshes the cache when reference data changes."
For cache best practices, see Caching Best Practices.
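The background process described in the quote above could be sketched as a daemon thread that periodically overwrites the cached reference data, so the keys need no EXPIRE. This is a Python sketch under assumptions: `ReferenceDataRefresher`, `cache_set` and `load_all` are hypothetical names; `cache_set` could wrap a Redis `SET`, and `load_all` could be a database query.

```python
import logging
import threading

log = logging.getLogger(__name__)


class ReferenceDataRefresher:
    """Periodically overwrites cached reference data, so keys need no EXPIRE."""

    def __init__(self, cache_set, load_all, interval_seconds=60.0):
        self._cache_set = cache_set  # e.g. a wrapper around a Redis SET (assumption)
        self._load_all = load_all    # returns {key: value} from the source of truth
        self._interval = interval_seconds
        self._stop = threading.Event()
        self._thread = None

    def refresh_once(self):
        data = self._load_all()
        for key, value in data.items():
            self._cache_set(key, value)  # overwrite in place with fresh values
        return len(data)

    def _run(self):
        # Event.wait returns True as soon as stop() is called, which ends the loop.
        while not self._stop.wait(self._interval):
            try:
                self.refresh_once()
            except Exception:
                log.exception("refresh failed; keeping previously cached values")

    def start(self):
        self.refresh_once()  # warm the cache synchronously before serving traffic
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
```

Note that this sketch only overwrites keys; entries that disappear from the source stay in the cache until they are deleted explicitly.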

Related

Clarification on database caching

Correct me if I'm wrong, but from my understanding, "database caches" are usually implemented with an in-memory database that is local to the web server (same machine as the web server). Also, these "database caches" store the actual results of queries. I have also read up on the multiple caching strategies like - Cache Aside, Read Through, Write Through, Write Behind, Write Around.
For some context, I have been looking at the common diagrams of the Write Through strategy and the Cache Aside strategy.
I believe that the "Application" refers to a backend server with a REST API.
My first question is, in the Write Through strategy (application writes to cache, cache then writes to database), how does this work? From my understanding, the most commonly used database caches are Redis or Memcached - which are just key-value stores. Suppose you have a relational database as the main database, how are these key-value stores going to write back to the relational database? Do these strategies only apply if your main database is also a key-value store?
In a Write Through (or Read Through) strategy, the cache sits in between the application and the database. How does that even work? How do you get the cache to talk to the database server? From my understanding, the web server (the application) is always the one facilitating the communication between the cache and the main database - which is basically a Cache Aside strategy. Unless Redis has some kind of functionality that allows it to talk to another database, I don't quite understand how this works.
Isn't it possible to mix and match caching strategies? From how I see it, Cache Aside and Read Through are caching strategies for application reads (user wants to read data), while Write Through and Write Behind are caching strategies for application writes (user wants to write data). Couldn't you have a strategy that uses both Cache Aside and Write Through? Why do most articles always seem to portray them as independent strategies?
What happens if you have a cluster of web servers? Do they each have their own local in-memory database that acts as a cache?
Could you implement a cache using a normal (not in-memory) database? I suppose this would still be somewhat useful since you do not need to make an additional network hop to the database server (since the cache lives on the same machine as the web server)?
Introduction & clarification
I think there is one point you have misunderstood: the cache is NOT necessarily stored on the same server as the web server. Sometimes not even the database is separated onto its own server; it may run on the same machine as the web server. If you think of APIs, like HTTP REST APIs, you can use caching to avoid spending too many resources on database connections and queries. Generally, you want to use as few database connections and queries as possible. Now imagine the following setting:
You have a web server that serves your application and a REST API, which the web server uses to work with some resources. Those resources come from a database (let's say a relational database) that also lives on the same server. Now there is one endpoint that serves, e.g., a list of posts (like blog posts). To keep the example simple, every user can fetch all posts. This API request is a good candidate for caching, so that users don't all hit the database just to query the same resources (via the REST API) over and over again. This is where caching comes in. Redis is one of many tools that can be used for caching. Since Redis is a simple in-memory key-value store, you can put all of your posts into the cache after the first database query. All future requests for the list of posts first check whether the posts are already cached. If they are, the API returns the cached content for that request.
This is one simple example of what caching can be used for.
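The posts example above is the cache-aside pattern. A minimal sketch, assuming an in-memory SQLite table named `posts` as the relational database and a plain dict standing in for Redis (with redis-py the calls would be `get` and `set`, ideally with an `ex` TTL); `make_demo_db`, `get_posts` and the `posts:all` key are hypothetical.

```python
import json
import sqlite3


def make_demo_db():
    """Hypothetical relational database with a couple of blog posts."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
    db.executemany("INSERT INTO posts (title) VALUES (?)", [("Hello",), ("Caching",)])
    db.commit()
    return db


def get_posts(db, cache, cache_key="posts:all"):
    """Cache-aside read: check the cache first and query the database only on a miss."""
    cached = cache.get(cache_key)
    if cached is not None:
        return json.loads(cached)  # cache hit: no database query at all
    rows = db.execute("SELECT id, title FROM posts ORDER BY id").fetchall()
    posts = [{"id": post_id, "title": title} for post_id, title in rows]
    cache[cache_key] = json.dumps(posts)  # Redis values are strings/bytes, so serialize
    return posts
```

When a post is created or edited, deleting the `posts:all` key is enough: the next request repopulates it from the database.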
Answers on your question
My first question is, why would you ever write to a cache?
To reduce the amount of database connections and queries.
how is writing to these key-value stores going to help with updating the relational database?
It does not help you with updating; instead, it helps you spend fewer resources. It also provides a kind of "temporary backup" for some data, but only as a very minor side effect. There are more attractive solutions for that, since Redis is primarily an in-memory store (although it does support persistence).
Do these cache writing strategies only apply if your main database is also a key-value store?
No, it does not matter which database you use, whether it's a NoSQL or an SQL database. It strongly depends on what you want to cache and how the database and its tables are set up. Do your resources change frequently? Do resources get updated manually or only by user-initiated actions? Questions like these lead you to the right caching implementation.
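To make the question of how a key-value cache can "write through" to a relational database concrete: the write-through logic usually lives in the application (or a caching library it uses), so Redis never has to talk to the database itself. A minimal sketch, assuming an SQLite `posts` table and a dict standing in for the cache; `open_posts_db` and `WriteThroughPostStore` are hypothetical names.

```python
import sqlite3


def open_posts_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS posts (id INTEGER PRIMARY KEY, title TEXT)")
    return db


class WriteThroughPostStore:
    """Write-through: each write updates the database and then the cache.

    The cache never talks to the database; this application-side layer performs
    both writes, which is how a key-value cache can front a relational database.
    """

    def __init__(self, db, cache):
        self._db = db
        self._cache = cache  # dict stand-in for a key-value cache such as Redis

    @staticmethod
    def _key(post_id):
        return f"post:{post_id}"

    def save(self, post_id, title):
        with self._db:  # commit to the source of truth first
            self._db.execute(
                "INSERT OR REPLACE INTO posts (id, title) VALUES (?, ?)", (post_id, title)
            )
        self._cache[self._key(post_id)] = title  # then keep the cache in sync

    def get(self, post_id):
        key = self._key(post_id)
        if key in self._cache:
            return self._cache[key]
        row = self._db.execute("SELECT title FROM posts WHERE id = ?", (post_id,)).fetchone()
        if row is None:
            return None
        self._cache[key] = row[0]  # on a miss (e.g. after a cache restart), reload
        return row[0]
```

The database type does not matter here; only the application layer needs to know how to write to both stores.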
Isn't it possible to mix and match caching strategies?
I am not an expert on caching strategies, but let me try:
I guess it is possible, but it depends heavily on what you are doing in your database and what kind of application you have. Once you know what kind of application you are building, you will know which strategy to use. I also guess it is not recommended to mix those strategies, because each strategy is tied to a particular application type; in other words, it will not work out very well.
What happens if you have a cluster of webs servers? Do they each have their own local in-memory database that acts as a cache?
I guess both are possible. Usually you have one database, maybe clustered or synchronized with replicas, to which your web servers (e.g. REST APIs) send their requests. Each of your API servers could then have its own cache, so it doesn't have to query the database at all (in cloud-based applications your database may also be on a separate server, which means another network "hop"). Alternatively (which I can also imagine), you could have another middleware layer between your APIs (clustered) and your database (maybe also clustered), but I guess no one would do that because of the extra network traffic. It would result in higher response times, which you usually want to avoid.
Could you implement a cache using a normal (not in-memory) database?
Yes, you could, but it would be much slower. A machine can access in-memory data faster than it can open another (local) connection to a database and query your cached entries. It is also slower because the database has to write the entries to files on disk to persist the data.
Conclusion
All in all, it is all about fast response times and avoiding unnecessary network traffic. I hope I could help you out a little bit.

what are the best approaches (practices) to create stateful microservices?

I need to create a food ordering service using microservices: scalable, clustered, with several steps to place an order. I need to store user data between steps / requests.
What is the best approach to keeping state and user data? Store it in a DB? A cache? Shared memory?
Are there any tutorials on best practices for this?
(I'm going to use Spring / Spring Boot and modules.)
Anything that you cannot afford to lose (usually the business data) should go in the DB and can also be cached in parallel in an in-memory store like Redis, which has built-in cache eviction policies.
Anything whose loss is not a big deal (usually technical data that is not directly linked to the business data) can go only in an in-memory store.
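Applied to the multi-step food order, that split might look like the sketch below. All names are hypothetical: `DraftStore` is an in-process stand-in for Redis keys with a TTL (in a cluster the draft store must be shared, since consecutive requests can reach different instances), and SQLite stands in for the business database.

```python
import json
import sqlite3
import time


def open_order_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, items TEXT)")
    return db


class DraftStore:
    """Stand-in for Redis keys with a TTL: holds in-progress orders between steps."""

    def __init__(self, ttl_seconds=1800, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._drafts = {}

    def put(self, session_id, draft):
        self._drafts[session_id] = (draft, self._clock() + self._ttl)

    def get(self, session_id):
        entry = self._drafts.get(session_id)
        if entry is None or self._clock() >= entry[1]:
            self._drafts.pop(session_id, None)  # expired drafts are simply gone
            return None
        return entry[0]

    def delete(self, session_id):
        self._drafts.pop(session_id, None)


def add_item(drafts, session_id, item):
    """One ordering step: losing this state only means the user starts over."""
    draft = drafts.get(session_id) or {"items": []}
    draft["items"].append(item)
    drafts.put(session_id, draft)  # re-putting refreshes the TTL on every step


def confirm_order(db, drafts, session_id):
    """Final step: the confirmed order is business data, so it goes to the database."""
    draft = drafts.get(session_id)
    if draft is None:
        raise LookupError("order draft expired or unknown")
    with db:  # commits on success, rolls back on error
        cur = db.execute("INSERT INTO orders (items) VALUES (?)", (json.dumps(draft["items"]),))
    drafts.delete(session_id)
    return cur.lastrowid
```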
Since you are using Spring, you could use Redis with Spring Data Redis. There are already known Spring solutions (such as this) for falling back to API calls that fetch data from the DB if the Redis server goes down. You can also run multiple Redis instances monitored by Redis Sentinel to provide failover. Redis Cluster provides a way to run a Redis installation where data is automatically sharded across multiple Redis nodes. You can also configure Redis to persist data to the file system, for example once a day, to back up the cache data for disaster recovery.
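The fallback mentioned above, reading from the database when Redis is down, is shown here as a language-neutral Python sketch rather than a Spring solution. `CacheUnavailable` and `get_with_fallback` are hypothetical; with redis-py you would catch its connection and timeout exceptions instead, and give the client a short `socket_timeout` so that a dead cache fails fast instead of hanging requests.

```python
import logging

log = logging.getLogger(__name__)


class CacheUnavailable(Exception):
    """Hypothetical error raised when the shared cache cannot be reached."""


def get_with_fallback(cache_get, cache_set, load_from_source, key):
    """Read through the cache, but keep working if the cache service is down."""
    try:
        value = cache_get(key)
    except CacheUnavailable:
        log.warning("cache unavailable; reading %r from the source", key)
        return load_from_source(key)
    if value is not None:
        return value
    value = load_from_source(key)
    try:
        cache_set(key, value)
    except CacheUnavailable:
        # Failing to populate the cache must not fail the request.
        log.warning("cache unavailable; could not store %r", key)
    return value
```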
If you are looking for a fully managed service, AWS provides "Step Functions" to satisfy your stateful requirements: https://stackoverflow.com/questions/tagged/aws-step-functions

Hazelcast data isolation ("Memory Regions")

We are building a multi tenant application which has restrictions on the regions/countries where the data is persisted.
The application is based on a Microsoft .NET microservice architecture, but we have shared domains, although we have separate DBs at very low levels, e.g. a separate DB for each city. We cannot persist one country's data in another country's data center. Hazelcast will be used as the distributed cache. I could not find any direct way to configure data isolation, e.g. something like "Memory Regions" in Apache Ignite. Do we have "Memory Regions" in Hazelcast?
I need to write behind the data from the cache to the respective database. Can I dedicate a part/partition of the cache to a specific database instance?
Any help would be greatly appreciated. Thanks in advance.
I am not answering your question directly. IMHO, when your data is stored across different clusters / nodes, there will still be a network call, even if you use key formats that keep the data within the same cluster / node.
Based on my experience, you could easily set up a MemoryCache, which comes as part of System.Runtime.Caching, to store the data on every node, and then use Redis Pub/Sub or Azure Service Bus as the backbone for the pub-sub messaging.
In that case,
any update to cached data triggers a notification to all other instances of the application via a Service Bus / Redis message, which typically contains just the key.
Upon receiving the key, each application instance removes that entry from its internal cache, and the data is cached again on the next DB access.
This method is commonly used in multi-tenant applications and is also fail-safe and lightweight. The payloads / network transfers are small, and each AppDomain uses its own memory as a cache, which does support different regions via different MemoryCache instances.
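The invalidation flow described above can be simulated in one process. This is a Python sketch, so none of these are real .NET APIs: `InProcessBus` is a hypothetical stand-in for Redis Pub/Sub or Azure Service Bus, and each `AppInstance` holds a dict in place of a per-node `MemoryCache`.

```python
from collections import defaultdict


class InProcessBus:
    """Stand-in for Redis Pub/Sub or Azure Service Bus: fans a message out to all subscribers."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        for callback in self._subscribers[channel]:
            callback(message)


class AppInstance:
    """One application node with its own local cache."""

    CHANNEL = "cache-invalidation"

    def __init__(self, bus, load_from_db):
        self._bus = bus
        self._load = load_from_db
        self.local = {}
        self.db_reads = 0
        bus.subscribe(self.CHANNEL, self._on_invalidate)

    def _on_invalidate(self, key):
        self.local.pop(key, None)  # drop the stale entry; reload lazily on next read

    def get(self, key):
        if key not in self.local:
            self.db_reads += 1
            self.local[key] = self._load(key)
        return self.local[key]

    def update(self, key, value, write_to_db):
        write_to_db(key, value)
        self._bus.publish(self.CHANNEL, key)  # only the key travels over the bus
```

Redis Pub/Sub only delivers messages to subscribers that are connected at that moment, so an instance that briefly disconnects can keep stale entries; giving local entries a short expiry as well limits that window.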
Hope this helps if no direct answer regarding Hazelcast is available.
Also, you may refer to this link for some details regarding the Hazelcast

What are the best practices for implementing a caching layer?

I'm going to use Redis as a cache service.
What are the best practices for accessing the caching service?
Through a service/API or an in-memory component?
I'm not sure I want to have access to the DB from all the services.
Thanks
All your questions depend on the topology and/or architecture of your system. I don't think you would provide a service on a separate computer if your application resided entirely on one computer.
But suppose you have a distributed app.
In this case, it makes sense to do caching with a separate service on a separate node. It's the same as in OOP: you can simply encapsulate data in a cache as well. Other services depend on your caching service, not directly on Redis, so you can decide to replace Redis with something else. Another advantage of a caching service is that it can keep data in its own memory, depending on throughput, and fetch data from Redis from time to time. Note that you can simply buy a server with a lot of RAM, e.g. 192 GB, because a caching service needs memory more than anything else.
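The encapsulation argument can be sketched as an interface that other services depend on, with Redis as just one implementation. All class names are hypothetical; `RedisBackend` wraps a redis-py-style client passed in from outside (its `get` and `set(..., ex=...)` calls exist in redis-py), so the sketch itself runs without Redis.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Optional


class CacheBackend(ABC):
    """What other services depend on; Redis is just one possible implementation."""

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]:
        ...

    @abstractmethod
    def set(self, key: str, value: bytes, ttl_seconds: int) -> None:
        ...


class RedisBackend(CacheBackend):
    """Adapter for a redis-py client, e.g. redis.Redis(host=..., port=...)."""

    def __init__(self, client):
        self._client = client  # created by the caller, so this module needs no Redis import

    def get(self, key: str) -> Optional[bytes]:
        return self._client.get(key)

    def set(self, key: str, value: bytes, ttl_seconds: int) -> None:
        self._client.set(key, value, ex=ttl_seconds)


class DictBackend(CacheBackend):
    """In-process backend, handy for tests; ignores TTLs for brevity."""

    def __init__(self):
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def set(self, key: str, value: bytes, ttl_seconds: int) -> None:
        self._data[key] = value


class CachingService:
    """Hypothetical caching service: callers never see which backend is used."""

    def __init__(self, backend: CacheBackend, ttl_seconds: int = 300):
        self._backend = backend
        self._ttl = ttl_seconds

    def get_or_load(self, key: str, loader: Callable[[str], bytes]) -> bytes:
        value = self._backend.get(key)
        if value is None:
            value = loader(key)
            self._backend.set(key, value, self._ttl)
        return value
```

Swapping Redis for something else then means writing one new `CacheBackend` subclass; the services calling `CachingService` do not change.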

Is it necessary for memcached to replicate its data?

I understand that memcached is a distributed caching system. However, is it entirely necessary for memcached to replicate? The objective is to persist sessions in a clustered environment.
For example, if we have memcached running on, say, 2 servers, both holding data, and server #1 goes down, could we lose the session data that was stored on it? In other words, what should we expect to happen if any memcached server (storing data) goes down, and how would it affect our sessions in a clustered environment?
At the end of the day, will it be up to us to add some fault tolerance to our application? For example, if the key doesn't exist (possibly because the server it was on went down), re-query and store it back in memcached?
From what I'm reading, it appears to lean in this direction but would like confirmation: https://developers.google.com/appengine/articles/scaling/memcache#transient
Thanks in advance!
Memcached has its own fault tolerance built in, so you don't need to add it to your application. I think an example will show why this is the case. Let's say you have 2 memcached servers set up in front of your database (let's say it's MySQL). Initially, when you start your application, there will be nothing in memcached. When your application needs data, it will first check memcached, and if the data isn't there, it will read it from the database and insert it into memcached before returning it to the user. For writes, you make sure that you insert the data into both your database and memcached. As your application continues to run, it will populate the memcached servers with a lot of data and take load off your database.
Now one of your memcached servers crashes and you lose roughly half of your cached data. Right after the crash, your application will go to the database more frequently, and your application logic will keep inserting data into memcached, except that everything will now go to the server that didn't crash. The only consequence is that your cache is smaller, and your database might need to do a bit more work if everything doesn't fit in the cache. Your memcached client should also be able to handle the crash, since it can figure out where your remaining healthy memcached servers are and automatically hash values onto them. So, in short, you don't need any extra logic for failure situations in memcached, since the memcached client should take care of this for you. You just need to understand that memcached servers going down might mean your database has to do a lot of extra work. I also wouldn't recommend re-populating the cache after a failure. Just let the cache warm itself back up, since there's no point in loading items that you aren't going to use in the near future.
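The client-side rehashing described above is commonly done with consistent hashing; whether a particular memcached client does this depends on the client and its configuration. A minimal sketch follows: `HashRing` is hypothetical, and while it uses the same idea as ketama-style distribution (many MD5-derived points per server), its point layout is not compatible with any particular memcached client. Compared with `hash(key) % N`, removing a server only remaps the keys that lived on that server.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring that maps cache keys to memcached servers."""

    def __init__(self, servers, points_per_server=100):
        self._points_per_server = points_per_server
        self._owner = {}   # ring point -> server
        self._points = []  # sorted ring points
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(text):
        # First 8 bytes of MD5 as an unsigned 64-bit ring position.
        return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest()[:8], "big")

    def add(self, server):
        for i in range(self._points_per_server):
            point = self._hash(f"{server}#{i}")
            if point not in self._owner:
                bisect.insort(self._points, point)
            self._owner[point] = server

    def remove(self, server):
        # Drop the failed server's points; other servers' points are untouched.
        self._owner = {p: s for p, s in self._owner.items() if s != server}
        self._points = sorted(self._owner)

    def server_for(self, key):
        if not self._points:
            raise LookupError("no memcached servers available")
        # First ring point at or after the key's hash, wrapping around to the start.
        idx = bisect.bisect_left(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

After `remove("cache-a:11211")`, keys that were on the other servers still map to the same servers, so only the failed server's share of the cache goes cold.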
m03geek also made a post mentioning that you could use Couchbase, and this is true, but I want to add a few things to his response about the pros and cons. First off, Couchbase has two bucket (database) types: the Memcached bucket and the Couchbase bucket. The Memcached bucket is plain memcached, and everything I wrote above applies to it. The only reasons to go with Couchbase if you are going to use the Memcached bucket are that you get a nice web UI with stats about your memcached cluster, along with easy adding and removing of servers. You can also get paid support for Couchbase down the road.
The Couchbase bucket is totally different in that it is not a cache but an actual database. You can completely drop your backend database and just use this bucket type. One nice thing about the Couchbase bucket is that it provides replication and therefore prevents the cold-cache problem that memcached has. I would suggest reading the Couchbase documentation if this sounds interesting to you, since the Couchbase bucket comes with many features.
This paper about how Facebook uses memcached might be interesting too.
https://www.usenix.org/system/files/conference/nsdi13/nsdi13-final170_update.pdf
Couchbase's embedded memcached and "vanilla" memcached have some differences. One of them, as far as I know, is that Couchbase's memcached servers act as one. This means that if you store a key-value pair on one server, you'll be able to retrieve it from another server in the cluster. Vanilla memcached "clusters", on the other hand, are usually built with sharding, which means the app side needs to know which server contains the desired key.
My opinion is that replicating memcached data is unnecessary. Modern data centers provide almost 99% uptime. So if one of your memcached servers goes down someday, only some of your online users will need to log in again.
Also, on many websites you can see a "Remember me" checkbox that sets a cookie, which can be used to restore the session. If your users have that cookie, they will not even notice that one of your servers was down. (That's the answer to your question about adding "some fault tolerance to our application".)
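The "Remember me" fallback might look like this. A sketch under assumptions: the HMAC-signed user-id token, `SECRET` and the function names are hypothetical, `sessions` is a dict standing in for memcached, and a production token should also carry an expiry and be revocable (for example by storing a hashed random token on the server side).

```python
import hashlib
import hmac
import secrets

SECRET = secrets.token_bytes(32)  # hypothetical server-side signing key


def make_remember_token(user_id: str) -> str:
    """Value for the "Remember me" cookie: the user id plus an HMAC signature."""
    sig = hmac.new(SECRET, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{user_id}:{sig}"


def user_from_token(token: str):
    user_id, _, sig = token.rpartition(":")
    expected = hmac.new(SECRET, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return user_id if user_id and hmac.compare_digest(sig, expected) else None


def resolve_session(sessions, session_id, remember_token):
    """Look up the session; if it is gone (e.g. its memcached node died),
    rebuild it from the remember-me cookie instead of forcing a new login."""
    session = sessions.get(session_id)
    if session is not None:
        return session
    if remember_token:
        user_id = user_from_token(remember_token)
        if user_id is not None:
            session = {"user_id": user_id}
            sessions[session_id] = session  # re-cache on a healthy node
            return session
    return None
```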
But you can always use something like HAProxy and replicate all your session data on 2 or more independent servers. In this case, storing 1 user session takes N times as much RAM, where N is the number of replicas.
Another way is to use Couchbase to store sessions. A Couchbase cluster supports replicas out of the box and also stores data on disk, so if your node (or all nodes) suddenly shuts down or reboots, session data will not be lost.
Short answer: memcached with a "remember me" cookie and without replication should be enough.

Resources