I'm going to apply a microservices architecture to my data warehouse (DWH) application. There are 4 main microservices in the application:
1) Data Service: imports/exports external data sources to/from the DWH and queries data from the DWH.
2) Analytics Service: for chart visualization on the UI
3) Machine Learning: for the recommendation system
4) Reports: for report generation
The diagram is as below:
Each service has its own DB, and they communicate directly with each other via TCP and Thrift serialization. The problem here is that Data Service suffers a high load from the other services and can become a SPOF of the application. The data in the DWH is big too (maybe up to hundreds of millions of records). How do I avoid bottlenecks in Data Service in this case? Or how do I define a proper bounded context to avoid the bottlenecks?
You may think about:
splitting Data Service into a few microservices, based on some business logic;
modifying Data Service (if needed) to support more than one instance of the service, then using a load balancer to split requests between those instances.
A load balancer is a device that acts as a reverse proxy and distributes network or application traffic across a number of servers. Load balancers are used to increase capacity (concurrent users) and reliability of applications.
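To make the second option concrete, here is a minimal sketch of round-robin distribution, which is only one of several strategies a load balancer may use. The instance names and the `RoundRobinBalancer` class are hypothetical, made up for illustration; a real deployment would normally rely on an existing load balancer rather than hand-written code.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backend instances in strict rotation."""

    def __init__(self, instances):
        if not instances:
            raise ValueError("at least one instance is required")
        self._instances = list(instances)
        self._rotation = cycle(self._instances)

    def next_instance(self):
        # Each call returns the next instance, wrapping around at the end.
        return next(self._rotation)


def dispatch(balancer, requests):
    """Pair each request with the instance chosen to serve it."""
    return [(request, balancer.next_instance()) for request in requests]


# Hypothetical instance names, for illustration only.
balancer = RoundRobinBalancer(["data-service-1", "data-service-2", "data-service-3"])
assignments = dispatch(balancer, ["q1", "q2", "q3", "q4"])
```

With three instances, the fourth request wraps around to the first instance again, so no single instance receives all the load.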
Regarding "One database, multiple services":
Each microservice needs to have its own data storage; otherwise, you do not have a decomposition. If we are talking about a relational database, then this can be achieved using one of the following patterns:
Private tables per Service – each service owns a set of tables that must only be accessed by that service
Schema per Service – each service has a database schema that's private to that service
Database per Service – each service has its own database.
If your services use separate tables in the Data Warehouse database, and Data Service only provides an access layer to the database without any additional processing logic, then yes, you may remove Data Service and move the data access logic to the corresponding services. But on the other hand, right now you have only one place (Data Service) that knows how to access and manipulate the Data Warehouse, and that is what microservices are about.
I'm building a microservices architecture that should deal with:
Direct database access
Call to external legacy services
I can think of 2 caching strategies, but can't figure out which is best, considering that I will not have control over what other people do across the layers.
Caching at application level (@Cacheable)
I only provide a caching feature that everyone can use, enforcing that spring.cache.redis.key-prefix is set to the microservice name to limit conflicting keys.
PRO: most flexible way
CONS:
No control over the cache except for maximum space: people would just create new cache entries
No control over cache invalidation: we don't know what kind of data is actually stored, so if, for example, a legacy system needs to be reloaded, we cannot empty the relevant cache keys
Possible redundancy: as caching is at the application layer, it could happen that several microservices store roughly the same data. While I could have control over the database (one MS should own its own DB or at least a subset of tables), I can't guarantee the same for the legacy SOAP layer
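As a rough illustration of the key-prefix idea in this first strategy, here is a minimal sketch in which a plain dict stands in for the shared Redis instance. `NamespacedCache`, the `::` separator and the service names are assumptions made for the example, not part of Spring or Redis; the sketch also assumes service names never contain the separator.

```python
class NamespacedCache:
    """Gives one microservice a private view of a shared key-value store."""

    def __init__(self, store, service_name):
        self._store = store  # shared dict standing in for Redis
        self._prefix = f"{service_name}::"

    def put(self, key, value):
        self._store[self._prefix + key] = value

    def get(self, key, default=None):
        return self._store.get(self._prefix + key, default)

    def clear(self):
        """Remove only the entries that belong to this service."""
        for full_key in [k for k in self._store if k.startswith(self._prefix)]:
            del self._store[full_key]


shared_store = {}
orders_cache = NamespacedCache(shared_store, "orders")
billing_cache = NamespacedCache(shared_store, "billing")

# Both services use the same logical key without colliding.
orders_cache.put("customer:42", "order view")
billing_cache.put("customer:42", "billing view")

# Invalidation is only possible per service, not per kind of data.
orders_cache.clear()
```

The prefix avoids key collisions and allows a whole service's entries to be dropped, but it does not address the second and third cons: the platform still cannot tell which entries came from a given legacy system.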
Caching at service layer (connectors)
I don't provide a caching feature, but I provide custom SOAP connectors that will or will not cache responses based on a configuration that I provide (it could also be a blacklist/whitelist)
PROS:
cache is controlled
easy to invalidate
CONS:
need to update connectors each time a cache policy changes
dependency between development and architecture
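The connector approach could look roughly like the sketch below, assuming a whitelist of cacheable operations. `CachingConnector`, `fake_legacy_service` and the operation names are hypothetical stand-ins for a real SOAP client; the point is only that the cache policy lives in the connector's configuration rather than in application code.

```python
class CachingConnector:
    """Wraps a backend call and caches only whitelisted operations."""

    def __init__(self, call_backend, cacheable_operations):
        self._call_backend = call_backend
        self._cacheable = set(cacheable_operations)
        self._cache = {}

    def invoke(self, operation, argument):
        if operation not in self._cacheable:
            # Not whitelisted: always go to the backend.
            return self._call_backend(operation, argument)
        key = (operation, argument)
        if key not in self._cache:
            self._cache[key] = self._call_backend(operation, argument)
        return self._cache[key]

    def invalidate(self, operation):
        """Drop every cached response of one operation."""
        for key in [k for k in self._cache if k[0] == operation]:
            del self._cache[key]


calls = []


def fake_legacy_service(operation, argument):
    # Stand-in for the legacy SOAP call; records each real invocation.
    calls.append((operation, argument))
    return f"{operation}({argument})"


connector = CachingConnector(fake_legacy_service, cacheable_operations={"getCountry"})
connector.invoke("getCountry", "IT")
connector.invoke("getCountry", "IT")      # served from the cache
connector.invoke("getBalance", "acct-1")
connector.invoke("getBalance", "acct-1")  # not whitelisted, hits the backend again
calls_before_invalidation = len(calls)

connector.invalidate("getCountry")
connector.invoke("getCountry", "IT")      # cache was emptied, backend is called
```

Because the connector knows which operation produced each entry, invalidating everything that came from one legacy operation is a single call, which is the "easy to invalidate" pro listed above.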
Edit: I need suggestions about the theoretical approach, not about a specific technology.
I suppose you should build different microservices (APIs) to deal with different sets of responsibilities. For example, you can have one microservice that deals with the legacy services and another that deals with the database. For these two microservices to communicate, you can use a message broker architecture such as Apache Kafka (or, as cost-effective alternatives, Hazelcast or RabbitMQ).
Communication between these two microservices can be event driven as well.
Once you decide this, then you can finalize where to place your cache.
You will need to place the cache at the application level, not the service level, if there is a UI where you are showing these values.
I am trying to implement caching in my application.
We are using an Oracle database and ASP.NET Web API to serve data to the UI.
API calls take a long time, so we are thinking of implementing caching. Our code is deployed on 2 servers behind a load balancer.
How should caching be implemented?
What I am planning to implement is:
There should be a service API on one of the servers; this API will store all data in memory. The UI will call our existing API, and the hit can go to any node; that API will then get the data from the new API (the cache) and serve it to the UI.
Is this architecture correct for distributed caching?
Can anyone share their experience or guidance on the implementation?
You might want to check NCache. Being a distributed caching solution, it provides first-class support for sharing cache data between multiple clients, because the cache process runs autonomously outside the address space of any one application.
For your case, every web server in your load-balanced web farm will be a client of NCache and have direct access to the cache servers. All the web servers, being clients of a central caching solution, will see the same cache data through simple-to-use NCache APIs. Any modification through insert, update or delete cache operations will be immediately observable to all the web servers.
The intelligence driving NCache allows for a seamless behind-the-scenes handling of all the tasks of storing and distributing the cache data among multiple cache server nodes on which the cache instance is distributed.
Furthermore, all the caching operations are completely independent of the framework used for database content retrieval and can be applied equally well with NHibernate, EF, EF Core and, of course, ADO.NET.
You can find more information about how to integrate NCache into your web farm environment and much more by using the following link:
http://www.alachisoft.com/resources/docs/ncache/admin-guide/ncache-architecture.html
I need to create a food ordering service using microservices. It should be scalable and clustered, with several steps per order. I need to store user data between steps / requests.
What is an approach to keep state and user data? Store it in DB? Cache? Shared memory?
Are there any tutorials for the best practice of it?
(I'm going to use Spring / Spring Boot and modules)
Anything that you cannot afford to lose (usually the business data) will go in the DB and can be cached in parallel in an in-memory DB like Redis, which has a built-in cache eviction algorithm.
Anything that, if lost, is not a big deal (usually the technical things that are not directly linked with the business data) can go only in an in-memory DB.
Since you are using Spring, you could probably use something like Redis with Spring Data Redis. There are already known Spring solutions (such as this) to fall back on API calls that fetch data from the DB if the Redis server goes down. You can also run multiple Redis instances behind Redis Sentinel to provide failover. Redis Cluster provides a way to run a Redis installation where data is automatically sharded across multiple Redis nodes. Also, you can configure Redis to persist the data to the file system once daily or so, to back up the cache data for disaster recovery.
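The fallback idea can be sketched independently of Spring. Below, `InMemoryCache` is a hypothetical stand-in for a Redis client, and `CacheUnavailable` stands in for whatever connection error the real client would raise; `load_with_fallback` simply treats a cache outage as a cache miss and goes to the DB.

```python
class CacheUnavailable(Exception):
    """Raised by the stand-in cache when it cannot be reached."""


class InMemoryCache:
    """Hypothetical stand-in for a Redis client that can be switched off."""

    def __init__(self):
        self.data = {}
        self.available = True

    def get(self, key):
        if not self.available:
            raise CacheUnavailable(key)
        return self.data.get(key)

    def set(self, key, value):
        if not self.available:
            raise CacheUnavailable(key)
        self.data[key] = value


def load_with_fallback(key, cache, load_from_db):
    """Read through the cache, falling back to the DB if the cache is down."""
    try:
        cached = cache.get(key)
        if cached is not None:
            return cached
    except CacheUnavailable:
        return load_from_db(key)
    value = load_from_db(key)
    try:
        cache.set(key, value)
    except CacheUnavailable:
        pass  # the DB remains the source of truth; caching is best effort
    return value


database = {"order:1": "2 pizzas"}
cache = InMemoryCache()

first = load_with_fallback("order:1", cache, database.get)   # miss, then cached
cache.available = False                                       # simulate an outage
second = load_with_fallback("order:1", cache, database.get)  # served by the DB
```

This only works for data that also lives in the DB, which matches the split described above: business data goes to the DB and is cached in parallel, while data kept only in memory is lost if the cache goes down.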
If you are looking for a fully managed service, AWS provides "Step Functions" to satisfy your stateful requirements: https://stackoverflow.com/questions/tagged/aws-step-functions
I'm a beginner in microservice architecture, and I have read in a lot of blogs that in a microservice architecture it is mandatory that each microservice has its own database. In my case it may be very expensive.
My question is: is it possible to make the persistence layer a microservice in itself, one whose function would be to give the other microservices read/write access to the database?
Thanks
To answer your question, first of all let's understand:
"it is mandatory that each microservice has its own database. In my case it may be very expensive."
Yes, it is said that every microservice should have its own database.
What this means is that the tables/collections of each microservice should be separate (you could use a single scalable database instance), and one microservice should access the data of other microservices only through API calls.
Benefits of having a separate model are:
The model will be clean. E.g., in e-commerce, Customer has a different meaning for the Shipping microservice, the Order microservice, the Customer Management microservice and so on. If we put all the data required by multiple microservices into one Customer object, it will become very big.
Microservices can evolve independently. If we have a single Customer object and one microservice, let's say the Order one, wants to add something to the schema, all microservices need to change.
If we have a single database schema, we will get into a big mess.
"In my case it may be very expensive."
If "expensive" means that the read model actually requires data from multiple microservices, then it's better to listen to events from those microservices and build a single read model; a little duplication of data is OK.
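A minimal sketch of such a read model, assuming two hypothetical event types (`CustomerRegistered` from a customer service, `OrderPlaced` from an order service) delivered as plain dicts. In practice the events would arrive through a message broker; here they are just fed in from a list.

```python
class CustomerOrderReadModel:
    """Denormalized view built from events of two different services."""

    def __init__(self):
        self.rows = {}

    def handle(self, event):
        # One row per customer, created on the first event that mentions them.
        row = self.rows.setdefault(
            event["customer_id"], {"name": None, "order_count": 0}
        )
        if event["type"] == "CustomerRegistered":
            row["name"] = event["name"]  # data owned by the customer service
        elif event["type"] == "OrderPlaced":
            row["order_count"] += 1      # data owned by the order service


# Hypothetical event stream, for illustration only.
events = [
    {"type": "CustomerRegistered", "customer_id": 7, "name": "Ada"},
    {"type": "OrderPlaced", "customer_id": 7, "order_id": 100},
    {"type": "OrderPlaced", "customer_id": 7, "order_id": 101},
]

read_model = CustomerOrderReadModel()
for event in events:
    read_model.handle(event)
```

The customer name is now duplicated in the read model, but queries that need both pieces of data no longer have to call two services.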
If it's anything else, ask a more specific question.
Having all microservices access the same database will result in loose cohesion and strong coupling.
Try to see if you can define a separate schema for each of the microservices, so that you can ensure a microservice doesn't refer to the tables of other microservices.
This way, in the future, you can seamlessly move to a separate database for each service once infrastructure cost is no longer a concern.
Microservices follow the database-per-service model.
We are building a multi-tenant application which has restrictions on the regions/countries where the data is persisted.
The application is based on the Microsoft .NET microservice architecture, but we have shared domains, although we have separate DBs at very low levels, say a separate DB for each city. We cannot persist the data of one country in another country's data center. Hazelcast will be used as the distributed cache. I could not find any direct way to configure data isolation, for example something like "Memory Regions" in Apache Ignite. Do we have "Memory Regions" in Hazelcast?
I need to write-behind the data from the cache to the respective database. Can I segregate a part/partition of the cache for a specific database instance?
Any help would be greatly appreciated. Thanks in advance.
I am not directly answering your question. IMHO, as I understand it, when you have data stored across different clusters / nodes there will still be a network call, even if you use key formats that keep the data within the same cluster / node.
Based on my experience, you could easily set up a MemoryCache, which comes as part of System.Runtime.Caching, to store the data on every node, and then use Redis Pub/Sub or Azure Service Bus as the backbone for the pub-sub.
In that case,
any data that is updated in a cache is announced to all the other instances of the application via a Service Bus / Redis message, which is typically the key.
Upon receipt of the key, each application clears out its internal cache and then caches the data again on the next DB access.
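The two steps above can be sketched with an in-process bus standing in for Redis Pub/Sub or Azure Service Bus. `InvalidationBus` and `AppInstance` are hypothetical names, a dict plays the database, and the sketch assumes each instance evicts only the announced key rather than its whole cache.

```python
class InvalidationBus:
    """In-process stand-in for the pub-sub backbone."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, key):
        # Deliver the changed key to every subscribed instance.
        for callback in self._subscribers:
            callback(key)


class AppInstance:
    """One application node with its own local in-memory cache."""

    def __init__(self, database, bus):
        self._database = database
        self._bus = bus
        self.local_cache = {}
        bus.subscribe(self._on_invalidate)

    def _on_invalidate(self, key):
        self.local_cache.pop(key, None)  # evict; reload lazily on next read

    def read(self, key):
        if key not in self.local_cache:
            self.local_cache[key] = self._database[key]
        return self.local_cache[key]

    def write(self, key, value):
        self._database[key] = value
        self._bus.publish(key)  # only the key travels over the bus


database = {"tenant-1:plan": "basic"}
bus = InvalidationBus()
node_a = AppInstance(database, bus)
node_b = AppInstance(database, bus)

node_a.read("tenant-1:plan")
node_b.read("tenant-1:plan")             # both nodes now hold "basic" locally
node_a.write("tenant-1:plan", "premium")
value_seen_by_b = node_b.read("tenant-1:plan")
```

Only keys are published, never values, which is why the payloads stay small; each node reloads from its own database on the next access.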
This method is common in multi-tenant applications, and it is also fail-safe and lightweight. The payloads / network transfers are small, and each AppDomain uses its internal memory as a cache, which does support different regions via different instances of MemoryCache.
Hope this helps if no direct response is available regarding Hazelcast.
Also, you may refer to this link for some details regarding Hazelcast.