I’ve been doing a lot of googling regarding managing dependencies between microservices. We’re trying to move away from a big monolithic app to microservices in order to scale organizationally, develop faster, and have multiple teams working in parallel.
However, as we’re trying to functionally partition the monolith into microservices, we see how intertwined the business logic and data really are. This was not a problem when we were sitting on top of one big DB and could do big relational joins. But with microservices, it becomes a problem.
One solution is to make microservice-A call 5-10 other microservices to get the necessary data (this is the equivalent of a DB view with a join). Another solution is to make microservice-A listen to events from 5-10 other services and populate local storage with the relevant info (this is the equivalent of a materialized view). Either way, microservice-A is coupled to 5-10 other services, and if new info is needed in microservice-A, some of the services it depends upon might need to be released before microservice-A. Please note that microservice-A is itself depended upon by other services. Bottom line, we end up with DISTRIBUTED dependency hell.
Many articles advocate for the second solution, i.e. something along the lines of Event Sourcing, Choreography, etc.
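To make the two options concrete, here is a minimal in-process sketch in Python. Everything in it is hypothetical (the CustomerService and OrderService stand-ins, the event names, the summary shape); real services would sit behind network calls and a message broker. It only illustrates that both approaches can produce the same answer while depending on the other services at different times: at read time for the "join", at event time for the "materialized view".

```python
from dataclasses import dataclass, field


# Hypothetical upstream services; in production these would be network calls.
class CustomerService:
    def __init__(self):
        self._names = {1: "Ada"}

    def get_name(self, customer_id):
        return self._names[customer_id]


class OrderService:
    def __init__(self):
        self._totals = {1: 120.0}

    def get_total(self, customer_id):
        return self._totals[customer_id]


# Option 1: compose at read time (the "view with a join").
def customer_summary_by_calls(customer_id, customers, orders):
    return {
        "name": customers.get_name(customer_id),
        "total": orders.get_total(customer_id),
    }


# Option 2: keep a local copy fed by events (the "materialized view").
@dataclass
class CustomerSummaryView:
    rows: dict = field(default_factory=dict)

    def on_event(self, event):
        row = self.rows.setdefault(
            event["customer_id"], {"name": None, "total": 0.0}
        )
        if event["type"] == "CustomerRenamed":
            row["name"] = event["name"]
        elif event["type"] == "OrderPlaced":
            row["total"] += event["amount"]

    def summary(self, customer_id):
        return dict(self.rows[customer_id])
```

Note that in both variants microservice-A still has to understand the shape of the other services' data, which is the coupling described above.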
I would appreciate any shared experiences, recommendations and insights.
Philometor.
While not technically an "answer", I can definitely share some of my observations and experiences. Your question concerning services calling other services for database operations reminded me of a project where an architect sold senior management on the idea of "decoupling" persistence from the rest of the applications by implementing hundreds of REST interfaces in what essentially was a distributed DAO pattern in front of a very large enterprise database. The project ended up exactly the way I predicted - a dismal failure.
Microservices aren't about turning a monolithic application into a distributed monolithic application. In my example project above, the monolith was turned into a stove-piped, fragile, chaotic mess, with the coupling only moved to service contracts instead of Java class method signatures, and with a performance hit so bad the application was unusable. Last I heard they are still running their original monolith.
Microservices should be more of a vertical partitioning of your application and not a horizontal one. In my opinion it's better to think in terms of business function partitioning rather than "converting" an existing monolith. There's no rule that determines how big a microservice must be, but it should be big enough to do one complete synchronous function without needing to directly depend on outside services (as much as possible) to complete its work. If a microservice performs a complex business function that affects 50 tables, so be it! It owns those many tables. Ideally if a service goes down, it should affect only that business functionality it's responsible for, and not directly affect other services. As you can see, this thinking is the complete opposite from that which produced the distributed mess in my project example.
Not only do you need to ensure that the motivation behind replacing monoliths with microservices is sound, but also you need to step outside the monolith and revisit the actual business and begin partitioning that instead. Like everything else, baby steps are the way to go. Start with one small complete business function, and convert that into a single microservice instead of trying to replace a monolith all at once.
Related
For example,
You have an IT estate where a mix of batch and real-time data sources exists from multiple systems, e.g. ERP, Project management, asset, website, monitoring etc.
The aim is to integrate the datasources into a cloud environment (agnostic).
There is a need for reporting and analytics on combinations of all data sources.
Inevitably, some source systems are not capable of streaming, hence batch loading is required.
Potential use-cases for performing functionality/changes/updates based on the ingested data.
Given a steer for creating a future-proofed platform, architecturally, how would you look to design it?
It's a very open-ended question, but there are some good principles you can adopt to help point you in the right direction:
Avoid point-to-point integration, and get everything going through a few common points - ideally one. An API Gateway can be a good place to start. The big players (Azure, AWS, GCP) all have their own options, and there are lots of decent independent ones like Tyk or Kong.
Batches and event-streams are totally different, but even then you can still potentially route them all through the gateway so that you get the centralised observability (reporting, analytics, alerting, etc).
Use standards-based API specifications where possible. A good REST-based API, built on a proper resource model, is a non-trivial undertaking, and I'm not sure it fits what you are doing if you are dealing with lots of disparate legacy integration. If you are going to adopt REST, use OpenAPI to specify the APIs. Using this standard not only makes things easier for consumers, but also gives you better tooling, as many design, build and test tools support OpenAPI. There's also AsyncAPI for event/async APIs.
Do some architecture. Moving sh*t to cloud doesn't remove the sh*t - it just moves it to the cloud. Don't recreate old problems in a new place.
Work out the logical components in your new solution: what does each of them do (what's its reason to exist)? Don't forget ancillary components like API catalogues, etc.
Think about layering the integration (usually depending on how they will be consumed and what role they need to play, e.g. system interface, orchestration, experience APIs, etc).
Want to handle data in a consistent way regardless of source (your 'agnostic' comment)? You'll need to think through how data is ingested and processed. This might lead you into more data / ETL centric considerations rather than integration ones.
Co-design. Is the integration mainly data coming in or going out? Is the integration with 3rd parties or strictly internal?
If you are designing for external / 3rd party consumers then a co-design process is advised, since you're essentially designing the API for them.
If the APIs are for internal use, consider designing them for external use so that when/if you decide to do that later it's not so hard.
Take a step back:
Continually ask yourselves "what problem are we trying to solve?". Usually, a technology initiative is successful if there's a well understood reason for doing it, which has solid buy-in from the business (non-IT).
Who wants the reporting, and why - what problem are they trying to solve?
As you mentioned, it's an IT estate (i.e. an enterprise-level solution) with a mix of batch and real-time sources, so first you have to identify the end goal of this migration. You can think about refactoring applications. If you are trying to make it event-driven, then assess the refactoring effort and cost. Separation of responsibility is the key factor for refactoring and migration.
If you are thinking about future-proofing your solution, then consider the cloud for storing and processing your data. It will not necessarily be cheap, but a mix of cloud and on-prem could be a way forward. Cloud providers offer services to move your data at minimal cost, and there are cloud-native solutions for performing analysis on your data. A database migration service in AWS or Azure can move data and then capture ongoing changes. So you can keep using your on-prem DB and apps and perform analysis for reporting in the cloud. This will ease the load on your transactional DB. Most data sync from on-prem to cloud is near real time.
I just finished reading Uncle Bob's "Clean Architecture" and am now wondering how to apply it in the context of microservices!
On one hand, I think that microservices fall in the "Framework-Drivers" layer since it's an implementation on top of use-cases (they are ways to serve use-cases.) This way, we focus on the core of the app (Entities and Use-cases) and stay flexible in the implementation of the outer layers (including microservices). But since each microservice can be maintained by a different developer/team of developers, they will have a bad time when use-cases change (harder to predict who will be impacted).
On the other hand, we can split our app into multiple microservices, decoupled from each other, and apply Clean Architecture inside each microservice. The pro of this approach is that we can focus on each microservice doing one thing, and doing it well. But the problem is that we started designing using technical separations (microservices) which violates the main Clean Architecture principle of focusing on the business. Also, it will be hard to not duplicate code if two microservices uses the same entity or use-case!
I think the first scenario is the best, but I would like to have feedback from fellow developers on the long-term benefits of both scenarios, and potential troubles.
My two cents:
From Uncle Bob's words, "Micro-services are deployment option, not an architecture". Each micro-service should be deployable and maintainable by different teams (which can be in different geographical locations). Each team can choose its own architecture, programming language, tools, frameworks, etc. Forcing every team to use the same programming language, tool or architecture does not sound good. So each micro-service team must be able to pick its architecture.
How can each team code/maintain/deploy their own micro-service without conflicting with other teams' code? This question brings us to how to separate micro-services. IMHO they should be separated by feature (the same principle applies to modularization of mobile application projects, where independent teams should be able to work on separate modules/micro-services).
After separating micro-services, the communication between them is an implementation detail. It can be done through web-socket, REST API, etc. Inside each micro-service, if the team decides to follow Clean Architecture, they can have multiple layers based on Clean Architecture principles (Domain/Core - Interface Adapters - Presentation/API & Data & Infrastructure). There can and will be duplicated code across micro-services, which is OK for micro-services.
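As a rough illustration of those layers inside one micro-service, here is a small Python sketch. The invoice domain, the PayInvoice use case and the in-memory repository are all made up for the example; the only point is the direction of the dependencies, with the use case knowing nothing about the storage or the transport.

```python
from dataclasses import dataclass
from typing import Protocol


# Domain / core layer: plain business object, no framework imports.
@dataclass(frozen=True)
class Invoice:
    invoice_id: str
    amount_cents: int
    paid: bool = False


# Port owned by the use-case layer; outer layers implement it.
class InvoiceRepository(Protocol):
    def get(self, invoice_id: str) -> Invoice: ...
    def save(self, invoice: Invoice) -> None: ...


# Use-case layer: depends only on the domain and the port.
class PayInvoice:
    def __init__(self, repo: InvoiceRepository) -> None:
        self._repo = repo

    def execute(self, invoice_id: str) -> Invoice:
        invoice = self._repo.get(invoice_id)
        if invoice.paid:
            raise ValueError("invoice already paid")
        paid = Invoice(invoice.invoice_id, invoice.amount_cents, paid=True)
        self._repo.save(paid)
        return paid


# Data / infrastructure layer: one possible adapter (in-memory here).
class InMemoryInvoiceRepository:
    def __init__(self) -> None:
        self._rows = {}

    def get(self, invoice_id: str) -> Invoice:
        return self._rows[invoice_id]

    def save(self, invoice: Invoice) -> None:
        self._rows[invoice.invoice_id] = invoice


# Presentation / API layer: translates a request into a use-case call.
def handle_pay_request(use_case: PayInvoice, body: dict) -> dict:
    invoice = use_case.execute(body["invoice_id"])
    return {"invoice_id": invoice.invoice_id, "paid": invoice.paid}
```

Swapping the in-memory adapter for a real database, or the handler for a REST or web-socket endpoint, should not require touching Invoice or PayInvoice.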
As #lww-pai-long said in his answer here, splitting based on Domain responsibilities and DDD is in most cases the best solution.
Still, if you have worked with a system using micro-services, you soon realize that there are other things involved here as well.
DDD Bounded Context as base for micro-services
In most cases, splitting your application into micro-services based on Bounded Contexts is the safe way to go. From experience, I would even say that in some parts of the Domain you could go further and have multiple micro-services per Bounded Context. One example would be a quite large part of the Domain that represents a single Bounded Context. Another example would be using CQRS for a particular Domain; then you can end up with a Write/Domain micro-service and a Views/Read micro-service.
You can read in this answer how you can split your Domain to micro-services.
It would be advisable as you said to "apply Clean Architecture inside each microservice".
Also, it will be hard to not duplicate code if two microservices uses
the same entity or use-case!
This is something you have to deal with in most cases when working with micro-services. Duplicating code and/or data across multiple micro-services is a common drawback. You have to accept it because, in return, you get isolation and independence of each micro-service and its database. The problem can be partly solved by using shared libraries as packages, but be careful: this is not the best approach for all cases. Here you can read about using common code and libraries across micro-services. Unfortunately, not all advice and principles from Uncle Bob's "Clean Architecture" can be applied when using micro-services.
Non Domain or technical operation micro-services
Usually, if your solution uses micro-services, you will more or less have some micro-services which are not Domain specific but rather perform technical tasks or operations that are not directly business related. Examples could be:
micro-service for report generation
micro-service for email generation and forwarding
micro-service for authorization/permission management
micro-service for secret management
micro-service for notification management
These are not services you would get by splitting your solution based on DDD principles, but you still need them as general solutions, as they can be consumed by multiple other services.
Conclusion
When working with micro-services you will most of the time have a mixture of Domain specific and Domain agnostic micro-services. I think Clean Architecture could be looked at from a slightly different perspective when working with micro-services.
On one hand, I think that microservices fall in the
"Framework-Drivers" layer since it's an implementation on top of
use-cases (they are ways to serve use-cases.)
It kind of does, but it also falls into the other layers, like Entities and Use Cases. I think it goes in the direction that, if you work on Domain specific services, this diagram becomes the architecture of each micro-service rather than a concept above all micro-services. In the applications where I worked with micro-services, each micro-service (the ones based on a DDD Bounded Context) had most of these layers, if not all of them. The Domain agnostic services are an exception to this, as they are not based on Domain Entities but rather on tasks or operations like 'Create an Email', 'Create a PDF report from an HTML template' or similar.
I think this question may be better on Software Engineering but I'll answer anyway.
My approach would be to use DDD and define each microservice as a Domain Service grouping Use Cases semantically, then link Domain Services with Bounded Contexts.
Sam Newman talks about the importance of separating microservices by domain abstraction rather than by a technical one in Building Microservices.
The point he makes, basically, is that defining scaling strategies for microservices based on subdomains will better match the "real life" constraints observed on the production system than using technically based microservices and trying to define an abstract strategy.
And if you look at how something like Kubernetes works, it seems to push in that direction. A pod ends up being a microservice with multiple containers, defined as a complete stack matching a sub-domain of the overall application.
It then gets easier in an e-commerce application, for example, to scale the Payment service independently of the Cart service based on customer activity than to scale the web services independently of the job queues in an abstract way.
The way those Bounded Contexts communicate, i.e. request based or event based, depends on the specific relation between them. To use the same example, a Cart may generate an event that will trigger the Payment, while the same Cart may need to request the Inventory before validating the order.
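A small Python sketch of that Cart example, with both styles side by side. The EventBus here is just an in-process stand-in for a real broker, and the class names, the "OrderValidated" event and the payload fields are assumptions made for illustration.

```python
from collections import defaultdict


class EventBus:
    """Tiny in-process stand-in for a message broker."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in self._handlers[event_type]:
            handler(payload)


class Inventory:
    def __init__(self, stock):
        self._stock = dict(stock)

    def is_available(self, sku, quantity):
        # Request based: the caller needs the answer before it can continue.
        return self._stock.get(sku, 0) >= quantity


class Payment:
    def __init__(self, bus):
        self.charged = []
        bus.subscribe("OrderValidated", self.on_order_validated)

    def on_order_validated(self, order):
        # Event based: Payment reacts; Cart does not wait for a result.
        self.charged.append(order["order_id"])


class Cart:
    def __init__(self, inventory, bus):
        self._inventory = inventory
        self._bus = bus

    def checkout(self, order_id, sku, quantity):
        if not self._inventory.is_available(sku, quantity):
            return False
        self._bus.publish("OrderValidated", {"order_id": order_id, "sku": sku})
        return True
```

Cart knows about Inventory directly because it cannot proceed without the answer, but it never references Payment; it only announces what happened.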
And at the end of the day, those Domain Services and Bounded Contexts can be implemented the same way when starting with a monolith; even the communication between Bounded Contexts can be. The underlying communication protocol becomes an implementation detail that can easily (kinda) be switched when transitioning to a distributed, a.k.a. microservices, architecture.
It seems that in the traditional microservice architecture, each service gets its own database with a different understanding of the data (described here). Sometimes it is considered permissible for databases to duplicate data. For instance, the "Users" service might know essentially everything about a user, whereas the "Posts" service might just store primary keys and usernames (so that the author of a post can have their name displayed, for instance). This page talks about eventual consistency, sources of truth, and other related concepts when data is duplicated. I understand that microservice architectures sometimes include a shared database, but most places I look suggest that this is a rare strategy.
As for why each service typically gets its own database, all I've seen so far is "so that each service owns its own resources," but I'm not convinced that a) the service layer in any way "owns" the persisted resources accessed through the database to begin with, or that b) services even need to own the resources they require rather than accessing necessary subsets of the master resources through a shared database.
So what are some of the justifications that each service in a microservice architecture should get its own database?
There are a few reasons why it does make sense to use a separate database per micro-service. Some of them are:
Scaling
Splitting your domain into micro-services is fine. You can scale a particular micro-service up on its web server on demand, or scale it out as needed. That is obviously one of the benefits of using micro-services. More importantly, you can have micro-service-1 running on, for example, 10 servers because its traffic demands it, while micro-service-2 only requires 1 web server, so you deploy it on 1 server. The good thing is that you control this and can manage your computing resources in order to save money, as cloud providers are not cheap.
Considering this, what about the database?
If you have one database for multiple services, you cannot do this. You cannot scale the databases individually, as they would be on one server.
Data partitioning to reduce size
As you split your domain into micro-services, each with its own database, you automatically split the amount of data stored in each database. Ideally, this lets you use smaller database servers with less computing power and/or RAM.
In general, paying for multiple small servers is cheaper than paying for one large one.
So in this case you could make use of this fact and save some resources as well.
If a database that is already split by domain still holds a large amount of data, techniques like data sharding or data partitioning can be applied in addition, but this is another topic.
Which db technology fits the business requirement
This is a very important argument in favour of having multiple databases. It allows you to pick the database technology that best fits your business requirement, in order to get the best performance or usage out of it. For example, one micro-service might have read-heavy operations with very complex filter options and a full-text search requirement. Using Elasticsearch in this case would be a good choice. Another micro-service might use SQL Server, as it requires SQL-specific features like transactional behavior or similar. If for some reason you have one database for all services, you are stuck with one particular database technology, which might not perform so well for those requirements. It is a compromise for sure.
Developer discipline
If for some reason a couple of micro-services share a database, you need to deal with the human factor. The developers would need to be disciplined enough not to cross domains and access or modify another micro-service's data (tables, collections, etc.), which is hard to achieve and control. In large organisations with a lot of developers this could be a serious problem. With a hard/physical split this is not an issue.
Summary
There are some arguments for having a database per micro-service but also some against it. In general, the guidelines and suggestions when using micro-services are to keep each micro-service autonomous together with its data, so that it can work independently in the ideal case (this is not always the case). It is definitely a compromise, as is using micro-services in general. As always, the rule is the rule, but there are exceptions to it. Micro-services architecture is flexible and very dependent on your Domain needs and requirements. If you and your team identify that it makes sense to merge multiple micro-service databases into one, and that it solves a lot of your problems, then go for it.
Microservices
Microservices advocate design constraints where each service is developed, deployed and scaled independently. This philosophy is only possible if you have a database per service. How can I continue my business if I have a DB failure, and what steps can I take to mitigate this? The DB is an essential part of any enterprise application. I agree there are a number of challenges when each service has its own database.
Why Independent database?
Unlike other approaches, this approach not only keeps your code base clean and extendable, but also truly removes the single point of failure in your business. To achieve this, services can sometimes hold duplicated data as well, as long as each service is autonomous, and services can only be autonomous if there is a database per service.
From a business point of view, let's take an eCommerce application. You have microservices like Booking, Order, Payment, Recommendation, Search and so on, and the database is shared. What happens if the DB is down? All your services are down! Then there is no point in using a microservices architecture, other than having a clean code base.
If each service has its own database, I don't mind if my Recommendation service is not working, because I can still search and book the order, and I haven't lost the customer. That's the whole point.
It comes with costs and challenges, but in the long run it pays off.
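Here is a minimal Python sketch of that "recommendation is down but the customer can still search" situation. The function names, the catalog and the ServiceUnavailable exception are all hypothetical; in a real system the failure would surface as a timeout or an error from the Recommendation service's client.

```python
class ServiceUnavailable(Exception):
    """Raised by a client when its backing service cannot be reached."""


def search_products(query):
    # Hypothetical Search service backed by its own store.
    catalog = ["red shoes", "blue shoes", "red hat"]
    return [item for item in catalog if query in item]


def recommend_for(user_id):
    # Hypothetical Recommendation service whose database is down.
    raise ServiceUnavailable("recommendation store unreachable")


def build_results_page(user_id, query):
    page = {"results": search_products(query), "recommendations": []}
    try:
        page["recommendations"] = recommend_for(user_id)
    except ServiceUnavailable:
        # Optional feature: degrade instead of failing the whole page.
        pass
    return page
```

With a single shared database, search_products would fail in exactly the same way as recommend_for, and there would be nothing left to degrade to.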
SQL / NoSQL
Each service has its own needs. To get the best performance, I can use SQL for the Payment service (transactions), and I can (I should) use NoSQL for the Recommendation service. A shared database wouldn't help me in this case. In modern cloud architectures using patterns like CQRS, Event Sourcing and materialized views, we sometimes use 2 different databases for the same service to get the performance out of it.
Again, database per service is not only about resources or how much data a service should own; we really have to see the bigger picture. Yes, there are certain practices about how much data and duplication is good or bad, but that's another debate.
Hope that helps !
I am working on a jobs site where I am thinking of breaking out the jobs matching section into a micro service - everything else is a monolith.
But when thinking about how the microservice should have its own separate database, that would mean having the microservice have a separate copy of all the jobs, given the monolith would still handle all job crud functionality.
Am I thinking about this the right way and is it normal to have multiple copies of the same data spread out across different microservices?
The idea of having different databases with the same data scares me a bit, since that creates the potential for things to get out of sync.
You are trying to move away from a monolith, and the approach you are taking is very common: take a part out of the monolith that can be converted into a microservice. The monolith shrinks over time and you end up with more microservices (MSs).
Coming to your question about data duplication: yes, this is a challenge, and some data needs to be duplicated, but this varies case by case and it is difficult to say more without looking into the application.
You may expose an API so the monolith can get/create the data if needed, and I strongly suggest not sacrificing or compromising the data model of the microservice to avoid duplication, because the MS is going to be more important than your monolith in the future. Keep in mind that you should avoid adding any new code to the monolith, and even if you have to, ask the MS for the data instead of the monolith.
One more thing you can try: instead of REST API calls between microservices, you can use a caching mechanism with an event bus. Every microservice publishes its CRUD changes to the event bus, and interested microservices consume those events and update their local cache accordingly.
The problem with a REST call is that, in situations where the dependent service is down, we cannot query the main microservice, which can sometimes become a bottleneck.
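For the jobs example, the consuming side of such an event bus could look roughly like the Python sketch below. The event shape (type, job_id, version, job) is an assumption, as is the idea that the publisher attaches a version number that increases per job starting at 1; that version is one common way to keep a local copy from drifting when events are redelivered or arrive out of order.

```python
class JobCache:
    """Local copy of jobs kept by the matching service (hypothetical)."""

    def __init__(self):
        self._jobs = {}      # job_id -> job fields
        self._versions = {}  # job_id -> last applied version

    def apply(self, event):
        job_id, version = event["job_id"], event["version"]
        # Skip duplicates and late arrivals so redelivery is harmless.
        if version <= self._versions.get(job_id, 0):
            return False
        self._versions[job_id] = version
        if event["type"] == "JobDeleted":
            self._jobs.pop(job_id, None)
        else:  # JobCreated or JobUpdated carry the full job state
            self._jobs[job_id] = event["job"]
        return True

    def get(self, job_id):
        return self._jobs.get(job_id)
```

Because each create/update event carries the full job state, applying the latest version is enough to converge, even if an intermediate event was never seen.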
To anyone with real world experience breaking a monolith into separate modules and services.
I am asking this question having already read the MonolithFirst blog entry by Martin Fowler. When taking a monolith and breaking it into microservices the "size" element of the equation is the one that I ponder over the most. Specifically, how to approach breaking a monolith application (we're talking 2001: A Space Odyssey; as in it is that old and that large) into microservices without getting overly fine-grained or staying too monolithic. The end goal is creating separate modules that can be upgraded independently and scaled independently.
What are some recommended best practices based on personal experience of breaking a monolith into microservices?
The rule of thumb is to break the monolith based on bounded contexts. The most common way of defining a bounded context is by BU (Business Unit). For example, the module which does the actual payment is usually a separate BU.
The second thing to consider is the overhead micro-services bring. You should analyse the hardware, monitoring and infra pieces before completely breaking up the service. What I have seen is people taking smaller microservices out of the monolith one at a time, instead of writing, say, 10 new services and deprecating the monolith.
My advice would be to take an incremental approach. Take the first BU that is being worked on out of the monolith. This will also give the whole team a good learning curve.
You should clearly distinguish the sub-domain areas (bounded contexts) of your domain.
Usually (if everything is fine with your architecture) you already have some separate components in your monolith application which are responsible for each sub-domain. These components interact with each other in one process (in the monolith application), and you should think about how to put them into separate processes. Of course, you will need to do a lot of refactoring when moving parts of the monolith to microservices one by one.
Always remember that every microservice is responsible for some sub-domain.
I strongly recommend you to learn Domain Driven Design.
Domain-Driven Design: Tackling Complexity in the Heart of Software by Eric Evans
Implementing Domain-Driven Design by Vaughn Vernon
Also learn the CQRS pattern.
At the beginning you should also decide how your microservices will interact with each other.
There are several options:
Direct calls from one service to another
Send messages through some dispatcher service, which abstracts the client service from the knowledge of where the called (destination) services are located. This approach is similar to how a proxy server like NGINX works.
Interact through some messaging bus (middleware), like RabbitMQ
You can combine these options; for example, Query requests can be processed through a Dispatcher Service, and Commands and Events through a message bus.
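A dispatcher in this sense can be sketched in a few lines of Python. This is only an in-process illustration with made-up service names; a real dispatcher would map logical names to network addresses and forward the request, but the idea is the same: the client knows a name, not a location.

```python
class Dispatcher:
    """Routes a query to whichever handler is registered for a service name."""

    def __init__(self):
        self._routes = {}

    def register(self, service_name, handler):
        self._routes[service_name] = handler

    def query(self, service_name, request):
        try:
            handler = self._routes[service_name]
        except KeyError:
            raise LookupError(
                f"no service registered as {service_name!r}"
            ) from None
        return handler(request)


# Hypothetical destination service; the client never references it directly.
def tours_service(request):
    tours = {"rome": ["Colosseum walk"], "oslo": ["Fjord cruise"]}
    return tours.get(request["city"], [])


dispatcher = Dispatcher()
dispatcher.register("tours", tours_service)
```

A client would then call dispatcher.query("tours", {"city": "rome"}), and moving or replacing the tours service only means registering a different handler under the same name.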
From my experience, the biggest problem will be moving away from the single database that monolith applications usually use.
In addition, some good practices:
Put each microservice in its own repository - this removes the ability to directly use the code of one microservice in another.
You also get faster checkouts and builds of each microservice on CI.
Interactions with any service should occur only through its public contracts.
Aim for each microservice to have its own database.
Example of the sub-domains (bounded contexts) for some Tourism Industry application.
Each bounded context can be serviced by a microservice.
We also started our journey some time back, and I started writing a blog series on exactly the same thing: https://dzone.com/articles/how-i-started-my-journey-in-micro-services-and-how
Basically, what I understood is that to break my problem into different microservices, I need a design framework, which Domain Driven Design gives (see the book Domain-Driven Design Distilled by Vaughn Vernon).
Then, to implement the design (using CQRS, Event Sourcing and so on), I need a framework which provides support for all of the above.
I found Lagom good for this (Eventuate and Spring Microservices are some other choices).
Sample Microservices Domain analysis using Domain Driven Design by Microsoft: https://learn.microsoft.com/en-us/azure/architecture/microservices/domain-analysis
One more analysis is: http://cqrs.nu/tutorial/cs/01-design
After reading up on Domain Driven Design, I think Lagom and the links above will help you build an end-to-end application. If you still have any doubts, please ask :)