We have a monolithic Web API layer in our application with a hundred end points. I am trying to break it into microservices using Azure Service Fabric.
When we break them into multiple services, we may end up having duplicate code.
Example: Let's say we have an Account Services to create an account. And there is a payment service to apply payments to transactions.
In this case, both services need the Customer class/domain. Probably the Account Services need an exhaustive customer with full details, but the payment might need a light weight one.
The question is do we need to copy several domain entities, and other layers like this? Doesn't that create more maintenance issues?
If we don't we end up copying the code and creating different services, one monolithic service same is the existing Web API.
Any thoughts on this?
2ndly, we have some cases where transactions are mentioned today. If we separate them, is there any good design to record failures and rollback without trying too much to maintain transactions?
Breaking a monolith up into proper microservices with appropriate boundaries for your domain is certainly more of an art than a science. The prerequisite to taking on such a task is a thorough understanding of your domain and the interactions within, and you won't get it right the first time. One of points that Evans makes in his book on Domain-Driven Design is that for any sufficiently complex domain, the domain model continually evolves because your understanding of the domain is continually evolving; you will understand it a little better tomorrow than you do today. That said, don't be afraid to start when you have an understanding that is "good enough" and be willing to adapt/evolve your model.
I don't know your domain, but it sounds to me like you need to first figure out in which bounded context Customer primarily belongs. Yes, you want to minimize duplication of domain logic, and though it may not fit completely and neatly into a single service, to the extent that you make one service take primary responsibility for accessing, persisting, manipulating, validating, and ensuring the integrity of a Customer, the better off you'll be.
From your question, I see two possibilities:
The Account Services bounded context is the primary stakeholder in Customer, and Customer has non-trivial ties to other Account Services entities and services. It's difficult to draw clear boundaries around a Customer in isolation. In this case, Customer belongs in the Account Services bounded context.
Customer is an independent enough concept to merit its own microservice. A Customer can stand alone. In this case, Customer belongs in its own bounded context.
In either case, great care should be taken to ensure that the Customer-specific domain logic stays centralized in the Customer microservice behind strong boundaries. Other services might use Customer, or perhaps a light-weight (even read-only) CustomerView, but their interactions should go through the Customer service to the extent that they can.
In your question, you indicate that the Payments bounded context will need access to Customer, but it might just need a light-weight version. It should communicate with the Customer service to get that light-weight object. If, during Payments processing you need to update the Customer's billing address for example, Payments should call into the Customer microservice telling it to update its billing address. Payments need not know anything about how to update a Customer's billing address other than the single API call; any domain logic, validation, firing of domain events, etc... that need to happen as part of that operation are contained within the Customer microservice.
Regarding your second question: it's true that atomic transactions become more complex/difficult in a distributed architecture. Do some reading on the Saga pattern: https://blog.couchbase.com/saga-pattern-implement-business-transactions-using-microservices-part/. Also, Jimmy Bogard is currently in the midst of a blog series called
Life Beyond Distributed Transactions: An Apostate's Implementation that may offer some good insights.
Hope this helps!
Related
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 15 days ago.
Improve this question
I have a monolith application that uses one database, and in my company, we decide to rewrite the application and use microservices in the backend.
At this time, we decided NOT to split the database because other applications and processes are using it, and it takes two years to change.
The difficulty in the process is to decompose and identify the right microservices.
I'll try to explain our system by start describing the UI. Please read carefully because I am trying to explain it in detail.
The system displays the stock market data. The Company or Fund or Fund manager in the market is posting everyday reports about the company's activities like status, information for investors, and more.
"breaking announcement" page
displays a list of today's priority reports. Each row contains the subject from the pdf document (the report) that the company is publishing and the company that belongs to the report:
When the user clicks on the row, we redirect to "report page" and which contains the report details:
In the database, we have entities such report, company, company_report, event, public_offers, upcoming_offering, and more.
So to get the list, we run an inner join query like this:
Select ... From report r inner join
company_report cr on r.reportid=cr.reportid
inner join company c on cr.company_cd=c.company_cd
Where ....
Most of our server endpoints are not changing anything but are only used to retrieve the data.
So I'll create this endpoint /reports/breaking-announcement to get the list, and it returns an object like that:
[{ reportId, subject, createAt, updateAt, pdfUrl, company: { id, name } }]
today's companies report page acts like "breaking announcement" page. but the page displays all the reports from today (not necessarily with priority).
disclosures are reports
On this page, we also have a search to get all reports by cretiria for example to get reports by company name. to do that we have autocomplete so the user types the company name or id.
In other to do that we think it should be API endpoint /companies/by-autocomplete and the response will [{ companyId, companyName, isCompany }].
eft page same as before, but this time we display the Funds report's (not a companys reports).
The list contained the fund name and the subject of the report. each click on the row leads to the report detail page (same page).
On this page we have a search by criteria such date-from date-to, name or id of the funds by autocomplete. and endpoint (/funds/by-autocomplete returns [{ fundId, fundName, ...}]).
foreign etf page same as before, list of items. each item is like before:
<fund name>
<subject of the report>
The query is different.
Okay, this was a very long description. thank you for reading.
Now I want to detect what are the microservices for this application.
I endup with:
Report microservice - responsible for getting and handling all the reports in the system.
which have a endpoints like getall, getbyid, get like getbreakingannouncement, getcompanytodayreports, getfunds, getforeignfunds. the report microservice will make a request to company or funds microservice to join the data from the company and build to the response.
company microservice:
handle all companies data. I mean endpoints such getall, getByIds (for report service), getByAutocomplete.
funds microservice:
handle all funds data. I mean endpoints such getall, getByIds (for report service), getByAutocomplete.
There are other services, such as a notification service or email service. but those are not business services. I want to split up my business logic into microservices in order to deploy and maintain them easily.
I'm not sure I decomposing right. maybe I do. but is fit the microservice ideas? it's fit the Pattern: Decompose by business capability
? if not what are the business capability in my system?
I don't think a query-oriented decomposition of your current application monolith will lead to a good microservice (MS) design. Two of your proposed microserivces have the same end-point query API which suggests to me that you are viewing your first-generation microservices as just entity-servers.
Your idea to perform joins on cross MS query operations indicates these first gen "microservices" are closely coupled and hence fall short of a genuine MS architecture.
One technique to verify an MS design is to ask yourself, "how would the whole system cope if one MS is unavailable for 3 minutes?". Solving that design challenge leads down a path towards decoupled message-base interactions between the microservices. And this in turn leads to interactions between Microservices being expressed as business operations where one MS raises messages that trigger a mutation in the state of another MS.
Maybe you should reduce the scope of your MS ambitions and instead look at Schema Stitching in GraphQL. Reading between the lines of your question I think a more realistic first step towards a distributed system would be to create specialised query services with a GraphQL endpoint.
At this time, we decided NOT to split the database because other applications and processes are using it, and it takes two years to change.
I'll try to stop you right here. In general case shared database is a huge antipattern in microservices architecture and should be avoided as much as possible. There are multiple problems here - less transparent dependencies between services which can cause high coupling with all the consequences in development and deployment, increasing chance to eventually end up with distributed monolith instead of microservices, etc.
Other applications and processes using it should not stop you from moving away from it - there are things which allow to mitigate that - you just sync data between services and "legacy" database (asynchronously using basically the same approaches like you will use in your microservices - for example transaction log tailing for example using something like debezium). It have it's own costs but I would argue that it is usually better to pay them upfront then have to pay bigger percentages on the tech debt.
I endup with: ....
I would argue that this split looks more like decomposition by subdomain then by business capability. Which is actually can be quite fine and suits microservices architecture also.
Based on your description I see at least the following business capabilities in your system that can be defined:
View (manage?) breaking announcements
View (manage?) reports
Search (reports?)
Potentially "today's reports" and "Funds reports" can be considered as separate business capabilities.
I want to split up my business logic into microservices in order to deploy and maintain them easily.
Then again - I highly recommend to reconsider not moving away from shared database.
I'm not sure I decomposing right
Without whole overview of the system including amount of data, data flows, resources available for development and competences in the teams, amount of incoming new business requirements, potential vectors of change, etc. it is hard to actually tell.
P.S.
Note that despite the microservices architecture having a lot of popularity it is not always a right solution for a concrete project to go full-blown microservices. If you have quite small team and/or do not handle high loads/large amount of data with various access patterns then potentially you do not need microservices. You still can leverage a lot of approaches used in the microservices architecture though.
Quick question on Foreign key in Microservices. I already tried looking for answer. But, they did not give me the exact answer I was looking for.
Usecase : Every blog post will have many comments. Traditional monolith will have comments table with foreign key to blog post. However in microservice, we will have two services.
Service 1 : Post Microservie with these table fields (PostID, Name, Content)
Service 2 : Comments Microservie with these table fields (CommentID, PostID, Cpmment)
The question is, Do we need "PostID" in service 2 (Comments Microservice) ? I guess the answer is yes, as we need to know which comment belongs to which post. But then, it will create tight coupling? I mean if I delete service 1(Blog post service), it will impact service 2(Comments service) ?
I'm going to use another example I'm more familiar with to explain how I believe most people would do this.
Consider an Order Management System (OMS) and an Inventory Management System (IMS).
When a customer places an order in the company web site, we ask the OMS to create an order entry in the backend (e.g. via an HTTP endpoint).
The OMS system then broadcasts an event e.g. OrderPlaced containing all the details of the customer order. We may have a pub/sub (e.g. Redis), or a queue (e.g. RabbitMQ), or an event stream (e.g. Kafka) where we place the event (although this can be done in many other ways).
The thing is that we have one or more subscribers interested in this event. One of those could be the IMS, which has the responsibility of assigning the best inventory available every time an order is placed.
We can expect that the IMS will keep a copy of the relevant order information it received when it processed the OrderPlaced event such that it does not ask every little detail of the order to the OMS all the time. So, if the IMS needed a join with the order, instead of calling an endpoint in the Order API, it would probably just do a join with its local copy of the orders table.
Say now that our customer called to cancel her order. A customer service representative then cancelled it in the OMS Web User Interface. At that point an event OrderCanceled is broadcast. Guess who is listening for that event? Correct, the IMS receives notification and acts accordingly reversing the inventory assignation and probably even deleting the order record because it is no longer necessary on this domain.
So, as you can see, the best way to do this is by using events and making copies of the relevant details on the other domain.
Since events need time to get broadcast and processed by interested parties, we say that the order data in the IMS is eventually consistent.
Followup Questions
Q: So, if I understood right in microservises we prefer to duplicate data and get better performance? That is the concept? I mean I know the concept is scaling and flexibility but when we must share data we will just duplicate it?
Not really. That´s definitively not what I meant although it may have sounded like that due to my poor choice of words in the original explanation. It appears to me that at the heart of your question lies a lack of sufficient understanding of the concept of a bounded context.
In my explanation I meant to indicate that the OMS has a domain concept known as the order, but so does the IMS. Therefore, they both have an entity within their domain that represents it. There is a good chance that the order entity in the OMS is much richer than the corresponding representation of the same concept in the IMS.
For example, if the system I was describing was not for retail, but for wholesale, then the same concept of a "sales order" in our system corresponds to the concept of a "purchase order" in that of our customers. So you see, the same data, mapped under a different name, simply because under a different bounded context the data may have a different perspective and meaning.
So, this is the realization that a given concept from our model may be represented in multiple bounded contexts, perhaps from a different perspective and names from our ubiquitous language.
Just to give another example, the OMS needs to know about the customer, but the representation of the idea of a customer in the OMS is probably different than the same representation of such a concept or entity in the CRM. In the OMS the customer's name, email, shipping and billing addresses are probably enough representation of this idea, but for the CRM the customer encompasses much more.
Another example: the IMS needs to know the shipping address of the customer to choose the best inventory (e.g. the one in a facility closest to its final destination), but probably does not care much about the billing address. On the other hand, the billing address is fundamental for the Payment Management System (PMS). So, both the IMS and PMS may have a concept of an "order", it is just that it is not exactly the same, neither it has the same meaning or perspective, even if we store the same data.
One final example: the accounting system cares about the inventory for accounting purposes, to be able to tell how much we own, but perhaps accounting does not care about the specific location of the inventory within the warehouse, that's a detail only the IMS cares about.
In conclusion, I would not say this is about "copying data", this is about appropriately representing a fundamental concept within your bounded context and the realization that some concepts from the model may overlap between systems and have different representations, sometimes even under different names and levels of details. That's why I suggested that you investigate the idea of context mapping some more.
In other words, from my perspective, it would be a mistake to assume that the concept of an "order" only exists in the OMS. I could probably say that the OMS is the master of record of orders and that if something happens to an order we should let other interested systems know about those events since they care about some of that data because those other systems could have mapping concepts related to orders and when reacting to the changes in the master of record, they probably want to change their data as well.
From this point of view, copying some data is a side effect of having a proper design for the bounded context and not a goal in itself.
I hope that answers your question.
Let's say we have a system to store appointments. Each appointment has multiple resources (e.g. trainers, rooms, etc.). We have decided to move all appointment data into an Appointment Service and all resources into a Resources Service.
Now we need a UI that shows filters for the appointments, to filter by trainer. Usually, you only want to display checkboxes for trainers that actually have appointments and not all trainers.
That means we can't really use the Resource Service to get all trainers, instead, we would have to ask the Appointment Service to get a grouped view of all trainers that have at least one appointment. Then we would have to call the Resource Service to get more info about each trainer.
So how do you get grouped data from a microservice?
Edit: Each system has it's own database. We also use RabbitMQ to sync data between services.
This is an interesting question with many possible solutions. #Welbog comment makes a good point about it depending on the scale of the application. Denormalized databases are obviously a possibility.
Getting grouped data is one of the challenges of implementing microservices, and this challenge becomes greater the more granular our services get. What does your database setup look like? I'm assuming your two services are using different databases otherwise your question would have a simple solution.
Without knowing the ins and outs of your system, I would assume that denormalizing your db's would be the path of least resistance.
You could possible explore the idea that maybe these two services should in fact be a single service. Nanoservices are not what we are after, and sometimes it just makes more logical sense for two services to actually be together. Things that must change together, should be contained together. I'm not saying this is applicable in your case, I'm just saying it's worth considering.
I'm certain others will have other ideas, but based on what little I know about the entirety of your system, it's hard to say; however I think this is an interesting question that I will follow to see what other peoples proposed solutions are.
I am learning to develop microservices using DDD, CQRS, and ES. It is HTTP RESTful service. The microservices is about online shop. There are several domains like products, orders, suppliers, customers, and so on. The domains built in separate services. How to do the validation if the command payload relates to other domains?
For example, here is the addOrderItemCommand payload in the order service (command-side).
{
"customerId": "CUST111",
"productId": "SKU222",
"orderId":"SO333"
}
How to validate the command above? How to know that the customer is really exists in database (query-side customer service) and still active? How to know that the product is exists in database and the status of the product is published? How to know whether the customer eligible to get the promo price from the related product?
Is it ok to call API directly (like point-to-point / ajax / request promise) to validate this payload in order command-side service? But I think, the performance will get worse if the API called directly just for validation. Because, we have developed an event processor outside the command-service that listen from the event and apply the event to the materalized view.
Thank you.
As there are more than one bounded contexts that need to be queried for the validation to pass you need to consider eventual consistency. That being said, there is always a chance that the process as a whole can be in an invalid state for a "small" amount of time. For example, the user could be deactivated after the command is accepted and before the order is shipped. An online shop is a complex system and exceptions could appear in any of its subsystems. However, being implemented as an event-driven system helps; every time the ordering process enters an invalid state you can take compensatory actions/commands. For example, if the user is deactivated in the meantime you can cancel all its standing orders, release the reserved products, announce the potential customers that have those products in the wishlist that they are not available and so on.
There are many kinds of validation in DDD but I follow the general rule that the validation should be done as early as possible but without compromising data consistency. So, in order to be early you could query the readmodel to reject the commands that couldn't possible be valid and in order for the system to be consistent you need to make another check just before the order is shipped.
Now let's talk about your specific questions:
How to know that the customer is really exists in database (query-side customer service) and still active?
You can query the readmodel to verify that the user exists and it is still active. You should do this as a command that comes from an invalid user is a strong indication of some kind of attack and you don't want those kind of commands passing through your system. However, even if a command passes this check, it does not necessarily mean that the order will be shipped as other exceptions could be raised in between.
How to know that the product is exists in database and the status of the product is published?
Again, you can query the readmodel in order to notify the user that the product is not available at the moment. Or, depending on your business, you could allow the command to pass if you know that those products will be available in less than 24 hours based on some previous statistics (for example you know that TV sets arrive daily in your stock). Or you could let the customer choose whether it waits or not. In this case, if the products are not in stock at the final phase of the ordering (the shipping) you notify the customer that the products are not in stock anymore.
How to know whether the customer eligible to get the promo price from the related product?
You will probably have to query another bounded context like Promotions BC to check this. This depends on how promotions are validated/used.
Is it ok to call API directly (like point-to-point / ajax / request promise) to validate this payload in order command-side service? But I think, the performance will get worse if the API called directly just for validation.
This depends on how resilient you want your system to be and how fast you want to reject invalid commands.
Synchronous call are simpler to implement but they lead to a less resilient system (you should be aware of cascade failures and use technics like circuit breaker to stop them).
Asynchronous (i.e. using events) calls are harder to implement but make you system more resilient. In order to have async calls, the ordering system can subscribe to other systems for events and maintain a private state that can be queried for validation purposes as the commands arrive. In this way, the ordering system continues to work even of the link to inventory or customer management systems are down.
In any case, it really depends on your business and none of us can tell you exaclty what to do.
As always everything depends on the specifics of the domain but as a general principle cross domain validation should be done via the read model.
In this case, I would maintain a read model within each microservice for use in validation. Of course, that brings with it the question of eventual consistency.
How you handle that should come from your understanding of the domain. Factors such as the length of the eventual consistency compared to the frequency of updates should be considered. The cost of getting it wrong for the business compared to the cost of development to minimise the problem. In many cases, just recording the fact there has been a problem is more than adequate for the business.
I have a blog post dedicated to validation which you can find here: How To Validate Commands in a CQRS Application
I am currently building a microservices-based application developed with the mean stack and am running into several situations where I need to share models between bounded contexts.
As an example, I have a User service that handles the registration process as well as login(generate jwt), logout, etc. I also have an File service which handles the uploading of profile pics and other images the user happens to upload. Additionally, I have an Friends service that keeps track of the associations between members.
Currently, I am adding the guid of the user from the user table used by the User service as well as the first, middle and last name fields to the File table and the Friend table. This way I can query for these fields whenever I need them in the other services(Friend and File) without needing to make any rest calls to get the information every time it is queried.
Here is the caveat:
The downside seems to be that I have to, I chose seneca with rabbitmq, notify the File and Friend tables whenever a user updates their information from the User table.
1) Should I be worried about the services getting too chatty?
2) Could this lead to any performance issues, if alot of updates take place over an hour, let's say?
3) in trying to isolate boundaries, I just am not seeing another way of pulling this off. What is the recommended approach to solving this issue and am I on the right track?
It's a trade off. I would personally not store the user details alongside the user identifier in the dependent services. But neither would I query the users service to get this information. What you probably need is some kind of read-model for the system as a whole, which can store this data in a way which is optimized for your particular needs (reporting, displaying together on a webpage etc).
The read-model is a pattern which is popular in the event-driven architecture space. There is a really good article that talks about these kinds of questions (in two parts):
https://www.infoq.com/articles/microservices-aggregates-events-cqrs-part-1-richardson
https://www.infoq.com/articles/microservices-aggregates-events-cqrs-part-2-richardson
Many common questions about microservices seem to be largely around the decomposition of a domain model, and how to overcome situations where requirements such as querying resist that decomposition. This article spells the options out clearly. Definitely worth the time to read.
In your specific case, it would mean that the File and Friends services would only need to store the primary key for the user. However, all services should publish state changes which can then be aggregated into a read-model.
If you are worry about a high volume of messages and high TPS for example 100,000 TPS for producing and consuming events I suggest that Instead of using RabbitMQ use apache Kafka or NATS (Go version because NATS has Rubby version also) in order to support a high volume of messages per second.
Also Regarding Database design you should design each micro-service base business capabilities and bounded-context according to domain driven design (DDD). so because unlike SOA it is suggested that each micro-service should has its own database then you should not be worried about normalization because you may have to repeat many structures, fields, tables and features for each microservice in order to keep them Decoupled from each other and letting them work independently to raise Availability and having scalability.
Also you can use Event sourcing + CQRS technique or Transaction Log Tailing to circumvent 2PC (2 Phase Commitment) - which is not recommended when implementing microservices - in order to exchange events between your microservices and manipulating states to have Eventual Consistency according to CAP theorem.