How to decompose a monolith into microservices by business capability? [closed]

I have a monolith application that uses one database, and in my company we have decided to rewrite the application and use microservices in the backend.
At this time, we decided NOT to split the database, because other applications and processes are using it and it would take two years to change.
The difficult part of the process is decomposing the system and identifying the right microservices.
I'll try to explain our system, starting with a description of the UI. Please read carefully, because I am trying to explain it in detail.
The system displays stock market data. Companies, funds, and fund managers in the market post daily reports about their activities: status, information for investors, and more.
"breaking announcement" page
displays a list of today's priority reports. Each row contains the subject from the pdf document (the report) that the company is publishing and the company that belongs to the report:
When the user clicks on the row, we redirect to "report page" and which contains the report details:
In the database, we have entities such report, company, company_report, event, public_offers, upcoming_offering, and more.
So to get the list, we run an inner join query like this:
```sql
SELECT ...
FROM report r
INNER JOIN company_report cr ON r.reportid = cr.reportid
INNER JOIN company c ON cr.company_cd = c.company_cd
WHERE ...
```
Most of our server endpoints don't change anything; they are only used to retrieve data.
So I'll create this endpoint, /reports/breaking-announcement, to get the list, and it returns an array of objects like this:
[{ reportId, subject, createAt, updateAt, pdfUrl, company: { id, name } }]
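To make the shape concrete, here is a minimal sketch of what I have in mind for this endpoint, assuming Express; fetchBreakingAnnouncements is a placeholder for the data access code that runs the join above:

```typescript
import express from "express";

// Response shape from above; createAt/updateAt names kept as-is.
interface BreakingAnnouncement {
  reportId: number;
  subject: string;
  createAt: string;
  updateAt: string;
  pdfUrl: string;
  company: { id: string; name: string };
}

// Placeholder for the real data access layer that runs the inner join
// against report/company_report/company shown earlier.
async function fetchBreakingAnnouncements(): Promise<BreakingAnnouncement[]> {
  return [];
}

const app = express();

app.get("/reports/breaking-announcement", async (_req, res) => {
  res.json(await fetchBreakingAnnouncements());
});

app.listen(3000);
```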
The "today's companies reports" page acts like the "breaking announcement" page, but it displays all of today's reports (not necessarily priority ones).
Disclosures are reports.
On this page we also have a search to get all reports by criteria, for example reports by company name. For that we have an autocomplete, so the user types the company name or id.
To support it, we think there should be an API endpoint /companies/by-autocomplete, and the response will be [{ companyId, companyName, isCompany }].
The ETF page is the same as before, but this time we display the funds' reports (not the companies' reports).
The list contains the fund name and the subject of the report. Each click on a row leads to the report detail page (the same page).
On this page we have a search by criteria such as date-from/date-to and the name or id of the fund via autocomplete, with an endpoint (/funds/by-autocomplete that returns [{ fundId, fundName, ... }]).
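A rough sketch of one of these autocomplete endpoints, again assuming Express; the term query parameter and the in-memory list are illustrative stand-ins for the real lookup:

```typescript
import express from "express";

interface CompanySuggestion {
  companyId: string;
  companyName: string;
  isCompany: boolean;
}

// Stand-in for a database lookup; a real implementation would push the
// prefix match into the DB (LIKE 'term%' with an index) or a search engine.
const companies: CompanySuggestion[] = [
  { companyId: "1", companyName: "Acme Holdings", isCompany: true },
];

const app = express();

app.get("/companies/by-autocomplete", (req, res) => {
  const term = String(req.query.term ?? "").toLowerCase();
  res.json(
    companies.filter(
      (c) =>
        c.companyName.toLowerCase().startsWith(term) ||
        c.companyId.startsWith(term)
    )
  );
});

app.listen(3000);
```

The /funds/by-autocomplete endpoint would look the same, just over fund names and ids.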
The foreign ETF page is the same as before: a list of items, where each item looks like before:
<fund name>
<subject of the report>
The query is different.
Okay, that was a very long description; thank you for reading.
Now I want to work out what the microservices for this application are.
I ended up with:
Report microservice - responsible for getting and handling all the reports in the system.
It has endpoints like getall and getbyid, plus specific ones like getbreakingannouncement, getcompanytodayreports, getfunds, getforeignfunds. The report microservice will make a request to the company or funds microservice to join in the company data and build the response (see the sketch after the service list).
company microservice:
handles all company data, with endpoints such as getall, getByIds (for the report service), and getByAutocomplete.
funds microservice:
handles all funds data, with endpoints such as getall, getByIds (for the report service), and getByAutocomplete.
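The join step I mean would look roughly like this (an API-composition sketch; the /companies/by-ids route and the service hostname are made up for illustration):

```typescript
interface Report {
  reportId: number;
  subject: string;
  pdfUrl: string;
  companyId: string;
}

interface Company {
  id: string;
  name: string;
}

// Assumed company-service endpoint returning companies for a set of ids.
async function getCompaniesByIds(ids: string[]): Promise<Company[]> {
  const res = await fetch(
    `http://company-service/companies/by-ids?ids=${ids.join(",")}`
  );
  return res.json() as Promise<Company[]>;
}

// The report service joins its rows with company data in memory.
async function composeReports(reports: Report[]) {
  const ids = [...new Set(reports.map((r) => r.companyId))];
  const companies = await getCompaniesByIds(ids);
  const byId = new Map(companies.map((c) => [c.id, c] as const));
  return reports.map((r) => ({ ...r, company: byId.get(r.companyId) }));
}
```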
There are other services, such as a notification service or an email service, but those are not business services. I want to split up my business logic into microservices in order to deploy and maintain them easily.
I'm not sure I'm decomposing this right; maybe I am. But does it fit the microservice ideas? Does it fit the pattern "Decompose by business capability"? If not, what are the business capabilities in my system?

I don't think a query-oriented decomposition of your current application monolith will lead to a good microservice (MS) design. Two of your proposed microservices have the same endpoint query API, which suggests to me that you are viewing your first-generation microservices as just entity servers.
Your idea to perform joins on cross-MS query operations indicates these first-gen "microservices" are closely coupled and hence fall short of a genuine MS architecture.
One technique to verify an MS design is to ask yourself, "how would the whole system cope if one MS were unavailable for 3 minutes?". Solving that design challenge leads down a path towards decoupled, message-based interactions between the microservices. This in turn leads to interactions between microservices being expressed as business operations, where one MS raises messages that trigger a mutation in the state of another MS.
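To make that concrete, here is a minimal sketch (assuming RabbitMQ via amqplib; the exchange, queue, and event names are illustrative) of a report service that keeps its own local copy of company data by consuming events, so its reads survive a company-service outage:

```typescript
import amqp from "amqplib";

// Local copy of just the company fields the report service needs.
const localCompanies = new Map<string, { id: string; name: string }>();

async function subscribeToCompanyEvents() {
  const conn = await amqp.connect("amqp://localhost");
  const ch = await conn.createChannel();
  await ch.assertExchange("company-events", "topic", { durable: true });
  const { queue } = await ch.assertQueue("report-service.company-updates", {
    durable: true,
  });
  await ch.bindQueue(queue, "company-events", "company.updated");
  await ch.consume(queue, (msg) => {
    if (!msg) return;
    const company = JSON.parse(msg.content.toString());
    localCompanies.set(company.id, company); // mutate local state
    ch.ack(msg);
  });
}
```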
Maybe you should reduce the scope of your MS ambitions and instead look at schema stitching in GraphQL. Reading between the lines of your question, I think a more realistic first step towards a distributed system would be to create specialised query services with a GraphQL endpoint.
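Stitching two query services could start out as small as this sketch (using @graphql-tools; the type and field names are illustrative, and resolvers are omitted):

```typescript
import { makeExecutableSchema } from "@graphql-tools/schema";
import { stitchSchemas } from "@graphql-tools/stitch";

const reportsSchema = makeExecutableSchema({
  typeDefs: /* GraphQL */ `
    type Report { reportId: ID! subject: String companyId: ID }
    type Query { breakingAnnouncements: [Report] }
  `,
});

const companiesSchema = makeExecutableSchema({
  typeDefs: /* GraphQL */ `
    type Company { id: ID! name: String }
    type Query { companiesByIds(ids: [ID!]!): [Company] }
  `,
});

// One gateway schema spanning both query services.
export const gatewaySchema = stitchSchemas({
  subschemas: [{ schema: reportsSchema }, { schema: companiesSchema }],
});
```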

At this time, we decided NOT to split the database, because other applications and processes are using it and it would take two years to change.
I'll try to stop you right here. In the general case, a shared database is a huge antipattern in a microservices architecture and should be avoided as much as possible. There are multiple problems here: less transparent dependencies between services, which can cause high coupling with all the consequences for development and deployment; an increasing chance of eventually ending up with a distributed monolith instead of microservices; etc.
Other applications and processes using it should not stop you from moving away from it - there are ways to mitigate that: you sync data between the services and the "legacy" database asynchronously, using basically the same approaches you will use between your microservices, for example transaction log tailing with something like Debezium. That has its own costs, but I would argue that it is usually better to pay them upfront than to keep paying growing interest on the tech debt.
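As an illustration only (the topic name and message shape are assumptions; Debezium's actual envelope depends on connector configuration), the consuming side of such a sync could look like this with kafkajs:

```typescript
import { Kafka } from "kafkajs";

const kafka = new Kafka({
  clientId: "report-sync",
  brokers: ["localhost:9092"],
});

// Placeholder for writing into the microservice's own store.
async function applyToServiceDatabase(row: unknown) {
  /* upsert into the service's database */
}

async function syncLegacyDbChanges() {
  const consumer = kafka.consumer({ groupId: "report-service" });
  await consumer.connect();
  await consumer.subscribe({ topic: "legacydb.public.report" });
  await consumer.run({
    eachMessage: async ({ message }) => {
      if (!message.value) return;
      const change = JSON.parse(message.value.toString());
      // In Debezium's default envelope, payload.after holds the new row state.
      await applyToServiceDatabase(change.payload?.after);
    },
  });
}
```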
I ended up with: ....
I would argue that this split looks more like decomposition by subdomain than by business capability, which can actually be quite fine and also suits a microservices architecture.
Based on your description I see at least the following business capabilities in your system that can be defined:
View (manage?) breaking announcements
View (manage?) reports
Search (reports?)
Potentially "today's reports" and "Funds reports" can be considered as separate business capabilities.
I want to split up my business logic into microservices in order to deploy and maintain them easily.
Then again, I highly recommend reconsidering the decision not to move away from the shared database.
I'm not sure I'm decomposing this right
Without an overview of the whole system - including the amount of data, the data flows, the resources available for development, the competences of the teams, the amount of incoming new business requirements, potential vectors of change, etc. - it is hard to tell.
P.S.
Note that despite the popularity of the microservices architecture, going full-blown microservices is not always the right solution for a concrete project. If you have a quite small team and/or do not handle high loads or large amounts of data with varied access patterns, then you potentially do not need microservices. You can still leverage a lot of the approaches used in the microservices architecture, though.

Related

Attribute Based Access Control (ABAC) in a microservices architecture for lists of resources

I am investigating options to build a system to provide "Entity Access Control" across a microservices-based architecture to restrict access to certain data based on the requesting user. A full Role Based Access Control (RBAC) system has already been implemented to restrict certain actions (based on API endpoints); however, nothing has been implemented to restrict those actions against one data entity over another. Hence the desire for an Attribute Based Access Control (ABAC) system.
Given the requirement for the system to be fit for purpose, and my own priority of following best practice by keeping security logic in a single location, I decided to create an externalised "Entity Access Control" (EAC) API.
The end result of my design was something similar to the following image I have seen floating around (I think from axiomatics.com)
The problem is that the whole thing falls over the moment you start talking about an API that responds with a list of results.
E.g. a /api/customers endpoint on a Customers API that takes parameters such as a query filter, sort, order, and limit/offset values to facilitate pagination, and returns a list of customers to a front end. How do you then also provide ABAC on each of these entities in a microservices landscape?
Terrible solutions to the above problem tested so far:
Get the first page of results, send all of those to the EAC API, get the responses, drop the rejected ones, get more customers from the DB, check those... and repeat until you either fill a page of results or run out of customers in the DB. Tested: for 14,000 records (which is absolutely within reason in my situation) it would take 30 seconds to get an API response for someone who had zero permission to view any customers.
On every request to the all-customers endpoint, send a request to the EAC API for every customer available to the original requesting user. Tested: for 14,000 records, the response payload would be over half a megabyte for someone who had permission to view all customers. I could split it into multiple requests, but then you are just trading payload size for request spam, and the performance penalty doesn't go anywhere.
Give up on the ability to view multiple records in a list. This totally breaks the API's usefulness for customer needs.
Store all the data and logic required to perform the ABAC controls in each API. This is fraught with danger and basically guaranteed to fail in a way that is beyond my risk appetite considering the domain I am working within.
Note: I tested with 14,000 records only because that's the benchmark of our current state of data. It is entirely feasible that a single API could serve 100,000 or 1m records, so anything that involves iterating over the whole data set, or transferring the whole data set over the wire, is entirely unsustainable.
So here lies the question: how do you implement an externalised ABAC system in a microservices architecture (as per the diagram) whilst also being able to service requests that respond with multiple entities, with a query filter, sort, order, and limit/offset values to facilitate pagination?
After dozens of hours of research, it was decided that this is an entirely unsolvable problem and is simply a side effect of microservices (and more importantly, segregated entity storage).
If you want the benefits of a maintainable (as in single piece of externalised infrastructure) entity level attribute access control system, a monolithic approach to entity storage is required. You cannot simultaneously reap the benefits of microservices.

How to get grouped data from a microservice?

Let's say we have a system to store appointments. Each appointment has multiple resources (e.g. trainers, rooms, etc.). We have decided to move all appointment data into an Appointment Service and all resources into a Resources Service.
Now we need a UI that shows filters for the appointments, to filter by trainer. Usually, you only want to display checkboxes for trainers that actually have appointments and not all trainers.
That means we can't really use the Resource Service to get all trainers; instead, we would have to ask the Appointment Service for a grouped view of all trainers that have at least one appointment. Then we would have to call the Resource Service to get more info about each trainer.
So how do you get grouped data from a microservice?
Edit: Each system has its own database. We also use RabbitMQ to sync data between services.
This is an interesting question with many possible solutions. @Welbog's comment makes a good point about it depending on the scale of the application. Denormalized databases are obviously a possibility.
Getting grouped data is one of the challenges of implementing microservices, and this challenge becomes greater the more granular our services get. What does your database setup look like? I'm assuming your two services are using different databases; otherwise your question would have a simple solution.
Without knowing the ins and outs of your system, I would assume that denormalizing your DBs would be the path of least resistance.
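For instance, the Appointment Service could keep a denormalized copy of the trainer's display name, refreshed by the RabbitMQ events you already have. A minimal sketch (event names and shapes are assumptions):

```typescript
interface AppointmentRow {
  appointmentId: string;
  trainerId: string;
  trainerName: string; // denormalized from the Resource Service
}

const appointments: AppointmentRow[] = [];

// The filter UI's question - distinct trainers with at least one
// appointment - is answered from the Appointment Service's own data.
function trainersWithAppointments() {
  const seen = new Map<string, string>();
  for (const a of appointments) seen.set(a.trainerId, a.trainerName);
  return [...seen].map(([trainerId, trainerName]) => ({ trainerId, trainerName }));
}

// Handler for a hypothetical "trainer.updated" event keeps names fresh.
function onTrainerUpdated(evt: { trainerId: string; name: string }) {
  for (const a of appointments) {
    if (a.trainerId === evt.trainerId) a.trainerName = evt.name;
  }
}
```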
You could possibly explore the idea that maybe these two services should in fact be a single service. Nanoservices are not what we are after, and sometimes it just makes more logical sense for two services to be together. Things that must change together should be contained together. I'm not saying this is applicable in your case; I'm just saying it's worth considering.
I'm certain others will have other ideas, but based on what little I know about the entirety of your system it's hard to say; however, I think this is an interesting question and I will follow it to see what solutions other people propose.

Monolithic Web API to microservice design

We have a monolithic Web API layer in our application with a hundred endpoints. I am trying to break it into microservices using Azure Service Fabric.
When we break them into multiple services, we may end up having duplicate code.
Example: Let's say we have an Account Services to create an account. And there is a payment service to apply payments to transactions.
In this case, both services need the Customer class/domain. The Account Services probably need an exhaustive customer with full details, but the payment service might need a lightweight one.
The question is: do we need to copy domain entities, and other layers, like this several times? Doesn't that create more maintenance issues?
If we don't copy the code and create separate services, we end up with one monolithic service, the same as the existing Web API.
Any thoughts on this?
Secondly, we have some cases where transactions are involved today. If we separate them, is there any good design for recording failures and rolling back, without trying too hard to maintain distributed transactions?
Breaking a monolith up into proper microservices with appropriate boundaries for your domain is certainly more of an art than a science. The prerequisite to taking on such a task is a thorough understanding of your domain and the interactions within it, and you won't get it right the first time. One of the points that Evans makes in his book on Domain-Driven Design is that for any sufficiently complex domain, the domain model continually evolves, because your understanding of the domain is continually evolving; you will understand it a little better tomorrow than you do today. That said, don't be afraid to start when you have an understanding that is "good enough", and be willing to adapt/evolve your model.
I don't know your domain, but it sounds to me like you need to first figure out in which bounded context Customer primarily belongs. Yes, you want to minimize duplication of domain logic, and though it may not fit completely and neatly into a single service, to the extent that you make one service take primary responsibility for accessing, persisting, manipulating, validating, and ensuring the integrity of a Customer, the better off you'll be.
From your question, I see two possibilities:
The Account Services bounded context is the primary stakeholder in Customer, and Customer has non-trivial ties to other Account Services entities and services. It's difficult to draw clear boundaries around a Customer in isolation. In this case, Customer belongs in the Account Services bounded context.
Customer is an independent enough concept to merit its own microservice. A Customer can stand alone. In this case, Customer belongs in its own bounded context.
In either case, great care should be taken to ensure that the Customer-specific domain logic stays centralized in the Customer microservice behind strong boundaries. Other services might use Customer, or perhaps a light-weight (even read-only) CustomerView, but their interactions should go through the Customer service to the extent that they can.
In your question, you indicate that the Payments bounded context will need access to Customer, but it might just need a lightweight version. It should communicate with the Customer service to get that lightweight object. If, during Payments processing, you need to update the Customer's billing address, for example, Payments should call into the Customer microservice telling it to update its billing address. Payments need not know anything about how to update a Customer's billing address other than the single API call; any domain logic, validation, firing of domain events, etc. that needs to happen as part of that operation is contained within the Customer microservice.
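In code, that boundary might look something like this sketch (the types, routes, and the CustomerView name are all illustrative, not a prescribed API):

```typescript
// A read-only projection exposing just what Payments needs.
interface CustomerView {
  customerId: string;
  displayName: string;
}

async function getCustomerView(customerId: string): Promise<CustomerView> {
  const res = await fetch(`http://customer-service/customers/${customerId}/view`);
  return res.json() as Promise<CustomerView>;
}

// Payments never mutates customer data itself; it asks the Customer
// service to do it, so validation and domain events stay behind that API.
async function updateBillingAddress(customerId: string, address: object) {
  await fetch(
    `http://customer-service/customers/${customerId}/billing-address`,
    {
      method: "PUT",
      headers: { "content-type": "application/json" },
      body: JSON.stringify(address),
    }
  );
}
```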
Regarding your second question: it's true that atomic transactions become more complex/difficult in a distributed architecture. Do some reading on the Saga pattern: https://blog.couchbase.com/saga-pattern-implement-business-transactions-using-microservices-part/. Also, Jimmy Bogard is currently in the midst of a blog series called "Life Beyond Distributed Transactions: An Apostate's Implementation" that may offer some good insights.
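The core shape of an orchestrated saga is small enough to sketch: each step pairs an action with a compensating action that undoes it if a later step fails (all functions here are hypothetical placeholders):

```typescript
interface SagaStep {
  run: () => Promise<void>;
  compensate: () => Promise<void>; // undoes run()
}

async function runSaga(steps: SagaStep[]) {
  const completed: SagaStep[] = [];
  try {
    for (const step of steps) {
      await step.run();
      completed.push(step);
    }
  } catch (err) {
    // Roll back in reverse order; real systems also persist progress
    // so compensation can resume after a crash.
    for (const step of completed.reverse()) {
      await step.compensate();
    }
    throw err;
  }
}
```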
Hope this helps!

Microservices: model sharing between bounded contexts

I am currently building a microservices-based application developed with the MEAN stack, and I am running into several situations where I need to share models between bounded contexts.
As an example, I have a User service that handles the registration process as well as login (generating a JWT), logout, etc. I also have a File service, which handles the uploading of profile pics and other images the user happens to upload. Additionally, I have a Friends service that keeps track of the associations between members.
Currently, I am adding the GUID of the user from the user table used by the User service, as well as the first, middle and last name fields, to the File table and the Friend table. This way I can query these fields whenever I need them in the other services (Friend and File) without needing to make any REST calls to get the information every time it is queried.
Here is the caveat:
The downside seems to be that I have to notify the File and Friend tables (I chose Seneca with RabbitMQ) whenever a user updates their information in the User table.
1) Should I be worried about the services getting too chatty?
2) Could this lead to any performance issues if a lot of updates take place over an hour, let's say?
3) In trying to isolate boundaries, I just am not seeing another way of pulling this off. What is the recommended approach to solving this issue, and am I on the right track?
It's a trade-off. I would personally not store the user details alongside the user identifier in the dependent services. But neither would I query the User service to get this information. What you probably need is some kind of read model for the system as a whole, which can store this data in a way that is optimized for your particular needs (reporting, displaying together on a web page, etc.).
The read model is a pattern that is popular in the event-driven architecture space. There is a really good article that talks about these kinds of questions (in two parts):
https://www.infoq.com/articles/microservices-aggregates-events-cqrs-part-1-richardson
https://www.infoq.com/articles/microservices-aggregates-events-cqrs-part-2-richardson
Many common questions about microservices seem to be largely about the decomposition of a domain model, and how to overcome situations where requirements such as querying resist that decomposition. This article spells the options out clearly. It is definitely worth the time to read.
In your specific case, it would mean that the File and Friends services would only need to store the primary key of the user. However, all services should publish state changes, which can then be aggregated into a read model.
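The publishing side can stay very small. A sketch using plain amqplib rather than Seneca, with illustrative exchange and event names:

```typescript
import amqp from "amqplib";

// Publish a "user.updated" event; read-model builders subscribe to it.
// A real service would reuse one long-lived connection/channel.
async function publishUserUpdated(user: {
  guid: string;
  firstName: string;
  lastName: string;
}) {
  const conn = await amqp.connect("amqp://localhost");
  const ch = await conn.createChannel();
  await ch.assertExchange("user-events", "topic", { durable: true });
  ch.publish("user-events", "user.updated", Buffer.from(JSON.stringify(user)));
  await ch.close();
  await conn.close();
}
```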
If you are worried about a high volume of messages and high TPS - for example, 100,000 TPS for producing and consuming events - I suggest that instead of RabbitMQ you use Apache Kafka or NATS (the Go version; NATS also has a Ruby version) in order to support a high volume of messages per second.
Also, regarding database design, you should design each microservice around business capabilities and bounded contexts according to domain-driven design (DDD). Because, unlike in SOA, it is suggested that each microservice has its own database, you should not be worried about normalization: you may have to repeat many structures, fields, tables and features across microservices in order to keep them decoupled from each other and let them work independently, which raises availability and enables scalability.
You can also use event sourcing + CQRS, or transaction log tailing, to circumvent 2PC (two-phase commit) - which is not recommended when implementing microservices - in order to exchange events between your microservices and manipulate state with eventual consistency, as framed by the CAP theorem.

Access and scheduling of FHIR Questionnaire resource

I am trying to understand how to use the FHIR Questionnaire resource, and have a specific question regarding this.
My project is specifically about how a citizen in our country could respond to Questionnaires via a web app; the answers are then submitted to the FHIR server as QuestionnaireAnswers, to be read/analyzed by a health professional.
A FHIR-based system could have lots of Questionnaires (Qs); groups of Qs, or even specific Qs, could be targeted at certain users or groups of users. The display of a questionnaire to the citizen could also be based on a care plan of sorts, for example certain Questionnaires needing to be filled in in the weeks after surgery. The Questionnaires could also be regular ones that need to be filled in every day or week permanently, to support data collection on the state of a chronic disease.
What I'm wondering is whether FHIR has a resource that fits the 'logistics' of displaying the right form to the right person. I can see CarePlan, which seems to partly fit. Or is this something that would typically be handled outside FHIR's scope, by specific server implementations?
So, to summarize:
Which resource or mechanism would a health professional use to specify that a patient should answer certain Questionnaires, either regularly or as part of, for example, a follow-up after surgery? This would include setting up the schedule for the form(s) to be filled in, and possibly configuring what would happen if a form wasn't filled in as required.
Which resource (possibly the same one) or mechanism would the patient's web app use to retrieve the relevant Questionnaire(s) at a given point in time?
At the moment, the best resource for saying "please capture data of type X on schedule Y" would be DiagnosticOrder, though the description probably doesn't make that clear. (If you'd be willing to click the "Propose a change" link and submit a change request for us to clarify, that'd be great.) If you wanted to order multiple questionnaires, then CarePlan would be a way to group that.
The process of taking a complex schedule (or set of schedules) and turning that into a simple list of "do this now" requests that might be more suitable for a mobile application to deal with is scheduled for DSTU 2.1. Until then, you have a few options for the mobile app:
- have it look at the CarePlan and complex DiagnosticOrder schedule and figure things out itself
- have a server generate a List of mini 1-time DiagnosticOrders and/or Orders identifying the specific "answer" times
- roll your own mechanism using the Other/Basic resource
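Whichever of those options you pick for the scheduling logic, the retrieval step in the web app can stay simple. A sketch, assuming a server exposing the standard FHIR REST API (the base URL, resource id, and the DSTU2-era media type are placeholders/assumptions):

```typescript
const FHIR_BASE = "https://fhir.example.org"; // placeholder server

async function getQuestionnaire(id: string) {
  const res = await fetch(`${FHIR_BASE}/Questionnaire/${id}`, {
    headers: { accept: "application/json+fhir" }, // DSTU2-era media type
  });
  if (!res.ok) throw new Error(`FHIR server returned ${res.status}`);
  return res.json(); // the Questionnaire resource as JSON
}
```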
Depending on your timelines, you might want to stay tuned to discussions by the Patient Care and Orders and Observations work groups as they start dealing with the issues around workflow management starting next month in Atlanta.
