Microservices: model sharing between bounded contexts - MEAN stack

I am currently building a microservices-based application with the MEAN stack and am running into several situations where I need to share models between bounded contexts.
As an example, I have a User service that handles the registration process as well as login (generating a JWT), logout, etc. I also have a File service which handles the uploading of profile pics and other images the user happens to upload. Additionally, I have a Friends service that keeps track of the associations between members.
Currently, I am adding the GUID of the user from the user table used by the User service, as well as the first, middle, and last name fields, to the File table and the Friend table. This way I can query for these fields whenever I need them in the other services (Friend and File) without needing to make REST calls to get the information every time it is queried.
Here is the caveat:
The downside seems to be that I have to notify the File and Friend tables (I chose Seneca with RabbitMQ) whenever a user updates their information in the User table.
1) Should I be worried about the services getting too chatty?
2) Could this lead to any performance issues if a lot of updates take place over an hour, say?
3) In trying to isolate boundaries, I just am not seeing another way of pulling this off. What is the recommended approach to solving this issue, and am I on the right track?

It's a trade-off. I would personally not store the user details alongside the user identifier in the dependent services. But neither would I query the User service to get this information. What you probably need is some kind of read model for the system as a whole, which can store this data in a way that is optimized for your particular needs (reporting, displaying together on a web page, etc.).
The read-model is a pattern which is popular in the event-driven architecture space. There is a really good article that talks about these kinds of questions (in two parts):
https://www.infoq.com/articles/microservices-aggregates-events-cqrs-part-1-richardson
https://www.infoq.com/articles/microservices-aggregates-events-cqrs-part-2-richardson
Many common questions about microservices seem to be largely around the decomposition of a domain model, and how to overcome situations where requirements such as querying resist that decomposition. This article spells the options out clearly. Definitely worth the time to read.
In your specific case, it would mean that the File and Friends services would only need to store the primary key for the user. However, all services should publish state changes which can then be aggregated into a read-model.
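For illustration, here is a minimal sketch of such a read-model consumer, assuming Node.js with amqplib against RabbitMQ (the question mentions Seneca/RabbitMQ); the exchange and queue names, event shape, and upsert helper are all hypothetical:

```typescript
import amqp from "amqplib";

// Hypothetical event shape published by the User service on every profile change.
interface UserUpdated {
  userId: string;      // the only field the File/Friends services persist
  firstName: string;
  middleName: string;
  lastName: string;
}

async function runReadModelConsumer() {
  const conn = await amqp.connect("amqp://localhost");
  const channel = await conn.createChannel();

  // Assume the User service publishes to a fanout exchange; the read model
  // binds its own durable queue so it never misses an update.
  await channel.assertExchange("user-events", "fanout", { durable: true });
  const { queue } = await channel.assertQueue("read-model.users", { durable: true });
  await channel.bindQueue(queue, "user-events", "");

  await channel.consume(queue, async (msg) => {
    if (!msg) return;
    const event: UserUpdated = JSON.parse(msg.content.toString());
    // Project the change into a denormalized view optimized for display/reporting.
    await upsertUserView(event); // hypothetical persistence helper
    channel.ack(msg);
  });
}

// Placeholder: writes the projection into whatever store backs the read model.
async function upsertUserView(event: UserUpdated): Promise<void> {
  /* e.g. db.collection("user_view").updateOne({ _id: event.userId }, ...) */
}
```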

If you are worried about a high volume of messages and high TPS (for example, 100,000 TPS for producing and consuming events), I suggest that instead of RabbitMQ you use Apache Kafka or NATS (the Go version, since NATS also has a Ruby version) in order to support a high volume of messages per second.
Regarding database design, you should design each microservice around business capabilities and bounded contexts according to domain-driven design (DDD). Unlike in SOA, it is suggested that each microservice have its own database, so you should not worry about normalization: you may have to repeat many structures, fields, tables, and features across microservices in order to keep them decoupled from each other and let them work independently, which raises availability and enables scalability.
You can also use event sourcing + CQRS or transaction log tailing to circumvent 2PC (two-phase commit), which is not recommended when implementing microservices, in order to exchange events between your microservices and update state with eventual consistency, per the CAP theorem.
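For example, a minimal producer sketch with kafkajs (the broker address, topic name, and event shape are assumptions):

```typescript
import { Kafka } from "kafkajs";

// Hypothetical broker address and topic name.
const kafka = new Kafka({ clientId: "user-service", brokers: ["localhost:9092"] });
const producer = kafka.producer();

// Publish a state-change event; keying by userId keeps all events for one
// user in order within a single partition.
export async function publishUserUpdated(userId: string, payload: object) {
  await producer.connect();
  await producer.send({
    topic: "user-events",
    messages: [
      { key: userId, value: JSON.stringify({ type: "UserUpdated", userId, ...payload }) },
    ],
  });
}
```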

Related

How to decompose a monolith into microservices by business capability?

I have a monolith application that uses one database, and in my company we have decided to rewrite the application and use microservices in the backend.
At this time, we decided NOT to split the database, because other applications and processes are using it, and it would take two years to change.
The difficult part of the process is decomposing the system and identifying the right microservices.
I'll try to explain our system by starting with a description of the UI. Please read carefully, because I am trying to explain it in detail.
The system displays stock market data. Companies, funds, and fund managers in the market post daily reports about their activities: status, information for investors, and more.
"breaking announcement" page
displays a list of today's priority reports. Each row contains the subject from the pdf document (the report) that the company is publishing and the company that belongs to the report:
When the user clicks on the row, we redirect to "report page" and which contains the report details:
In the database, we have entities such as report, company, company_report, event, public_offers, upcoming_offering, and more.
So to get the list, we run an inner join query like this:
SELECT ...
FROM report r
INNER JOIN company_report cr ON r.reportid = cr.reportid
INNER JOIN company c ON cr.company_cd = c.company_cd
WHERE ...
Most of our server endpoints are not changing anything but are only used to retrieve the data.
So I'll create this endpoint /reports/breaking-announcement to get the list, and it returns an object like this:
[{ reportId, subject, createAt, updateAt, pdfUrl, company: { id, name } }]
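As a sketch of what that endpoint handler might look like (assuming Express and a generic db.query helper; the column names and the priority filter are hypothetical):

```typescript
import express from "express";

// Hypothetical data-access helper; stands in for whatever driver/ORM is used.
declare const db: { query(sql: string): Promise<any[]> };

const app = express();

app.get("/reports/breaking-announcement", async (_req, res) => {
  // Same inner join as above, restricted to today's priority reports.
  const rows = await db.query(
    `SELECT r.reportid, r.subject, r.create_at, r.update_at, r.pdf_url,
            c.company_cd, c.company_name
       FROM report r
       INNER JOIN company_report cr ON r.reportid = cr.reportid
       INNER JOIN company c ON cr.company_cd = c.company_cd
      WHERE r.priority = 1 AND r.create_at >= CURRENT_DATE`
  );
  // Map rows into the response shape from the question.
  res.json(
    rows.map((r) => ({
      reportId: r.reportid,
      subject: r.subject,
      createAt: r.create_at,
      updateAt: r.update_at,
      pdfUrl: r.pdf_url,
      company: { id: r.company_cd, name: r.company_name },
    }))
  );
});

app.listen(3000);
```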
"today's company reports" page
Acts like the "breaking announcement" page, but displays all of today's reports (not necessarily priority ones). Disclosures are reports.
On this page we also have a search to get all reports by criteria, for example by company name. To do that we have autocomplete, so the user types the company name or ID.
To support that, we think there should be an API endpoint /companies/by-autocomplete whose response will be [{ companyId, companyName, isCompany }].
ETF page
Same as before, but this time we display the funds' reports (not a company's reports).
The list contains the fund name and the subject of the report. Each click on a row leads to the report detail page (the same page).
On this page we have a search by criteria such as date-from, date-to, and fund name or ID via autocomplete, with an endpoint (/funds/by-autocomplete returning [{ fundId, fundName, ...}]).
foreign ETF page
Same as before: a list of items, each like before:
<fund name>
<subject of the report>
Only the query is different.
Okay, this was a very long description. Thank you for reading.
Now I want to identify the microservices for this application.
I ended up with:
Report microservice - responsible for getting and handling all the reports in the system.
It has endpoints like getAll, getById, and specific getters such as getBreakingAnnouncement, getCompanyTodayReports, getFunds, getForeignFunds. The report microservice will make a request to the company or funds microservice to join the company data into the response.
Company microservice:
Handles all company data, with endpoints such as getAll, getByIds (for the report service), getByAutocomplete.
Funds microservice:
Handles all funds data, with endpoints such as getAll, getByIds (for the report service), getByAutocomplete.
There are other services, such as a notification service or an email service, but those are not business services. I want to split my business logic into microservices in order to deploy and maintain them easily.
I'm not sure I'm decomposing this right. Maybe I am, but does it fit the microservices idea? Does it fit the pattern "Decompose by business capability"? If not, what are the business capabilities in my system?
I don't think a query-oriented decomposition of your current application monolith will lead to a good microservice (MS) design. Two of your proposed microservices have the same endpoint query API, which suggests to me that you are viewing your first-generation microservices as just entity servers.
Your idea to perform joins on cross-MS query operations indicates these first-generation "microservices" are closely coupled and hence fall short of a genuine MS architecture.
One technique to verify an MS design is to ask yourself, "how would the whole system cope if one MS were unavailable for 3 minutes?". Solving that design challenge leads down a path towards decoupled, message-based interactions between the microservices. This in turn leads to interactions between microservices being expressed as business operations, where one MS raises messages that trigger a state mutation in another MS.
Maybe you should reduce the scope of your MS ambitions and instead look at schema stitching in GraphQL. Reading between the lines of your question, I think a more realistic first step towards a distributed system would be to create specialised query services with a GraphQL endpoint.
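A minimal stitching sketch with @graphql-tools (the subschemas here are local stand-ins with hypothetical type and field names; in practice each would wrap a remote service endpoint):

```typescript
import { makeExecutableSchema } from "@graphql-tools/schema";
import { stitchSchemas } from "@graphql-tools/stitch";

// Local stand-in for the reports service.
const reportsSchema = makeExecutableSchema({
  typeDefs: `
    type Report { reportId: ID!, subject: String, companyId: ID! }
    type Query { breakingAnnouncements: [Report!]! }
  `,
  resolvers: { Query: { breakingAnnouncements: () => [] /* query the reports store */ } },
});

// Local stand-in for the companies service.
const companiesSchema = makeExecutableSchema({
  typeDefs: `
    type Company { id: ID!, name: String! }
    type Query { companyById(id: ID!): Company }
  `,
  resolvers: { Query: { companyById: () => null /* query the companies store */ } },
});

// One stitched gateway schema: a client can fetch reports and the owning
// companies in a single query even though the data lives behind two services.
export const gatewaySchema = stitchSchemas({
  subschemas: [reportsSchema, companiesSchema],
});
```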
At this time, we decided NOT to split the database because other applications and processes are using it, and it takes two years to change.
I'll try to stop you right here. In the general case, a shared database is a huge antipattern in a microservices architecture and should be avoided as much as possible. There are multiple problems here: less transparent dependencies between services, which can cause high coupling with all its consequences for development and deployment; an increased chance of eventually ending up with a distributed monolith instead of microservices; etc.
Other applications and processes using the database should not stop you from moving away from it - there are ways to mitigate that: you sync data between your services and the "legacy" database asynchronously, using basically the same approaches you will use between your microservices - for example, transaction log tailing with something like Debezium. It has its own costs, but I would argue it is usually better to pay them upfront than to keep paying growing interest on the tech debt.
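A sketch of the consuming side, assuming Debezium publishes row changes from the shared database to Kafka with its default one-topic-per-table naming and before/after envelope (the topic name and helper are hypothetical):

```typescript
import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "report-sync", brokers: ["localhost:9092"] });
const consumer = kafka.consumer({ groupId: "report-service-sync" });

async function syncFromLegacyDb() {
  await consumer.connect();
  // Hypothetical topic: <connector server name>.<schema>.<table>.
  await consumer.subscribe({ topic: "legacy-db.public.report", fromBeginning: true });
  await consumer.run({
    eachMessage: async ({ message }) => {
      if (!message.value) return;
      const change = JSON.parse(message.value.toString());
      // Debezium change events carry before/after row images; `after` is
      // null for deletes.
      const row = change.payload?.after;
      // Apply the change to this service's own store, keeping it in sync
      // with the shared database without coupling to it at query time.
      if (row) await upsertLocalReport(row);
    },
  });
}

// Placeholder: writes into this microservice's own database.
async function upsertLocalReport(row: unknown): Promise<void> {}
```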
I ended up with: ...
I would argue that this split looks more like decomposition by subdomain than by business capability, which can actually be quite fine and also suits a microservices architecture.
Based on your description I see at least the following business capabilities in your system that can be defined:
View (manage?) breaking announcements
View (manage?) reports
Search (reports?)
Potentially "today's reports" and "Funds reports" can be considered as separate business capabilities.
I want to split up my business logic into microservices in order to deploy and maintain them easily.
Then again, I highly recommend reconsidering the decision not to move away from the shared database.
I'm not sure I'm decomposing this right
Without a full overview of the system - including the amount of data, data flows, resources available for development, competences in the teams, the amount of incoming new business requirements, potential vectors of change, etc. - it is hard to actually tell.
P.S.
Note that despite its popularity, the microservices architecture is not always the right solution for a concrete project. If you have quite a small team and/or do not handle high loads or large amounts of data with various access patterns, then you potentially do not need to go full-blown microservices. You can still leverage a lot of the approaches used in the microservices architecture, though.

Eventually consistent DB: How to deal with relational data?

So let's say we have microservices that use an event broker to communicate with each other.
To preserve data sovereignty, each microservice has denormalized documents.
So whenever the data changes, the service that changed it fires a 'DataAHasChanged' event. Then all the microservices subscribed to this event update the documents they hold, to maintain consistency of data A. (A here is not a foreign key but the actual data, since it's denormalized.)
This seems really bad to me if services have multiple documents containing data A, and if data A changes often. I would rather just make an API call to the other services, using data A's ID as a foreign key.
A real-world use case would be:
A user creates 'contract requests', each containing information on multiple vendors.
Vendor information changes often.
So if there are 2000 contract requests, whenever a vendor changes their information we have to go through every contract request and update the denormalized document.
Is eventual consistency still the best practice in this case, or should I just use a synchronous call to read the data from the vendor service?
Thank you.
I would revisit the microservices decoupling and ask a question: who is the source of truth for each type of data? You'll probably arrive at one service owning the documents, and that service will be responsible for updating those documents as well.
Even with a dedicated service owning the documents, you still have to answer what consistency guarantees you need. Usually you start with SLAs: how available should your service be? How is the data stored? Often the underlying data storage will dictate those.
Also, I would like to note that even with synchronous calls your system will be eventually consistent: since it takes time to execute all those calls, there will be a period when the system as a whole might see non-latest data.
If you really need true strong consistency, you will have to pick the right storage for that. I would go with a strongly consistent option, assuming my performance and availability goals are met. The reason for strong consistency: it is much easier to reason about, hence the system gets simpler.
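On the question's worry about touching 2000 documents: in a document store, refreshing a denormalized copy is typically one bulk write rather than one call per document. A minimal sketch, assuming MongoDB and hypothetical collection and field names:

```typescript
import { MongoClient } from "mongodb";

// Hypothetical handler for a 'VendorUpdated' event in the contracts service.
// A single multi-document update refreshes the denormalized vendor copy in
// every contract request: one indexed write, not 2000 separate calls.
async function onVendorUpdated(event: { vendorId: string; name: string; phone: string }) {
  const client = await MongoClient.connect("mongodb://localhost:27017");
  const contracts = client.db("contracts").collection("contract_requests");

  await contracts.updateMany(
    { "vendors.vendorId": event.vendorId },
    { $set: { "vendors.$[v].name": event.name, "vendors.$[v].phone": event.phone } },
    // arrayFilters targets only the matching vendor entry inside each document.
    { arrayFilters: [{ "v.vendorId": event.vendorId }] }
  );

  await client.close();
}
```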

Is event sourcing an overkill for simple incremental form updates?

I am working on a web app whose backend is implemented with event sourcing. Event sourcing has given us great power to go back in time and run projections to get different types of reports. Also, we can potentially rebuild our database from scratch by replaying the events if we need to.
We have certain modules where implementing event sourcing does not add much analytical value. For example, questionnaire creation, which is nothing but simple form CRUD. We have event sourcing there, but the only advantage we potentially get from it is the ability to rebuild the forms database from the stored domain events, or to get values like how much time a user took to create the questionnaire.
But still, those analytics do not give us much info, because a state change in a form is not as valuable as in other parts of the system. E.g. changing the state of a bank account through domain events gives us much more information than the changing state of form CRUD.
How do you guys approach such situations, and how do you decide whether a certain part of the app is a good fit for event sourcing or whether it is overkill?
Whether or not it's overkill is a matter of opinion, but CRUD (or really CUD, since a read isn't a meaningful event) events (e.g. WidgetCreated, WidgetUpdated, WidgetDeleted; especially the WidgetUpdated) can be a sign of an anemic domain.
Assuming that each update is atomic in your DB, you can likely get the same results (a stream of events for other components to consume) by using change data capture (e.g. Debezium for many SQL DBs to put a change feed into a Kafka topic, or some DBs like Azure Cosmos offer a native change feed) to capture changes to records in the DB.
Event sourcing with CRUD events now does give you the flexibility to flesh out the domain model later if requirements change. That requires a sense of how likely (and when...) the requirements are to change in a way that makes richer domain events handy, versus how much effort you're expending on event sourcing now.
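To illustrate the distinction in code (a hypothetical sketch; the type names are made up for this example):

```typescript
// Anemic CRUD event: records *that* something changed, not *why*.
type QuestionnaireUpdated = {
  type: "QuestionnaireUpdated";
  questionnaireId: string;
  snapshot: unknown; // the whole new form state
};

// Richer domain events: capture intent, which is what makes replay and
// analytics valuable in the bank-account example above.
type QuestionnaireEvent =
  | { type: "QuestionAdded"; questionnaireId: string; questionId: string; text: string }
  | { type: "QuestionReworded"; questionnaireId: string; questionId: string; text: string }
  | { type: "QuestionRemoved"; questionnaireId: string; questionId: string };
```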

Attribute Based Access Control (ABAC) in a microservices architecture for lists of resources

I am investigating options to build a system to provide "Entity Access Control" across a microservices based architecture to restrict access to certain data based on the requesting user. A full Role Based Access Control (RBAC) system has already been implemented to restrict certain actions (based on API endpoints), however nothing has been implemented to restrict those actions against one data entity over another. Hence a desire for an Attribute Based Access Control (ABAC) system.
Given the requirement for the system to be fit for purpose, and my own priority of following best practice by keeping security logic in a single location, I decided to create an externalised "Entity Access Control" API.
The end result of my design was something similar to an image I have seen floating around (I think from axiomatics.com).
The problem is that the whole thing falls over the moment you start talking about an API that responds with a list of results.
E.g. a /api/customers endpoint on a Customers API that takes parameters such as a query filter, sort, order, and limit/offset values to facilitate pagination, and returns a list of customers to a front end. How do you then also apply ABAC to each of these entities in a microservices landscape?
Terrible solutions to the above problem tested so far:
Get the first page of results, send all of them to the EAC API, get the responses, drop the rejected ones, fetch more customers from the DB, check those... and repeat until you either fill a page of results or run out of customers in the DB. Tested: for 14,000 records (absolutely within reason in my situation) it would take 30 seconds to get an API response for someone who had zero permission to view any customers.
On every request to the all-customers endpoint, send a request to the EAC API for every customer available to the original requesting user. Tested: for 14,000 records the response payload would be over half a megabyte for someone who had permission to view all customers. I could split it into multiple requests, but then you are just trading payload size against request spam, and the performance penalty doesn't go anywhere.
Give up on the ability to view multiple records in a list. This totally breaks the API's usefulness for customer needs.
Store all the data and logic required to perform the ABAC controls in each API. This is fraught with danger and basically guaranteed to fail in a way that is beyond my risk appetite considering the domain I am working within.
Note: I tested with 14,000 records just because it's a benchmark of our current data. It is entirely feasible that a single API could serve 100,000 or 1M records, so anything that involves iterating over the whole data set or transferring it over the wire is entirely unsustainable.
So, here lies the question: how do you implement an externalised ABAC system in a microservices architecture (as per the diagram) whilst also being able to service requests that respond with multiple entities and support a query filter, sort, order, and limit/offset values to facilitate pagination?
After dozens of hours of research, I concluded that this is an entirely unsolvable problem and simply a side effect of microservices (and, more importantly, segregated entity storage).
If you want the benefits of a maintainable (as in a single piece of externalised infrastructure) entity-level attribute access control system, a monolithic approach to entity storage is required. You cannot simultaneously reap the benefits of microservices.

Eventual Consistency in microservice-based architecture temporarily limits functionality

I'll illustrate my question with Twitter. For example, Twitter has a microservice-based architecture, which means that different processes run on different servers and have different databases.
A new tweet appears: server A stores some data in its own database, generates new events, and fires them. Servers B and C haven't received these events yet, haven't stored anything in their databases, and haven't processed anything.
The user who created the tweet wants to edit it. To achieve that, all three services A, B, and C should have processed all events and stored all required data, but services B and C aren't consistent yet. That means we are unable to provide edit functionality at the moment.
As I see it, a possible workaround could be switching to immediate consistency, but that would take away all the benefits of a microservice-based architecture and could cause problems with tight coupling.
Another workaround is to restrict the user's actions for some time until the data is consistent across all necessary services. Possibly a solution, depending on the customer and their business requirements.
And another workaround is to add additional logic, or perhaps a service D, that stores edits as user actions and applies them to the data only once it is consistent. The drawback is a large increase in the complexity of the system.
And there are two-phase commits, but those are 1) not really reliable and 2) slow.
I think slowness is a huge drawback under loads like Twitter's. But that could probably be solved, whereas the lack of reliability cannot - again, without increasing the complexity of the solution.
So, the questions are:
Are there any nice solutions to the illustrated situation, or only the things I mentioned as workarounds? Maybe some programming platforms or databases?
Did I misunderstand something, or are some of the workarounds incorrect?
Is there any other approach besides eventual consistency that guarantees all data will be stored and all necessary actions executed by the other services?
Why has eventual consistency been picked for this use case? As I see it, right now it is the only way to guarantee that some data will be stored or some action performed in an event-driven approach, where services start their work when an event is fired; following my example, that event would be "tweet is created". So if services B and C go down, I need to be able to perform the action successfully when they come back up.
Things I would like to achieve: reliability, the ability to bear high loads, and adequate solution complexity. Any links on related subjects would be much appreciated.
If there are natural limitations of this approach and what I want cannot be achieved within this paradigm, that is okay too. I just need to know that this problem really isn't solved yet.
It is all about trade-offs. With eventual consistency, in your example it may mean that the user cannot edit for a few seconds, since most eventually consistent technologies would not take long to replicate the data across nodes. So in this use case it is absolutely acceptable, since users are pretty slow in their actions.
For example :
"MongoDB is consistent by default: reads and writes are issued to the primary member of a replica set. Applications can optionally read from secondary replicas, where data is eventually consistent by default."
- from the official MongoDB FAQ
Another alternative that is getting more popular is to use a streaming platform such as Apache Kafka, where it is up to your architecture design how fast the stream consumer processes the data (for eventual consistency). Since the streaming platform itself is very fast, it is mostly the speed of your stream processor that determines when the data is available in the right place. So we are talking about milliseconds, not seconds, in most cases.
The key thing in these sorts of architectures is to have each service be autonomous when it comes to writes: it can take the write even if none of the other application-level services are up.
So in the example of a Twitter-like service, you would model it as:
Service A manages the content of a post.
So when a user makes a post, a write happens in Service A's DB, and from that instant the post can be edited, because editing is just a request to A.
If there's some other service that consumes the "post content" change events from A and, after a "new post" event, exposes some functionality, that functionality isn't going to be exposed until that service sees the event (yay, tautologies). But that's just physics: the sun could have gone supernova five minutes ago and we can't take any action (not that we could have) until we "see the light".
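A minimal sketch of that autonomous-write idea (the interfaces and names are hypothetical; a production version would typically pair the local write and the publish via an outbox so neither can be lost):

```typescript
// Service A owns post content, so an edit only touches A's database plus an
// outgoing event; it never waits on services B or C.
interface PostStore {
  save(postId: string, body: string): Promise<void>;
}
interface EventBus {
  publish(event: { type: string; postId: string; body: string }): Promise<void>;
}

async function editPost(
  store: PostStore,
  bus: EventBus,
  postId: string,
  newBody: string
) {
  // Local write: available even if every other service is down.
  await store.save(postId, newBody);
  // B and C catch up eventually by consuming this event.
  await bus.publish({ type: "PostEdited", postId, body: newBody });
}
```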
