Data Dependency Among Microservices

In my microservices architecture, I have a bunch of services A, B, C, D etc.
For example: Service A is responsible for managing students. Service B is responsible for managing the assessments that students take.
Service B stores the student id in its tables for reference. However, when I have to query for all the assessments taken in a given time period, Service B has to call Service A to get the student name, because the client app wants the name, not the id.
I see a lot of network calls among services because of this. So I was thinking Service A could raise an event whenever a new student registers. Service B would consume the event and store the student info in its own db (the same for student name updates).
Questions:
Is this a bad practice? What are the pros and cons of this approach?
Feel free to suggest any alternatives.

It is fine to allow some data duplication across services, and you can do it in many different ways.
One option is to have Service A publish an event when a new student is registered.
An alternative (that might be simpler) is that when you create a new assessment against Service B, you provide the username as part of the CreateAssessment command. That way you don't need to publish any events between the two services when a new user is created.

Publishing events and replicating data into each service's database is a totally reasonable approach to minimizing network calls. I think you might find my answer to a similar question helpful as well (option 1 is the same as what you described):
https://stackoverflow.com/a/57791951/1563240
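For illustration, here is a minimal sketch of the event-driven replication (option 1) in TypeScript; the event shapes and the in-memory map standing in for Service B's database are assumptions, not an existing contract:

// Hypothetical event shapes; the real contract depends on what Service A publishes.
interface StudentRegistered {
  type: "StudentRegistered";
  studentId: string;
  name: string;
}

interface StudentNameUpdated {
  type: "StudentNameUpdated";
  studentId: string;
  name: string;
}

type StudentEvent = StudentRegistered | StudentNameUpdated;

// Stand-in for Service B's own table of replicated student data.
const localStudents = new Map<string, { name: string }>();

// Service B's consumer: upsert the local copy on every event.
function onStudentEvent(event: StudentEvent): void {
  localStudents.set(event.studentId, { name: event.name });
}

// When assessments are queried, Service B can now join locally
// instead of calling Service A for each student name.
function enrichAssessment(assessment: { studentId: string; score: number }) {
  return {
    ...assessment,
    studentName: localStudents.get(assessment.studentId)?.name ?? "(unknown)",
  };
}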

Related

How to design validation rules between microservices?

We have two Microservices (M1 and M2) and each microservice has its own schema (DB1 and DB2).
M1 receives the request for registration
M1 calls M2 for validation
M2 returns validation results (with validation id - VID) to M1
M1 completes the registration and persists it in DB1; each registration has a Record Identifier (RID)
My question here is where do we persist the relationship between RID and Validation Results for RID?
Should they be persisted in DB1 (associated to M1) or DB2 (validation schema)?
If the relationship needs to be persisted in M2, then M1 has to make a call to M2 with RID and VID (validation id)
What is the recommended approach in the microservices world?
The information provided is not really sufficient for a truly valuable answer, but you may consider the following:
It seems to me that you have more than one RID for one VID, but only one VID for each RID. If this is true, it seems more reasonable to store this 1:M relationship in the Registration DB.
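To make that concrete, a small sketch of M1's registration flow where the VID is stored as a field on the registration record in DB1; the names and the db1/m2 interfaces are hypothetical:

import { randomUUID } from "node:crypto";

// Hypothetical shapes; the field names are illustrative only.
interface ValidationResult {
  vid: string;            // validation id returned by M2
  passed: boolean;
}

interface RegistrationRecord {
  rid: string;            // record identifier generated by M1
  payload: unknown;
  vid: string;            // the RID -> VID link lives in DB1
  createdAt: Date;
}

// M1's registration flow: call M2 for validation, then persist the link locally.
async function register(
  payload: unknown,
  m2: { validate(p: unknown): Promise<ValidationResult> },
  db1: { insert(r: RegistrationRecord): Promise<void> }
): Promise<string> {
  const validation = await m2.validate(payload);
  if (!validation.passed) throw new Error("validation failed");

  const record: RegistrationRecord = {
    rid: randomUUID(),
    payload,
    vid: validation.vid,
    createdAt: new Date(),
  };
  await db1.insert(record);   // no second call back to M2 is needed
  return record.rid;
}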
In what circumstances do you need the Validation schema information behind the VID?
What I mean:
Do you have a method which returns all the registrations connected to a specific VID?
Or maybe you have a method returning the Validation schema used for a specific registration?
Do you see the difference and why these questions are important?
Finally, you may be interested in this article, especially in chapter 4.4 Decentralisation -> Shared persistence.
You don't share the same storage, but it seems possible to me that you have split things more than needed, and it may be a good idea to combine the Registration and Validation services into one. Of course, this is a very speculative statement. But if you are unsure whether I am right or not, ask yourself:
Do other services / clients use the Validation service?
Does the Validation service represent a dedicated business unit / domain, or is it just part of another unit's processes?
And things like that.
And finally: the microservices world doesn't recommend where to put your data, but rather what to think about when you decide where to put it. The main things you may consider are:
Your services should be deployed autonomously and should operate autonomously.
Your services shouldn't share their storage (as a consequence of the previous point).
You should be able to scale individual services as needed, without touching the others (this is why we need autonomous deployment and operation).
The granularity principle is very dependent on your concrete project. When you decide "how much", you should take care of the business domain and the ability to maintain all the other principles.
Remark: the principles above are by no means exhaustive, but I hope all this gives you some direction to get your job done.

How to handle events processing time between services

Let's say we have two services A and B. B has a relation to A so it needs to know about the existing entities of A.
Service A publishes events every time an entity is created or updated. Service B subscribes to the events published by A and therefore knows about the entities existing in service A.
Problem: The client (UI or other microservices) creates a new entity 'a' and right away creates a new entity 'b' with a reference to 'a'. This is done without much delay, so what happens if service B did not receive/handle the event from A before getting the create request for 'b' with its reference to 'a'?
How should this be handled?
Service B must fail, and the client should handle this and possibly retry.
Service B accepts the entity and expects the relation to be fulfilled over time, when the expected event is received. Service B keeps a state on the entity so that it cannot be trusted before the relation has been verified (see the sketch after this question).
It is poor design that the client can/has to do these two calls in the same transaction. The design should be different. How?
Other ways?
I know that event platforms like Kafka ensure very fast event transmission, but there will always be some delay, and since this is an asynchronous process there will be a kind of race condition.
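For illustration, a minimal sketch of the second option above (accept the entity with a pending reference and reconcile when the event arrives); the entity shapes and in-memory stores are assumptions:

import { randomUUID } from "node:crypto";

type RefState = "pending" | "verified";

interface EntityB {
  id: string;
  aRef: string;        // reference to an entity owned by service A
  refState: RefState;  // cannot be trusted until A's event has been seen
}

const knownAs = new Set<string>();          // ids replicated from A's events
const bStore = new Map<string, EntityB>();

// Create request arriving at B, possibly before A's event has been consumed.
function createB(aRef: string): EntityB {
  const b: EntityB = {
    id: randomUUID(),
    aRef,
    refState: knownAs.has(aRef) ? "verified" : "pending",
  };
  bStore.set(b.id, b);
  return b;
}

// Consumer for A's "entity created" events: verify any waiting references.
function onACreated(aId: string): void {
  knownAs.add(aId);
  for (const b of bStore.values()) {
    if (b.aRef === aId && b.refState === "pending") b.refState = "verified";
  }
}

In practice you would also want a timeout or compensation for references that never get verified.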
What you're asking about falls under the general category of bridging the gap between Eventual Consistency and good User Experience which is a well-documented challenge with a distributed architecture. You have to choose between availability and consistency; typically you cannot have both.
Your example raises the question as to whether service boundaries are appropriate. It's a common mistake to define microservice boundaries around Entities, but that's an anti-pattern. Microservice boundaries should be consistent with domain boundaries related to the business use case, not how entities are modeled within those boundaries. Here's a good article that discusses decomposition, but the TL;DR is:
Microservices should be verbs, not nouns.
So, for example, you could have a CreateNewBusinessThing microservice that handles this specific case. But, for now, we'll assume you have good and valid reasons to have the services divided as they are.
The "right" solution in your case depends on the needs of the consuming service/application. If the consumer is an application or User Interface of some sort, responsiveness is required and that becomes your overriding need. If the consumer is another microservice, it may well be that it cares more about getting good "finalized" data rather than being responsive.
In either of those cases, one good option is a facade (aka gateway) service that lives between your client and the highly-dependent services. This service can receive and persist the request, then respond however you'd like. It can give the consumer a 200 - OK response with an endpoint to call back to check status of the request - very responsive. Or, it could receive a URL to use as a webhook when the response is completed from both back-end services, so it could notify the client directly. Or it could publish events of its own (it likely should). Essentially, you can tailor the facade service to provide to as many consumers as needed in the way each consumer wants to talk.
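A minimal sketch of such a facade, assuming Express and an in-memory request log; the routes and the processAsync helper are hypothetical, and this variant answers immediately with 202 Accepted plus a status URL:

import express from "express";
import { randomUUID } from "node:crypto";

// In-memory request tracking; a real facade would persist this.
const requests = new Map<string, { status: "pending" | "done"; result?: unknown }>();

const app = express();
app.use(express.json());

// Accept the combined request immediately and answer fast.
app.post("/business-things", (req, res) => {
  const id = randomUUID();
  requests.set(id, { status: "pending" });

  // Kick off the slow, eventually-consistent work without blocking the response.
  void processAsync(id, req.body);

  res.status(202).json({ statusUrl: `/business-things/${id}/status` });
});

// Status endpoint the client can poll (a webhook callback would work the same way).
app.get("/business-things/:id/status", (req, res) => {
  const entry = requests.get(req.params.id);
  if (!entry) return res.sendStatus(404);
  res.json(entry);
});

// Placeholder for calling the dependent services (or publishing commands and awaiting events).
async function processAsync(id: string, body: unknown): Promise<void> {
  requests.set(id, { status: "done", result: body });
}

app.listen(3000);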
There are other options too. You can look into Task-Based UI, the Saga pattern, or even just Faking It.
I think you would like to leverage the flexibility of a broker and the confirmation of a synchronous call. Both of them can be achieved with this:
https://www.rabbitmq.com/tutorials/tutorial-six-dotnet.html
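If the broker route appeals, here is a minimal sketch of the request/reply (RPC over a queue) idea from that tutorial, assuming the npm amqplib client and a hypothetical rpc_queue with a responder listening on the other side:

import amqp from "amqplib";
import { randomUUID } from "node:crypto";

// Publish a message and wait for a correlated reply on an exclusive queue.
async function rpcCall(payload: object): Promise<string> {
  const conn = await amqp.connect("amqp://localhost");
  const ch = await conn.createChannel();
  const replyQueue = await ch.assertQueue("", { exclusive: true });
  const correlationId = randomUUID();

  const reply = new Promise<string>((resolve) => {
    void ch.consume(
      replyQueue.queue,
      (msg) => {
        if (msg && msg.properties.correlationId === correlationId) {
          resolve(msg.content.toString());
        }
      },
      { noAck: true }
    );
  });

  ch.sendToQueue("rpc_queue", Buffer.from(JSON.stringify(payload)), {
    correlationId,
    replyTo: replyQueue.queue,
  });

  const result = await reply;
  await ch.close();
  await conn.close();
  return result;
}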

Monolithic Web API to microservice design

We have a monolithic Web API layer in our application with a hundred endpoints. I am trying to break it into microservices using Azure Service Fabric.
When we break them into multiple services, we may end up having duplicate code.
Example: Let's say we have an Account Services to create an account. And there is a payment service to apply payments to transactions.
In this case, both services need the Customer class/domain. The Account Services probably need an exhaustive customer with full details, but Payments might only need a lightweight one.
The question is: do we need to copy domain entities and other layers like this? Doesn't that create more maintenance issues?
If we don't, we end up sharing the code and effectively creating one monolithic service, the same as the existing Web API.
Any thoughts on this?
Secondly, we have some cases where transactions are involved today. If we separate them, is there any good design to record failures and roll back without trying too hard to maintain distributed transactions?
Breaking a monolith up into proper microservices with appropriate boundaries for your domain is certainly more of an art than a science. The prerequisite to taking on such a task is a thorough understanding of your domain and the interactions within, and you won't get it right the first time. One of the points that Evans makes in his book on Domain-Driven Design is that for any sufficiently complex domain, the domain model continually evolves, because your understanding of the domain is continually evolving; you will understand it a little better tomorrow than you do today. That said, don't be afraid to start when you have an understanding that is "good enough" and be willing to adapt/evolve your model.
I don't know your domain, but it sounds to me like you need to first figure out in which bounded context Customer primarily belongs. Yes, you want to minimize duplication of domain logic, and though it may not fit completely and neatly into a single service, to the extent that you make one service take primary responsibility for accessing, persisting, manipulating, validating, and ensuring the integrity of a Customer, the better off you'll be.
From your question, I see two possibilities:
The Account Services bounded context is the primary stakeholder in Customer, and Customer has non-trivial ties to other Account Services entities and services. It's difficult to draw clear boundaries around a Customer in isolation. In this case, Customer belongs in the Account Services bounded context.
Customer is an independent enough concept to merit its own microservice. A Customer can stand alone. In this case, Customer belongs in its own bounded context.
In either case, great care should be taken to ensure that the Customer-specific domain logic stays centralized in the Customer microservice behind strong boundaries. Other services might use Customer, or perhaps a light-weight (even read-only) CustomerView, but their interactions should go through the Customer service to the extent that they can.
In your question, you indicate that the Payments bounded context will need access to Customer, but it might just need a light-weight version. It should communicate with the Customer service to get that light-weight object. If, during Payments processing you need to update the Customer's billing address for example, Payments should call into the Customer microservice telling it to update its billing address. Payments need not know anything about how to update a Customer's billing address other than the single API call; any domain logic, validation, firing of domain events, etc... that need to happen as part of that operation are contained within the Customer microservice.
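A rough sketch of what that boundary might look like from the Payments side, assuming Node 18+ (global fetch) and hypothetical Customer service endpoints:

// Lightweight, read-only projection that Payments works with.
interface CustomerView {
  customerId: string;
  displayName: string;
  billingAddress: string;
}

const CUSTOMER_SERVICE = "http://customer-service"; // hypothetical base URL

// Payments fetches only the slim view it needs.
async function getCustomerView(customerId: string): Promise<CustomerView> {
  const res = await fetch(`${CUSTOMER_SERVICE}/customers/${customerId}/view`);
  if (!res.ok) throw new Error(`customer lookup failed: ${res.status}`);
  return (await res.json()) as CustomerView;
}

// Any change to Customer state goes through the Customer service's API;
// Payments knows nothing about the validation or events behind it.
async function updateBillingAddress(customerId: string, address: string): Promise<void> {
  const res = await fetch(`${CUSTOMER_SERVICE}/customers/${customerId}/billing-address`, {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ address }),
  });
  if (!res.ok) throw new Error(`billing address update failed: ${res.status}`);
}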
Regarding your second question: it's true that atomic transactions become more complex/difficult in a distributed architecture. Do some reading on the Saga pattern: https://blog.couchbase.com/saga-pattern-implement-business-transactions-using-microservices-part/. Also, Jimmy Bogard is currently in the midst of a blog series called Life Beyond Distributed Transactions: An Apostate's Implementation that may offer some good insights.
Hope this helps!

Microservices: model sharing between bounded contexts

I am currently building a microservices-based application developed with the MEAN stack, and I am running into several situations where I need to share models between bounded contexts.
As an example, I have a User service that handles the registration process as well as login (generating a JWT), logout, etc. I also have a File service which handles the uploading of profile pics and other images the user happens to upload. Additionally, I have a Friends service that keeps track of the associations between members.
Currently, I am adding the guid of the user from the user table used by the User service, as well as the first, middle and last name fields, to the File table and the Friend table. This way I can query for these fields whenever I need them in the other services (Friend and File) without needing to make any REST calls to get the information every time it is queried.
Here is the caveat:
The downside seems to be that I have to notify the File and Friend services (I chose Seneca with RabbitMQ) whenever a user updates their information in the User table.
1) Should I be worried about the services getting too chatty?
2) Could this lead to any performance issues if a lot of updates take place over an hour, let's say?
3) In trying to isolate boundaries, I just am not seeing another way of pulling this off. What is the recommended approach to solving this issue, and am I on the right track?
It's a trade off. I would personally not store the user details alongside the user identifier in the dependent services. But neither would I query the users service to get this information. What you probably need is some kind of read-model for the system as a whole, which can store this data in a way which is optimized for your particular needs (reporting, displaying together on a webpage etc).
The read-model is a pattern which is popular in the event-driven architecture space. There is a really good article that talks about these kinds of questions (in two parts):
https://www.infoq.com/articles/microservices-aggregates-events-cqrs-part-1-richardson
https://www.infoq.com/articles/microservices-aggregates-events-cqrs-part-2-richardson
Many common questions about microservices seem to be largely around the decomposition of a domain model, and how to overcome situations where requirements such as querying resist that decomposition. This article spells the options out clearly. Definitely worth the time to read.
In your specific case, it would mean that the File and Friends services would only need to store the primary key for the user. However, all services should publish state changes which can then be aggregated into a read-model.
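As a rough illustration, a single projector could subscribe to those state changes and maintain a denormalized document per file; the event shapes and the in-memory stores here are assumptions:

// Hypothetical state-change events published by the User, File, and Friends services.
type DomainEvent =
  | { type: "UserUpdated"; userId: string; firstName: string; lastName: string }
  | { type: "FileUploaded"; fileId: string; ownerId: string; url: string }
  | { type: "FriendAdded"; userId: string; friendId: string };

// Denormalized document shaped for a page that needs "files with owner names".
interface FileWithOwner {
  fileId: string;
  url: string;
  ownerId: string;
  ownerName: string;
}

const userNames = new Map<string, string>();
const readModel = new Map<string, FileWithOwner>();

// A single projector owns the read model; the source services stay normalized.
function project(event: DomainEvent): void {
  switch (event.type) {
    case "UserUpdated": {
      const name = `${event.firstName} ${event.lastName}`;
      userNames.set(event.userId, name);
      for (const doc of readModel.values()) {
        if (doc.ownerId === event.userId) doc.ownerName = name;
      }
      break;
    }
    case "FileUploaded":
      readModel.set(event.fileId, {
        fileId: event.fileId,
        url: event.url,
        ownerId: event.ownerId,
        ownerName: userNames.get(event.ownerId) ?? "(unknown)",
      });
      break;
    case "FriendAdded":
      // A friends-centric view would be projected the same way.
      break;
  }
}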
If you are worried about a high volume of messages and high TPS (for example 100,000 TPS for producing and consuming events), I suggest using Apache Kafka or NATS (the Go version; NATS also has a Ruby version) instead of RabbitMQ in order to support a high volume of messages per second.
Also, regarding database design, you should design each microservice around business capabilities and bounded contexts according to domain-driven design (DDD). Because, unlike SOA, it is suggested that each microservice has its own database, you should not be worried about normalization: you may have to repeat many structures, fields, tables and features for each microservice in order to keep them decoupled from each other and let them work independently, which raises availability and scalability.
You can also use Event Sourcing + CQRS or Transaction Log Tailing to circumvent 2PC (two-phase commit), which is not recommended when implementing microservices, in order to exchange events between your microservices and manipulate state with eventual consistency, in line with the CAP theorem.

Microservices and NoSQL - best practice to enrich data in a microservice architecture

I want to plan a solution that manages enriched data in my architecture.
To be more clear, I have dozens of microservices.
Let's say: Country, Building, Floor, Worker.
Each runs over a separate NoSQL data store.
When I get the data from the Worker service, I also want to present the name of the floor the worker is working on, the building name and the country name.
Solution 1.
The client will query all the microservices.
Problem - multiple requests, and the client has to be aware of the structure.
I know multiple requests shouldn't bother me, but I believe that returning a JSON describing the entity in one single call is better.
Solution 2.
Create an orchestration that retrieves the data from multiple services.
Problem - if the data (entity names, for example) is not stored in the same document in the DB, it is very hard to sort and filter by these fields.
Solution 3.
Before saving the entity, e.g. a worker, call all the other services and fill in the related data (building name, country name).
Problem - when the building name is changed, the change is not reflected in the Worker service.
Solution 4.
(This is the best one I can come up with).
Create a process that subscribes to a broker and receives all entity changes.
For each change it updates all the relevant entities.
When an entity changes, let's say building name changes, it updates all the documents that hold the building name.
Problem:
Each service has to know what can be updated.
When a trailing update happens it shouldn't publish to the broker again (recursive updates), which adds complexity to the microservices.
Solution 5.
Keep everything normalized. Filter and sort in Elasticsearch.
Problem - keeping normalized data in ES is too expensive performance-wise.
One thing I saw Netflix do (which I like) is create intermediary services for stuff like this. So maybe a new intermediary service that can call the other services to gather all the data, then create the unified output with the Country, Building, Floor, Worker.
You can even go one step further and try to come up with a scheme for providing as input which resources you want to include in the output.
So I guess this closely matches your Solution 2. I notice that you mention for Solution 2 that there are concerns with sorting/filtering in the DBs. I think that if you are using NoSQL then it has to be for a reason, and more often than not the reason is performance. I think if this was done wrong then yeah, you will have problems, but if all the appropriate searchable fields are properly keyed and indexed (as @Roman Susi mentioned in his bullet points 1 and 2) then I don't see this as being a problem. Yes, this service will only be as fast as the culmination of your other services and data stores, so they have to be fast.
Now you keep your individual microservices as they are, keep the client calling one service, and encapsulate the complexity of merging the data into this new service.
This is the video I saw this in (https://www.youtube.com/watch?v=StCrm572aEs)... it's a long video but very informative.
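A minimal sketch of such an intermediary, assuming Node 18+ (global fetch); the service URLs and the field names on the returned documents are hypothetical:

// Hypothetical base URLs for the underlying services.
const services = {
  worker: "http://worker-service",
  floor: "http://floor-service",
  building: "http://building-service",
  country: "http://country-service",
};

async function fetchJson(url: string): Promise<any> {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`${url} responded ${res.status}`);
  return res.json();
}

// Intermediary endpoint: one client call, several parallel internal calls.
async function getEnrichedWorker(workerId: string) {
  const worker = await fetchJson(`${services.worker}/workers/${workerId}`);

  // The dependent lookups are independent of each other, so run them in parallel.
  const [floor, building, country] = await Promise.all([
    fetchJson(`${services.floor}/floors/${worker.floorId}`),
    fetchJson(`${services.building}/buildings/${worker.buildingId}`),
    fetchJson(`${services.country}/countries/${worker.countryId}`),
  ]);

  return {
    ...worker,
    floorName: floor.name,
    buildingName: building.name,
    countryName: country.name,
  };
}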
It is hard to advise at the Solution-N level, but certain problems can be avoided by following these suggestions:
Use globally unique identifiers for entities, for example by using some kind of URI as the key values.
The global ids also simplify updates, because you can track what has actually changed: the name or the entity (an entity has a one-to-one relation with its global URI).
CAP theorem says you can choose only two from CAP. Do you want a CA architecture? Or CP? Or maybe AP? This will strongly affect the way you distribute data.
For "sort and filter" there is MapReduce approach, which can distribute the load of figuring out those things.
Think carefully about the balance of normalization / denormalization. If your services operate on URIs, you can have a service which turns URIs to labels (names, descriptions, etc), but you do not need to keep the redundant information everywhere and update it. Do not do preliminary optimization, but try to keep data normalized as long as possible. This way, worker may not even need the building name but it's global id. And the microservice looks up the metadata from another microservice.
In other words, minimize the number of keys, shared between services, as part of separation of concerns.
Focus on the underlying model, not the JSON to and from. Right modelling of the data in your system(s) gains you more than saving JSON calls.
As for NoSQL, take a look at the Riak database: it has adjustable CAP properties, IIRC. Even if you do not use it as such, reading its documentation may help you come up with a suitable architecture for your distributed microservices system. (Of course, this applies if you have an essentially parallel system.)
First of all, thanks for your question. It is similar to the main problem of document DBs: how to sort a collection by a field from another collection? I have my own answer for that, so I'll try to comment on all your solutions:
Solution 1: It is good if the client wants to work with Countries/Buildings/Floors independently. But it does not solve the problem you mentioned in Solution 2 - sorting 10k workers by building is going to be slow.
Solution 2: Similar to Solution 1 if all the client wants is a list of enriched workers without knowing how to combine it from multiple pieces.
Solution 3: As you said, unacceptable because of inconsistent data.
Solution 4: It's going to work, most of the time. But:
Huge data duplication. If you have 20 entities, you are going to have 20x the data.
Large complexity. 20 entities -> 20 different procedures to update related data
High coupling. All your services must know about each other. A data model change will propagate to every service because of the update procedures.
Questionable eventual consistency. It can be done so that data will be consistent after failures, but it is not going to be easy.
Solution 5: Kind of answer :-)
But - you do not want everything in it. Keep separate services that serve separate entities, and build other services on top of them.
If the client wants enriched data - build a service that returns enriched data, as in Solution 2.
If the client wants to display a list of enriched data with filtering and sorting - build a service that provides enriched data with filtering and sorting capability! Likely, the implementation of such a service will contain an ES instance holding cached and indexed data from the lower-level services. The point here is that ES does not have to contain everything or be shared between every service - it is up to you to decide on the right balance between performance and infrastructure resources.
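As a rough sketch of that kind of service, assuming the official @elastic/elasticsearch Node.js client (v8-style API), a hypothetical workers index, and hypothetical field names:

import { Client } from "@elastic/elasticsearch";

const es = new Client({ node: "http://localhost:9200" });

// Denormalized document kept in the "workers" index; it is refreshed
// from the lower-level services' change events, not owned by them.
interface EnrichedWorker {
  workerId: string;
  name: string;
  floorName: string;
  buildingName: string;
  countryName: string;
}

// Called by the projector whenever a worker (or a related entity) changes.
async function indexWorker(doc: EnrichedWorker): Promise<void> {
  await es.index({ index: "workers", id: doc.workerId, document: doc });
}

// The "list with filtering and sorting" endpoint queries only this index.
async function searchWorkers(buildingName: string) {
  const result = await es.search<EnrichedWorker>({
    index: "workers",
    query: { match: { buildingName } },
    sort: [{ "name.keyword": "asc" }],
  });
  return result.hits.hits.map((hit) => hit._source);
}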
This is a case where Linked Data can help you.
Basically, the floor attribute of the worker would be a URI (a link) to the floor itself, and any other linked data should be expressed as URIs as well.
Modeled with some JSON-LD it would look like this:
worker = {
  '#id': '/workers/87373',
  name: 'John',
  floor: {
    '#id': '/floors/123'
  }
}

floor = {
  '#id': '/floors/123',
  'level': 12,
  building: { '#id': '/buildings/87' }
}

building = {
  '#id': '/buildings/87',
  name: "John's home",
  city: { '#id': '/cities/908' }
}
This way all the client has to do is append the BASE URL (like api.example.com) to the #id and make a simple GET call.
To remove the burden of extra calls from the client (in case it's a slow mobile device), we use the gateway pattern with microservices. The gateway can expand those links with very little effort and augment the return object. It can also do multiple calls in parallel.
So the gateway will make a GET /floors/123 call and replace the floor object on the worker with the reply.
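A rough sketch of that expansion step in the gateway, assuming Node 18+ (global fetch); the expandLinks helper is hypothetical and only follows direct '#id' links:

const BASE_URL = "https://api.example.com"; // as in the example above

type Linked = { "#id": string };

async function fetchResource(link: Linked): Promise<Record<string, unknown>> {
  const res = await fetch(`${BASE_URL}${link["#id"]}`);
  if (!res.ok) throw new Error(`GET ${link["#id"]} failed: ${res.status}`);
  return (await res.json()) as Record<string, unknown>;
}

// Expand the direct links of one resource (worker.floor, floor.building, ...)
// by fetching each linked resource in parallel and replacing the link with it.
async function expandLinks(resource: Record<string, unknown>): Promise<Record<string, unknown>> {
  const linkEntries = Object.entries(resource).filter(
    ([, value]) => typeof value === "object" && value !== null && "#id" in value
  );
  const expanded = await Promise.all(
    linkEntries.map(async ([key, value]) => [key, await fetchResource(value as Linked)] as const)
  );
  return { ...resource, ...Object.fromEntries(expanded) };
}

// Usage: fetch /workers/87373, then expandLinks(worker) to inline the floor (and so on).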
