How do I make microservice calls asynchronous when microservice B depends on the response of A, and microservice C depends on the response of B?
You should avoid chaining calls from one microservice to another to fulfill a client's request. It doesn't matter whether the calls are synchronous or asynchronous: chaining can lead to cascading failures that reduce the availability of the system.
Instead, you should gather all the needed data in the background (e.g. using cron jobs or events) before the clients' requests arrive. That way, if service A is down, service B continues to work.
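A minimal sketch of the background approach, assuming a cron-style job calls refresh() periodically. SnapshotCache and fake_fetch_from_a are hypothetical names for illustration; in a real system the fetch would be an HTTP call or event consumer, and the snapshot would live in service B's own data store.

```python
import time
from typing import Callable, Dict, Optional

class SnapshotCache:
    """Service B's local copy of the data it needs from service A.

    fetch_from_a is a hypothetical callable that pulls A's data; it runs
    from a scheduled job, never from a client request.
    """

    def __init__(self, fetch_from_a: Callable[[], Dict[str, dict]]) -> None:
        self._fetch = fetch_from_a
        self._data: Dict[str, dict] = {}
        self.last_refreshed: Optional[float] = None

    def refresh(self) -> bool:
        """Scheduled job (e.g. cron). Keeps the last good snapshot if A is down."""
        try:
            self._data = dict(self._fetch())
        except ConnectionError:
            return False  # A unavailable: keep serving the previous snapshot
        self.last_refreshed = time.time()
        return True

    def get(self, key: str) -> Optional[dict]:
        """Client-request path: reads only local data, never calls A."""
        return self._data.get(key)


# Demo: A is up for the first refresh, then goes down.
a_is_up = {"value": True}

def fake_fetch_from_a() -> Dict[str, dict]:
    if not a_is_up["value"]:
        raise ConnectionError("service A unavailable")
    return {"42": {"name": "widget"}}

cache = SnapshotCache(fake_fetch_from_a)
first_refresh = cache.refresh()      # True
a_is_up["value"] = False
second_refresh = cache.refresh()     # False: A is down
served = cache.get("42")             # still {'name': 'widget'}
```

The trade-off is freshness: B serves data that is as old as the last successful refresh, in exchange for not failing when A fails.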
Related
Suppose I have an API endpoint POST /order which invokes a PlaceOrder Lambda and expects a response from it. The PlaceOrder Lambda does some work, then invokes another Lambda, ProcessPayment, and expects a response. ProcessPayment in turn invokes a CreateInvoice Lambda and expects a response. The whole architecture is like a RequestResponse cycle. I would like to achieve this without one Lambda invoking another, as that is considered an anti-pattern. My question is: what is the best design pattern to achieve this behavior within 29 seconds using an event-driven architecture?
What AWS suggests: according to this official documentation, AWS suggests using SQS. But I have some thoughts about using SQS.
My thoughts:
With an event-source architecture, I can orchestrate these Lambdas with SQS, SNS, or other event sources, but in that case the flow would not be synchronous, so I would not get a response from the API.
My other solution:
Using Step Functions: I can orchestrate this workflow with Step Functions, and I think it is a more elegant solution for this synchronous case. But I would like to achieve this via event sources.
How can I design this scenario with best practices using an event-based architecture?
In an event-driven architecture (EDA), communication between producers and consumers is asynchronous by design; that is how the architecture scales.
You can get nearly synchronous communication between two services in an EDA by creating dedicated queues/channels between them and making sure they are scaled up to a level where the latency is acceptable (close to synchronous values).
This adds some complexity: the services that need responses have to wait in a hot loop to receive them as soon as possible, and if messages are lost, you need retry policies, etc.
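A minimal sketch of request/reply over dedicated channels, assuming in-process queue.Queue objects stand in for broker queues and that the message shape (correlation_id, reply_to, body) is a hypothetical convention. Instead of a hot loop, the caller blocks on the reply queue with a deadline; retry policies would be layered on top of the TimeoutError.

```python
import queue
import threading
import time
import uuid

def responder(requests: queue.Queue, stop: threading.Event) -> None:
    """Service side: consume requests, reply on the queue named in reply_to."""
    while not stop.is_set():
        try:
            msg = requests.get(timeout=0.1)
        except queue.Empty:
            continue
        body = {"invoice_id": f"inv-{msg['body']['order_id']}"}
        msg["reply_to"].put({"correlation_id": msg["correlation_id"], "body": body})

def call(requests: queue.Queue, replies: queue.Queue, body: dict, timeout: float = 2.0) -> dict:
    """Caller side: send a request, then block (not spin) until the matching reply."""
    correlation_id = str(uuid.uuid4())
    requests.put({"correlation_id": correlation_id, "reply_to": replies, "body": body})
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"no reply for {correlation_id}")
        try:
            reply = replies.get(timeout=remaining)
        except queue.Empty:
            raise TimeoutError(f"no reply for {correlation_id}") from None
        if reply["correlation_id"] == correlation_id:
            return reply["body"]
        # Otherwise it is a late reply to an earlier, timed-out request: drop it.

requests_q: queue.Queue = queue.Queue()
replies_q: queue.Queue = queue.Queue()
stop = threading.Event()
worker = threading.Thread(target=responder, args=(requests_q, stop), daemon=True)
worker.start()
answer = call(requests_q, replies_q, {"order_id": 7})   # {'invoice_id': 'inv-7'}
stop.set()
worker.join()
```

The correlation ID is what lets one reply channel be shared safely: replies that arrive after their caller gave up are recognized and discarded rather than handed to the wrong request.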
I think you need to focus more on the mechanics of your program and a bit less on design patterns. Choose the design patterns that fit your use case; the other way around will not work. In the end, you build a program to fulfill a certain task or set of tasks, so that should be your end goal.
You’re stating that you have a process order Lambda, a create invoice Lambda and a process payment Lambda. I’d say the most interesting question is what you need to get done before you return a response to the user. Maybe you can process the order, respond to the user that it is done, and handle invoicing and payment at a later moment. Typically that would mean you put a message in an SQS queue or on an SNS topic.
It could be that you need your payment to be processed before you respond to the user, because they need to be informed about the status of the payment. You could then combine both actions in a single Lambda, because there is no way to split the two tasks from one another. Keep in mind that another option often exists: you process the order first, put a message in a queue for payment processing (as it is typically a process that involves a third party), and the front end polls for an update on the payment status. This way you can return a response quickly and still give an update on the payment as soon as possible.
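The "respond first, pay later, let the front end poll" option can be sketched as follows. This is a minimal in-process sketch, assuming a dict stands in for the orders table and a queue.Queue for the SQS queue; place_order, drain_payment_queue, get_payment_status and the charge callback are hypothetical names, not AWS APIs.

```python
import queue
import uuid
from typing import Callable

# Hypothetical stand-ins: a dict for the orders table, a Queue for the
# payment queue (SQS in the AWS setup from the question).
orders: dict = {}
payment_queue: queue.Queue = queue.Queue()

def place_order(items: list) -> dict:
    """API handler: persist the order, enqueue the payment, respond at once."""
    order_id = str(uuid.uuid4())
    orders[order_id] = {"items": items, "payment_status": "PENDING"}
    payment_queue.put({"order_id": order_id})
    return {"order_id": order_id, "payment_status": "PENDING"}

def drain_payment_queue(charge: Callable[[str], bool]) -> None:
    """Queue consumer; charge is a hypothetical call to a payment provider."""
    while True:
        try:
            msg = payment_queue.get_nowait()
        except queue.Empty:
            return
        paid = charge(msg["order_id"])
        orders[msg["order_id"]]["payment_status"] = "PAID" if paid else "FAILED"

def get_payment_status(order_id: str) -> str:
    """Endpoint the front end polls until the status is final."""
    return orders[order_id]["payment_status"]

response = place_order(["book"])
status_before = get_payment_status(response["order_id"])   # 'PENDING'
drain_payment_queue(charge=lambda order_id: True)
status_after = get_payment_status(response["order_id"])    # 'PAID'
```

Invoice creation could be a second consumer of the same kind, so that an invoicing outage never blocks order confirmation.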
The create invoice process is typically something you would never want to invoke synchronously during order confirmation. What if your invoicing application (internal or external) is down? In theory you could still process orders, as long as you create the invoice at some later moment. If you couple everything together, you make order confirmation dependent on your invoice creation process, which I would regard as an unnecessary dependency.
I would really advise against Step Functions for this use case. They can be used for long-running processes that need to keep state and "wake up" at specific moments, but for this specific flow I would say they do not help and are unnecessarily complex. If you have three things you need to do that you cannot separate from one another, just run them in the same Lambda.
I'm in the process of designing a micro-service architecture and I have a performance related question. This is what I am trying out with my design:
I have several micro-services which perform distinct actions and store their results in their own data stores.
The micro-services receive work via a message queue where they receive requests to run their process for the specific data given. The micro-services do NOT communicate with each other.
I have an API gateway which effectively has three journeys:
1) Receives a request to process data, which it then translates into several messages that it puts on the queue for the micro-services to process in their own time. The processing can take minutes or longer (not instant).
2) Receives a request for the status of the process, where it returns the progress of the overall process.
3) Receives a request for combined data, which is some combination of all the results from the services.
My problem lies in #3 above and the performance of this process.
Whenever this request is received, the API gateway has to put a request message onto the queue asking all the services for their information. It then has to wait for all the services to reply with the latest state of their data, combine this data, and return it to the caller.
This process is obviously rather slow as it has to wait for every service to respond. What is the way of speeding this up?
The only way I thought of solving this is having another aggregate service/data-store where duplicate data is stored and queried by my api gateway. I really don't like this approach as it duplicates data and is extra work/code.
What is the 'correct' and performant way of querying up-to-date data from my micro-services?
You can use these approaches for querying data across microservices (see Reference):
Selective data replication
With this approach, we replicate the data needed from other microservices into the database of our microservice. The only coupling between microservices is in the data replication configuration.
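A minimal sketch of selective replication, assuming a hypothetical customer service publishes CustomerUpdated events that carry a monotonically increasing version number. The consuming service stores only the fields it needs and ignores duplicate or out-of-order deliveries.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class CustomerUpdated:
    """Hypothetical event published by a customer service."""
    customer_id: str
    name: str
    version: int

class OrderServiceReplica:
    """Local, read-only copy of just the customer fields this service needs."""

    def __init__(self) -> None:
        self.customers: Dict[str, dict] = {}

    def on_customer_updated(self, event: CustomerUpdated) -> None:
        current = self.customers.get(event.customer_id)
        # Ignore duplicates and events older than what we already hold.
        if current is not None and current["version"] >= event.version:
            return
        self.customers[event.customer_id] = {"name": event.name, "version": event.version}

    def customer_name(self, customer_id: str) -> str:
        # Query path: served from the local replica, no cross-service call.
        return self.customers[customer_id]["name"]

replica = OrderServiceReplica()
replica.on_customer_updated(CustomerUpdated("c1", "Ada", version=2))
replica.on_customer_updated(CustomerUpdated("c1", "Old name", version=1))  # stale, ignored
name = replica.customer_name("c1")   # 'Ada'
```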
Composite service layer
With this approach, you introduce composite services that aggregate data from lower-level microservices.
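A composite service can also address the latency problem from the question by querying the lower-level services concurrently, so total latency is bounded by the slowest service (or a timeout) rather than by the sum of all of them. A minimal asyncio sketch, assuming each downstream service is represented by a hypothetical async fetcher and that a missing part is acceptable in the combined view:

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional

Fetcher = Callable[[], Awaitable[dict]]

async def composite_view(fetchers: Dict[str, Fetcher], timeout: float = 1.0) -> Dict[str, Optional[dict]]:
    """Query all downstream services concurrently; a slow or failing service
    yields None for its part instead of failing the whole response."""
    async def guarded(fetch: Fetcher) -> Optional[dict]:
        try:
            return await asyncio.wait_for(fetch(), timeout)
        except (asyncio.TimeoutError, ConnectionError):
            return None

    names = list(fetchers)
    results = await asyncio.gather(*(guarded(fetchers[n]) for n in names))
    return dict(zip(names, results))

# Hypothetical downstream services for the demo.
async def pricing() -> dict:
    await asyncio.sleep(0.01)
    return {"price": 10}

async def stock() -> dict:
    await asyncio.sleep(0.01)
    return {"in_stock": True}

async def reviews() -> dict:
    await asyncio.sleep(5)  # simulates a slow service
    return {"stars": 4}

view = asyncio.run(composite_view({"pricing": pricing, "stock": stock, "reviews": reviews}, timeout=0.1))
```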
How can I detect the failed service quickly and reliably, i.e. the service that is the root cause of all the 5xx responses?
Let me elaborate. Let's assume we have 300+ microservices and, for simplicity, that they interact only synchronously via HTTP GET requests without any data modifications. Each customer request may turn into calls to ~10 different microservices; moreover, it could be a 'calling chain' of requests, i.e. the API gateway calls 3 different microservices, each of them calls 1-5 more, each of those calls 1-5 more, etc.
We closely monitor 5xx errors on each microservice and react to them.
Now one of the microservices fails. It turns out to be somewhere at the end of a 'calling chain', which means that the other microservices that depend on it start to return 5xx as well.
Yes, there are circuit breakers, and yes, they become triggered/opened: instead of calling the downstream service, they immediately return an error as well (in most cases we cannot return a good fallback such as an empty response).
So we see a relatively large number of microservices returning 5xx: 30-40 microservices return 5xx, and we see 30-40 triggered/opened circuit breakers.
How can we detect the failed microservice, the root of all evil, quickly?
Has anybody encountered this issue?
You will need to implement a distributed tracing solution that tracks the originating transaction with a global ID. This global identifier is typically called a correlation ID: it is generated by the very first service that creates the request and propagated to all the other microservices that work together to fulfill the request.
Take a look at OpenTracing (since superseded by OpenTelemetry) for your implementation needs. It provides libraries for adding the instrumentation required to identify faulty microservices in a distributed environment.
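A minimal sketch of how collected trace data points at the root cause, assuming each service records a span (correlation ID, span ID, parent span ID, service name, HTTP status) for every call it handles. Span, correlation_headers and the X-Correlation-ID header name are illustrative assumptions, not the OpenTracing/OpenTelemetry API (OpenTelemetry itself propagates context via the W3C traceparent header). The heuristic: a 5xx span none of whose child calls failed is where the error originated; its failing ancestors only propagated it.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Span:
    correlation_id: str       # same for every hop of one client request
    span_id: str
    parent_id: Optional[str]  # the span of the caller, None at the entry point
    service: str
    status: int               # HTTP status this service returned

def correlation_headers(incoming: Dict[str, str]) -> Dict[str, str]:
    """Reuse the caller's ID, or mint one if this is the first service."""
    cid = incoming.get("X-Correlation-ID") or str(uuid.uuid4())
    return {"X-Correlation-ID": cid}

def root_causes(spans: List[Span]) -> List[str]:
    """Services that returned 5xx although none of their downstream calls did."""
    has_failed_child = {s.parent_id for s in spans if s.parent_id is not None and s.status >= 500}
    return sorted({s.service for s in spans if s.status >= 500 and s.span_id not in has_failed_child})

cid = correlation_headers({})["X-Correlation-ID"]
trace = [
    Span(cid, "1", None, "api-gateway", 502),
    Span(cid, "2", "1", "orders", 500),
    Span(cid, "3", "1", "catalog", 200),
    Span(cid, "4", "2", "pricing", 500),
    Span(cid, "5", "4", "currency", 503),
]
culprits = root_causes(trace)   # ['currency']
```

One caveat: when a circuit breaker is open, no downstream span is created, so the breaker should record which service it is protecting; otherwise this heuristic blames the caller.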
However, if you really do have 300 microservices all using synchronous calls...maybe it is time to consider using asynchronous communications to eliminate the temporal coupling inherent in synchronous communications.
Let's say we have two services A and B. B has a relation to A so it needs to know about the existing entities of A.
Service A publishes events every time an entity is created or updated. Service B subscribes to the events published by A and therefore knows about the entities existing in service A.
Problem: the client (UI or another microservice) creates a new entity 'a' and right away creates a new entity 'b' with a reference to 'a'. This happens without much delay, so what happens if service B has not yet received/handled the event from A when it gets the create request for 'b' with its reference to 'a'?
How should this be handled?
Service B fails the request, and the client handles this and possibly retries.
Service B accepts the entity and expects the relation to be fulfilled over time, when the expected event is received. Service B gives the entity a state indicating that it cannot be trusted until the relation has been verified.
It is poor design that the client can, or has to, make these two calls in the same transaction. The design should be different. How?
Other ways?
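The second option above can be sketched as follows. This is a minimal sketch, assuming service B learns about A's entities only through a hypothetical on_a_created event handler; ServiceB, EntityB and the state names are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Set

@dataclass
class EntityB:
    b_id: str
    a_id: str
    state: str = "PENDING_VERIFICATION"  # not to be trusted yet

class ServiceB:
    def __init__(self) -> None:
        self.known_a: Set[str] = set()       # built from A's published events
        self.entities: Dict[str, EntityB] = {}

    def create_b(self, b_id: str, a_id: str) -> EntityB:
        """Accept the request even if A's event has not arrived yet."""
        entity = EntityB(b_id, a_id)
        if a_id in self.known_a:
            entity.state = "VERIFIED"
        self.entities[b_id] = entity
        return entity

    def on_a_created(self, a_id: str) -> None:
        """Handler for A's 'created' event; resolves entities waiting on it."""
        self.known_a.add(a_id)
        for entity in self.entities.values():
            if entity.a_id == a_id and entity.state == "PENDING_VERIFICATION":
                entity.state = "VERIFIED"

svc = ServiceB()
b = svc.create_b("b1", "a1")      # request for 'b' arrives before A's event
state_before = b.state            # 'PENDING_VERIFICATION'
svc.on_a_created("a1")
state_after = b.state             # 'VERIFIED'
```

A real implementation would also need a sweep that rejects entities whose relation never gets verified within some time limit.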
I know that event platforms like Kafka ensure very fast event delivery, but there will always be a delay, and since this is an asynchronous process there will be a kind of race condition.
What you're asking about falls under the general category of bridging the gap between Eventual Consistency and good User Experience which is a well-documented challenge with a distributed architecture. You have to choose between availability and consistency; typically you cannot have both.
Your example raises the question of whether the service boundaries are appropriate. It's a common mistake to define microservice boundaries around entities, but that's an anti-pattern. Microservice boundaries should be consistent with domain boundaries related to the business use case, not with how entities are modeled within those boundaries. Here's a good article that discusses decomposition, but the TL;DR is:
Microservices should be verbs, not nouns.
So, for example, you could have a CreateNewBusinessThing microservice that handles this specific case. But, for now, we'll assume you have good and valid reasons to have the services divided as they are.
The "right" solution in your case depends on the needs of the consuming service/application. If the consumer is an application or User Interface of some sort, responsiveness is required and that becomes your overriding need. If the consumer is another microservice, it may well be that it cares more about getting good "finalized" data rather than being responsive.
In either of those cases, one good option is a facade (aka gateway) service that lives between your client and the highly dependent services. This service can receive and persist the request, then respond however you'd like. It can give the consumer a 200 - OK response with an endpoint to call back to check the status of the request, which is very responsive. Or it could accept a URL to use as a webhook and notify the client directly once both back-end services have completed. Or it could publish events of its own (it likely should). Essentially, you can tailor the facade service to serve as many consumers as needed, in the way each consumer wants to talk.
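A minimal sketch of such a facade, assuming a dict stands in for its persistent store and that the back ends report completion through a hypothetical on_backend_completed callback (in practice, events they publish). FacadeService, the backend names and the notify function are illustrative; notify would typically be an HTTP POST to the client's webhook URL.

```python
import uuid
from typing import Callable, Dict, Optional

class FacadeService:
    """Persists requests, answers at once, and calls a webhook when done."""
    BACKENDS = ("service_a", "service_b")

    def __init__(self, notify: Callable[[str, dict], None]) -> None:
        self.requests: Dict[str, dict] = {}
        self.notify = notify  # hypothetical webhook sender

    def submit(self, payload: dict, webhook_url: Optional[str] = None) -> dict:
        request_id = str(uuid.uuid4())
        self.requests[request_id] = {"payload": payload, "webhook": webhook_url, "done": {}}
        # Here the facade would also publish commands/events to the back ends.
        return {"request_id": request_id, "status_url": f"/requests/{request_id}"}

    def status(self, request_id: str) -> str:
        done = self.requests[request_id]["done"]
        return "COMPLETED" if len(done) == len(self.BACKENDS) else "IN_PROGRESS"

    def on_backend_completed(self, request_id: str, backend: str, result: dict) -> None:
        req = self.requests[request_id]
        if backend in req["done"]:
            return  # ignore duplicate deliveries
        req["done"][backend] = result
        if self.status(request_id) == "COMPLETED" and req["webhook"]:
            self.notify(req["webhook"], req["done"])

sent = []
facade = FacadeService(notify=lambda url, body: sent.append((url, body)))
ack = facade.submit({"a": 1}, webhook_url="https://client.example/hook")
rid = ack["request_id"]
facade.on_backend_completed(rid, "service_a", {"ok": True})
mid_status = facade.status(rid)   # 'IN_PROGRESS'
facade.on_backend_completed(rid, "service_b", {"ok": True})
```

Polling clients use status_url; clients that supplied a webhook get a single notification once both back ends are done.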
There are other options too. You can look into Task-Based UI, the Saga pattern, or even just Faking It.
I think you want both the flexibility of a broker and the confirmation of a synchronous call. You can get both with request/reply over RabbitMQ, as shown in its RPC tutorial:
https://www.rabbitmq.com/tutorials/tutorial-six-dotnet.html
I am designing and developing a microservice platform based on the specifications of http://microservices.io/
The entire framework integrates through sockets, removing the overhead of multiple HTTP requests (as in most REST APIs).
A service registry host receives the registry of multiple microservice hosts, each microservice is responsible for a domain of the business. Another host we call a router (or API gateway) is responsible for exposing the microservices for consumption by third parties.
We will use the Saga pattern (in choreography style) to distribute the requests, so we have some questions:
Should a microservice send the event to a process manager, or should it be passed directly to the next microservice responsible for the chain of events? (The same logic applies to rollback.)
Who should know how to build the Saga chain of events? The first microservice that receives a certain work or the router?
If an event needs to pass a very large volume of data to the next Saga event, how is this done in terms of the request structure? Is it divided into multiple Sagas, for example (like result pagination)?
I think the main point is that in this router and microservice structure, who is responsible for building the Sagas and propagating their events.
The article Patterns for Microservices — Sync vs. Async does a great job defining many of the terms used here and has animated gifs demonstrating sync vs. async and orchestrated vs. choreographed as well as hybrid setups.
I know the OP answered his own question for his use case, but I want to try to address the questions raised a bit more generally in light of the linked article.
Should a microservice send the event to a process manager, or should it be passed directly to the next microservice responsible for the chain of events?
To use a more general term, a process manager is an orchestrator. A concrete implementation of this may involve a stateful actor that orchestrates a workflow, keeping track of its progress in some way. Since a saga is itself a workflow (composed of both forward and compensating actions), it would be the job of the process manager to keep track of the state of the saga until completion (success or failure). This typically involves the actor sending synchronous* calls to services and waiting for a result before going to the next step. Parallel operations can of course be introduced, but the point is that this actor dictates the progression of the saga.
This is fundamentally different from the choreography model. In this model there is no central actor keeping track of the state of a saga; rather, the saga progresses implicitly via the events that each step emits. Arguably, this is a purer event-driven model, since there is no coordination.
That said, the challenge with this model is observing the state at any given point in time. With the orchestration model above, in theory, each actor could be queried for the state of its saga. In the choreographed model we don't have this luxury, so in practice a correlation ID is added to every message belonging to (in this case) a saga. If the messages are queryable in some way (the event bus supports it, or they are kept in some other storage), then the messages for a saga can be queried and the saga state reconstructed (effectively an event-sourced model).
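A minimal in-process sketch of an orchestrated saga, assuming each step is a pair of hypothetical forward and compensating functions. In a real system each function would be a call to a service, and the orchestrator would persist its progress so it can resume after a crash; both are omitted here.

```python
from typing import Callable, List, Tuple

# (step name, forward action, compensating action); both take the saga context.
Step = Tuple[str, Callable[[dict], None], Callable[[dict], None]]

def run_saga(steps: List[Step], ctx: dict) -> Tuple[bool, List[str]]:
    """Orchestrator: run steps in order; if one fails, compensate the
    already-completed steps in reverse order."""
    completed: List[Step] = []
    log: List[str] = []
    for step in steps:
        name, action, _ = step
        try:
            action(ctx)
        except Exception as exc:  # any step failure aborts the saga
            log.append(f"{name} failed: {exc}")
            for done_name, _, compensate in reversed(completed):
                compensate(ctx)
                log.append(f"compensated {done_name}")
            return False, log
        completed.append(step)
        log.append(f"{name} done")
    return True, log

# Demo with hypothetical steps: payment fails after stock was reserved.
def reserve_stock(ctx: dict) -> None:
    ctx["reserved"] = True

def release_stock(ctx: dict) -> None:
    ctx["reserved"] = False

def charge_card(ctx: dict) -> None:
    raise RuntimeError("card declined")

def refund_card(ctx: dict) -> None:
    ctx["refunded"] = True

saga_ctx: dict = {}
ok, saga_log = run_saga(
    [("reserve_stock", reserve_stock, release_stock),
     ("charge_card", charge_card, refund_card)],
    saga_ctx,
)
```

The failed step itself is not compensated, only the steps that completed before it, which is why refund_card never runs in the demo.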
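A minimal sketch of reconstructing a choreographed saga's state from its messages, assuming a hypothetical queryable event log where every event carries the saga's correlation ID. The event types and the TRANSITIONS table are illustrative assumptions.

```python
from typing import Dict, List

# Hypothetical event log: in practice a queryable event store or a topic
# with retained messages. Each event carries the saga's correlation ID.
events: List[dict] = [
    {"correlation_id": "saga-1", "type": "OrderPlaced"},
    {"correlation_id": "saga-2", "type": "OrderPlaced"},
    {"correlation_id": "saga-1", "type": "PaymentCaptured"},
    {"correlation_id": "saga-2", "type": "PaymentFailed"},
    {"correlation_id": "saga-2", "type": "OrderCancelled"},
    {"correlation_id": "saga-1", "type": "InvoiceCreated"},
]

# The saga state each event type moves the saga into (assumed names).
TRANSITIONS: Dict[str, str] = {
    "OrderPlaced": "AWAITING_PAYMENT",
    "PaymentCaptured": "AWAITING_INVOICE",
    "InvoiceCreated": "COMPLETED",
    "PaymentFailed": "COMPENSATING",
    "OrderCancelled": "CANCELLED",
}

def saga_state(correlation_id: str, log: List[dict]) -> str:
    """Fold the saga's events, in order, into its current state."""
    state = "UNKNOWN"
    for event in log:
        if event["correlation_id"] == correlation_id:
            state = TRANSITIONS.get(event["type"], state)
    return state
```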
Who should know how to build the Saga chain of events? The first microservice that receives a certain work or the router?
This is an interesting question in itself, and one I have been thinking about quite a lot. The easiest and default answer would be: hard-code the saga plans and map them to the incoming message types, e.g. message A triggers plan X, message B triggers plan Y, etc.
However, I have been thinking about what a control plane might look like that manages these plans and provides a mechanism for pushing changes dynamically to message handlers and/or orchestrators. The two specific use cases I have in mind are changes in authorization policies and dynamically adding new steps to a plan.
If an event needs to pass a very large volume of data to the next Saga event, how is this done in terms of the request structure? Is it divided into multiple Sagas, for example (like result pagination)?
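The hard-coded default can be as simple as a lookup table from message type to plan. A minimal sketch, where the message types and step names are hypothetical:

```python
from typing import Dict, List

# Hard-coded saga plans keyed by the message type that starts them.
# Message types and step names are illustrative.
SAGA_PLANS: Dict[str, List[str]] = {
    "OrderSubmitted": ["reserve_stock", "charge_card", "create_invoice"],
    "ReturnRequested": ["create_return_label", "refund_card"],
}

def plan_for(message: dict) -> List[str]:
    """Pick the plan for an incoming message, failing loudly if none exists."""
    message_type = message.get("type")
    if message_type not in SAGA_PLANS:
        raise ValueError(f"no saga plan for message type {message_type!r}")
    return SAGA_PLANS[message_type]
```

A control plane would, in effect, replace this static table with one that can be updated at runtime.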
The way I have approached this is to include references to the large data if these are objects such as a file or something. For data that are inherently streams themselves, a parallel channel could be referenced that a consumer could read from once it receives the message. I think the important distinction here is to decouple thinking about the messages driving the workflow from where the data is physically materialized which depends on the data representation.
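Passing a reference instead of the data is what Enterprise Integration Patterns calls a claim check. A minimal sketch, assuming a hypothetical in-memory BlobStore standing in for an object store, and an arbitrary 1 KiB limit for inlining payloads:

```python
import hashlib
from typing import Dict

class BlobStore:
    """Hypothetical stand-in for an object store such as S3."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()  # content-addressed key
        self._objects[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]

def make_message(store: BlobStore, saga_id: str, payload: bytes, inline_limit: int = 1024) -> dict:
    """Inline small payloads; for large ones, store the data and send a reference."""
    if len(payload) <= inline_limit:
        return {"saga_id": saga_id, "payload": payload}
    return {"saga_id": saga_id, "payload_ref": store.put(payload), "size": len(payload)}

def read_payload(store: BlobStore, message: dict) -> bytes:
    """Consumer side: resolve the reference only when the data is needed."""
    if "payload" in message:
        return message["payload"]
    return store.get(message["payload_ref"])

store = BlobStore()
big = b"x" * 5000
msg_big = make_message(store, "saga-1", big)         # carries payload_ref only
msg_small = make_message(store, "saga-1", b"hello")  # carries the bytes inline
```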
With microservices, every microservice should be responsible for its own business domain.
Should a microservice send the event to a process manager, or should it be passed directly to the next microservice responsible for the chain of events? (The same logic applies to rollback.)
Events are not passed to the next microservice; they are published, and every microservice interested in an event subscribes to it.
If there is rollback, you should consider orchestration.
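A minimal in-memory sketch of publish/subscribe, where EventBus is a hypothetical stand-in for a broker topic: the publisher emits an event without knowing who consumes it, and each interested service subscribes independently.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]

class EventBus:
    """Minimal in-memory publish/subscribe bus (a stand-in for a broker topic)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, event: dict) -> None:
        # The publisher does not know or care who is listening.
        for handler in self._subscribers[event_type]:
            handler(event)

bus = EventBus()
received = []
bus.subscribe("OrderPlaced", lambda e: received.append(("payments", e["order_id"])))
bus.subscribe("OrderPlaced", lambda e: received.append(("shipping", e["order_id"])))
bus.publish("OrderPlaced", {"order_id": "o-1"})
```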
Who should know how to build the Saga chain of events? The first microservice that receives a certain work or the router?
The microservice that publishes the event knows how to build it. There is no chain of events, because every microservice interested in an event subscribes to it separately.
If an event needs to pass a very large volume of data to the next Saga event, how is this done in terms of the request structure? Is it divided into multiple Sagas, for example (like result pagination)?
Only publish the data that others may be interested in, not everything. In most cases the data are not large, and a message queue can handle them efficiently.