How to handle failure in API gateway - microservices

Say I design a microservice architecture for a payment system. A user clicks the pay button in the UI, which sends a request to an API gateway, and the gateway starts a chain of API calls to a few microservices. How should I handle the case where one of the microservices is down or not responding in the middle of the call chain?
I want the user to think that the payment has succeeded, and not to return "try again later" to them. Can I save the state of the chain somewhere?

For such cases, asynchronous communication is preferable to synchronous communication.
In this case:
When the client sends a request to the system, the API gateway takes the request and delegates it to the corresponding microservice. That microservice then sends an event to the other related microservices. Generally a message broker is used for this: messages are stored in the broker, so even if the consumer (subscriber) microservice is down, the message is not lost.
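To illustrate why the message survives a consumer outage, here is a minimal sketch with a toy in-memory broker. The Broker class, its methods and the queue name are hypothetical stand-ins, and a real broker (RabbitMQ, Kafka, etc.) additionally persists messages to disk and handles acknowledgements over the network.

```python
from collections import deque

class Broker:
    """Toy in-memory stand-in for a message broker (hypothetical, not a real client API).

    Messages published to a queue are kept until a consumer has handled them,
    so a consumer that is down simply picks them up when it comes back.
    """

    def __init__(self):
        self.queues = {}

    def publish(self, queue, message):
        # The broker stores the message whether or not anyone is listening.
        self.queues.setdefault(queue, deque()).append(message)

    def consume(self, queue, handler):
        # Deliver stored messages in order; a message is removed only after
        # the handler returns without raising (a crude "ack").
        q = self.queues.setdefault(queue, deque())
        processed = 0
        while q:
            handler(q[0])
            q.popleft()
            processed += 1
        return processed


broker = Broker()
# The payment service publishes while the (hypothetical) consumer service is down.
broker.publish("payment-completed", {"payment_id": 1})
broker.publish("payment-completed", {"payment_id": 2})

received = []
# Later the consumer comes back online and drains the backlog; nothing was lost.
count = broker.consume("payment-completed", received.append)
```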
You can also send events directly from the API gateway to the message broker (see: https://microservices.io/patterns/apigateway.html).
To achieve atomicity and consistency (eventual consistency, of course), the Saga pattern can be applied. You can check this for more information.
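As a rough illustration of the Saga idea (orchestration style), here is a sketch where each step has a compensating action that undoes it, and a failure rolls back the completed steps in reverse order. It is synchronous and in-process for brevity, and the step names are hypothetical; a real saga would usually exchange messages through the broker and persist its state so it can resume after a crash.

```python
class StepFailed(Exception):
    pass

def run_saga(steps):
    """Run (action, compensation) pairs in order; on failure, undo completed steps in reverse."""
    completed = []
    for action, compensate in steps:
        try:
            action()
        except StepFailed:
            # Roll back what already succeeded, newest first, then report failure.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True

log = []

def fail_shipping():
    # Simulates a downstream service that is down in the middle of the chain.
    raise StepFailed("shipping service unavailable")

steps = [
    (lambda: log.append("reserve funds"), lambda: log.append("release funds")),
    (lambda: log.append("reserve stock"), lambda: log.append("release stock")),
    (fail_shipping, lambda: log.append("cancel shipment")),
]
ok = run_saga(steps)
```

Instead of compensating immediately, an orchestrator could also keep the saga in a "pending" state and retry the failed step later, which is closer to what the question asks for.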
But if your requirement is to call a microservice to get some data immediately and that microservice is down, this solution will not work. You should avoid this kind of coupling between microservices by design. In my opinion, this is one of the most challenging parts of a microservices architecture. Domain-Driven Design techniques can be used to determine bounded contexts.

REST API uses asynchronous (events) internally

I am implementing a REST API that internally places a message on a message queue and receives a message as a response on a different topic.
How could the API implementation handle publishing and consuming different messages and respond to the client?
What if it never receives a message?
How does the service handle this time-out scenario?
Example
I am implementing a REST API to process an order. The implementation internally publishes a series of messages to verify the payment, update inventory, and prepare shipping info. Finally, it sends the response back to the client.
Queues are too low-level an abstraction to implement your requirements directly. Look at an orchestration solution like temporal.io, which makes programming such async systems trivial.
Disclaimer: I'm one of the founders of the Temporal open source project.
How could the API implementation handle publishing and consuming different messages and respond to the client?
Even though messaging systems can be used in an RPC-like fashion:
there is a request topic/queue and a reply topic/queue
with a request identifier in the messages' header/metadata
this type of communication kills the promise of the messaging system: decoupling components in time and space.
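For reference, the RPC-style pattern just described looks roughly like the sketch below, using Python's in-process queue.Queue as a stand-in for the request and reply topics; the service, topic and field names are hypothetical. Note how the caller still blocks and has to invent its own timeout policy, which is the coupling this answer warns about.

```python
import queue
import threading
import time
import uuid

# In-process stand-ins for the request and reply topics (hypothetical names).
request_topic = queue.Queue()
reply_topic = queue.Queue()

def payment_service():
    # Hypothetical downstream service: handles one request and echoes its correlation id.
    msg = request_topic.get()
    reply_topic.put({"correlation_id": msg["correlation_id"],
                     "body": msg["body"].upper()})

def call(body, timeout):
    """Publish a request and block until the matching reply arrives or the timeout expires."""
    correlation_id = str(uuid.uuid4())
    request_topic.put({"correlation_id": correlation_id, "body": body})
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None
        try:
            reply = reply_topic.get(timeout=remaining)
        except queue.Empty:
            return None  # no reply in time: the caller must decide what this means
        if reply["correlation_id"] == correlation_id:
            return reply["body"]
        # Otherwise it is a late reply to an earlier, abandoned request: drop it.

threading.Thread(target=payment_service, daemon=True).start()
answered = call("verify payment", timeout=1.0)
unanswered = call("update inventory", timeout=0.1)  # nobody is consuming any more
```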
Back to your example: if ServiceA receives the request, it publishes a message to topicA and returns a 202 Accepted status code to indicate that the request has been received but not yet fully processed. In the response you can include a URL from which the consumer of ServiceA's API can retrieve the latest status of its previously issued request.
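A framework-free sketch of that flow, assuming a dict as ServiceA's database and a list as topicA. The URL path, status values and handler names are hypothetical; returning the status URL in a Location header is one common convention for 202 responses.

```python
import uuid

requests_db = {}   # request id -> status (stand-in for ServiceA's database)
published = []     # stand-in for topicA

def handle_post_order(payload):
    """Hypothetical ServiceA handler: record the request, publish, return 202 immediately."""
    request_id = str(uuid.uuid4())
    requests_db[request_id] = "PENDING"
    published.append({"request_id": request_id, "payload": payload})
    return 202, {"Location": f"/orders/requests/{request_id}"}

def handle_get_status(request_id):
    # The URL returned above lets the API consumer check progress later.
    status = requests_db.get(request_id)
    if status is None:
        return 404, {}
    return 200, {"status": status}

def on_processing_finished(request_id):
    # Called by a (hypothetical) consumer once downstream processing completes.
    requests_db[request_id] = "COMPLETED"

code, headers = handle_post_order({"item": "book"})
request_id = headers["Location"].rsplit("/", 1)[-1]
before = handle_get_status(request_id)
on_processing_finished(request_id)
after = handle_get_status(request_id)
```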
What if it never receives a message?
In that case the request-related data remains in the same state it was in when the message was published.
How does the service handle this time-out scenario?
You can create scheduled jobs to clean up requests that never finished or got stuck. Depending on your business requirements, you can simply delete them or transfer them to manual processing by customer service.
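Such a clean-up job could look roughly like this sketch, which would be triggered periodically (e.g. by a cron-style scheduler). The status names, the age threshold and the manual-processing queue are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Request:
    request_id: str
    status: str
    created_at: float  # seconds since epoch

def sweep_stuck_requests(requests, now, max_age_seconds, manual_queue):
    """Hypothetical scheduled job: hand requests stuck in PENDING too long to manual processing."""
    for req in requests:
        if req.status == "PENDING" and now - req.created_at > max_age_seconds:
            req.status = "NEEDS_MANUAL_REVIEW"
            manual_queue.append(req.request_id)

reqs = [
    Request("a", "PENDING", 0.0),      # stuck for 1000 s: swept
    Request("b", "PENDING", 500.0),    # only 500 s old: left alone
    Request("c", "COMPLETED", 0.0),    # finished: left alone
]
manual = []
sweep_stuck_requests(reqs, now=1000.0, max_age_seconds=600, manual_queue=manual)
```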
Order placement use case
Rather than creating a customer-facing service that waits for all the processing to be done, you can define several statuses/stages of the process:
Order requested
Payment verified
Items locked in inventory
...
Order placed
You can inform your customers about these status/stage changes via websocket, push notification, e-mail, etc. The orchestration of this order placement flow can be achieved, for example, via the Saga pattern.
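A minimal sketch of tracking those stages, assuming only the four named stages (whatever the "..." stands for is left out) and a hypothetical notify callback standing in for websocket, push notification or e-mail:

```python
from enum import Enum

class OrderStage(Enum):
    # Only the stages named in the list; a real flow may have more in between.
    ORDER_REQUESTED = 1
    PAYMENT_VERIFIED = 2
    ITEMS_LOCKED = 3
    ORDER_PLACED = 4

class Order:
    def __init__(self, order_id, notify):
        self.order_id = order_id
        self.stage = OrderStage.ORDER_REQUESTED
        self.notify = notify  # hypothetical callback: websocket, push, e-mail, ...

    def advance_to(self, new_stage):
        # Stages must be entered strictly in order; anything else is rejected
        # (a real consumer might instead ignore duplicate deliveries).
        if new_stage.value != self.stage.value + 1:
            raise ValueError(f"cannot go from {self.stage.name} to {new_stage.name}")
        self.stage = new_stage
        self.notify(self.order_id, new_stage.name)

sent = []
order = Order("o-1", lambda oid, stage: sent.append((oid, stage)))
order.advance_to(OrderStage.PAYMENT_VERIFIED)
order.advance_to(OrderStage.ITEMS_LOCKED)
```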

Saga Pattern on hardware failure and inter services communication

I am building a Spring Boot microservice application, and I plan to adopt the Saga pattern to tackle the distributed transaction problem. Below are the questions and problems I am facing.
Here is the context for ease of explanation.
Client -> Service A -> Service B
Handling microservices that are down due to failure
Assuming that Service B is down due to a hardware/software failure, how should A react?
Async communication
It is recommended to use async communication for the Saga pattern. Assuming the client -> A call takes less time than A -> B, how does the client receive the data that A receives from B at a later time? Does A have to return an async object to the client, something like the CompletableFuture class?
Service requesting resources from other services
Assuming that Service A has to request some resources from Service B, how should A go about doing this? All I can think of is HTTP/gRPC (ruling out communication via a message broker).
If you happen to have some experience or advice, please share :)
Any help or advice on Saga pattern is appreciated!
Saga is used for distributed transactions. It can be implemented using either orchestration or choreography, and it is mostly (preferably) implemented with asynchronous communication. A message broker plays an important role here.
There are several questions here; let me try to answer them.
If one service is down: you can set up a monitoring system for the saga. If any service is down, or a saga has not been processed within some threshold time, you can raise an alert.
Async communication: it is mostly used to process commands (not queries). Whenever the client calls service A, A initiates the saga and replies with the current status. It also returns an ID (you could call it a job ID). There are then two ways for the client to get the updated status: poll (the client asks for a status update every N seconds) and push (the server pushes changes whenever the state changes).
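Both options can sit on top of the same status store. The sketch below is a minimal in-memory version; the JobRegistry class, its method names and the status values are hypothetical, and in practice the push callback would write to a websocket or similar channel.

```python
import uuid

class JobRegistry:
    """Hypothetical status store for saga-backed commands, supporting both poll and push."""

    def __init__(self):
        self.status = {}
        self.subscribers = {}

    def start(self):
        # Client calls service A: the saga starts and a job id comes back immediately.
        job_id = str(uuid.uuid4())
        self.status[job_id] = "STARTED"
        return job_id, "STARTED"

    def get(self, job_id):
        # Poll: the client asks for the current status every N seconds.
        return self.status[job_id]

    def subscribe(self, job_id, callback):
        # Push: the server calls back whenever the state changes.
        self.subscribers.setdefault(job_id, []).append(callback)

    def update(self, job_id, new_status):
        # Called as the saga progresses; notifies every push subscriber.
        self.status[job_id] = new_status
        for callback in self.subscribers.get(job_id, []):
            callback(job_id, new_status)

jobs = JobRegistry()
job_id, initial = jobs.start()
pushed = []
jobs.subscribe(job_id, lambda jid, s: pushed.append(s))
jobs.update(job_id, "PAYMENT_DONE")
jobs.update(job_id, "COMPLETED")
```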
Service requesting resources from another service: yes, the preferred way is REST or gRPC. Also, if the data is effectively constant, you can use a cache.
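For the caching part, here is a minimal time-to-live cache sketch wrapped around the remote call. The fetch function is a hypothetical stand-in for the REST/gRPC call to Service B, and the clock is injectable only so the example can simulate time passing.

```python
import time

class TtlCache:
    """Caches results of a remote lookup for `ttl` seconds before fetching again."""

    def __init__(self, fetch, ttl, clock=time.monotonic):
        self.fetch = fetch      # performs the remote call (hypothetical)
        self.ttl = ttl
        self.clock = clock
        self.entries = {}       # key -> (value, expires_at)

    def get(self, key):
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]
        value = self.fetch(key)
        self.entries[key] = (value, now + self.ttl)
        return value

calls = []

def fetch_decimal_places(currency):
    # Stand-in for a REST/gRPC call to the service that owns currency data.
    calls.append(currency)
    return {"USD": 2, "JPY": 0}[currency]

fake_now = [0.0]
cache = TtlCache(fetch_decimal_places, ttl=60, clock=lambda: fake_now[0])
first = cache.get("USD")     # remote call
second = cache.get("USD")    # served from the cache
fake_now[0] = 61.0
third = cache.get("USD")     # entry expired: remote call again
```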
Suggestion: SRE (monitoring, etc.) plays an important role in a microservice architecture. If you have set that up well, you can handle the other challenges of microservices much more easily.

Microservices asynchronous response

I have come across many blogs saying that using RabbitMQ improves the performance of microservices due to its asynchronous nature.
I don't understand how, in that case, the HTTP response is sent to the end user. I elaborate on my question more clearly below.
The user sends an HTTP request to microservice1 (which is the user-facing service).
microservice1 sends it to RabbitMQ because it needs some service from microservice2.
microservice2 receives the request, processes it, and sends the response to RabbitMQ.
microservice1 receives the response from RabbitMQ.
Now, how is this response sent to the browser?
Does microservice1 wait until it receives the response from RabbitMQ?
If yes, then how is it asynchronous?
It's a good question. To answer it, you have to imagine the server running one thread at a time. Making a request to a microservice via RestTemplate is a blocking request. The user clicks a button on the web page, which triggers your Spring Boot method in microservice1. In that method, you make a request to microservice2, and microservice1 does a blocking wait for the response.
That thread is busy waiting for microservice2 to complete the request. Threads are not expensive, but on a very busy server they can be a limiting factor.
RabbitMQ allows microservice1 to queue up a message for microservice2 and then release the thread. Your receive handler will be triggered by the framework (Spring Boot / RabbitMQ) when microservice2 processes the message and provides a response. In the meantime, that thread goes back to the thread pool and can be used to process other users' requests. When the RabbitMQ response comes, the thread pool uses an unused thread to process the remainder of the request.
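The same idea exists outside Spring. Here is an analogous sketch in Python's asyncio (not Spring or RabbitMQ): the handler publishes a request, then awaits a future that a reply listener completes later, and while it waits, the single event-loop thread serves other users. The broker is simulated with call_later, and all names are hypothetical.

```python
import asyncio

pending = {}  # correlation id -> Future awaiting microservice2's reply

async def handle_user_request(request_id, send_to_broker):
    # Publish the request, then await the reply. While awaiting, the event loop
    # is free to run other users' handlers instead of blocking a thread.
    future = asyncio.get_running_loop().create_future()
    pending[request_id] = future
    await send_to_broker(request_id)
    try:
        reply = await asyncio.wait_for(future, timeout=5)
    finally:
        pending.pop(request_id, None)  # avoid leaking entries on timeout
    return f"HTTP 200: {reply}"

def on_reply(request_id, body):
    # Invoked by the (hypothetical) broker listener when microservice2 answers.
    future = pending.get(request_id)
    if future is not None and not future.done():
        future.set_result(body)

async def main():
    async def fake_broker(request_id):
        # Simulate microservice2 answering a little later via the broker.
        asyncio.get_running_loop().call_later(
            0.01, on_reply, request_id, f"result-{request_id}")

    # Two user requests are in flight at the same time on a single thread.
    return await asyncio.gather(
        handle_user_request("r1", fake_broker),
        handle_user_request("r2", fake_broker),
    )

responses = asyncio.run(main())
```

So microservice1 does still wait for the reply before answering the browser, but the waiting no longer ties up a thread.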
Effectively, you're making the server running microservice1 have more threads available more of the time. It only becomes a problem when the server is under heavy load.
Good question; let's discuss it one point at a time.
Synchronous behavior:
The client sends an HTTP (or any other) request and waits for the HTTP response.
Asynchronous behavior:
The client sends the request, and another thread waits on the socket for the response. Once the response arrives, the original sender is notified (usually through a callback-like structure).
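The difference can be sketched with Python's concurrent.futures, where a worker thread stands in for the thread waiting on the socket and remote_call is a hypothetical stand-in for the network request:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def remote_call(x):
    # Stand-in for a network request (hypothetical).
    return x * 2

# Synchronous: the caller waits for the result before doing anything else.
sync_result = remote_call(21)

# Asynchronous: submit the call, register a callback, and carry on;
# the callback runs once the response is available.
done = threading.Event()
async_results = []

def on_response(future):
    async_results.append(future.result())
    done.set()

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(remote_call, 21)
    future.add_done_callback(on_response)
    # ... the caller is free to do other work here ...
    done.wait(timeout=1)
```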
Now we can talk about blocking vs. non-blocking calls.
When you use Spring REST (e.g. RestTemplate), each call occupies a thread that waits for the response and is blocked in the meantime, whereas with non-blocking calls all calls can go through a single thread and the response is pushed back to you without blocking.
Now, to your question:
Using RabbitMQ improves the performance of microservices due to the asynchronous nature of RabbitMQ.
No. Performance depends on your load (TPS), and RabbitMQ by itself is not going to improve performance.
Messaging gives you two different types of messaging model:
Synchronous messaging
Asynchronous messaging
With messaging you get loose coupling and fault tolerance.
If your application needs a blocking call (i.e. it cannot proceed without the response), use REST.
If you can work without getting a response, go ahead with non-blocking calls.
If you want your app to be loosely coupled, go with messaging.
In short, all of the above are architectural styles, i.e. choices about how you want to architect your application; performance depends on scalability.
You can combine REST and messaging in one app, and use non-blocking calls with messaging.
In your scenario, microservice1 could expose a blocking REST endpoint that calls the other API using RestTemplate or WebClient and/or a message queue, and once it gets the response, it returns a JSON REST response to your web app.
I would take another look at your architecture. In general, with microservices (especially user-facing ones that must be essentially synchronous), it's an anti-pattern for ServiceA to have to call ServiceB (which may, in turn, call ServiceC and so on) in order to return a response. That condition indicates those services are tightly coupled, which makes them fragile. For example, if ServiceB goes down or is overloaded, ServiceA also goes offline through no fault of its own. So, probably one or more of the following should occur:
Deploy the related services behind a facade that encloses the entire domain - let the client interact synchronously with the facade and let the facade handle talking to multiple services behind the scenes.
Use MQTT or AMQP to publish data as it gets added/changed in ServiceB and have ServiceA subscribe to pick up what it needs so that it can fulfill the user request without explicitly calling another service
Consider merging ServiceA and ServiceB into a single service that can handle requests without having to make external calls
You can also send the HTTP request from the client to the service, set the application state to waiting or similar, and have the consuming application subscribe to an eventSuccess or eventFail integration message from the bus. The main point of this idea is that you let daisy-chained services (which, again, I don't like) take their turns, and whichever service "finishes" the job publishes an integration event to let anyone who's listening know. You can even do things like pass webhook URIs with the initial request to have services call the app back directly on completion (or use SignalR, or gRPC, or...).
The way we use RabbitMQ is to integrate services in real time so that each service always has the info it needs to be responsive all by itself. To use your example, in our world ServiceB publishes events when data changes. ServiceA only cares about, and subscribes to, a small subset of those events (and typically only a field or two of the event data), but it knows within seconds (usually less) when B has changed, and it has all the information it needs to respond to requests. Each service literally has no idea what other services exist; it just knows that events it cares about (and that conform to a contract) arrive from time to time, and it needs to pay attention to them.
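A minimal sketch of ServiceA's side of this approach: it keeps a local copy of the one field it needs from ServiceB's events and answers requests from that copy. The event shape, field names and the credit-limit rule are hypothetical examples, and delivery of the events (e.g. by a RabbitMQ consumer) is left out.

```python
class CustomerView:
    """ServiceA's local copy of the few ServiceB fields it needs (hypothetical names)."""

    def __init__(self):
        self.credit_limits = {}

    def on_customer_changed(self, event):
        # Keep only the field ServiceA cares about; ignore the rest of the event.
        self.credit_limits[event["customer_id"]] = event["credit_limit"]

    def can_place_order(self, customer_id, amount):
        # Answered from local data: no synchronous call to ServiceB is needed.
        return amount <= self.credit_limits.get(customer_id, 0)

view = CustomerView()
# Events as ServiceB might publish them whenever its data changes.
view.on_customer_changed({"customer_id": "c1", "credit_limit": 100, "email": "x@example.com"})
view.on_customer_changed({"customer_id": "c1", "credit_limit": 250, "email": "x@example.com"})
```

If ServiceB goes down, ServiceA keeps answering from its last known data, at the cost of that data possibly being slightly stale.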
You could also use events and make the whole flow async. In this scenario, microservice1 creates an event representing the user request and immediately returns a "request created" response to the user. You can then notify the user later, when the request has finished processing.
I recommend the book Designing Event-Driven Systems written by Ben Stopford.
I asked a similar question to Chris Richardson (www.microservices.io). The result was:
Option 1
You use something like websockets, so the microservice1 can send the response, when it's done.
Option 2
microservice1 responds immediately (OK - request accepted). The client polls the server repeatedly until the state changes. What's important is that microservice1 stores some state about the request (i.e. an initial state "accepted", so the client can show a spinner), which is modified when you finally receive the response (i.e. the state is updated to "complete").
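On the client side, Option 2 boils down to a polling loop. The sketch below adds exponential backoff and an overall timeout, which are assumptions rather than part of the suggestion above; get_status is a hypothetical function that calls microservice1's status endpoint, and sleep/clock are injectable only to make the example testable.

```python
import time

def poll_until_done(get_status, interval=0.5, max_interval=8.0, timeout=60.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until the state leaves "accepted", backing off between attempts."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status != "accepted":      # e.g. "complete" or a failure state
            return status
        if clock() >= deadline:
            return "timed_out"
        sleep(interval)               # the UI keeps showing the spinner meanwhile
        interval = min(interval * 2, max_interval)

# Simulated server responses; the fake sleep just records the waits.
statuses = iter(["accepted", "accepted", "complete"])
waits = []
result = poll_until_done(lambda: next(statuses), sleep=waits.append)
```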

Should an API Gateway Communicate via a Queue or directly to other μServices?

I was wondering which of my two methods is more appropriate, or is there even another one?
(1) Direct
Direct communication between GATEWAY and μSERVICE A
UI sends HTTP request to GATEWAY
GATEWAY sends HTTP request to μSERVICE A
μSERVICE A returns either SUCCESS or ERROR
Event is stored in EVENT STORE and published to QUEUE
PROJECTION DATABASE is updated
Other μSERVICES might consume event
(2) Events
Event-based communication via a message queue
UI sends HTTP request to GATEWAY
GATEWAY publishes event to QUEUE
μSERVICE A consumes event
Event is stored in EVENT STORE and published to QUEUE
PROJECTION DATABASE is updated
Other μSERVICES might consume event
GATEWAY consumes event and sends response (SUCCESS or ERROR) to UI
I am really sorry if I misunderstood some concept, I am relatively new to this style of architecture.
Thanks in advance for any help! :)
The second approach is the preferred one; it is the asynchronous approach.
Direct
In the first approach, your microservices B and C wait for the event to get published. The scalability of this system depends directly on microservice A: what if microservice A is down or falls behind in writing events to the queue? It is effectively a single point of failure and a bottleneck, so you can't scale the system easily.
Events
In microservices, we keep the system asynchronous so that it can scale.
The gateway should write to the queue using pub/sub, so all of these microservices can consume events at the same time. The system overall is more robust and can be scaled.
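A toy sketch of the pub/sub fan-out: the gateway publishes once, and every subscribed service gets its own copy of the event. This PubSub class is a hypothetical in-process stand-in that delivers synchronously; a real broker delivers asynchronously and buffers events for each subscriber independently.

```python
from collections import defaultdict

class PubSub:
    """Toy topic: every subscriber receives each published event (hypothetical API)."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self.subscribers[topic]:
            handler(event)

bus = PubSub()
seen = defaultdict(list)
for service in ("service_a", "service_b", "service_c"):
    # Default argument binds the current service name to each handler.
    bus.subscribe("order-requested", lambda e, s=service: seen[s].append(e["order_id"]))

# The gateway publishes once; all interested services receive the event independently.
bus.publish("order-requested", {"order_id": 42})
```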

microservice: How to do validations against another microservice

If there are two microservices and you want a validation to be performed against the other microservice, what is the best way to handle these cases?
If you need resilience and scalability, best practice says to use asynchronous message-based communication between microservices. In your case, one microservice asynchronously sends a RequestValidationOrSomething message to the other one (async means it does not block while waiting for the response). The validating microservice receives the message, performs the validation, and sends another message back (success or failure).
If you need a simple solution, one microservice can make synchronous calls to the other, similar to local in-process calls.
