I have an application that contains several microservices.
For example, the microservices: A, B, C, D, E, F, G.
From those microservices I build several processes.
For example,
P1: A-->B-->C-->E
P2: B-->C-->A-->G-->D
P3: A-->G
P4: A-->B-->E
...
The microservices communicate between them through RabbitMQ.
If I execute process P1, I don't want any other process to be executed while it runs (I probably need to save state).
In the future I may want to add more microservices, and more processes.
I'm wondering how and where to define the logic of the microservice path for each process.
Should each microservice call the next microservice directly (using RabbitMQ) according to the specific process (as in the example above)?
Or would it be better to add a "manager" microservice that knows the exact order for each process, so that each microservice communicates through the manager?
For example,
P1: Manager-->A-->Manager-->B-->Manager-->C-->Manager-->E-->Manager
P2: Manager-->B-->Manager-->C-->Manager-->A-->Manager-->G-->Manager-->D-->Manager
P3: Manager-->A-->Manager-->G-->Manager
...
To make it work you only need to set up your RabbitMQ events correctly.
In your example, microservice A is responsible for starting P1, P3 and P4, so you should have 3 different event types, e.g., Start P1, Start P3 and Start P4, and microservice A should subscribe to all of them and start each process accordingly.
After microservice A processes an event it should create a new one, e.g., if it processed Start P1, it should create an event such as P1 Step1 Completed, which microservice B then subscribes to.
This way you can easily tell each process apart. Note that I am using dummy names; in a real-world scenario your event names should be concise and clear.
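A minimal sketch of that event wiring with Spring AMQP, assuming hypothetical queue, exchange and routing-key names (none of these identifiers come from the answer above):

```java
import org.springframework.amqp.core.Message;
import org.springframework.amqp.rabbit.annotation.RabbitListener;
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.stereotype.Component;

// Sketch only: microservice A subscribes to the "start" events of the processes
// it begins (P1, P3, P4) and, after doing its own work, publishes the next
// process-specific event. Queue, exchange and routing-key names are assumptions.
@Component
public class MicroserviceA {

    private final RabbitTemplate rabbitTemplate;

    public MicroserviceA(RabbitTemplate rabbitTemplate) {
        this.rabbitTemplate = rabbitTemplate;
    }

    @RabbitListener(queues = {"a.start-p1", "a.start-p3", "a.start-p4"})
    public void onStartEvent(Message message) {
        String routingKey = message.getMessageProperties().getReceivedRoutingKey();
        // ... microservice A's own work for the given process ...

        if ("process.p1.start".equals(routingKey)) {
            // P1: A --> B --> C --> E, so B listens for this event.
            rabbitTemplate.convertAndSend("process-exchange", "process.p1.step1.completed", "A done");
        } else if ("process.p3.start".equals(routingKey)) {
            // P3: A --> G, so G listens for this event.
            rabbitTemplate.convertAndSend("process-exchange", "process.p3.step1.completed", "A done");
        } else if ("process.p4.start".equals(routingKey)) {
            // P4: A --> B --> E, so B listens for this event as well.
            rabbitTemplate.convertAndSend("process-exchange", "process.p4.step1.completed", "A done");
        }
    }
}
```

The process identity travels in the event name (or routing key), so each microservice only needs to know which event it consumes and which event it emits next.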
Related
I have a distributed sequence of actions across microservices. Service A needs to tell service B to do something, and once that is complete it will tell service C. The sequence is important, so I'm using the saga pattern, as you can see.
My issue is that service B can scale and each instance needs to receive the message and complete the action. The action must happen on every service B instance. Then service C should only run once all the service B instances have completed their task.
It is a cache purge that must happen on each instance. I have no control over this architecture so the cache for service B is coupled to each instance. I would have a shared cache for the instances if I could.
I have come up with this orchestration solution but it requires maintaining state and lots of extra code to handle edge cases which I would like to avoid.
service A sends the same message to all service B instances which it knows about
all service B instances send success to service A
On the final service B success, service A messages service C
Is there a better alternative to this?
Assuming that you can't rearchitect service B, you've captured the essential complexity of the operation: A will have to track instances of service B and will have to deal with a ton of edge cases. The process is fundamentally stateful.
If the cache purge command is idempotent (i.e. you don't care if it happens multiple times in the process) you can simplify some of the edge case handling and can get away with the state being less durable (on failure you can start from the beginning instead of needing to reconstruct where you were in the process).
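As a rough illustration of that stateful tracking, here is a sketch of how service A might fan out the purge and wait for every instance of B, assuming a hypothetical messaging client and instance discovery (all names are placeholders):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: service A fans the purge command out to every known instance of
// service B, tracks their acknowledgements, and only messages service C once the
// last instance has replied. The MessagingClient and destinations are assumptions.
public class PurgeOrchestrator {

    private final Set<String> pendingInstances = ConcurrentHashMap.newKeySet();
    private final MessagingClient messaging;

    public PurgeOrchestrator(MessagingClient messaging) {
        this.messaging = messaging;
    }

    public void startPurge(Set<String> knownInstancesOfB) {
        pendingInstances.addAll(knownInstancesOfB);
        for (String instanceId : knownInstancesOfB) {
            messaging.send("serviceB.purge." + instanceId, "PURGE_CACHE");
        }
    }

    // Called for each success message coming back from a B instance.
    public void onPurgeAck(String instanceId) {
        boolean removed = pendingInstances.remove(instanceId);
        // A duplicate ack returns false here, so C is only messaged once.
        if (removed && pendingInstances.isEmpty()) {
            messaging.send("serviceC.commands", "PURGE_COMPLETE");
        }
    }

    // Minimal messaging abstraction used by this sketch.
    public interface MessagingClient {
        void send(String destination, String payload);
    }
}
```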
In a choreography-based saga pattern, each microservice sends an event to the next microservice to run its local transaction. So which microservice ends the transaction? Does the last microservice, which has no successor, end and approve the transaction, or does the last microservice send an OK back to the first one so it can commit the overall process?
Choreography means that services know about their counterparts and decide based on the outcomes of the other services. There is no communication with a "central coordinator".
If we say A communicates with B, which communicates with C (A -> B -> C), then A does some action and asks for the outcome of B, B does some action and asks for the outcome of C, and C does some action. The outcomes travel in the opposite direction (A <- B <- C).
Let's assume all service actions were successful. C persists its data and responds to B that the task was successfully done and the outcome is OK. B knows there are no other services it is waiting for, so its saga step can be validated as successful at B. It then responds to A with success, and A, knowing there is no other service to wait for, may end its saga step with success as well.
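A minimal sketch of B's role in that chain, assuming Spring AMQP and hypothetical queue names (the event payloads and queues are placeholders, not from the answer):

```java
import org.springframework.amqp.rabbit.annotation.RabbitListener;
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.stereotype.Component;

// Sketch only: B performs its local action, asks C for its outcome,
// and reports success back to A once C has confirmed (A <- B <- C).
@Component
public class ServiceBChoreography {

    private final RabbitTemplate rabbitTemplate;

    public ServiceBChoreography(RabbitTemplate rabbitTemplate) {
        this.rabbitTemplate = rabbitTemplate;
    }

    // Triggered by A's request event.
    @RabbitListener(queues = "b.requests")
    public void onRequestFromA(String payload) {
        // ... B's local transaction ...
        rabbitTemplate.convertAndSend("c.requests", payload); // ask C to do its part
    }

    // Triggered by C's outcome event.
    @RabbitListener(queues = "b.outcomes-from-c")
    public void onOutcomeFromC(String outcome) {
        if ("OK".equals(outcome)) {
            // B has no other service to wait for, so its saga step is successful.
            rabbitTemplate.convertAndSend("a.outcomes-from-b", "OK");
        } else {
            // Compensate B's local transaction and propagate the failure to A.
            rabbitTemplate.convertAndSend("a.outcomes-from-b", "FAILED");
        }
    }
}
```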
The example of two services communicating over queue can be found at https://microservices.io/patterns/data/saga.html
I have a doubt related to microservices. Suppose there are 5 microservices, let's say M1, M2, M3, M4 and M5. There are 4 databases which are connected to/accessed by 4 of the microservices.
For example, M2 is connected to MySQL, M3 to Cassandra, M4 to MongoDB and M5 to Oracle.
Now
Step 1: M1 makes a call to M2 to update some user data in MySQL; the update succeeds and M1 gets a success response from M2.
Step 2: M1 makes a call to M3 to update some data in Cassandra; the update succeeds and M1 gets a success response from M3.
Step 3: M1 makes a call to M4 to update some data in MongoDB, and it fails due to a DB server problem or some other problem.
My requirement here is that I want to roll back the DB changes that already happened in the previous microservices (M2 and M3).
What do we need to do to achieve this kind of rollback scenario?
This is a typical case of a distributed transaction. Regardless of whether you use a different database technology for each service or the same one on different servers, you are performing an operation that is transactional.
To handle a rollback for that type of transaction you cannot rely on the database's own transaction and rollback mechanisms. You have to do it on your own.
Saga Pattern
A common solution for distributed transaction scenarios in a microservice architecture is the Saga pattern.
Distributed sagas are a pattern for managing failures in scenarios like the one you have described.
Sagas are built around a business process, for example "Buy a Product in an online shop". This process can involve multiple actions on multiple microservices. The saga controls and manages the execution of this process, and if one of the steps fails it triggers actions to revert the actions done before the failing step.
There are multiple ways to implement sagas. It depends on your architecture and on how your microservices communicate with each other: do you use Commands and/or Events?
Example
"Buy a Product in online shop" business process. Lets say this business process has 3 simple steps done by 3 different micro-services:
Action 1 - Reserve Product in products-inventory-micro-service
Action 2 - Validate payment in payment-micro-service
Action 3 - Order a product in orders-micro-service
Using Events:
You can publish events to perform an action (or actions), and if one of the actions fails you can publish a revert (or delete) event for it. For the business process above, let's say Action 1 succeeded and Action 2 failed. To roll back Action 1 you would publish an event like "RemoveReservationFromProduct" in order to remove the reservation and revert the state back to what it was before the transaction for that business process started. This event would be picked up by an event handler, which would go and revert that state in your database. Since it is an event, you can implement a retry mechanism for failures, or simply reapply it later if there is a bug in the code.
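A minimal sketch of such a compensating event with Spring AMQP, where the queue name, payload and repository are illustrative assumptions:

```java
import org.springframework.amqp.rabbit.annotation.RabbitListener;
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.stereotype.Component;

// Sketch only: when Action 2 (payment) fails, a "RemoveReservationFromProduct"
// style event is published, and the inventory side reverts the reservation.
// Queue names, payloads and the repository are illustrative assumptions.
@Component
public class ReservationCompensation {

    private final RabbitTemplate rabbitTemplate;
    private final ProductInventoryRepository inventory;

    public ReservationCompensation(RabbitTemplate rabbitTemplate,
                                   ProductInventoryRepository inventory) {
        this.rabbitTemplate = rabbitTemplate;
        this.inventory = inventory;
    }

    // Saga side: publish the compensating event once the payment step has failed.
    public void onPaymentFailed(String productId) {
        rabbitTemplate.convertAndSend("inventory.remove-reservation", productId);
    }

    // Inventory side: the event handler that reverts the state in the database.
    @RabbitListener(queues = "inventory.remove-reservation")
    public void handleRemoveReservation(String productId) {
        inventory.releaseReservation(productId); // written to be idempotent, so it can be retried
    }

    public interface ProductInventoryRepository {
        void releaseReservation(String productId);
    }
}
```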
Using Commands:
If you call your microservices directly, using commands over some kind of REST API, you can call delete or update endpoints to revert the changes you have made. For the business process above, let's say Action 1 succeeded and Action 2 failed. To roll back Action 1 you would call the delete API to remove the reservation for that particular product, reverting the state back to what it was before the transaction for that business process started.
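A minimal sketch of the command-based compensation, assuming hypothetical REST endpoints on the inventory and payment services:

```java
import org.springframework.web.client.RestClientException;
import org.springframework.web.client.RestTemplate;

// Sketch only: compensating Action 1 over REST after Action 2 has failed.
// The endpoint URLs and service names are illustrative assumptions.
public class OrderSagaCommands {

    private final RestTemplate restTemplate = new RestTemplate();

    public void buyProduct(String productId, String paymentId) {
        // Action 1: reserve the product.
        restTemplate.postForObject(
                "http://products-inventory-service/reservations", productId, Void.class);
        try {
            // Action 2: validate the payment.
            restTemplate.postForObject(
                    "http://payment-service/payments/" + paymentId + "/validate", null, Void.class);
        } catch (RestClientException paymentFailure) {
            // Compensate Action 1: delete the reservation that was just created.
            restTemplate.delete("http://products-inventory-service/reservations/" + productId);
            throw paymentFailure;
        }
    }
}
```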
You can take a look at this example of how to implement the Saga pattern.
From what I understand, a Saga is what you are looking for.
The idea is to provide, for every state-altering operation, an undo operation that has to be called if things go bad downstream.
You can make sure that you have @Transactional enabled for this entire sequence of invocations.
Consider the invocation of all microservices from M1 as a single transaction.
Expose a rollback in the following way:
While updating the DB in M2, M3 and M4, place the values in the Spring cache as well as in the DB.
Upon invoking /rollback in M2, M3 or M4, get the values from the Spring cache and undo them in the DB.
In the fallbackMethod of the Hystrix command, when M1 replies with an error or some default output, invoke /rollback on the other services.
This may not be a perfect solution, as it introduces another point of failure (the /rollback handling), but it is the fastest one that can be implemented.
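A rough sketch of that fallback wiring with the javanica @HystrixCommand annotation, where the service URLs and /rollback endpoints are assumptions based on the answer above:

```java
import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

// Sketch only: M1 wraps the multi-service update in a Hystrix command and
// invokes the (hypothetical) /rollback endpoints from the fallback.
// Service URLs and endpoint paths are illustrative assumptions.
@Service
public class DistributedUpdateService {

    private final RestTemplate restTemplate = new RestTemplate();

    @HystrixCommand(fallbackMethod = "rollbackAll")
    public void updateAll(String userId) {
        restTemplate.postForObject("http://m2/users/" + userId, null, Void.class);
        restTemplate.postForObject("http://m3/users/" + userId, null, Void.class);
        restTemplate.postForObject("http://m4/users/" + userId, null, Void.class); // may fail
    }

    // Called by Hystrix when updateAll fails; undoes whatever M2/M3/M4 cached.
    public void rollbackAll(String userId) {
        restTemplate.postForObject("http://m2/rollback/" + userId, null, Void.class);
        restTemplate.postForObject("http://m3/rollback/" + userId, null, Void.class);
        restTemplate.postForObject("http://m4/rollback/" + userId, null, Void.class);
    }
}
```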
To answer your question, let's add some business requirements.
Case 1: M1 does all the interaction with the other microservices based on a received event, like Order Placed.
Now, in this case, for the M2 ... M5 updates:
Requirement 1: all of them are independent of each other.
First create 5 events from the one event, and then
add each event to a table, mark it as unprocessed, and have a timer read the unprocessed events and try to do all the tasks in an idempotent way. You could also add reporting for tasks that keep failing, so your team can check them and resolve them manually.
(You could implement similar logic with a failover queue, which sends the same event back to the original queue after some time.)
Requirement 2: they are not all independent.
Use a single event and still the same solution.
The main benefit of the above solution is that even if your system restarts in the middle of the transactions, you will always eventually have a consistent system.
Case 2: the M1 API is invoked, and M1 needs to do all the tasks across multiple microservices and then give a response to the user.
Create a "started" event in the M1 microservice's DB (sync_event_table),
try to do the update in all microservices,
and after all of them complete, update the sync event table with "completed".
For the cases that are not completed, run a timer which checks for jobs that have not completed for more than X minutes and then performs the undo actions (or whatever is required); a sketch of such a timer follows below.
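A rough sketch of the Case 2 timer, assuming a hypothetical repository over sync_event_table and an undo handler (all names are placeholders):

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.List;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Sketch only: a timer that finds sync events stuck in STARTED for more than
// X minutes and triggers the undo actions. Repository and handler are assumptions.
@Component
public class StaleSyncEventChecker {

    private static final long MAX_AGE_MINUTES = 10;

    private final SyncEventRepository syncEvents;   // reads/writes sync_event_table
    private final UndoHandler undoHandler;          // knows how to revert each microservice

    public StaleSyncEventChecker(SyncEventRepository syncEvents, UndoHandler undoHandler) {
        this.syncEvents = syncEvents;
        this.undoHandler = undoHandler;
    }

    @Scheduled(fixedDelay = 60_000) // runs every minute
    public void undoStaleEvents() {
        Instant cutoff = Instant.now().minus(MAX_AGE_MINUTES, ChronoUnit.MINUTES);
        List<SyncEvent> stale = syncEvents.findByStatusAndStartedBefore("STARTED", cutoff);
        for (SyncEvent event : stale) {
            undoHandler.undo(event);                      // revert whatever was done so far
            syncEvents.updateStatus(event.id(), "ROLLED_BACK");
        }
    }

    public record SyncEvent(long id, String status, Instant startedAt) {}

    public interface SyncEventRepository {
        List<SyncEvent> findByStatusAndStartedBefore(String status, Instant cutoff);
        void updateStatus(long id, String status);
    }

    public interface UndoHandler {
        void undo(SyncEvent event);
    }
}
```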
Essence:
So, as you can see, all the solutions suggest turning the updates across the different microservices into:
creating a job
checking the job status
writing an undo/redo feature for the job
I am new to microservice architecture. Currently I am using Spring Boot for my microservices; in case one of the microservices is down, how should the failover mechanism work?
For example, if we have 3 microservices M1, M2, M3, where M1 interacts with M2 and M2 interacts with M3: in case the M2 microservice cluster is down, how should we handle this situation?
When any one of the microservices is down, the interaction between services becomes very critical, as isolation of failure, resilience and fault tolerance are some of the key characteristics of any microservice-based architecture.
I totally agree with what #jayant answered; in your case, implementing a proper fallback mechanism makes the most sense, and you can implement whatever logic you need based on the use case and the dependencies between M1, M2 and M3.
You can also raise events in your fallback if needed.
Since you are new to microservices, you should know the common techniques and architecture patterns below for resilience and fault tolerance against the situation you raised in your question. And since you are using Spring Boot, you can easily add Netflix OSS to your microservices.
Netflix has released Hystrix, a library designed to control the points of access to remote systems, services and 3rd-party libraries, providing greater tolerance of latency and failure.
It includes the important characteristics below:
Importance of Circuit Breaker and Fallback Mechanism:
Hystrix implements the circuit breaker pattern, which is useful when a service failure can cause cascading failures all the way up to the user. When calls to a particular service exceed circuitBreaker.requestVolumeThreshold (default: 20 requests) and the failure percentage is greater than circuitBreaker.errorThresholdPercentage (default: >50%) in a rolling window defined by metrics.rollingStats.timeInMilliseconds (default: 10 seconds), the circuit opens and further calls are not made.
In cases of error and an open circuit, a fallback can be provided by the developer. Fallbacks may be chained so that the first fallback makes some other business call. Check out the Fallback Implementation of Hystrix.
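For illustration, a minimal sketch of guarding the M1 -> M2 call with those circuit breaker properties and a fallback, using the javanica @HystrixCommand annotation (the URL and values are assumptions):

```java
import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import com.netflix.hystrix.contrib.javanica.annotation.HystrixProperty;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

// Sketch only: a call from M1 to M2 guarded by a circuit breaker with the
// thresholds mentioned above, plus a fallback. The URL is an assumption.
@Service
public class M2Client {

    private final RestTemplate restTemplate = new RestTemplate();

    @HystrixCommand(
        fallbackMethod = "getUserFallback",
        commandProperties = {
            @HystrixProperty(name = "circuitBreaker.requestVolumeThreshold", value = "20"),
            @HystrixProperty(name = "circuitBreaker.errorThresholdPercentage", value = "50"),
            @HystrixProperty(name = "metrics.rollingStats.timeInMilliseconds", value = "10000")
        })
    public String getUser(String id) {
        return restTemplate.getForObject("http://m2/users/" + id, String.class);
    }

    // Returned when M2 is down or the circuit is open.
    public String getUserFallback(String id) {
        return "default-user";
    }
}
```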
Retry:
When a request fails, you may want the request to be retried automatically. Ribbon does this job for us.
In a distributed microservices system, a retry can trigger multiple other requests or retries and start a cascading effect.
Here are some Ribbon properties to look at:
Max number of retries on the same server (excluding the first try):
sample-client.ribbon.MaxAutoRetries=1
Max number of next servers to retry (excluding the first server):
sample-client.ribbon.MaxAutoRetriesNextServer=1
Whether all operations can be retried for this client:
sample-client.ribbon.OkToRetryOnAllOperations=true
Interval to refresh the server list from the source:
sample-client.ribbon.ServerListRefreshInterval=2000
More details: ribbon properties
Bulkhead Pattern:
In general, the goal of the bulkhead pattern is to prevent faults in one part of a system from taking the entire system down. (bulkhead pattern)
The bulkhead implementation in Hystrix limits the number of concurrent calls to a component. This way, the number of resources (typically threads) waiting for a reply from the component is limited.
Assume you have a request-based, multi-threaded application (for example a typical web application) that uses three different components, M1, M2, and M3. If requests to component M3 start to hang, eventually all request-handling threads will hang waiting for an answer from M3. This would make the application entirely non-responsive. If requests to M3 are handled slowly, we have a similar problem if the load is high enough.
Implementation details can be found here
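As an illustration of the bulkhead idea, a sketch of isolating calls to M3 in a dedicated, bounded Hystrix thread pool (the pool size and URL are assumptions):

```java
import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import com.netflix.hystrix.contrib.javanica.annotation.HystrixProperty;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

// Sketch only: isolating calls to M3 in their own small thread pool so that a
// hanging M3 cannot exhaust the caller's request-handling threads.
// The pool size and URL are illustrative assumptions.
@Service
public class M3Client {

    private final RestTemplate restTemplate = new RestTemplate();

    @HystrixCommand(
        fallbackMethod = "statusFallback",
        threadPoolKey = "m3Pool",
        threadPoolProperties = {
            @HystrixProperty(name = "coreSize", value = "10"),   // at most 10 concurrent calls to M3
            @HystrixProperty(name = "maxQueueSize", value = "5") // small queue; excess requests fail fast
        })
    public String getStatus() {
        return restTemplate.getForObject("http://m3/status", String.class);
    }

    public String statusFallback() {
        return "M3 unavailable";
    }
}
```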
So, these are some factors you need to consider when handling microservice interaction while one of the microservices is down.
As mentioned in the comment, there are many ways you can go about it:
Case 1: all services are independent. This is the trivial case; no need to do anything special. Call all the services in a blocking or non-blocking way; either way, calling service 2 will result in a timeout.
Case 2: the services are dependent, M2 depends on M1 and M3 depends on M2.
Option a) M1 can wait for service M2 to come back up, doing periodic pings or checking with the registry or naming server whether M2 is up or not.
Option b) use Hystrix as a circuit breaker implementation and handle the fallback gracefully in M3 or in your orchestrator (the one calling these services, i.e. M1, M2, M3, in order).
Let's say that we have microservice A (MS A) and Microservice B (MS B).
MS B has data about products. MS A needs the product names from MS B.
Each time a product is added, updated or deleted, MS B puts a message on a message queue.
MS A is subscribed to that queue, so it can update its own internal state.
Now my question:
How do we fill the internal state of MS A when we deploy it to production the first time?
I couldn't find any documentation about the pros and cons of the possible solutions.
I could think of:
Export/import on the database level.
Pros: not much work.
Cons: can miss data if changes are made to the product data during the export/import.
Implement calls for GetData and GetDataChangedSince
Pros: failsafe
Cons: a lot of work
Are there any other options? Are there any other pros/cons?
You could use the following workflow:
prepare microservice B: stop it from pushing events to the queue (if it is already pushing to the queue); instead, it pushes them to a circular buffer (a buffer that is rewritten when full) and waits for a signal
deploy microservice A to the production servers, but don't reference it from anywhere yet; it just runs, waiting for events in the queue
run a script that gets all the product names from microservice B and pushes them into the queue as simulated events; when it has finished with the product names it signals microservice B (optionally telling it the date, sequence number, or whatever de-duplication technique you use to detect duplicate events); a sketch of such a script follows after this list
microservice B then copies the events from the buffer that are newer than the last one pushed for microservice A (or it finds out itself from the queue which was the last one) into the queue, and then ignores the buffer and continues to work as normal.
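A minimal sketch of such a backfill script, assuming Spring AMQP and hypothetical queue names, event formats, and a client for MS B's API:

```java
import java.util.List;
import org.springframework.amqp.rabbit.core.RabbitTemplate;

// Sketch only: a one-off backfill script that reads all product names from
// microservice B and pushes them to the queue as simulated "ProductUpdated"
// events, so the freshly deployed MS A can build its internal state.
// The client, queue names, and event format are illustrative assumptions.
public class ProductBackfill {

    private final ProductServiceClient productServiceB; // hypothetical client for MS B's API
    private final RabbitTemplate rabbitTemplate;

    public ProductBackfill(ProductServiceClient productServiceB, RabbitTemplate rabbitTemplate) {
        this.productServiceB = productServiceB;
        this.rabbitTemplate = rabbitTemplate;
    }

    public void run() {
        List<String> productNames = productServiceB.getAllProductNames();
        long sequence = 0;
        for (String name : productNames) {
            // Same shape as the regular "product added/updated" event MS A already handles.
            rabbitTemplate.convertAndSend("product-events", "ProductUpdated:" + (++sequence) + ":" + name);
        }
        // Tell MS B the backfill is done, including the last sequence number for de-duplication.
        rabbitTemplate.convertAndSend("product-backfill-control", "BACKFILL_DONE:" + sequence);
    }

    public interface ProductServiceClient {
        List<String> getAllProductNames();
    }
}
```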
It sounds like there is a service/API call missing from your architecture. Moving a service into production should be no different from recovering from a failure and should not require any additional steps. Perhaps the messages should be consumed from the queue by another service that can then be queried for the complete list of products.