Implement Spring batch circuit breaker - spring-boot

I am building a Spring Batch job, and in the ItemProcessor step I call an external endpoint and save the returned values to the DB. The external endpoint is at times very slow and takes more than 60 seconds to respond, which makes my transaction time out. As a workaround I configured a RestTemplate timeout (15 s), but the transaction still times out even with that timeout in place. How can I implement circuit breaker techniques here? Are there any out-of-the-box solutions for this in Spring Batch?

how to implement circuit breaker techniques here
You can annotate ItemProcessor#process with @CircuitBreaker from the spring-retry library (see attributes like maxAttempts, resetTimeout, etc.) and add a recovery method annotated with @Recover. Retry support must also be enabled, for example with @EnableRetry on a configuration class.
Michael Minella gives a complete sample of this very scenario in his talk: Cloud Native Batch Processing. And you can find the code example here.
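The following minimal plain-Java sketch illustrates what @CircuitBreaker combined with @Recover does around a slow call. It is not spring-retry's implementation: the class SimpleCircuitBreaker, its counting of consecutive failures, and the injectable clock are hypothetical simplifications. maxAttempts and resetTimeoutMillis only loosely mirror the annotation attributes mentioned above.

```java
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical, simplified circuit breaker illustrating the idea behind
// spring-retry's @CircuitBreaker + @Recover; it is NOT that library's code.
final class SimpleCircuitBreaker<I, O> {
    private final int maxAttempts;          // consecutive failures that open the circuit
    private final long resetTimeoutMillis;  // how long the circuit stays open
    private final LongSupplier clock;       // injectable time source in millis
    private int failures = 0;
    private long openedAt = -1;             // -1 means the circuit is closed

    SimpleCircuitBreaker(int maxAttempts, long resetTimeoutMillis, LongSupplier clock) {
        this.maxAttempts = maxAttempts;
        this.resetTimeoutMillis = resetTimeoutMillis;
        this.clock = clock;
    }

    // Calls `action`, or `recover` when the call fails or the circuit is open.
    // synchronized for simplicity, which serializes calls through the breaker.
    synchronized O call(I input, Function<I, O> action, Function<I, O> recover) {
        if (openedAt >= 0) {
            if (clock.getAsLong() - openedAt < resetTimeoutMillis) {
                return recover.apply(input); // open: skip the slow endpoint entirely
            }
            // Reset timeout elapsed: half-open, let one trial call through;
            // a single further failure re-opens the circuit.
            openedAt = -1;
            failures = maxAttempts - 1;
        }
        try {
            O result = action.apply(input);
            failures = 0;
            return result;
        } catch (RuntimeException e) {
            failures++;
            if (failures >= maxAttempts) {
                openedAt = clock.getAsLong();
            }
            return recover.apply(input);
        }
    }

    // True while the circuit is recorded as open (re-checked lazily on the next call).
    synchronized boolean isOpen() {
        return openedAt >= 0;
    }
}
```

Inside ItemProcessor#process, the endpoint call would be routed through such a breaker. Once the endpoint is known to be failing, items go straight to the recovery path instead of each one waiting for the 15 s RestTemplate timeout.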

Related

Rate limiting on top of WebFlux retry

I want to limit the number of retries from WebFlux. The use case is that if the service being invoked goes down, I end up retrying on every read timeout, which in turn doubles the load.
I figured out a way to write custom methods that check whether a retry is feasible, but that feels more like a hack. Is there a cleaner approach for this use case?
Based on the question tags, you have already figured out what you need: a circuit breaker.
Resilience4j circuit breaker has support for Project Reactor: https://resilience4j.readme.io/docs/examples-1#decorate-flowable-with-a-circuitbreaker
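The effect a circuit breaker has on retries can be sketched in plain Java. This is not the Resilience4j or Reactor API; BreakerGuardedRetry and its parameters are hypothetical. Failures are counted across all callers. Once the threshold is reached, calls fail fast without retrying until a cool-down has passed, so a downed service no longer receives the extra retry traffic.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical helper: retries a remote call, but only while a shared
// circuit breaker is closed. Not the Resilience4j or Reactor API.
final class BreakerGuardedRetry {
    private final int maxRetries;        // retries after the first attempt
    private final int failureThreshold;  // consecutive failures that open the circuit
    private final long coolDownMillis;   // how long the circuit stays open
    private final LongSupplier clock;    // injectable time source (millis)
    // Shared by every caller, so one outage stops retries for all requests.
    private final AtomicInteger consecutiveFailures = new AtomicInteger();
    private final AtomicLong lastFailureAt = new AtomicLong();

    BreakerGuardedRetry(int maxRetries, int failureThreshold, long coolDownMillis, LongSupplier clock) {
        this.maxRetries = maxRetries;
        this.failureThreshold = failureThreshold;
        this.coolDownMillis = coolDownMillis;
        this.clock = clock;
    }

    // Open while the threshold is reached and the cool-down has not elapsed;
    // afterwards one trial call gets through, and a failure re-opens the circuit.
    boolean isOpen() {
        return consecutiveFailures.get() >= failureThreshold
                && clock.getAsLong() - lastFailureAt.get() < coolDownMillis;
    }

    <T> T call(Supplier<T> remote) {
        RuntimeException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (isOpen()) {
                // Fail fast: no request (and no retry) reaches the struggling service.
                throw new IllegalStateException("circuit open, remote call skipped", last);
            }
            try {
                T result = remote.get();
                consecutiveFailures.set(0);
                return result;
            } catch (RuntimeException e) {
                lastFailureAt.set(clock.getAsLong());
                consecutiveFailures.incrementAndGet();
                last = e;
            }
        }
        throw last; // every attempt failed: surface the last error
    }
}
```

The important design choice is that the failure count is shared rather than kept per request. A per-request retry limit on its own would still retry every request during an outage.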

Schedule simple GET batch for each second or even less than one second - Should opt for Spring Cloud Task, Spring Batch or springframework.scheduling

Context: in my country, a new instant payment system is planned for November. Basically, the Central Bank will provide two endpoints: (1) a POST endpoint to which we post a single money transfer, and (2) a GET endpoint from which we get the result of a previously sent money transfer. Results can come back completely out of order. Each GET returns only one money transfer result, and its response header indicates whether there is another result we must GET. It never says how many results are available: if there is a result, it is returned in the GET response, and the header only says whether it is the last one or whether more remain for the next GET.
Main constraint: at most 10 seconds may elapse from the moment the end user taps the Transfer button in the mobile app until the final result (success or failure) appears on the screen.
Strategy: I want a scheduler that triggers a GET to the Central Bank every second, or even more often. The scheduler will basically invoke a simple function which:
Calls the GET endpoint;
Pushes the result to Kafka or persists it in a database; and
If the response headers indicate that more results are available, starts the same function again.
Issue: since we are Spring users, I thought my decision was between Spring Batch and org.springframework.scheduling.annotation.SchedulingConfigurer/TaskScheduler. I have used Spring Batch successfully for a while, but never with such a short trigger period (never with a 1-second period). I came across a discussion that made me wonder whether, for a very simple task with a very short period, I should consider Spring Cloud Data Flow or Spring Cloud Task instead of Spring Batch.
According to this answer, "... Spring Batch is ... designed for the building of complex compute problems ... You can orchestrate Spring Batch jobs with Spring Scheduler if you want". Based on that, it seems I shouldn't use Spring Batch, because my case isn't complex. The challenging design decision is about a short-period trigger and about triggering another run from the current one, rather than about transformation, calculation or an ETL process. Nevertheless, as far as I can see, Spring Batch with its tasklets is well designed for restarting, resuming and retrying, and is a good fit for a scenario that never finishes, while org.springframework.scheduling seems to be only a way to trigger an event based on a period configuration. That is my feeling, based on personal use and study.
According to an answer to someone asking about orchestration of composed tasks, "... you can achieve your design goals using Spring Cloud Data Flow along with the Spring Cloud Task/Spring Batch...". In my case, I don't see composed tasks: the second trigger doesn't depend on the result of the previous one. It sounds more like "chained" tasks than "composed" ones. I have never used Spring Cloud Data Flow, but it seems a good candidate for managing and monitoring (console, dashboards) the triggered tasks. Nevertheless, I couldn't find any documented limitations or rules of thumb for short-period triggers and "chained" triggers.
So my direct question is: which Spring project is currently recommended for such a short-period trigger? Assuming Spring Cloud Data Flow is used for management and dashboards, which Spring component is recommended for triggering in such short-period scenarios? It seems Spring Cloud Task is designed for calling complex functions, Spring Batch seems to add more than I need, and org.springframework.scheduling.* lacks integration with Spring Cloud Data Flow. As an analogy, not a comparison: in AWS, the documentation clearly says "don't use CloudWatch for less than one minute. If you want less than one minute, start CloudWatch for each minute that start another scheduler/cron each second". There might be a well-known rule of thumb for a simple task that needs to be triggered every second or even more often, taking advantage of the Spring family's approach and experience.
This may be a naive answer, but why do you need a scheduler here? Wouldn't a never-ending job achieve the goal?
You start a job. It does a GET request and pushes the result to Kafka.
If the GET response indicates there are more results, it immediately does another GET and pushes that result to Kafka.
If the GET response indicates there are no more results, it sleeps for 1 second and then does the GET request again.
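The never-ending job described above might look like this in plain Java. PollResponse, ResultEndpoint and ContinuousPoller are hypothetical names. The endpoint stands in for the Central Bank GET call and the sink for a Kafka producer or a database write. The keepRunning condition is an addition that lets the loop be stopped on shutdown.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

// Hypothetical response shape: at most one transfer result (null when none is
// pending) plus the "more results available" flag taken from the response header.
final class PollResponse {
    final String result;
    final boolean hasMore;

    PollResponse(String result, boolean hasMore) {
        this.result = result;
        this.hasMore = hasMore;
    }
}

// Hypothetical abstraction over the Central Bank GET endpoint.
interface ResultEndpoint {
    PollResponse get();
}

// Never-ending poller: drain results while the header says more are waiting,
// otherwise sleep for the idle interval before the next GET.
final class ContinuousPoller {
    private final ResultEndpoint endpoint;
    private final Consumer<String> sink;   // e.g. a Kafka send or a DB insert (assumption)
    private final long idleSleepMillis;    // 1000 in the answer above

    ContinuousPoller(ResultEndpoint endpoint, Consumer<String> sink, long idleSleepMillis) {
        this.endpoint = endpoint;
        this.sink = sink;
        this.idleSleepMillis = idleSleepMillis;
    }

    // keepRunning lets the application stop the otherwise endless loop.
    void run(BooleanSupplier keepRunning) {
        try {
            while (keepRunning.getAsBoolean()) {
                PollResponse response = endpoint.get();
                if (response.result != null) {
                    sink.accept(response.result);
                }
                if (!response.hasMore) {
                    Thread.sleep(idleSleepMillis); // nothing pending: wait, then GET again
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag and stop polling
        }
    }
}
```

With a 1-second idle sleep, a result that arrives while the job is sleeping waits at most about one second plus the GET round-trip before it is published. That keeps it well within the 10-second budget from the question.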

What is the difference between a circuit breaker and a bulkhead pattern?

Can we use both together in Spring Boot when developing a microservice?
These are fundamentally different patterns.
A circuit breaker pattern is implemented on the caller, to avoid overwhelming a service which may be struggling to handle calls. A sample implementation in Spring can be found here.
A bulkhead pattern is implemented on the service, to prevent a failure during the handling of a single incoming call impacting the handling of other incoming calls. A sample implementation in Spring can be found here.
The only thing these patterns have in common is that they are both designed to increase the resilience of a distributed system.
While you can certainly use them together in the same service, you must understand that they are not related to each other, as one is concerned with making calls and the other is concerned with handling calls.
Yes, they can be used together, but it's not always necessary.
As @tom redfern said, a circuit breaker is implemented on the caller side. So, if you are sending requests to another service, you should wrap those requests in a circuit breaker specific to that service. Keep in mind that every third-party system or service should have its own circuit breaker. Otherwise, the unavailability of one system will affect the requests you send to the others by opening the shared circuit breaker.
More information about circuit breakers can be found here: https://learn.microsoft.com/en-us/azure/architecture/patterns/circuit-breaker
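The advice to keep one circuit breaker per downstream service can be sketched with a small registry. ServiceBreaker and BreakerRegistry are hypothetical plain-Java classes, not part of Spring or any resilience library. The breaker opens after a number of consecutive failures and lets a trial call through once a cool-down has passed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical minimal breaker: opens after `threshold` consecutive failures
// and lets a trial call through once `coolDownMillis` has passed.
final class ServiceBreaker {
    private final int threshold;
    private final long coolDownMillis;
    private final LongSupplier clock;
    private int consecutiveFailures;
    private long lastFailureAt;

    ServiceBreaker(int threshold, long coolDownMillis, LongSupplier clock) {
        this.threshold = threshold;
        this.coolDownMillis = coolDownMillis;
        this.clock = clock;
    }

    synchronized boolean isOpen() {
        return consecutiveFailures >= threshold
                && clock.getAsLong() - lastFailureAt < coolDownMillis;
    }

    synchronized void onSuccess() { consecutiveFailures = 0; }

    synchronized void onFailure() {
        consecutiveFailures++;
        lastFailureAt = clock.getAsLong();
    }
}

// One breaker per downstream service: an outage in one dependency opens only
// that dependency's circuit and never blocks calls to the others.
final class BreakerRegistry {
    private final Map<String, ServiceBreaker> breakers = new ConcurrentHashMap<>();
    private final int threshold;
    private final long coolDownMillis;
    private final LongSupplier clock;

    BreakerRegistry(int threshold, long coolDownMillis, LongSupplier clock) {
        this.threshold = threshold;
        this.coolDownMillis = coolDownMillis;
        this.clock = clock;
    }

    ServiceBreaker forService(String service) {
        return breakers.computeIfAbsent(service, s -> new ServiceBreaker(threshold, coolDownMillis, clock));
    }

    <T> T call(String service, Supplier<T> remote, Supplier<T> fallback) {
        ServiceBreaker breaker = forService(service);
        if (breaker.isOpen()) {
            return fallback.get(); // fail fast for this service only
        }
        try {
            T result = remote.get();
            breaker.onSuccess();
            return result;
        } catch (RuntimeException e) {
            breaker.onFailure();
            return fallback.get();
        }
    }
}
```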
Also, @tom redfern is right again about bulkheading: this pattern is implemented in the service that is called. So, if you react to external requests by spawning multiple other requests or workloads, you should avoid running all of those workloads in a single unit (thread). Instead, separate the workloads into pieces (thread pools), one for each kind of request you spawn.
More information about bulkheading can be found here: https://learn.microsoft.com/en-us/azure/architecture/patterns/bulkhead
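Separating workloads into thread pools can be sketched with one bounded ThreadPoolExecutor per kind of workload. ThreadPoolBulkhead is a hypothetical class. The pool and queue sizes are illustrative assumptions, not recommendations.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical thread-pool bulkhead: each kind of downstream workload gets its own
// small, bounded pool, so a hanging workload can only exhaust its own threads.
final class ThreadPoolBulkhead implements AutoCloseable {
    private final ThreadPoolExecutor pool;

    ThreadPoolBulkhead(int threads, int queueCapacity) {
        this.pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                runnable -> {
                    Thread t = new Thread(runnable);
                    t.setDaemon(true); // do not keep the JVM alive just for this pool
                    return t;
                });
        // The default AbortPolicy throws RejectedExecutionException once all threads
        // are busy and the bounded queue is full: the bulkhead rejects extra work.
    }

    <T> CompletableFuture<T> submit(Supplier<T> work) {
        return CompletableFuture.supplyAsync(work, pool);
    }

    @Override
    public void close() {
        pool.shutdownNow();
    }
}
```

With one instance per downstream workload, a slow dependency fills up and rejects work only in its own bulkhead, while the other workloads keep their threads.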
Your question was whether it's possible to use both patterns in the same microservice. The answer is: yes, you can, and very often the situation calls for it.

How to handle microservice Interaction when one of the microservice is down

I am new to microservice architecture. I am currently using Spring Boot for my microservices. If one of the microservices is down, how should the failover mechanism work?
For example, say we have 3 microservices, M1, M2 and M3. M1 interacts with M2, and M2 interacts with M3. If the M2 microservice cluster is down, how should we handle this situation?
When any one of the microservices is down, the interaction between services becomes very critical, as failure isolation, resilience and fault tolerance are some of the key characteristics of any microservice-based architecture.
I totally agree with what @jayant answered: in your case, implementing a proper fallback mechanism makes the most sense, and you can implement the required logic based on your use case and the dependencies between M1, M2 and M3.
You can also raise events in your fallback if needed.
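A fallback mechanism for M1 calling M2 can be as simple as trying a sequence of options in order: the real call, then a secondary source, then a safe default. FallbackChain is a hypothetical plain-Java helper. The sketch assumes the last option in the list is one that cannot fail, such as a cached or empty response.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical fallback chain: try each option in order and return the first
// result that does not throw. The last option should be a safe default.
final class FallbackChain {
    static <T> T firstSuccessful(List<Supplier<T>> options) {
        RuntimeException last = null;
        for (Supplier<T> option : options) {
            try {
                return option.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and fall through to the next option
            }
        }
        throw new IllegalStateException("all fallbacks failed", last);
    }
}
```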
Since you are new to microservices, you need to know the following common techniques and architecture patterns for resilience and fault tolerance in the situation you describe in your question. And since you are using Spring Boot, you can easily add Netflix OSS to your microservices.
Netflix has released Hystrix, a library designed to control points of access to remote systems, services and 3rd party libraries, providing greater tolerance of latency and failure.
It includes the following important characteristics:
Importance of Circuit breaker and Fallback Mechanism:
Hystrix implements the circuit breaker pattern, which is useful when a service failure can cause cascading failure all the way up to the user.
When calls to a particular service exceed circuitBreaker.requestVolumeThreshold (default: 20 requests) and the failure percentage is greater than circuitBreaker.errorThresholdPercentage (default: >50%) in a rolling window defined by metrics.rollingStats.timeInMilliseconds (default: 10 seconds), the circuit opens and further calls are not made.
In cases of error and an open circuit, a fallback can be provided by the developer. Fallbacks may be chained so that the first fallback makes some other business call. Check out the Fallback implementation of Hystrix.
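To make the trip rule quoted above concrete, here is a hypothetical plain-Java sketch of a rolling-window check. Hystrix itself uses bucketed rolling counters rather than a list of individual outcomes. The boundary handling here (at least requestVolumeThreshold calls, and an error percentage at or above the threshold) is an assumption; check the Hystrix configuration docs for the exact semantics.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical rolling-window trip rule modelled on the three Hystrix
// properties quoted above; not Hystrix's own implementation.
final class RollingTripRule {
    private final int requestVolumeThreshold;   // e.g. 20
    private final int errorThresholdPercentage; // e.g. 50
    private final long windowMillis;            // e.g. 10_000
    private final Deque<long[]> outcomes = new ArrayDeque<>(); // {timestamp, 1 if failed else 0}

    RollingTripRule(int requestVolumeThreshold, int errorThresholdPercentage, long windowMillis) {
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.errorThresholdPercentage = errorThresholdPercentage;
        this.windowMillis = windowMillis;
    }

    void record(long nowMillis, boolean failed) {
        outcomes.addLast(new long[] {nowMillis, failed ? 1 : 0});
    }

    boolean shouldOpen(long nowMillis) {
        // Drop outcomes that have fallen out of the rolling window.
        while (!outcomes.isEmpty() && nowMillis - outcomes.peekFirst()[0] >= windowMillis) {
            outcomes.removeFirst();
        }
        int total = outcomes.size();
        if (total < requestVolumeThreshold) {
            return false; // too few calls in the window to judge
        }
        int failures = 0;
        for (long[] outcome : outcomes) {
            failures += (int) outcome[1];
        }
        int errorPercentage = (int) ((double) failures / total * 100);
        return errorPercentage >= errorThresholdPercentage;
    }
}
```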
Retry:
When a request fails, you may want the request to be retried automatically. Ribbon does this job for us.
In a distributed system, a retry in one microservice can trigger multiple other requests or retries and start a cascading effect.
Here are some Ribbon properties to look at:
Max number of retries on the same server (excluding the first try)
sample-client.ribbon.MaxAutoRetries=1
Max number of next servers to retry (excluding the first server)
sample-client.ribbon.MaxAutoRetriesNextServer=1
Whether all operations can be retried for this client
sample-client.ribbon.OkToRetryOnAllOperations=true
Interval to refresh the server list from the source
sample-client.ribbon.ServerListRefreshInterval=2000
More details: Ribbon properties
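Following the property descriptions above, the hypothetical loop below retries the same server MaxAutoRetries times and then moves on to at most MaxAutoRetriesNextServer other servers, retrying each of those the same way. With both set to 1, that is up to (1 + 1) × (1 + 1) = 4 attempts. This is a plain-Java illustration of those semantics, not Ribbon's code. It also shows why retries multiply the load on a struggling system.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of the two retry properties: same-server retries first,
// then the next server(s). Not Ribbon's implementation.
final class ServerRetry {
    static <T> T call(List<String> servers, int maxAutoRetries, int maxAutoRetriesNextServer,
                      Function<String, T> request) {
        RuntimeException last = null;
        int serversToTry = Math.min(servers.size(), 1 + maxAutoRetriesNextServer);
        for (int s = 0; s < serversToTry; s++) {
            // first try plus maxAutoRetries retries on this server
            for (int attempt = 0; attempt <= maxAutoRetries; attempt++) {
                try {
                    return request.apply(servers.get(s));
                } catch (RuntimeException e) {
                    last = e;
                }
            }
        }
        throw last != null ? last : new IllegalArgumentException("no servers to try");
    }
}
```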
Bulkhead Pattern:
In general, the goal of the bulkhead pattern is to prevent faults in one part of a system from taking the entire system down.
The bulkhead implementation in Hystrix limits the number of concurrent calls to a component. This way, the number of resources (typically threads) waiting for a reply from the component is limited.
Assume you have a request-based, multi-threaded application (for example, a typical web application) that uses three different components, M1, M2 and M3. If requests to component M3 start to hang, eventually all request-handling threads will hang waiting for an answer from M3. This would make the application entirely unresponsive. If requests to M3 are handled slowly, we have a similar problem if the load is high enough.
Implementation details can be found here
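Limiting the number of concurrent calls to a component, as described above, can be sketched with a java.util.concurrent.Semaphore. This corresponds to semaphore-style isolation rather than a dedicated thread pool per component. SemaphoreBulkhead is a hypothetical class, not Hystrix's API.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical semaphore bulkhead: caps concurrent calls to one component (e.g. M3).
// When all permits are taken, new calls are rejected immediately instead of
// waiting, so a hanging M3 cannot tie up every request-handling thread.
final class SemaphoreBulkhead {
    private final Semaphore permits;

    SemaphoreBulkhead(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    <T> T execute(Supplier<T> call, Supplier<T> onRejected) {
        if (!permits.tryAcquire()) {
            return onRejected.get(); // bulkhead full: reject without waiting
        }
        try {
            return call.get();
        } finally {
            permits.release();
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```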
So, these are some factors you need to consider when handling microservice interaction while one of the microservices is down.
As mentioned in the comment, there are many ways you can go about it:
Case 1: all services are independent. This is the trivial case and nothing special is needed: call all the services in a blocking or non-blocking way; in either case, the call to service 2 will time out.
Case 2: the services are dependent: M2 depends on M1, and M3 depends on M2.
Option a) M1 can wait for M2 to come back up, doing periodic pings or checking with the registry or naming server whether M2 is up.
Option b) Use Hystrix as the circuit breaker implementation and handle the fallback gracefully in M3 or in your orchestrator (the component that calls these services, i.e. M1, M2, M3, in order).
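Option a) can be sketched as a bounded wait: probe a health check at a fixed interval until it reports the dependency as up, or give up after a number of probes. DependencyWaiter is hypothetical. The isUp check stands in for a ping or a registry/naming-server lookup and is an assumption.

```java
import java.util.function.BooleanSupplier;

// Hypothetical bounded wait for a dependency (e.g. M2) to come back up.
final class DependencyWaiter {
    // Returns true as soon as isUp reports the dependency healthy, or false after
    // maxProbes unsuccessful probes (or if the waiting thread is interrupted).
    static boolean awaitUp(BooleanSupplier isUp, long intervalMillis, int maxProbes) {
        for (int probe = 0; probe < maxProbes; probe++) {
            if (isUp.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(intervalMillis); // wait before the next ping
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

Bounding the number of probes matters: an unbounded wait would make the caller hang for as long as the dependency is down, which is the failure mode the circuit breaker in option b) is meant to avoid.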

Spring batch JMS writer/reader example

Does anybody know of a good resource that takes a detailed look (more so than the Spring Batch docs) at the uses of JmsItemWriter/JmsItemReader in Spring Batch?
Specifically, and because I'm being tasked with trying to reuse an existing system whose only interface is asynchronous over a queue, I'm wondering if the following is possible:
Step 1: read some data and build a message.
Step 2: Drop the message on the queue using JmsItemWriter.
Step 3: Wait for the message to come back using JmsItemReader on the response queue.
Step 4: Do some other stuff
...
Rinse and repeat, a few thousand times a day.
Or, in other words, essentially using Spring Batch to force synchronous interaction with an asynchronous resource. Before I get further into my research, I'd like to make sure that this is A) possible, and B) not a shameless abuse of the framework that will cause major headaches down the road.
Thanks in advance for any info.
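Steps 2 and 3 amount to a request/reply exchange in which each reply is matched to its request by a correlation id. Below is a hypothetical plain-Java sketch of that synchronous facade. It is not Spring Batch's JmsItemWriter/JmsItemReader API. In-memory BlockingQueues stand in for the JMS request and response queues, and a JMS-based version would presumably match on JMSCorrelationID instead.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Hypothetical message carrying a correlation id that ties a reply to its request.
final class CorrelatedMessage {
    final String correlationId;
    final String body;

    CorrelatedMessage(String correlationId, String body) {
        this.correlationId = correlationId;
        this.body = body;
    }
}

// Synchronous facade over an asynchronous request/response queue pair.
final class RequestReplyClient {
    private final BlockingQueue<CorrelatedMessage> requests;
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    RequestReplyClient(BlockingQueue<CorrelatedMessage> requests,
                       BlockingQueue<CorrelatedMessage> replies) {
        this.requests = requests;
        // A single listener on the response queue completes whichever caller is waiting.
        Thread listener = new Thread(() -> {
            try {
                while (true) {
                    CorrelatedMessage reply = replies.take();
                    CompletableFuture<String> waiter = pending.get(reply.correlationId);
                    if (waiter != null) {
                        waiter.complete(reply.body);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        listener.setDaemon(true);
        listener.start();
    }

    // Step 2 + step 3: send the request, then block until the matching reply arrives.
    // On timeout, join() throws a CompletionException wrapping a TimeoutException.
    String sendAndWait(String body, long timeoutMillis) {
        String id = UUID.randomUUID().toString();
        CompletableFuture<String> reply = new CompletableFuture<>();
        pending.put(id, reply);                           // register before sending
        requests.offer(new CorrelatedMessage(id, body));  // assumes an unbounded queue
        try {
            return reply.orTimeout(timeoutMillis, TimeUnit.MILLISECONDS).join();
        } finally {
            pending.remove(id);                           // clean up after success or timeout
        }
    }
}
```

The timeout is the main design point: if the existing system never answers, the batch step fails after a bounded wait instead of blocking indefinitely.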
