Hystrix circuit not closing after downstream service recovers


I'm playing with the bootiful-microservice project by Josh Long (the Brixton subproject).
On the reservation-service I have added a simple status method that can sleep a configurable amount of time to simulate load:
@RequestMapping(method = RequestMethod.GET, value = "/status")
public String status() {
    System.out.println("Checking status");
    try {
        Thread.sleep((long) (rand.nextDouble() * sleepTime));
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
    return "All is good";
}
The sleepTime variable is pulled from the Spring Config Server
On the reservation-client I have added an entry point in the gateway:
@FeignClient("reservation-service")
interface ReservationReader {
    @RequestMapping(method = RequestMethod.GET, value = "/reservations")
    Resources<Reservation> readReservations();

    @RequestMapping(method = RequestMethod.GET, value = "/status")
    String status();
}
and I'm using a HystrixCommand:
@HystrixCommand(fallbackMethod = "statusFallback")
@RequestMapping(method = RequestMethod.GET, value = "/status")
public String status() {
    return reader.status();
}

public String statusFallback() {
    return "Bad";
}
This all works well.
I set the sleep time to 1500ms so that some requests exceed the Hystrix default timeout (1000ms).
When I start hitting the API I get some failures due to timeouts. If I keep hitting it long enough (50 times seems to work), the circuit breaker triggers and the circuit becomes open.
My understanding is that once the downstream service becomes healthy again, Hystrix will try to route one call and use it as a health check. If that call is successful, the circuit should be closed again.
However, this is not happening here. The circuit remains open even after changing the sleep time to a smaller value (let's say 500ms). None of my calls are routed to the reservation-service and the fallback is used on every call. The only way I can get the circuit to close again is to restart the reservation-client service.
Did I miss something? Is it an issue with Hystrix? Or with the Spring Integration?
UPDATE
I did further testing and I can confirm that the circuit remains open forever, even after the sleep time has been reduced.
However, if I use a route in the Zuul configuration I get the expected behaviour. The circuit closes itself if it sees a request that doesn't time out.
I have noticed another difference between forwarding by route compared to manually doing it in Spring. If I create a filter, my /status/ call on the client does not trigger the filter. When I set up a route (e.g. /foos/status => /status) it triggers the filter and Hystrix behaves properly.
Is that a bug in Spring?
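For reference, the open, half-open, closed cycle the question expects can be sketched in plain Java. This is a simplified model and not Hystrix's implementation: the class name SimpleBreaker, the consecutive-failure threshold and the injected clock are assumptions made for illustration, whereas Hystrix itself works from a rolling error percentage and the circuitBreaker.sleepWindowInMilliseconds property.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical, simplified breaker used only to illustrate the state transitions.
class SimpleBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;  // consecutive failures that open the circuit
    private final long sleepWindowMs;    // time to stay open before allowing a trial call
    private final LongSupplier clock;    // injected so the sketch can be tested without sleeping
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    SimpleBreaker(int failureThreshold, long sleepWindowMs, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.sleepWindowMs = sleepWindowMs;
        this.clock = clock;
    }

    State state() {
        return state;
    }

    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < sleepWindowMs) {
                return fallback.get();      // short-circuit: the action is not invoked
            }
            state = State.HALF_OPEN;        // sleep window elapsed: let one trial call through
        }
        try {
            T result = action.get();
            state = State.CLOSED;           // a successful call closes the circuit
            consecutiveFailures = 0;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;         // failed trial call, or too many failures
                openedAt = clock.getAsLong();
            }
            return fallback.get();
        }
    }
}
```

In this model a healthy downstream service closes the circuit on the first call made after the sleep window, which is the behaviour the question reports seeing through the Zuul route but not through the manual Feign call.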

Related

minimumNumberOfCalls not working in resilience4j

Using Spring Boot 2.4 and resilience4j 1.5,
I have configured my YAML file:
resilience4j:
  circuitbreaker:
    configs:
      default:
        registerHealthIndicator: true
        slidingWindowSize: 10
        minimumNumberOfCalls: 5
        permittedNumberOfCallsInHalfOpenState: 3
        automaticTransitionFromOpenToHalfOpenEnabled: true
        waitDurationInOpenState: 50s
        failureRateThreshold: 50
        eventConsumerBufferSize: 10
    instances:
      movieCatalog:
        baseConfig: default
and in the movieCatalog instance,
@RequestMapping("/{userId}")
@CircuitBreaker(name = CATALOG_SERVICE, fallbackMethod = "fallBackCatalog")
public List<CatalogItem> getCatalog(@PathVariable("userId") String userId) {
    UserRating ratings = restTemplate.getForObject("http://ratings-data-service/ratingsdata/users/" + userId, UserRating.class);
    return ratings.getUserRatings().stream()
            .map(rating -> {
                Movie movie = restTemplate.getForObject("http://movie-info-sevice/movies/" + rating.getMovieId(), Movie.class);
                return new CatalogItem(movie.getName(), movie.getDescription(), rating.getRating());
            })
            .collect(Collectors.toList());
}

private List<CatalogItem> fallBackCatalog(Exception e) {
    List<CatalogItem> fallBack = new ArrayList<>();
    fallBack.add(new CatalogItem("movie1", "movie desc", 3));
    return fallBack;
}
I see that when I get an exception in the getCatalog method above, I get the fallback result immediately, on the first call. My understanding is that for the first 5 calls I should see an exception, and that from the 6th call onwards, since more than 50% of the calls are exceptions (100% errors), I should see the fallback result. I had several errors before configuring the fallback method. Is there a cache which records the previous calls? And if there is a cache, I guess it should be cleared when the Spring Boot app is restarted, right? Please explain if I'm missing something. Any pointers are greatly appreciated.

The fallback mechanism is like a try/catch. It's independent of your CircuitBreaker configuration.
If you only want to execute a fallback method when the CircuitBreaker is open, then narrow down the scope from Exception to CallNotPermittedException.
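The difference between the two fallback scopes can be illustrated without the library. The sketch below is a plain-Java model, not resilience4j code: ToyBreaker and CallNotPermitted are hypothetical stand-ins for the real circuit breaker and for CallNotPermittedException, and the sliding window is ignored for brevity. It only mirrors minimumNumberOfCalls: 5 and failureRateThreshold: 50 from the configuration above.

```java
import java.util.function.Supplier;

class FallbackScopeDemo {
    /** Stand-in for resilience4j's CallNotPermittedException (assumption for this sketch). */
    static class CallNotPermitted extends RuntimeException {}

    /** Toy breaker: opens once minimumCalls are recorded and the failure rate reaches the threshold. */
    static class ToyBreaker {
        private final int minimumCalls;
        private final int failureRateThreshold; // percent
        private int calls;
        private int failed;
        private boolean open;

        ToyBreaker(int minimumCalls, int failureRateThreshold) {
            this.minimumCalls = minimumCalls;
            this.failureRateThreshold = failureRateThreshold;
        }

        <T> T execute(Supplier<T> action) {
            if (open) {
                throw new CallNotPermitted(); // fail fast: the action is not invoked
            }
            calls++;
            try {
                return action.get();
            } catch (RuntimeException e) {
                failed++;
                throw e;
            } finally {
                // The failure rate is only evaluated after minimumCalls have been recorded.
                if (calls >= minimumCalls && failed * 100 >= failureRateThreshold * calls) {
                    open = true;
                }
            }
        }
    }

    /** Broad fallback: like catch (Exception e), it hides every failure from the first call on. */
    static String withBroadFallback(ToyBreaker breaker, Supplier<String> action) {
        try {
            return breaker.execute(action);
        } catch (RuntimeException e) {
            return "fallback";
        }
    }

    /** Narrow fallback: only handles the rejection raised while the circuit is open. */
    static String withNarrowFallback(ToyBreaker breaker, Supplier<String> action) {
        try {
            return breaker.execute(action);
        } catch (CallNotPermitted e) {
            return "fallback";
        }
    }
}
```

With the broad variant a failing call returns the fallback on the very first attempt, which matches what the question observes. With the narrow variant the first five failures propagate to the caller and the sixth call gets the fallback, because by then the breaker is open.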

Resilience4j will fail fast by throwing a CallNotPermittedException until the state changes to closed, or as dictated by your configuration.
So with a fallback method, when the circuit breaker trips to the open state, it will no longer throw a CallNotPermittedException but will instead return the response INTERNAL_SERVER_ERROR.
I also agree with @Robert Winkler.

Keep calling 3rd party until it returns expected response with Hystrix

I am looking for a way to call 3rd party service from my code(Spring Boot app), and in case it is unresponsive, I would like to repeat the call x amount of times and then provide a default fallback. I found an example pseudocode that would probably work in my case with Hystrix
public class ExampleClass {
    @HystrixCommand(fallbackMethod = "example_Fallback")
    public String myMethod() {
        // third party service
        String response = httpClient.execute();
        return "OK";
    }

    private String example_Fallback() {
        return "ERROR HAPPENED";
    }
}
However, I would also like to repeat the call to the same third-party service x times if it returns a normal response that is unexpected (that is, treat that specific response as if the third party were unresponsive). The reason is that the third party might not be able to serve the request, and I can only detect that from the response. Could someone point me in the right direction or provide an example of how this could be solved with Hystrix?

...I would like to repeat the call x amount of times and then provide
a default fallback.
Configuring circuitBreaker.requestVolumeThreshold may help here. Take a look at the other Hystrix properties as well.
@HystrixCommand(fallbackMethod = "example_Fallback", commandProperties = {
        @HystrixProperty(name = "circuitBreaker.requestVolumeThreshold", value = "5"),
        @HystrixProperty(name = "metrics.rollingStats.timeInMilliseconds", value = "2000")
})
public String myMethod() {
    ...
}
Notice that circuitBreaker.requestVolumeThreshold (quote) "...sets the minimum number of requests in a rolling window that will trip the circuit". The rolling window duration - metrics.rollingStats.timeInMilliseconds - defaults to 10 seconds.
There's also the @Retryable annotation in Spring.
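Note that the circuit breaker properties above control when the circuit trips; they do not repeat a call. The retry-then-fallback behaviour the question asks for, including treating an unexpected but otherwise normal response as a failure, can be sketched in plain Java. This is a minimal illustration, not a Hystrix or Spring Retry API: the class RetryThenFallback, the method call and its parameters are hypothetical names.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

class RetryThenFallback {
    /**
     * Invokes the action up to maxAttempts times and returns the first acceptable response.
     * Both exceptions and responses rejected by isAcceptable count as failed attempts.
     */
    static <T> T call(Supplier<T> action, Predicate<T> isAcceptable, int maxAttempts, T fallback) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                T response = action.get();
                if (isAcceptable.test(response)) {
                    return response;
                }
                // Normal but unexpected response: treat it like an unresponsive service and retry.
            } catch (RuntimeException e) {
                // Unresponsive or failing service: retry.
            }
        }
        return fallback; // all attempts exhausted
    }
}
```

A call such as RetryThenFallback.call(() -> httpClient.execute(), "OK"::equals, 3, "ERROR HAPPENED") would try the third party up to three times before returning the default. A production version would usually add a delay between attempts.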

Consuming from Camel queue every x minutes

I'm attempting to implement a way to time my consumer so that it receives messages from a queue every 30 minutes or so.
For context: say 20 messages sit in my error queue until x minutes have passed; then my route consumes all messages on the queue and proceeds to 'sleep' until another 30 minutes have passed.
I'm not sure of the best way to implement this. I've tried Spring @Scheduled, the Camel timer, etc. and none of it does what I'm hoping for. I've been trying to get this to work with a route policy, but I can't get the correct behaviour. It just seems to consume from the queue immediately.
Is route policy the correct path or is there something else to use?

The route that reads from the queue will always read any message as quickly as it can.
One thing you could do is start / stop or suspend the route that consumes the messages, so have this sort of set up:
route 1: error_q_reader, which goes from('jms').
route 2: a timed route that fires every 20 mins
route 2 can use a control bus component to start the route.
from("timer:startErrorReader?period=1200000") // fires every 20 minutes; the timer name is arbitrary
    .to("controlbus:route?routeId=route1&action=start");
The tricky part here is knowing when to stop the route. Do you leave it run for 5 mins? Do you want to stop it once the messages are all consumed? There's probably a way to run another route that can check the queue depth (say every 1 min or so), and if it's 0 then shutdown route 1, you might get it to work, but I can assure you this will get messy as you try to deal with a number of async operations.
You could also try something more exotic, like a custom QueueBrowseStrategy which can fire an event to shutdown route 1 when there are no messages on the queue.

I built a custom bean to drain a queue and close, but it's not a very elegant solution, and I'd love to find a better one.
public class TriggeredPollingConsumer {
    private ConsumerTemplate consumer;
    private Endpoint consumerEndpoint;
    private String endpointUri;
    private ProducerTemplate producer;
    private static final Logger logger = Logger.getLogger(TriggeredPollingConsumer.class);

    public TriggeredPollingConsumer() {}

    public TriggeredPollingConsumer(ConsumerTemplate consumer, String endpoint, ProducerTemplate producer) {
        this.consumer = consumer;
        this.endpointUri = endpoint;
        this.producer = producer;
    }

    public void setConsumer(ConsumerTemplate consumer) {
        this.consumer = consumer;
    }

    public void setProducer(ProducerTemplate producer) {
        this.producer = producer;
    }

    public void setConsumerEndpoint(Endpoint endpoint) {
        consumerEndpoint = endpoint;
    }

    public void pollConsumer() throws Exception {
        long count = 0;
        try {
            if (consumerEndpoint == null) consumerEndpoint = consumer.getCamelContext().getEndpoint(endpointUri);
            logger.debug("Consuming: " + consumerEndpoint.getEndpointUri());
            consumer.start();
            producer.start();
            while (true) {
                logger.trace("Awaiting message: " + ++count);
                // Wait up to 60 seconds; null means the queue stayed empty for that long
                Exchange exchange = consumer.receive(consumerEndpoint, 60000);
                if (exchange == null) break;
                logger.trace("Processing message: " + count);
                producer.send(exchange);
                consumer.doneUoW(exchange);
                logger.trace("Processed message: " + count);
            }
            producer.stop();
            consumer.stop();
            logger.debug("Consumed " + (count - 1) + " message" + (count == 2 ? "." : "s."));
        } catch (Throwable t) {
            logger.error("Something went wrong!", t);
            throw t;
        }
    }
}
You configure the bean, and then call the bean method from your timer, and configure a direct route to process the entries from the queue.
from("timer:...")
    .beanRef("consumerBean", "pollConsumer");

from("direct:myRoute")
    .to(...);
It will then read everything in the queue, and stop as soon as no entries arrive within a minute. You might want to reduce the minute, but I found that with a one-second timeout, if JMS is a bit slow, it would time out halfway through draining the queue.
I've also been looking at the sjms-batch component, and how it might be used with a pollEnrich pattern, but so far I haven't been able to get that to work.

I solved this by running my application as a CronJob in a microservices setup. To let it shut itself down gracefully, you can set the property camel.springboot.duration-max-idle-seconds. That way, your JMS consumer route stays simple.
Another approach would be to declare a route that controls the "lifecycle" (start, sleep and resume) of your JMS consumer route.
I would strongly suggest that you use the first approach.

If you use ActiveMQ you can leverage the Scheduler feature of it.
You can delay the delivery of a message on the broker by simply setting the JMS property AMQ_SCHEDULED_DELAY to the delay in milliseconds. This is very easy in a Camel route:
.setHeader("AMQ_SCHEDULED_DELAY", constant(60000)) // 60000 ms = 1 minute; use 1800000 for 30 minutes
It is not exactly what you are looking for, because it does not drain a queue every 30 minutes but instead delays every individual message by the configured time (for example 30 minutes).
Notice that you have to enable the schedulerSupport in your broker configuration. Otherwise the delay properties are ignored.
<broker brokerName="localhost" dataDirectory="${activemq.data}" schedulerSupport="true">
...
</broker>

You should consider the Aggregation EIP:
from(URI_WAITING_QUEUE)
    .aggregate(new GroupedExchangeAggregationStrategy())
    .constant(true)
    .completionInterval(TIMEOUT)
    .to(URI_PROCESSING_BATCH_OF_EXCEPTIONS);
This example describes the following rules: all objects arriving in URI_WAITING_QUEUE are grouped into a List. constant(true) is the grouping condition (that is, no real condition: everything goes into the same group). Every TIMEOUT period (in milliseconds), all grouped objects are passed to the URI_PROCESSING_BATCH_OF_EXCEPTIONS queue.
So the URI_PROCESSING_BATCH_OF_EXCEPTIONS queue will receive a List of objects to process. You can apply the Split EIP to split them and process them one by one.

Tracking response times of API calls in springboot

I'm looking to track the response times of API calls.
I then want to plot the response times of the calls (GET, PUT, POST, DELETE) on a graph afterwards to compare the time differences.
This is what I'm currently doing to find the response time of a GET call but I'm not quite sure if it's right.
@RequestMapping(value = "/Students", method = RequestMethod.GET)
public ResponseEntity<List<Students>> getStudents()
{
    long beginTime = System.currentTimeMillis();
    List<Students> students = (List<Students>) repository.findAll();
    if (students.isEmpty())
    {
        return new ResponseEntity(HttpStatus.NO_CONTENT);
    }
    long responseTime = System.currentTimeMillis() - beginTime;
    logger.info("Response time for the call was " + responseTime);
    return new ResponseEntity(students, HttpStatus.OK);
}
I believe I am returning the response time before I actually return the data to the client which is the whole point of this but I wouldn't be able to put it after the return statement as it would be unreachable code.
Are there any better ways of trying to track the times of the calls?

You can use Spring AOP's around advice and log the time in the advice. Once a call is made to the controller, the around advice intercepts it and starts a timer (to record the time taken). From the advice we proceed to the controller method using joinPoint.proceed(). Once the controller returns its value, you log the timer value and return the object.
Here is the sample code:
In build.gradle include
compile("org.aspectj:aspectjweaver:1.8.8")
Create a component class and annotate it with @Aspect:
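@Component
@Aspect
public class advice {
    // Matches methods annotated with the custom @logger annotation
    // (assumed to be resolvable from this aspect's package)
    @Around("@annotation(logger)")
    public Object loggerAspect(ProceedingJoinPoint joinPoint) throws Throwable {
        // start the timer
        long start = System.currentTimeMillis();
        try {
            return joinPoint.proceed();
        } finally {
            // log the timer value, even if the controller throws
            long elapsed = System.currentTimeMillis() - start;
            System.out.println(joinPoint.getSignature() + " took " + elapsed + " ms");
        }
    }
}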
Add the @logger annotation to the controller method and you are good to go.
Hope this helps.
You can refer to the link for a full explanation.
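The core of the around-advice idea, measuring after the wrapped call has produced its result, can also be shown without Spring. The following is a minimal plain-Java sketch; the class Timed and the callback parameter onElapsedNanos are hypothetical names, not part of any framework. The try/finally block is what addresses the "unreachable code after return" concern from the question.

```java
import java.util.function.LongConsumer;
import java.util.function.Supplier;

class Timed {
    /**
     * Runs body, then reports the elapsed time in nanoseconds to onElapsedNanos.
     * The finally block runs after the return value has been computed,
     * and also when body throws.
     */
    static <T> T timed(Supplier<T> body, LongConsumer onElapsedNanos) {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            onElapsedNanos.accept(System.nanoTime() - start);
        }
    }
}
```

For example, Timed.timed(() -> repository.findAll(), nanos -> logger.info("Took " + nanos / 1_000_000 + " ms")) would log the duration of the repository call. Note that neither this sketch nor the aspect includes the time spent serializing and sending the response to the client.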

Spring Sleuth - Tracing Failures

In a microservice environment I see two main benefits from tracing requests through all microservice instances over an entire business process.
Finding latency gaps between or in service instances
Finding roots of failures, whether technical or regarding the business case
With Zipkin there is a tool, which addresses the first issue. But how can tracing be used to unveil failures in your microservice landscape? I definitely want to trace all error afflicted spans, but not each request, where nothing went wrong.
As mentioned here a custom Sampler could be used.
Alternatively, you may register your own Sampler bean definition and programmatically make the decision which requests should be sampled. You can make more intelligent choices about which things to trace, for example, by ignoring successful requests, perhaps checking whether some component is in an error state, or really anything else.
So I tried to implement that, but it doesn't work or I used it wrong.
So, as the blog post suggested I registered my own Sampler:
@Bean
Sampler customSampler() {
    return new Sampler() {
        @Override
        public boolean isSampled(Span span) {
            boolean isErrorSpan = false;
            for (String tagKey : span.tags().keySet()) {
                if (tagKey.startsWith("error_")) {
                    isErrorSpan = true;
                }
            }
            return isErrorSpan;
        }
    };
}
And in my controller I create a new Span, which is being tagged as an error if an exception raises
private final Tracer tracer;

@Autowired
public DemoController(Tracer tracer) {
    this.tracer = tracer;
}

@RequestMapping(value = "/calc/{i}")
public String calc(@PathVariable String i) {
    Span span = null;
    try {
        span = this.tracer.createSpan("my_business_logic");
        return "1 / " + i + " = " + new Float(1.0 / Integer.parseInt(i)).toString();
    } catch (Exception ex) {
        log.error(ex.getMessage(), ex);
        span.logEvent("ERROR: " + ex.getMessage());
        this.tracer.addTag("error_" + ex.hashCode(), ex.getMessage());
        throw ex;
    } finally {
        this.tracer.close(span);
    }
}
Now, this doesn't work. If I request /calc/a the method Sampler.isSampled(Span) is being called before the Controller method throws a NumberFormatException. This means, when isSampled() checks the Span, it has no tags yet. And the Sampler method is not being called again later in the process. Only if I open the Sampler and allow every span to be sampled, I see my tagged error-span later on in Zipkin. In this case Sampler.isSampled(Span) was called only 1 time but HttpZipkinSpanReporter.report(Span) was executed 3 times.
So what would the use case look like, to transmit only traces, which have error spans ? Is this even a correct way to tag a span with an arbitrary "error_" tag ?

The sampling decision is taken for a trace. That means that when the first request comes in and the span is created you have to take a decision. You don't have any tags / baggage at that point so you must not depend on the contents of tags to take this decision. That's a wrong approach.
You are taking a very custom approach. If you want to go that way (which is not recommended) you can create a custom implementation of a SpanReporter - https://github.com/spring-cloud/spring-cloud-sleuth/blob/master/spring-cloud-sleuth-core/src/main/java/org/springframework/cloud/sleuth/SpanReporter.java#L30 . SpanReporter is the one that is sending spans to zipkin. You can create an implementation that will wrap an existing SpanReporter implementation and will delegate the execution to it only when some values of tags match. But from my perspective it doesn't sound right.
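The wrapping reporter described above is a plain decorator. The sketch below shows only the shape of that idea in self-contained Java: SpanData and Reporter are hypothetical stand-ins for Sleuth's Span and SpanReporter, and ErrorOnlyReporter is an illustrative name, so the real types and signatures should be taken from the Sleuth version in use.

```java
import java.util.Map;

class ErrorOnlyReporterDemo {
    /** Stand-in for a finished span; only the name and tags matter for this sketch. */
    static class SpanData {
        final String name;
        final Map<String, String> tags;

        SpanData(String name, Map<String, String> tags) {
            this.name = name;
            this.tags = tags;
        }
    }

    /** Stand-in for the reporter interface that ships spans to Zipkin. */
    interface Reporter {
        void report(SpanData span);
    }

    /** Decorator that forwards a span to the wrapped reporter only if it carries an error_ tag. */
    static class ErrorOnlyReporter implements Reporter {
        private final Reporter delegate;

        ErrorOnlyReporter(Reporter delegate) {
            this.delegate = delegate;
        }

        @Override
        public void report(SpanData span) {
            boolean hasErrorTag = span.tags.keySet().stream()
                    .anyMatch(key -> key.startsWith("error_"));
            if (hasErrorTag) {
                delegate.report(span);
            }
        }
    }
}
```

Because the filter runs when a span is reported, the tags added in the catch block are already present at that point. The sampler still has to let every span through, as the question observed, and this filter decides per span rather than per trace, so untagged spans belonging to the same failed trace would be dropped.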
