Resilience Circuit breaker config update dynamically - spring

I have configured a circuit breaker for Client_Service, which calls Remote_Service. In the fallback, it calls Health_Service, and that service returns the Future_Timestamp at which Remote_Service is expected to be back up and running.
Now I want to configure my circuit breaker based on this Future_Timestamp: if it is 2 hours away, I want to keep the circuit breaker in the OPEN state (so no calls reach Remote_Service for 2 hours). After 2 hours, requests can call Remote_Service again; if a call succeeds, the circuit breaker can move to HALF_OPEN and then to CLOSED, and if it fails again, the fallback method calls Health_Service and the same pattern repeats.
Is there any way to achieve this, or is there an alternative solution?
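One possible direction, assuming the breaker is a Resilience4j CircuitBreaker (the class, breaker name and scheduler below are illustrative placeholders, not part of the original setup): instead of relying on a fixed waitDurationInOpenState, force the breaker open from the fallback and schedule the OPEN to HALF_OPEN transition yourself once the Future_Timestamp reported by Health_Service has passed.

import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;

public class RemoteServiceFallback {

    private final CircuitBreaker breaker =
            CircuitBreakerRegistry.ofDefaults().circuitBreaker("remoteService");
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // Called from the fallback once Health_Service has returned the Future_Timestamp.
    public void holdOpenUntil(Instant futureTimestamp) {
        // Force the breaker open so no calls reach Remote_Service.
        breaker.transitionToOpenState();

        long waitMillis = Duration.between(Instant.now(), futureTimestamp).toMillis();
        // After the reported recovery time, allow trial calls again: a success lets the
        // breaker close, a failure re-opens it and the fallback asks Health_Service
        // for a new timestamp, repeating the same pattern.
        scheduler.schedule(breaker::transitionToHalfOpenState,
                Math.max(waitMillis, 0), TimeUnit.MILLISECONDS);
    }
}

For this to keep the circuit open for the full 2 hours, waitDurationInOpenState should be set large enough (and automatic-transition-from-open-to-half-open left disabled) so that the breaker does not move to HALF_OPEN on its own before the scheduled transition.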

Related

Form Recognizer Heavy Workload

My use case is the following:
Once a day I upload 1000 single-page PDFs to Azure Storage and process them with Form Recognizer via the latest Python azure-form-recognizer client.
So far I'm using the async version of the client and I send the 1000 coroutines concurrently.
import asyncio

tasks = {asyncio.create_task(analyze_async(doc)): doc for doc in documents}
pending = set(tasks)

# Handle retries
while pending:
    # back off in case of 429
    await asyncio.sleep(1)
    # wait for the concurrent calls, return when all have completed
    finished, pending = await asyncio.wait(
        pending, return_when=asyncio.ALL_COMPLETED
    )
    # check whether a task raised an exception and register it for a new run
    for task in finished:
        doc = tasks[task]
        if task.exception():
            new_task = asyncio.create_task(analyze_async(doc))
            tasks[new_task] = doc
            pending.add(new_task)
Now I'm not really comfortable with this setup. The main reason is the unpredictable succession of service states within the same iteration: it can be up, then throw 429, then be up again, so it is not deterministic enough for me. I was wondering if another approach is possible. Do you think I should instead increase the transactions progressively, starting with 15 (the default TPS), then 50, then 100, until the queue is empty? Or is there another option?
Thanks
You need to enable CORS on the storage account and adjust the CORS settings so that Form Recognizer can access the heavy workload.
Follow this procedure to run the heavy workload in Form Recognizer:
Create a storage account to upload the files. Use page blobs here for better performance, and choose zone-redundant storage (ZRS) for redundancy.
Go to the CORS settings and add the required URL: set the Allowed origins to https://formrecognizer.appliedai.azure.com.
Go to the containers and upload the documents.
Use the container and blob information as the input for the recognizer. If you work from Form Recognizer Studio, the total size of the documents and the character limit are taken into account, so it is recommended to use the Python code with the container you created as the input folder.

Resilience4J Circuit Breaker does not switch back to CLOSED state

I have implemented a circuit breaker for a service that is called once every 5 minutes.
Looking at the screenshots below: the first captures when the outage occurred in the service, and the second captures when the circuit breaker opened.
All the calls to the service after the calls that triggered the circuit breaker to OPEN are succeeding. (Notice the timeout count of 2 for the third bar from the start.)
I have the following circuit breaker configurations:
circuit-breaker:
  automatic-transition-from-open-to-half-open-enabled: ${AMA_AUTOMATIC_TRANSITION_FROM_OPEN_TO_HALF_OPEN_ENABLED:true}
  failure-rate-threshold: ${AMA_FAILURE_RATE_THRESHOLD:40}
  max-wait-duration-in-half-open-state: ${AMA_MAX_WAIT_DURATION_IN_HALF_OPEN_STATE:5s}
  permitted-number-of-calls-in-half-open-state: ${AMA_PERMITTED_NUM_OF_CALLS_IN_HALF_OPEN_STATE:10}
  register-health-indicator: ${AMA_REGISTER_HEALTH_INDICATOR:true}
  sliding-window-size: ${AMA_SLIDING_WINDOW_SIZE:5}
  sliding-window-type: ${AMA_SLIDING_WINDOW_TYPE:COUNT_BASED}
  slow-call-duration-threshold: ${AMA_SLOW_CALL_DURATION_THRESHOLD:5s}
  slow-call-rate-threshold: ${AMA_SLOW_CALL_RATE_THRESHOLD:40}
  wait-duration-in-open-state: ${AMA_WAIT_DURATION_IN_OPEN_STATE:5s}
  writable-stack-trace-enabled: ${AMA_WRITABLE_STACK_TRACE_ENABLED:true}
  minimum-number-of-calls: ${AMA_MINIMUM_NUM_OF_CALL:5}
  timeout-duration: ${AMA_TIMEOUT_DURATION:5s}
Firstly, according to the configuration, at least 5 calls should fail for the circuit breaker to switch to OPEN, but from the graph we can see that only 2 failures switched it to OPEN, which is not correct.
Secondly, why is it still in the OPEN state after 2 hours? It should have switched back to CLOSED by now.
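As a debugging aid, a minimal sketch (assuming the Resilience4j Java API, an injected CircuitBreakerRegistry and an SLF4J logger; the breaker name "ama" is a placeholder) of logging the recorded errors and state transitions, so the graph can be compared against what the breaker actually counted and when it attempted the OPEN to HALF_OPEN transition:

import io.github.resilience4j.circuitbreaker.CircuitBreaker;

CircuitBreaker breaker = circuitBreakerRegistry.circuitBreaker("ama");

// Log every recorded error and every state transition with its timestamp.
breaker.getEventPublisher()
        .onError(event -> log.warn("recorded error: {}", event))
        .onStateTransition(event ->
                log.info("state transition: {}", event.getStateTransition()));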

Circuit Breaker for an orchestrator

I have a facade which calls 3 different services for some types of requests and finally orchestrates the responses before sending the response back to the client. Here it is mandatory that all 3 services are up and serving as expected; the client request cannot be served if even one of them is down.
I am looking for a circuit breaker to solve this problem. The circuit breaker should respond with an error code if even one of the services is down. I was checking the Resilience4j circuit breaker and it doesn't fit my problem.
https://resilience4j.readme.io/docs/circuitbreaker
Is there any other open-source option available?
Why doesn't it fit your problem?
You can protect every service with a CircuitBreaker. As soon as one of the CircuitBreakers is open, you can short-circuit and directly return an error response to your client.
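A minimal sketch of that per-service approach, assuming Resilience4j's Java API; the breaker names, request/response types and service calls are placeholders:

import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;

public class FacadeService {

    private final CircuitBreakerRegistry registry = CircuitBreakerRegistry.ofDefaults();
    private final CircuitBreaker cb1 = registry.circuitBreaker("service1");
    private final CircuitBreaker cb2 = registry.circuitBreaker("service2");
    private final CircuitBreaker cb3 = registry.circuitBreaker("service3");

    public FacadeResponse handle(FacadeRequest request) {
        // Each downstream call is guarded by its own breaker. When a breaker is OPEN,
        // executeSupplier throws CallNotPermittedException immediately, so the facade
        // can short-circuit and map it to an error response for the client.
        var r1 = cb1.executeSupplier(() -> callService1(request));
        var r2 = cb2.executeSupplier(() -> callService2(request));
        var r3 = cb3.executeSupplier(() -> callService3(request));
        return orchestrate(r1, r2, r3);
    }
}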
A CircuitBreaker works on a protected function as below:
Thread <--> CircuitBreaker <--> Protected_Function
A Protected_Function can call one or more microservices. Mostly we use one Protected_Function per external microservice call, because we can then tune resilience based on the profile or behavior of that particular microservice. But since your requirement is different, we can have all 3 calls under one Protected_Function.
So, as per your explanation above, your façade calls 3 microservices (assume in series). What you can do is call your façade, or all 3 services, through or inside one Protected_Function:
@CircuitBreaker(name = "OVERALL_PROTECTION")
public Your_Response Protected_Function(Your_Request request) {
    Call_To_Service_1;
    Call_To_Service_2;
    Call_To_Service_3;
    return Orchestrate_Your_Response;
}
Further, you can add resilience settings for OVERALL_PROTECTION in your YAML property file as below (I have used a count-based sliding window):
resilience4j.circuitbreaker:
  backends:
    OVERALL_PROTECTION:
      registerHealthIndicator: true
      slidingWindowSize: 100 # start rate calculation after 100 calls
      minimumNumberOfCalls: 100 # minimum calls before the CircuitBreaker can calculate the error rate
      permittedNumberOfCallsInHalfOpenState: 10 # number of permitted calls when the CircuitBreaker is half open
      waitDurationInOpenState: 10s # time the CircuitBreaker should wait before transitioning from open to half-open
      failureRateThreshold: 50 # failure rate threshold in percentage
      slowCallRateThreshold: 100 # consider all transactions under the interceptor for the slow call rate
      slowCallDurationThreshold: 2s # if a call takes more than 2s, increase the error rate
      recordExceptions: # increment the error rate if one of the following exceptions occurs
        - org.springframework.web.client.HttpServerErrorException
        - java.io.IOException
        - org.springframework.web.client.ResourceAccessException
You can also use a time-based sliding window instead of a count-based one if you wish; for the rest, I have added a # comment in front of each parameter of the configuration as self-explanation.
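For illustration, the same time-based idea expressed with the programmatic CircuitBreakerConfig API (a sketch; the window size and thresholds are arbitrary values, not a recommendation):

import java.time.Duration;
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig.SlidingWindowType;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;

CircuitBreakerConfig timeBased = CircuitBreakerConfig.custom()
        .slidingWindowType(SlidingWindowType.TIME_BASED)
        .slidingWindowSize(60)          // aggregate the outcomes of the last 60 seconds
        .failureRateThreshold(50)
        .slowCallDurationThreshold(Duration.ofSeconds(2))
        .waitDurationInOpenState(Duration.ofSeconds(10))
        .build();

CircuitBreaker overallProtection =
        CircuitBreakerRegistry.of(timeBased).circuitBreaker("OVERALL_PROTECTION");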
resilience4j.retry:
  instances:
    OVERALL_PROTECTION:
      maxRetryAttempts: 5
      waitDuration: 100
      retryExceptions:
        - org.springframework.web.client.HttpServerErrorException
        - java.io.IOException
        - org.springframework.web.client.ResourceAccessException
The above configuration will retry up to 5 times if one of the exceptions listed under retryExceptions occurs.
resilience4j.ratelimiter:
  instances:
    OVERALL_PROTECTION:
      timeoutDuration: 100ms # the default wait time a thread waits for a permission
      limitRefreshPeriod: 1000 # the period of a limit refresh; after each period the rate limiter resets its permission count to limitForPeriod
      limitForPeriod: 25 # the number of permissions available during one limit refresh period
The above configuration will allow a maximum of 25 transactions per second.
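A sketch of how the three instances above can be attached to the protected function, assuming the resilience4j-spring-boot annotations; the request/response types, the orchestration call and the fallback mapping are placeholders:

import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;
import io.github.resilience4j.ratelimiter.annotation.RateLimiter;
import io.github.resilience4j.retry.annotation.Retry;
import org.springframework.stereotype.Service;

@Service
public class OrchestrationFacade {

    @CircuitBreaker(name = "OVERALL_PROTECTION", fallbackMethod = "fallback")
    @Retry(name = "OVERALL_PROTECTION")
    @RateLimiter(name = "OVERALL_PROTECTION")
    public Your_Response protectedFunction(Your_Request request) {
        // Call_To_Service_1, Call_To_Service_2, Call_To_Service_3 and the
        // orchestration of their responses go here (placeholders).
        return Orchestrate_Your_Response(request);
    }

    // Invoked when the breaker is open, the rate limit is exceeded,
    // or the retries are exhausted with one of the recorded exceptions.
    private Your_Response fallback(Your_Request request, Throwable t) {
        return Your_Response.errorFrom(t); // placeholder error mapping
    }
}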

Spring-retry restarts after maxAttempts reached when no recover method is provided

Playing with spring-retry on spring-boot 1.5.21, I noticed that spring-retry restarts when maxAttempts is reached if there is no recover method implemented.
It works as expected if a proper recover method is implemented. If there is no recover method, the retry doesn't stop at maxAttempts but restarts again; the number of restarts equals the configured maxAttempts. E.g. with maxAttempts = 3, the retry executes 9 times (3 retries * 3 restarts).
I'm using annotations to set up the retry block:
@Retryable(include = {ResourceAccessException.class}, maxAttemptsExpression = "${retry.maxAttempts}", backoff = @Backoff(delayExpression = "${retry.delay}", multiplierExpression = "${retry.delay-multiplier}"))
The expected result with maxAttempts = 3 is that the retry stops after 3 attempts.
The actual result is that the retry restarts the 3 attempts 3 more times, for a total of 9 retries.
The above occurs ONLY when no recover method is provided. Based on the documentation, the recover method is optional, and I have no need for one, since there is no valid recovery in my case for a failed REST service call (no redundant service available).
If there is no recoverer, the final exception is thrown.
If the source of the call is a listener container (e.g. RabbitMQ, JMS), then the delivery will be retried.
That's the whole point of a recoverer.
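For completeness, a sketch of a recoverer that simply rethrows, assuming spring-retry's @Recover annotation (the service class and the REST call details are placeholders); it keeps the "no real recovery" behavior while still providing an explicit recoverer:

import org.springframework.retry.annotation.Backoff;
import org.springframework.retry.annotation.Recover;
import org.springframework.retry.annotation.Retryable;
import org.springframework.stereotype.Service;
import org.springframework.web.client.ResourceAccessException;

@Service
public class RemoteCallService {

    @Retryable(include = {ResourceAccessException.class},
               maxAttemptsExpression = "${retry.maxAttempts}",
               backoff = @Backoff(delayExpression = "${retry.delay}",
                                  multiplierExpression = "${retry.delay-multiplier}"))
    public String callRemoteService() {
        // The REST call that may throw ResourceAccessException goes here (placeholder).
        return restTemplate.getForObject(remoteUrl, String.class);
    }

    // Called once the retries are exhausted; rethrowing propagates the final
    // exception to the caller once maxAttempts is reached.
    @Recover
    public String recover(ResourceAccessException e) {
        throw e;
    }
}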

Regulating / rate limiting ruby mechanize

I need to regulate how often a Mechanize instance connects to an API (once every 2 seconds, so limit connections to that rate or slower).
So this:
instance.pre_connect_hooks << Proc.new { sleep 2 }
I had thought this would work, and it sort of does, BUT now every method in that class sleeps for 2 seconds, as if the Mechanize instance is touched and told to hold for 2 seconds. I'm going to try a post-connect hook, but it's obvious I need something a bit more elaborate; I just don't know what at this point.
The code explains it better, so if you are interested, follow along here: https://github.com/blueblank/reddit_modbot. Otherwise, my question concerns how to efficiently and effectively rate-limit a Mechanize instance to a specific time frame specified by an API (where overstepping that limit results in dropped requests and bans). I'm also guessing I need to better integrate the Mechanize instance into my class, so any pointers on that are appreciated as well.
Pre- and post-connect hooks are called on every connect, so if there is some redirection they could trigger many times for one request. Try history_added, which only gets called once:
instance.history_added = Proc.new { sleep 2 }
I use SlowWeb to rate-limit calls to a specific URL:
require 'slowweb'
SlowWeb.limit('example.com', 10, 60)
In this case, calls to the example.com domain are limited to 10 requests every 60 seconds.
