zuul (without eureka) - always ends up in "Forwarding error" - spring-boot

I have configured zuul with 2 instances using ribbon (without eureka) as below:
zuul.retryable=true
zuul.routes.simple-ms-app.serviceId: client
client.ribbon.listOfServers=http://localhost:7788,http://localhost:8877
When both the instances 7788 & 8877 are up and running, everything goes fine.
When the first instance in the listOfServers is down, then the request ends up in the below error:
com.netflix.zuul.exception.ZuulException: Forwarding error
I am using the below version configuration:
spring-boot : 2.0.7.RELEASE
spring-cloud: Finchley.SR2
If anyone has faced a similar issue and found a solution, please share it here.
Thank you.

By default, Zuul throws an exception (instead of returning a 503 or 404) when the upstream service is not available. This behavior is discussed in detail in the GitHub thread "Zuul swallows 503 exceptions from upstream microservices".
To handle this case and have Zuul retry on the current and next available instances, you need to do two things:
Extend ErrorFilter and handle the exception with custom behavior
Configure retry for Zuul
Extend ErrorFilter and provide custom logic to return a 404 or 503 status code. Some approaches to handling this exception are explained in this SO thread: Customizing Zuul Exception.
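To make the "custom logic" part more concrete, here is a minimal sketch of the decision such a filter might make. It uses only the JDK. The class name UpstreamStatusMapper, the method statusFor, and the mapping itself (connection refused gives 503, read timeout gives 504, everything else 500) are assumptions for illustration and are not part of Zuul; a real filter would take the resulting code and set it on the response.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;

final class UpstreamStatusMapper {

    // Hypothetical helper (not a Zuul API): pick a status code from the failure's cause chain.
    static int statusFor(Throwable error) {
        int depth = 0;
        // Walk the causes; the depth cap guards against unusual cyclic chains.
        for (Throwable t = error; t != null && depth < 32; t = t.getCause(), depth++) {
            if (t instanceof ConnectException) {
                return 503; // instance refused the connection: service unavailable
            }
            if (t instanceof SocketTimeoutException) {
                return 504; // instance accepted the call but did not answer in time
            }
        }
        return 500; // assumption: anything else stays a generic server error
    }

    public static void main(String[] args) {
        Throwable down = new RuntimeException("Forwarding error",
                new ConnectException("Connection refused"));
        Throwable slow = new RuntimeException("Forwarding error",
                new SocketTimeoutException("Read timed out"));
        System.out.println(statusFor(down)); // prints 503
        System.out.println(statusFor(slow)); // prints 504
        System.out.println(statusFor(new IllegalStateException("other"))); // prints 500
    }
}
```

The mapping is deliberately small; whether a refused connection should surface as 503 or 404 is a choice for the gateway owner.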
Retry in Zuul can be configured using the following application properties:
zuul:
  retryable: true
ribbon:
  MaxAutoRetries: 1
  MaxAutoRetriesNextServer: 3
  OkToRetryOnAllOperations: true
yourApplication:
  ribbon:
    listOfServers: instance-1-url, instance-2-url
Please note that Spring Retry (spring-retry) must be on the classpath for retry to work in Zuul.
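To see how MaxAutoRetries and MaxAutoRetriesNextServer combine, here is a simplified model in plain Java. It is not Ribbon's implementation. It only assumes the commonly documented meaning of the two counters: MaxAutoRetries is the number of extra tries on the same server, and MaxAutoRetriesNextServer is the number of additional servers to move on to. The class RetryModel, its methods, and the round-robin server choice are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class RetryModel {

    // Upper bound on calls: (same-server retries + 1) * (next-server moves + 1).
    static int maxAttempts(int maxAutoRetries, int maxAutoRetriesNextServer) {
        return (maxAutoRetries + 1) * (maxAutoRetriesNextServer + 1);
    }

    // Returns the servers tried, in order, until one answers or the budget runs out.
    static List<String> attempts(List<String> servers, int maxAutoRetries,
                                 int maxAutoRetriesNextServer, Predicate<String> isUp) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("server list must not be empty");
        }
        List<String> tried = new ArrayList<>();
        for (int s = 0; s <= maxAutoRetriesNextServer; s++) {
            // Assumption: the next server is picked round-robin from the list.
            String server = servers.get(s % servers.size());
            for (int r = 0; r <= maxAutoRetries; r++) {
                tried.add(server);
                if (isUp.test(server)) {
                    return tried;
                }
            }
        }
        return tried;
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("localhost:7788", "localhost:8877");
        System.out.println(maxAttempts(1, 3)); // prints 8
        // First instance down, second up, as in the question.
        System.out.println(attempts(servers, 1, 3, s -> s.endsWith("8877")));
        // prints [localhost:7788, localhost:7788, localhost:8877]
    }
}
```

Under this model, the values above allow up to 8 calls in total, and a request with the first instance down reaches the second instance on its third call.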

Related

CAS Actuator Health Endpoints Return 403 Intermittently

I recently upgraded CAS to 6.4.6.x and noticed that the liveness/readiness probes will intermittently throw 403 error codes. It appears to be a threading issue in the Spring Security Filter Chain. I have validated with the barebone CAS images that this does not happen in the 6.3.x version but can repeat it rather easily with the 6.4.x version. My configuration has not changed after the upgrade and I'm following the documentation.
Endpoint Configuration:
# allow all by default
cas.monitor.endpoints.endpoint.defaults.access[0]=PERMIT
# enable the health endpoint
management.endpoints.enabled-by-default=true
management.endpoints.web.base-path=/actuator
management.endpoints.web.exposure.include=health
management.endpoint.health.enabled=true
When I run load tests against the instance and send one request at a time, I get 200 responses. If I raise the concurrency to 2 or more, I can reproduce the threading issue: some responses come back as 403 after being picked up by the Spring default error controller.
Setting a breakpoint on the Error Controller, I'm able to see the same thread in the logs essentially jump to two different points in the code path.
I've gone through the Pull Requests from 6.3.x to 6.4.x and nothing jumped out to me that might be causing this issue. I haven't seen any issues raised up in Spring Boot around the Actuator Health Points failing. I've bumped up Spring and Tomcat to the latest patch versions. Any thoughts on what could be causing this or other things I could try to determine how to fix it?

feign.RetryableException: Read timed out executing GET

I have the following architecture in my project:
My UI Service (port 8080) makes a Feign call to the Gateway Service (port 8085).
The GET call from the UI service is "http://localhost:8080/invoice-list?startDate=2018-08-05&endDate=2018-10-05"
The corresponding call from the Gateway Service is "http://localhost:8085/invoice-download-service/invoice-list?startDate=2018-08-05&endDate=2018-10-05"
When I make this GET call from the UI service, I get the error below within a minute:
is feign.RetryableException: Read timed out executing GET http://localhost:8085/invoice-download-service/invoice-list?startDate=2018-08-05&endDate=2018-10-05] with root cause
java.net.SocketTimeoutException: Read timed out
But when I call the microservice directly from the Gateway Service, I don't get the error.
application.properties file of the Gateway service:
hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds=160000000
ribbon.OkToRetryOnAllOperations=true
ribbon.ReadTimeout=5000000
ribbon.ConnectTimeout=5000000
ribbon.MaxAutoRetries=3
ribbon.MaxAutoRetriesNextServer=3
zuul.host.socket-timeout-millis=5000000
zuul.host.connect-timeout-millis=5000000
Here I have set the read and connect timeouts to 5000000 ms (about 83 minutes, far more than the 8 to 10 minutes I was aiming for), hence I am not getting the error.
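These timeout properties are all in milliseconds, so it can help to convert a value before pasting it into the config. A small JDK-only sketch (the class TimeoutUnits and the method describe are just illustrative names):

```java
import java.time.Duration;

final class TimeoutUnits {

    // Render a millisecond value as whole hours and minutes (seconds are dropped).
    static String describe(long millis) {
        Duration d = Duration.ofMillis(millis);
        return millis + " ms = " + d.toHours() + " h " + (d.toMinutes() % 60) + " min";
    }

    public static void main(String[] args) {
        System.out.println(describe(5_000_000L));   // prints 5000000 ms = 1 h 23 min
        System.out.println(describe(160_000_000L)); // prints 160000000 ms = 44 h 26 min
        System.out.println(describe(600_000L));     // prints 600000 ms = 0 h 10 min
    }
}
```

So a 10 minute timeout would be 600000, while 5000000 is well over an hour.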
Application.properties file of UI service
spring.application.name=external-ui-service
server.port=8080
The UI service has no timeout properties. I tried the properties above there, but they had no effect.
The UI service does not use Ribbon, Zuul, etc.; it just makes a Feign call to the gateway.
So what should I do to increase the timeout in the UI service?
Added the properties below to the UI service's application.properties file.
feign.client.config.default.connectTimeout: 160000000
feign.client.config.default.readTimeout: 160000000
This issue might also be caused by the default load balancer implementation of Spring Cloud Gateway if you use Eureka Server and run your microservices outside Docker on Windows. The services run on localhost, but Eureka tells the gateway's load balancer to route requests to host.docker.internal.
The links below give a couple of solutions:
https://localcoder.org/spring-boot-cloud-eurka-windows-10-eurkea-returns-host-docker-internal-for-clien
https://dimitr.im/fix-eureka-localhost

Enabling zuul retry breaks Eureka routing on PCF

I'm trying to enable retry capability within a Zuul gateway, and am able to get things working locally, but when I deploy the gateway to PCF, I get the following error when zuul.retryable=true:
{
"timestamp": 1524669167094,
"status": 500,
"error": "Internal Server Error",
"exception": "com.netflix.zuul.exception.ZuulException",
"message": "COMMAND_EXCEPTION"
}
The related logs give me the following exception details:
com.netflix.zuul.exception.ZuulException: Forwarding error
Caused by: com.netflix.hystrix.exception.HystrixRuntimeException: spring-demo failed and no fallback available.
Caused by: org.apache.http.NoHttpResponseException: spring-demo.example.com:443 failed to respond
I've tested spring-demo.example.com and it responds correctly (200) within 200 ms and Zuul is also able to get a valid response when I remove zuul.retryable property (although then it doesn't retry any error status codes or timeouts).
When I run locally, I can see the RibbonLoadBalancedRetryPolicy try the different instances on timeout or when getting a 500 so it's only in PCF that I'm getting the error. I've verified that the instances show up in the PCF Eureka and also tried increasing the connect/read/hystrix timeouts.
Here's the service layout:
2 instances of "working" app connected to Eureka as "spring-demo"
2 instances of "broken" app connected to Eureka as "spring-demo" (times out or returns 500)
Zuul connected to Eureka
Zuul application.yml:
zuul:
  ignoredServices: '*'
  ignoredPatterns: '/**/actuator/**'
  retryable: true
  routes:
    spring-demo: '/spring-demo/**'
ribbon:
  retryableStatusCodes: 404, 500
  MaxAutoRetries: 1
  MaxAutoRetriesNextServer: 5
  OkRetryOnConnectionErrors: true
Gradle dependency versions:
Spring Boot 1.5.12.RELEASE
Spring Cloud Edgware.SR3
Pivotal Services 1.6.3.RELEASE
spring-boot-starter-web
spring-boot-starter-actuator
spring-cloud-starter-netflix-zuul
spring-retry
spring-cloud-services-starter-service-registry
spring-cloud-services-starter-circuit-breaker

Spring Boot actuator health issue with Consul

We are running Consul in an OpenShift cluster. All services are built with Spring Boot/Spring Cloud and register successfully in Consul. Each exposes a health endpoint through Spring Boot Actuator. The health endpoint works fine when hit with curl, but sometimes we get only an HTTP 200 status code and see no response body. This causes Consul to log the errors below frequently, which in turn causes problems discovering the service.
Any suggestions would be a great help.
2016/08/05 05:57:15 [WARN] agent: http request failed 'http://10.1.0.18:9080/health': Get http://10.1.0.18:9080/health: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
I discovered this after a long time: my solution was to increase the timeouts for the probes. Not sure if this helps after 2 years, but it is worth a shot.
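To illustrate why a longer probe timeout can make the difference, here is a small self-contained sketch using only the JDK. It starts a local stand-in for a slow /health endpoint and probes it twice with different timeouts. Everything here (the class ProbeTimeoutDemo, the 500 ms delay, the 100 ms and 3000 ms timeouts) is made up for the demo; it does not use Consul or Actuator.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

final class ProbeTimeoutDemo {

    // Starts a local server whose /health handler answers 200 after a delay.
    static HttpServer startSlowServer(long delayMillis) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/health", exchange -> {
                try {
                    Thread.sleep(delayMillis); // simulate a slow health indicator
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                byte[] body = "{\"status\":\"UP\"}".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                exchange.getResponseBody().write(body);
                exchange.close();
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns true only if the endpoint answers 200 within the timeout.
    static boolean probe(String url, int timeoutMillis) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            return conn.getResponseCode() == 200;
        } catch (IOException e) {
            return false; // includes SocketTimeoutException while awaiting headers
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        HttpServer server = startSlowServer(500);
        String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/health";
        System.out.println("100 ms timeout passes: " + probe(url, 100));
        System.out.println("3000 ms timeout passes: " + probe(url, 3000));
        server.stop(0);
    }
}
```

With the short timeout the probe should give up before the headers arrive, much like the "Client.Timeout exceeded while awaiting headers" line in the log; with the longer one the same endpoint counts as healthy.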

spring cloud FeignRibbonClient retryhandler retry configuration

I use Spring Cloud and the FeignRibbonClient to access remote services. The problem is that this client ignores the retry configuration given by these properties:
example-client.ribbon.MaxAutoRetries=5
example-client.ribbon.MaxAutoRetriesNextServer=5
example-client.ribbon.OkToRetryOnAllOperations=true
The retryHandlers are created without any configuration. What I want to get is to retry the next server after ConnectException. What I get is a RetryableException caused by a ConnectException.
Does anybody know how to get the client to call the next server in case of a ConnectException?
Thanx
Lutz
