Enabling zuul retry breaks Eureka routing on PCF - spring-boot

I'm trying to enable retry capability within a Zuul gateway, and am able to get things working locally, but when I deploy the gateway to PCF, I get the following error when zuul.retryable=true:
{
"timestamp": 1524669167094,
"status": 500,
"error": "Internal Server Error",
"exception": "com.netflix.zuul.exception.ZuulException",
"message": "COMMAND_EXCEPTION"
}
The related logs give me the following exception details:
com.netflix.zuul.exception.ZuulException: Forwarding error
Caused by: com.netflix.hystrix.exception.HystrixRuntimeException: spring-demo failed and no fallback available.
Caused by: org.apache.http.NoHttpResponseException: spring-demo.example.com:443 failed to respond
I've tested spring-demo.example.com directly and it responds correctly (200) within 200 ms. Zuul also gets a valid response when I remove the zuul.retryable property, although it then does not retry on error status codes or timeouts.
When I run locally, I can see the RibbonLoadBalancedRetryPolicy try the different instances on timeout or when getting a 500 so it's only in PCF that I'm getting the error. I've verified that the instances show up in the PCF Eureka and also tried increasing the connect/read/hystrix timeouts.
Here's the service layout:
2 instances of "working" app connected to Eureka as "spring-demo"
2 instances of "broken" app connected to Eureka as "spring-demo" (times out or returns 500)
Zuul connected to Eureka
Zuul application.yml:
zuul:
  ignoredServices: '*'
  ignoredPatterns: '/**/actuator/**'
  retryable: true
  routes:
    spring-demo: '/spring-demo/**'
ribbon:
  retryableStatusCodes: 404, 500
  MaxAutoRetries: 1
  MaxAutoRetriesNextServer: 5
  OkRetryOnConnectionErrors: true
Gradle dependency versions:
Spring Boot 1.5.12.RELEASE
Spring Cloud Edgware.SR3
Pivotal Services 1.6.3.RELEASE
spring-boot-starter-web
spring-boot-starter-actuator
spring-cloud-starter-netflix-zuul
spring-retry
spring-cloud-services-starter-service-registry
spring-cloud-services-starter-circuit-breaker

Related

Getting Spring Boot Internal Server Error Status 500 for around one minute

I have the following Spring Boot setup.
A client (Postman) calls the API Gateway, which also acts as the load balancer.
The API Gateway and Albums are Spring Boot applications registered with a Eureka discovery service (also a Spring Boot application).
I start the applications in the following order: Eureka discovery service, API Gateway, Albums.
When I try to access the Albums resource behind the API Gateway, I get the error shown below. This lasts for a little under a minute, after which I can access the application successfully.
I have tried the suggestions in a number of posts but could not solve the issue:
Why am I getting an Apache Proxy 503 error?
Tomcat application not responding with no logs
What java.security.egd option is for?
Spring Cloud Gateway not able to load balance and gives error 500
Any help / pointer will be appreciated. Thanks in advance.
Error In Postman
{
"timestamp": "2021-07-21T07:06:15.840+00:00",
"path": "/products/status",
"status": 500,
"error": "Internal Server Error",
"message": "Connection refused: no further information: centos/192.168.0.104:60788",
"requestId": "ffb03cbf-22",
"trace": "io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: no further information: centos/192.168.0.104:60788\r\n\tSuppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException: \nError has been observed at the following site(s):\n\t|_ checkpoint ⇢ org.springframework.cloud.gateway.filter.WeightCalculatorWebFilter [DefaultWebFilterChain]\n\t|_ checkpoint ⇢ org.springframework.boot.actuate.metrics.web.reactive.server.MetricsWebFilter [DefaultWebFilterChain]\n\t|_ checkpoint ⇢ HTTP GET "/products/status" [ExceptionHandlingWebHandler]\nStack trace:\r\nCaused by: java.net.ConnectException: Connection refused: no further information\r\n\tat java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)\r\n\tat java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:779)\r\n\tat io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)\r\n\tat io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)\r\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)\r\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)\r\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)\r\n\tat io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)\r\n\tat io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)\r\n\tat io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\r\n\tat io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\r\n\tat java.base/java.lang.Thread.run(Thread.java:834)\r\n"
}
In the original post I wrote:
I run the applications in following order: Eureka discovery service, API Gateway, Albums
After some investigation, I realized that I need to start Albums before the API Gateway, so the final order is: Eureka discovery service, Albums, API Gateway.
The reason is as follows:
If the API Gateway starts first, it cannot fetch the address of the Albums microservice from Eureka, because Albums has not registered yet. The gateway only learns the address on a later registry refresh, which would explain the delay of up to a minute.
Changing the start order solved my problem: as soon as the API Gateway starts, I can access Albums without having to wait.
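To picture why starting the gateway first gives a temporary failure rather than a permanent one, here is a minimal sketch. It assumes the gateway fetches the registry once at startup and then at a fixed interval. 30 seconds is the Eureka client default for eureka.client.registry-fetch-interval-seconds, and other caches between Eureka and the load balancer may add to the delay. The class and method names are hypothetical and this is a simplified model, not Eureka's actual code.

```java
/** Simplified model of a gateway that refreshes a cached registry at a fixed interval. */
final class RegistryLagSketch {

    /**
     * Returns the first time (in seconds since the gateway started) at which the gateway
     * can see a service that registered at registeredAt, assuming fetches happen at
     * t = 0, intervalSeconds, 2 * intervalSeconds, and so on.
     */
    static long visibleAt(long registeredAt, long intervalSeconds) {
        if (registeredAt <= 0) {
            return 0; // service was already registered when the gateway did its first fetch
        }
        long fetches = (registeredAt + intervalSeconds - 1) / intervalSeconds; // ceiling division
        return fetches * intervalSeconds;
    }

    public static void main(String[] args) {
        // Albums registers 5 s after the gateway starts: visible only at the next fetch.
        System.out.println(visibleAt(5, 30));  // 30
        // Albums was already up before the gateway started: visible immediately.
        System.out.println(visibleAt(0, 30));  // 0
    }
}
```

Under this model, starting Albums before the gateway corresponds to the second case, which matches the behaviour described above.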

zuul (without eureka) - always ends up in "Forwarding error"

I have configured zuul with 2 instances using ribbon (without eureka) as below:
zuul.retryable=true
zuul.routes.simple-ms-app.serviceId: client
client.ribbon.listOfServers=http://localhost:7788,http://localhost:8877
When both the instances 7788 & 8877 are up and running, everything goes fine.
When the first instance in the listOfServers is down, then the request ends up in the below error:
com.netflix.zuul.exception.ZuulException: Forwarding error
I am using the below version configuration:
spring-boot : 2.0.7.RELEASE
spring-cloud: Finchley.SR2
If anyone has faced a similar issue and found a solution, please share it here.
Thank you.
By default, Zuul throws an exception (instead of returning 503/404) when the upstream service is not available. This behavior is discussed in detail in the GitHub thread "Zuul swallows 503 exceptions from upstream microservices".
To handle this case and have Zuul retry on the current and next available instances, you need to do two things:
Extend ErrorFilter and handle the exception with custom behavior
Configure retry for Zuul
Extend ErrorFilter and provide custom logic to return a 404 or 503 status code. Some approaches to handling this exception are explained in the SO thread "Customizing Zuul Exception".
Retry in Zuul can be configured using following application properties:
zuul:
  retryable: true
ribbon:
  MaxAutoRetries: 1
  MaxAutoRetriesNextServer: 3
  OkToRetryOnAllOperations: true
yourApplication:
  ribbon:
    listOfServers: instance-1-url, instance-2-url
Please note that Spring retry is a dependency for retry in Zuul.
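To see how MaxAutoRetries and MaxAutoRetriesNextServer combine, here is a plain-Java sketch of the retry order. It is a simplified model and not Ribbon's implementation. The class and method names are hypothetical, and it assumes servers are picked round-robin and that each server is tried once plus MaxAutoRetries times before moving to the next one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Plain-Java model of same-server and next-server retries. */
final class RetrySketch {

    /** Upper bound on attempts: (1 + same-server retries) for the first server and for each next server. */
    static int maxAttempts(int maxAutoRetries, int maxAutoRetriesNextServer) {
        return (maxAutoRetries + 1) * (maxAutoRetriesNextServer + 1);
    }

    /**
     * Returns the servers attempted, in order, until one responds or the retry budget runs out.
     * isUp decides whether a call to the given server succeeds.
     */
    static List<String> attempts(List<String> servers, Predicate<String> isUp,
                                 int maxAutoRetries, int maxAutoRetriesNextServer) {
        List<String> tried = new ArrayList<>();
        for (int s = 0; s <= maxAutoRetriesNextServer; s++) {
            String server = servers.get(s % servers.size()); // round-robin choice (assumption)
            for (int r = 0; r <= maxAutoRetries; r++) {
                tried.add(server);
                if (isUp.test(server)) {
                    return tried; // success, stop retrying
                }
            }
        }
        return tried; // budget exhausted, the caller would see the forwarding error
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("instance-1-url", "instance-2-url");
        // instance-1-url is down, instance-2-url is up
        System.out.println(attempts(servers, s -> s.equals("instance-2-url"), 1, 3));
        // [instance-1-url, instance-1-url, instance-2-url]
        System.out.println(maxAttempts(1, 3)); // 8
    }
}
```

With the values above (MaxAutoRetries: 1, MaxAutoRetriesNextServer: 3), a request can be attempted up to 8 times before it fails, so the timeouts need to leave room for that.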

feign.RetryableException: Read timed out executing GET

I have the following architecture in my project:
My UI service (port 8080) makes a Feign call to the gateway service (port 8085).
The GET call on the UI service is "http://localhost:8080/invoice-list?startDate=2018-08-05&endDate=2018-10-05"
The corresponding call on the gateway service is "http://localhost:8085/invoice-download-service/invoice-list?startDate=2018-08-05&endDate=2018-10-05"
When I make this GET call from the UI service, I get the error below within a minute:
is feign.RetryableException: Read timed out executing GET http://localhost:8085/invoice-download-service/invoice-list?startDate=2018-08-05&endDate=2018-10-05] with root cause
java.net.SocketTimeoutException: Read timed out
But when I make the call directly from the gateway server to the microservice, I don't get the error.
Application.properties file of Gateway service
hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds=160000000
ribbon.OkToRetryOnAllOperations=true
ribbon.ReadTimeout=5000000
ribbon.ConnectTimeout=5000000
ribbon.MaxAutoRetries=3
ribbon.MaxAutoRetriesNextServer=3
zuul.host.socket-timeout-millis= 5000000
zuul.host.connect-timeout-millis= 5000000
Here I have set the ReadTimeout and ConnectTimeout properties to very large values (5000000 ms is about 83 minutes), so I do not get the error at the gateway.
Application.properties file of UI service
spring.application.name=external-ui-service
server.port=8080
The UI service has no timeout properties. I tried the properties above there, but they had no effect.
That makes sense, since the UI service does not use Ribbon, Zuul, etc. It only makes a Feign call to the gateway.
So what should I do to increase the timeout in the UI service?
I added the properties below to the UI service's application.properties file:
feign.client.config.default.connectTimeout: 160000000
feign.client.config.default.readTimeout: 160000000
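These Feign timeouts are given in milliseconds, so it is worth checking what such large numbers mean. A small sketch using java.time.Duration (the class and method names are hypothetical) converts the values used in this thread:

```java
import java.time.Duration;

/** Converts millisecond timeout values into hours and minutes for a sanity check. */
final class TimeoutSketch {

    /** Formats a millisecond value as whole hours and remaining whole minutes. */
    static String describe(long millis) {
        Duration d = Duration.ofMillis(millis);
        long hours = d.toHours();
        long minutes = d.toMinutes() % 60;
        return hours + " h " + minutes + " min";
    }

    public static void main(String[] args) {
        System.out.println(describe(5_000_000L));   // gateway ribbon.ReadTimeout: 1 h 23 min
        System.out.println(describe(160_000_000L)); // Feign readTimeout above: 44 h 26 min
    }
}
```

A read timeout of more than 44 hours effectively disables the timeout. A value slightly above the slowest expected response time is usually a safer choice.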
This issue might also be caused by the default load balancer implementation of Spring Cloud Gateway if you use a Eureka server and run your microservices outside Docker on Windows. The services run on localhost, but Eureka tells the gateway's load balancer to route requests to host.docker.internal.
The links below describe a couple of solutions:
https://localcoder.org/spring-boot-cloud-eurka-windows-10-eurkea-returns-host-docker-internal-for-clien
https://dimitr.im/fix-eureka-localhost

Spring Boot actuator health issue with Consul

We are running Consul in an OpenShift cluster. All services are built with Spring Boot/Spring Cloud and register successfully in Consul. Each exposes a health endpoint through Spring Boot Actuator. The endpoint works fine when we hit it with curl, but sometimes we get only an HTTP 200 status code and no response body. This causes Consul to log the error below frequently, which in turn causes problems discovering the service.
Any suggestions would be a great help.
2016/08/05 05:57:15 [WARN] agent: http request failed 'http://10.1.0.18:9080/health': Get http://10.1.0.18:9080/health: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
I found this after a long time. My solution was to increase the timeouts for the probes. I am not sure this still helps after 2 years, but it is worth a shot.

ZUUL Forwarding Error when invoking the service

I have EUREKA, ZUUL, and a sample ATOMIC service deployed to WAS Liberty Profile.
When I hit the proxy URL for the ATOMIC service, I get a forwarding error:
com.netflix.zuul.exception.ZuulException: Forwarding error
at org.springframework.cloud.netflix.zuul.filters.route.RibbonRoutingFilter.forward (RibbonRoutingFilter.java:145)
Caused by: java.lang.RuntimeException: SRV.8.2: RequestWrapper objects must extend ServletRequestWrapper or HttpServletRequestWrapper
at com.ibm.wsspi.webcontainer.util.ServletUtil.unwrapRequest(ServletUtil.java:91)
at com.ibm.wsspi.webcontainer.util.ServletUtil.unwrapRequest(ServletUtil.java:63)
The complete stack trace is at:
https://github.com/bsridhar123/ZUUL/blob/master/logs/PROXY_ATOMIC.txt
I have shared the code for these services on GitHub.
--For ZUUL Reverse Proxy Service
https://github.com/bsridhar123/ZUUL
Can someone please help me with this?

Resources