Spring Cloud Eureka Server self preservation and renew threshold

(It is basically two Eureka servers and three Eureka client microservices.)
I want to remove the following message:
EMERGENCY! EUREKA MAY BE INCORRECTLY CLAIMING INSTANCES ARE UP WHEN THEY'RE NOT. RENEWALS ARE LESSER THAN THRESHOLD AND HENCE THE INSTANCES ARE NOT BEING EXPIRED JUST TO BE SAFE.
Eureka Server1:
spring.application.name=ms-service-discovery-1
server.port=8761
eureka.client.register-with-eureka=false
eureka.client.fetch-registry=false
eureka.server.enable-self-preservation=true
eureka.instance.lease-renewal-interval-in-seconds=1
eureka.server.eviction-interval-timer-in-ms=1000
eureka.server.wait-time-in-ms-when-sync-empty=1000
eureka.server.response-cache-update-interval-ms=1000
Eureka Server2:
spring.application.name=ms-service-discovery-2
server.port=8761
eureka.client.register-with-eureka=false
eureka.client.fetch-registry=false
eureka.server.enable-self-preservation=true
eureka.instance.lease-renewal-interval-in-seconds=1
eureka.server.eviction-interval-timer-in-ms=1000
eureka.server.wait-time-in-ms-when-sync-empty=1000

It's due to Eureka's self-preservation mode. A Eureka server stops evicting instances when the number of heartbeat renewals drops below the expected threshold, and the warning in your Eureka servers shows that this has happened.
Try adjusting the property below. The default value is 85%; first, try lowering it, for example to 0.50:
eureka.server.renewal-percent-threshold=0.85
Alternatively, you can disable self-preservation mode with the below property.
eureka.server.enable-self-preservation=false
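To make the threshold concrete: the server expects two heartbeats per minute from every registered instance (one every 30 seconds by default). With three clients registered, the expected renewal rate is 3 × 2 = 6 per minute, and the self-preservation threshold is 6 × 0.85 ≈ 5. If one client dies without deregistering, renewals drop to 4 per minute, the threshold is breached, and the server stops evicting anything, which is exactly the warning you see.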
Note
Basically, every Eureka client unregisters itself when it shuts down, and if it unregisters successfully the problem above will not happen.
Unfortunately, a Spring Cloud based Eureka client sometimes fails to unregister itself on shutdown, and the symptom differs by Spring Cloud version: the Dalston and Edgware releases unregister themselves well in most cases, but the Finchley release does not seem to unregister itself at the moment.
Also, once you run MANY instances in your Eureka environment, the message above will disappear, because one or two instances shutting down will not push renewals below the threshold.
You can find more information about self-preservation mode in the Eureka documentation.

Related

How to restart a Kubernetes pod when logs show a RabbitMQ connectivity issue

I have a standalone Spring Boot 2 application (not a REST service) that connects to RabbitMQ and processes messages. The application is deployed in Kubernetes. It works great, but when RabbitMQ stays down a little longer, I see a heartbeat exception (60 s) in the logs and the connection eventually drops, even if RabbitMQ comes back up after some time:
Automatic retry connection to broker by spring-rabbitmq
https://www.rabbitmq.com/heartbeats.html
I tried to manage the issue by increasing the number of retries (https://stackoverflow.com/questions/45385119/how-configure-timeouts-retries-or-max-attempts-in-differents-queues-with-spring), but the issue still occurs after the retries are exhausted.
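(For reference, these are the kinds of listener retry settings the linked question deals with; a minimal application.yml sketch with illustrative values:)

spring:
  rabbitmq:
    listener:
      simple:
        retry:
          enabled: true          # retry failed message deliveries
          max-attempts: 5        # illustrative: give up after 5 attempts
          initial-interval: 2000 # illustrative: first retry after 2 s
          multiplier: 2.0        # illustrative: double the interval each time
          max-interval: 30000    # illustrative: cap the interval at 30 s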
How can I restart (or delete and recreate) the pod from Kubernetes when this issue appears in the logs?
The easiest way is to use Actuator, which has an /actuator/health endpoint. (Note that recent versions also add /actuator/health/liveness and /actuator/health/readiness.)
You can point the livenessProbe of your Kubernetes pod at that endpoint, and the pod will then be restarted automatically when necessary. You can also parameterize when your app reports itself as down, if necessary. A sketch follows the links below.
See the docs:
Kubernetes liveness probe
Spring actuator health
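A minimal sketch of wiring the two together, assuming a Spring Boot 2.3+ app listening on port 8080 (container name, image, and timing values are illustrative):

# Kubernetes Deployment fragment (illustrative names and values)
containers:
  - name: my-app
    image: my-app:latest
    ports:
      - containerPort: 8080
    livenessProbe:
      httpGet:
        path: /actuator/health/liveness   # requires Boot 2.3+ probe support
        port: 8080
      initialDelaySeconds: 30   # give the app time to start
      periodSeconds: 10         # probe every 10 seconds
      failureThreshold: 3       # restart after 3 consecutive failures

Note that outside of Kubernetes these probe groups typically need management.endpoint.health.probes.enabled=true; when running on Kubernetes, recent Boot versions enable them automatically.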

Spring Boot Eureka - Faster offline detection

I am using Spring Boot with Eureka and it works really well. But for the last few hours I have been trying to detect offline Eureka instances/clients more quickly, and I have found no good documentation about Eureka's configuration properties. I'm not even sure whether it's possible, because Eureka seems to presume that clients send their updates every 30 seconds.
I started by deactivating self-preservation mode, increasing the speed of renewals and interval updates, and lowering the expiration durations, but my Eureka server still needs two minutes to discover its offline clients.
After changing the renewal percent threshold, the Eureka server stopped removing offline clients altogether.
Is there any way to detect offline Eureka clients more quickly?
Server configuration:
eureka:
  client:
    registerWithEureka: false
    fetchRegistry: false
  server:
    enableSelfPreservation: false
    eviction-interval-timer-in-ms: 10000
    response-cache-update-interval-ms: 5000
Client configuration:
eureka:
  client:
    serviceUrl:
      defaultZone: ${EUREKA_URI:http://localhost:8761/eureka}
    healthcheck:
      enabled: true
  instance:
    lease-renewal-interval-in-seconds: 5
    lease-expiration-duration-in-seconds: 15
Edit: even the health check URL is not called more often; it is still called every 30 seconds.
Did you check the Ribbon configuration? Ribbon can cache the server list upfront during startup and keep using it when the Eureka server is down. Please check whether Ribbon is enabled in the Eureka client app.
The correct option to change the schedule was:
eureka.client.instance-info-replication-interval-seconds
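For reference, a client-side application.yml sketch combining that option with the registry fetch interval (the values are illustrative, and registry-fetch-interval-seconds is an assumption based on the standard Eureka client properties):

eureka:
  client:
    # how often local status changes are pushed to the server (illustrative value)
    instance-info-replication-interval-seconds: 5
    # how often the client refreshes its local registry copy (illustrative value)
    registry-fetch-interval-seconds: 5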

How to delay Eureka client registration with Eureka Server?

I have a Spring Boot application which is also a Eureka client. The normal behavior of the application is to register with the Eureka server as UP on startup. I have a requirement that the application shouldn't register with the Eureka server until smoke testing is completed during deployment.
Is there a way to delay the registration with Eureka Server or register as OUT_OF_SERVICE with some type of configuration changes? I am aware of the Eureka REST endpoints to register, unregister, and change status.
Setting eureka.instance.initial-status=OUT_OF_SERVICE will register the service with that status.
I had a similar case where I needed my service to perform some preprocessing before indicating that it was available. I did this by implementing a custom HealthIndicator that starts in the OUT_OF_SERVICE state and transitions to UP once all of the preprocessing has completed. This always seemed like a hack to me, but it works. Hopefully, Spencer can provide better guidance since he is an author in the Spring ecosystem.
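A minimal sketch of that approach (class and method names are illustrative, not from the original post):

// Reports OUT_OF_SERVICE until preprocessing finishes, then UP.
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;
import java.util.concurrent.atomic.AtomicBoolean;

@Component
public class StartupGateHealthIndicator implements HealthIndicator {

    private final AtomicBoolean ready = new AtomicBoolean(false);

    // Call this once smoke tests / preprocessing have finished.
    public void markReady() {
        ready.set(true);
    }

    @Override
    public Health health() {
        return ready.get() ? Health.up().build()
                           : Health.outOfService().build();
    }
}

For Eureka to propagate this status, the client health check has to be enabled (eureka.client.healthcheck.enabled=true), as in the client configuration shown in the previous question.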
Are you looking for:
eureka.client.initialInstanceInfoReplicationIntervalSeconds=<some N seconds>
This property specifies the initial delay before the client sends its health status (i.e. UP, etc.).
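Putting the two configuration-only suggestions together, a minimal application.properties sketch (the delay value is illustrative):

# register as OUT_OF_SERVICE instead of UP on startup
eureka.instance.initial-status=OUT_OF_SERVICE
# delay the first instance-info replication (illustrative value)
eureka.client.initial-instance-info-replication-interval-seconds=120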

Eureka First Discovery & Config Client Retry with Docker Compose

We've three Spring Boot applications:
Eureka Service
Config Server
Simple Web Service making use of Eureka and Config Server
I've set up the services to use Eureka First Discovery, i.e. the simple web application finds out about the config server from the Eureka service.
When started separately (either locally or as individual Docker images), everything is OK: the config server is started once the discovery service is running, and the simple web service is started once the config server is running.
When docker-compose is used to start the services, they obviously start at the same time and essentially race to get up and running. This isn't blocking, as we've added failFast: true and retry values to the simple web service, and the Docker container restarts, so the simple web service eventually comes up at a time when the discovery service and config server are both running. But it doesn't feel optimal.
The unexpected behaviour we noticed was the following:
The simple web service reattempts a number of times to connect to the discovery service. This is sensible and expected
At the same time, the simple web service attempts to contact the config server. Because it cannot contact the discovery service, it falls back to retrying a config server on localhost, e.g. the logs show retries going to http://localhost:8888. This wasn't expected.
The simple web service does eventually connect to the discovery service, but the logs show it still tries to reach the config server at http://localhost:8888. Again, this wasn't ideal.
Three questions/observations:
Is it a sensible strategy for the config client to fall back to trying localhost:8888 when it has been configured to use discovery to find the config server?
When the Eureka connection is established, shouldn't the retry mechanism switch to the config server endpoint indicated by Eureka? Essentially, putting higher/longer retry intervals and periods on the config server connection is pointless in this case, as it's never going to connect while it's looking at localhost, so we're better off just failing fast.
Are there any properties that can override this behaviour?
I've created a sample github repo that demonstrates this behaviour:
https://github.com/KramKroc/eurekafirstdiscovery/tree/master
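For context, the Eureka First setup being described lives in the client's bootstrap.yml and looks roughly like this (the service id and retry values are illustrative):

spring:
  cloud:
    config:
      fail-fast: true                # abort startup if config cannot be fetched
      discovery:
        enabled: true                # locate the config server via Eureka
        service-id: configserver     # illustrative Eureka service id
      retry:
        max-attempts: 20             # illustrative retry budget
        initial-interval: 2000       # illustrative first retry delay (ms)

Note that the retry properties only take effect when spring-retry and spring-boot-starter-aop are on the classpath.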

Spring Cloud Turbine - Unable to handle multiple clients?

I'm having a bit of trouble getting Turbine to work in Spring Cloud. In a nutshell, I can't determine how to configure it to aggregate circuits from more than one application at a time.
I have six separate services, a Eureka server, and a Turbine server running in standalone mode. I can see from my Eureka server that all of the services are registered, including Turbine. My Turbine server is up and running, and I can see its /hystrix page without issue. But when I use it to examine turbine.stream, I only see the FIRST server listed in turbine.appConfig; the rest are ignored.
This is my Turbine server’s application.yml, or at least the relevant parts:
---
eureka:
  client:
    serviceUrl:
      defaultZone: http://localhost:8010/eureka/
server:
  port: 8030
info:
  component: Turbine
turbine:
  clusterNameExpression: new String("default")
  appConfig: sentence,subject,verb,article,adjective,noun
management:
  port: 8990
When I run this and access the Hystrix dashboard on my Turbine instance, asking for turbine.stream, the ONLY circuit breakers listed in the output are for the first service listed in appConfig, the "sentence" service in this case. Curiously, if I rearrange the order of these services and put another one first (like "noun"), I see only the circuits for THAT service. Only the first service in the list is displayed.
I'll admit to being a little confused by some of the terminology, like streams, clusters, etc., so I could be missing some basic concept here, but my understanding is that Turbine can digest streams from more than one service and aggregate them in a single display. Suggestions would be appreciated.
I don't have enough reputation to comment, so I have to write this in an answer :)
I had exactly the same problem:
There are two services, "test-service" and "other-service", each with its own working hystrix-stream,
and there is one Turbine application, which is configured like this:
turbine:
  clusterNameExpression: new String("default")
  appConfig: test-service,other-service
All of my services are running on my local machine.
The result: my Hystrix dashboard shows only the metrics from "test-service".
Reason:
It seems that a Turbine client configured this way doesn't handle multiple services when they are running on the same host.
This is explained here:
https://github.com/Netflix/Hystrix/issues/117#issuecomment-14262713
Turbine maintains state of all these instances in order to maintain persistent connections to them and it does rely on the "hostname" and if the host name is the same then it won't instantiate a new connection to that same server (on a different port).
So the main point is that all of your services must be registered with different hostnames. How to do this on your local machine is described below.
UPDATE 2015-06-12/2016-01-23: Workaround for local testing
Change your hosts file:
# ...
127.0.0.1 localhost
127.0.0.1 localdomain1
127.0.0.1 localdomain2
# ...
127.0.0.1 localdomainx
Then set the hostname for each of your clients to a different domain entry, like this:
application.yml:
eureka:
  instance:
    hostname: localdomainx
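With each client registered under its own hostname (localdomain1, localdomain2, ...), Turbine sees distinct hosts and maintains a separate connection to each one, so the dashboard aggregates the circuits of all configured services even on a single machine.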
