Eureka client sometimes registers with wrong host name - cluster-computing

I have a question about Eureka similar to this question, but the solutions to that issue were of no help at all. See the similar issue here:
Another similar issue
Well, in my case, I'm trying to build a graceful-shutdown module based on Eureka: pull a service down in Eureka before actually shutting it down, to ensure there is no load-balancing exception when the application is closed.
I have tested setting eureka.instance.preferIpAddress to both false and true.
While eureka.instance.preferIpAddress=false, Ribbon will not recognize applications registered with the machine name and will throw a no-load-balancer exception.
While eureka.instance.preferIpAddress=true, Ribbon will recognize applications registered with the machine name and everything works; that is, Ribbon can get the real IP address of those applications.
Here is my case: I need to figure out not only why, in both situations, the instanceId of applications in Eureka is still shown with the machine name, but also why the same application can end up with a different instanceId even after a simple restart!
Here is what I observed:
The server IP is 192.168.24.201, with the hosts file setting its name to localhost.
Restarting the same application several times, it can be seen that the instanceId of this application sometimes flips between localhost:applicationName:8005 and 192.168.24.201:applicationName:8005.
Both instanceIds carry the same IP address, so neither of them leads to a load-balancing exception; it only makes manual control of the Eureka server more difficult, which is still acceptable.
The biggest problem is that sometimes the instanceId of a different server will also be localhost:applicationName:8005, and that leads to conflicts! Restarting the application sometimes fixes it, but not every time. So if I'm using Eureka as a cluster of several servers, I cannot ensure my application is correctly registered in Eureka!
Here is the eureka client setting of application8005:
eureka:
  instance:
    lease-renewal-interval-in-seconds: ${my-config.eureka.instance.heartbeatInterval:5}
    lease-expiration-duration-in-seconds: ${my-config.eureka.instance.deadInterval:15}
    preferIpAddress: true
  client:
    service-url:
      defaultZone: http://192.168.24.201:8008/eureka/
    registry-fetch-interval-seconds: ${my-config.eureka.client.fetchRegistryInterval:20}
Here is the eureka server setting of EurekaServer:
eureka:
  server:
    eviction-interval-timer-in-ms: ${my-config.eureka.server.refreshInterval:5000}
    enable-self-preservation: false
    responseCacheUpdateIntervalMs: 5000
I don't know why applications' instanceIds sometimes start with localhost instead of the IP address.

The problem was solved by using prefer-ip-address: true and instance-id: ${spring.cloud.client.ip-address}:${spring.application.name}:${server.port}:${spring.cloud.nacos.config.group}
I have made it a rule that each server runs only one instance of the same app.
In this way each instance gets its own unique id.
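For reference, in application.yml form the fix above looks like this (the last placeholder comes from the asker's Nacos setup and is specific to it):
eureka:
  instance:
    prefer-ip-address: true
    instance-id: ${spring.cloud.client.ip-address}:${spring.application.name}:${server.port}:${spring.cloud.nacos.config.group}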

Related

Spring Data Couchbase - Connection problem with single server

I am getting started with Spring Boot and Spring Data Couchbase and I am having problems connecting to my Couchbase server.
I'm using IntelliJ and I used the Spring Initializr to create my project.
Here's my configuration (I am using Kotlin):
import java.util.Collections
import org.springframework.context.annotation.Configuration
import org.springframework.data.couchbase.config.AbstractCouchbaseConfiguration

@Configuration
class Config : AbstractCouchbaseConfiguration() {
    // Connect directly to the Couchbase node at 10.0.0.10 and use the "cwp" bucket
    override fun getBootstrapHosts(): List<String> = Collections.singletonList("10.0.0.10")
    override fun getBucketName(): String = "cwp"
    override fun getBucketPassword(): String = "password"
}
But instead of "just connecting" to the given IP, there seems to be some reverse DNS resolution in place which resolves to wrong IPs (due to the routers and VPN), so I am getting the following errors:
[CWSRV.fritz.box:8091][ConfigEndpoint]: Socket connect took longer than specified timeout: connection timed out: CWSRV.fritz.box/10.0.0.112:8091
The name of my server is CWSRV and I am using a VPN between my routers (Fritzboxes).
To avoid such problems I want to use just the IP, without any DNS resolution getting in the way.
Any help would be appreciated!
I figured it out myself:
It seems that the Java SDK does a reverse DNS lookup when it is given an IP address. Since I had no reverse zone created in my DNS server, the lookup was answered by the router on the server side, which returned cwsrv.fritz.box. That name resolved to 10.0.0.112 (instead of 10.0.0.10; my server could have been assigned that IP by the router at some point in the past), and no Couchbase server responded there.
I created an entry for the server in my DNS and it works now.
Resolution: since the Couchbase (Java) SDK seems to rely on properly configured DNS, make sure that forward and reverse lookups work as expected! :)
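If you want to see what the JVM's reverse lookup returns for an address (roughly what the SDK ends up doing), a quick Kotlin check like this can help; the IP is the one from the question:
import java.net.InetAddress

fun main() {
    val addr = InetAddress.getByName("10.0.0.10")
    // canonicalHostName triggers a reverse DNS lookup; if this prints a router
    // name like cwsrv.fritz.box, the reverse zone is not set up as expected.
    println(addr.canonicalHostName)
}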

Syncing Spring Boot Contexts Startup

Our application exposes two ports, 8080 and 8081. This creates two contexts in the same application, which causes problems such as the 8081 port responding to requests while the 8080 port is not yet ready. I want to know whether there is a smart way to sync those ports so that I can rely on the application having successfully started regardless of whether 8080 or 8081 responds. In some situations I want to respond OK to a ping request only once my cache has loaded correctly.
We solved this problem by listening for ApplicationReadyEvent: each of the endpoints that should not yet answer responds with a 4xx status code instead of 200 until the event is received.
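A minimal sketch of that approach (the class and endpoint names here are made up for illustration, not from our actual code):
import java.util.concurrent.atomic.AtomicBoolean
import org.springframework.boot.context.event.ApplicationReadyEvent
import org.springframework.context.event.EventListener
import org.springframework.http.ResponseEntity
import org.springframework.stereotype.Component
import org.springframework.web.bind.annotation.GetMapping
import org.springframework.web.bind.annotation.RestController

@Component
class ReadinessFlag {
    val ready = AtomicBoolean(false)

    // Fires once the application context is fully started (e.g. after the cache
    // has been loaded during startup).
    @EventListener
    fun onReady(event: ApplicationReadyEvent) = ready.set(true)
}

@RestController
class PingController(private val flag: ReadinessFlag) {
    @GetMapping("/ping")
    fun ping(): ResponseEntity<String> =
        if (flag.ready.get()) ResponseEntity.ok("OK")
        // Any 4xx code works here, as described above; 425 is just one option.
        else ResponseEntity.status(425).body("not ready")
}
Until ApplicationReadyEvent fires, /ping answers with the 4xx status, so health checks treat the instance as not yet up.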

Eureka slow to remove instances

I am doing a POC with Eureka. When I shut down a service instance, it currently takes about 3 minutes for it to stop showing in the Eureka console. I am presuming (but not confident) this also means the downed instance can still be discovered by clients?
With debugging on, I can see the server running the evict task several times before it determines that the lease has expired on the instance I shut down.
My client settings are:
eureka.client.serviceUrl.defaultZone=http://localhost:8761/eureka/
eureka.instance.statusPageUrlPath=${management.context-path}/info
eureka.instance.healthCheckUrlPath=${management.context-path}/health
eureka.instance.leaseRenewalIntervalInSeconds=5
eureka.client.healthcheck.enabled=true
eureka.client.lease.duration=2
eureka.client.leaseExpirationDurationInSeconds=5
logging.level.com.netflix.eureka=DEBUG
logging.level.com.netflix.discovery=DEBUG
Server:
server.port=8761
eureka.client.register-with-eureka=false
eureka.client.fetch-registry=false
logging.level.com.netflix.eureka=DEBUG
logging.level.com.netflix.discovery=DEBUG
eureka.server.enableSelfPreservation=false
I have also tried these settings on server:
eureka.server.response-cache-update-interval-ms: 500
eureka.server.eviction-interval-timer-in-ms: 500
These seem to increase the frequency of checking but do not decrease the time it takes for the server to recognize that the instance is down.
Am I missing a setting? Is there a best practice for shutting down instances in production so that this is near-instantaneous?
Thanks!

Understanding Spring Cloud Eureka Server self preservation and renew threshold

I am new to developing microservices, although I have been researching the topic for a while, reading both Spring's docs and Netflix's.
I have started a simple project available on GitHub. It is basically a Eureka server (Archimedes) and three Eureka client microservices (one public API and two private). Check the GitHub readme for a detailed description.
The point is that, when everything is running, I would like the Eureka server to realize when one of the private microservices is killed and remove it from the registry.
I found this question on Stack Overflow, and the solution involves setting enableSelfPreservation: false in the Eureka server config. After doing this, the killed service disappears after a while, as expected.
However I can see the following message:
THE SELF PRESERVATION MODE IS TURNED OFF.THIS MAY NOT PROTECT INSTANCE
EXPIRY IN CASE OF NETWORK/OTHER PROBLEMS.
1. What is the purpose of the self preservation? The doc states that with self preservation on "clients can get the instances that do not exist anymore". So when is it advisable to have it on/off?
Furthermore, when self preservation is on, you may see the following prominent warning in the Eureka Server console:
EMERGENCY! EUREKA MAY BE INCORRECTLY CLAIMING INSTANCES ARE UP WHEN
THEY'RE NOT. RENEWALS ARE LESSER THAN THRESHOLD AND HENCE THE
INSTANCES ARE NOT BEING EXPIRED JUST TO BE SAFE.
Now, moving on to the Spring Eureka console:
Lease expiration enabled true/false
Renews threshold 5
Renews (last min) 4
I have come across a weird behaviour of the threshold count: when I start the Eureka Server alone, the threshold is 1.
2. I have a single Eureka server and it is configured with registerWithEureka: false to prevent it from registering on another server. Then why does it show up in the threshold count?
3. For every client I start the threshold count increases by +2. I guess it is because they send 2 renew messages per min, am I right?
4. The Eureka server never sends a renew, so the renews in the last minute are always below the threshold. Is this normal?
renews threshold: 5
renews (last min): (client1) 2 + (client2) 2 -> 4
Server cfg:
server:
  port: ${PORT:8761}
eureka:
  instance:
    hostname: localhost
  client:
    registerWithEureka: false
    fetchRegistry: false
    serviceUrl:
      defaultZone: http://${eureka.instance.hostname}:${server.port}/eureka/
  server:
    enableSelfPreservation: false
    # waitTimeInMsWhenSyncEmpty: 0
Client 1 cfg:
spring:
  application:
    name: random-image-microservice
server:
  port: 9999
eureka:
  client:
    serviceUrl:
      defaultZone: http://localhost:8761/eureka/
    healthcheck:
      enabled: true
I had the same question as @codependent, googled a lot and did some experiments; here is what I learned about how the Eureka server and instances work.
Every instance needs to renew its lease with the Eureka server once every 30 seconds by default, which can be configured via eureka.instance.leaseRenewalIntervalInSeconds.
Renews (last min): how many renews the Eureka server received from instances in the last minute.
Renews threshold: the number of renews per minute that the Eureka server expects to receive from instances.
For example, if registerWithEureka is set to false, eureka.instance.leaseRenewalIntervalInSeconds is set to 30 and you run 2 Eureka client instances, the two instances send 4 renews to the Eureka server per minute. The Eureka server's minimal threshold is 1 (written in the code), so the threshold is 5 (this number is multiplied by the factor eureka.server.renewalPercentThreshold, discussed later).
SELF PRESERVATION MODE: if Renews (last min) is less than Renews threshold, self-preservation mode is activated.
So in the example above, SELF PRESERVATION MODE is activated, because the threshold is 5 but the Eureka server can only receive 4 renews/min.
Question 1:
SELF PRESERVATION MODE is designed to cope with poor network connectivity. Suppose connectivity between Eureka instances A and B is good, but B fails to renew its lease with the Eureka server for a short period because of connectivity hiccups; the Eureka server can't simply kick instance B out. If it did, instance A would no longer get B from the registry even though B is actually available. So this is the purpose of SELF PRESERVATION MODE, and it's better to turn it on.
Question 2:
The minimal threshold of 1 is written in the code. Since registerWithEureka is set to false, no Eureka instance registers itself, so the threshold stays at 1.
In a production environment we generally deploy two Eureka servers and registerWithEureka is set to true. So the threshold will be 2, the Eureka servers will renew their leases twice a minute, and RENEWALS ARE LESSER THAN THRESHOLD won't be a problem.
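As a rough sketch of that two-server setup (the peer host name below is a placeholder, not something from the question), each server registers with the other, e.g. on the first server:
eureka:
  client:
    registerWithEureka: true
    fetchRegistry: true
    serviceUrl:
      defaultZone: http://peer2:8761/eureka/
and the second server's defaultZone points back at the first.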
Question 3:
Yes, you are right. eureka.instance.leaseRenewalIntervalInSeconds determines how many renews are sent to the server per minute (30 seconds means 2 per minute), and the expected total is multiplied by the factor eureka.server.renewalPercentThreshold mentioned above, whose default value is 0.85.
Question 4:
Yes, it's normal, because the threshold's initial value is set to 1. So if registerWithEureka is set to false, the renews will always be below the threshold.
I have two suggestions for this:
Deploy two Eureka servers and enable registerWithEureka.
If you just want a demo/dev environment, you can set eureka.server.renewalPercentThreshold to 0.49, so that when you start up a single Eureka server alone, the threshold will be 0 (see the snippet below).
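In the YAML style used for the server config above, that would look roughly like:
eureka:
  server:
    renewalPercentThreshold: 0.49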
I've written a blog post with the details of Eureka here, which fills in some detail missing from the Spring docs and the Netflix blog. It is the result of several days of debugging and digging through source code. I understand it's preferable to copy-paste rather than link to an external URL, but the content is too big for an SO answer.
You can try setting the renewal threshold limit in your Eureka server properties. If you have around 3 to 4 microservices to register with Eureka, you can set it like this:
eureka.server.renewalPercentThreshold=0.33
server:
  enableSelfPreservation: false
With self-preservation enabled (the default), Eureka expects service instances to register themselves and to keep sending registration renewal requests every 30 s. Normally, if Eureka doesn't receive a renewal from a service for three renewal periods (90 s), it deregisters that instance. But when renewals drop below the expected threshold, Eureka assumes there's a network problem, enters self-preservation mode, and won't deregister service instances. Setting enableSelfPreservation to false turns that protection off, so instances are expired regardless.
Even if you decide to disable self-preservation mode for development, you should leave it enabled when you go into production.

ELB Health Check not checking web instance after booting up

We have a web instance (nginx) behind an ELB which we manually power on when required.
The web app starts up quickly and returns a successful 200 response when we run wget locally.
However the website will not load as the ELB isn't sending healthcheck requests to the instance. I can confirm this by viewing the nginx access logs.
The workaround I've been using is to remove the web instance from the ELB and add it back in.
This seems to activate the health checks again, and they become visible in our access logs.
I've edited our health check settings to allow a longer timeout and raised the Unhealthy Threshold to 3, but this has made no difference.
Currently our Health Check Config is:
Ping Target: HTTPS:443/login
Timeout: 10 sec
Interval: 12 sec
Unhealthy: 2
Healthy: 2
Listener:
HTTPS 443 to HTTPS 443 SSL Cert
The ELB and web instance are both on the same public VPC Security Group which has http/https opened to 0.0.0.0/0
Can anyone help me figure out why the ELB Health checks aren't kicking in as soon as the web instance has started? Is this by design or is there a way of automatically initiating the checks? Thank you.
Niall
Does your instance come up with a different IP address each time you start it?
Elastic Load Balancing registers your load balancer with your EC2 instances using the IP addresses that are associated with your instances. When an instance is stopped and then restarted, the IP address associated with your instance changes. Your load balancer cannot recognize the new IP address, which prevents it from routing traffic to your instances.
— http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/TerminologyandKeyConcepts.html#registerinstance
It would seem like the appropriate approach to getting the instance re-associated would be for code running on the web server instance to programmatically register itself with the load balancer via the API when the startup process determines that the instance is ready for web traffic.
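For a classic ELB, that could be as simple as running a registration call from the instance once it is ready to serve, for example with the AWS CLI (the load balancer name is a placeholder, and i-xxxxxx stands for the instance's own id):
aws elb register-instances-with-load-balancer \
  --load-balancer-name my-load-balancer \
  --instances i-xxxxxx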
Update:
Luke#AWS: "You should be de-registering from your ELB during a stop/start."
— https://forums.aws.amazon.com/thread.jspa?messageID=463835
I'm curious what the console shows as the reason why the instance isn't active in the ELB. There does appear to be some kind of interaction between ELB and EC2 where ELB has some kind of awareness of an instance's EC2 state (e.g. "stopped") that goes beyond just the health checks. This isn't well-documented, but I would speculate that ELB, based on that awareness, decides that it isn't worth bothering with the health checks, and the console may provide something useful to at least confirm this.
It's possible that, given sufficient time, ELB might become aware that the instance is running again and start sending health checks, but it's also possible that instances have a hidden global meta-identifier separate from i-xxxxxx and that a stopped and restarted instance is, from the perspective of this identifier, a different instance.
...but the answer seems to be that stopping an instance and restarting it requires re-registration with the ELB.
