Spring/Netflix Eureka resilience and failover

I have two Eureka servers with peer awareness enabled. Multiple clients connect to these two and the config file looks like this:
eureka:
  client:
    service-url:
      defaultZone: https://server1/eureka/,https://server2/eureka/
Everything works just fine until one of the Eureka servers goes down. If it's server2, the clients start to throw exceptions, but nothing breaks; I can even restart the clients and they boot up just fine. However, if server1 goes down I can no longer restart the clients: they seem to prefer the first Eureka server in the list, and if it's down the client won't start at all, throwing an org.springframework.web.client.ResourceAccessException: I/O error on GET request for ... exception.
Is there any way to make the clients at least try the second server (or third, fourth, etc.) on startup if the first one is down?

Related

rsocket - how to balance the load

RSocket seems like a cool idea, but I have this question and could not find a good answer.
Let's consider this initial setup: the client sends multiple requests to server-1 sequentially.
client --> server-1
server-1 is doing some compute-intensive tasks, so after some time the auto-scaler creates another server instance, server-2. Now the setup looks like this:
client --> server-1
server-2
As I understand it, the client --> server-1 connection is established and kept alive, and this single connection is used for all client-server communication. How can I make use of the other server, server-2, to share some of the client requests, like this:
client --> server-1
|-----> server-2
Otherwise everything is processed sequentially.
I use Spring Boot with RSocket.
There are two options for this:
use the low-level client-side load balancer from rsocket-load-balancer (a sketch of this option follows below)
deploy an RSocket broker between client and server; some options are Netifi Broker or Alibaba RSocket Broker
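With a recent rsocket-java (1.1+) and Spring Framework 5.3+, the first option roughly looks like the sketch below. The hostnames, ports, and bean wiring are illustrative assumptions, not taken from the question:

import java.util.List;

import io.rsocket.loadbalance.LoadbalanceTarget;
import io.rsocket.loadbalance.RoundRobinLoadbalanceStrategy;
import io.rsocket.transport.netty.client.TcpClientTransport;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.messaging.rsocket.RSocketRequester;
import reactor.core.publisher.Flux;

@Configuration
public class RSocketClientConfig {

    @Bean
    public RSocketRequester loadBalancedRequester(RSocketRequester.Builder builder) {
        // Placeholder hosts/ports; in practice this list could be refreshed from a
        // discovery service by emitting new lists on the Flux below.
        List<LoadbalanceTarget> targets = List.of(
                LoadbalanceTarget.from("server-1", TcpClientTransport.create("server-1", 7000)),
                LoadbalanceTarget.from("server-2", TcpClientTransport.create("server-2", 7000)));

        // Requests sent through this requester are spread across the targets
        // instead of all flowing over the single connection to server-1.
        return builder.transports(Flux.just(targets), new RoundRobinLoadbalanceStrategy());
    }
}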

Will spring eureka client send heartbeat to all discovery servers?

Setup: 2 Eureka servers that replicate with each other, and 1 Eureka client whose default zone in application.yml is set to localhost:8761, localhost:8762.
Question:
1. Will the Eureka client send heartbeats to both 8761 and 8762?
Thanks,
Young
No, it will not send the heartbeat to both 8761 and 8762. Here is how it works: we provide the list of Eureka servers, every client picks the first server from the list (in your case 8761) and starts sending heartbeats to it, and clients switch to the other server only if the first Eureka server dies.
The second server will always get a copy of the registry from the first server.
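For reference, a defaultZone listing both servers in application.yml might look like the sketch below (URLs are illustrative); per the answer above, the client works through the list in order and only fails over to the next entry if the current one dies:

eureka:
  client:
    service-url:
      # Clients talk to 8761 first and only switch to 8762 if 8761 becomes unavailable.
      defaultZone: http://localhost:8761/eureka/,http://localhost:8762/eureka/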

TCP connection debug log for Tomcat

One of my services is calling another service. I can't figure out why, but sometimes I get a SocketException: Connection reset. There are multiple layers between these 2 services, so to know for sure what's happening I need to make sure the destination service is not closing the connection. Both services are Tomcat/Spring Boot apps.
How can I make Tomcat log when a TCP connection handshake took place?
I already tried setting org.apache.catalina to DEBUG, but that doesn't seem to work.
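One thing worth trying (an assumption on my part, not a confirmed fix): Tomcat's low-level connection handling lives under org.apache.tomcat.util.net rather than org.apache.catalina, so raising that logger (and the protocol handlers under org.apache.coyote) to DEBUG in a Spring Boot app may surface socket-level events, though it is not guaranteed to log every handshake:

# application.properties - assumes embedded Tomcat in a Spring Boot app
# org.apache.catalina covers the container; the endpoint/socket classes live
# under org.apache.tomcat.util.net, so enable debug logging there as well.
logging.level.org.apache.tomcat.util.net=DEBUG
logging.level.org.apache.coyote=DEBUG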

Kafka server configuration - listeners vs. advertised.listeners

To get Kafka running, you need to set some properties in the config/server.properties file. There are two settings I don't understand.
Can somebody explain the difference between listeners and advertised.listeners property?
The documentation says:
listeners: The address the socket server listens on.
and
advertised.listeners:
Hostname and port the broker will advertise to producers and consumers.
When do I have to use which setting?
listeners is what the broker will use to create server sockets.
advertised.listeners is what clients will use to connect to the brokers.
The two settings can be different if you have a "complex" network setup (with things like public and private subnets and routing in between).
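As a hedged illustration of that split (the hostnames below are made up): the broker can bind its server socket to a private interface while advertising a public name that clients can actually resolve:

# server.properties - illustrative values only
# What the broker binds its server socket to (private interface):
listeners=PLAINTEXT://10.0.0.5:9092
# What is handed back to clients in metadata responses (public hostname):
advertised.listeners=PLAINTEXT://kafka.example.com:9092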
Since I cannot comment yet, I will post this as an "answer", adding on to M.Situations' answer.
Within the same document he links, there is this blurb about which listener is used by a Kafka client (https://cwiki.apache.org/confluence/display/KAFKA/KIP-103%3A+Separation+of+Internal+and+External+traffic):
As stated previously, clients never see listener names and will make metadata requests exactly as before. The difference is that the list of endpoints they get back is restricted to the listener name of the endpoint where they made the request.
This is important: depending on which URL you use in your bootstrap.servers config, that is the URL* the client will get back, as long as it is mapped in advertised.listeners (I don't know what the behavior is if the listener does not exist).
Also note this:
The exception is ZooKeeper-based consumers. These consumers retrieve the broker registration information directly from ZooKeeper and will choose the first listener with PLAINTEXT as the security protocol (the only security protocol they support).
As an example broker config (for all brokers in cluster):
advertised.listeners=EXTERNAL://XXXXX.compute-1.amazonaws.com:9990,INTERNAL://ip-XXXXX.ec2.internal:9993
inter.broker.listener.name=INTERNAL
listener.security.protocol.map=EXTERNAL:SSL,INTERNAL:PLAINTEXT
If the client uses XXXXX.compute-1.amazonaws.com:9990 to connect, the metadata fetch will go to that broker. However, the URL returned for use with the Group Coordinator or Leader could be 123.compute-1.amazonaws.com:9990* (a different machine!). This means the match is done on the listener name, as described by KIP-103, irrespective of the actual URL (node).
Since the protocol map for EXTERNAL is SSL this would force you to use an SSL keystore to connect.
If, on the other hand, you are within AWS, say, you can use ip-XXXXX.ec2.internal:9993, and the corresponding connection will be plaintext as per the protocol map.
This is especially needed in IaaS setups where, in my case, brokers and consumers live on AWS while my producer lives on a client site, thus needing different security protocols and listeners.
EDIT:
Also adding Inbound Rules is much easier now that you have different ports for different clients (brokers, producers, consumers).
EDIT2:
This article is a great in depth guide if the above is still not clear: https://rmoff.net/2018/08/02/kafka-listeners-explained/
There's a lot of confusion and little information in the answers provided here, so I'm posting a more detailed answer for clarity.
listeners - the addresses the Kafka broker's socket server binds to. (The embedded Jetty web server that exposes a REST control plane belongs to Kafka Connect workers, not to the broker itself.) The hostname in this setting can be left empty if you want Kafka to bind to the default interface (Kafka resolves the local hostname by calling the InetAddress.getLocalHost().getCanonicalHostName() Java API).
advertised.listeners: this address is published to ZooKeeper by every Kafka broker. If this setting is not set, the value of listeners is used and published to ZooKeeper; notifying others is the only purpose of this setting. Kafka clients use the advertised.listeners addresses published to ZooKeeper (under /brokers/ids/<id>, in the endpoints field) to talk to the Kafka broker.
Now the question is: why have two settings? Why not a single one? Say your Kafka broker sits behind a proxy, and all Kafka clients have to go through the proxy to reach the broker. In this case we want the broker to bind to a local host and port, but we can't publish that address to ZooKeeper because clients can't use it. So the Kafka admin can set advertised.listeners to the proxy host and port.
Also, on some of our production hosts, InetAddress.getLocalHost().getCanonicalHostName() returned empty, so the listeners hostname was empty, which was fine for binding. But advertised.listeners was published to ZooKeeper as NULL:9092, since it took the same value as listeners by default. All the brokers tried to publish themselves this way, so they got the error java.lang.IllegalArgumentException: requirement failed: Configured end points null:14092 as advertised.listeners as NULL:9092 is already registered by broker 101. The fix was to change the advertised.listeners setting to include a hostname.
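To make the proxy scenario above concrete, here is a hedged sketch of the two settings (the proxy host and ports are invented for illustration):

# server.properties - broker reachable only through a proxy (illustrative)
# Bind locally; clients cannot reach this address directly.
listeners=PLAINTEXT://127.0.0.1:9092
# Publish the proxy's address to ZooKeeper so clients connect via the proxy.
advertised.listeners=PLAINTEXT://proxy.example.com:19092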
Listeners are all the addresses the Kafka broker listens on (it can be more than 1 address) whereas advertised listeners are the addresses other agents (producers, consumers, or brokers) need to connect to if they want to talk to the current broker.
The 2 lists should be the same if all are running on the same machine (can connect using localhost:9092 or 127.0.0.1:9092) but if consumers, producers, or other brokers do not stay on the same machine or Docker instance, they must use different addresses (that's why we have advertised listeners). Two examples:
Say we use Docker to run 2 Kafka instances named kafka and kafka2. kafka2 certainly cannot connect to kafka using localhost:29092; it must use kafka:9092 instead. So for kafka: listener = localhost:29092, advertised listener = kafka:9092.
A producer on the host machine cannot connect to kafka using kafka:9092; it must use localhost:29092 instead.
Let's use the following docker-compose config to understand more about the startup process of a Kafka broker:
# config/docker-compose.yml
kafka:
  image: docker.io/bitnami/kafka:3
  ports:
    - "29092:29092"
    - "9092:9092"
  environment:
    - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
    - ALLOW_PLAINTEXT_LISTENER=yes
    - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=CLIENT:PLAINTEXT,EXTERNAL:PLAINTEXT
    - KAFKA_CFG_LISTENERS=CLIENT://:9092,EXTERNAL://:29092
    - KAFKA_CFG_ADVERTISED_LISTENERS=CLIENT://kafka:9092,EXTERNAL://localhost:29092
    - KAFKA_CFG_INTER_BROKER_LISTENER_NAME=CLIENT
  depends_on:
    - zookeeper
With this config, Docker will start 1 Kafka broker instance which listens on 2 ports:
9092 with name CLIENT
29092 with name EXTERNAL
The broker then connects to Zookeeper at zookeeper:2181 and registers its 2 addresses: kafka:9092 and localhost:29092. Also, with KAFKA_CFG_INTER_BROKER_LISTENER_NAME=CLIENT, it wants Zookeeper to tell other brokers to connect to kafka:9092 if they want to talk to it.
But why do we need 2 ports? Read more here.
References:
My notes while learning Kafka
Kafka Listeners – Explained
From this link: https://cwiki.apache.org/confluence/display/KAFKA/KIP-103%3A+Separation+of+Internal+and+External+traffic
During the 0.9.0.0 release cycle, support for multiple listeners per broker was introduced. Each listener is associated with a security protocol, ip/host and port. When combined with the advertised listeners mechanism, there is a fair amount of flexibility with one limitation: at most one listener per security protocol in each of the two configs (listeners and advertised.listeners).
In some environments, one may want to differentiate between external clients, internal clients and replication traffic independently of the security protocol for cost, performance and security reasons. A few examples that illustrate this:
Replication traffic is assigned to a separate network interface so that it does not interfere with client traffic.
External traffic goes through a proxy/load-balancer (security, flexibility) while internal traffic hits the brokers directly (performance, cost).
Different security settings for external versus internal traffic even though the security protocol is the same (e.g. different set of enabled SASL mechanisms, authentication servers, different keystores, etc.)
As such, we propose that Kafka brokers should be able to define multiple listeners for the same security protocol for binding (i.e. listeners) and sharing (i.e. advertised.listeners) so that internal, external and replication traffic can be separated if required.
So,
listeners - Comma-separated list of URIs we will listen on and their protocols.
Specify hostname as 0.0.0.0 to bind to all interfaces.
Leave hostname empty to bind to default interface.
Examples of legal listener lists:
PLAINTEXT://myhost:9092,TRACE://:9091
PLAINTEXT://0.0.0.0:9092, TRACE://localhost:9093
advertised.listeners - Listeners to publish to ZooKeeper for clients to use, if different than the listeners above.
In IaaS environments, this may need to be different from the interface to which the broker binds. If this is not set, the value for listeners will be used.

WebSocket connection with OpenShift DIY cartridge is dropped every 2 min

My application consists of two pieces: a WebSocket server, hosted on an OpenShift DIY cartridge, and a WebSocket client, which connects to my server from a home PC. The WebSocket server is written using embedded Jetty and its WebSocket library. The client side is written in Java using the Tyrus library. It works pretty well except for one glitch that I cannot explain.
When running the WebSocket server on the OpenShift DIY cartridge, the WebSocket connection gets dropped every 2 minutes. The drops happen quite precisely, so it is clearly not related to network outages. Besides, I have tested exactly the same application on Heroku and there were no connection drops. Moreover, the onClose(...) method receives a NORMAL_CLOSURE close code.
I am almost sure that the OpenShift Apache layer closes idle WebSocket connections every 2 minutes, even though the WebSocket client sends Ping messages and receives Pong messages from the server. Has anyone experienced this type of WebSocket connection drop? Are there parameters I can use to prevent the drops?
Thank you in advance.
Update: I added a dedicated thread on the server side to send Pong messages to the client (Jetty does not support Pong handlers yet, so I cannot use Ping messages) and the drops disappeared. It seems like the OpenShift Apache layer started treating the connection as "alive" and no longer closes it. Then I noticed one more strange behavior: something pings my server-side application via HTTPS every hour. The HTTP headers look like this:
HTTP/1.1 HEAD /
Accept: */*
User-Agent: Ruby
X-Forwarded-Proto: https
X-Forwarded-Host: ....rhcloud.com
Connection: keep-alive
X-Request-Start: t=1409771442217677
X-Forwarded-For: 10.158.21.225
Host: wsproxy-gimes4dieni.rhcloud.com
X-Forwarded-Port: 443
X-Client-IP: 10.158.21.225
X-Forwarded-SSL-Client-Cert: (null)
X-Forwarded-Server: localhost
I do not use Ruby, I use only HTTP, and the IP address is different from my regular requests. Does anybody have a clue whether this is some sort of OpenShift "service" or whether it is coming from the Internet?
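Regarding the workaround described in the update (a server-side thread that sends Pong frames), a minimal sketch using the Jetty 9 WebSocket API might look like this; the 30-second interval, class name, and payload are assumptions:

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.eclipse.jetty.websocket.api.Session;

// Hypothetical helper: keeps a WebSocket session "busy" so an intermediate
// proxy (like OpenShift's Apache layer) does not see it as idle.
public class PongKeepAlive {

    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public void start(Session session) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                if (session.isOpen()) {
                    // Unsolicited Pong frames are allowed by RFC 6455 and are
                    // enough to generate traffic on the connection.
                    session.getRemote().sendPong(ByteBuffer.wrap("ka".getBytes(StandardCharsets.UTF_8)));
                }
            } catch (Exception e) {
                // Assumption: a failed pong means the connection is gone; stop trying.
                scheduler.shutdown();
            }
        }, 30, 30, TimeUnit.SECONDS);
    }

    public void stop() {
        scheduler.shutdownNow();
    }
}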
SSH into your project, open ~/haproxy/conf/haproxy.cfg with a text editor such as vi, and edit timeout queue, timeout client, and timeout server to whatever you want. I set mine to 5m, which is 5 minutes. After you have made the changes, exit and run
~/haproxy/bin/control restart
Now your websocket timeout should be set.
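For reference, the relevant lines in ~/haproxy/conf/haproxy.cfg would look roughly like this after the change (5m as in the answer above; the surrounding section layout may differ on your gear):

defaults
    # Raise these so idle WebSocket connections are not cut after the default timeout.
    timeout queue  5m
    timeout client 5m
    timeout server 5m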
