Kafka server configuration - listeners vs. advertised.listeners - amazon-ec2

To get Kafka running, you need to set some properties in config/server.properties file. There are two settings I don't understand.
Can somebody explain the difference between listeners and advertised.listeners property?
The documentation says:
listeners: The address the socket server listens on.
and
advertised.listeners:
Hostname and port the broker will advertise to producers and consumers.
When do I have to use which setting?

listeners is what the broker will use to create server sockets.
advertised.listeners is what clients will use to connect to the brokers.
The two settings can be different if you have a "complex" network setup (with things like public and private subnets and routing in between).

Since I cannot comment yet I will post this as an "answer", adding on to M.Situations answer.
Within the same document he links there is this blurb about which listener is used by a KAFKA client (https://cwiki.apache.org/confluence/display/KAFKA/KIP-103%3A+Separation+of+Internal+and+External+traffic):
As stated previously, clients never see listener names and will make metadata requests exactly as before. The difference is that the list of endpoints they get back is restricted to the listener name of the endpoint where they made the request.
This is important as depending on what URL you use in your bootstrap.servers config that will be the URL* that the client will get back if it is mapped in advertised.listeners (do not know what the behavior is if the listener does not exist).
Also note this:
The exception is ZooKeeper-based consumers. These consumers retrieve the broker registration information directly from ZooKeeper and will choose the first listener with PLAINTEXT as the security protocol (the only security protocol they support).
As an example broker config (for all brokers in cluster):
advertised.listeners=EXTERNAL://XXXXX.compute-1.amazonaws.com:9990,INTERNAL://ip-XXXXX.ec2.internal:9993
inter.broker.listener.name=INTERNAL
listener.security.protocol.map=EXTERNAL:SSL,INTERNAL:PLAINTEXT
If the client uses XXXXX.compute-1.amazonaws.com:9990 to connect, the metadata fetch will go to that broker. However, the returning URL to use with the Group Coordinator or Leader could be 123.compute-1.amazonaws.com:9990* (a different machine!). This means that the match is done on the listener name as advertised by KIP-103 irrespective of the actual URL (node).
Since the protocol map for EXTERNAL is SSL this would force you to use an SSL keystore to connect.
If on the other hand you are within AWS lets say, you can then issue ip-XXXXX.ec2.internal:9993 and the corresponding connection would be plaintext as per the protocol map.
This is especially needed in IaaS where in my case brokers and consumers live on AWS, whereas my producer lives on a client site, thus needing different security protocols and listeners.
EDIT:
Also adding Inbound Rules is much easier now that you have different ports for different clients (brokers, producers, consumers).
EDIT2:
This article is a great in depth guide if the above is still not clear: https://rmoff.net/2018/08/02/kafka-listeners-explained/

There's so much confusion or little information in answers provided here for the question. So posting my elaborate answer for clarity.
listeners - Used by the embedded jetty web server in kafka to bind to. This jetty web server is used to provide REST API that provides the control plane for Kafka Connect workers. The hostname in this setting can be left empty if you want kafka to bind to localhost (it does by calling InetAddress.getLocalHost().getCanonicalHostName() java api)
advertised.listeners: This address is published to zookeeper by every kafka broker. If this setting is not set, then value of listeners will be used here and published to zookeeper. That's the only purpose of this setting for notifying others. Kafka Clients use the 'advertised.listeners' setting published to zookeeper (as /brokers/ids/<id>/ # endpoints) to talk to Kafka broker.
Now the question is why to have two setting? Why not a single setting? Let's say your kafka broker is sitting behind a proxy. And all the kafka clients have to talk to the proxy to reach the broker. In this case, we want kafka's embedded jetty server to bind to localhost and local port, but we can't publish this to zookeeper as clients can't use it. So kafka admin can set the setting advertised.listeners to the proxy host and port.
Also, in some of our production hosts, InetAddress.getLocalHost().getCanonicalHostName() returns empty and so listeners setting's hostname was empty which was fine for jetty to bind. But advertised.listeners was published to zookeeper as NULL:9092 since it took the same value as listeners by default. Now all the brokers tried to publish in this way to zookeeper and so brokers got the error java.lang.IllegalArgumentException: requirement failed: Configured end points null:14092 as advertised.listeners as NULL:9092 is already registered by broker 101. The fix was to change the advertised.listeners setting to have hostname in it.

Listeners are all the addresses the Kafka broker listens on (it can be more than 1 address) whereas advertised listeners are the addresses other agents (producers, consumers, or brokers) need to connect to if they want to talk to the current broker.
The 2 lists should be the same if all are running on the same machine (can connect using localhost:9092 or 127.0.0.1:9092) but if consumers, producers, or other brokers do not stay on the same machine or Docker instance, they must use different addresses (that's why we have advertised listeners). Two examples:
Saying we use Docker to run 2 Kafka instances named kafka and kafka2. kafka2 for sure cannot connect to kafka using localhost:29092. It must use kafka:9092 instead. So for kafka, listener = localhost:29092, advertised listener = kafka:9092
Producer from host machine cannot connect to kafka using kafka:9092. It must use localhost:29092 instead.
Let use the following docker-compose config to understand more about the startup process of a Kafka broker:
# config/docker-compose.yml
kafka:
image: docker.io/bitnami/kafka:3
ports:
- "29092:29092"
- "9092:9092"
environment:
- KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
- ALLOW_PLAINTEXT_LISTENER=yes
- KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=CLIENT:PLAINTEXT,EXTERNAL:PLAINTEXT
- KAFKA_CFG_LISTENERS=CLIENT://:9092,EXTERNAL://:29092
- KAFKA_CFG_ADVERTISED_LISTENERS=CLIENT://kafka:9092,EXTERNAL://localhost:29092
- KAFKA_CFG_INTER_BROKER_LISTENER_NAME=CLIENT
depends_on:
- zookeeper
With this config, Docker will start 1 Kafka broker instance which listens on 2 ports:
9092 with name CLIENT
29092 with name EXTERNAL
The broker then connects to Zookeeper at zookeeper:2181 and registers its 2 addresses: kafka:9092 and localhost:29092. Also, with KAFKA_CFG_INTER_BROKER_LISTENER_NAME=CLIENT, it wants Zookeeper to tell other brokers to connect to kafka:9092 if want to talk to it.
But why need 2 ports? Read more here
References:
My notes while learning Kafka
Kafka Listeners – Explained

From this link: https://cwiki.apache.org/confluence/display/KAFKA/KIP-103%3A+Separation+of+Internal+and+External+traffic
During the 0.9.0.0 release cycle, support for multiple listeners per
broker was introduced. Each listener is associated with a security
protocol, ip/host and port. When combined with the advertised
listeners mechanism, there is a fair amount of flexibility with one
limitation: at most one listener per security protocol in each of the
two configs (listeners and advertised.listeners).
In some environments, one may want to differentiate between external
clients, internal clients and replication traffic independently of the
security protocol for cost, performance and security reasons. A few
examples that illustrate this:
Replication traffic is assigned to a separate network interface so that it does not interfere with client traffic.
External traffic goes through a proxy/load-balancer (security, flexibility) while internal traffic hits the brokers directly
(performance, cost).
Different security settings for external versus internal traffic even though the security protocol is the same (e.g. different set of
enabled SASL mechanisms, authentication servers, different keystores,
etc.)
As such, we propose that Kafka brokers should be able to define
multiple listeners for the same security protocol for binding (i.e.
listeners) and sharing (i.e. advertised.listeners) so that internal,
external and replication traffic can be separated if required.
So,
listeners - Comma-separated list of URIs we will listen on and their protocols.
Specify hostname as 0.0.0.0 to bind to all interfaces.
Leave hostname empty to bind to default interface.
Examples of legal listener lists:
PLAINTEXT://myhost:9092,TRACE://:9091
PLAINTEXT://0.0.0.0:9092, TRACE://localhost:9093
advertised.listeners - Listeners to publish to ZooKeeper for clients to use, if different than the listeners above.
In IaaS environments, this may need to be different from the interface to which the broker binds. If this is not set, the value for listeners will be used.

Related

I need to build a Vert.x virtual host server that channels traffic to other Vert.x apps. How is this kind of inter-app communication accomplished?

As illustrated above, I need to build a Vert.x Java app that will be an HTTP server/virtual host (TLS Http traffic, Web socket traffic) that will redirect/channel specific domain traffic to other Vert.x Java apps running on the same server, each in it's own JVM.
I have been reading for days but I remain uncertain as to how to approach all aspects of the task.
What I DO know or have experience with:
Creating an HTTP server, etc
Using a Vert.x VirtualHost handler to "handle" incoming traffic for a
specific domain
What I DO NOT know:
How do I "re-direct" a domain's traffic to another Vert.x app (this
other Vert.x app would also be running on the same server, in its own
JVM).
- Naturally this "other" Vert.x app would need to respond to HTTP
requests, etc. What Vert.x mechanisms do I employ to accomplish this
aspect of the task?
Are any of the following concepts part of the solution? I'm unfamiliar with these concepts and how they may or may not form part of the solution.:
Running each Vert.x app using -cluster option?
Vert.x Streams?
Vert.x Pumps?
There are multiple ways to let your microservices communicate with each other, the fact that all your apps are running on the same server doesn't change much, but it makes number 2.) easy to configure
1.) Rest based client - server communication
Both host and apps have a webserver
When you handle the incoming requests on the host, you simply call another app with a HttpClient
Typically all services find each others address via service discovery.
Eg: each service registers his address in a central registry then other services use this central registry to find the addresses.
Note: this maybe an overkill for you and you can just configure the addresses of the other services.
2.) You start the vertx microservices in clustered mode
the eventbus is then shared among the services
For all incoming requests you send a broadcast on the eventbus
the responsible app replies to the message
For further reading you can checkout https://vertx.io/docs/vertx-hazelcast/java/#configcluster. You start your projects with -cluster option and define the clustering in an xml configuration. I think by default it finds the services via local broadcast.
3.) You use a message broker like RabbitMq etc.
All your apps connect to a central message broker
When a new request comes in to the host, it sends a message to the message broker
The responible app then listens to the relevant messages and replies
The host receives the reply from the message broker
There are already many existing vertx clients for certain message brokers like kafka, camel, zeromq:
https://github.com/vert-x3/vertx-awesome#integration

Can you have a backup durable subscriber for ActiveMQ

I have a master/slave AMQ broker setup for JMS messaging. I have two servers that I would like to setup as a master/slave durable consumers using Apache Camel. We've been achieving this by having both servers attempt to connect with the same client ID. One node handles all of the work but if it goes down the other node connects and picks right back up on the work. This has been working fine for having a single consumer at a time but it makes noise in disconnected server's log files with the message
ERROR org.apache.camel.component.jms.DefaultJmsMessageListenerContainer]
(Camel (spring-context) thread #0 - JmsConsumer[global.topic.event]) Could
not refresh JMS Connection for destination 'global.topic.event' - retrying
using FixedBackOff{interval=5000, currentAttempts=12,
maxAttempts=unlimited}. Cause: Broker: broker - Client: client already
connected from tcp://xxx.xx.xx.xxx:xxxx
Is there a proper way to get the functionality that I'm looking to achieve? I was considering having the slave server ping the master to coordinate which one is connected but I'd like to keep the implementation as simple as possible.
Convert your usage of topics on the consumer side to Virtual Topics. Virtual Topics allow you to continue to have existing message flows produce and consume from the topic, but also have consumers listen on specially named queues.
Once you are consuming from a queue, you can implement all the consumer patterns-- exclusive consumer (which allows that hot-standby backup consumer), message groups, parallel consumers, etc.

How to setup a singleton broker in zeromq?

I'm trying to design a system using ZeroMQ where I have a bunch of consumers that each want to use a service, and I need a way for the consumers to start the desired service if it's not started yet. I can visualize having a broker do this (useful to do so, since both consumers and service are dynamic, and their endpoints are TCP ports that can't be known before runtime), but then I have to figure out how to start the broker if it hasn't started yet.
How do you reliably start a single broker using ZeroMQ? The call to bind() seems to accept more than one socket.

Can an MDB listen to multiple listener ports

Can I set up one MDB to listen to more than one listener port? Each listener port will be connected to one particular queue.
If not, why is the restriction that one MDB can listen to only one port?
No. An MDB may only be associated with one listener port (or one activation specification).
As a possible workaround to this limitation, you can configure your MDB multiple times so that each one can be bound to a different queue (listener port / activation spec).
MDBs are deployed to an application server. The application server is typically only listening on one port. You can build a simple java application that creates different connections to different servers; in a configurable way though. Just not as MDBs.
MDB's are a layer(probably several) of abstraction(s) above the concept of a port. Most messaging implementations will broker traffic across a single port,but likely a combination of data/control ports.
Think of the broker like a mail depot the letters come in and the broker puts them in the right mailbox while providing a number of other services(peer failover/communication, persistence, guaranteed delivery, message acknowledgment, etc.).
MDBs are the agents that subscribe to those abstract mailboxes. They really have no understanding of the underlying architecture. As far as they are concerned things are all happening locally in memory. Their only job is to adhere to the EJB standard and the container (typically through the application of more low level standards like JCA layered again over raw sockets) takes care of making sure the messages arrive at their destinations.
Maybe it would be helpful if you further elaborated on why you are concerned about how your MDBs relate to ports

ActiveMQ network of brokers connectivity scheme

I need to scale up my ActiveMQ solution so I have defined a network of brokers.
I'm tring to figure out how to connect my producers and consumers to the cluster.
does each producer has to be connected to a single broker (with the failover uri for availability)? in this case how can I guarentry the distribution of traffic accross the brokers? do I need to configure the producers to connect each to a diffrent broker?
should I apply the same schema for the consumers?
This makes the application aware of the cluster topology, which I hope can be avoided by a discent cluster
Tx
Tomer
I strongly suggest you carefully read through the documentation from activemq.apache.org on clustering ActiveMQ. There are a lot of very helpful tips.
From what you have written I suggest you pay special attention to this. At the bottom of the page it details how you can control from server side the failover/failback configuration for your producers.
For example:
updateClusterClients - if true pass information to connected clients about changes in the topology of the broker cluster
rebalanceClusterClients - if true, connected clients will be asked to rebalance across a cluster of brokers when a new broker joins the network of brokers
updateURIsURL - A URL (or path to a local file) to a text file containing a comma separated list of URIs to use for reconnect in the case of failure
In a production active system then I would think that making use of updateURIsURL would make it a lot less painful scaling out.

Resources