Requiring public IP address for Kafka running on EC2

We have Kafka and ZooKeeper installed on a single AWS EC2 instance. Our Kafka producers and consumers run on separate EC2 instances in the same VPC, with the same security group as the Kafka instance. In the producer and consumer configs we use the internal IP address of the Kafka server to connect to it.
However, we have noticed that we must set the public IP address of the EC2 server as advertised.listeners before the producers and consumers can connect to the Kafka server:
advertised.listeners=PLAINTEXT://PUBLIC_IP:9092
We also have to whitelist the public IP addresses and open traffic on port 9092 for each of our EC2 servers running producers and consumers.
We want the traffic to flow over internal IP addresses. Is there a way to avoid whitelisting the public IP addresses and opening port 9092 for every server running a producer or consumer?

If you don't want to open access to the world for any of your servers, I would recommend putting a proper high-performance web server such as nginx or Apache HTTPD in front of your application servers, acting as a reverse proxy. This way you can also add SSL encryption, and your servers stay on a private network while only the web server is exposed. It's easy to set up, and you can find many tutorials, like this one: http://webapp.org.ua/sysadmin/setting-up-nginx-ssl-reverse-proxy-for-tomcat/
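For illustration, a minimal sketch of such an nginx reverse proxy with SSL termination; the hostname, certificate paths, and the backend address 10.0.0.2:8080 are hypothetical placeholders:
server {
    listen 443 ssl;
    server_name app.example.com;
    ssl_certificate     /etc/nginx/certs/app.crt;
    ssl_certificate_key /etc/nginx/certs/app.key;
    location / {
        # Forward requests to the application server on the private network:
        proxy_pass http://10.0.0.2:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}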

Because of the variable nature of the environments Kafka may have to work in, it makes sense that you must be explicit in declaring the addresses Kafka can use. The only way to guarantee that external parts of any system can reach a broker via an IP address is to advertise an external IP address.
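One explicit way to get both behaviours, assuming a broker version that supports multiple named listeners (0.10.2 or later), is to declare separate internal and external listeners, so clients inside the VPC are handed the private IP while outside clients are handed the public one; 10.0.0.1 and PUBLIC_IP below are placeholders:
# Bind both listeners on all interfaces, on separate ports:
listeners=INTERNAL://0.0.0.0:9092,EXTERNAL://0.0.0.0:9093
# Advertise the private IP internally and the public IP externally:
advertised.listeners=INTERNAL://10.0.0.1:9092,EXTERNAL://PUBLIC_IP:9093
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
inter.broker.listener.name=INTERNAL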

Related

How does gRPC know service IP addresses for microservices

I was starting with Google Cloud Platform's microservices demo, and I was curious how gRPC stubs work when the services are deployed in containers.
As far as I understand, the containers of a particular service are addressed by the Service IP specified in the YAML configuration file. So the gRPC server of a service must listen on the Service IP? But I came across the following snippet of code:
l, err := net.Listen("tcp", fmt.Sprintf(":%s", port))
if err != nil {
    log.Fatal(err)
}
I am wondering: how does the server listen on an address without an IP?
:{port} isn't an "address without an IP".
The documentation for Listen includes "if the host in the address parameter is empty or a literal unspecified IP address, Listen listens on all available unicast and anycast IP addresses of the local system".
So, in this case, without a host address, the effective address is 0.0.0.0, which corresponds to all interfaces. As a corollary, a common mistake people make when using containers is to bind their code to localhost (127.0.0.1), which cannot be accessed from outside the container.
Using 0.0.0.0 is a common (good) practice, particularly when using containers, as it effectively delegates address binding to the container runtime.
So your app runs on {port} on all interfaces within the container. The container runtime then binds one or more of these interfaces to the host's interfaces, and e.g. your client code connects to the host's IP address.
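For contrast, a minimal sketch of the two bind styles (port 8080 is an arbitrary example):
// Bound to loopback: reachable only from inside the container.
l, err := net.Listen("tcp", "127.0.0.1:8080")
// Empty host: binds 0.0.0.0, reachable through the container's published ports.
l, err = net.Listen("tcp", ":8080")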
When your container is managed by Kubernetes, Kubernetes assigns IP addresses to the containers running your app, and these are often exposed to other services using a Kubernetes Service resource, which has not only an IP address but also a cluster DNS name.
The Kubernetes YAML probably specifies a Service DNS name.
Kubernetes resolves requests for that DNS name to a selected container (IP and port).
The container runtime routes incoming requests on the host's port to the container's port.
Your gRPC server will accept traffic from the container runtime on any interface, on the {port} you told net.Listen to use.
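For illustration, a minimal sketch of such a Service resource; the names and ports are hypothetical and the demo's actual YAML may differ:
apiVersion: v1
kind: Service
metadata:
  name: my-grpc-service        # becomes the cluster DNS name
spec:
  selector:
    app: my-grpc-app           # matches the pods running the gRPC server
  ports:
  - port: 8080                 # port clients reach via the Service
    targetPort: 8080           # the {port} the server passed to net.Listen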

Kafka Client Bind IP (Secondary NIC)

I have a .NET Kafka client (using librdkafka via Confluent's .NET client) running on a physical server with two active network interfaces. One is 10G and the other is 1G; both have static IP addresses assigned. Our networking team handles the configuration and is unlikely to change their practices for one application, so I'd like to handle this client-side. I should also mention that the 1G and 10G interfaces are on the same network.
Since my Kafka cluster (3-node) is all 10G, I would like to require my application's consumer to bind to the 10G IP address. Looking through all of the documentation, I can't find anything about defining this on the client.
I would like to avoid any "hacky" solutions like setting Kafka to deny any non-whitelisted IP addresses or DNS tomfoolery.
Thanks in advance!
Just to be sure: do you know if your server is doing interface bonding (meaning traffic is load-balanced across the interfaces, though bonding interfaces of different speeds is unlikely)?
If not, since your two interfaces are on the same network, you will only use one interface to reach that network (unless you have an exotic routing config). That interface is determined by your default route.
If it's a Linux server, you can check as follows:
ip route
default via X.X.X.X dev YOURDEFAULTINTERFACE
If it's the 10G interface, you have nothing to do; you can be sure it will be used.
If not, there is nothing you can do on the Kafka side, as this is purely an OS-level setting. Your kernel will forward all traffic through the default interface.
Again, I insist: this is the case because both your interfaces are on the same network.
If you have any doubts, please share your network configuration in detail (output of ip addr and ip route).
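For reference, a hedged sketch of how this can be checked and, if necessary, steered at the OS level; the addresses and the interface name eth2 are hypothetical:
# Which interface and source IP would the kernel use to reach a given broker?
ip route get 10.1.2.3
# If the default route goes via the 1G NIC, a more specific route can pin
# the broker subnet to the 10G interface:
ip route add 10.1.2.0/24 dev eth2 src 10.1.2.50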
Yannick

Kafka Server Properties - unable to connect to broker

Let's say Kafka is running as a single-node broker on an AWS EC2 instance. The instance has the internal private IP 10.0.0.1. I want to connect to that broker directly from the same EC2 instance and from another EC2 instance in the same VPC and subnet. The security groups allow the connection.
Which settings do I have to use to get the connection working?
I tried listeners=PLAINTEXT://0.0.0.0:9092 and advertised.listeners=PLAINTEXT://0.0.0.0:9092. With that setting I can connect to the broker locally (on the same instance where the broker is running), but I can't reach the broker from the second EC2 instance.
Does anybody have any idea?
If you are trying to connect to the Kafka instance inside AWS, from one EC2 instance to another, the internal IP address should work.
The producers and consumers should use the internal private IP addresses as well, for both the broker and ZooKeeper.
Additionally, you may need to verify that the iptables rules at the OS level aren't blocking the communication.
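The advertised.listeners=PLAINTEXT://0.0.0.0:9092 setting is the likely problem: the broker hands the advertised address back to clients after the initial bootstrap, and 0.0.0.0 is not a connectable address from a remote machine. A minimal sketch of server.properties using the private IP from the question:
# Bind on all interfaces so both local and remote clients can reach the broker:
listeners=PLAINTEXT://0.0.0.0:9092
# Advertise an address clients can actually connect to (never 0.0.0.0):
advertised.listeners=PLAINTEXT://10.0.0.1:9092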

Can Marathon assign the same randomly selected host_port across instances?

For my containerized application, I want Marathon to allocate the same host_port for the container's bridge network endpoint across all instances of that application. Specifying the host port runs the risk of resource exhaustion; not specifying it causes a random port to be picked for each instance.
I don't mind a randomly picked port as long as it is identical across all instances of my application. Is there a way to ask Marathon to pick such a host port for my container endpoint?
I think what you are really after is service discovery / load balancing. Have a look at the Marathon docs at
https://mesosphere.github.io/marathon/docs/service-discovery-load-balancing
to get an overview.
Also, see the Docker networking docs at
https://mesosphere.github.io/marathon/docs/native-docker.html
You can probably make use of either the hostPort or the more general ports properties.
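For illustration, a hedged sketch of a bridge-mode app definition (the id, image, and port numbers are hypothetical): hostPort set to 0 lets Marathon pick a random host port per instance, while the fixed servicePort gives all instances one stable port when fronted by a load balancer such as marathon-lb:
{
  "id": "/my-app",
  "instances": 3,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "my-image:latest",
      "network": "BRIDGE",
      "portMappings": [
        { "containerPort": 8080, "hostPort": 0, "servicePort": 10000 }
      ]
    }
  }
}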

How to set up EC2 with public IP for connections from itself?

I have an EC2 instance (running Kafka) which needs to access itself via its public IP, but I would like not to open the network ACLs to the whole world.
The rationale is that when a connection is made to a Kafka broker, the broker advertises which Kafka nodes are available. As Kafka will be used both inside and outside EC2, the only common option is for the broker to advertise its public IP.
My setup:
an instance, with public IP (not an elastic IP)
a VPC
a security group, allowing access to the kafka ports from my work network
an internet gateway
a route allowing external access via the gateway
The security group is as follow:
Custom TCP Rule, proto=TCP, port=9092, src=<my office network>
Custom TCP Rule, proto=TCP, port=2181, src=<my office network>
In short, all works fine inside the instance if I use localhost.
All works fine outside the instance if I use the public IP.
What I now want is to use Kafka from inside the instance via the public IP.
If I open the kafka ports to the whole world:
Custom TCP Rule, proto=TCP, port=9092, src=0.0.0.0/0
Custom TCP Rule, proto=TCP, port=2181, src=0.0.0.0/0
It works, as expected, but it does not feel safe.
How could I set up the network ACL to accept inbound traffic from my local instance/subnet/VPC (it does not matter which) without opening too much?
Well, this is not clean, but it has the added advantage of not having to pay for external bandwidth.
I did not find a way to do it as I expected (via the security groups), but simply by updating /etc/hosts on my EC2 instance and actually using a hostname instead of an IP, everything works as expected.
For instance, if I give the instance the hostname kafka.example.com, then by having the following line in /etc/hosts:
127.0.0.1 kafka.example.com
I can use the name kafka.example.com everywhere, even though it actually points to a different IP depending on where the call is made.
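Outside the instance, the same name would instead be mapped to the public IP, for example via DNS or another hosts entry; 203.0.113.10 below is a documentation placeholder for the instance's public IP:
203.0.113.10 kafka.example.com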
