How to deploy Kafka connector on AWS with auto-scale group - apache-kafka-connect

It looks like if we want to set up the Kafka connector in distributed mode, we need a unique hostname for CONNECT_REST_ADVERTISED_HOST_NAME. However, if we deploy the connector on AWS with an auto-scaling group, there is no hostname known in advance, so I am not sure how to do the setup.

You can achieve this by following the steps below on a pre-created EC2 instance.
1. Configure the rest.advertised.host.name property in the connect-distributed.properties file:
rest.advertised.host.name=hostname
2. Create a kafka-connect.service file:
nano /etc/systemd/system/kafka-connect.service
[Unit]
Description=Kafka-connect
After=network.target
[Service]
User=ubuntu
Group=ubuntu
Environment="KAFKA_HEAP_OPTS=-Xmx4G -Xms2G"
Environment="KAFKA_OPTS=-javaagent:/home/ubuntu/prometheus/jmx_prometheus_javaagent-0.16.1.jar=8080:/home/ubuntu/prometheus/kafka-connect.yml"
ExecStart=/home/ubuntu/kafka/kafka_2.13-2.7.0/bin/connect-distributed.sh /home/ubuntu/config/connect-distributed.properties
[Install]
WantedBy=multi-user.target
3. Now take a snapshot of this EC2 instance's volume.
4. Create a launch configuration.
Use the snapshot as the volume, and use the following user data:
#!/bin/bash
apt-get update
apt-get -y upgrade
sed -i "s/hostname/$(hostname -I)/g" /home/ubuntu/config/connect-distributed.properties
systemctl start kafka-connect
systemctl enable kafka-connect
The sed command replaces the 'hostname' placeholder in the connect-distributed.properties file (the value of rest.advertised.host.name) with the instance's private IP when a new instance starts from the Auto Scaling group.
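If you prefer not to rely on hostname -I, a minimal sketch of the same substitution using the EC2 instance metadata service (paths match the setup above):
#!/bin/bash
# Fetch the instance's private IPv4 address from the EC2 instance metadata service
PRIVATE_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
# Replace the 'hostname' placeholder with the private IP before starting the service
sed -i "s/hostname/${PRIVATE_IP}/g" /home/ubuntu/config/connect-distributed.properties
systemctl start kafka-connect
systemctl enable kafka-connect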

Related

Unable to setup external etcd cluster in Kubernetes v1.15 using kubeadm

I'm trying to set up a Kubernetes cluster with multiple masters and an external etcd cluster. I followed the steps described on kubernetes.io. I was able to create the static manifest pod files on all 3 hosts in the /etc/kubernetes/manifests folder after executing Step 7.
After that, when I executed the command 'sudo kubeadm init', the initialization failed because of kubelet errors. I also verified the journalctl logs; the error points to a misconfiguration of the cgroup driver, which is similar to this SO link.
I tried what was suggested in the above SO link but was not able to resolve it.
Please help me resolve this issue.
For the installation of docker, kubeadm, kubectl and kubelet, I followed the kubernetes.io site only.
Environment:
Cloud: AWS
EC2 instance OS: Ubuntu 18.04
Docker version: 18.09.7
Thanks
After searching a few links and doing a few trials, I was able to resolve this issue.
As given in the container runtime setup, the Docker cgroup driver is systemd, but the default cgroup driver of the kubelet is cgroupfs. Since the kubelet cannot identify the cgroup driver automatically (as noted in the kubernetes.io docs), we have to provide the cgroup driver explicitly when running the kubelet, like below:
cat << EOF > /etc/systemd/system/kubelet.service.d/20-etcd-service-manager.conf
[Service]
ExecStart=
ExecStart=/usr/bin/kubelet --cgroup-driver=systemd --address=127.0.0.1 --pod-manifest-path=/etc/kubernetes/manifests
Restart=always
EOF
systemctl daemon-reload
systemctl restart kubelet
Moreover, there is no need to run sudo kubeadm init: since we are providing --pod-manifest-path to the kubelet, it runs etcd as a static pod.
For debugging, the kubelet logs can be checked with the command below:
journalctl -u kubelet -r
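For a quick sanity check that the two drivers actually match (assuming Docker is the container runtime), you can compare the Docker cgroup driver with the value passed to the kubelet above:
docker info | grep -i "cgroup driver"
# expected output: Cgroup Driver: systemd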
Hope it helps. Thanks.

Kubernetes remote cluster setup

How can I set up and securely access a Kubernetes cluster on an EC2 instance from my laptop? I want it to be a single-node cluster, i.e. running only one instance. I have tried running minikube on the EC2 instance, but I can't configure my laptop to connect to it.
So, as a result, I want to run about 10 services/pods on the EC2 instance and just run and debug them from my dev laptop.
Thanks!
You can use kops (Kubernetes Ops) to accomplish this. It's a really handy tool. There's a whole section on configuring a cluster on AWS. I use it on a couple of projects and I'd really recommend it. Its setup is straightforward and easy to understand.
After the cluster is up you can use kubectl proxy to proxy locally and interact with the cluster. Or use kubectl with config files to set up services and pods.
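For example, a minimal sketch of interacting with the cluster from your laptop once kubectl is pointed at it (the manifest file name is hypothetical):
kubectl get nodes                  # verify the cluster is reachable
kubectl apply -f my-service.yaml   # apply a hypothetical manifest describing a pod/service
kubectl proxy                      # proxy the API server to localhost:8001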
It does not create a new instance per service or pod; it creates pods on the node(s) that already exist in the cluster.
In your case you could have a single master and a single node in whatever size suits your needs, t2.micro or otherwise.
A command to accomplish that would look like:
kops create cluster \
--cloud aws \
--state $KOPS_STATE_STORE \
--node-count $NODE_COUNT \
--zones $ZONES \
--master-zones $MASTER_ZONES \
--node-size $NODE_SIZE \
--master-size $MASTER_SIZE \
-v $V_LOG_LEVEL \
--ssh-public-key $SSH_KEY_PATH \
--name=$CLUSTER_NAME
Where $NODE_COUNT would be 1, thus having a single node (EC2 instance) plus another instance as the master.
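As a concrete sketch, hypothetical values for the variables above matching the single-node case described (bucket name, cluster name and key path are placeholders):
export KOPS_STATE_STORE=s3://my-kops-state-store   # hypothetical S3 bucket for kops state
export CLUSTER_NAME=my-cluster.k8s.local
export NODE_COUNT=1
export ZONES=us-east-1a
export MASTER_ZONES=us-east-1a
export NODE_SIZE=t2.micro
export MASTER_SIZE=t2.micro
export V_LOG_LEVEL=10
export SSH_KEY_PATH=~/.ssh/id_rsa.pub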
To connect to it locally you can also deploy the kubernetes dashboard on your cluster.
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml
To access Dashboard from your local workstation you must create a secure channel to your Kubernetes cluster. Run the following command:
kubectl proxy
Now you can access the Dashboard at:
http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/
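If the dashboard asks you to sign in, one quick (and very permissive) sketch for a dev cluster is to create a service account with cluster-admin and use its token; the account name here is just an example:
kubectl -n kube-system create serviceaccount dashboard-admin
kubectl create clusterrolebinding dashboard-admin --clusterrole=cluster-admin --serviceaccount=kube-system:dashboard-admin
# Print the token from the secret created for the service account
kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep dashboard-admin-token | awk '{print $1}')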

How do I restart hadoop services on dataproc cluster

I may be searching with the wrong terms, but Google is not telling me how to do this. The question is: how can I restart Hadoop services on Dataproc after changing some configuration files (YARN properties, etc.)?
Services have to be restarted in a specific order throughout the cluster. There must be scripts or tools out there, hopefully in the Dataproc installation, that I can invoke to restart the cluster.
Configuring properties is a common and well supported use case.
You can do this via cluster properties, no daemon restart required. Example:
gcloud dataproc clusters create my-cluster --properties yarn:yarn.resourcemanager.client.thread-count=100
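Multiple properties can be set in one flag as comma-separated prefix:key=value pairs; a sketch with illustrative values:
gcloud dataproc clusters create my-cluster \
  --properties 'yarn:yarn.nodemanager.resource.memory-mb=8192,core:fs.trash.interval=1440'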
If you're doing something more advanced, like updating service log levels, then you can use systemctl to restart services.
First, ssh to a cluster node and type systemctl to see the list of available services. For example, to restart the HDFS NameNode, type sudo systemctl restart hadoop-hdfs-namenode.service.
If this is part of an initialization action, then sudo is not needed.
On master nodes:
sudo systemctl restart hadoop-yarn-resourcemanager.service
sudo systemctl restart hadoop-hdfs-namenode.service
On worker nodes:
sudo systemctl restart hadoop-yarn-nodemanager.service
sudo systemctl restart hadoop-hdfs-datanode.service
After that, you can use systemctl status <name> to check the service status; also check the logs in /var/log/hadoop.

How can I change WebUI interface port 8080 for RethinkDB on Linux?

I'm running RethinkDB on Amazon Linux AMI. I already have services running on 8080 so I need to change the port for the WebUI interface. How would I do that?
I happened to find the documentation here: https://www.rethinkdb.com/docs/cli-options/
$ rethinkdb --bind all --http-port 9090
Karthick here has the right answer if you are running your instance of RethinkDB from the command line and daemonizing it.
In case you are running the default system configuration of RethinkDB after, say, sudo apt-get install rethinkdb and want to change it there, you have to change the configuration file by following these steps:
You want to look under the directory /etc/rethinkdb and find the RethinkDB configuration file and change the http-port value to the new port you'd like it to be on.
Then, if your system uses init.d, you should be able to restart with sudo service rethinkdb restart. If your system uses systemd, then you'll do something like this: sudo systemctl [restart|stop|start] rethinkdb
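For reference, a minimal sketch of the change described above, assuming the default Debian/Ubuntu packaging layout (the instance file name may differ on your machine):
# /etc/rethinkdb/instances.d/instance1.conf
# Serve the web UI on 9090 instead of the default 8080
http-port=9090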
These two links will be a good resource to you in this case:
https://www.rethinkdb.com/docs/config-file/
https://www.rethinkdb.com/docs/start-on-startup/

RabbitMQ settings disappear on restart. Why?

This is on EC2. I have an init script that does some basic setup like installing rabbitmq, creating a virtual host, user, setting permissions, etc. So basically it goes:
sudo yum --enablerepo=epel install rabbitmq-server
/etc/init.d/rabbitmq-server start
rabbitmqctl add_user username password
rabbitmqctl add_vhost vhost
rabbitmqctl set_permissions -p vhost username ".*" ".*" ".*"
rabbitmqctl stop
Then I exit the shell, and create an EBS image from the instance. Amazon automatically reboots the server to create the image.
Now the weird part...after a reboot everything was still set except the permissions.
Then when I started a new instance from the image, there was no username or host in rabbitmq.
Is there something that needs to be done in rabbitmq to save changes?
If the settings disappear when you "stop" and "restart" the instance, as opposed to rebooting it, it is because the IP address changes and the RabbitMQ settings are bound to the IP.
See RabbitMQ on Amazon EC2 instances
I think it may be this, from http://www.rabbitmq.com/ec2.html:
Persistent data on EBS device
RabbitMQ writes data to the following directories on Ubuntu:
/var/lib/rabbitmq/ to store persistent data like the messages or queues
/var/log/rabbitmq/ to store logs
If you want to use an EBS block device to store RabbitMQ data, just link these directories to your EBS device. Stop RabbitMQ before making any changes to the data directory:
$ /etc/init.d/rabbitmq-server stop
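A minimal sketch of that linking step, assuming your EBS volume is mounted at /ebs (the mount point is hypothetical):
/etc/init.d/rabbitmq-server stop
# Move the data and log directories onto the EBS volume, then symlink them back
mv /var/lib/rabbitmq /ebs/rabbitmq
mv /var/log/rabbitmq /ebs/rabbitmq-logs
ln -s /ebs/rabbitmq /var/lib/rabbitmq
ln -s /ebs/rabbitmq-logs /var/log/rabbitmq
/etc/init.d/rabbitmq-server start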
