Hadoop to Kafka Rest proxy - hadoop

How can I make REST calls from a Hadoop cluster to a Confluent Kafka cluster using the REST Proxy?
Any architecture reference would be helpful.
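
For what it's worth, any node in the Hadoop cluster that can reach the REST Proxy over HTTP can produce to a topic with a plain POST to /topics/{topic}. Below is a minimal sketch using only the Python standard library. The host name rest-proxy.example.com, the topic name and the helper function names are hypothetical; 8082 is the REST Proxy's default port, and the content type shown is the v2 JSON embedded format. Authentication and TLS, if your proxy requires them, are not covered here.

```python
import json
import urllib.request

# Hypothetical REST Proxy address; 8082 is the proxy's default port.
REST_PROXY_URL = "http://rest-proxy.example.com:8082"


def build_produce_request(base_url, topic, values):
    """Build a POST /topics/{topic} request carrying JSON-encoded records."""
    body = json.dumps({"records": [{"value": v} for v in values]}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/topics/{topic}",
        data=body,
        method="POST",
        headers={
            # v2 API, JSON embedded format
            "Content-Type": "application/vnd.kafka.json.v2+json",
            "Accept": "application/vnd.kafka.v2+json",
        },
    )


def produce(base_url, topic, values, timeout=10):
    """Send the records and return the proxy's parsed JSON response."""
    req = build_produce_request(base_url, topic, values)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())
```

Calling produce(REST_PROXY_URL, "events", [{"id": 1}]) would send one record. The same call can be issued from a MapReduce or Spark task, as long as the worker nodes have network access to the proxy.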

Related

How to get Kafka brokers in a cluster using Spring Boot Kafka?

I have a Spring Boot (2.3.3) service using spring-kafka to currently access a dedicated Kafka/Zookeeper configuration. I have been using the application.properties setting spring.kafka.bootstrap-servers=localhost:9092 to access my dev/test Apache Kafka service.
However, in production, we have a cluster of Kafka brokers (on many servers) configured in Zookeeper, and I have been asked to modify my service to query Zookeeper to get the list of brokers and use that list instead of the bootstrap servers configuration. The reason is that our DevOps folks have been known to reconfigure servers/nodes and Kafka brokers.
Basically, I have been asked to make my service agnostic to where the Apache Kafka brokers are running. All my service needs to know is how to get the list of brokers (bootstrap server info including host and port) from Zookeeper.
Is there a way in spring-boot and spring-kafka to retrieve from Zookeeper the broker list and use that broker (aka bootstrap server) list in my service?
Spring delegates to the kafka-clients for all connections; for a long time now, the kafka-clients no longer connect to Zookeeper, only to the brokers themselves.
There is no built-in support in Spring for querying the Zookeeper to determine the broker list.
Furthermore, in a future Kafka version, Zookeeper is going away altogether; see KIP-500.

How do I send a REST request in Kafka without using confluent platform?

I'm trying to run Kafka Connect in distributed mode. Is REST the only way to start the connector? If not, what commands should I use?
The Kafka Connect API is part of Apache Kafka. You can run it standalone, or as part of Confluent Platform.
Information on how to configure it is here. If you are using distributed mode, you have to use the REST API to configure it.
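
As a rough illustration of what that REST call looks like, here is a minimal sketch that creates a connector by POSTing a name and a config map to /connectors on a Connect worker. It assumes a worker listening on localhost:8083 (the default REST port); the connector name, topic and file path are made up, and the FileStreamSink connector is used only because it ships with Apache Kafka (newer releases may need it added to the worker's plugin path).

```python
import json
import urllib.request

# Hypothetical Connect worker address; 8083 is the default REST port.
CONNECT_URL = "http://localhost:8083"


def build_create_connector_request(base_url, name, config):
    """Build a POST /connectors request with the connector name and config."""
    payload = {"name": name, "config": config}
    return urllib.request.Request(
        url=f"{base_url}/connectors",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def create_connector(base_url, name, config, timeout=10):
    """Submit the connector to the worker and return its parsed JSON reply."""
    req = build_create_connector_request(base_url, name, config)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())


# Illustrative config: write records from one topic to a local file.
example_config = {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
    "tasks.max": "1",
    "topics": "demo-topic",
    "file": "/tmp/demo-sink.txt",
}
```

With a worker running, create_connector(CONNECT_URL, "demo-file-sink", example_config) would register the connector; no Confluent Platform component is involved.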

data stream between Kerberized kafka cluster to hadoop cluster using Spring boot

I have a streaming use case: a Spring Boot application that reads data from a Kafka topic and writes it to an HDFS path. Kafka and Hadoop run on two distinct clusters.
The application worked fine when the Kafka cluster had no Kerberos authentication and only the Hadoop cluster was Kerberized.
The issues started once both clusters were Kerberized: at any given time I can authenticate to only one of them.
I did some analysis and searching but could not find much help.
My theory is that we cannot authenticate to two Kerberized clusters from the same JVM instance, because the realm and KDC details we set in code are JVM-wide rather than client-specific.
It is also possible that I did not use the proper APIs; I am very new to Spring Boot.
I know we can do this by setting up cross-realm trust between the clusters, but I am looking for an application-level solution if possible.
I have a few questions:
Is it possible to authenticate to two separate Kerberized clusters from the same JVM instance? If so, how? Use of Spring Boot is preferred.
What would be the best solution to stream data from kafka cluster to hadoop cluster.
Kafka's Connect API is for streaming integration of sources and targets with Kafka, using just configuration files - no coding! The HDFS connector is what you want, and supports Kerberos authentication. It is open source and available standalone or as part of Confluent Platform.
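
To give a feel for the configuration involved, the sketch below builds an HDFS sink connector config with Kerberos enabled and prints the JSON you would submit to the Connect REST API. The property names are the ones I recall from the Confluent HDFS sink connector documentation and should be checked against the version you install; the topic, URLs, principals and keytab path are all placeholders. Authentication to the Kerberized Kafka cluster is configured separately on the Connect worker, not in this connector config.

```python
import json


def hdfs_sink_config(topics, hdfs_url, principal, keytab, namenode_principal):
    """Return an HDFS sink config with Kerberos login for the HDFS side."""
    return {
        "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
        "tasks.max": "1",
        "topics": topics,
        "hdfs.url": hdfs_url,
        # Number of records written before a file is committed to HDFS.
        "flush.size": "1000",
        # Kerberos settings for HDFS (verify names against your connector version).
        "hdfs.authentication.kerberos": "true",
        "connect.hdfs.principal": principal,
        "connect.hdfs.keytab": keytab,
        "hdfs.namenode.principal": namenode_principal,
    }


# Placeholder values for illustration only.
payload = {
    "name": "hdfs-sink",
    "config": hdfs_sink_config(
        topics="events",
        hdfs_url="hdfs://namenode.example.com:8020",
        principal="connect-hdfs@EXAMPLE.COM",
        keytab="/etc/security/keytabs/connect-hdfs.keytab",
        namenode_principal="nn/_HOST@EXAMPLE.COM",
    ),
}

print(json.dumps(payload, indent=2))
```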

Is there a CloudFormation template for DC/OS, ElasticSearch, Kafka Connect and Kafka Streams?

There are a lot of examples of the SMACK stack, but in my infrastructure I would like to use ElasticSearch and Confluent Kafka Connect and Kafka Streams.
There is a great tutorial on deploying a CloudFormation-based SMACK stack environment and another on creating an IoT pipeline with SMACK as well.
Since I am working on a Lambda architecture, I am starting with my batch data using ElasticSearch (not Cassandra), and I would like to know whether there are CloudFormation templates that use Kafka Connect and ElasticSearch. Eventually we want to use Kafka Streams with InfluxDB.
DC/OS has AWS CloudFormation templates and install instructions. Once you have DC/OS installed you can install ElasticSearch and Kafka from the Mesosphere Universe as DC/OS packages.

Kafka-Connect vs Filebeat & Logstash

I'm looking to consume from Kafka and save data into Hadoop and Elasticsearch.
I've seen 2 ways of doing this currently: using Filebeat to consume from Kafka and send it to ES and using Kafka-Connect framework. There is a Kafka-Connect-HDFS and Kafka-Connect-Elasticsearch module.
I'm not sure which one to use to send streaming data. One thing I have noticed: if at some point I want to take data from Kafka and place it into Cassandra, I can use a Kafka-Connect module for that, but no such feature exists for Filebeat.
Kafka Connect can handle streaming data and is a bit more flexible. If you are only sending data to Elasticsearch, Filebeat is a clean integration for log sources. However, if you are going from Kafka to a number of different sinks, Kafka Connect is probably what you want. I'd recommend checking out the connector hub to see some examples of the open source connectors currently at your disposal: http://www.confluent.io/product/connectors/
