Message Hub & Confluent Kafka Connect S3 - message-hub

I have a requirement to consume messages from an IBM MHub topic into IBM Object Storage.
I got it working with a local Kafka server and the Confluent Kafka Connect S3 plugin as a standalone worker, sinking to an Amazon S3 bucket and to a file. Both were a success.
When I configure Confluent Kafka Connect S3 as a distributed worker for the IBM MHub cluster, I get no errors, but still no messages end up in the Amazon S3 bucket. I tried the file sink as well, with no luck either.
Is it possible at all?

You could try using the Message Hub (now known as Event Streams) Cloud Object Storage bridge: https://console.bluemix.net/docs/services/MessageHub/messagehub115.html#cloud_object_storage_bridge
It seems to match your requirement.

From: https://kafka.apache.org/documentation/#connect_running
The parameters that are configured here are intended for producers and consumers used by Kafka Connect to access the configuration, offset and status topics. For configuration of Kafka source and Kafka sink tasks, the same parameters can be used but need to be prefixed with consumer. and producer. respectively. The only parameter that is inherited from the worker configuration is bootstrap.servers, which in most cases will be sufficient, since the same cluster is often used for all purposes. A notable exception is a secured cluster, which requires extra parameters to allow connections. These parameters will need to be set up to three times in the worker configuration, once for management access, once for Kafka sinks and once for Kafka sources.
So the solution was to add duplicate configuration with the consumer. prefix to the worker configuration, so that the required SASL_SSL settings took effect for the sink consumer instead of the defaults.
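For reference, a minimal sketch of the relevant part of the distributed worker configuration, with placeholders instead of real Message Hub credentials (the exact set of properties may vary for your cluster):
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="<USERNAME>" password="<PASSWORD>";
consumer.security.protocol=SASL_SSL
consumer.sasl.mechanism=PLAIN
consumer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="<USERNAME>" password="<PASSWORD>";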
IBM Cloud Object Storage also works. It requires credentials, e.g. the environment variables AWS_ACCESS_KEY_ID="see cos credentials" and AWS_SECRET_ACCESS_KEY="see cos credentials".
Connector config:
{
  "name": "s3-sink",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "tasks.max": "5",
    "topics": "your-topic",
    "s3.region": "eu-central-1",
    "store.url": "https://s3.eu-geo.objectstorage.softlayer.net",
    "s3.bucket.name": "your-bucket",
    "s3.part.size": "5242880",
    "flush.size": "1",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "partitioner.class": "io.confluent.connect.storage.partitioner.DefaultPartitioner",
    "schema.compatibility": "NONE",
    "name": "s3-sink"
  }
}
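Because this is a distributed worker, the connector config above is submitted through the Kafka Connect REST API, e.g. (assuming the default REST port 8083 and the JSON saved as s3-sink.json):
curl -X POST -H "Content-Type: application/json" --data @s3-sink.json http://localhost:8083/connectors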

Related

How to get Kafka brokers in a cluster using Spring Boot Kafka?

I have a Spring Boot (2.3.3) service using spring-kafka to currently access a dedicated Kafka/Zookeeper configuration. I have been using the application.properties setting spring.kafka.bootstrap-servers=localhost:9092 to access my dev/test Apache Kafka service.
However, in production we have a cluster of Kafka brokers (on many servers) configured in Zookeeper, and I have been asked to modify my service to query Zookeeper to get the list of brokers and use that list instead of the bootstrap servers configuration. The reason: our DevOps folks have been known to reconfigure servers/nodes and Kafka brokers.
Basically, I have been asked to make my service agnostic to where the Apache Kafka brokers are running. All my service needs to know is how to get the list of brokers (bootstrap server info including host and port) from Zookeeper.
Is there a way in spring-boot and spring-kafka to retrieve from Zookeeper the broker list and use that broker (aka bootstrap server) list in my service?
Spring delegates to the kafka-clients for all connections; for a long time now, the kafka-clients no longer connect to Zookeeper, only to the brokers themselves.
There is no built-in support in Spring for querying Zookeeper to determine the broker list.
Furthermore, in a future Kafka version, Zookeeper is going away altogether; see KIP-500.
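If the underlying goal is just to discover the current broker list at runtime, one option is to ask the brokers themselves through the AdminClient, which only needs a single reachable bootstrap address (a minimal sketch; the broker addresses are placeholders):

import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.Node;

public class BrokerDiscovery {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Any reachable broker works as a bootstrap address; the client
        // discovers the rest of the cluster from broker metadata.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            for (Node node : admin.describeCluster().nodes().get()) {
                System.out.println(node.host() + ":" + node.port());
            }
        }
    }
}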

Spring boot connect to alibaba e-mapreduce kafka

I'm trying to connect a Spring Boot Kafka app to Kafka on Alibaba Cloud.
The cluster is on the E-MapReduce service.
However, I can't connect from the Boot app, maybe due to some security credentials that I need to provide?
I've already tried to set the boot properties as follows:
spring.kafka.properties.security.protocol=SSL
I get the error: Connection to node -1 (/xx.xx.xx.xx:9092) terminated during authentication. This may happen due to any of the following reasons: (1) Authentication failed due to invalid credentials with brokers older than 1.0.0, (2) Firewall blocking Kafka TLS traffic (eg it may only allow HTTPS traffic), (3) Transient network issue.
spring.kafka.properties.security.protocol=SASL_SSL
This throws: Caused by: java.lang.IllegalArgumentException: Could not find a 'KafkaClient' entry in the JAAS configuration. System property 'java.security.auth.login.config' is not set
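For reference, that second error means the client has no JAAS entry for SASL. Assuming the cluster expects SASL/PLAIN credentials, one way to supply the entry from Spring Boot is inline (placeholders, not real values):
spring.kafka.properties.sasl.mechanism=PLAIN
spring.kafka.properties.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="<USERNAME>" password="<PASSWORD>";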
Does anybody have experience connecting to Kafka on Alibaba Cloud?
I believe Kafka Connect could solve your problem of connecting a Spring Boot Kafka app to Kafka on Alibaba Cloud:
Step 1: Create Kafka clusters
Create a source Kafka cluster and a target Kafka cluster in E-MapReduce.
Step 2: Create a topic for storing the data to be migrated
Create a topic named connect in the source Kafka cluster.
Step 3: Create a Kafka Connect connector
Use Secure Shell (SSH) to log on to the header node of the source Kafka cluster.
Optional: customize the Kafka Connect configuration.
Step 4: View the status of the Kafka Connect connector and task node
View the status of the Kafka Connect connector and task node and make sure that they are in a normal state (see the status-check example after the link below).
Follow the remaining steps as your job requires.
Detailed instructions can be found in the "Use Kafka Connect to migrate data" guide: https://www.alibabacloud.com/help/doc-detail/127685.htm
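For step 4, the connector and task status can usually be checked through the Connect REST API on the header node, e.g. (a sketch, assuming the default port 8083 and a hypothetical connector named connect-demo):
curl http://localhost:8083/connectors/connect-demo/status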
Hope this helps.

NullPointerException when connecting Confluent Kafka and InfluxDB

I'm trying to use the Confluent InfluxDB Sink Connector to get data from a Kafka topic into my InfluxDB.
First, I transmit data to the Kafka topic from a log file using NiFi, and it works well. The Kafka topic gets the data, like below:
{
  "topic": "testDB5",
  "key": null,
  "value": {
    "timestamp": "2019-03-20 01:24:29,461",
    "measurement": "INFO",
    "thread": "NiFi Web Server-795",
    "class": "org.apache.nifi.web.filter.RequestLogger",
    "message": "Attempting request for (anonymous)"
  },
  "partition": 0,
  "offset": 0
}
Then I create the InfluxDB sink connector through the Kafka Connect UI, and I get the following exception:
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:587)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:323)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:226)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:194)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at io.confluent.influxdb.InfluxDBSinkTask.put(InfluxDBSinkTask.java:140)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:565)
... 10 more
But if I manually input data into another topic testDB1 using
./bin/kafka-avro-console-producer --broker-list localhost:9092 --topic testDB1 --property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"measurement","type":"string"},{"name":"timestamp","type":"string"}]}'
It works; my InfluxDB can get the data.
Here is the connector configuration:
connector.class=io.confluent.influxdb.InfluxDBSinkConnector
influxdb.url=http://myurl
tasks.max=1
topics=testDB5
The configuration for topic testDB1 is the same except for the topic name.
Is there any problem in NiFi? It seems to transmit data to the topic just fine.
When you use Avro with Kafka Connect, the Avro deserialiser expects the data to have been serialised using the Avro serialiser. This is what the kafka-avro-console-producer uses, which is why your pipeline works when you use that.
This article gives a good background to Avro and the Schema Registry. See also Kafka Connect Deep Dive – Converters and Serialization Explained.
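In Kafka Connect terms, that usually means the sink is configured with the Avro converter, for example (a sketch, assuming a Schema Registry on the default local port):
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081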
I'm not familiar with Nifi, but looking at the documentation it seems that AvroRecordSetWriter has the option to use Confluent Schema Registry. At a guess you'll also want to set Schema Write Strategy to Confluent Schema Registry Reference.
Once you can consume data from your topic with kafka-avro-console-consumer then you know that it is correctly serialised and will work with your Kafka Connect sink.
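For example (a sketch, assuming a local broker and Schema Registry):
./bin/kafka-avro-console-consumer --bootstrap-server localhost:9092 --topic testDB5 --from-beginning --property schema.registry.url=http://localhost:8081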
I found the reason. It's because in NiFi I used PublishKafka_0_10 to publish the data to the Kafka topic, but its version is too low!
When I make a query in ksql, it says that
Input record ConsumerRecord(..data..) has invalid (negative) timestamp.
Possibly because a pre-0.10 producer client was used to write this record to Kafka without embedding a timestamp,
or because the input topic was created before upgrading the Kafka cluster to 0.10+. Use a different TimestampExtractor to process this data.
So I changed it to PublishKafka_1_0, started again, and it works! My InfluxDB can get the data. I'm speechless.
And thanks to Robin Moffatt for the reply, it's very helpful to me.

MirrorMaker Kafka 0.9 connection to Kafka Brokers 0.10 (IBM Message Hub)

I am trying to connect my MirrorMaker Kafka 0.9 to the Kafka brokers 0.10 (IBM Message Hub) without success. The links I have followed are below, but they are mostly for Kafka clients 0.10:
https://console.bluemix.net/docs/services/MessageHub/messagehub050.html#kafka_using https://console.bluemix.net/docs/services/MessageHub/messagehub063.html#kafka_connect
Do you know the steps for Kafka clients 0.9, and how to use the MessageHubLoginModule and create the JAAS configuration?
UPDATE
After various tests, the following solution works correctly.
In order to connect to IBM Message Hub with a Cloudera MirrorMaker, you must set within Cloudera Manager the property Source Kafka Cluster's Security Protocol (source.security.protocol) to PLAINTEXT and pass the following properties as the Kafka MirrorMaker Advanced Configuration Snippet (Safety Valve) for mirror_maker_consumers.properties:
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
ssl.protocol=TLSv1.2
ssl.enabled.protocols=TLSv1.2
ssl.endpoint.identification.algorithm=HTTPS
It worked for me.
First, you should not be building a new Message Hub application using Kafka 0.9.
We've deprecated the custom login module that 0.9 requires, and our newer clusters won't support it. You should be using a Kafka client >= 0.10.2, as they properly support SASL PLAIN authentication, which is required by Message Hub. The newer Kafka clients offer many more features and are just better.
In case you're absolutely stuck with 0.9, you need:
The following properties set:
security.protocol=SASL_SSL
ssl.protocol=TLSv1.2
ssl.enabled.protocols=TLSv1.2
A JAAS file containing:
KafkaClient {
com.ibm.messagehub.login.MessageHubLoginModule required
serviceName="kafka"
username="<USERNAME>"
password="<PASSWORD>";
};
The custom login module JAR on the classpath:
The file is available on Github: https://github.com/ibm-messaging/message-hub-samples/blob/master/kafka-0.9/message-hub-login-library/messagehub.login-1.0.0.jar
The java.security.auth.login.config Java property set:
It needs to point to your JAAS file and can be set either:
on the Java command line using -Djava.security.auth.login.config=<PATH TO JAAS>, or
programmatically using System.setProperty("java.security.auth.login.config", "<PATH TO JAAS>");
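For example, with the MirrorMaker shell scripts the property is often passed via KAFKA_OPTS (a sketch, assuming the JAAS file was saved as /path/to/jaas.conf):
export KAFKA_OPTS="-Djava.security.auth.login.config=/path/to/jaas.conf"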

Entry point in kafka cluster after sending the message

We have a Kafka cluster that we built with Kafka from this Maven repository: https://mvnrepository.com/artifact/org.apache.kafka/kafka_2.10/0.10.0.0
We are using another Kafka client library to send messages into this cluster. I would like to log the messages that are sent to this cluster and send them to AWS CloudWatch (my cluster is running on AWS EC2).
How can I achieve this?
Also, once a producer has sent a message, what is the entry point in the code that will be invoked in this Kafka library?
-Thanks!
