Output JSON messages with Debezium rather than Avro - apache-kafka-connect

Is it possible to generate JSON messages with Debezium rather than Avro? I have a Debezium Kafka Connect adapter and I'm trying to get it to output JSON messages into the Kafka topic.
Is this possible?

You should be able to add these converter settings to get JSON output:
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
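By default the JsonConverter also embeds a schema envelope in every message. If you want plain JSON payloads without it, these additional settings should work (a minimal sketch; these are standard Kafka Connect converter properties):
key.converter.schemas.enable=false
value.converter.schemas.enable=false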

Related

Formatting data to use Confluent JDBC Sink Connector via ksql

I'd like to use the Confluent JDBC Sink Connector via ksql to write to a ClickHouse database.
I have a C# application that writes data to a Kafka topic. How can I format the message from my application so that the sink can write it to the database? I don't want to use the Schema Registry or other ksql constructs.
KSQL accepts JSON or CSV data. However, ClickHouse has its own Kafka connector, so you shouldn't need the JDBC Sink, which only works with messages that carry a schema (meaning you would need the Schema Registry, which is not a KSQL-only construct and can be used from your C# code as well).
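If you do stay with the JDBC Sink and the JsonConverter (with schemas.enable=true), the messages your C# application produces would need the schema/payload envelope the converter expects. A rough sketch of such a message, with illustrative field names, might look like:
{
  "schema": {
    "type": "struct",
    "name": "example_record",
    "optional": false,
    "fields": [
      {"field": "id", "type": "int32", "optional": false},
      {"field": "name", "type": "string", "optional": true}
    ]
  },
  "payload": {"id": 42, "name": "example"}
}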

Kafka-Connect : Can source and sink use same topic?

I am using Kafka-MongoDB-Connect where
the Source is the Debezium Connector
the Sink is the Kafka-MongoDB-Connector by MongoDB
Can I use the same topic and MongoDB collection for both the Source and Sink Connectors? Will it cause messages to go into an infinite loop?

NullPointerException when connecting Confluent Kafka and InfluxDB

I'm trying to use the Confluent InfluxDB Sink Connector to get data from a Kafka topic into my InfluxDB.
First, I transmit data to the Kafka topic from a log file using NiFi, and it works well. The Kafka topic gets the data, like below:
{
"topic": "testDB5",
"key": null,
"value": {
"timestamp": "2019-03-20 01:24:29,461",
"measurement": "INFO",
"thread": "NiFi Web Server-795",
"class": "org.apache.nifi.web.filter.RequestLogger",
"message": "Attempting request for (anonymous)
},
"partition": 0,
"offset": 0
}
Then, I create the InfluxDB sink connector through the Kafka Connect UI, and I get the following exception:
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:587)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:323)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:226)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:194)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at io.confluent.influxdb.InfluxDBSinkTask.put(InfluxDBSinkTask.java:140)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:565)
... 10 more
But if I manually input data into another topic testDB1 using
./bin/kafka-avro-console-producer --broker-list localhost:9092 --topic testDB1 --property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"measurement","type":"string"},{"name":"timestamp","type":"string"}]}'
It works, and my InfluxDB gets the data.
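For reference, that console producer reads one JSON-encoded record per line from stdin, matching the declared schema; an illustrative input line would be:
{"measurement": "INFO", "timestamp": "2019-03-20 01:24:29,461"}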
Here is the connect configuration:
connector.class=io.confluent.influxdb.InfluxDBSinkConnector
influxdb.url=http://myurl
tasks.max=1
topics=testDB5
The configuration for topic testDB1 is the same except for the topics name.
Is there any problem with NiFi? It seems to transmit data to the topic fine.
When you use Avro with Kafka Connect, the Avro deserialiser expects the data to have been serialised using the Avro serialiser. This is what the kafka-avro-console-producer uses, which is why your pipeline works when you use that.
This article gives a good background to Avro and the Schema Registry. See also Kafka Connect Deep Dive – Converters and Serialization Explained.
I'm not familiar with Nifi, but looking at the documentation it seems that AvroRecordSetWriter has the option to use Confluent Schema Registry. At a guess you'll also want to set Schema Write Strategy to Confluent Schema Registry Reference.
Once you can consume data from your topic with kafka-avro-console-consumer, you know that it is correctly serialised and will work with your Kafka Connect sink.
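As a quick check, something along these lines should print the records if they are valid Avro (the broker and Schema Registry addresses here are assumptions for a local setup):
./bin/kafka-avro-console-consumer --bootstrap-server localhost:9092 --topic testDB5 --from-beginning --property schema.registry.url=http://localhost:8081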
I found the reason. It's because in NiFi I used PublishKafka_0_10 to publish the data to the Kafka topic, but its version is too low!
When I make a query in ksql, it says that
Input record ConsumerRecord(..data..) has invalid (negative) timestamp.
Possibly because a pre-0.10 producer client was used to write this record to Kafka without embedding a timestamp,
or because the input topic was created before upgrading the Kafka cluster to 0.10+. Use a different TimestampExtractor to process this data.
So, I changed it to PublishKafka_1_0, started again, and it works! My InfluxDB can get the data. I'm speechless.
And thanks to Robin Moffatt for the reply; it's very helpful to me.

Does VoltDB's Sink Connector support Avro messages?

Does VoltDB's Kafka connector support reading Avro messages? Below is my attempt to configure the connector:
{
"name":"KafkaSinkConnector",
"config":{
"connector.class":"org.voltdb.connect.kafka.KafkaSinkConnector",
"tasks.max":"1",
"voltdb.servers":"node1:21212,node2:21212,node3:21212",
"voltdb.procedure":"USER.insert",
"voltdb.connection.user":"test",
"voltdb.connection.password":"test",
"topics":"user",
"key.converter":"io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url":"http://kafka-schema-registry:8081",
"value.converter":"io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url":"http://kafka-schema-registry:8081",
"key.converter.schemas.enable":"true",
"value.converter.schemas.enable":"true",
}
}
The connector loads and no error messages are displayed, but nothing is written to the DB.
Unfortunately, neither the Kafka importer nor the sink connector currently supports Avro messages. This is a good feature request that could be added in the future, however.
Full disclosure: I work for VoltDB.
-Andrew

Kafka-Connect vs Filebeat & Logstash

I'm looking to consume from Kafka and save data into Hadoop and Elasticsearch.
I've seen two ways of doing this currently: using Filebeat to consume from Kafka and send it to ES, and using the Kafka Connect framework. There are Kafka-Connect-HDFS and Kafka-Connect-Elasticsearch modules.
I'm not sure which one to use to send streaming data. Though I think that if at some point I want to take data from Kafka and place it into Cassandra, I can use a Kafka Connect module for that, but no such feature exists for Filebeat.
Kafka Connect can handle streaming data and is a bit more flexible. If you are just going to Elasticsearch, Filebeat is a clean integration for log sources. However, if you are going from Kafka to a number of different sinks, Kafka Connect is probably what you want. I'd recommend checking out the connector hub to see some examples of the open source connectors currently at your disposal: http://www.confluent.io/product/connectors/
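For a sense of what the Kafka Connect route involves, a minimal Elasticsearch sink configuration might resemble the sketch below (the topic name and connection URL are assumptions, not values from your setup):
name=elasticsearch-sink
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
tasks.max=1
topics=my-topic
connection.url=http://localhost:9200
type.name=_doc
key.ignore=true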
