NullPointerException when connecting Confluent Kafka and InfluxDB - apache-nifi

I'm trying to use the Confluent InfluxDB Sink Connector to get data from a Kafka topic into my InfluxDB.
First, I send data to the Kafka topic from a log file using NiFi, and that part works well. The Kafka topic receives the data, like below:
{
  "topic": "testDB5",
  "key": null,
  "value": {
    "timestamp": "2019-03-20 01:24:29,461",
    "measurement": "INFO",
    "thread": "NiFi Web Server-795",
    "class": "org.apache.nifi.web.filter.RequestLogger",
    "message": "Attempting request for (anonymous)"
  },
  "partition": 0,
  "offset": 0
}
Then I create the InfluxDB sink connector through the Kafka Connect UI, and I get the following exception:
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:587)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:323)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:226)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:194)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at io.confluent.influxdb.InfluxDBSinkTask.put(InfluxDBSinkTask.java:140)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:565)
... 10 more
But if I manually send data to another topic testDB1 using
./bin/kafka-avro-console-producer --broker-list localhost:9092 --topic testDB1 --property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"measurement","type":"string"},{"name":"timestamp","type":"string"}]}'
it works: my InfluxDB gets the data.
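For reference, the records typed into that console producer are plain JSON matching the declared Avro schema; a record built from the sample data above would look something like this:
{"measurement": "INFO", "timestamp": "2019-03-20 01:24:29,461"}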
Here is the connect configuration:
connector.class=io.confluent.influxdb.InfluxDBSinkConnector
influxdb.url=http://myurl
tasks.max=1
topics=testDB5
The configuration for topic testDB1 is the same except for the topic name.
Is there any problem in NiFi? It transmits data to the topic fine otherwise.

When you use Avro with Kafka Connect, the Avro deserialiser expects the data to have been serialised using the Avro serialiser. This is what the kafka-avro-console-producer uses, which is why your pipeline works when you use that.
This article gives a good background to Avro and the Schema Registry. See also Kafka Connect Deep Dive – Converters and Serialization Explained.
I'm not familiar with NiFi, but looking at the documentation it seems that AvroRecordSetWriter has the option to use the Confluent Schema Registry. At a guess, you'll also want to set Schema Write Strategy to Confluent Schema Registry Reference.
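If it helps as a starting point, the NiFi side would look roughly like the sketch below. The property names are taken from the NiFi documentation, but the exact options vary by NiFi version, so treat this as an assumption to verify against your release:
ConfluentSchemaRegistry (controller service)
    Schema Registry URLs: http://your-schema-registry:8081

AvroRecordSetWriter (controller service, used as the Record Writer of e.g. PublishKafkaRecord_1_0)
    Schema Write Strategy: Confluent Schema Registry Reference
    Schema Registry: ConfluentSchemaRegistry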
Once you can consume data from your topic with kafka-avro-console-consumer, you know that it is correctly serialised and will work with your Kafka Connect sink.
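For example, a quick check along these lines (assuming the Schema Registry is on its default port; adjust the hosts to your environment):
./bin/kafka-avro-console-consumer --bootstrap-server localhost:9092 --topic testDB5 --from-beginning --property schema.registry.url=http://localhost:8081
If the topic's data is not Avro-serialised, this command fails with a deserialisation error instead of printing records.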

I found the reason. In NiFi I used PublishKafka_0_10 to publish the data to the Kafka topic, but its version is too low!
When I run a query in KSQL, it says:
Input record ConsumerRecord(..data..) has invalid (negative) timestamp.
Possibly because a pre-0.10 producer client was used to write this record to Kafka without embedding a timestamp,
or because the input topic was created before upgrading the Kafka cluster to 0.10+. Use a different TimestampExtractor to process this data.
So I changed it to PublishKafka_1_0, started again, and it works! My InfluxDB gets the data. I'm speechless.
And thanks to Robin Moffatt for the reply; it was very helpful to me.

Related

Dynamically set topic name and Kafka Brokers in the PublishKafka_2_0 processor in NiFi

I am using the PublishKafka processor in NiFi to publish JSON data to Kafka. I want to set the Kafka server and topic name from attribute values, but I am not able to do this. It gives the error below:
PublishKafka_2_0[id=2cb333e9-5fd6-1254-a10c-d1084b230d03] Processing halted: yielding [1 sec]: org.apache.kafka.common.KafkaException: Failed to construct kafka producer
Caused by: org.apache.kafka.common.config.ConfigException: No resolvable bootstrap urls given in bootstrap.servers.
Only hardcoded values for the Kafka server and the topic name are working.
Am I missing something? Any help would be much appreciated!

Output JSON messages with Debezium rather than Avro

Is it possible to generate JSON messages with Debezium rather than Avro? I have a Debezium Kafka Connect adapter and I'm trying to get it to output JSON messages into the Kafka topic.
Is this possible?
You should be able to add these to get JSON output
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
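For context, those two lines go alongside the rest of the connector configuration. A minimal sketch is below; the connector name, connector class, and omitted database settings are placeholders for whatever you already have, and the schemas.enable flags are optional (false gives plain JSON payloads, true wraps each message in a schema/payload envelope):
{
  "name": "my-debezium-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "false",
    "value.converter.schemas.enable": "false"
  }
}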

Does VoltDB's Kafka Sink Connector support Avro messages?

Does VoltDB's Kafka connector support reading Avro messages? Below is my attempt to configure the connector:
{
  "name": "KafkaSinkConnector",
  "config": {
    "connector.class": "org.voltdb.connect.kafka.KafkaSinkConnector",
    "tasks.max": "1",
    "voltdb.servers": "node1:21212,node2:21212,node3:21212",
    "voltdb.procedure": "USER.insert",
    "voltdb.connection.user": "test",
    "voltdb.connection.password": "test",
    "topics": "user",
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter.schema.registry.url": "http://kafka-schema-registry:8081",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://kafka-schema-registry:8081",
    "key.converter.schemas.enable": "true",
    "value.converter.schemas.enable": "true"
  }
}
The connector loads and no error messages are displayed, but nothing is written to the DB.
Unfortunately, neither the Kafka importer nor the sink connector currently support Avro messages. This is a good feature request that could be added in the future, however.
Full disclosure: I work for VoltDB.
-Andrew

Message Hub & Confluent Kafka Connect S3

I have a requirement to consume messages from an IBM Message Hub topic into IBM Cloud Object Storage.
I got it working with a local Kafka server and the Confluent Kafka Connect S3 plugin as a standalone worker, sinking to an Amazon S3 bucket and to a file. Both were a success.
If I configure Confluent Kafka Connect S3 as a distributed worker against the IBM Message Hub cluster, I get no errors, but still no messages end up in the Amazon S3 bucket. I tried the file sink as well; no luck either.
Is it possible at all?
You could try using the Message Hub (now known as Event Streams) Cloud Object Storage bridge: https://console.bluemix.net/docs/services/MessageHub/messagehub115.html#cloud_object_storage_bridge
It seems to match your requirement.
From: https://kafka.apache.org/documentation/#connect_running
The parameters that are configured here are intended for producers and consumers used by Kafka Connect to access the configuration, offset and status topics. For configuration of Kafka source and Kafka sink tasks, the same parameters can be used but need to be prefixed with consumer. and producer. respectively. The only parameter that is inherited from the worker configuration is bootstrap.servers, which in most cases will be sufficient, since the same cluster is often used for all purposes. A notable exception is a secured cluster, which requires extra parameters to allow connections. These parameters will need to be set up to three times in the worker configuration, once for management access, once for Kafka sinks and once for Kafka sources.
So the solution was adding duplicate configuration with the consumer. prefix into the worker configuration, so that the required SASL_SSL settings took effect on the sink's consumer instead of the defaults.
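For illustration, the consumer.-prefixed block in the distributed worker properties would be along these lines; the exact mechanism and credentials come from your Message Hub service credentials, and the values below are placeholders:
consumer.security.protocol=SASL_SSL
consumer.sasl.mechanism=PLAIN
consumer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="<mhub-user>" password="<mhub-api-key>";
consumer.ssl.protocol=TLSv1.2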
IBM Cloud Object Storage also works. It requires credentials, e.g. the env vars AWS_ACCESS_KEY_ID="see cos credentials" and AWS_SECRET_ACCESS_KEY="see cos credentials".
Connector config:
{
  "name": "s3-sink",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "tasks.max": "5",
    "topics": "your-topic",
    "s3.region": "eu-central-1",
    "store.url": "https://s3.eu-geo.objectstorage.softlayer.net",
    "s3.bucket.name": "your-bucket",
    "s3.part.size": "5242880",
    "flush.size": "1",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "partitioner.class": "io.confluent.connect.storage.partitioner.DefaultPartitioner",
    "schema.compatibility": "NONE",
    "name": "s3-sink"
  }
}
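For completeness, a distributed worker takes this JSON through the Kafka Connect REST API. Assuming the worker listens on the default port 8083 and the config above is saved as s3-sink.json, the call would look something like:
curl -X POST -H "Content-Type: application/json" --data @s3-sink.json http://localhost:8083/connectors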

GetKafka not getting messages in Apache NiFi

I'm trying to get messages using a GetKafka processor, but it isn't receiving anything. I tested consuming messages using the Kafka command-line consumer and it works. I was also able to use the PutKafka processor successfully to put messages into the topic. Attached are my settings, where I set the Zookeeper Connection String and Topic Name. When I run the flow, I don't see any errors in the processors.
GetKafka Processor
I see an exception in nifi-app.log:
2016-08-03 09:34:33,722 WARN [70e1df87-6097-4ed0-9a40-7e36f9be6921_mydomain.com-1470231250839-1fbd0cfe-leader-finder-thread] kafka.client.ClientUtils$ Fetching topic metadata with correlation id 0 for topics [Set(test)] from broker [id:0,host:DataSlave1.CSE-RD.com,port:9092] failed
java.nio.channels.ClosedByInterruptException: null
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) ~[na:1.8.0_101]
at sun.nio.ch.SocketChannelImpl.poll(SocketChannelImpl.java:957) ~[na:1.8.0_101]
Kafka (0.8): 2.10-0.8.2.1, NiFi: 0.7.0
Am I missing anything? Thanks.
The exception went away after a restart. GetKafka is now able to get messages as they are sent by the producer. It was not receiving the earlier messages already in the topic (something equivalent to --from-beginning in the Kafka console consumer); I don't see a setting for that in the processor.
