Process messages with null inside an array in Kafka Connect S3 connector - Parquet

I'm using Kafka Connect with 2 connectors:
Debezium to pull data from Postgres into Kafka
S3 connector to save data from Kafka to S3
While running, I got this error from the S3 connector:
java.lang.NullPointerException: Array contains a null element at 0
I found the related message, which contains the following:
"some_key": [
"XCVB",
null
]
How can I process this message?
I have tried adding the following to the S3 connector config:
"behavior.on.null.values": "ignore",
"errors.tolerance": "all",
"errors.deadletterqueue.topic.name":"dlq_s3_sink"
to try to skip those messages and send them to the DLQ, but it doesn't seem to be working and the task still fails with this error. I also saw this in the log:
Set parquet.avro.write-old-list-structure=false to turn on support for arrays with null elements.
but I'm not sure where I should add this. As part of the connector config?

Add parquet.avro.write-old-list-structure: false to the sink connector config.
Also make sure you are using version 10.1.0 or above of the S3 sink connector.
Reference: the version 10.1.0 changelog entry "PR-485 - CCMSG-1531: Support null items within arrays with Parquet writer": https://docs.confluent.io/kafka-connectors/s3-sink/current/changelog.html
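For reference, a minimal sketch of the relevant part of the sink config, combining the Parquet flag with the DLQ settings from the question (the connector and format classes are the standard Confluent S3 sink values; the rest of your config stays as-is):
{
  "connector.class": "io.confluent.connect.s3.S3SinkConnector",
  "format.class": "io.confluent.connect.s3.format.parquet.ParquetFormat",
  "parquet.avro.write-old-list-structure": "false",
  "behavior.on.null.values": "ignore",
  "errors.tolerance": "all",
  "errors.deadletterqueue.topic.name": "dlq_s3_sink"
}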

Related

Kafka Connect JDBC Source running even when query fails

I'm running a JDBC source connector and trying to monitor its status via the exposed JMX metrics and a Prometheus exporter. However, the status of the connector and all its tasks stays in the running state when the query fails or the DB can't be reached.
In earlier versions, it seems no value for source-record-poll-total in the source-task-metrics was exported when the query failed; in the versions I use (connect-runtime-6.2.0-ccs, confluentinc-kafka-connect-jdbc-10.2.0, jmx_prometheus_javaagent-0.14.0), the metric is exported with value 0.0 even when the query fails.
Any ideas how I could detect such a failing query or DB connection?
This is resolved in version 10.2.4 of the JDBC connector. Tasks now fail when a SQLNonTransientException occurs, and this can be detected using the exported metrics. See https://github.com/confluentinc/kafka-connect-jdbc/pull/1096
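Once tasks actually fail, you can also detect it directly through the standard Kafka Connect REST API instead of (or in addition to) JMX; a minimal example, with the host and connector name below as placeholders:
curl -s http://localhost:8083/connectors/my-jdbc-source/status
A failed task shows up in the response's tasks array with "state": "FAILED" and a stack trace in its trace field, which is easy to poll or alert on.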

Solace creates multiple connections with Spring Boot

I'm trying to use Solace with Spring Boot.
Here is a demo application and the log it creates:
https://github.com/GreenRover/solace_spring_multiconnection/blob/master/problem.log
I'm wondering about this error (INFO) message:
c.s.jcsmp.impl.SessionModeSupport .... - Error Response (400) - Already Exists
Is it normal to get this message, or is something going wrong?
This message indicates that a queue failed to be created because a queue with the same name already exists.
This is expected, since your sample code tries to create a queue with the same name more than once. It is OK to ignore this message.
However, if you want to avoid the message, the application has to make sure it creates a queue with a given name only once.
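If you are provisioning the queue yourself through JCSMP (the API shown in your log), you can also suppress the error on the API side with the FLAG_IGNORE_ALREADY_EXISTS provisioning flag. A minimal sketch, assuming you already have a connected JCSMPSession; the queue name is a placeholder:
import com.solacesystems.jcsmp.EndpointProperties;
import com.solacesystems.jcsmp.JCSMPException;
import com.solacesystems.jcsmp.JCSMPFactory;
import com.solacesystems.jcsmp.JCSMPSession;
import com.solacesystems.jcsmp.Queue;

public class QueueProvisioner {
    // Provisions the queue if it is missing; silently succeeds if it already exists.
    static void provisionQueue(JCSMPSession session, String name) throws JCSMPException {
        Queue queue = JCSMPFactory.onlyInstance().createQueue(name);
        EndpointProperties props = new EndpointProperties();
        props.setPermission(EndpointProperties.PERMISSION_CONSUME);
        // FLAG_IGNORE_ALREADY_EXISTS avoids the 400 "Already Exists" response
        session.provision(queue, props, JCSMPSession.FLAG_IGNORE_ALREADY_EXISTS);
    }
}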

How to get the controller address/ID

I am using confluent-kafka-go for my project. When writing tests, because of the asynchronous nature of topic creation in Kafka, I might get errors (error code 3: UNKNOWN_TOPIC_OR_PARTITION) when I create a topic and then read it back immediately.
As I understand it, if I can query the controller directly, I can always get the latest metadata. So my question is: how can I get the Kafka controller's IP or ID when using confluent-kafka-go?
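For what it's worth, confluent-kafka-go's AdminClient exposes a ControllerID call, and the ID can be resolved to an address by matching it against the broker metadata. A minimal sketch, assuming a broker at localhost:9092 (on newer client versions the import path gains a /v2 segment):
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/confluentinc/confluent-kafka-go/kafka"
)

func main() {
	admin, err := kafka.NewAdminClient(&kafka.ConfigMap{"bootstrap.servers": "localhost:9092"})
	if err != nil {
		panic(err)
	}
	defer admin.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Ask the cluster which broker is currently the controller.
	controllerID, err := admin.ControllerID(ctx)
	if err != nil {
		panic(err)
	}

	// Resolve the ID to host:port by matching it against the broker list.
	md, err := admin.GetMetadata(nil, true, 10000)
	if err != nil {
		panic(err)
	}
	for _, b := range md.Brokers {
		if b.ID == controllerID {
			fmt.Printf("controller: id=%d addr=%s:%d\n", b.ID, b.Host, b.Port)
		}
	}
}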

How can I bulk-queue Kafka messages to ClickHouse

I am trying to stream a Kafka queue into ClickHouse following the steps on the official page https://clickhouse.yandex/docs/en/table_engines/kafka.html, but there is no way I can make it run OK.
I've checked the Kafka configuration and it's OK, because I've created a feeder for this queue, and I've added ZooKeeper's host and port to the ClickHouse configuration.
For example, the statement from Eclipse is:
System.out.println(ck.connection.createStatement().execute("CREATE TABLE IF NOT EXISTS test.clickhouseMd5 (" + "st1 String," + "st2 String," + "st3 String) ENGINE = Kafka('node2:2181', 'TestTopic', 'testConsumerGroup', 'JSONEachRow')"));
The result of System.out.println() is always false and there are no exceptions.
Any ideas?
Thanks,
kind regards.
Could you try running your query via the command-line clickhouse-client on the ClickHouse node?
clickhouse-client -q "CREATE TABLE IF NOT EXISTS test.clickhouseMd5 (st1 String, st2 String, st3 String) ENGINE = Kafka('node2:2181', 'TestTopic', 'testConsumerGroup', 'JSONEachRow')"
You're using port 2181, which is the default ZooKeeper port. But according to the documentation you mentioned (https://clickhouse.yandex/docs/en/table_engines/kafka.html), the first argument should be a comma-separated list of Kafka brokers (e.g. localhost:9092).
Note also that JDBC's Statement.execute() returns false for any statement that does not produce a ResultSet, such as CREATE TABLE, so the false you are printing does not by itself mean the statement failed.
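For example, the corrected statement, assuming the Kafka broker on node2 listens on the default port 9092:
CREATE TABLE IF NOT EXISTS test.clickhouseMd5 (
    st1 String,
    st2 String,
    st3 String
) ENGINE = Kafka('node2:9092', 'TestTopic', 'testConsumerGroup', 'JSONEachRow')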
Also note that it may not work with old Kafka versions, for example 0.9.0.1: with that version the CREATE TABLE command returns OK, but the Kafka logs show errors like 'ERROR Processor got uncaught exception. (kafka.network.Processor) java.lang.ArrayIndexOutOfBoundsException: 18'.
With the more recent Kafka 0.11.0.2 it works fine for me.

Executing a Spring Cloud Task based on an event from a messaging source (e.g. RabbitMQ, Kafka)

I am new to Spring Cloud Task and SCDF, so please bear with me.
I want to execute my SCT based on an event (say, a message posted to RabbitMQ). I think it can be done in two ways:
Create a source which polls messages from RabbitMQ and sends the data to a stream, then create a sink which reads data from the stream; as soon as data arrives at the sink (from the source stream), the task will be launched.
create steam producer --definition "rabbitproducer | streamconsumer (This is #TaskEnabled)"
I'm not sure if this is possible?
The other way could be to use the task launcher. Here the task launcher will be configured with a stream, and a listener will be polling messages from RabbitMQ, so when a message is received, the trigger will initiate the process and the task launcher will launch the task. But here I'm not sure how I will get the message data into my task. Do I have to add the data to the TaskLaunchRequest?
create stream mystream --definition "rabbitmsgtrigger --uri:my task | joblauncher"
Launching a task from an upstream event is already supported, and there are a few approaches to it - please review the reference guide (and the sample) for more details.
Here is the complete explanation of how my question was answered. Sabby helped me a lot in resolving my issue.
Problem: I was not able to trigger my task using the task launcher / task sink. I was also not getting useful details in the log, and I did not even know how to set the log level correctly.
Solution: With the help of Sabby and the documentation on the SCT site, I resolved this and have moved ahead in my POC work. Below are the detailed steps I followed.
Started my SCDF server with a PostgreSQL database, pointing it at the property file and raising the log level:
--logging.level.org.springframework.cloud=DEBUG
--spring.config.location=file://scdf.properties
Imported the apps from the bit.ly link:
app import --uri <stream applications link>
Registered task sink app
app register --name task-sink --type sink --uri file://tasksink-1.1.0.BUILD-SNAPSHOT.jar
Created stream as:
stream create mytasklaunchertest --definition "triggertask --triggertask.uri=https://my-archiva/myproject-scdf-task1/0.0.1-SNAPSHOT/myproject-scdf-task1-0.0.1-20160916.143611-1.jar --trigger.fixed-delay=5 | task-sink"
Deployed the stream:
stream deploy foo --properties "app.triggertask.spring.rabbitmq.host=host,app.triggertask.spring.rabbitmq.username=user,app.triggertask.spring.rabbitmq.password=pass,app.triggertask.spring.rabbitmq.port=5672,app.triggertask.spring.rabbitmq.virtual-host=xxx"
