Apache Kafka JDBC Connector - SerializationException: Unknown magic byte

We are trying to write the values from a topic back to a PostgreSQL database using the Confluent JDBC Sink Connector. Our connector configuration:
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
connection.password=xxx
tasks.max=1
topics=topic_name
auto.evolve=true
connection.user=confluent_rw
auto.create=true
connection.url=jdbc:postgresql://x.x.x.x:5432/Datawarehouse
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
We can read the values in the console using:
kafka-avro-console-consumer --bootstrap-server localhost:9092 --topic topic_name
The schema exists and the values are correctly deserialized by kafka-avro-console-consumer (it reports no errors), but the connector gives these errors:
{
"name": "datawarehouse_sink",
"connector": {
"state": "RUNNING",
"worker_id": "x.x.x.x:8083"
},
"tasks": [
{
"id": 0,
"state": "FAILED",
"worker_id": "x.x.x.x:8083",
"trace": "org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler\n\tat org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:178)\n\tat org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:511)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:491)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:322)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:226)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:194)\n\tat org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)\n\tat org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:748)\nCaused by: org.apache.kafka.connect.errors.DataException: f_machinestate_sink\n\tat io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:103)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$0(WorkerSinkTask.java:511)\n\tat org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)\n\tat org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)\n\t... 13 more\nCaused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id -1\nCaused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!\n"
}
],
"type": "sink"
}
The final error is:
org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
The schema is registered in the schema registry.
Does the problem lie in the connector's configuration file?

The error org.apache.kafka.common.errors.SerializationException: Unknown magic byte! means that a message on the topic was not valid Avro and could not be deserialised. There are several reasons this could be:
Some messages are Avro, but others are not. If this is the case, you can use the error-handling capabilities in Kafka Connect to skip the invalid messages with config like this:
"errors.tolerance": "all",
"errors.log.enable":true,
"errors.log.include.messages":true
The value is Avro but the key isn't. If this is the case, use a key.converter that matches the actual key format, for example as sketched below.
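A minimal sketch for that case, assuming the keys are plain strings rather than Avro (check what your producers actually write):
key.converter=org.apache.kafka.connect.storage.StringConverter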
More reading: https://www.confluent.io/blog/kafka-connect-deep-dive-converters-serialization-explained/

This means that the deserializer checked the first five bytes of the message and found something unexpected. There is more about how the serializer packs messages in the Confluent Schema Registry documentation; see the 'Wire format' section. My guess is that the first (magic) byte of the message is not 0.
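One way to check, sketched with standard tooling only: dump the raw bytes of one message and look at the first byte. In the Confluent wire format, byte 0 should be the magic byte 0x00 and bytes 1-4 the schema ID (broker address and topic name are taken from the question):
# value bytes of the first message, shown as hex
kafka-console-consumer --bootstrap-server localhost:9092 --topic topic_name \
  --from-beginning --max-messages 1 | hexdump -C | head -n 2
# add --property print.key=true to inspect the key bytes as well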

Related

kafka connect task support multi format messages

We are exploring Fluentd, Kafka and Elasticsearch integration for our central logging platform.
At present we have an individual topic for each deployment. For example, if we have deployments customerA, customerB and customerC, then we have a separate topic on the Kafka cluster for each of them.
As long as we push JSON logs to these topics, our Elasticsearch sink connectors work fine, but any log format other than JSON tends to break the connector task with this error:
[2021-08-01 17:03:27,338] ERROR WorkerSinkTask{id=central-logs-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:206)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:132)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:495)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:475)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:325)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:229)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:201)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:189)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:239)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.kafka.connect.errors.DataException: Converting byte[] to Kafka Connect data failed due to serialization error:
at org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:366)
at org.apache.kafka.connect.storage.Converter.toConnectData(Converter.java:87)
at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$0(WorkerSinkTask.java:495)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:156)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:190)
... 13 more
Caused by: org.apache.kafka.common.errors.SerializationException: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'status': was expecting (JSON String, Number, Array, Object or token 'null', 'true' or 'false')
at [Source: (byte[])"status-task-central-logs-3"; line: 1, column: 8]
Caused by: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'status': was expecting (JSON String, Number, Array, Object or token 'null', 'true' or 'false')
at [Source: (byte[])"status-task-central-logs-3"; line: 1, column: 8]
at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1840)
at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:722)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidToken(UTF8StreamJsonParser.java:3560)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._handleUnexpectedValue(UTF8StreamJsonParser.java:2655)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:857)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:754)
at com.fasterxml.jackson.databind.ObjectMapper._readTreeAndClose(ObjectMapper.java:4247)
at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:2734)
at org.apache.kafka.connect.json.JsonDeserializer.deserialize(JsonDeserializer.java:64)
at org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:364)
at org.apache.kafka.connect.storage.Converter.toConnectData(Converter.java:87)
at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$0(WorkerSinkTask.java:495)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:156)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:190)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:132)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:495)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:475)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:325)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:229)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:201)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:189)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:239)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Our Elasticsearch sink connector config is:
{
"name":"central-logs",
"config":{
"connector.class":"io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"type.name":"_doc",
"transforms.AddSuffix.topic.format":"${topic}-${timestamp}",
"tasks.max":"16",
"max.retries":"30",
"retry.backoff.ms":"10000",
"topics.regex": "kafka-(.*)",
"transforms.ConvertTimestamp.format":"yyyy-MM-dd HH:mm:ss",
"transforms":"AddSuffix,InsertTimestamp,ConvertTimestamp",
"transforms.ConvertTimestamp.type":"org.apache.kafka.connect.transforms.TimestampConverter$Value",
"key.ignore":"true",
"schema.ignore":"true",
"transforms.AddSuffix.timestamp.format":"yyyy.MM.dd",
"key.converter.schemas.enable":"false",
"transforms.AddSuffix.type":"org.apache.kafka.connect.transforms.TimestampRouter",
"value.converter.schemas.enable":"false",
"transforms.InsertTimestamp.type":"org.apache.kafka.connect.transforms.InsertField$Value",
"name":"central-logs",
"connection.url":"https://es",
"value.converter":"org.apache.kafka.connect.json.JsonConverter",
"transforms.InsertTimestamp.timestamp.field":"#timestamp",
"transforms.ConvertTimestamp.target.type":"string",
"read.timeout.ms":"30000",
"key.converter":"org.apache.kafka.connect.json.JsonConverter",
"transforms.ConvertTimestamp.field":"#timestamp"
}
}
Is there a way to avoid such connector/task failures?
You have set the connector's converter to expect JSON:
"value.converter":"org.apache.kafka.connect.json.JsonConverter"
So any message in these topics that is not valid JSON will fail when the JSON parser processes it. Is this parsing actually required? The Elasticsearch sink connector example does not appear to do this.
Try removing this configuration (or see the sketch below if it has to stay).
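If the JSON converter does have to stay, the error-tolerance settings shown for the JDBC sink earlier on this page can be applied here as well, so that records that fail conversion are logged and skipped instead of failing the task (a sketch, not a complete config):
"errors.tolerance": "all",
"errors.log.enable": "true",
"errors.log.include.messages": "true"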

Kafka HDFS Sink Connector error: [Top level type must be STRUCT...]

I'm testing Kafka 2.7 with Kafka Connect, and I'm facing a problem that I don't understand.
I first started a distributed Connect worker with the configuration below.
bootstrap.servers=..:9092,...:9092, ...
group.id=kafka-connect-test
key.converter=org.apache.kafka.connect.converters.ByteArrayConverter
value.converter=org.apache.kafka.connect.converters.ByteArrayConverter
... some internal topic configuration
plugin.path=<plugin path>
This worker is served on port 8083.
I want to write ORC-format data with the Snappy codec to HDFS, so I created a new Kafka HDFS connector via the REST API with the JSON below.
I don't use the Schema Registry.
curl -X POST <connector url:8083> \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d
{
"name": "hdfs-sinkconnect-test",
"config": {
"connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
"store.url": "hdfs:~",
"hadoop.conf.dir": "<my hadoop.conf dir>",
"hadoop.home": "<hadoop home dir>",
"tasks.max": "5",
"key.deserializer": "org.apache.kafka.common.serialization.StringDeserializer",
"value.deserializer": "org.apache.kafka.common.serialization.ByteArrayDeserializer",
"format.class": "io.confluent.connect.hdfs.orc.OrcFormat",
"flush.size": 1000,
"avro.codec": "snappy",
"topics": "<topic name>",
"topics.dir": "/tmp/connect-logs",
"partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
"locale": "ko_KR",
"timezone": "Asia/Seoul",
"partition.duration.ms": "3600000",
"path.format": "'hour'=YYYYMMddHH/"
}
}
Then I get an error message like this:
# connectDistributed.out
[2021-06-28 17:14:11,596] ERROR Exception on topic partition <topic name>-<partition number>: (io.confluent.connect.hdfs.TopicPartitionWriter:409)
org.apache.kafka.connect.errors.ConnectException: Top level type must be STRUCT but was bytes
at io.confluent.connect.hdfs.orc.OrcRecordWriterProvider$1.write(OrcRecordWriterProvider.java:98)
at io.confluent.connect.hdfs.TopicPartitionWriter.writeRecord(TopicPartitionWriter.java:742)
at io.confluent.connect.hdfs.TopicPartitionWriter.write(TopicPartitionWriter.java:385)
at io.confluent.connect.hdfs.DataWriter.write(DataWriter.java:333)
at io.confluent.connect.hdfs.HdfsSinkTask.put(HdfsSinkTask.java:126)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:586)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:329)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:232)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:201)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:185)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:234)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I think this error message has something to do with schema information.
Is the Schema Registry essential for Kafka Connect?
Any ideas or solutions for this error message? Thanks.
Writing ORC files requires a Struct type.
The schema-carrying options provided by Confluent include plain JSON, JSON Schema, Avro, and Protobuf. The only option that doesn't require the Registry is the plain JsonConverter (with schemas enabled).
Note that key.deserializer and value.deserializer are not valid Connect properties; you need to set key.converter and value.converter instead, for example as sketched below.
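A sketch of how those lines could look with the schema-enabled JsonConverter instead (this assumes your producers write JSON with the schema/payload envelope; if they write plain bytes, Avro plus the Registry is the usual route):
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "true"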
If you're not willing to change the converter, you can try a HoistField transform to create a Struct; this will produce an ORC file with a schema of only one field (sketch below).
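A minimal sketch of that HoistField approach; the transform alias and the wrapped field name are illustrative:
"transforms": "Hoist",
"transforms.Hoist.type": "org.apache.kafka.connect.transforms.HoistField$Value",
"transforms.Hoist.field": "payload"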

Camel aws-s3 Source Connector Error - How should the config be changed

I am working on defining a Camel S3 source connector with our Confluent (5.5.1) installation. After creating the connector and checking that its status is "RUNNING", I upload a file to my S3 bucket. Even when I do an ls on the bucket, it is empty, which indicates the file has been processed and deleted. But I do not see any messages in the topic. I am basically following this example with a simple 4-line file, but on a Confluent cluster instead of standalone Kafka.
This is my configuration:
{
"name": "CamelAWSS3SourceConnector",
"connector.class": "org.apache.camel.kafkaconnector.awss3.CamelAwss3SourceConnector",
"bootstrap.servers": "broker1-dev:9092,broker2-dev:9092,broker3-dev:9092",
"sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"user\" password=\"password\";",
"security.protocol": "SASL_SSL",
"ssl.truststore.location": "/config/client.truststore.jks",
"ssl.truststore.password": "password",
"ssl.keystore.location": "/config/client.keystore.jks",
"ssl.keystore.password": "password",
"ssl.key.password": "password",
"errors.log.enable": "true",
"errors.log.include.messages": "true",
"errors.tolerance": "all",
"offset.flush.timeout.ms": "60000",
"offset.flush.interval.ms": "10000",
"max.request.size": "10485760",
"flush.size": "1",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.camel.kafkaconnector.awss3.converters.S3ObjectConverter",
"camel.source.maxPollDuration": "10000",
"topics": "TEST-CAMEL-S3-SOURCE-POC",
"camel.source.path.bucketNameOrArn": "arn:aws:s3:::my-bucket",
"camel.component.aws-s3.region": "US_EAST_1",
"tasks.max": "1",
"camel.source.endpoint.useIAMCredentials": "true",
"camel.source.endpoint.autocloseBody": "true"
}
And I see these errors in the logs
[2020-12-23 09:05:01,876] ERROR WorkerSourceTask{id=CamelAWSS3SourceConnector-0} Failed to flush, timed out while waiting for producer to flush outstanding 1 messages (org.apache.kafka.connect.runtime.WorkerSourceTask:448)
[2020-12-23 09:05:01,876] ERROR WorkerSourceTask{id=CamelAWSS3SourceConnector-0} Failed to commit offsets (org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter:116)
[2020-12-23 09:20:58,685] DEBUG [Worker clientId=connect-1, groupId=connect-cluster] Received successful Heartbeat response (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:1045)
[2020-12-23 09:20:58,688] DEBUG WorkerSourceTask{id=CamelAWSS3SourceConnector-0} Committing offsets (org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter:111)
[2020-12-23 09:20:58,688] INFO WorkerSourceTask{id=CamelAWSS3SourceConnector-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:426)
[2020-12-23 09:20:58,688] INFO WorkerSourceTask{id=CamelAWSS3SourceConnector-0} flushing 1 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:443)
And if I make a curl request for the status of the connector, I get this error in the status:
trace: org.apache.kafka.connect.errors.ConnectException: OffsetStorageWriter is already flushing
at org.apache.kafka.connect.storage.OffsetStorageWriter.beginFlush(OffsetStorageWriter.java:111)
at org.apache.kafka.connect.runtime.WorkerSourceTask.commitOffsets(WorkerSourceTask.java:438)
at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:257)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:184)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:234)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I saw the following suggestion in a couple of links, but it didn't help either. It suggested adding these keys to the config:
"offset.flush.timeout.ms": "60000",
"offset.flush.interval.ms": "10000",
"max.request.size": "10485760",
Thank you
UPDATE
I cut the config down to a minimal version, but I still get the same error:
{
"connector.class": "org.apache.camel.kafkaconnector.awss3.CamelAwss3SourceConnector",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.camel.kafkaconnector.awss3.converters.S3ObjectConverter",
"camel.source.maxPollDuration": "10000",
"topics": "TEST-S3-SOURCE-MINIMAL-POC",
"camel.source.path.bucketNameOrArn": "pruvpcaws003-np-use1-push-json-poc",
"camel.component.aws-s3.region": "US_EAST_1",
"tasks.max": "1",
"camel.source.endpoint.useIAMCredentials": "true",
"camel.source.endpoint.autocloseBody": "true"
}
Still get the same error
trace: org.apache.kafka.connect.errors.ConnectException: OffsetStorageWriter is already flushing
at org.apache.kafka.connect.storage.OffsetStorageWriter.beginFlush(OffsetStorageWriter.java:111)
at org.apache.kafka.connect.runtime.WorkerSourceTask.commitOffsets(WorkerSourceTask.java:438)
at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:257)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:184)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:234)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I am not sure where else I should look to find the root cause.

Bulk data update from MSSQL via Kafka connector to Elasticsearch with nested types is failing

MSSQL value: column name=prop --> value=100 and column name=role --> value=[{"role":"actor"},{"role":"director"}]
NOTE: the role column is saved in JSON format.
Message read from the Kafka topic:
{
"schema":{
"type":"struct",
"fields":[
{
"type":"int32",
"optional":false,
"field":"prop"
},
{
"type":"string",
"optional":true,
"field":"roles"
}
],
"optional":false
},
"payload":{ "prop":100, "roles":"[{"role":"actor"},{"role":"director"}]"}
It fails with this reason:
Error was [{"type":"mapper_parsing_exception","reason":"object mapping for [roles] tried to parse field [roles] as object, but found a concrete value"}
The failure happens because the connector does not create an array schema for roles; the value arrives as a plain string.
The input message above is produced by the Confluent JdbcSourceConnector, and the sink connector used is the Confluent ElasticsearchSinkConnector.
Configuration details :
sink config:
name=prop-test
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
connection.url=<elasticseach url>
tasks.max=1
topics=test_prop
type.name=prop
#transforms=InsertKey, ExtractId
transforms.InsertKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.InsertKey.fields=prop
transforms.ExtractId.type=org.apache.kafka.connect.transforms.ExtractField$Key
transforms.ExtractId.field=prop
Source config:
name=test_prop_source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
connection.url=jdbc:sqlserver://*.*.*.*:1433;instance=databaseName=test;
connection.user=*****
connection.password=*****
query=EXEC <store proc>
mode=bulk
batch.max.rows=2000000
topic.prefix=test_prop
transforms=createKey,extractInt
transforms.createKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.createKey.fields=prop
transforms.extractInt.type=org.apache.kafka.connect.transforms.ExtractField$Key
transforms.extractInt.field=prop
connect-standalone.properties :
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
I need to understand how I can explicitly make the schema for roles an ARRAY rather than a string.
The JDBC source connector always treats the column name as the field name and the column value as a scalar value. Converting that string into an array is not possible with the existing JDBC source connector; it would take a custom transformation or a custom plugin to enable it.
The best available option for getting the data out of MSSQL and into Elasticsearch is Logstash. It has rich filter plugins that can reshape the MSSQL data into any desired JSON form and send it to any desired output (Logstash output or a Kafka topic); see the sketch after the flow below.
Flow: MSSQL --> Logstash --> Kafka topic --> Kafka Elasticsearch sink connector --> Elasticsearch
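A minimal sketch of the Logstash filter step in that flow, assuming the JSON string arrives in a field named roles (field names are illustrative):
filter {
  # parse the JSON string held in "roles" into a real array of objects
  json {
    source => "roles"
    target => "roles"
  }
}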

Debezium CDC connector says no ocijdbc11 in java.library.path

I am trying to deploy the Debezium CDC connector to capture data from an Oracle DB. We have prepared the Oracle DB and the XStream connection.
I am getting the following exception:
{
"name": "test-debezium-2",
"connector": {
"state": "RUNNING",
"worker_id": "10.230.24.80:8084"
},
"tasks": [
{
"id": 0,
"state": "FAILED",
"worker_id": "10.230.24.80:8084",
"trace": "org.apache.kafka.connect.errors.ConnectException: An exception ocurred in the change event producer. This connector will be stopped.\n\tat
io.debezium.connector.base.ChangeEventQueue.throwProducerFailureIfPresent(ChangeEventQueue.java:170)\n\tat
io.debezium.connector.base.ChangeEventQueue.poll(ChangeEventQueue.java:151)\n\tat
io.debezium.connector.oracle.OracleConnectorTask.poll(OracleConnectorTask.java:110)\n\tat
org.apache.kafka.connect.runtime.WorkerSourceTask.poll(WorkerSourceTask.java:259)\n\tat
org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:226)\n\tat
org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:177)\n\tat
org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:227)\n\tat
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat
java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat
java.base/java.lang.Thread.run(Thread.java:834)\n
Caused by: java.lang.UnsatisfiedLinkError: no ocijdbc11 in java.library.path: [/usr/java/packages/lib, /usr/lib64, /lib64, /lib, /usr/lib]\n\tat
java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2660)\n\tat
java.base/java.lang.Runtime.loadLibrary0(Runtime.java:829)\n\tat
java.base/java.lang.System.loadLibrary(System.java:1867)\n\tat
oracle.jdbc.driver.T2CConnection$1.run(T2CConnection.java:3541)\n\tat
java.base/java.security.AccessController.doPrivileged(Native Method)\n\tat
oracle.jdbc.driver.T2CConnection.loadNativeLibrary(T2CConnection.java:3537)\n\tat
oracle.jdbc.driver.T2CConnection.logon(T2CConnection.java:269)\n\tat
oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:553)\n\tat
oracle.jdbc.driver.T2CConnection.<init>(T2CConnection.java:165)\n\tat
oracle.jdbc.driver.T2CDriverExtension.getConnection(T2CDriverExtension.java:53)\n\tat
oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:528)\n\tat
java.sql/java.sql.DriverManager.getConnection(DriverManager.java:677)\n\tat
java.sql/java.sql.DriverManager.getConnection(DriverManager.java:228)\n\tat
io.debezium.connector.oracle.OracleConnectionFactory.connect(OracleConnectionFactory.java:25)\n\tat
io.debezium.jdbc.JdbcConnection.connection(JdbcConnection.java:756)\n\tat
io.debezium.jdbc.JdbcConnection.connection(JdbcConnection.java:751)\n\tat
io.debezium.connector.oracle.OracleConnection.setSessionToPdb(OracleConnection.java:46)\n\tat
io.debezium.connector.oracle.OracleSnapshotChangeEventSource.prepare(OracleSnapshotChangeEventSource.java:70)\n\tat
io.debezium.relational.RelationalSnapshotChangeEventSource.execute(RelationalSnapshotChangeEventSource.java:104)\n\tat
io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:83)\n\t... 5 more\n"
}
],
"type": "source"
}
I have verified the /usr/lib folder has the ocijdbc11.dll file. I have tried copying the file to /lib as well but I am getting the same error.
I encountered this problem as well.
First, download the Oracle Instant Client from Oracle and unzip it (this assumes you have already copied the ojdbc and XStream jars into the Kafka lib directory).
Then add the Instant Client directory to the library path, for example in /etc/profile:
vim /etc/profile
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/where/you/unzipped/instantclient
Note that LD_LIBRARY_PATH must point at the directory containing libocijdbc11.so, not at the .so file itself.
Save the file, then source /etc/profile and run the connector again.
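A quick way to confirm, after sourcing the profile, that the environment the worker runs in can actually see the library (the path is the illustrative one from above):
source /etc/profile
echo $LD_LIBRARY_PATH
ls /path/where/you/unzipped/instantclient/libocijdbc11.so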
