dropfield transform sink connector: (STRUCT) type doesn't have a mapping to the SQL database column type - jdbc

I created a sink connector from Kafka to MySQL.
After adding a transform to the sink connector's config to drop some columns, I get this error, whereas without the transform it works:
(STRUCT) type doesn't have a mapping to the SQL database column type
{
"name": "mysql-conf-sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "3",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://localhost:8081",
"topics": "mysql.cars.prices",
"transforms": "dropPrefix,unwrap",
"transforms.dropPrefix.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.dropPrefix.regex": "mysql.cars.prices",
"transforms.dropPrefix.replacement": "prices",
"transforms.timestamp.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.timestamp.target.type": "Timestamp",
"transforms.timestamp.field": "date_time",
"transforms.timestamp.format": "yyyy-MM-dd HH:mm:ss",
"errors.tolerance": "all",
"errors.log.enable": "true",
"errors.log.include.messages": "true",
"connection.url": "jdbc:mysql://localhost:3306/product",
"connection.user": "kafka",
"connection.password": "123456",
"transforms": "ReplaceField",
"transforms.ReplaceField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.ReplaceField.blacklist": "id, brand",
"insert.mode": "insert",
"auto.create": "true",
"auto.evolve": "true",
"batch.size": 50000
}
}

You have put the "transforms" key in your JSON more than once, which isn't valid for a connector config: the later entry replaces the earlier one.
Try with a single entry:
"transforms": "unwrap,ReplaceField,dropPrefix",
You are getting the error because the second entry overrode the first one, so unwrap, specifically, is no longer called and the records still contain nested Structs.
The blacklist property has been renamed to exclude, by the way: https://docs.confluent.io/platform/current/connect/transforms/replacefield.html#properties
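As a quick illustration of the duplicate-key problem described above, here is a minimal Python sketch. It uses Python's own json module, not Kafka Connect's parser, so treat it as an analogy: the assumption is that the Connect REST API resolves repeated keys the same way (last one wins). The trimmed config and the reject_duplicates helper are made up for this example.

```python
import json

# Trimmed-down version of the connector config from the question,
# keeping only the two conflicting "transforms" entries.
raw = """
{
  "transforms": "dropPrefix,unwrap",
  "insert.mode": "insert",
  "transforms": "ReplaceField"
}
"""


def reject_duplicates(pairs):
    """object_pairs_hook that raises on a repeated key instead of overwriting it."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


# Default behaviour of Python's json module: the last occurrence of a key wins,
# so the "dropPrefix,unwrap" chain is silently lost.
config = json.loads(raw)
print(config["transforms"])  # ReplaceField

# Strict parsing surfaces the mistake before the config is submitted.
try:
    json.loads(raw, object_pairs_hook=reject_duplicates)
except ValueError as err:
    print(err)
```

Running a check like this on a config file before posting it to the Connect REST API is one way to catch an accidentally repeated key.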

Related

Kafka connect Jdbc sink connector not auto creating tables

I am using Docker images of Kafka and Kafka Connect to test CDC with Debezium; the database is a standalone one.
My sink connector config JSON looks like this:
{
"name": "jdbc-sink-test-oracle",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"dialect.name": "OracleDatabaseDialect",
"table.name.format": "TEST",
"topics": "oracle-db-source.DBZ_SRC.TEST",
"connection.url": "jdbc:oracle:thin:@hostname:1521/DB",
"connection.user": "DBZ_TARGET",
"connection.password": "DBZ_TARGET",
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "true",
"auto.create": "true",
"insert.mode": "upsert",
"delete.enabled": "true",
"pk.fields": "ID",
"pk.mode": "record_key"
}
}
and my source connector config json looks like this,
{
"name": "test-source-connector",
"config": {
"connector.class" : "io.debezium.connector.oracle.OracleConnector",
"tasks.max" : "1",
"database.server.name" : "oracle-db-source",
"database.hostname" : "hostname",
"database.port" : "1521",
"database.user" : "clogminer",
"database.password" : "clogminer",
"database.dbname" : "DB",
"database.oracle.version": "19",
"database.history.kafka.bootstrap.servers" : "kafka:9092",
"database.history.kafka.topic": "schema-changes.DBZ_SRC",
"database.connection.adapter": "logminer",
"table.include.list" : "DBZ_SRC.TEST",
"database.schema": "DBZ_SRC",
"errors.log.enable": "true",
"snapshot.lock.timeout.ms":"5000",
"include.schema.changes": "true",
"snapshot.mode":"initial",
"decimal.handling.mode": "double"
}
}
and I am getting this error for the above configurations,
Error : 942, Position : 11, Sql = merge into "TEST" using (select :1 "ID", :2 "NAME", :3 "DESCRIPTION", :4 "WEIGHT" FROM dual) incoming on("TEST"."ID"=incoming."ID") when matched then update set "TEST"."NAME"=incoming."NAME","TEST"."DESCRIPTION"=incoming."DESCRIPTION","TEST"."WEIGHT"=incoming."WEIGHT" when not matched then insert("TEST"."NAME","TEST"."DESCRIPTION","TEST"."WEIGHT","TEST"."ID") values(incoming."NAME",incoming."DESCRIPTION",incoming."WEIGHT",incoming."ID"), OriginalSql = merge into "TEST" using (select ? "ID", ? "NAME", ? "DESCRIPTION", ? "WEIGHT" FROM dual) incoming on("TEST"."ID"=incoming."ID") when matched then update set "TEST"."NAME"=incoming."NAME","TEST"."DESCRIPTION"=incoming."DESCRIPTION","TEST"."WEIGHT"=incoming."WEIGHT" when not matched then insert("TEST"."NAME","TEST"."DESCRIPTION","TEST"."WEIGHT","TEST"."ID") values(incoming."NAME",incoming."DESCRIPTION",incoming."WEIGHT",incoming."ID"), Error Msg = ORA-00942: table or view does not exist
at io.confluent.connect.jdbc.sink.JdbcSinkTask.getAllMessagesException(JdbcSinkTask.java:150)
at io.confluent.connect.jdbc.sink.JdbcSinkTask.put(JdbcSinkTask.java:102)
... 11 more
2022-07-28 15:01:58,644 ERROR || WorkerSinkTask{id=jdbc-sink-test-oracle-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted [org.apache.kafka.connect.runtime.WorkerTask]
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:610)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:330)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:232)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:201)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:188)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:237)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.kafka.connect.errors.ConnectException: java.sql.SQLException: Exception chain:
java.sql.BatchUpdateException: ORA-00942: table or view does not exist
java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist
I believe that, with the configuration I gave, the table should be auto-created, but the error says the table doesn't exist.
However, the following sink connector configuration works fine: it auto-creates a table named 'TEST2' and exports the data from the source to it:
{
"name": "jdbc-sink-test2-oracle",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"dialect.name": "OracleDatabaseDialect",
"table.name.format": "TEST2",
"topics": "oracle-db-source.DBZ_SRC.TEST",
"connection.url": "jdbc:oracle:thin:@hostname:1521/DB",
"connection.user": "DBZ_TARGET",
"connection.password": "DBZ_TARGET",
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false",
"auto.create": "true",
"insert.mode": "upsert",
"delete.enabled": "true",
"pk.fields": "ID",
"pk.mode": "record_key"
}
}
Edit:
The sink connector works fine if a target table with the same name as the source table already exists with the same DDL, but the target table is not auto-created when it is missing.

Write same value to 2 different columns with JDBC Sink Connector

There are 2 columns in the table I want to sink to that need to hold the same value. Let's say the columns are named ID and PAYLOADID. On the Kafka side, however, there are no separate fields for these columns. So how can I configure my sink connector to write to both columns from the same field in Kafka?
This is my connector config:
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"connection.user": "${file:/pass.properties:alm_user}",
"connection.password": "${file:/pass.properties:alm_pwd}",
"connection.url": "jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=****)(PORT=****))(CONNECT_DATA=(SERVICE_NAME=****)))",
"table.name.format": "SCHEMA_NAME.TABLE_NAME",
"topics": "MY_TOPIC",
"transforms": "TimestampConverter1",
"transforms.TimestampConverter1.target.type": "Timestamp",
"transforms.TimestampConverter1.field": "RECORDDATE",
"transforms.TimestampConverter1.format": "MM.dd.yyyy hh:mm:ss",
"transforms.TimestampConverter1.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"key.converter.schemas.enable": "false",
"value.converter.schemas.enable": "true",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"auto.create": "false",
"insert.mode": "insert",
"transforms": "rename",
"transforms.rename.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.rename.renames": "payload:PAYLOADID, type:TYPE"
You'd need to write your own transform, or otherwise pre-process the topic, such that you can copy and rename one field to another while keeping the same field-value.
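The copy step itself is small. Below is a rough Python sketch of the logic only, for instance inside a pre-processing job that rewrites the topic. It is not a Kafka Connect transform: a real custom SMT would be written in Java against the org.apache.kafka.connect.transforms.Transformation interface and would also have to add the new field to the record schema. The copy_field name and the sample record are hypothetical.

```python
def copy_field(value: dict, source: str, target: str) -> dict:
    """Return a copy of the record value with `target` set to the value of `source`."""
    if source not in value:
        raise KeyError(f"field {source!r} not present in record value")
    out = dict(value)  # shallow copy, so the incoming record is left untouched
    out[target] = value[source]
    return out


# Hypothetical record value: `payload` is the single Kafka field that should
# end up in both the ID and PAYLOADID columns.
record = {"payload": "abc-123", "type": "A"}
print(copy_field(record, "payload", "ID"))
```

With the value copied into ID first, the existing ReplaceField renames (payload:PAYLOADID, type:TYPE) can then produce the second column.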

Debezium Kafka Connect DROP Table not working

I have the following setup for implementing CDC using Debezium
Oracle -> Debezium Source Connector -> Kafka -> JDBC Sink Connector -> PostgreSQL
Source Connector config is
{
"name":"myfirst-connector",
"config":{
"connector.class":"io.debezium.connector.oracle.OracleConnector",
"tasks.max":"1",
"database.hostname":"192.168.29.102",
"database.port":"1521",
"database.user":"c##dbzuser",
"database.password":"dbz",
"database.dbname":"ORCLCDB",
"database.pdb.name":"ORCLPDB1",
"database.server.name":"oracle19",
"database.connection.adapter":"logminer",
"database.history.kafka.topic":"schema_changes",
"database.history.kafka.bootstrap.servers":"192.168.29.102:9092",
"database.tablename.case.insensitive":"true",
"snapshot.mode":"initial",
"tombstones.on.delete":"true",
"include.schema.changes": "true",
"sanitize.field.names":"true",
"key.converter":"org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable":"true",
"value.converter":"org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable":"true",
"time.precision.mode": "connect",
"database.oracle.version":19
} }
Sink connector config is
{
"name": "myjdbc-sink-testdebezium",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics.regex": "oracle19.C__DBZUSER.*",
"connection.url": "jdbc:postgresql://192.168.29.102:5432/postgres?user=puser&password=My19cPassword",
"key.converter":"org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable":"true",
"value.converter":"org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable":"true",
"dialect.name": "PostgreSqlDatabaseDialect",
"auto.create": "true",
"auto.evolve": "true",
"insert.mode": "upsert",
"delete.enabled": "true",
"transforms": "unwrap, RemoveString, TimestampConverter",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.delete.handling.mode": "none",
"transforms.RemoveString.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.RemoveString.regex": "(.*)\\.C__DBZUSER\\.(.*)",
"transforms.RemoveString.replacement": "$2",
"transforms.TimestampConverter.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.TimestampConverter.target.type": "Timestamp",
"transforms.TimestampConverter.field": "dob",
"pk.mode": "record_key"
}
}
Now when I drop a table in Oracle, I get an entry in the schema_changes topic, but the table is not dropped from PostgreSQL. I need help figuring out why the drop is not being propagated. Just FYI, all the other operations, i.e. Create Table, Alter Table, Insert, Update and Delete, are working fine. Only DROP is not working, and I am not getting any exception in the logs either.

illegal_argument_exception while using kafka connector to push to elasticsearch

I am trying to push logs from Kafka topics to Elasticsearch.
My message in kafka:
{
"@timestamp": 1589549688.659166,
"log": "13:34:48.658 [pool-2-thread-1] DEBUG health check success",
"stream": "stdout",
"time": "2020-05-15T13:34:48.659166158Z",
"pod_name": "my-pod-789f8c85f4-mt62l",
"namespace_name": "services",
"pod_id": "600ca012-91f5-XXXX-XXXX-XXXXXXXXXXX",
"host": "ip-192-168-88-59.ap-south-1.compute.internal",
"container_name": "my-pod",
"docker_id": "XXXXXXXXXXXXXXXXX1435bb2870bfc9d20deb2c483ce07f8e71ec",
"container_hash": "myregistry",
"labelpod-template-hash": "9tignfe9r",
"labelsecurity.istio.io/tlsMode": "istio",
"labelservice": "my-pod",
"labelservice.istio.io/canonical-name": "my-pod",
"labelservice.istio.io/canonical-revision": "latest",
"labeltype": "my-pod",
"annotationkubernetes.io/psp": "eks.privileged",
"annotationsidecar.istio.io/status": "{\"version\":\"58dc8b12bb311f1e2f46fd56abfe876ac96a38d7ac3fc6581af3598ccca7522f\"}"
}
This is my connector config:
{
"name": "logs",
"config": {
"connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"connection.url": "http://es:9200",
"connection.username": "username",
"connection.password": "password",
"tasks.max": "10",
"topics": "my-pod",
"name": "logs",
"type.name": "_doc",
"schema.ignore": "true",
"key.ignore": "true",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",
"transforms": "routeTS",
"transforms.routeTS.type": "org.apache.kafka.connect.transforms.TimestampRouter",
"transforms.routeTS.topic.format": "${topic}-${timestamp}",
"transforms.routeTS.timestamp.format": "YYYYMMDD"
}
}
This is the error I'm getting:
cp-kafka-connect-server [2020-05-15 13:30:59,083] WARN Failed to execute batch 4830 of 18 records with attempt 4/6, will attempt retry after 539 ms. Failure reason: Bulk request failed: [{"type":"illegal_argument_exception","reason":"mapper [labelservice] of different type, current_type [text], merged_type [ObjectMapper]"}
I haven't created any mapping beforehand. I'm depending on the connector to create the index.
This is the mapping I have in es which is autocreated.
{
"mapping": {}
}
The error message is clear:
"reason":"mapper [labelservice] of different type, current_type [text], merged_type [ObjectMapper]"
It means that labelservice is mapped as text in your index, but you are also sending the data below under the labelservice field:
"labelservice": "my-pod",
"labelservice.istio.io/canonical-name": "my-pod",
"labelservice.istio.io/canonical-revision": "latest",
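Elasticsearch expands dotted field names into objects, so the last two entries make labelservice an object, which conflicts with the text type already mapped for it and causes the error.
You need to change your mapping and define labelservice as object to make it work. Because the message also sends labelservice as a plain string, that field has to be renamed or dropped as well, since one field cannot be both text and object. Refer to the object datatype in Elasticsearch for more info.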
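To make the clash concrete, here is a small Python sketch that scans a flat message for keys used both as a plain value and as the prefix of a dotted key. The helper names find_object_conflicts and dedot are invented for this example, and replacing dots with underscores is only one possible way to avoid the clash, not something the connector does for you.

```python
def find_object_conflicts(doc: dict) -> dict:
    """Map each key that is both a plain field and a dotted-path prefix
    to the dotted keys that collide with it."""
    conflicts = {}
    keys = list(doc)
    for key in keys:
        children = [k for k in keys if k.startswith(key + ".")]
        if children:
            conflicts[key] = children
    return conflicts


def dedot(doc: dict, sep: str = "_") -> dict:
    """Replace dots in field names so no key is interpreted as an object path."""
    return {k.replace(".", sep): v for k, v in doc.items()}


# Subset of the message from the question.
doc = {
    "labelservice": "my-pod",
    "labelservice.istio.io/canonical-name": "my-pod",
    "labelservice.istio.io/canonical-revision": "latest",
    "labeltype": "my-pod",
}

print(find_object_conflicts(doc))          # labelservice collides with its dotted variants
print(find_object_conflicts(dedot(doc)))   # {} once the dots are gone
```

A rename like this would have to happen before the records reach the sink, for example in the log shipper or in a transform.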

Timestamp in avro schema produces incompatible value validation in Kafka Connect JDBC

Error produced by JDBC sink connector:
org.apache.kafka.connect.errors.DataException: Invalid Java object for schema type INT64: class java.util.Date for field: "some_timestamp_field"
at org.apache.kafka.connect.data.ConnectSchema.validateValue(ConnectSchema.java:242)
at org.apache.kafka.connect.data.Struct.put(Struct.java:216)
at org.apache.kafka.connect.transforms.Cast.applyWithSchema(Cast.java:151)
at org.apache.kafka.connect.transforms.Cast.apply(Cast.java:107)
at org.apache.kafka.connect.runtime.TransformationChain.apply(TransformationChain.java:38)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:480)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:301)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:205)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:173)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:170)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:214)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
The avro schema registered by source JDBC connector (MySQL):
{
"type":"record",
"name":"ConnectDefault",
"namespace":"io.confluent.connect.avro",
"fields":[
...
{
"name":"some_timestamp_field",
"type":{
"type":"long",
"connect.version":1,
"connect.name":"org.apache.kafka.connect.data.Timestamp",
"logicalType":"timestamp-millis"
}
},
...
]
}
Looks like the exception is due to this code block: https://github.com/apache/kafka/blob/f0282498e7a312a977acb127557520def338d45c/connect/api/src/main/java/org/apache/kafka/connect/data/ConnectSchema.java#L239
So, in the Avro schema, the timestamp field is registered as INT64 with the correct (timestamp) logical type. But Connect reads the schema type as INT64 and compares it with the value type java.util.Date.
Is this a bug, or is there a workaround for this? Maybe I am missing something, as this looks like a standard Connect model.
Thanks in advance.
UPDATE
Sink connector config:
{
"name": "sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics": "topic",
"connection.url": "jdbc:postgresql://host:port/db",
"connection.user": "user",
"connection.password": "password",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url": "http://host:port",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://host:port",
"auto.create": "true",
"insert.mode": "upsert",
"pk.mode": "record_value",
"pk.fields": "id"
}
}
Deserialised data in Kafka:
{
"id":678148,
"some_timestamp_field":1543806057000,
...
}
We have worked out a workaround for the problem. Our goal was to convert the id from BIGINT to STRING (TEXT/VARCHAR) and save the record in the downstream DB.
But due to an issue (probably https://issues.apache.org/jira/browse/KAFKA-5891), casting the id field was not working. The Cast transform was also validating the timestamp fields in the record, but it was reading their schema type/name wrong, which resulted in a type mismatch (see the record body and error log above).
So we made a workaround as follows:
extract only the id field as the key -> execute the Cast transform on the key -> it works, as the key does not contain a timestamp field.
Here is the workaround configuration:
{
"name": "sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics": "topic",
"connection.url": "jdbc:postgresql://host:port/db",
"connection.user": "user",
"connection.password": "password",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url": "http://host:port",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://host:port",
"transforms": "createKey,castKeyToString",
"transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
"transforms.createKey.fields": "id",
"transforms.castKeyToString.type": "org.apache.kafka.connect.transforms.Cast$Key",
"transforms.castKeyToString.spec": "id:string",
"auto.create": "true",
"insert.mode": "upsert",
"pk.mode": "record_key",
"pk.fields": "id"
}
}
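The type mismatch behind the original error can be modeled in a few lines of Python. This is a simplified stand-in for the check in ConnectSchema.validateValue, not the real implementation, and it assumes (as the linked issue suggests) that the Cast transform rebuilt the value schema without the Timestamp logical type name while the value itself stayed a date object. The lookup tables and the validate function are illustrative only.

```python
from datetime import datetime, timezone

# Python stand-ins for the Java classes the validation would expect.
LOGICAL_TYPE_CLASSES = {"org.apache.kafka.connect.data.Timestamp": datetime}
SCHEMA_TYPE_CLASSES = {"INT64": int, "STRING": str}


def validate(schema_type, schema_name, value):
    """Prefer the logical type name when present, else fall back to the primitive type."""
    expected = LOGICAL_TYPE_CLASSES.get(schema_name) if schema_name else None
    if expected is None:
        expected = SCHEMA_TYPE_CLASSES[schema_type]
    if not isinstance(value, expected):
        raise TypeError(
            f"Invalid object for schema type {schema_type}: {type(value).__name__}"
        )


# The timestamp value from the sample record, already converted to a date object.
ts = datetime.fromtimestamp(1543806057000 / 1000, tz=timezone.utc)

# With the logical name intact, a date object is accepted for an INT64 field.
validate("INT64", "org.apache.kafka.connect.data.Timestamp", ts)

# Without the logical name, the same value is checked against the primitive type.
try:
    validate("INT64", None, ts)
except TypeError as err:
    print(err)
```

Under that assumption, casting only the key sidesteps the problem because the key schema contains just the id field, so no timestamp value is ever re-validated.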
Disclaimer: This is not a proper solution, just a workaround. The bug in the Cast transform should be fixed. In my opinion, the Cast transform should only be concerned with the fields designated for casting, not with other fields in the message.
Have a good day.

Resources