I have the following setup for implementing CDC using Debezium:
Oracle -> Debezium Source Connector -> Kafka -> JDBC Sink Connector -> PostgreSQL
The source connector config is:
{
"name":"myfirst-connector",
"config":{
"connector.class":"io.debezium.connector.oracle.OracleConnector",
"tasks.max":"1",
"database.hostname":"192.168.29.102",
"database.port":"1521",
"database.user":"c##dbzuser",
"database.password":"dbz",
"database.dbname":"ORCLCDB",
"database.pdb.name":"ORCLPDB1",
"database.server.name":"oracle19",
"database.connection.adapter":"logminer",
"database.history.kafka.topic":"schema_changes",
"database.history.kafka.bootstrap.servers":"192.168.29.102:9092",
"database.tablename.case.insensitive":"true",
"snapshot.mode":"initial",
"tombstones.on.delete":"true",
"include.schema.changes": "true",
"sanitize.field.names":"true",
"key.converter":"org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable":"true",
"value.converter":"org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable":"true",
"time.precision.mode": "connect",
"database.oracle.version":19
}
}
The sink connector config is:
{
"name": "myjdbc-sink-testdebezium",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics.regex": "oracle19.C__DBZUSER.*",
"connection.url": "jdbc:postgresql://192.168.29.102:5432/postgres?user=puser&password=My19cPassword",
"key.converter":"org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable":"true",
"value.converter":"org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable":"true",
"dialect.name": "PostgreSqlDatabaseDialect",
"auto.create": "true",
"auto.evolve": "true",
"insert.mode": "upsert",
"delete.enabled": "true",
"transforms": "unwrap, RemoveString, TimestampConverter",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.delete.handling.mode": "none",
"transforms.RemoveString.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.RemoveString.regex": "(.*)\\.C__DBZUSER\\.(.*)",
"transforms.RemoveString.replacement": "$2",
"transforms.TimestampConverter.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.TimestampConverter.target.type": "Timestamp",
"transforms.TimestampConverter.field": "dob",
"pk.mode": "record_key"
}
}
Now when I drop a table in Oracle I get an entry in the schema_changes topic, but the table is not dropped from PostgreSQL. I need help figuring out why the DROP is not getting propagated. Just FYI, all the other operations, i.e. CREATE TABLE, ALTER TABLE, INSERT, UPDATE and DELETE, are working fine. Only DROP is not working, and I am not getting any exception in the logs either.
Related
I created a sink connector from Kafka to MySQL.
After adding a transform to the sink connector's config to delete some columns, I get the error below, whereas without the transform it works:
(STRUCT) type doesn't have a mapping to the SQL database column type
{
"name": "mysql-conf-sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "3",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://localhost:8081",
"topics": "mysql.cars.prices",
"transforms": "dropPrefix,unwrap",
"transforms.dropPrefix.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.dropPrefix.regex": "mysql.cars.prices",
"transforms.dropPrefix.replacement": "prices",
"transforms.timestamp.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.timestamp.target.type": "Timestamp",
"transforms.timestamp.field": "date_time",
"transforms.timestamp.format": "yyyy-MM-dd HH:mm:ss",
"errors.tolerance": "all",
"errors.log.enable": "true",
"errors.log.include.messages": "true",
"connection.url": "jdbc:mysql://localhost:3306/product",
"connection.user": "kafka",
"connection.password": "123456",
"transforms": "ReplaceField",
"transforms.ReplaceField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.ReplaceField.blacklist": "id, brand",
"insert.mode": "insert",
"auto.create": "true",
"auto.evolve": "true",
"batch.size": 50000
}
}
You have put the "transforms" key more than once in your JSON, which isn't valid.
Try with one entry
"transforms": "unwrap,ReplaceField,dropPrefix",
You are getting the error because you have overridden the value, so unwrap, specifically, is no longer applied and you are left with nested Structs.
The blacklist property has been renamed to exclude, by the way - https://docs.confluent.io/platform/current/connect/transforms/replacefield.html#properties
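For reference, here is a sketch of what the merged transform section could look like, reusing only the values already present in the config above and swapping blacklist for exclude as noted. The unwrap entry is an assumption, since the original config never defines a transforms.unwrap.type; with a Debezium source it would normally be ExtractNewRecordState:
"transforms": "unwrap,ReplaceField,dropPrefix",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.ReplaceField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.ReplaceField.exclude": "id, brand",
"transforms.dropPrefix.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.dropPrefix.regex": "mysql.cars.prices",
"transforms.dropPrefix.replacement": "prices",
Note also that the transforms.timestamp.* entries in the original config only take effect if a timestamp alias is added to this same single transforms list.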
I am using Docker images of Kafka and Kafka Connect to test CDC with Debezium, and the database is a standalone instance.
My sink connector config JSON looks like this:
{
"name": "jdbc-sink-test-oracle",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"dialect.name": "OracleDatabaseDialect",
"table.name.format": "TEST",
"topics": "oracle-db-source.DBZ_SRC.TEST",
"connection.url": "jdbc:oracle:thin:#hostname:1521/DB",
"connection.user": "DBZ_TARGET",
"connection.password": "DBZ_TARGET",
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "true",
"auto.create": "true",
"insert.mode": "upsert",
"delete.enabled": "true",
"pk.fields": "ID",
"pk.mode": "record_key"
}
}
and my source connector config JSON looks like this:
{
"name": "test-source-connector",
"config": {
"connector.class" : "io.debezium.connector.oracle.OracleConnector",
"tasks.max" : "1",
"database.server.name" : "oracle-db-source",
"database.hostname" : "hostname",
"database.port" : "1521",
"database.user" : "clogminer",
"database.password" : "clogminer",
"database.dbname" : "DB",
"database.oracle.version": "19",
"database.history.kafka.bootstrap.servers" : "kafka:9092",
"database.history.kafka.topic": "schema-changes.DBZ_SRC",
"database.connection.adapter": "logminer",
"table.include.list" : "DBZ_SRC.TEST",
"database.schema": "DBZ_SRC",
"errors.log.enable": "true",
"snapshot.lock.timeout.ms":"5000",
"include.schema.changes": "true",
"snapshot.mode":"initial",
"decimal.handling.mode": "double"
}
}
and I am getting this error with the above configurations:
Error : 942, Position : 11, Sql = merge into "TEST" using (select :1 "ID", :2 "NAME", :3 "DESCRIPTION", :4 "WEIGHT" FROM dual) incoming on("TEST"."ID"=incoming."ID") when matched then update set "TEST"."NAME"=incoming."NAME","TEST"."DESCRIPTION"=incoming."DESCRIPTION","TEST"."WEIGHT"=incoming."WEIGHT" when not matched then insert("TEST"."NAME","TEST"."DESCRIPTION","TEST"."WEIGHT","TEST"."ID") values(incoming."NAME",incoming."DESCRIPTION",incoming."WEIGHT",incoming."ID"), OriginalSql = merge into "TEST" using (select ? "ID", ? "NAME", ? "DESCRIPTION", ? "WEIGHT" FROM dual) incoming on("TEST"."ID"=incoming."ID") when matched then update set "TEST"."NAME"=incoming."NAME","TEST"."DESCRIPTION"=incoming."DESCRIPTION","TEST"."WEIGHT"=incoming."WEIGHT" when not matched then insert("TEST"."NAME","TEST"."DESCRIPTION","TEST"."WEIGHT","TEST"."ID") values(incoming."NAME",incoming."DESCRIPTION",incoming."WEIGHT",incoming."ID"), Error Msg = ORA-00942: table or view does not exist
at io.confluent.connect.jdbc.sink.JdbcSinkTask.getAllMessagesException(JdbcSinkTask.java:150)
at io.confluent.connect.jdbc.sink.JdbcSinkTask.put(JdbcSinkTask.java:102)
... 11 more
2022-07-28 15:01:58,644 ERROR || WorkerSinkTask{id=jdbc-sink-test-oracle-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted [org.apache.kafka.connect.runtime.WorkerTask]
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:610)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:330)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:232)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:201)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:188)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:237)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.kafka.connect.errors.ConnectException: java.sql.SQLException: Exception chain:
java.sql.BatchUpdateException: ORA-00942: table or view does not exist
java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist
I believe that, according to the configuration I gave, the table should be auto-created, but it says the table does not exist.
However, the following sink connector configuration is working fine: it auto-creates the table named 'TEST2' and also exports the data from the source into that table:
{
"name": "jdbc-sink-test2-oracle",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"dialect.name": "OracleDatabaseDialect",
"table.name.format": "TEST2",
"topics": "oracle-db-source.DBZ_SRC.TEST",
"connection.url": "jdbc:oracle:thin:#hostname:1521/DB",
"connection.user": "DBZ_TARGET",
"connection.password": "DBZ_TARGET",
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false",
"auto.create": "true",
"insert.mode": "upsert",
"delete.enabled": "true",
"pk.fields": "ID",
"pk.mode": "record_key"
}
}
Edit:
The sink connector works fine if the target table, with the same name as the source table, has already been created with the same DDL, but the target table is not auto-created if it does not already exist.
There are 2 columns that need to hold the same value in the table I want to sink to. Let's say the columns are named ID and PAYLOADID. But on the Kafka side, there is no separate field for each of these columns. So how can I configure my sink connector to write to these 2 columns from the same field in Kafka?
This is my connector config:
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"connection.user": "${file:/pass.properties:alm_user}",
"connection.password": "${file:/pass.properties:alm_pwd}",
"connection.url": "jdbc:oracle:thin:#(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=****)(PORT=****))(CONNECT_DATA=(SERVICE_NAME=****)))",
"table.name.format": "SCHEMA_NAME.TABLE_NAME",
"topics": "MY_TOPIC",
"transforms": "TimestampConverter1",
"transforms.TimestampConverter1.target.type": "Timestamp",
"transforms.TimestampConverter1.field": "RECORDDATE",
"transforms.TimestampConverter1.format": "MM.dd.yyyy hh:mm:ss",
"transforms.TimestampConverter1.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"key.converter.schemas.enable": "false",
"value.converter.schemas.enable": "true",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"auto.create": "false",
"insert.mode": "insert",
"transforms": "rename",
"transforms.rename.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.rename.renames": "payload:PAYLOADID, type:TYPE"
You'd need to write your own transform, or otherwise pre-process the topic, such that you can copy and rename one field to another while keeping the same field-value.
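As a sketch, such a custom SMT could be wired into the sink config like the fragment below. CopyField and its source.field/target.field options are purely hypothetical names for a transform you would have to implement yourself (no built-in Kafka Connect SMT copies a field under a new name while keeping the original), and it assumes the payload field is the one that should feed both columns. Note that all transforms, including the existing TimestampConverter1, must be listed under a single transforms key, since the config above repeats that key:
"transforms": "TimestampConverter1,copyField,rename",
"transforms.copyField.type": "com.example.smt.CopyField$Value",
"transforms.copyField.source.field": "payload",
"transforms.copyField.target.field": "ID",
"transforms.rename.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.rename.renames": "payload:PAYLOADID, type:TYPE",
With this chain, the hypothetical copyField step would duplicate payload into a new ID field, and the existing rename step would then map payload to PAYLOADID, so both target columns end up populated from the same original value.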
I've used Debezium for MySQL -> Elasticsearch CDC.
Now, the issue is that when I delete data from MySQL, it still reappears in Elasticsearch, even though the data is no longer present in the MySQL DB. UPDATE and INSERT work fine, but DELETE doesn't.
Also, I did the following:
1. Delete data in MySQL
2. Delete the Elasticsearch index and the ES Kafka sink connector
3. Create a new connector for ES in Kafka
Now, the weird part is that all of my deleted data reappears here as well! When I checked the ES data before step (3), the data wasn't there, but afterwards this behaviour is observed.
Please help me fix this issue!
MySQL config:
"config": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"database.allowPublicKeyRetrieval": "true",
"database.user": "cdc-reader",
"tasks.max": "1",
"database.history.kafka.bootstrap.servers": "X.X.X.X:9092",
"database.history.kafka.topic": "schema-changes.mysql",
"database.server.name": "data_test",
"schema.include.list": "data_test",
"database.port": "3306",
"tombstones.on.delete": "true",
"delete.enabled": "true",
"database.hostname": "X.X.X.X",
"database.password": "xxxxx",
"name": "slave_test",
"database.history.skip.unparseable.ddl": "true",
"table.include.list": "search_ai.*"
},
Elasticsearch config:
"config": {
"connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"type.name": "_doc",
"behavior.on.null.values": "delete",
"transforms.extractKey.field": "ID",
"tasks.max": "1",
"topics": "search_ai.search_ai.slave_data",
"transforms.InsertKey.fields": "ID",
"transforms": "unwrap,key,InsertKey,extractKey",
"key.ignore": "false",
"transforms.extractKey.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.key.field": "ID",
"transforms.key.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"name": "esd_2",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"connection.url": "http://X.X.X.X:9200",
"transforms.InsertKey.type": "org.apache.kafka.connect.transforms.ValueToKey"
},
Debezium is reading the transaction log, not the source table, so the inserts and updates are always going to be read first, causing inserts and doc updates in Elasticsearch...
Secondly, did you re-create the sink connector with the same name or a new one?
If the same name, the original consumer group offsets would not have changed, causing the consumer group to pick up at the offsets from before you deleted the original connector.
If a new name, then depending on the auto.offset.reset value of the sink connector's consumer, you could be consuming the Debezium topic from the beginning, causing data to get re-inserted into Elasticsearch, as mentioned. You need to check whether your MySQL delete events are actually getting produced/consumed as tombstone values to trigger deletes in Elasticsearch.
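On the tombstone point, one thing worth checking in the configs shown above: Debezium's ExtractNewRecordState (the unwrap transform in the ES sink config) drops tombstones by default, so the sink's behavior.on.null.values=delete setting may never see a null value. A minimal sketch of the relevant sink-side keys, assuming the rest of the ES sink config stays as posted:
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false",
"behavior.on.null.values": "delete",
The source side already sets tombstones.on.delete=true, so if tombstones still don't appear in the topic (for example when inspecting it with a console consumer that prints keys and null values), the problem is upstream of the sink.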
I am using the JDBC source connector to move data from my Postgres database table to a Kafka topic. I have an orders table with a foreign key to the customers table via the customerNumber field.
Below is the connector, which copies the orders to Kafka but without the customers data in the JSON. I am looking for a way to construct the complete orders object, including the customers data, in the JSON.
The connector is:
{
"name": "SOURCE_CONNECTOR",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
"connection.password": "postgres_pwd",
"transforms.cast.type": "org.apache.kafka.connect.transforms.Cast$Value",
"transforms.cast.spec": "amount:float64",
"tasks.max": "1",
"transforms": "cast,createKey,extractInt",
"transforms.extractInt.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"batch.max.rows": "25",
"table.whitelist": "orders",
"mode": "bulk",
"topic.prefix": "data_",
"transforms.extractInt.field": "uuid",
"connection.user": "postgres_user",
"transforms.createKey.fields": "uuid",
"poll.interval.ms": "3600000",
"sql.quote.identifiers": "false",
"name": "SOURCE_CONNECTOR",
"numeric.mapping": "best_fit",
"connection.url": "url"
}
}
You can use query-based ingest. Just specify the query config option.
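A sketch of what that could look like, keeping the rest of the connector config above unchanged: table.whitelist is removed, query takes its place, and topic.prefix then acts as the full topic name (data_orders is assumed here to match what the original prefix plus table name produced). The exact JOIN is an assumption about the schema, based on the customerNumber foreign key mentioned in the question:
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"mode": "bulk",
"query": "SELECT * FROM orders JOIN customers USING (customerNumber)",
"topic.prefix": "data_orders",
"poll.interval.ms": "3600000",
Because the query flattens orders and customers into one row, the resulting Kafka messages will contain the customer columns alongside the order columns rather than a nested customers object.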