How the JDBC source connector works with "query" and "mode": timestamp+incrementing

Can someone explain to me how the source connector works when using "query" together with "mode": timestamp+incrementing?
It works perfectly when the query returns few rows, but with many records it becomes unworkable.
From what I can see, it re-runs the entire query over and over again.
This is my source connector:
{
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"tasks.max": "1",
"connection.url": "jdbc:informix-sqli://ip:port/sis:informixserver=mibase",
"connection.user":"informix",
"connection.password":"pass",
"query": "SELECT * FROM my_vw",
"topic.prefix": "novedades",
"db.timezone": "America/Argentina/Buenos_Aires",
"dialect.name": "GenericDatabaseDialect",
"timestamp.granularity": "connect_logical",
"poll.interval.ms": "10000",
"mode":"timestamp+incrementing",
"schema.pattern": "informix",
"timestamp.column.name": "last_date",
"incrementing.column.name": "id",
"validate.non.null": false,
"numeric.mapping":"best_fit",
"transforms": "copyFieldToKey,extractKeyFromStruct,removeKeyFromValue",
"transforms.copyFieldToKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
"transforms.copyFieldToKey.fields": "id",
"transforms.extractKeyFromStruct.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.extractKeyFromStruct.field": "id",
"transforms.removeKeyFromValue.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.removeKeyFromValue.blacklist": "id",
"key.converter" : "org.apache.kafka.connect.converters.LongConverter"
}
The filtering works well, in that it only brings me the changed and/or new records, but apparently the entire query is re-run, and if the view has millions of records, that whole set is read over and over again.
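For what it is worth, in timestamp+incrementing mode the connector does not rewrite your query into a delta query; it takes the configured query as-is and appends its own WHERE clause on the timestamp and incrementing columns, bound to the offsets saved from the previous poll. A rough sketch of what each poll executes (the exact quoting and parameter layout depend on the connector version and dialect):
SELECT * FROM my_vw
WHERE "last_date" < ?                                     -- end of the current window (DB time minus timestamp.delay.interval.ms)
AND (("last_date" = ? AND "id" > ?) OR "last_date" > ?)   -- rows past the stored (last_date, id) offset
ORDER BY "last_date", "id" ASC
Only the rows matching that appended filter are produced to Kafka, but the database still has to evaluate my_vw on every poll, so if the view is expensive or the filter columns cannot use an index, the underlying tables may get scanned again and again.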

Related

dropfield transform sink connector: (STRUCT) type doesn't have a mapping to the SQL database column type

I created a sink connector from Kafka to MySQL.
After adding a transform to the sink connector's config to delete some columns, I get this error, whereas without the transform it works:
(STRUCT) type doesn't have a mapping to the SQL database column type
{
"name": "mysql-conf-sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "3",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://localhost:8081",
"topics": "mysql.cars.prices",
"transforms": "dropPrefix,unwrap",
"transforms.dropPrefix.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.dropPrefix.regex": "mysql.cars.prices",
"transforms.dropPrefix.replacement": "prices",
"transforms.timestamp.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.timestamp.target.type": "Timestamp",
"transforms.timestamp.field": "date_time",
"transforms.timestamp.format": "yyyy-MM-dd HH:mm:ss",
"errors.tolerance": "all",
"errors.log.enable": "true",
"errors.log.include.messages": "true",
"connection.url": "jdbc:mysql://localhost:3306/product",
"connection.user": "kafka",
"connection.password": "123456",
"transforms": "ReplaceField",
"transforms.ReplaceField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.ReplaceField.blacklist": "id, brand",
"insert.mode": "insert",
"auto.create": "true",
"auto.evolve": "true",
"batch.size": 50000
}
}
You have put the "transforms" key more than once in your JSON, which isn't valid.
Try with one entry:
"transforms": "unwrap,ReplaceField,dropPrefix",
You are getting the error because you have overridden the value, and unwrap, specifically, is no longer called, so you have nested Structs.
The blacklist property got renamed to exclude, by the way - https://docs.confluent.io/platform/current/connect/transforms/replacefield.html#properties
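Put together, the transform section could look roughly like the sketch below. This is not a drop-in config: the posted question never defines transforms.unwrap.*, so the ExtractNewRecordState type shown for it is an assumption borrowed from the Debezium examples elsewhere on this page; drop the unwrap alias if the topic is not produced by Debezium, and append the timestamp alias defined in your config if that transform is still needed.
"transforms": "unwrap,ReplaceField,dropPrefix",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.ReplaceField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.ReplaceField.exclude": "id, brand",
"transforms.dropPrefix.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.dropPrefix.regex": "mysql.cars.prices",
"transforms.dropPrefix.replacement": "prices",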

Kafka connect Jdbc sink connector not auto creating tables

I am using Docker images of Kafka and Kafka Connect to test CDC with Debezium; the database is a standalone one.
My sink connector config JSON looks like this:
{
"name": "jdbc-sink-test-oracle",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"dialect.name": "OracleDatabaseDialect",
"table.name.format": "TEST",
"topics": "oracle-db-source.DBZ_SRC.TEST",
"connection.url": "jdbc:oracle:thin:#hostname:1521/DB",
"connection.user": "DBZ_TARGET",
"connection.password": "DBZ_TARGET",
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "true",
"auto.create": "true",
"insert.mode": "upsert",
"delete.enabled": "true",
"pk.fields": "ID",
"pk.mode": "record_key"
}
}
and my source connector config JSON looks like this:
{
"name": "test-source-connector",
"config": {
"connector.class" : "io.debezium.connector.oracle.OracleConnector",
"tasks.max" : "1",
"database.server.name" : "oracle-db-source",
"database.hostname" : "hostname",
"database.port" : "1521",
"database.user" : "clogminer",
"database.password" : "clogminer",
"database.dbname" : "DB",
"database.oracle.version": "19",
"database.history.kafka.bootstrap.servers" : "kafka:9092",
"database.history.kafka.topic": "schema-changes.DBZ_SRC",
"database.connection.adapter": "logminer",
"table.include.list" : "DBZ_SRC.TEST",
"database.schema": "DBZ_SRC",
"errors.log.enable": "true",
"snapshot.lock.timeout.ms":"5000",
"include.schema.changes": "true",
"snapshot.mode":"initial",
"decimal.handling.mode": "double"
}
}
and I am getting this error for the above configurations:
Error : 942, Position : 11, Sql = merge into "TEST" using (select :1 "ID", :2 "NAME", :3 "DESCRIPTION", :4 "WEIGHT" FROM dual) incoming on("TEST"."ID"=incoming."ID") when matched then update set "TEST"."NAME"=incoming."NAME","TEST"."DESCRIPTION"=incoming."DESCRIPTION","TEST"."WEIGHT"=incoming."WEIGHT" when not matched then insert("TEST"."NAME","TEST"."DESCRIPTION","TEST"."WEIGHT","TEST"."ID") values(incoming."NAME",incoming."DESCRIPTION",incoming."WEIGHT",incoming."ID"), OriginalSql = merge into "TEST" using (select ? "ID", ? "NAME", ? "DESCRIPTION", ? "WEIGHT" FROM dual) incoming on("TEST"."ID"=incoming."ID") when matched then update set "TEST"."NAME"=incoming."NAME","TEST"."DESCRIPTION"=incoming."DESCRIPTION","TEST"."WEIGHT"=incoming."WEIGHT" when not matched then insert("TEST"."NAME","TEST"."DESCRIPTION","TEST"."WEIGHT","TEST"."ID") values(incoming."NAME",incoming."DESCRIPTION",incoming."WEIGHT",incoming."ID"), Error Msg = ORA-00942: table or view does not exist
at io.confluent.connect.jdbc.sink.JdbcSinkTask.getAllMessagesException(JdbcSinkTask.java:150)
at io.confluent.connect.jdbc.sink.JdbcSinkTask.put(JdbcSinkTask.java:102)
... 11 more
2022-07-28 15:01:58,644 ERROR || WorkerSinkTask{id=jdbc-sink-test-oracle-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted [org.apache.kafka.connect.runtime.WorkerTask]
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:610)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:330)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:232)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:201)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:188)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:237)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.kafka.connect.errors.ConnectException: java.sql.SQLException: Exception chain:
java.sql.BatchUpdateException: ORA-00942: table or view does not exist
java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist
I believe that, according to the configuration I provided, the table should be auto-created, but the error says the table doesn't exist.
However, it works fine with the following sink connector configuration: it auto-creates a table named 'TEST2' and also exports the data from the source into that table.
{
"name": "jdbc-sink-test2-oracle",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"dialect.name": "OracleDatabaseDialect",
"table.name.format": "TEST2",
"topics": "oracle-db-source.DBZ_SRC.TEST",
"connection.url": "jdbc:oracle:thin:#hostname:1521/DB",
"connection.user": "DBZ_TARGET",
"connection.password": "DBZ_TARGET",
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false",
"auto.create": "true",
"insert.mode": "upsert",
"delete.enabled": "true",
"pk.fields": "ID",
"pk.mode": "record_key"
}
}
Edit:
The sink connector works fine if a target table with the same name as the source table has already been created with the same DDL, but the target table is not auto-created when it does not already exist.

Debezium Kafka Connect DROP Table not working

I have the following setup for implementing CDC using Debezium
Oracle -> Debezium Source Connector -> Kafka -> JDBC Sink Connector -> PostgreSQL
Source Connector config is
{
"name":"myfirst-connector",
"config":{
"connector.class":"io.debezium.connector.oracle.OracleConnector",
"tasks.max":"1",
"database.hostname":"192.168.29.102",
"database.port":"1521",
"database.user":"c##dbzuser",
"database.password":"dbz",
"database.dbname":"ORCLCDB",
"database.pdb.name":"ORCLPDB1",
"database.server.name":"oracle19",
"database.connection.adapter":"logminer",
"database.history.kafka.topic":"schema_changes",
"database.history.kafka.bootstrap.servers":"192.168.29.102:9092",
"database.tablename.case.insensitive":"true",
"snapshot.mode":"initial",
"tombstones.on.delete":"true",
"include.schema.changes": "true",
"sanitize.field.names":"true",
"key.converter":"org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable":"true",
"value.converter":"org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable":"true",
"time.precision.mode": "connect",
"database.oracle.version":19
} }
Sink connector config is
{
"name": "myjdbc-sink-testdebezium",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics.regex": "oracle19.C__DBZUSER.*",
"connection.url": "jdbc:postgresql://192.168.29.102:5432/postgres?user=puser&password=My19cPassword",
"key.converter":"org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable":"true",
"value.converter":"org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable":"true",
"dialect.name": "PostgreSqlDatabaseDialect",
"auto.create": "true",
"auto.evolve": "true",
"insert.mode": "upsert",
"delete.enabled": "true",
"transforms": "unwrap, RemoveString, TimestampConverter",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.delete.handling.mode": "none",
"transforms.RemoveString.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.RemoveString.regex": "(.*)\\.C__DBZUSER\\.(.*)",
"transforms.RemoveString.replacement": "$2",
"transforms.TimestampConverter.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.TimestampConverter.target.type": "Timestamp",
"transforms.TimestampConverter.field": "dob",
"pk.mode": "record_key"
}
}
Now, when I drop a table in Oracle, I get an entry in the schema_changes topic, but the table is not dropped from PostgreSQL. I need help figuring out why the DROP is not getting propagated. Just FYI, all the other operations, i.e. CREATE TABLE, ALTER TABLE, INSERT, UPDATE and DELETE, work fine. Only DROP does not work, and I am not getting any exception in the logs either.

Kafka connect database source connector : how to copy data from foreign key

I am using the JDBC source connector to move data from my Postgres database table to a Kafka topic. I have an orders table with a foreign key to the customers table on the customerNumber field.
Below is the connector, which copies the orders to Kafka but without the customer data in the JSON. I am looking for a way to construct the complete orders object, including its customer, in the JSON.
The connector is:
{
"name": "SOURCE_CONNECTOR",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
"connection.password": "postgres_pwd",
"transforms.cast.type": "org.apache.kafka.connect.transforms.Cast$Value",
"transforms.cast.spec": "amount:float64",
"tasks.max": "1",
"transforms": "cast,createKey,extractInt",
"transforms.extractInt.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"batch.max.rows": "25",
"table.whitelist": "orders",
"mode": "bulk",
"topic.prefix": "data_",
"transforms.extractInt.field": "uuid",
"connection.user": "postgres_user",
"transforms.createKey.fields": "uuid",
"poll.interval.ms": "3600000",
"sql.quote.identifiers": "false",
"name": "SOURCE_CONNECTOR",
"numeric.mapping": "best_fit",
"connection.url": "url"
}
}
You can use query-based ingest: just specify the query config option.
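A rough sketch of what that could look like here, assuming illustrative column names for the customers table (customerName, phone), since the question does not show its schema. Note that query cannot be combined with table.whitelist, and when query is used, topic.prefix is taken as the full topic name:
{
"name": "SOURCE_CONNECTOR",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"tasks.max": "1",
"connection.url": "url",
"connection.user": "postgres_user",
"connection.password": "postgres_pwd",
"mode": "bulk",
"poll.interval.ms": "3600000",
"batch.max.rows": "25",
"numeric.mapping": "best_fit",
"topic.prefix": "data_orders_with_customers",
"query": "SELECT o.*, c.customerName, c.phone FROM orders o JOIN customers c ON c.customerNumber = o.customerNumber"
}
}
The cast/key transforms from the original config are omitted here for brevity; they can be added back unchanged.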

ExtractField and Parse JSON in kafka-connect sink

I have a Kafka Connect flow of MongoDB -> Kafka Connect -> Elasticsearch sending data end to end OK, but the payload document ends up JSON-encoded as a string. Here's my source MongoDB document:
{
"_id": "1541527535911",
"enabled": true,
"price": 15.99,
"style": {
"color": "blue"
},
"tags": [
"shirt",
"summer"
]
}
And here's my mongodb source connector configuration:
{
"name": "redacted",
"config": {
"connector.class": "com.teambition.kafka.connect.mongo.source.MongoSourceConnector",
"databases": "redacted.redacted",
"initial.import": "true",
"topic.prefix": "redacted",
"tasks.max": "8",
"batch.size": "1",
"key.serializer": "org.apache.kafka.common.serialization.StringSerializer",
"value.serializer": "org.apache.kafka.common.serialization.JSONSerializer",
"key.serializer.schemas.enable": false,
"value.serializer.schemas.enable": false,
"compression.type": "none",
"mongo.uri": "mongodb://redacted:27017/redacted",
"analyze.schema": false,
"schema.name": "__unused__",
"transforms": "RenameTopic",
"transforms.RenameTopic.type":
"org.apache.kafka.connect.transforms.RegexRouter",
"transforms.RenameTopic.regex": "redacted.redacted_Redacted",
"transforms.RenameTopic.replacement": "redacted"
}
}
Over in elasticsearch, it ends up looking like this:
{
"_index" : "redacted",
"_type" : "kafka-connect",
"_id" : "{\"schema\":{\"type\":\"string\",\"optional\":true},\"payload\":\"1541527535911\"}",
"_score" : 1.0,
"_source" : {
"ts" : 1541527536,
"inc" : 2,
"id" : "1541527535911",
"database" : "redacted",
"op" : "i",
"object" : "{ \"_id\" : \"1541527535911\", \"price\" : 15.99,
\"enabled\" : true, \"tags\" : [\"shirt\", \"summer\"],
\"style\" : { \"color\" : \"blue\" } }"
}
}
I'd like to use two single message transforms:
ExtractField to grab object, which is a string of JSON
Something to parse that JSON into an object, or just let the normal JsonConverter handle it, as long as it ends up properly structured in Elasticsearch.
I've attempted to do it with just ExtractField in my sink config, but I see this error logged by Kafka Connect:
kafka-connect_1 | org.apache.kafka.connect.errors.ConnectException:
Bulk request failed: [{"type":"mapper_parsing_exception",
"reason":"failed to parse",
"caused_by":{"type":"not_x_content_exception",
"reason":"Compressor detection can only be called on some xcontent bytes or
compressed xcontent bytes"}}]
Here's my Elasticsearch sink connector configuration. In this version, I have things working, but I had to write a custom ParseJson SMT. It works well, but if there is a better way, or a way to do this with some combination of built-in pieces (converters, SMTs, whatever works), I'd love to see it.
{
"name": "redacted",
"config": {
"connector.class":
"io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"batch.size": 1,
"connection.url": "http://redacted:9200",
"key.converter.schemas.enable": true,
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"schema.ignore": true,
"tasks.max": "1",
"topics": "redacted",
"transforms": "ExtractFieldPayload,ExtractFieldObject,ParseJson,ReplaceId",
"transforms.ExtractFieldPayload.type": "org.apache.kafka.connect.transforms.ExtractField$Value",
"transforms.ExtractFieldPayload.field": "payload",
"transforms.ExtractFieldObject.type": "org.apache.kafka.connect.transforms.ExtractField$Value",
"transforms.ExtractFieldObject.field": "object",
"transforms.ParseJson.type": "reaction.kafka.connect.transforms.ParseJson",
"transforms.ReplaceId.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.ReplaceId.renames": "_id:id",
"type.name": "kafka-connect",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": false
}
}
I am not sure about your Mongo connector; I don't recognize the class or the configurations... Most people probably use the Debezium Mongo connector.
I would set it up this way, though:
"connector.class": "com.teambition.kafka.connect.mongo.source.MongoSourceConnector",
"key.serializer": "org.apache.kafka.common.serialization.StringSerializer",
"value.serializer": "org.apache.kafka.common.serialization.JSONSerializer",
"key.serializer.schemas.enable": false,
"value.serializer.schemas.enable": true,
The schemas.enable setting is important; that way the internal Connect data classes know how to convert to/from other formats.
Then, in the sink, you again need to use the JSON deserializer (via the converter) so that it creates a full object rather than a plain-text string, as you currently see in Elasticsearch ({\"schema\":{\"type\":\"string\").
"connector.class":
"io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"key.converter.schemas.enable": false,
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": true
And if this doesn't work, then you might have to manually create your index mapping in Elasticsearch ahead of time so it knows how to actually parse the strings you are sending it.
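For example, a minimal index mapping, assuming an Elasticsearch 6.x-style mapping (to match the type.name used in the sink config) and illustrative field types for the sample document above, could be PUT to the index before starting the connector:
{
  "mappings": {
    "kafka-connect": {
      "properties": {
        "id": { "type": "keyword" },
        "enabled": { "type": "boolean" },
        "price": { "type": "float" },
        "style": { "properties": { "color": { "type": "keyword" } } },
        "tags": { "type": "keyword" }
      }
    }
  }
}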
