I'm not able to perform SMT transformation "ExtractField" in order to extract field from key struct to a simple long value with an Oracle database. It works fine with a Postgres database.
I tried to use "ReplaceField" SMT to rename the key and it works fine. I suspect a problem in the class "org.apache.kafka.connect.transforms.ExtractField" on schema handling to get the field. Schema handling seems to work differently between "ReplaceField" and "ExtractField".
Oracle database version: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production Version 19.8.0.0.0
Debezium connect: 1.6
Kafka version: 2.7.0
Instanclient basic (Oracle client and drivers): 21.3.0.0.0
I got an "Unknown field ID_MYTABLE":
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded
in error handler at
org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:206)
at
org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:132)
at
org.apache.kafka.connect.runtime.TransformationChain.apply(TransformationChain.java:50)
at
org.apache.kafka.connect.runtime.WorkerSourceTask.sendRecords(WorkerSourceTask.java:339)
at
org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:264)
at
org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:185)
at
org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:234)
at
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834) Caused by:
java.lang.IllegalArgumentException: Unknown field: ID_MYTABLE
org.apache.kafka.connect.transforms.ExtractField.apply(ExtractField.java:65)
at
org.apache.kafka.connect.runtime.TransformationChain.lambda$apply$0(TransformationChain.java:50)
at
org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:156)
at
org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:190)
... 11 more
Here is my configuration of my Kafka connector:
{
"name": "oracle-connector",
"config": {
"connector.class": "io.debezium.connector.oracle.OracleConnector",
"tasks.max": "1",
"database.server.name": "serverName",
"database.user": "c##dbzuser",
"database.password": "dbz",
"database.url": "jdbc:oracle:thin:...",
"database.dbname": "dbName",
"database.pdb.name": "PDBName",
"database.connection.adapter": "logminer",
"database.history.kafka.bootstrap.servers": "kafka:9092",
"database.history.kafka.topic": "schema-changes.data",
"schema.include.list": "mySchema",
"table.include.list": "mySchema.myTable",
"log.mining.strategy": "online_catalog",
"snapshot.mode": "initial",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "false",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schemas.enable": "true",
"value.converter.schema.registry.url": "http://schema-registry:8081",
"transforms": "unwrap,route,extractField",
"transforms.extractField.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.extractField.field": "ID_MYTABLE",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.route.regex": "([^.]+)\\.([^.]+)\\.([^.]+)",
"transforms.route.replacement": "$1_$2_$3"
}
}
Related
I am using Debezium source connector (postgres) to track database changes to kafka and I am using kafka jdbc sink connector to transfer the data to another postgres server. Here insert and update are working fine. The problem is with delete. Whenever the delete occurs in the source database debezium sending a tombstone message. But jdbc sink connector trying to insert the row into the destination database and fails. Please help me where am I going wrong?
Source Connector
{
"name": "ksqldb-connector-actions",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"plugin.name": "pgoutput",
"database.hostname": "ipadress",
"database.port": "5432",
"database.user": "db",
"database.password": "*********",
"database.dbname": "config",
"database.server.name": "postgres",
"topic.prefix":"kcon",
"table.include.list": "dbo.actions",
"slot.name" : "slot_actions_connector",
"transforms":"unwrap",
"transforms.unwrap.type":"io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones":"false",
"transforms.unwrap.delete.handling.mode":"rewrite",
"transforms.unwrap.add.fields":"table,lsn"
}
}
For transforms.unwrap.delete.handling.mode I tried "rewrite" as well as "drop" but both are failing on delete
Sink Connector
{
"name": "jdbc-sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics": "kcon.dbo.actions",
"connection.url": "jdbc:postgresql://ipadress:5432/config",
"connection.user": "wft",
"connection.password": "*******",
"insert.mode": "upsert",
"delete.enabled": "true",
"table.name.format":"dbo.actions_etl_kafka",
"pk.mode":"record_key",
"pk.fields": "action_id",
"db.timezone":"Asia/Kolkata",
"auto.create":"true",
"auto.evolve":"true",
"errors.tolerance": "all",
"errors.log.enable": "true",
"errors.log.include.messages": "true",
"transforms": "flatten",
"transforms.flatten.type": "org.apache.kafka.connect.transforms.Flatten$Key",
"transforms.flatten.delimiter": "_",
"input.data.format": "AVRO",
"key.converter":"io.confluent.connect.avro.AvroConverter",
"value.converter":"io.confluent.connect.avro.AvroConverter",
"key.converter.schemas.enable":"true",
"value.converter.schemas.enable": "true",
"key.converter.schema.registry.url":"http://schema-registry-ksql:8081",
"value.converter.schema.registry.url":"http://schema-registry-ksql:8081"
}
}
Actually the problem was the kafka connect version which is unable to handle the Tombstone message so all the time delete failed. I was using confluentinc/cp-kafka-connect:5.2.1.
Now I created a custom image with the latest version and the delete works fine. The custom image creation is below. May be helpful to someone.
FROM confluentinc/cp-kafka-connect:6.1.9
ENV CONNECT_PLUGIN_PATH=/usr/share/java/,/usr/share/confluent-hub-components/
RUN confluent-hub install --no-prompt confluentinc/kafka-connect-jdbc:10.5.2
RUN confluent-hub install --no-prompt debezium/debezium-connector-postgresql:1.9.3
RUN confluent-hub install --no-prompt jcustenborder/kafka-connect-transform-common:0.1.0.54
I have a table “books” in database motor. This is my source and for source connection I created a topic “mysql-books”. So far all good I am able to see messages on Confluent Control Center. Now these messages I want to sink into another database called "motor-audit" so that in audit I am should see all the changes that happened to the table “books”. I have given the topic “mysql-books” in my sink curl for sink connector since changes are being published to this topic.
My source config -
curl -X POST http://localhost:8083/connectors -H "Content-Type: application/json" -d '{
"name": "jdbc_source_mysql_001",
"config": {
"value.converter.schema.registry.url": "http://0.0.0.0:8081",
"key.converter.schema.registry.url": "http://0.0.0.0:8081",
"name": "jdbc_source_mysql_001",
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"connection.url": "jdbc:mysql://localhost:3306/motor",
"connection.user": "yagnesh",
"connection.password": "yagnesh123",
"catalog.pattern": "motor",
"mode": "bulk",
"poll.interval.ms": "10000",
"topic.prefix": "mysql-",
"transforms":"createKey,extractInt",
"transforms.createKey.type":"org.apache.kafka.connect.transforms.ValueToKey",
"transforms.createKey.fields":"id",
"transforms.extractInt.type":"org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.extractInt.field":"id"
}
}
My Sink config -
curl -X PUT http://localhost:8083/connectors/jdbc_sink_mysql_001/config \
-H "Content-Type: application/json" -d '{
"value.converter.schema.registry.url": "http://0.0.0.0:8081",
"value.converter.schemas.enable": "true",
"key.converter.schema.registry.url": "http://0.0.0.0:8081",
"name": "jdbc_sink_mysql_001",
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"topics":"mysql-books",
"connection.url": "jdbc:mysql://mysql:3306/motor",
"connection.user": "yagnesh",
"connection.password": "yagnesh123",
"insert.mode": "insert",
"auto.create": "true",
"auto.evolve": "true"
}'
This is how messages on the topic look like -
The keys are seen in bytes but even if I use either AvroConverter or StringConverter for the key and keep it same in both source and sink still I face the same error.
The database table which is into play is created with this schema -
CREATE TABLE `motor`.`books` (
`id` INT NOT NULL AUTO_INCREMENT,
`author` VARCHAR(45) NULL,
PRIMARY KEY (`id`));
With all this I am facing this error -
io.confluent.rest.exceptions.RestNotFoundException: Subject 'mysql-books-key' not found.
at io.confluent.kafka.schemaregistry.rest.exceptions.Errors.subjectNotFoundException(Errors.java:69)
Edit: I modified the URL in sink to have localhost and given stringconverter to key and kep avroconverter for value and now I am getting a new error which is -
Caused by: java.sql.SQLException: Exception chain:
java.sql.SQLSyntaxErrorException: BLOB/TEXT column 'id' used in key specification without a key length
Edit 2:
As suggested by #Onecricketeer I am trying Debezium and using below config for MysqlConnector. I have already enabled bin_log in mysqld.cnf but upon launching getting errors like -
Caused by: org.apache.kafka.connect.errors.DataException: Field does not exist: id
This is my debezium config -
{
"transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
"transforms.extractInt.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"value.converter.schema.registry.url": "http://0.0.0.0:8081",
"transforms.extractInt.field": "id",
"transforms.createKey.fields": "id",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"key.converter.schema.registry.url": "http://0.0.0.0:8081",
"name": "mysql-connector-deb-demo",
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"key.converter": "org.apache.kafka.connect.converters.IntegerConverter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"transforms": [
"createKey",
"extractInt",
"unwrap"
],
"database.hostname": "localhost",
"database.port": "3306",
"database.user": "yagnesh",
"database.password": "**********",
"database.server.name": "mysql",
"database.server.id": "1",
"event.processing.failure.handling.mode": "ignore",
"database.history.kafka.bootstrap.servers": "localhost:9092",
"database.history.kafka.topic": "dbhistory.demo",
"table.whitelist": [
"motor.books"
],
"table.include.list": [
"motor.books"
],
"include.schema.changes": "true"
}
Before using "unwrap" I was facing mismatched input '-' expecting <EOF> SQL
hence upon looking for this fixed this using "unwrap" following this question - Fix for mismatched input.
Let me know if this is actually needed or not.
I have to push data from Clickhouse to Kafka topics,so I tried to use the Confluent JDBC connector.
i am following this tutorial that uses mysql instead of clickhouse.
here is my configuration and its works with mysql but has this error with clickhouse.
Missing columns: 'CURRENT_TIMESTAMP' while processing query: 'SELECT CURRENT_TIMESTAMP', required columns: 'CURRENT_TIMESTAMP', source columns: 'dummy' (version 19.17.4.11 (official build))
my configuration:
{
"name": "jdbc_source_clickhouse_my-table_01",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url": "http://localhost:8081",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://localhost:8081",
"connection.url": "jdbc:clickhouse://localhost:8123/default?user=default&password=12344esz",
"table.whitelist": "my-table",
"mode": "timestamp",
"timestamp.column.name": "order_time",
"validate.non.null": "false",
"topic.prefix": "clickhouse-"
}
}
I'm trying to use TimeBasedPartitioner that extracts using RecordField with the following configuration:
{
"name": "s3-sink",
"connector.class": "io.confluent.connect.s3.S3SinkConnector",
"tasks.max": "10",
"topics": "topics1.topics2",
"s3.region": "us-east-1",
"s3.bucket.name": "bucket",
"s3.part.size": "5242880",
"s3.compression.type": "gzip",
"timezone": "UTC",
"rotate.schedule.interval.ms": "900000",
"flush.size": "1000000",
"schema.compatibility": "NONE",
"storage.class": "io.confluent.connect.s3.storage.S3Storage",
"format.class": "io.confluent.connect.s3.format.bytearray.ByteArrayFormat",
"partitioner.class": "io.confluent.connect.storage.partitioner.HourlyPartitioner",
"partition.duration.ms": "900000",
"locale": "en",
"timestamp.extractor": "RecordField",
"timestamp.field": "time",
"key.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
"key.converter.schemas.enabled": false,
"value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
"value.converter.schemas.enabled": false,
"interal.key.converter": "org.apache.kafka.connect.json.JsonConverter",
"internal.key.converter.schemas.enabled": false,
"interal.value.converter": "org.apache.kafka.connect.json.JsonConverter",
"internal.value.converter.schemas.enabled": false,
}
I keep getting the following error and I'm not finding much that explains what is going on. I looked at the source code and it appears that the record is not a Struct or Map type so I'm wondering if there is an issue with using ByteArrayFormat?
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:546)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:302)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:205)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:173)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:170)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:214)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: io.confluent.connect.storage.errors.PartitionException: Error encoding partition.
at io.confluent.connect.storage.partitioner.TimeBasedPartitioner$RecordFieldTimestampExtractor.extract(TimeBasedPartitioner.java:294)
at io.confluent.connect.s3.TopicPartitionWriter.executeState(TopicPartitionWriter.java:199)
at io.confluent.connect.s3.TopicPartitionWriter.write(TopicPartitionWriter.java:176)
at io.confluent.connect.s3.S3SinkTask.put(S3SinkTask.java:195)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:524)
I've been able to write out using the default partitioner.
I am attempting to flatten a topic before sending it along to my postgres db, using something like the connector below. I am using the confluent 4.1.1 kafka connect docker image, the only change being I copied a custom connector jar into /usr/share/java and am running it under a different accoount.
version (kafka connect) "1.1.1-cp1"
commit "0a5db4d59ee15a47"
{
"name": "problematic_postgres_sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"key.converter.schema.registry.url": "http://kafkaschemaregistry.service.consul:8081",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://kafkaschemaregistry.service.consul:8081",
"connection.url": "jdbc:postgresql://123.123.123.123:5432/mypostgresdb",
"connection.user": "abc",
"connection.password": "xyz",
"insert.mode": "upsert",
"auto.create": true,
"auto.evolve": true,
"topics": "mytopic",
"pk.mode": "kafka",
"transforms": "Flatten",
"transforms.Flatten.type": "org.apache.kafka.connect.transforms.Flatten$Value",
"transforms.Flatten.delimiter": "_"
}
}
I get a 400 error code:
Connector configuration is invalid and contains the following 1
error(s): Invalid value class
org.apache.kafka.connect.transforms.Flatten for configuration
transforms.Flatten.type: Error getting config definition from
Transformation: null