Debezium property snapshot.new.tables not working - apache-kafka-connect

The debezium connector works fine when initiated. But when i add new tables to the table list, instead of taking a snapshot of new tables, it starts reading from bin log even though i have provided snaphot.new.tables = parallel.
Here is my config.
{
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"database.dbname": "db_name",
"database.hostname": "host",
"database.port": "3306",
"database.user": "user",
"database.password": "pass",
"database.server.name": "server",
"database.include.list": "db_name",
"table.include.list": "table1,table2",
"decimal.handling.mode": "double",
"name": "server",
"database.history.kafka.bootstrap.servers": "kafka",
"database.history.kafka.topic": "stopic",
"offset.flush.interval.ms": "5000",
"offset.flush.timeout.ms": "60000",
"slot.name": "slot_name",
"snapshot.lock.timeout.ms": "99999999",
"snapshot.mode": "when_needed",
"tombstones.on.delete": "false",
"snapshot.new.tables":"parallel",
"sanitize.field.names": "true",
"snapshot.fetch.size": 50000,
"snapshot.locking.mode": "none",
"database.initial.statements": "SET SESSION wait_timeout=28800;SET SESSION interactive_timeout=28800;"
}

Related

Oracle -> Apache Kafka -> Postgres (Found a record at a null key and null key schema)

Error: Sink connector 'jdbc-sink' is configured with
'delete.enabled=true' and 'pk.mode=record_key' and therefore requires
records with a non-null key and non-null Struct or primitive key
schema, but found record at
(topic='person',partition=0,offset=0,timestamp=1676879822391) with a
null key and null key schema.
(org.apache.kafka.connect.runtime.WorkerSinkTask:566)
Although I am inserting data with a non-null value in the table and table is defined with non-null constraint. Source connector gives me an error:
This is my source connector:
{
"name" : "source",
"config" : {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"connection.url": "jdbc:oracle:thin:#192.168.91.253:1521/orcl",
"connection.user":"test",
"connection.password":"12",
"topic.prefix":"person",
"mode":"incrementing",
"poll.interval.ms":"1000",
"incrementing.column.name":"ID",
"numeric.mapping": "best_fit",
"query": "SELECT * FROM person",
"include.schema.changes": "true",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "true",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "true"
}
}
This is my sink connector:
{
"name": "jdbc-sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics":"person",
"connection.url": "jdbc:postgresql://192.168.91.229:5432/postgres?user=postgres&password=postgres ",
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false",
"delete.enabled": "true",
"pk.fields": "ID",
"pk.mode":"record_key",
"insert.mode": "upsert",
"auto.create": "true",
"include.schema.changes": "true",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "true",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "true"
}
}

None of log files contains offset SCN: 9483418337736, re-snapshot is required

I am deploying a Debezium connector for oracle. When I deploy the connect it takes a consistent snapshot and it captures all the records being inserted into the table. After a while ranging from a few minutes to hours, it fails complaining about the SCN could not be found. When I perform a fresh snapshot it starts working and then fails after a while with the same error.
Following is the deployed config.
{
"name": "debezium-connector",
"config": {
"connector.class": "io.debezium.connector.oracle.OracleConnector",
"message.key.columns": "COMMON.SETTLEMENT_DAY:SETTLEMENT_DAY",
"tasks.max": "1",
"database.history.kafka.topic": "debezium.schema-changes.inventory",
"log.mining.strategy": "online_catalog",
"signal.data.collection": "ABC.debezium_signal",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"log.mining.archive.log.hours": "2",
"database.user": "ABC",
"database.dbname": "EDPM",
"database.connection.adapter": "logminer",
"database.history.kafka.bootstrap.servers": "10.0.0.36:9092,10.0.0.37:9092,10.0.0.38:9092",
"database.url": "jdbc:oracle:thin:#localhost:1521/EDPM",
"time.precision.mode": "connect",
"database.server.name": "DEBEZIUM",
"event.processing.failure.handling.mode": "warn",
"heartbeat.interval.ms": "300000",
"database.port": "1521",
"key.converter.schemas.enable": "true",
"database.hostname": "localhost",
"database.password": "************",
"value.converter.schemas.enable": "true",
"name": "debezium-connector",
"table.include.list": "COMMON.SETTLEMENT_DAY,ABC.debezium_signal",
"snapshot.mode": "schema_only"
}
}
I am getting the following error:
{
"name": "debezium-connector",
"connector": {
"state": "RUNNING",
"worker_id": "0.0.0.0:28084"
},
"tasks": [
{
"id": 0,
"state": "FAILED",
"worker_id": "0.0.0.0:28084",
"trace": "org.apache.kafka.connect.errors.ConnectException: An exception occurred in the change event producer. This connector will be stopped.\n\tat
io.debezium.pipeline.ErrorHandler.setProducerThrowable(ErrorHandler.java:42)\n\tat
io.debezium.connector.oracle.logminer.LogMinerStreamingChangeEventSource.execute(LogMinerStreamingChangeEventSource.java:211)\n\tat
io.debezium.connector.oracle.logminer.LogMinerStreamingChangeEventSource.execute(LogMinerStreamingChangeEventSource.java:63)\n\tat
io.debezium.pipeline.ChangeEventSourceCoordinator.streamEvents(ChangeEventSourceCoordinator.java:159)\n\tat
io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:122)\n\tat
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat
java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat
java.lang.Thread.run(Thread.java:748)\n
Caused by: java.lang.IllegalStateException: None of log files contains offset SCN: 9483418337736, re-snapshot is required.\n\tat
io.debezium.connector.oracle.logminer.LogMinerHelper.setLogFilesForMining(LogMinerHelper.java:480)\n\tat
io.debezium.connector.oracle.logminer.LogMinerStreamingChangeEventSource.initializeRedoLogsForMining(LogMinerStreamingChangeEventSource.java:248)\n\tat
io.debezium.connector.oracle.logminer.LogMinerStreamingChangeEventSource.execute(LogMinerStreamingChangeEventSource.java:167)\n\t... 8 more\n"
}
],
"type": "source"
}

S3 connector with HourlyPartitioner failing

When we tried to write into S3 through S3 sink connector with default config, working fine without any issue. But when we tried with hourly partition getting failed with below error.
Please find the both codes and error messages and help us to resolve this issue
Default :
{
"value.converter.schemas.enable": "false",
"name": "tibconew1-test-s3standard-default-sink-connector",
"connector.class": "io.confluent.connect.s3.S3SinkConnector",
"tasks.max": "2",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.storage.StringConverter",
"errors.tolerance": "all",
"topics": [
"test.s3custom.default.dax.shipment.data",
"test.s3custom.default.dax.shipment.data",
"test.s3custom.hourly.onprem.tibco.dax_shipment.dpp_asn"
],
"topics.regex": "",
"errors.deadletterqueue.topic.name": "dlq_test.s3custom.default.dax.shipment.data",
"errors.deadletterqueue.context.headers.enable": "true",
"format.class": "io.confluent.connect.s3.format.json.JsonFormat",
"flush.size": "1000",
"s3.bucket.name": "test-stg-raw",
"s3.region": "us-east-1",
"s3.credentials.provider.class": "com.amazonaws.auth.InstanceProfileCredentialsProvider",
"s3.acl.canned": "bucket-owner-full-control",
"storage.class": "io.confluent.connect.s3.storage.S3Storage",
"topics.dir": "streams_dir",
"partitioner.class": "io.confluent.connect.storage.partitioner.DefaultPartitioner"
}
Hourly :
{
"value.converter.schema.registry.url": "https://confschema.test-dsol-core.testdigital-stg.com",
"value.converter.schemas.enable": "false",
"name": "test.s3custom.hourly.tibco.dax_shipment.dpp_asn.sink-connector",
"connector.class": "io.confluent.connect.s3.S3SinkConnector",
"tasks.max": "2",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"errors.tolerance": "all",
"topics": [
"test.s3custom.hourly.onprem.tibco.dax_shipment.dpp_asn"
],
"topics.regex": "",
"errors.deadletterqueue.topic.name": "dlq_test.s3custom.hourly.onprem.tibco.dax_shipment.dpp_asn.sink",
"errors.deadletterqueue.context.headers.enable": "true",
"format.class": "io.confluent.connect.s3.format.json.JsonFormat",
"flush.size": "10",
"s3.bucket.name": "test-stg-raw",
"s3.region": "us-east-1",
"s3.credentials.provider.class": "com.amazonaws.auth.InstanceProfileCredentialsProvider",
"s3.acl.canned": "bucket-owner-full-control",
"storage.class": "io.confluent.connect.s3.storage.S3Storage",
"topics.dir": "streams_dir",
"partitioner.class": "io.confluent.connect.storage.partitioner.HourlyPartitioner",
"locale": "en-US",
"timezone": "America/Chicago",
"timestamp.extractor": "RecordField",
"timestamp.field": "DPP_ASN.LST_UPDT_TS"
}
Error :
Finally we found the reason . Due to timestamp received from the payload is an invalid format which has additional space in it.So we corrected the format in source side. For the hourly partitioner, the connector is expecting the value is based on hours.
Hourly Partitioner:
io.confluent.connect.storage.partitioner.HourlyPartitioner is equivalent to the TimeBasedPartitioner with path.format='year'=YYYY/'month'=MM/'day'=dd/'hour'=HH and
Message was : "LST_UPDT_TS":"2021-02-01 07:16:23.567"
Corrected as : "LST_UPDT_TS":"2015-08-01T17:00:00.69243-05:00"

Confluent Kafka JDBC Source to Oracle EBR table

Confluent JDBC Source (Confluent 5.5) cannot detect a table with EBR (Edition Based Redefinition) from Oracle:
Config:
{
"name": "source-oracle-bulk2",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"tasks.max": 4,
"connection.url": "jdbc:oracle:thin:#//host:1521/service",
"connection.user": "user",
"connection.password": "passw",
"mode": "bulk",
"topic.prefix": "oracle-bulk-",
"numeric.mapping": "best_fit",
"poll.interval.ms": 60000,
"table.whitelist" : "SCHEMA1.T1"
}
}
Connect log:
connect | [2020-07-20 17:24:09,156] WARN No tasks will be run because no tables were found (io.confluent.connect.jdbc.JdbcSourceConnector)
With Query it successfully ingests data:
{
"name": "source-oracle-bulk1",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"tasks.max": 1,
"connection.url": "jdbc:oracle:thin:#//host:1521/service",
"connection.user": "user",
"connection.password": "pass",
"mode": "bulk",
"topic.prefix": "oracle-bulk-T1",
"numeric.mapping": "best_fit",
"poll.interval.ms": 60000,
"dialect.name" : "OracleDatabaseDialect",
"query": "SELECT * FROM SCHEMA1.T1"
}
}
Could it be that "table.types" should be specified for some specific type ?
I tried "VIEW" but it just cannot create a Source and fails with timeout: {"error_code":500,"message":"Request timed out"}

Kafka JDBC Source connector: create topics from column values

I have a microservice that uses OracleDB to publish the system changes in the EVENT_STORE table. The table EVENT_STORE contains a column TYPE with the name of the type of the event.
It is possible that JDBC Source Kafka Connect take the EVENT_STORE table changes and publish them with the value of column TYPE in the KAFKA-TOPIC?
It is my source kafka connector config:
{
"name": "kafka-connector-source-ms-name",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"tasks.max": "1",
"connection.url": "jdbc:oracle:thin:#localhost:1521:xe",
"connection.user": "squeme-name",
"connection.password": "password",
"topic.prefix": "",
"table.whitelist": "EVENT_STORE",
"mode": "timestamp+incrementing",
"timestamp.column.name": "CREATE_AT",
"incrementing.column.name": "ID",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"config.action.reload": "restart",
"errors.retry.timeout": "0",
"errors.retry.delay.max.ms": "60000",
"errors.tolerance": "none",
"errors.log.enable": "false",
"errors.log.include.messages": "false",
"connection.attempts": "3",
"connection.backoff.ms": "10000",
"numeric.precision.mapping": "false",
"validate.non.null": "true",
"quote.sql.identifiers": "ALWAYS",
"table.types": "TABLE",
"poll.interval.ms": "5000",
"batch.max.rows": "100",
"table.poll.interval.ms": "60000",
"timestamp.delay.interval.ms": "0",
"db.timezone": "UTC"
}
}
You can try the ExtractTopic transform to pull a topic name from a field
Add the following properties to the JSON
transforms=ValueFieldExample
transforms.ValueFieldExample.type=io.confluent.connect.transforms.ExtractTopic$Value
transforms.ValueFieldExample.field=TYPE

Resources