S3 connector with HourlyPartitioner failing - apache-kafka-connect

When we write to S3 through the S3 sink connector with the default config, it works without any issue. But when we try the hourly partitioner, it fails with the error below.
Please find both configs and the error message below, and help us resolve this issue.
Default:
{
"value.converter.schemas.enable": "false",
"name": "tibconew1-test-s3standard-default-sink-connector",
"connector.class": "io.confluent.connect.s3.S3SinkConnector",
"tasks.max": "2",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.storage.StringConverter",
"errors.tolerance": "all",
"topics": [
"test.s3custom.default.dax.shipment.data",
"test.s3custom.default.dax.shipment.data",
"test.s3custom.hourly.onprem.tibco.dax_shipment.dpp_asn"
],
"topics.regex": "",
"errors.deadletterqueue.topic.name": "dlq_test.s3custom.default.dax.shipment.data",
"errors.deadletterqueue.context.headers.enable": "true",
"format.class": "io.confluent.connect.s3.format.json.JsonFormat",
"flush.size": "1000",
"s3.bucket.name": "test-stg-raw",
"s3.region": "us-east-1",
"s3.credentials.provider.class": "com.amazonaws.auth.InstanceProfileCredentialsProvider",
"s3.acl.canned": "bucket-owner-full-control",
"storage.class": "io.confluent.connect.s3.storage.S3Storage",
"topics.dir": "streams_dir",
"partitioner.class": "io.confluent.connect.storage.partitioner.DefaultPartitioner"
}
Hourly:
{
"value.converter.schema.registry.url": "https://confschema.test-dsol-core.testdigital-stg.com",
"value.converter.schemas.enable": "false",
"name": "test.s3custom.hourly.tibco.dax_shipment.dpp_asn.sink-connector",
"connector.class": "io.confluent.connect.s3.S3SinkConnector",
"tasks.max": "2",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"errors.tolerance": "all",
"topics": [
"test.s3custom.hourly.onprem.tibco.dax_shipment.dpp_asn"
],
"topics.regex": "",
"errors.deadletterqueue.topic.name": "dlq_test.s3custom.hourly.onprem.tibco.dax_shipment.dpp_asn.sink",
"errors.deadletterqueue.context.headers.enable": "true",
"format.class": "io.confluent.connect.s3.format.json.JsonFormat",
"flush.size": "10",
"s3.bucket.name": "test-stg-raw",
"s3.region": "us-east-1",
"s3.credentials.provider.class": "com.amazonaws.auth.InstanceProfileCredentialsProvider",
"s3.acl.canned": "bucket-owner-full-control",
"storage.class": "io.confluent.connect.s3.storage.S3Storage",
"topics.dir": "streams_dir",
"partitioner.class": "io.confluent.connect.storage.partitioner.HourlyPartitioner",
"locale": "en-US",
"timezone": "America/Chicago",
"timestamp.extractor": "RecordField",
"timestamp.field": "DPP_ASN.LST_UPDT_TS"
}
Error:

We finally found the cause: the timestamp received in the payload was in an invalid format that contained an extra space, so we corrected the format on the source side. The hourly partitioner builds its hour-based path from this value, so the connector has to be able to parse it as a timestamp.
Hourly Partitioner:
io.confluent.connect.storage.partitioner.HourlyPartitioner is equivalent to the TimeBasedPartitioner with path.format='year'=YYYY/'month'=MM/'day'=dd/'hour'=HH and
Message was : "LST_UPDT_TS":"2021-02-01 07:16:23.567"
Corrected as : "LST_UPDT_TS":"2015-08-01T17:00:00.69243-05:00"
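As a rough illustration of the fix, the sketch below converts the space-separated form into an ISO 8601 string with an offset and then renders the year/month/day/hour layout quoted above. It is only a sketch: the helper names `to_iso8601` and `hourly_path` are hypothetical, and the fixed -06:00 offset is an assumption (Central Standard Time, matching the configured America/Chicago timezone in winter), not something taken from the connector.

```python
from datetime import datetime, timedelta, timezone

# Assumption: source timestamps are local wall-clock times at a fixed UTC offset.
# -06:00 is Central Standard Time; adjust it for your own data.
ASSUMED_OFFSET = timezone(timedelta(hours=-6))


def to_iso8601(raw: str, tz: timezone = ASSUMED_OFFSET) -> str:
    """Convert 'YYYY-MM-DD HH:MM:SS.fff' (space-separated) to ISO 8601 with an offset."""
    parsed = datetime.strptime(raw.strip(), "%Y-%m-%d %H:%M:%S.%f")
    return parsed.replace(tzinfo=tz).isoformat(timespec="milliseconds")


def hourly_path(iso_ts: str) -> str:
    """Render the year/month/day/hour layout for one ISO 8601 timestamp."""
    ts = datetime.fromisoformat(iso_ts)
    return ts.strftime("year=%Y/month=%m/day=%d/hour=%H")


fixed = to_iso8601("2021-02-01 07:16:23.567")
print(fixed)               # 2021-02-01T07:16:23.567-06:00
print(hourly_path(fixed))  # year=2021/month=02/day=01/hour=07
```

A check like this can run on the source side before records are produced, so that malformed timestamps are caught before they reach the sink connector.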

Related

Oracle -> Apache Kafka -> Postgres (Found a record at a null key and null key schema)

Error: Sink connector 'jdbc-sink' is configured with
'delete.enabled=true' and 'pk.mode=record_key' and therefore requires
records with a non-null key and non-null Struct or primitive key
schema, but found record at
(topic='person',partition=0,offset=0,timestamp=1676879822391) with a
null key and null key schema.
(org.apache.kafka.connect.runtime.WorkerSinkTask:566)
Although I am inserting data with non-null values into the table, and the table is defined with a non-null constraint, I still get the error above.
This is my source connector:
{
"name" : "source",
"config" : {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"connection.url": "jdbc:oracle:thin:@192.168.91.253:1521/orcl",
"connection.user":"test",
"connection.password":"12",
"topic.prefix":"person",
"mode":"incrementing",
"poll.interval.ms":"1000",
"incrementing.column.name":"ID",
"numeric.mapping": "best_fit",
"query": "SELECT * FROM person",
"include.schema.changes": "true",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "true",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "true"
}
}
This is my sink connector:
{
"name": "jdbc-sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics":"person",
"connection.url": "jdbc:postgresql://192.168.91.229:5432/postgres?user=postgres&password=postgres ",
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false",
"delete.enabled": "true",
"pk.fields": "ID",
"pk.mode":"record_key",
"insert.mode": "upsert",
"auto.create": "true",
"include.schema.changes": "true",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "true",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "true"
}
}

Can JDBC Sink Connector work on unique key instead of primary key?

I have a source PostgreSQL table with the following columns:
ID: Long
FirstName: Varchar
...
I am getting the messages as events in Kafka using Debezium. This is working fine.
My question is about the JDBC sink. My target table is:
ID: UUID
UserID: Long
FirstName: Varchar
Note that the ID type here is UUID, and UserID is the column that holds the ID from the source table.
So the question is: can I have my own primary key (ID) and still have upsert commands work?
My config:
{
"name": "users-task-service",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"key.converter.schemas.enable": "true",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "true",
"database.hostname": "host.docker.internal",
"topics": "postgres.public.users",
"connection.url": "jdbc:postgresql://host.docker.internal:5432/tessting",
"connection.user": "postgres",
"connection.password": "",
"auto.create": "false",
"insert.mode": "upsert",
"table.name.format": "users_temp",
"dialect.name": "PostgreSqlDatabaseDialect",
"transforms": "unwrap, RenameField",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.RenameField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.RenameField.renames": "id:userid",
"pk.fields": "id",
"pk.mode": "record_key",
"delete.enabled": "true",
"fields.whitelist": "userid"
}
}

None of log files contains offset SCN: 9483418337736, re-snapshot is required

I am deploying a Debezium connector for Oracle. When I deploy the connector, it takes a consistent snapshot and captures all the records being inserted into the table. After a while, ranging from a few minutes to hours, it fails, complaining that the SCN could not be found. When I perform a fresh snapshot, it starts working and then fails again after a while with the same error.
The deployed config is as follows:
{
"name": "debezium-connector",
"config": {
"connector.class": "io.debezium.connector.oracle.OracleConnector",
"message.key.columns": "COMMON.SETTLEMENT_DAY:SETTLEMENT_DAY",
"tasks.max": "1",
"database.history.kafka.topic": "debezium.schema-changes.inventory",
"log.mining.strategy": "online_catalog",
"signal.data.collection": "ABC.debezium_signal",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"log.mining.archive.log.hours": "2",
"database.user": "ABC",
"database.dbname": "EDPM",
"database.connection.adapter": "logminer",
"database.history.kafka.bootstrap.servers": "10.0.0.36:9092,10.0.0.37:9092,10.0.0.38:9092",
"database.url": "jdbc:oracle:thin:@localhost:1521/EDPM",
"time.precision.mode": "connect",
"database.server.name": "DEBEZIUM",
"event.processing.failure.handling.mode": "warn",
"heartbeat.interval.ms": "300000",
"database.port": "1521",
"key.converter.schemas.enable": "true",
"database.hostname": "localhost",
"database.password": "************",
"value.converter.schemas.enable": "true",
"name": "debezium-connector",
"table.include.list": "COMMON.SETTLEMENT_DAY,ABC.debezium_signal",
"snapshot.mode": "schema_only"
}
}
I am getting the following error:
{
"name": "debezium-connector",
"connector": {
"state": "RUNNING",
"worker_id": "0.0.0.0:28084"
},
"tasks": [
{
"id": 0,
"state": "FAILED",
"worker_id": "0.0.0.0:28084",
"trace": "org.apache.kafka.connect.errors.ConnectException: An exception occurred in the change event producer. This connector will be stopped.\n\tat
io.debezium.pipeline.ErrorHandler.setProducerThrowable(ErrorHandler.java:42)\n\tat
io.debezium.connector.oracle.logminer.LogMinerStreamingChangeEventSource.execute(LogMinerStreamingChangeEventSource.java:211)\n\tat
io.debezium.connector.oracle.logminer.LogMinerStreamingChangeEventSource.execute(LogMinerStreamingChangeEventSource.java:63)\n\tat
io.debezium.pipeline.ChangeEventSourceCoordinator.streamEvents(ChangeEventSourceCoordinator.java:159)\n\tat
io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:122)\n\tat
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat
java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat
java.lang.Thread.run(Thread.java:748)\n
Caused by: java.lang.IllegalStateException: None of log files contains offset SCN: 9483418337736, re-snapshot is required.\n\tat
io.debezium.connector.oracle.logminer.LogMinerHelper.setLogFilesForMining(LogMinerHelper.java:480)\n\tat
io.debezium.connector.oracle.logminer.LogMinerStreamingChangeEventSource.initializeRedoLogsForMining(LogMinerStreamingChangeEventSource.java:248)\n\tat
io.debezium.connector.oracle.logminer.LogMinerStreamingChangeEventSource.execute(LogMinerStreamingChangeEventSource.java:167)\n\t... 8 more\n"
}
],
"type": "source"
}

Kafka JDBC Source connector: create topics from column values

I have a microservice that uses OracleDB and publishes system changes to the EVENT_STORE table. The EVENT_STORE table contains a TYPE column holding the name of the event type.
Is it possible for the JDBC Source connector to pick up the EVENT_STORE table changes and publish each one to the Kafka topic named by the value of its TYPE column?
This is my source Kafka connector config:
{
"name": "kafka-connector-source-ms-name",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"tasks.max": "1",
"connection.url": "jdbc:oracle:thin:@localhost:1521:xe",
"connection.user": "squeme-name",
"connection.password": "password",
"topic.prefix": "",
"table.whitelist": "EVENT_STORE",
"mode": "timestamp+incrementing",
"timestamp.column.name": "CREATE_AT",
"incrementing.column.name": "ID",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"config.action.reload": "restart",
"errors.retry.timeout": "0",
"errors.retry.delay.max.ms": "60000",
"errors.tolerance": "none",
"errors.log.enable": "false",
"errors.log.include.messages": "false",
"connection.attempts": "3",
"connection.backoff.ms": "10000",
"numeric.precision.mapping": "false",
"validate.non.null": "true",
"quote.sql.identifiers": "ALWAYS",
"table.types": "TABLE",
"poll.interval.ms": "5000",
"batch.max.rows": "100",
"table.poll.interval.ms": "60000",
"timestamp.delay.interval.ms": "0",
"db.timezone": "UTC"
}
}
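You can try the ExtractTopic transform to pull a topic name from a field.
Add the following properties to the connector config (shown here in properties form; in the JSON they become key/value pairs inside "config"):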
transforms=ValueFieldExample
transforms.ValueFieldExample.type=io.confluent.connect.transforms.ExtractTopic$Value
transforms.ValueFieldExample.field=TYPE
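One way to fold those three properties into the JSON payload is sketched below. The helper `with_extract_topic` is hypothetical, and the connector dict is trimmed to a few keys from the question for brevity. It also assumes the Confluent transforms plugin that provides ExtractTopic is installed on the Connect worker.

```python
import json


def with_extract_topic(connector: dict, field: str, alias: str = "ValueFieldExample") -> dict:
    """Return a copy of the connector payload with an ExtractTopic$Value transform added."""
    config = dict(connector["config"])
    # Preserve any transforms that are already configured.
    existing = [t.strip() for t in config.get("transforms", "").split(",") if t.strip()]
    config["transforms"] = ",".join(existing + [alias])
    config[f"transforms.{alias}.type"] = "io.confluent.connect.transforms.ExtractTopic$Value"
    config[f"transforms.{alias}.field"] = field
    return {**connector, "config": config}


# Trimmed version of the source connector config from the question.
connector = {
    "name": "kafka-connector-source-ms-name",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "table.whitelist": "EVENT_STORE",
    },
}

updated = with_extract_topic(connector, "TYPE")
print(json.dumps(updated, indent=2))
```

The printed JSON can then be submitted to the Connect REST API in place of the original payload.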

Having a problem with the Flatten value transformation

I am attempting to flatten a topic before sending it along to my Postgres DB, using something like the connector below. I am using the Confluent 4.1.1 Kafka Connect Docker image; the only changes are that I copied a custom connector jar into /usr/share/java and am running it under a different account.
version (kafka connect) "1.1.1-cp1"
commit "0a5db4d59ee15a47"
{
"name": "problematic_postgres_sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"key.converter.schema.registry.url": "http://kafkaschemaregistry.service.consul:8081",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://kafkaschemaregistry.service.consul:8081",
"connection.url": "jdbc:postgresql://123.123.123.123:5432/mypostgresdb",
"connection.user": "abc",
"connection.password": "xyz",
"insert.mode": "upsert",
"auto.create": true,
"auto.evolve": true,
"topics": "mytopic",
"pk.mode": "kafka",
"transforms": "Flatten",
"transforms.Flatten.type": "org.apache.kafka.connect.transforms.Flatten$Value",
"transforms.Flatten.delimiter": "_"
}
}
I get a 400 error code:
Connector configuration is invalid and contains the following 1
error(s): Invalid value class
org.apache.kafka.connect.transforms.Flatten for configuration
transforms.Flatten.type: Error getting config definition from
Transformation: null
