Kafka JDBC Source connector: create topics from column values

I have a microservice that uses Oracle DB and publishes system changes to the EVENT_STORE table. The EVENT_STORE table has a TYPE column holding the name of the event type.
Is it possible for the Kafka Connect JDBC Source connector to pick up the EVENT_STORE table changes and publish each one to a Kafka topic named after the value of its TYPE column?
This is my source connector config:
{
"name": "kafka-connector-source-ms-name",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"tasks.max": "1",
"connection.url": "jdbc:oracle:thin:@localhost:1521:xe",
"connection.user": "squeme-name",
"connection.password": "password",
"topic.prefix": "",
"table.whitelist": "EVENT_STORE",
"mode": "timestamp+incrementing",
"timestamp.column.name": "CREATE_AT",
"incrementing.column.name": "ID",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"config.action.reload": "restart",
"errors.retry.timeout": "0",
"errors.retry.delay.max.ms": "60000",
"errors.tolerance": "none",
"errors.log.enable": "false",
"errors.log.include.messages": "false",
"connection.attempts": "3",
"connection.backoff.ms": "10000",
"numeric.precision.mapping": "false",
"validate.non.null": "true",
"quote.sql.identifiers": "ALWAYS",
"table.types": "TABLE",
"poll.interval.ms": "5000",
"batch.max.rows": "100",
"table.poll.interval.ms": "60000",
"timestamp.delay.interval.ms": "0",
"db.timezone": "UTC"
}
}

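You can try the ExtractTopic transform to pull the topic name from a field.
Add the following properties to the connector config (shown here in properties form; in the JSON, each one becomes a "key": "value" entry):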
transforms=ValueFieldExample
transforms.ValueFieldExample.type=io.confluent.connect.transforms.ExtractTopic$Value
transforms.ValueFieldExample.field=TYPE
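Since the connector in the question is defined as JSON, the three properties above end up as string entries inside its "config" object. The following is a minimal Python sketch of that merge, not a definitive setup: the helper name add_extract_topic is hypothetical, the trimmed-down connector dict only stands in for the full config, and it assumes the Confluent ExtractTopic transform is installed on the Connect worker.

```python
import json


def add_extract_topic(connector, field, alias="ValueFieldExample"):
    """Return a copy of a connector definition with ExtractTopic$Value added.

    `add_extract_topic` and `alias` are illustrative names, not part of any API.
    """
    config = dict(connector["config"])
    # Append to any transform chain that is already configured.
    existing = [t.strip() for t in config.get("transforms", "").split(",") if t.strip()]
    config["transforms"] = ",".join(existing + [alias])
    config[f"transforms.{alias}.type"] = "io.confluent.connect.transforms.ExtractTopic$Value"
    config[f"transforms.{alias}.field"] = field
    return {**connector, "config": config}


# Trimmed-down stand-in for the connector in the question.
source = {
    "name": "kafka-connector-source-ms-name",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "table.whitelist": "EVENT_STORE",
        "topic.prefix": "",
    },
}

updated = add_extract_topic(source, "TYPE")
print(json.dumps(updated, indent=2))
```

The printed JSON can be used in place of the original connector definition; the original dict is left unmodified.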

Related

Oracle -> Apache Kafka -> Postgres (Found a record at a null key and null key schema)

Error: Sink connector 'jdbc-sink' is configured with
'delete.enabled=true' and 'pk.mode=record_key' and therefore requires
records with a non-null key and non-null Struct or primitive key
schema, but found record at
(topic='person',partition=0,offset=0,timestamp=1676879822391) with a
null key and null key schema.
(org.apache.kafka.connect.runtime.WorkerSinkTask:566)
The sink connector reports the error above even though I am inserting rows with non-null values and the table is defined with a NOT NULL constraint.
This is my source connector:
{
"name" : "source",
"config" : {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"connection.url": "jdbc:oracle:thin:@192.168.91.253:1521/orcl",
"connection.user":"test",
"connection.password":"12",
"topic.prefix":"person",
"mode":"incrementing",
"poll.interval.ms":"1000",
"incrementing.column.name":"ID",
"numeric.mapping": "best_fit",
"query": "SELECT * FROM person",
"include.schema.changes": "true",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "true",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "true"
}
}
This is my sink connector:
{
"name": "jdbc-sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics":"person",
"connection.url": "jdbc:postgresql://192.168.91.229:5432/postgres?user=postgres&password=postgres ",
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false",
"delete.enabled": "true",
"pk.fields": "ID",
"pk.mode":"record_key",
"insert.mode": "upsert",
"auto.create": "true",
"include.schema.changes": "true",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "true",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "true"
}
}

Can JDBC Sink Connector work on unique key instead of primary key?

I have a source PostgreSQL table with the following columns:
ID: Long
FirstName: Varchar
...
I am getting the messages as events in Kafka using Debezium, and this is working fine.
My question is about the JDBC sink. My target table is:
ID: UUID
UserID: Long
FirstName: Varchar
Note that ID here is a UUID, and UserID holds the ID from the source table.
So my question is: can I have my own primary key (ID) and still have upserts work?
My config:
{
"name": "users-task-service",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"key.converter.schemas.enable": "true",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "true",
"database.hostname": "host.docker.internal",
"topics": "postgres.public.users",
"connection.url": "jdbc:postgresql://host.docker.internal:5432/tessting",
"connection.user": "postgres",
"connection.password": "",
"auto.create": "false",
"insert.mode": "upsert",
"table.name.format": "users_temp",
"dialect.name": "PostgreSqlDatabaseDialect",
"transforms": "unwrap, RenameField",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.RenameField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.RenameField.renames": "id:userid",
"pk.fields": "id",
"pk.mode": "record_key",
"delete.enabled": "true",
"fields.whitelist": "userid"
}
}

S3 connector with HourlyPartitioner failing

Writing to S3 through the S3 sink connector with the default config works without any issue, but with the hourly partitioner it fails with the error below.
Both configs and the error message are included here; please help us resolve this issue.
Default :
{
"value.converter.schemas.enable": "false",
"name": "tibconew1-test-s3standard-default-sink-connector",
"connector.class": "io.confluent.connect.s3.S3SinkConnector",
"tasks.max": "2",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.storage.StringConverter",
"errors.tolerance": "all",
"topics": [
"test.s3custom.default.dax.shipment.data",
"test.s3custom.default.dax.shipment.data",
"test.s3custom.hourly.onprem.tibco.dax_shipment.dpp_asn"
],
"topics.regex": "",
"errors.deadletterqueue.topic.name": "dlq_test.s3custom.default.dax.shipment.data",
"errors.deadletterqueue.context.headers.enable": "true",
"format.class": "io.confluent.connect.s3.format.json.JsonFormat",
"flush.size": "1000",
"s3.bucket.name": "test-stg-raw",
"s3.region": "us-east-1",
"s3.credentials.provider.class": "com.amazonaws.auth.InstanceProfileCredentialsProvider",
"s3.acl.canned": "bucket-owner-full-control",
"storage.class": "io.confluent.connect.s3.storage.S3Storage",
"topics.dir": "streams_dir",
"partitioner.class": "io.confluent.connect.storage.partitioner.DefaultPartitioner"
}
Hourly :
{
"value.converter.schema.registry.url": "https://confschema.test-dsol-core.testdigital-stg.com",
"value.converter.schemas.enable": "false",
"name": "test.s3custom.hourly.tibco.dax_shipment.dpp_asn.sink-connector",
"connector.class": "io.confluent.connect.s3.S3SinkConnector",
"tasks.max": "2",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"errors.tolerance": "all",
"topics": [
"test.s3custom.hourly.onprem.tibco.dax_shipment.dpp_asn"
],
"topics.regex": "",
"errors.deadletterqueue.topic.name": "dlq_test.s3custom.hourly.onprem.tibco.dax_shipment.dpp_asn.sink",
"errors.deadletterqueue.context.headers.enable": "true",
"format.class": "io.confluent.connect.s3.format.json.JsonFormat",
"flush.size": "10",
"s3.bucket.name": "test-stg-raw",
"s3.region": "us-east-1",
"s3.credentials.provider.class": "com.amazonaws.auth.InstanceProfileCredentialsProvider",
"s3.acl.canned": "bucket-owner-full-control",
"storage.class": "io.confluent.connect.s3.storage.S3Storage",
"topics.dir": "streams_dir",
"partitioner.class": "io.confluent.connect.storage.partitioner.HourlyPartitioner",
"locale": "en-US",
"timezone": "America/Chicago",
"timestamp.extractor": "RecordField",
"timestamp.field": "DPP_ASN.LST_UPDT_TS"
}
Error :
We finally found the cause: the timestamp in the payload was in an invalid format (it contained a space), so we corrected the format on the source side. The hourly partitioner builds its paths from the hour of each record, so the connector needs a timestamp it can parse.
Hourly Partitioner:
io.confluent.connect.storage.partitioner.HourlyPartitioner is equivalent to the TimeBasedPartitioner with path.format='year'=YYYY/'month'=MM/'day'=dd/'hour'=HH and
The message was: "LST_UPDT_TS":"2021-02-01 07:16:23.567"
Corrected to: "LST_UPDT_TS":"2015-08-01T17:00:00.69243-05:00"
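The correction above amounts to replacing the space-separated timestamp with an ISO 8601 string that carries a UTC offset. A rough Python sketch of that conversion on the producing side follows; the function name to_iso8601 and the fixed -06:00 offset are assumptions for illustration only (the real offset depends on the source system's time zone and daylight saving rules).

```python
from datetime import datetime, timedelta, timezone


def to_iso8601(raw, utc_offset_hours):
    """Convert 'YYYY-MM-DD HH:MM:SS.fff' into ISO 8601 with an explicit offset.

    Hypothetical helper; the offset is supplied by the caller, not inferred.
    """
    # Parse the space-separated form (strip() drops stray surrounding spaces).
    naive = datetime.strptime(raw.strip(), "%Y-%m-%d %H:%M:%S.%f")
    # Attach the caller-supplied fixed offset without shifting the wall-clock time.
    aware = naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return aware.isoformat(timespec="milliseconds")


print(to_iso8601("2021-02-01 07:16:23.567", -6))
# prints 2021-02-01T07:16:23.567-06:00
```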

Confluent Kafka JDBC Source to Oracle EBR table

Confluent JDBC Source (Confluent 5.5) cannot detect a table with EBR (Edition Based Redefinition) from Oracle:
Config:
{
"name": "source-oracle-bulk2",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"tasks.max": 4,
"connection.url": "jdbc:oracle:thin:@//host:1521/service",
"connection.user": "user",
"connection.password": "passw",
"mode": "bulk",
"topic.prefix": "oracle-bulk-",
"numeric.mapping": "best_fit",
"poll.interval.ms": 60000,
"table.whitelist" : "SCHEMA1.T1"
}
}
Connect log:
connect | [2020-07-20 17:24:09,156] WARN No tasks will be run because no tables were found (io.confluent.connect.jdbc.JdbcSourceConnector)
With a query, it ingests data successfully:
{
"name": "source-oracle-bulk1",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"tasks.max": 1,
"connection.url": "jdbc:oracle:thin:@//host:1521/service",
"connection.user": "user",
"connection.password": "pass",
"mode": "bulk",
"topic.prefix": "oracle-bulk-T1",
"numeric.mapping": "best_fit",
"poll.interval.ms": 60000,
"dialect.name" : "OracleDatabaseDialect",
"query": "SELECT * FROM SCHEMA1.T1"
}
}
Could it be that "table.types" needs to be set to some specific type?
I tried "VIEW", but then the source connector cannot be created and the request fails with a timeout: {"error_code":500,"message":"Request timed out"}

Having a problem with the Flatten value transformation

I am attempting to flatten a topic before sending it along to my Postgres DB, using something like the connector below. I am using the Confluent 4.1.1 Kafka Connect Docker image; the only changes are that I copied a custom connector jar into /usr/share/java and am running it under a different account.
version (kafka connect) "1.1.1-cp1"
commit "0a5db4d59ee15a47"
{
"name": "problematic_postgres_sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"key.converter.schema.registry.url": "http://kafkaschemaregistry.service.consul:8081",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://kafkaschemaregistry.service.consul:8081",
"connection.url": "jdbc:postgresql://123.123.123.123:5432/mypostgresdb",
"connection.user": "abc",
"connection.password": "xyz",
"insert.mode": "upsert",
"auto.create": true,
"auto.evolve": true,
"topics": "mytopic",
"pk.mode": "kafka",
"transforms": "Flatten",
"transforms.Flatten.type": "org.apache.kafka.connect.transforms.Flatten$Value",
"transforms.Flatten.delimiter": "_"
}
}
I get a 400 error code:
Connector configuration is invalid and contains the following 1
error(s): Invalid value class
org.apache.kafka.connect.transforms.Flatten for configuration
transforms.Flatten.type: Error getting config definition from
Transformation: null
