Upserting into multiple tables from multiple topics using kafka-connect-jdbc

I am trying to read 2 Kafka topics using the JDBC sink connector and upsert into 2 Oracle tables that I created manually. Each table has 1 primary key, which I want to use in upsert mode. The connector works fine if I use it for only 1 topic and only 1 field in pk.fields, but if I enter multiple columns in pk.fields, one from each table, it fails to recognize the schema. Am I missing anything? Please suggest.
name=oracle_sink_prod
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=KAFKA1011,JAFKA1011
connection.url=URL
connection.user=UID
connection.password=PASSWD
auto.create=false
table.name.format=KAFKA1011,JAFKA1011
pk.mode=record_value
pk.fields= ID,COMPANY
auto.evolve=true
insert.mode=upsert
# ID is the PK of the KAFKA1011 table and COMPANY is the PK of JAFKA1011

If the PKs are different, just create two different sink connectors. They can both run on the same Kafka Connect worker.
You also have the option of using the key of the Kafka message itself. See the docs for more info. This is the more scalable option, and you would then just need to ensure that your messages are keyed correctly for this to flow down to the JDBC sink.
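For example, the configuration from the question could be split into two connectors along these lines (a sketch rather than a tested config; connection settings elided, and each connector gets its own topic, table, and pk.fields):
name=oracle_sink_kafka1011
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=KAFKA1011
table.name.format=KAFKA1011
insert.mode=upsert
pk.mode=record_value
pk.fields=ID
auto.create=false
auto.evolve=true
# connection.url / connection.user / connection.password as in the question

name=oracle_sink_jafka1011
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=JAFKA1011
table.name.format=JAFKA1011
insert.mode=upsert
pk.mode=record_value
pk.fields=COMPANY
auto.create=false
auto.evolve=true
# connection.url / connection.user / connection.password as in the question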

Related

JDBC Sink Connector - insert multiple topics to multiple tables with renaming

I'm trying CDC on Confluent Cloud with the Debezium source connector and the JDBC sink connector. Both connectors are the fully managed type. I'm having trouble with the topic name to table name mapping.
In my CDC pipeline, topic names must be converted to table names like this:
shop.public.table1 --> table1
shop.public.table2 --> table2
shop.public.table3 --> table3
My question is very similar to this old, solved question:
Upserting into multiple tables from multiples topics using kafka-connect
But my CDC pipeline runs on Confluent Cloud, where RegexRouter is not supported:
https://docs.confluent.io/platform/current/connect/transforms/regexrouter.html
Is there any way to route the topics to their proper tables?
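For reference, on a self-managed Connect worker the rename from the linked question would be a RegexRouter transform roughly like this (a sketch only; as noted above, this transform is not available on the fully managed Confluent Cloud connectors):
"transforms": "dropPrefix",
"transforms.dropPrefix.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.dropPrefix.regex": "shop\\.public\\.(.*)",
"transforms.dropPrefix.replacement": "$1"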

I need Update or Insert functionality in Kafka JDBC Sink Connector

Kafka JDBC Sink Connector
The Kafka JDBC sink connector provides 3 insert.mode options, but I need update and insert functionality together. Can anyone help with how to achieve this?
upsert literally means both: insert new keys and update existing keys that have already been inserted
Alternatively, you can consider the following steps:
separating events into 2 topics on the source connector (one topic for inserts and one for updates)
processing those topics with independent sink connectors with different configurations.
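For the first (upsert) option, a minimal sketch of the relevant sink settings — the topic name SOME_TOPIC, the connection placeholders, and the key column ID are not from the question, just illustrations:
name=upsert-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=SOME_TOPIC
connection.url=JDBC_URL_HERE
connection.user=USER
connection.password=PASSWORD
insert.mode=upsert
pk.mode=record_value
pk.fields=ID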

Send multiple oracle tables into single kafka topic

I'm using the JDBC source connector to transfer data from Oracle to a Kafka topic. I want to transfer 10 different Oracle tables to the same Kafka topic using the JDBC source connector, with the table name mentioned somewhere in the message (e.g. a header). Is it possible?
with table name mentioned somewhere in message
You can use an ExtractTopic transform to read the topic name from a column in the tables
Otherwise, if that data isn't in the table, you can use the InsertField transform with static.value before the extract one to force the topic name to be the same
Note: If you use Avro or another record type with schemas, and your tables do not have the same schema (column names and types), then you should expect all but the first producer to fail, because the schemas would be incompatible.
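A rough sketch of that transform chain on the source connector — it assumes the ExtractTopic transform from Confluent's connect-transforms plugin is installed, and the field name TARGET_TOPIC and topic name oracle_combined are made-up placeholders:
transforms=addTopicField,routeToTopic
# add a static column holding the desired topic name to every record
transforms.addTopicField.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.addTopicField.static.field=TARGET_TOPIC
transforms.addTopicField.static.value=oracle_combined
# then re-route each record to the topic named in that column
transforms.routeToTopic.type=io.confluent.connect.transforms.ExtractTopic$Value
transforms.routeToTopic.field=TARGET_TOPIC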

Prevent Kafka JDBC Sink from recording __connect_partition and __connect_offset

I've got a log compacted topic in Kafka that is being written to Postgres via a JDBC sink connector. Though I've got mode=upsert set on the connector, it still adds a unique row in the sink database for each value because it's recording the topic offset (__connect_offset) and partition (__connect_partition) to each row along with the data.
How do I disable the JDBC Sink Connector from recording the topic information (which I don't care about)? Adding a fields.whitelist that grabs only my data columns did not succeed in preventing this metadata from creeping into my database.
An SMT like the following also does not work:
"transforms": "blacklist",
"transforms.blacklist.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.blacklist.blacklist": "__connect_partition, __connect_offset"
My bad... I had misconfigured my primary key on the connector: the __connect_partition and __connect_offset columns are what the sink adds when pk.mode is set to kafka, whereas I thought I was correctly telling it to use the topic key as the table primary key. In the end, the following connector configuration worked:
"pk.mode": "record_key",
"pk.fields": "[Key column name here]"

Sync data between MySQL Databases with Kafka Connect

I'm trying to sync data between several MySQL databases with Confluent Platform, which is based on Kafka Connect. I used "bulk" mode in the source connector config, since the primary key type is varchar and I couldn't use incrementing mode. It works fine, but I have two problems:
It doesn't seem to sync deletes: when data is deleted in the source databases, nothing happens in the sink databases and the data is still present there.
It takes quite a while to sync data. In my case, it takes about 2~4 minutes to sync a table with 3~4k rows. I understand that bulk mode may take more time to sync the data, but isn't that too long?
Here is my source connector config:
name=test-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
connection.url=jdbc:mysql://xxx.xxx.xxx:3306/xxx?useUnicode=true&characterEncoding=utf8
connection.user=user
connection.password=password
mode=bulk
table.whitelist=a_table
And this is my sink connector config:
name=test-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=a_table
connection.url=jdbc:mysql://xxx.xxx.xxx.xxx:3306/xxx?useUnicode=true&characterEncoding=utf8
connection.user=user
connection.password=password
insert.mode=upsert
pk.mode=record_value
pk.fields=mypk
auto.evolve=true
Any suggestion would be appreciated. Thank you.
If you want to sync deletes, you'll need to use CDC, such as Debezium. The JDBC connector can only detect records that are there, not those that aren't.
CDC is also more efficient than a bulk fetch, since it monitors the MySQL transaction log for any transactions on the tables required.
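A minimal sketch of a Debezium MySQL source connector for this kind of CDC — property names follow recent Debezium releases and differ in older versions, and all hostnames, IDs, and table names below are placeholders:
name=mysql-cdc-source
connector.class=io.debezium.connector.mysql.MySqlConnector
database.hostname=xxx.xxx.xxx.xxx
database.port=3306
database.user=user
database.password=password
database.server.id=12345
topic.prefix=mysqlsrc
table.include.list=xxx.a_table
# Debezium keeps its schema history in a Kafka topic
schema.history.internal.kafka.bootstrap.servers=localhost:9092
schema.history.internal.kafka.topic=schema-history.mysqlsrc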
Your primary key is VARCHAR? Wow. If you don't want to use CDC, I'd suggest using an INT-based key and then incremental load with the JDBC connector. That, or add a timestamp column to the table and use that for incremental loading, as sketched below.
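If you stay with the JDBC source connector, the incremental variant of the config from the question would look roughly like this — it assumes you add an auto-incrementing INT id column and an updated_at timestamp column to the table (both column names are placeholders):
name=test-source-incremental
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
connection.url=jdbc:mysql://xxx.xxx.xxx:3306/xxx?useUnicode=true&characterEncoding=utf8
connection.user=user
connection.password=password
mode=timestamp+incrementing
timestamp.column.name=updated_at
incrementing.column.name=id
table.whitelist=a_table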
