Is it possible to configure Kafka Connect (Source) to generate a tombstone record?
I have a table recording 'delete' events. I can populate this to a topic and write some code to forward tombstone records to other topics as needed, but if I can have the JDBC source connector generate the tombstone record for me, I can skip the ode part. I'm not see a way to set the value in kafka source connect to 'null'.
Thanks
Related
Kafka JDBC Sink Connector
Kafka JDBC sink connector provide 3 insert.mode ..but i need update or insert functionality together . Anyone help how to achieve this.
upsert literally means both insert or update for existing keys that have already been inserted
You can consider the following steps:
separation events among 2 topics on source connector (topic for inserts and topic for updates)
procesing thee topics with independent sink connectors with different configurations.
I'm fairly new to Kafka connect. I'm planning to use kafka connect source to read data from my MySQL database tables into one of the kafka topics. Now, since my source table is a transactional data store, i might get a new record inserted into it or a record might be updated. Now, I'm trying to understand how can i achieve parallelism to read the data from this table and my question is,
Can i use max.tasks to achieve parallelism (have more than one thread) to read the data and push onto the kafka topic? If yes, Please explain.
Thanks
As part of requirement, we are going ahead with Kafka connect to push data to our database. What I read so far is that there will be a 1x1 mapping between message and db row i.e. for a single message on Kafka, there will be a corresponding entry in database.
I wanted to know if there is a possibility of breaking down a nested json into multiple rows to be inserted in to db?
The 2 possibilities that I can think of are:-
1) Write custom connector for jdbc sink
2) Use consumer group instead of kafka connect
Use consumer group instead of kafka connect
Connect is a consumer group. It's highly recommended not to write your own logic for handling connection failures, offset management, retires, etc. and let Connect do that work for you. If those "benefits" don't work for you, even then I think it would be better to fork the Connector code (your option 2) rather than write a plain Consumer
Connect single message transforms are roughly what you're looking for. Otherwise, you would write a consumer/producer/Kstreams application to read and write back to a "flattened" topic, and then Connect reads that output topic into the database.
Note: JDBC isn't your only option. Mongodb or Couchbase handle nested JSON just fine
I've got a log compacted topic in Kafka that is being written to Postgres via a JDBC sink connector. Though I've got mode=upsert set on the connector, it still adds a unique row in the sink database for each value because it's recording the topic offset (__connect_offset) and partition (__connect_partition) to each row along with the data.
How do I disable the JDBC Sink Connector from recording the topic information (which I don't care about)? Adding a fields.whitelist that grabs only my data columns did not succeed in preventing this metadata from creeping into my database.
An SMT like the following also does not work:
"transforms": "blacklist",
"transforms.blacklist.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.blacklist.blacklist": "__connect_partition, __connect_offset"
My bad... I had misconfigured my primary key on the connector. I thought that I was correctly telling it to convert the topic key into the table primary key. In the end, the following connector configuration worked:
"pk.mode": "record_key",
"pk.fields": "[Key column name here]"
I am trying to read 2 kafka topics using JDBC sink connector and upsert into 2 Oracle tables which I manually created it. Each table has 1 primary key I want to use it in upsert mode. Connector works fine if I use only for 1 topic and only 1 field in pk.fields but if I enter multiple columns in pk.fields one from each table it fails to recognize the schema. Am I missing any thing please suggest.
name=oracle_sink_prod
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=KAFKA1011,JAFKA1011
connection.url=URL
connection.user=UID
connection.password=PASSWD
auto.create=false
table.name.format=KAFKA1011,JAFKA1011
pk.mode=record_value
pk.fields= ID,COMPANY
auto.evolve=true
insert.mode=upsert
//ID is pk of kafka1011 table and COMPANY is of other
If the PK are different, just create two different sink connectors. They can both run on the same Kafka Connect worker.
You also have the option of using the key of the Kafka message itself. See doc for more info. This is the more scalable option, and you would then just need to ensure that your messages were keyed correctly for this to flow down to the JDBC Sink.