More than one JDBC connector publishing to one single topic

I have a requirement to consume two identical tables on different databases (in different locations as well) and publish them to the same topic.
The Kafka JDBC connector documentation doesn't explain how it manages the high watermark, so I wanted to check what the best practice is in this scenario:
1. Can we keep 2 separate JDBC connectors publishing to separate topics?
2. Can we keep 2 separate JDBC connectors publishing to the same topic?
If we choose option 2, how does the Kafka JDBC connector handle the case where messages arrive in both tables concurrently, at the same time? How does it handle different database time zones?

Can we keep 2 separate JDBC connectors publishing to separate topics?
Yes.
Can we keep 2 separate JDBC connectors publishing to the same topic?
Yes.
How does the Kafka JDBC connector handle the case where messages arrive in the tables concurrently, at the same time?
You'll get both messages on the target topic. Your consumer will need logic in it to deal with the duplication, if there is any. You could use a Single Message Transform to set the key on the message written to the topic and use that as part of the de-duplication.
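For illustration, here is a rough sketch of one of the two source connector configurations; the second connector would differ only in its name and connection.url, and both would share the same topic.prefix so records from both databases land on the same topic. The connection details, table and column names are placeholders, and the key-setting SMT assumes an ID column exists:

    name=jdbc-source-db1
    connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
    connection.url=jdbc:postgresql://db1-host:5432/sales
    connection.user=connect_user
    connection.password=********
    table.whitelist=ORDERS
    mode=timestamp+incrementing
    timestamp.column.name=UPDATED_AT
    incrementing.column.name=ID
    topic.prefix=orders-
    # Set the record key from the ID column so consumers can de-duplicate
    transforms=createKey,extractKey
    transforms.createKey.type=org.apache.kafka.connect.transforms.ValueToKey
    transforms.createKey.fields=ID
    transforms.extractKey.type=org.apache.kafka.connect.transforms.ExtractField$Key
    transforms.extractKey.field=ID

On the high-watermark point: each connector tracks its own incrementing/timestamp offsets independently in Kafka Connect's offset storage (namespaced by connector name), so the two connectors don't interfere with each other even when they write to the same topic.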

Related

Spring Kafka JDBC Connector compatibility

Is the Kafka JDBC connector compatible with the Spring Kafka library?
I followed https://www.confluent.io/blog/kafka-connect-deep-dive-jdbc-source-connector/ and still have some confusion.
Let's say you want to consume from a Kafka topic and write to a JDBC database. Some of your options are:
1. Use a plain Kafka consumer to consume from the topic and use the JDBC API to write the consumed records to the database.
2. Use Spring Kafka to consume from the Kafka topic and Spring JdbcTemplate or Spring Data to write it to the database.
3. Use Kafka Connect with the JDBC connector as a sink to read from the topic and write to a table.
So as you can see:
The Kafka JDBC connector is a specialised component that can only do one job.
The Kafka consumer is a very generic component which can do lots of jobs, and you will be writing a lot of code. In fact, it is the foundational API on which the other frameworks build and specialise.
Spring Kafka simplifies this and lets you deal with Kafka records as Java objects, but doesn't tell you how to write those objects to your database.
So they are alternative solutions for fulfilling the task. Having said that, you may have a flow where different segments are controlled by different teams; for each segment any of them can be used, and a Kafka topic acts as the joining channel.
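As an illustration of option 2 above, a minimal sketch of a Spring Kafka listener writing records to a table with JdbcTemplate might look like this (the topic, group id, table and column names are all assumptions, not anything from the question):

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.springframework.jdbc.core.JdbcTemplate;
    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.stereotype.Component;

    @Component
    public class OrderPersister {

        private final JdbcTemplate jdbcTemplate;

        public OrderPersister(JdbcTemplate jdbcTemplate) {
            this.jdbcTemplate = jdbcTemplate;
        }

        // Consume each record from the (hypothetical) "orders" topic and
        // insert it into a simple (id, payload) table.
        @KafkaListener(topics = "orders", groupId = "order-persister")
        public void onMessage(ConsumerRecord<String, String> record) {
            jdbcTemplate.update(
                "INSERT INTO orders_raw (id, payload) VALUES (?, ?)",
                record.key(), record.value());
        }
    }

The equivalent Kafka Connect approach (option 3) needs no Java code at all, only connector configuration.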

Kafka Topic to Oracle database using Kafka Connect API JDBC Sink Connector Example

I know how to write a Kafka consumer and insert/update each record into an Oracle database, but I want to leverage the Kafka Connect API and the JDBC Sink Connector for this purpose. Apart from the property file, in my search I couldn't find a complete executable example with detailed steps to configure and write the relevant code in Java to consume a Kafka topic with JSON messages and insert/update (merge) into a table in an Oracle database using the Kafka Connect API with the JDBC Sink Connector. Can someone demonstrate an example, including configuration and dependencies? Are there any disadvantages with this approach? Do we anticipate any potential issues when the table data increases to millions of rows?
Thanks in advance.
There won't be an example for your specific use case because the JDBC connector is meant to be generic.
Here is one configuration example with an Oracle database
All you need is:
A topic of some format
key.converter and value.converter set to deserialize that topic
Your JDBC connection string and database schema (tables, projection fields, etc.)
Any other JDBC sink specific options
All this goes in a Java properties / JSON file, not Java source code.
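For example, a sink configuration submitted to the Connect REST API might look roughly like this; the connection details, topic, and primary-key field are placeholders, and it assumes your JSON values carry an embedded schema (value.converter.schemas.enable=true) so the connector knows the column types:

    {
      "name": "oracle-sink",
      "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "topics": "orders",
        "connection.url": "jdbc:oracle:thin:@//db-host:1521/ORCLPDB1",
        "connection.user": "connect_user",
        "connection.password": "********",
        "key.converter": "org.apache.kafka.connect.storage.StringConverter",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter.schemas.enable": "true",
        "insert.mode": "upsert",
        "pk.mode": "record_value",
        "pk.fields": "ID",
        "auto.create": "true",
        "auto.evolve": "true"
      }
    }

Setting insert.mode=upsert with a primary key is what gives you the insert/update (merge) behaviour on Oracle.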
If you have a specific issue creating this configuration, please comment.
Do we anticipate any potential issues when the table data increases to millions of rows?
Well, those issues would be related to the database server, not to Kafka Connect. For example, the disk filling up or increased load while accepting continuous writes.
Are there any disadvantages with this approach?
You'd have to handle de-duplication or record expiration (e.g. GDPR) separately, if you want that.

Kafka streams - exactly once configuration

I was trying to use the exactly-once capabilities of Kafka with the Kafka Streams library. I've only configured processing.guarantee as exactly_once. Along with this, transaction state needs to be stored in an internal topic (__transaction_state).
My question is, how do I customize the name of that topic? If the Kafka cluster is shared by multiple consumers, does each consumer need a different topic for transaction management?
Thanks
Murthy
You don't need to worry about the topic __transaction_state -- it's an internal topic that will be automatically created for you -- you don't need to create it manually and it will always have this name (it's not possible to customize the name). It will be used for all producers that use transactions.
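For reference, enabling exactly-once in Streams is a single configuration setting; a minimal sketch (the application id and bootstrap servers are placeholders):

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;

    public class ExactlyOnceConfig {
        public static Properties build() {
            Properties props = new Properties();
            // The application id prefixes the internal topics Streams creates
            // (changelogs, repartition topics) -- not __transaction_state.
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
            // This one setting turns on transactions; the broker-managed
            // __transaction_state topic never appears in the configuration.
            props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
            return props;
        }
    }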

JMS Bridge use case

What are the typical use cases for JMS bridges? I.e. when should you typically prefer setting up a JMS bridge vs just using a regular queue with a producer and consumer?
Don't know if this is really relevant, but I had a case where I needed regular data from around 350 servers, all running a JMS server and producing individual reports. In order to centralise this process, we created a central server with Oracle integration and created JMS bridges to all 350 servers to fetch data from specific topics and queues. This was a more practical solution than creating 350 subscribers.

Is it possible for Spring-XD to listen to more than one JMS broker at a time?

I've managed to get Spring XD working for a scenario where I have data coming in from one JMS broker.
I am potentially facing a scenario where data ingestion could happen from different sources, thereby needing me to connect to different brokers.
Based on my current understanding, I'm not quite sure how to do this, as there exists a JMS config file which allows you to set up only one broker.
Is there a workaround for this?
At the moment, you would have to create a separate jms-[provider]-infrastructure-context.xml for each broker (in modules/common); say, call the provider activemq2.
Then use --provider=activemq2 in the module definition.
(I recently used this technique to test the sonicmq and hornetq providers.)
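A rough sketch of such a context, assuming the second broker is also ActiveMQ and mirroring what the existing activemq context defines; the file name follows the convention above, and the broker URL is a placeholder:

    <!-- modules/common/jms-activemq2-infrastructure-context.xml -->
    <beans xmlns="http://www.springframework.org/schema/beans"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.springframework.org/schema/beans
               http://www.springframework.org/schema/beans/spring-beans.xsd">

        <!-- Connection factory pointing at the second broker -->
        <bean id="connectionFactory" class="org.apache.activemq.ActiveMQConnectionFactory">
            <property name="brokerURL" value="tcp://second-broker:61616"/>
        </bean>

    </beans>

A stream using the jms source could then reference it with something like jms --provider=activemq2 --destination=orders | log.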
