I'd like to use the Confluent JDBC Sink Connector via ksql to write to ClickHouse database.
I have a c# application that writes the data to Kafka topic. How can I format the message from my application, so that it is acceptable for sink to write to the database? I don't want to use the Schema Registry or other ksql constructs.
KSQL accepts JSON or CSV data. However, ClickHouse has its own Kafka connector, so you shouldn't need the JDBC Sink, which only works with messages that carry a schema (meaning you would need the Schema Registry, which is not a KSQL-only construct and can be used from your C# code as well).
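That said, if you still want to try the JDBC Sink without Schema Registry, Kafka Connect's JsonConverter (with value.converter.schemas.enable=true) accepts JSON messages that embed the schema alongside the payload. A rough sketch of what your C# application would need to produce for each message (the field names here are purely illustrative):

```json
{
  "schema": {
    "type": "struct",
    "name": "example_record",
    "optional": false,
    "fields": [
      { "field": "id", "type": "int64", "optional": false },
      { "field": "name", "type": "string", "optional": true }
    ]
  },
  "payload": { "id": 42, "name": "example" }
}
```

Be aware that this embeds the full schema in every message, which is one of the reasons Schema Registry is usually preferred.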
Is Kafka JDBC connect compatible with Spring-Kafka library?
I did follow https://www.confluent.io/blog/kafka-connect-deep-dive-jdbc-source-connector/ and still have some confusion.
Let's say you want to consume from a Kafka topic and write to a JDBC database. Some of your options are
Use a plain Kafka consumer to consume from the topic and the JDBC API to write each consumed record to the database (a minimal sketch follows this list).
Use Spring Kafka to consume from the Kafka topic and Spring's JdbcTemplate or Spring Data to write it to the database.
Use Kafka Connect with the JDBC connector as a sink to read from the topic and write to a table.
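As a rough sketch of the first option in Java (topic name, table, and connection details are placeholders; batching, offset management, and error handling are omitted):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class TopicToJdbc {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "topic-to-jdbc");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost/demo", "user", "pass");
             PreparedStatement stmt = conn.prepareStatement("INSERT INTO events(k, v) VALUES (?, ?)")) {

            consumer.subscribe(Collections.singletonList("events"));
            while (true) {
                // Poll the topic and write each consumed record to the table.
                for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofSeconds(1))) {
                    stmt.setString(1, rec.key());
                    stmt.setString(2, rec.value());
                    stmt.executeUpdate();
                }
            }
        }
    }
}
```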
So as you can see
The Kafka JDBC connector is a specialised component that can only do one job.
The Kafka Consumer is a very generic component which can do many jobs, and you will be writing a lot of code yourself. In fact, it is the foundational API on which the other frameworks build and specialise.
Spring Kafka simplifies this and lets you deal with Kafka records as Java objects, but it doesn't tell you how to write that object to your database (a sketch follows below).
So they are alternative solutions for the same task. Having said that, you may have a flow where different segments are controlled by different teams; for each segment, any of them can be used, with the Kafka topic acting as the joining channel.
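For comparison, a minimal sketch of the second option, combining a Spring Kafka listener with Spring's JdbcTemplate (topic, group id, and table are placeholders):

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class EventsToDatabaseListener {

    private final JdbcTemplate jdbcTemplate;

    public EventsToDatabaseListener(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // Spring Kafka delivers the record as an object; writing it to the database is still your code.
    @KafkaListener(topics = "events", groupId = "events-to-db")
    public void onMessage(ConsumerRecord<String, String> record) {
        jdbcTemplate.update("INSERT INTO events(k, v) VALUES (?, ?)", record.key(), record.value());
    }
}
```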
I'm trying to understand how the Kafka Streams API works with Schema Registry.
I know that you must specify the Schema Registry URL when you set up your application, but I cannot understand how my application retrieves the correct schema from the registry without specifying a subject name or ID.
Does it retrieve the schema using the topic name?
I'm trying to understand how the Kafka Streams API works with Schema Registry.
Kafka Streams integrates with Confluent Schema Registry through the use of a Schema-Registry-aware Serde (serializer/deserializer) in your Kafka Streams application. Today, Schema Registry supports only Avro as the data format (additional formats like Protobuf and JSON are planned), hence there is an Avro Serde that integrates with Schema Registry. See Confluent's Kafka Streams documentation on 'Avro Serde'.
I know that you must specify the Schema Registry URL when you set up your application
And that is why (see above) you must specify the SR URL when configuring your Kafka Streams application: this setting is passed from Kafka Streams to the Avro Serde.
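For instance (the application id, broker address, and registry URL are placeholders), the relevant Kafka Streams configuration using Confluent's GenericAvroSerde might look like this:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsConfig;
import io.confluent.kafka.streams.serdes.avro.GenericAvroSerde;

public class StreamsAvroConfig {
    public static Properties buildConfig() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Passed through to the Avro Serde so it can talk to Schema Registry.
        props.put("schema.registry.url", "http://localhost:8081");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, GenericAvroSerde.class);
        return props;
    }
}
```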
I cannot understand how my application retrieves the correct schema from the registry without specifying a subject name or ID.
The subject name in Schema Registry is a combination of the topic name and a suffix, which is either -key or -value depending on whether the Serde is used to serialize/deserialize a Kafka message key or a Kafka message value, respectively (see the Schema Registry documentation on 'Subjects'). For example, a topic named orders maps to the subjects orders-key and orders-value. In other words, there is a naming convention that maps a Kafka topic (that is read from or written to by your Kafka Streams application with the Avro Serde) to subjects in Schema Registry.
Also, a schema may have a schema ID, which is used to disambiguate schema resolution in situations when there are multiple schemas registered under the same subject (and thus topic). See Schema Registry documentation on 'Schema IDs'.
Is there any way to fetch incremental data from an Oracle database using user-defined query using JDBC?
We are ok to use Spark, Kafka or plain JDBC.
The only requirement is that it should be able to support a heavy load.
You've not specified the destination. If it's a Kafka topic, then it makes sense to use Apache Kafka for the extract too, via Kafka Connect.
In which case, you can use the Kafka Connect JDBC connector to do this. See here for the specifics on using incremental mode with a custom query.
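As a rough illustration (connection details, column names, and the topic prefix are placeholders), an incremental-mode source configuration driven by a custom query might look like:

```properties
name=oracle-incremental-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1
connection.user=myuser
connection.password=mypassword
mode=timestamp+incrementing
# The connector appends the incremental WHERE clause to this query itself.
query=SELECT ID, UPDATED_AT, PAYLOAD FROM MY_SCHEMA.MY_TABLE
timestamp.column.name=UPDATED_AT
incrementing.column.name=ID
topic.prefix=oracle-custom-query
poll.interval.ms=10000
```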
++ EDIT ++
If your final target is BigQuery then you can use Kafka Connect for that too with the appropriate BigQuery connector. You can see an example of it in action here.
I know how to write a Kafka consumer and insert/update each record into an Oracle database, but I want to leverage the Kafka Connect API and the JDBC Sink Connector for this purpose. Apart from the property file, in my search I couldn't find a complete executable example with detailed steps to configure, and the relevant Java code to write, in order to consume a Kafka topic with JSON messages and insert/update (merge) a table in an Oracle database using the Kafka Connect API with the JDBC Sink Connector. Can someone demonstrate an example, including configuration and dependencies? Are there any disadvantages with this approach? Do we anticipate any potential issues when table data increases to millions?
Thanks in advance.
There won't be an example for your specific use-case because the JDBC connector is meant to be generic.
Here is one configuration example with an Oracle database
All you need is
A topic of some format
key.converter and value.converter to be set to deserialize that topic
Your JDBC string and database schema (tables, projection fields, etc)
Any other JDBC Sink-specific options
All this goes in a Java properties / JSON file, not Java source code
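For instance (the topic name, connection details, and primary-key field are placeholders), a JSON configuration for the sink might look like:

```json
{
  "name": "oracle-jdbc-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "topics": "orders",
    "connection.url": "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1",
    "connection.user": "myuser",
    "connection.password": "mypassword",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "true",
    "insert.mode": "upsert",
    "pk.mode": "record_value",
    "pk.fields": "ID",
    "auto.create": "true"
  }
}
```

With insert.mode=upsert and a primary key configured, the connector issues merge-style writes, which is what gives you the insert/update behaviour.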
If you have a specific issue creating this configuration, please comment.
Do we anticipate any potential issues when table data increases to millions?
Well, those issues would be related to the database server, not to Kafka Connect. For example, the disk filling up or increased load while accepting continuous writes.
Are there any disadvantages with this approach?
You'd have to handle deduplication and record expiration (e.g. for GDPR) separately, if you want that.
I have a few tables in Hive, and my goal is to create a view over them and then publish it to a Kafka topic through Apache NiFi.
What are the options to get it done?
I am planning to do it through NiFi.
I'm sure NiFi would work (see the PutHiveStreaming processor), but it sounds like a lot of effort.
Kafka Connect HDFS is able to consume Kafka data and automatically register a Hive table for you.
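As a rough sketch (the HDFS and metastore URIs, topic, and database name are placeholders), the relevant part of an HDFS sink configuration with Hive integration might look like:

```properties
name=hdfs-sink-hive
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
topics=my_topic
hdfs.url=hdfs://namenode:8020
flush.size=1000
# Hive integration creates/updates a Hive table over the files the connector writes.
hive.integration=true
hive.metastore.uris=thrift://metastore-host:9083
hive.database=default
schema.compatibility=BACKWARD
```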
And if I misunderstood that, and you're trying to query Hive and publish the results to a Kafka topic, then sure, NiFi is perfectly capable of that: use SelectHiveQL and PublishKafka. However, the Kafka Connect JDBC Source should be able to query Hive and write to Kafka as well.