I am trying to export data from Kafka to an Oracle DB. I've searched related questions and the web but could not work out whether we need a platform (Confluent, etc.) or not. I've read the link below, but it's not clear enough.
https://docs.confluent.io/3.2.2/connect/connect-jdbc/docs/sink_connector.html
So, what do we actually need to export data without a third-party platform? Thanks in advance.
It's not clear what you mean by "third-party" here.
What you linked to is Kafka Connect, which is Apache 2.0 licensed and open source.
Kafka Connect is a plugin ecosystem: you install connectors individually (written by anyone) or write your own, just like any other Java dependency (i.e. a third-party one).
The JDBC connector just happens to be maintained by Confluent, and you can configure the Confluent Hub CLI
to install it within any Kafka Connect distribution (or use the Kafka Connect Docker images from Confluent).
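As a rough sketch: once the JDBC connector (and the Oracle JDBC driver) is on a Connect worker's plugin path, exporting a topic to Oracle comes down to submitting a connector config to the worker's REST API (port 8083 by default). The Python below only builds that request. The topic name, Oracle URL and credentials are hypothetical placeholders, and keep in mind that the JDBC sink needs records that carry a schema (e.g. Avro, or JSON with embedded schemas).

```python
import json
import urllib.request


def jdbc_sink_config(topic: str, jdbc_url: str, user: str, password: str) -> dict:
    """Build a connector definition for the Confluent JDBC sink (values are placeholders)."""
    return {
        "name": f"oracle-sink-{topic}",
        "config": {
            "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
            "tasks.max": "1",
            "topics": topic,
            "connection.url": jdbc_url,
            "connection.user": user,
            "connection.password": password,
            # Plain inserts; "upsert" (with a pk.mode) is the other common choice.
            "insert.mode": "insert",
            # Create the target table from the record schema if it does not exist.
            "auto.create": "true",
        },
    }


def build_create_request(connect_url: str, definition: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the Connect REST API's /connectors endpoint."""
    return urllib.request.Request(
        url=f"{connect_url.rstrip('/')}/connectors",
        data=json.dumps(definition).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical Oracle host, service name and credentials.
    definition = jdbc_sink_config(
        topic="orders",
        jdbc_url="jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB1",
        user="kafka",
        password="secret",
    )
    request = build_create_request("http://localhost:8083", definition)
    print(request.get_method(), request.full_url)
    # urllib.request.urlopen(request) would actually submit it to a running worker.
```

Nothing here depends on Confluent Platform: the same request works against a Connect worker started from a plain Apache Kafka download.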
Alternatively, you can use Apache Spark, Flink, NiFi, or many other Kafka consumer libraries to read the data and then start an Oracle transaction per record batch.
Or you can explore non-JVM Kafka libraries as well and use a language you're more familiar with for Oracle operations.
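The consume-then-write pattern looks much the same in any of those tools. Here is a minimal, library-agnostic Python sketch of it. A plain list stands in for records polled from Kafka and SQLite stands in for Oracle (both substitutions are assumptions made only so it runs on its own), but the shape, one database transaction per record batch with offsets committed afterwards, carries over.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# (key, value) pairs, standing in for records returned by a Kafka consumer's poll.
Record = Tuple[str, str]


def batches(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Split a record stream into lists of at most `size` records."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def write_batches(conn: sqlite3.Connection, records: Iterable[Record], size: int) -> int:
    """Write each batch in its own transaction and return the number of commits."""
    committed = 0
    for batch in batches(records, size):
        # `with conn` commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.executemany("INSERT INTO events (k, v) VALUES (?, ?)", batch)
        committed += 1
        # A real consumer would commit its Kafka offsets here, after the DB commit,
        # so a crash replays the last batch instead of losing it (at-least-once).
    return committed


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the Oracle connection
    conn.execute("CREATE TABLE events (k TEXT, v TEXT)")
    polled = [(f"k{i}", f"v{i}") for i in range(5)]
    print(write_batches(conn, polled, size=2))  # 3 transactions: 2 + 2 + 1 records
    print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # 5
```

Because offsets are committed only after the database commit, a replayed batch can insert duplicates, so the target table usually needs a key or an upsert to stay idempotent.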
Related
I am a junior programmer in banking. I want to build a microservice system that gets data from Kafka and processes it, then saves it to a database and sends the final data to a client app. What technology can I use? I plan to use Spring Batch and Kafka. Can these technologies be used in my project, or is there a better alternative?
To process data from a Kafka topic, I recommend using the Kafka Streams API, especially with Spring (Spring Kafka Streams).
Kafka Streams and Spring
To store the data in a database, you should use a Kafka Connect sink connector.
Kafka Connect
This approach is very common and easy if your company already has a Kafka ecosystem.
In terms of alternatives, here you will find an interesting comparison:
https://scramjet.org/blog/welcome-to-the-family
3 in 1 serverless
Scramjet takes a slightly different approach - 3 platforms in one.
Both the free product (https://hub.scramjet.org/), for installation on your own server, and the cloud platform are available; the cloud platform is currently also free in its beta version (https://scramjet.org/#join-beta).
I'd like to use Kafka Connect to move JSON messages from Kafka to HDFS and then Impala, using only open-source libraries.
I was trying to understand whether I can use the Confluent sink library for Kafka Connect without needing the entire Confluent distribution.
Are there other and/or better options to achieve this?
The Kafka Connect HDFS 2 Sink is available under the Confluent Community Licence. It is a plugin for Apache Kafka; you do not have to run Confluent Platform to use it.
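Concretely, the plugin goes into a directory on a plain Apache Kafka Connect worker's plugin.path (downloaded by hand or installed with the Confluent Hub CLI) and is then configured like any other connector. A hedged Python sketch of such a config follows: the topic, NameNode URL and flush size are hypothetical, and the output format class is an assumption to check against the connector documentation for your version. It covers only the HDFS leg; exposing the files to Impala is a separate step.

```python
import json


def hdfs_sink_config(topic: str, hdfs_url: str, flush_size: int = 1000) -> dict:
    """Connector definition for the HDFS 2 sink, reading schemaless JSON (placeholders)."""
    return {
        "name": f"hdfs-sink-{topic}",
        "config": {
            "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
            "tasks.max": "1",
            "topics": topic,
            "hdfs.url": hdfs_url,
            # Records written per file before the file is committed to HDFS.
            "flush.size": str(flush_size),
            # Plain JSON messages without the {"schema": ..., "payload": ...} envelope.
            "value.converter": "org.apache.kafka.connect.json.JsonConverter",
            "value.converter.schemas.enable": "false",
            # Assumption: JSON output files; verify this class for your connector version.
            "format.class": "io.confluent.connect.hdfs.json.JsonFormat",
        },
    }


if __name__ == "__main__":
    # Hypothetical topic and NameNode address.
    print(json.dumps(hdfs_sink_config("clicks", "hdfs://namenode:8020"), indent=2))
```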
Is there any way to fetch incremental data from an Oracle database over JDBC using a user-defined query?
We are OK with using Spark, Kafka, or plain JDBC.
The only requirement is that it must be able to handle heavy load.
You've not specified the destination. If it's a Kafka topic, then it makes sense to use Apache Kafka for the extract too, via Kafka Connect.
In that case, you can use the Kafka Connect JDBC connector to do this. See here for the specifics on using incremental mode with a custom query.
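A sketch of such a source config, written in Python for brevity. ORDERS, ID, UPDATED_AT, AMOUNT, the topic name and the connection URL are hypothetical, and credentials are omitted. In timestamp+incrementing mode the connector stores the last timestamp and id it has seen and appends its own WHERE clause on those columns to the query, which is why a query with its own filter is wrapped as a subquery here.

```python
import json


def jdbc_source_config(jdbc_url: str, query: str, topic: str) -> dict:
    """JDBC source definition that polls a custom query incrementally (placeholders)."""
    return {
        "name": f"oracle-source-{topic}",
        "config": {
            "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
            "connection.url": jdbc_url,
            # With "query", topic.prefix is used as the whole topic name.
            "query": query,
            "topic.prefix": topic,
            # Fetch only rows whose timestamp or id moved past the stored offset.
            "mode": "timestamp+incrementing",
            "timestamp.column.name": "UPDATED_AT",
            "incrementing.column.name": "ID",
            "poll.interval.ms": "5000",
            "tasks.max": "1",
        },
    }


if __name__ == "__main__":
    # The connector appends a WHERE clause for the incremental columns,
    # so a filter of your own goes inside a subquery.
    query = (
        "SELECT * FROM (SELECT ID, UPDATED_AT, AMOUNT FROM ORDERS "
        "WHERE STATUS = 'PAID')"
    )
    cfg = jdbc_source_config(
        "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB1", query, "paid-orders"
    )
    print(json.dumps(cfg, indent=2))
```

Using both a timestamp and an incrementing column lets the connector pick up updated rows as well as new ones, without re-reading rows that share the same timestamp.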
++ EDIT ++
If your final target is BigQuery then you can use Kafka Connect for that too with the appropriate BigQuery connector. You can see an example of it in action here.
I would like to use an open-source version of Kafka Connect instead of the Confluent one, as it appears that the Confluent CLI is not for production, only for dev. I would like to be able to listen to changes on a MySQL database on AWS EC2. Can someone point me in the right direction?
Kafka Connect is part of Apache Kafka. Period. If you want to use Kafka Connect you can do so with any modern distribution of Apache Kafka.
You then need a connector plugin for Kafka Connect that is specific to your source technology. For integrating with a database there are various considerations; for MySQL specifically, the options include:
Kafka Connect JDBC - see it in action here
Debezium - see it in action here
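For the "listen to changes" case, Debezium reads the MySQL binlog, so the server needs binary logging enabled in ROW format. Below is a Python sketch of its connector config. The property names follow Debezium 2.x (1.x used database.server.name and database.history.* instead), and the host, credentials, database name, server id and topic prefix are all hypothetical.

```python
import json


def debezium_mysql_config(host: str, user: str, password: str,
                          database: str, bootstrap_servers: str) -> dict:
    """Debezium 2.x MySQL source definition (hostnames and ids are placeholders)."""
    return {
        "name": f"mysql-cdc-{database}",
        "config": {
            "connector.class": "io.debezium.connector.mysql.MySqlConnector",
            "tasks.max": "1",
            "database.hostname": host,
            "database.port": "3306",
            "database.user": user,
            "database.password": password,
            # Must differ from the server id of every other binlog client or replica.
            "database.server.id": "184054",
            # Change events land in topics named <topic.prefix>.<database>.<table>.
            "topic.prefix": "ec2-mysql",
            "database.include.list": database,
            # Debezium keeps the MySQL schema history in its own Kafka topic.
            "schema.history.internal.kafka.bootstrap.servers": bootstrap_servers,
            "schema.history.internal.kafka.topic": f"schema-history.{database}",
        },
    }


if __name__ == "__main__":
    cfg = debezium_mysql_config(
        "ec2-host.internal", "debezium", "secret", "inventory", "broker:9092"
    )
    print(json.dumps(cfg, indent=2))
```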
The Confluent CLI is just a tool for helping manage and deploy Confluent Platform on developer machines. Confluent Platform itself is widely used in production.
I'm trying to find examples of Kafka Connect with Spring Boot. It looks like there is no Spring Boot integration for Kafka Connect. Can someone point me in the right direction to be able to listen to changes on a MySQL DB?
Kafka Connect doesn't really need Spring Boot because there is nothing for you to code for it, and it works best when run in distributed mode, as a cluster, not embedded within other (single-instance) applications. I suppose if you did want to do it, you could copy the relevant portions of the source code, but that of course isn't using Spring Boot, and you'd have to wire it all up yourself.
The framework itself consists of a few core Java dependencies that have already been written (Debezium or the Confluent JDBC connector, for your MySQL example) and two config files: one for Kafka Connect itself (bootstrap servers, serializers, etc.) and another for the actual MySQL connector. So, if you want to use Kafka Connect, run it by itself, and then just write the consumer in the Spring app.
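To make the first of those two files concrete, here is a Python sketch that renders a minimal distributed-mode worker config. The broker address, group id, internal topic names and plugin directory are hypothetical; the property names are standard Kafka Connect worker settings. The connector config itself (Debezium or JDBC) would then be POSTed to the running worker's REST API.

```python
from typing import Dict


def worker_properties(bootstrap_servers: str, group_id: str,
                      plugin_path: str, replication_factor: int = 3) -> str:
    """Render a minimal distributed-mode Kafka Connect worker config (.properties)."""
    props: Dict[str, str] = {
        "bootstrap.servers": bootstrap_servers,
        # Workers that share a group.id form one Connect cluster.
        "group.id": group_id,
        "key.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        # Internal topics holding connector configs, source offsets and task status.
        "config.storage.topic": f"{group_id}-configs",
        "offset.storage.topic": f"{group_id}-offsets",
        "status.storage.topic": f"{group_id}-status",
        # Use 1 on a single-broker dev cluster.
        "config.storage.replication.factor": str(replication_factor),
        "offset.storage.replication.factor": str(replication_factor),
        "status.storage.replication.factor": str(replication_factor),
        # Directory holding connector plugins such as Debezium or the JDBC connector.
        "plugin.path": plugin_path,
    }
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


if __name__ == "__main__":
    print(worker_properties("broker:9092", "connect-cluster", "/opt/connect-plugins", 1), end="")
```

Saved as, say, connect-distributed.properties, a file like this is what Apache Kafka's bin/connect-distributed.sh script takes as its argument.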
The alternatives to Kafka Connect itself would be to use Apache Camel within a Spring application (Spring Integration) or Spring Cloud Data Flow, and to interface with their Kafka "components" (which don't use the Connect API, AFAIK).
Another option, specific for listening to MySQL, is to use Debezium Engine within your code.