Apache Kafka for an existing GET request with an Oracle DB

I'm trying to learn about streaming services and am reading the Kafka docs:
https://kafka.apache.org/quickstart
https://kafka.apache.org/24/documentation/streams/quickstart
To take a simple example, I'm attempting to refactor a Spring web service GET request which accepts an ID parameter and returns a list of attributes associated with that ID. The DB backend is Oracle.
What is the approach for loading a single Oracle DB table so that it can be served by Kafka? The above docs don't contain information on this. Do I need to replicate the Oracle DB to a NoSQL DB such as MongoDB? (Why do we require Apache Kafka with NoSQL databases?)

Kafka is an event streaming platform. It is not a database. Instead of thinking about "loading a single Oracle DB table which can be served by Kafka", you need to think in terms of the events you are looking for that will trigger processing.
Change Data Capture (CDC) products like Oracle GoldenGate (there are other products too) will detect changes to rows and send messages into Kafka each time a row changes.
Alternatively, you could configure a Kafka Connect JDBC source connector to execute a query periodically and pull the results into Kafka.
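As a rough illustration of the second option, a minimal JDBC source connector configuration that polls a single Oracle table onto a topic might look like the following; the connector name, connection details, and table/column names are placeholders, not taken from the question:

```json
{
  "name": "oracle-jdbc-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:oracle:thin:@//db-host:1521/ORCLPDB1",
    "connection.user": "app_user",
    "connection.password": "secret",
    "table.whitelist": "MY_TABLE",
    "mode": "incrementing",
    "incrementing.column.name": "ID",
    "topic.prefix": "oracle-",
    "poll.interval.ms": "10000"
  }
}
```

Downstream, the Spring service would not query Kafka like a database; it would either consume the topic and materialize its own lookup store, or use something like Kafka Streams to build a queryable table from it.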

Related

Kafka Connect JDBC dblink

I'm starting to study Apache Kafka and Kafka Connect.
I'm trying to get data from a remote Oracle database on which my user only has read privileges and cannot list tables (I don't have permission to change that). For every query I have to go through a dblink, but I didn't find an option in the JDBC connector to pass a dblink.
I can run the query if I pass a specific query in the connector configuration, but I want to fetch a lot of tables, and specifying the query on the connector would mean creating a lot of connectors.
Is there a way to pass the dblink in the connector configuration or in the JDBC URL?
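For reference, the per-query workaround the question mentions amounts to putting the dblink into the connector's query option, roughly as below (all names here are placeholders); the drawback is exactly the one described: one connector per query.

```json
{
  "name": "remote-table-via-dblink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:oracle:thin:@//local-host:1521/ORCLPDB1",
    "connection.user": "readonly_user",
    "connection.password": "secret",
    "mode": "bulk",
    "query": "SELECT * FROM remote_schema.some_table@my_dblink",
    "topic.prefix": "remote-some-table",
    "poll.interval.ms": "60000"
  }
}
```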

Kafka jdbc connector as change data capture

I am trying to use the Kafka JDBC connector to pull in only the rows from my database that have changed since the last pull.
The database is controlled by another team, and they have a habit of reloading the entire database twice a day even if no information has changed. They also update the field :load-time, so to the Kafka connector it will always look like a change.
Is there a way to tell the Kafka JDBC connector to only look at the relevant columns to detect a change?
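In timestamp/incrementing mode the connector only compares the columns you configure, so one possible approach (assuming the table has a business-level last-modified column that the reload job does not touch) is to point change detection at that column instead of the load-time field, along these lines (all names are placeholders):

```json
{
  "name": "orders-changed-rows-only",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:oracle:thin:@//db-host:1521/ORCLPDB1",
    "connection.user": "app_user",
    "connection.password": "secret",
    "table.whitelist": "ORDERS",
    "mode": "timestamp+incrementing",
    "timestamp.column.name": "LAST_MODIFIED",
    "incrementing.column.name": "ID",
    "topic.prefix": "jdbc-"
  }
}
```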

How do we reset the state associated with a Kafka Connect source connector?

We are working with Kafka Connect 2.5.
We are using the Confluent JDBC source connector (although I think this question is mostly agnostic to the connector type) and are consuming some data from an IBM DB2 database onto a topic, using 'incrementing' mode with primary keys as the unique ID for each record.
That works fine in the normal course of events: the first time the connector starts, all records are consumed and placed on a topic; then, as new records are added, they are appended to our topic. In our development environment, when we change connector parameters etc., we want to effectively reset the connector on demand, i.e. have it consume data from the "beginning" of the table again.
We thought that deleting the connector (using the Kafka Connect REST API) would do this - and would have the side-effect of deleting all information regarding that connector configuration from the Kafka Connect connect-* metadata topics too.
However, this doesn’t appear to be what happens. The metadata remains in those topics, and when we recreate/re-add the connector configuration (again using the REST API), it 'remembers' the offset it was consuming from in the table. This seems confusing and unhelpful - deleting the connector doesn’t delete its state. Is there a way to more permanently wipe the connector and/or reset its consumption position, short of pulling down the whole Kafka Connect environment, which seems drastic? Ideally we’d like not to have to meddle with the internal topics directly.
Partial answer to this question: it seems the behaviour we are seeing is to be expected:
If you're using incremental ingest, what offset does Kafka Connect have stored? If you delete and recreate a connector with the same name, the offset from the previous instance will be preserved. Consider the scenario in which you create a connector. It successfully ingests all data up to a given ID or timestamp value in the source table, and then you delete and recreate it. The new version of the connector will get the offset from the previous version and thus only ingest newer data than that which was previously processed. You can verify this by looking at the offset.storage.topic and the values stored in it for the table in question.
At least for the Confluent JDBC connector, there is a workaround to reset the pointer.
Personally, I'm still confused as to why Kafka Connect retains state for a connector at all once it's deleted, but it seems that is the designed behaviour. I would still be interested if there is a better (and supported) way to remove that state.
Another related blog article: https://rmoff.net/2019/08/15/reset-kafka-connect-source-connector-offsets/
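The workaround in that article boils down to: delete (or stop) the connector, write a tombstone (null-value) message to the worker's offsets topic for the connector's offset key, then recreate the connector. A rough Java sketch is below; the topic name, connector name, and key format are assumptions on my part, so consume the offsets topic first and copy the key exactly as it appears there:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ResetConnectorOffset {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // The topic name comes from the worker's offset.storage.topic setting;
        // "connect-offsets" is only a guess -- check your worker config.
        String offsetsTopic = "connect-offsets";

        // The key must match what the connector wrote, byte for byte. For the
        // Confluent JDBC source connector it often looks something like
        // ["connector-name",{"table":"MY_TABLE"}] -- read it from the topic first.
        String offsetKey = "[\"my-jdbc-source\",{\"table\":\"MY_TABLE\"}]";

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // A null value is a tombstone: on this compacted topic it removes the
            // stored offset for that key, so a recreated connector starts from scratch.
            producer.send(new ProducerRecord<>(offsetsTopic, offsetKey, null));
            producer.flush();
        }
    }
}
```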

Kafka connect SourceTask commit() and commitRecord() methods

I am new to Kafka Connect and am trying to build an acknowledgement mechanism for my custom JDBC source connector (reading from an Oracle DB). Whenever data gets added to the Kafka topic, I want to update the status/offset in my source DB table. The Confluent docs for Kafka Connect mention two methods, commit() and commitRecord(), for this, but state that "The APIs are provided for source systems which have an acknowledgement mechanism for messages" (ref: https://docs.confluent.io/platform/current/connect/devguide.html, section "Task Example - Source Task").
Does Oracle DB support an acknowledgement mechanism?
If yes, can we use commit() or commitRecord() to update the status/offset in the source DB?
How do we implement these methods?
Can we use the default JDBC source connector for this? (https://docs.confluent.io/3.2.0/connect/connect-jdbc/docs/source_connector.html)
I am wondering why you want to mark records in the source Oracle table as read. If something was written to the Kafka topic, it means it was read from the source. In that case you can just use Confluent's JdbcSourceConnector with the OracleDatabaseDialect.
You can of course create a sink connector which reads from the topic and updates records in the source table, but that is art for art's sake.
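That said, if you do build a custom connector, commitRecord() is the hook that fires after the Kafka producer has acknowledged a record, so a status update could live there. A very rough sketch (the table, column, and config names are invented for illustration, not from any real connector):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.List;
import java.util.Map;

import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

public class AckingOracleSourceTask extends SourceTask {

    private Connection connection;
    private PreparedStatement markProcessed;

    @Override
    public String version() {
        return "0.1-sketch";
    }

    @Override
    public void start(Map<String, String> props) {
        try {
            connection = DriverManager.getConnection(
                    props.get("connection.url"),
                    props.get("connection.user"),
                    props.get("connection.password"));
            // EVENT_QUEUE / STATUS / ID are placeholder names for this sketch.
            markProcessed = connection.prepareStatement(
                    "UPDATE EVENT_QUEUE SET STATUS = 'SENT' WHERE ID = ?");
        } catch (Exception e) {
            throw new RuntimeException("Failed to open Oracle connection", e);
        }
    }

    @Override
    public List<SourceRecord> poll() throws InterruptedException {
        // A real implementation would select unsent rows and turn them into
        // SourceRecords, putting the row ID into each record's sourceOffset
        // so commitRecord() can find it.
        return null;
    }

    @Override
    public void commitRecord(SourceRecord record) throws InterruptedException {
        // Called after the record has been acknowledged by the Kafka producer,
        // i.e. it is safely on the topic -- this is the "acknowledgement" hook.
        Object id = record.sourceOffset().get("id");
        try {
            markProcessed.setObject(1, id);
            markProcessed.executeUpdate();
        } catch (Exception e) {
            throw new RuntimeException("Failed to mark row " + id + " as sent", e);
        }
    }

    @Override
    public void stop() {
        try {
            if (connection != null) connection.close();
        } catch (Exception ignored) {
        }
    }
}
```

Note that this gives at-least-once semantics at best: the task can crash between the producer acknowledgement and the UPDATE, so the status column should be treated as advisory.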

Oracle to Neo4j Sync

Is there any utility to sync data between an Oracle and a Neo4j database? I want to use Neo4j in read-only mode; all writes will happen to the Oracle DB.
I think this depends on how often you want to have the data synced. Are you looking for a periodic sync/ETL process (say hourly or daily), or are you looking for live updates into Neo4j?
I'm not aware of tools designed for this, but it's not terribly difficult to script yourself.
A periodic sync is obviously easiest. You can do that directly using the Java API and connecting via JDBC to Oracle. You could also just dump the data from Oracle as a CSV and import it into Neo4j. This would be done similarly to how data is imported from PostgreSQL in this article: http://neo4j.com/developer/guide-importing-data-and-etl/
There is a Stack Overflow answer on exporting data from Oracle using SQL*Plus/spool:
How do I spool to a CSV formatted file using SQLPLUS?
If you're looking for live syncing, you'd probably do this either by monitoring the transaction log or by adding triggers to your tables, depending on the complexity of your data.
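To make the JDBC-plus-Java-API route concrete, here is a minimal periodic-sync sketch using the Oracle JDBC driver and the official Neo4j Java driver; every connection detail, table, column, and label name below is a placeholder:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Session;
import org.neo4j.driver.Values;

public class OracleToNeo4jSync {
    public static void main(String[] args) throws Exception {
        try (Connection oracle = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//db-host:1521/ORCLPDB1", "reader", "secret");
             Driver neo4j = GraphDatabase.driver("bolt://localhost:7687",
                     AuthTokens.basic("neo4j", "password"));
             Session session = neo4j.session();
             Statement stmt = oracle.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT ID, NAME FROM CUSTOMERS")) {

            while (rs.next()) {
                // MERGE makes the sync idempotent: re-running it updates nodes
                // instead of duplicating them.
                session.run("MERGE (c:Customer {id: $id}) SET c.name = $name",
                        Values.parameters("id", rs.getLong("ID"),
                                          "name", rs.getString("NAME")));
            }
        }
    }
}
```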
