I'm using the JDBC source connector to transfer data from Oracle to a Kafka topic. I want to transfer 10 different Oracle tables to the same Kafka topic using the JDBC source connector, with the table name mentioned somewhere in the message (e.g. in a header). Is it possible?
with table name mentioned somewhere in message
You can use an ExtractTopic transform to read the topic name from a column in the tables.
Otherwise, if that data isn't in the table, you can use the InsertField transform with static.field and static.value, placed before the ExtractTopic one, to force the topic name to be the same.
Note: If you use Avro or another record type with schemas, and your tables do not have the same schema (column names and types), then you should expect all but the first producer to fail, because the schemas would be incompatible.
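As a rough sketch of the InsertField-then-ExtractTopic chain described above, the connector config could be assembled like this before being sent to the Connect REST API. The connector name, connection URL, table list, field name (target_topic) and topic name (oracle_all_tables) are all made up for illustration, and it assumes the Confluent ExtractTopic transform is installed on the worker.

```python
import json

# Hypothetical target topic that every table should end up in.
TARGET_TOPIC = "oracle_all_tables"

source_config = {
    "name": "oracle-jdbc-source",  # hypothetical connector name
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:oracle:thin:@//db-host:1521/SERVICE",  # placeholder
        "mode": "bulk",
        "table.whitelist": "TABLE1,TABLE2",  # placeholder table names
        "topic.prefix": "oracle-",
        # Transforms are applied in the order listed here.
        "transforms": "addTopic,setTopic",
        # 1) Add a constant field that holds the desired topic name.
        "transforms.addTopic.type": "org.apache.kafka.connect.transforms.InsertField$Value",
        "transforms.addTopic.static.field": "target_topic",
        "transforms.addTopic.static.value": TARGET_TOPIC,
        # 2) Use that field's value as the record's topic.
        "transforms.setTopic.type": "io.confluent.connect.transforms.ExtractTopic$Value",
        "transforms.setTopic.field": "target_topic",
    },
}

# JSON body in the shape the Connect REST API expects for creating a connector.
payload = json.dumps(source_config, indent=2)
print(payload)
```

The order in "transforms" matters: the field has to exist before ExtractTopic can read it.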
Related
I'm trying CDC on Confluent Cloud with a Debezium source connector and a JDBC sink connector. Both connectors are the fully managed type. I'm having trouble mapping each topic name to its table name.
In my CDC pipeline, the topic name must be converted to a table name like this:
shop.public.table1 --> table1
shop.public.table2 --> table2
shop.public.table3 --> table3
My question is very similar to an old solved question:
Upserting into multiple tables from multiples topics using kafka-connect
But my CDC pipeline runs on Confluent Cloud, where RegexRouter is not supported.
https://docs.confluent.io/platform/current/connect/transforms/regexrouter.html
Is there any way to route these topics to the proper tables?
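To pin down the mapping being asked for, here is a small sketch of the rename as a plain regex. It only illustrates the pattern (the same kind a regex-based router would be given); it does not work around the Confluent Cloud limitation, and the function name topic_to_table is made up for the example.

```python
import re

# Assumes every topic has the form shop.public.<table>, as in the question.
TOPIC_PATTERN = re.compile(r"^shop\.public\.(.+)$")


def topic_to_table(topic: str) -> str:
    """Return the table name for a topic, or raise if the topic does not match."""
    match = TOPIC_PATTERN.match(topic)
    if match is None:
        raise ValueError(f"unexpected topic name: {topic}")
    return match.group(1)


for topic in ("shop.public.table1", "shop.public.table2", "shop.public.table3"):
    print(topic, "-->", topic_to_table(topic))
```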
Our current pipeline follows a structure similar to the one outlined here, except that we are pulling events from Oracle and pushing them to Snowflake. The flow goes something like this:
Confluent Oracle CDC Source Connector mining the Oracle transaction log
Pushing these change events to a Kafka topic
Snowflake Sink Connector reading off the Kafka topic and pulling raw messages into a Snowflake table.
In the end I have a table with record_metadata and record_content fields that contain the raw Kafka messages.
I'm having to build a set of procedures that handle the merge/upsert logic, operating on a stream on top of the raw table. The tables I'm trying to replicate in Snowflake are very wide and there are around 100 of them, so writing the SQL merge statements by hand is infeasible.
Is there a better way to ingest the Kafka topic containing all of the CDC events generated by the Oracle connector straight into Snowflake, one that handles auto-creating nonexistent tables and auto-updating, deleting, etc. as events come across the stream?
Kafka JDBC sink connector: is it possible to store the topic data as JSON in a Postgres DB? Currently it parses each JSON record from the topic and maps it to the corresponding columns in the table.
If anyone has worked on a similar case, can you please tell me which config details I should add to the connector config?
I used the config below, but it didn't work.
"key.converter":"org.apache.kafka.connect.storage.StringConverter",
"key.converter.schemas.enable":"false",
"value.converter":"org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable":"false"
The JDBC sink requires a Struct type (JSON with schema, Avro, etc.).
If you want to store a string, that string needs to be the value of a key that corresponds to a database column. That string can be anything, including delimited JSON.
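A minimal sketch of what that could look like on the producer side, assuming the sink's JsonConverter is switched to schemas.enable=true: the raw JSON is carried as one string field inside a schema-plus-payload envelope. The helper name wrap_as_struct and the column name payload_json are invented for this example.

```python
import json


def wrap_as_struct(raw_json: str, column: str = "payload_json") -> str:
    """Wrap a raw JSON string in a JsonConverter schema/payload envelope.

    The result is a Struct with a single string field, so the JDBC sink can
    map that field to one text column.
    """
    envelope = {
        "schema": {
            "type": "struct",
            "optional": False,
            "fields": [{"field": column, "type": "string", "optional": False}],
        },
        "payload": {column: raw_json},
    }
    return json.dumps(envelope)


# Hypothetical record: the whole document is stored as one string value.
raw = json.dumps({"id": 1, "name": "widget"})
message = wrap_as_struct(raw)
print(message)
```

The target table would then need a single text column named after the field (payload_json here).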
I am trying to read 2 Kafka topics using the JDBC sink connector and upsert into 2 Oracle tables that I created manually. Each table has 1 primary key, which I want to use in upsert mode. The connector works fine if I use it for only 1 topic with only 1 field in pk.fields, but if I enter multiple columns in pk.fields, one from each table, it fails to recognize the schema. Am I missing anything? Please suggest.
name=oracle_sink_prod
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=KAFKA1011,JAFKA1011
connection.url=URL
connection.user=UID
connection.password=PASSWD
auto.create=false
table.name.format=KAFKA1011,JAFKA1011
pk.mode=record_value
pk.fields= ID,COMPANY
auto.evolve=true
insert.mode=upsert
# ID is the PK of the KAFKA1011 table and COMPANY is the PK of the other
If the PKs are different, just create two different sink connectors. They can both run on the same Kafka Connect worker.
You also have the option of using the key of the Kafka message itself (pk.mode=record_key). See the docs for more info. This is the more scalable option; you would then just need to ensure that your messages are keyed correctly for this to flow down to the JDBC Sink.
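The two-connector suggestion could be sketched as follows, reusing the settings from the question and splitting only the topic, table and pk.fields. The connector names and the sink_config helper are made up for illustration; URL, UID and PASSWD are the question's own placeholders.

```python
import json

# Settings shared by both connectors, copied from the question's config.
COMMON = {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "connection.url": "URL",
    "connection.user": "UID",
    "connection.password": "PASSWD",
    "auto.create": "false",
    "auto.evolve": "true",
    "insert.mode": "upsert",
    "pk.mode": "record_value",
}


def sink_config(name: str, topic: str, pk_field: str) -> dict:
    """Build one sink connector config for a single topic/table pair."""
    return {
        "name": name,
        "config": {
            **COMMON,
            "topics": topic,
            "table.name.format": topic,  # table is named after the topic here
            "pk.fields": pk_field,
        },
    }


sinks = [
    sink_config("oracle_sink_kafka1011", "KAFKA1011", "ID"),
    sink_config("oracle_sink_jafka1011", "JAFKA1011", "COMPANY"),
]

for sink in sinks:
    print(json.dumps(sink, indent=2))
```

Each connector now sees exactly one topic and one primary-key field, so neither has to look up a column that exists only in the other table.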
I have a reporting framework that builds and generates reports in tabular format. Until now I have written a SQL query that fetches the data from Oracle. Now I have an interesting challenge: half of the data will come from Oracle, and the rest will come from MongoDB, based on the output of the Oracle query. The tabular data fetched from Oracle will have one additional column containing the key used to fetch data from MongoDB. This gives me two tabular data sets, one from Oracle and one from MongoDB. I need to merge the two on one common column and produce a single data set for the report.
I can write the logic in Java to merge the two tables (say, with the data in 2D arrays). But instead of doing this myself, I am thinking of using an in-memory RDBMS. For example, with an H2 database I could create two tables in memory on the fly and run H2 queries to merge them. Or, I believe, there could be something in Oracle too, like a global temporary table. Could someone please suggest a better approach to join Oracle table data with a MongoDB collection?
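The in-memory idea can be sketched roughly as below. This uses SQLite's in-memory mode from Python as a stand-in for H2, purely to show the shape of the approach; the table names, column names and sample rows are invented, and the same two-tables-plus-join pattern would apply with H2 over JDBC.

```python
import sqlite3

# Hypothetical rows standing in for the two fetched result sets.
oracle_rows = [(1, "Alice", "m-100"), (2, "Bob", "m-200")]  # (id, name, mongo_key)
mongo_rows = [("m-100", "gold"), ("m-200", "silver")]       # (mongo_key, tier)


def merge_in_memory(oracle_rows, mongo_rows):
    """Load both data sets into in-memory tables and join them on mongo_key."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE oracle_data (id INTEGER, name TEXT, mongo_key TEXT)")
        conn.execute("CREATE TABLE mongo_data (mongo_key TEXT, tier TEXT)")
        conn.executemany("INSERT INTO oracle_data VALUES (?, ?, ?)", oracle_rows)
        conn.executemany("INSERT INTO mongo_data VALUES (?, ?)", mongo_rows)
        return conn.execute(
            "SELECT o.id, o.name, m.tier "
            "FROM oracle_data o JOIN mongo_data m ON m.mongo_key = o.mongo_key "
            "ORDER BY o.id"
        ).fetchall()
    finally:
        conn.close()


report = merge_in_memory(oracle_rows, mongo_rows)
print(report)
```

The database exists only for the lifetime of the connection, so nothing has to be cleaned up after the report is generated.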
I think you can try using Kafka and Spark Streaming to solve this problem. Assuming your data is transactional, you can set up a Kafka broker and create a topic. Then change the existing services where you save to Oracle and MongoDB: create two Kafka producers (one for Oracle and another for Mongo) that write the data as streams to the Kafka topic. Then create a consumer group to receive the streams from Kafka. You can then aggregate the real-time streams using a Spark cluster (see the Spark Streaming API for Kafka) and save the results back to MongoDB (using the MongoDB Spark Connector) or any other distributed database. Then you can do data visualization/reporting on the results stored in MongoDB.
Another suggestion would be to use Apache Drill: https://drill.apache.org
You can use the MongoDB and JDBC storage plugins, and then join Oracle tables and Mongo collections together.