We are consuming Kafka SourceRecord objects from debezium-embedded, which receives CDC events from MySQL, and ingesting them into BigQuery.
Is there an open-source converter class that converts a SourceRecord object into SQL DML queries?
Any input would be highly appreciated, thanks.
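As a rough sketch of what such a converter would do, assume each record value has already been turned into a Python dict holding Debezium's change-event envelope, with its op, before and after fields. The Debezium engine's JSON output format is one way to get there; the envelope then sits under "payload". The hypothetical helper to_dml below maps one event to a parameterized INSERT, UPDATE or DELETE. It is not an existing library class. The table name, the key columns and the ? placeholder style are assumptions that the caller supplies.

```python
import re
from typing import Any, Dict, List, Sequence, Tuple

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _ident(name: str) -> str:
    """Backtick-quote a plain identifier; reject anything unusual instead of escaping it."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unexpected identifier: {name!r}")
    return f"`{name}`"


def to_dml(event: Dict[str, Any], table: str,
           key_columns: Sequence[str]) -> Tuple[str, List[Any]]:
    """Hypothetical helper: map one Debezium envelope (op/before/after) to (sql, params)."""
    op = event["op"]
    # Allow dotted names such as dataset.table by quoting each part separately.
    tbl = ".".join(_ident(part) for part in table.split("."))
    where = " AND ".join(f"{_ident(k)} = ?" for k in key_columns)

    if op in ("c", "r"):  # create, or a row read during the initial snapshot
        after = event["after"]
        cols = list(after)
        col_list = ", ".join(_ident(c) for c in cols)
        marks = ", ".join("?" for _ in cols)
        return f"INSERT INTO {tbl} ({col_list}) VALUES ({marks})", [after[c] for c in cols]

    if op == "u":  # update: locate the row by its key, set the remaining columns
        after = event["after"]
        keyed = event.get("before") or after
        set_cols = [c for c in after if c not in key_columns]
        if not set_cols:
            raise ValueError("update touches only key columns")
        assignments = ", ".join(f"{_ident(c)} = ?" for c in set_cols)
        params = [after[c] for c in set_cols] + [keyed[k] for k in key_columns]
        return f"UPDATE {tbl} SET {assignments} WHERE {where}", params

    if op == "d":  # delete: only the before image is populated
        before = event["before"]
        return f"DELETE FROM {tbl} WHERE {where}", [before[k] for k in key_columns]

    raise ValueError(f"unsupported op: {op!r}")
```

Issuing one DML statement per event against BigQuery tends to be slow and runs into DML quotas. In practice, batching events into a staging table and applying them with a periodic MERGE is usually the more workable design.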
Our current pipeline follows a structure similar to the one outlined here, except that we are pulling events from Oracle and pushing them to Snowflake. The flow goes something like this:
Confluent Oracle CDC Source Connector mining the Oracle transaction log
Pushing these change events to a Kafka topic
Snowflake Sink Connector reading off the Kafka topic and loading the raw messages into a Snowflake table.
In the end, I have a table with record_metadata and record_content fields that contain the raw Kafka messages.
I have to build a set of procedures that handle the merge/upsert logic, operating on a stream on top of the raw table. The tables I'm trying to replicate in Snowflake are very wide and there are around 100 of them, so writing the SQL MERGE statements by hand is infeasible.
Is there a better way to ingest the Kafka topic containing all of the CDC events generated by the Oracle connector straight into Snowflake, auto-creating nonexistent tables and automatically applying inserts, updates, deletes, etc. as events come across the stream?
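Because the merge statements all follow one pattern, they can be generated from column metadata rather than written by hand. The sketch below is a hypothetical generator built on several assumptions. First, RECORD_CONTENT is taken to be a flat VARIANT object keyed by column name plus an operation-type field. The op_type field name and the 'D' delete marker are also assumptions, so check what your Oracle CDC connector actually emits. Finally, all events for one key are assumed to land in a single partition, so that the offset in RECORD_METADATA orders them. The generator keeps only the latest event per key before merging.

```python
from typing import Sequence, Tuple


def _q(name: str) -> str:
    """Double-quote a Snowflake identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_merge(target: str, source: str, columns: Sequence[Tuple[str, str]],
                keys: Sequence[str], op_field: str = "op_type",
                delete_value: str = "D") -> str:
    """Hypothetical generator for a Snowflake MERGE from a raw Kafka-connector table/stream.

    target and source are trusted, already-qualified object names; columns is a list
    of (column_name, snowflake_type) pairs. op_field/delete_value are assumptions.
    """
    names = [c for c, _ in columns]
    # Pull each column out of the VARIANT payload and cast it to its target type.
    proj = ",\n        ".join(f"RECORD_CONTENT:{_q(c)}::{t} AS {_q(c)}" for c, t in columns)
    dv = delete_value.replace("'", "''")
    partition = ", ".join(_q(k) for k in keys)
    on = " AND ".join(f"t.{_q(k)} = s.{_q(k)}" for k in keys)
    updates = ", ".join(f"t.{_q(c)} = s.{_q(c)}" for c in names if c not in keys)
    update_clause = f"\nWHEN MATCHED THEN UPDATE SET {updates}" if updates else ""
    ins_cols = ", ".join(_q(c) for c in names)
    ins_vals = ", ".join(f"s.{_q(c)}" for c in names)
    return (
        f"MERGE INTO {target} t\n"
        "USING (\n"
        "    SELECT * FROM (\n"
        "        SELECT\n"
        f"        {proj},\n"
        f"        RECORD_CONTENT:{_q(op_field)}::STRING AS _OP,\n"
        '        RECORD_METADATA:"offset"::NUMBER AS _OFFSET\n'
        f"        FROM {source}\n"
        "    ) x\n"
        # Keep only the most recent event per key so MERGE sees one row per target row.
        f"    QUALIFY ROW_NUMBER() OVER (PARTITION BY {partition} ORDER BY _OFFSET DESC) = 1\n"
        ") s\n"
        f"ON {on}\n"
        f"WHEN MATCHED AND s._OP = '{dv}' THEN DELETE"
        f"{update_clause}\n"
        f"WHEN NOT MATCHED AND s._OP <> '{dv}' THEN INSERT ({ins_cols}) VALUES ({ins_vals})"
    )
```

For around 100 tables, the (column, type) lists could be read from INFORMATION_SCHEMA.COLUMNS of the target tables, and the generated statements could then be run from your existing procedures.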
I'm using the JDBC source connector to transfer data from Oracle to a Kafka topic. I want to send 10 different Oracle tables to the same Kafka topic with the JDBC source connector, with the table name recorded somewhere in the message (e.g. a header). Is this possible?
with table name mentioned somewhere in message
You can use the ExtractTopic transform to read the topic name from a column in the tables.
Otherwise, if that data isn't in the table, you can use the InsertField transform with static.value before the ExtractTopic transform to force the topic name to be the same for every table.
Note: if you use Avro or another record format with schemas, and your tables do not have the same schema (column names and types), you should expect all but the first producer to fail, because the schemas would be incompatible.
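Here is a sketch of the connector settings described above. It is written as a Python dict that could be merged into an existing JDBC source config before you submit it to the Connect REST API. InsertField's topic.field copies the original per-table topic name (topic.prefix plus the table name) into each value, which keeps the table name in the message. Its static.field/static.value settings add the constant that ExtractTopic then uses as the topic. The field names, the transform aliases and the target topic are hypothetical. ExtractTopic comes from Confluent's separately installed connect-transforms plugin, not from Apache Kafka itself.

```python
import json
from typing import Dict


def add_single_topic_routing(config: Dict[str, str], target_topic: str,
                             table_field: str = "source_table",
                             route_field: str = "target_topic") -> Dict[str, str]:
    """Return a copy of a JDBC source config with the routing SMTs added.

    Note: this replaces any existing "transforms" list in the config.
    Field and alias names are hypothetical examples.
    """
    cfg = dict(config)
    cfg.update({
        "transforms": "addFields,route",
        # Built-in InsertField: copy the per-table topic name into the value,
        # and add a constant field holding the shared topic name.
        "transforms.addFields.type": "org.apache.kafka.connect.transforms.InsertField$Value",
        "transforms.addFields.topic.field": table_field,
        "transforms.addFields.static.field": route_field,
        "transforms.addFields.static.value": target_topic,
        # Confluent's ExtractTopic (separate plugin): use that constant field as the topic.
        "transforms.route.type": "io.confluent.connect.transforms.ExtractTopic$Value",
        "transforms.route.field": route_field,
    })
    return cfg


if __name__ == "__main__":
    base = {"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
            "topic.prefix": "ora-"}
    print(json.dumps(add_single_topic_routing(base, "oracle-all-tables"), indent=2))
```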
Example:
{"id":"1","firstName":"abc","lastName":"xyz","dob":"12/09/1995","age":"23"}
This is the message structure in the Kafka topic, but I want to index it in Elasticsearch as below:
{"id":"1","name":{"firstName":"abc","lastName":"xyz"},"dob":"12/09/1995","age":"23"}
How can I achieve this?
Two options:
Stream processing against the data in the Kafka topic. Using Kafka Streams, you could wrangle the data model as required. KSQL would work for the inverse transformation (flattening a nested structure) but doesn't support creating STRUCTs yet. Other stream processing options include Flink, Spark Streaming, etc.
Modify the data as it passes through Kafka Connect, using a Single Message Transform (SMT). There's no pre-built transform that does this, but you could write one using the SMT API.
Disclaimer: I work for Confluent, the company behind the open-source KSQL project, which also contributes to Kafka Streams, Kafka Connect, etc.
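Whichever option you pick, the per-record reshaping is the same. Below is a minimal Python sketch of it on a schemaless JSON value. A Kafka Streams mapValues call or a custom SMT's apply() method would do the equivalent in Java. The helper name nest_name is hypothetical.

```python
import json
from typing import Any, Dict


def nest_name(record: Dict[str, Any]) -> Dict[str, Any]:
    """Move firstName/lastName into a nested "name" object, keeping other fields in order."""
    out: Dict[str, Any] = {}
    for key, value in record.items():
        if key in ("firstName", "lastName"):
            # The nested object is created at the position of the first name field.
            out.setdefault("name", {})[key] = value
        else:
            out[key] = value
    return out


if __name__ == "__main__":
    src = json.loads('{"id":"1","firstName":"abc","lastName":"xyz","dob":"12/09/1995","age":"23"}')
    print(json.dumps(nest_name(src), separators=(",", ":")))
```

Running it on the example message prints the target structure shown in the question.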
I'm connecting Oracle to Kafka using a JDBC connector. Data coming from Oracle is converted correctly, except for Oracle columns of type NUMBER, which are not decoded. For example:
{"ID":"\u0004{","TYPE":"\u0000Ù","MODE":{"bytes":"\u0007"},"STAT_TEMP":{"string":"TESTING"}}
I should mention that I'm also connecting Kafka to Spark, and I get the same output in Spark.
What is the best way to convert the data? Should it be done in Kafka or in Spark? If in Kafka, how would you suggest converting it?
Add numeric.mapping to your connector config:
"numeric.mapping":"best_fit"
For more explanation, see here.
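The garbled values look like Kafka Connect Decimal values. With the default numeric.mapping, NUMBER columns are sent as the unscaled integer in big-endian two's-complement bytes, and the scale is stored as a parameter on the field's schema. If you decode on the Spark or consumer side instead, a minimal decoder might look like the one below. It assumes you read the scale from the schema and that each displayed character stands for one byte (Latin-1).

```python
from decimal import Decimal


def decode_connect_decimal(raw: bytes, scale: int) -> Decimal:
    """Decode a Kafka Connect Decimal: big-endian two's-complement unscaled value * 10**-scale."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)


if __name__ == "__main__":
    # The ID field from the question, assuming a scale of 0 in its schema.
    print(decode_connect_decimal("\u0004{".encode("latin-1"), 0))
```

With scale 0, the sample ID ("\u0004{", bytes 0x04 0x7B) decodes to 1147. The leading \u0000 in TYPE is what keeps 0xD9 positive under two's complement. Setting numeric.mapping as suggested removes the need for this wherever the connector can map the column to a primitive type.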
I have a reporting framework that builds and generates tabular reports. Until now, I wrote a SQL query and it fetched the data from Oracle. Now I have an interesting challenge: half of the data will come from Oracle and the rest from MongoDB, based on the Oracle output. The tabular data fetched from Oracle will have one additional column containing the key used to fetch data from MongoDB. This gives me two tabular data sets, one from Oracle and one from MongoDB, and I need to merge them on a common column into a single data set for the report.
I could write Java logic to merge the two tables (say, held as 2D arrays). But instead of doing this on my own, I am thinking of using an in-memory RDBMS, for example H2, where I can create two tables in memory on the fly and run H2 queries to merge them. Alternatively, I believe Oracle may offer something similar, such as global temporary tables. Could someone suggest the best approach to join Oracle table data with a MongoDB collection?
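The in-memory idea from the question can be sketched with Python's built-in sqlite3 in place of H2, which is a swapped-in engine. In Java, an H2 jdbc:h2:mem: connection would play the same role. The table and column names are hypothetical. The rows are assumed to have already been fetched from Oracle and MongoDB with their respective drivers.

```python
import sqlite3
from typing import Iterable, List, Tuple


def merge_in_memory(oracle_rows: Iterable[Tuple[int, str, float]],
                    mongo_rows: Iterable[Tuple[str, str]]) -> List[Tuple]:
    """Load both result sets into throwaway in-memory tables and join them on the key."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical schemas: Oracle rows carry the Mongo key in an extra column.
        conn.execute("CREATE TABLE oracle_data (id INTEGER, mongo_key TEXT, amount REAL)")
        conn.execute("CREATE TABLE mongo_data (mongo_key TEXT PRIMARY KEY, status TEXT)")
        conn.executemany("INSERT INTO oracle_data VALUES (?, ?, ?)", oracle_rows)
        conn.executemany("INSERT INTO mongo_data VALUES (?, ?)", mongo_rows)
        # LEFT JOIN keeps Oracle rows that have no matching Mongo document.
        return conn.execute(
            "SELECT o.id, o.amount, m.status FROM oracle_data o "
            "LEFT JOIN mongo_data m ON m.mongo_key = o.mongo_key ORDER BY o.id"
        ).fetchall()
    finally:
        conn.close()
```

This approach only fits result sets that comfortably fit in memory. For larger volumes, the approaches in the answers below push the join into a separate engine.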
I think you can use Kafka and Spark Streaming to solve this problem. Assuming your data is transactional, set up a Kafka broker and create a topic. Then change the existing services that save to Oracle and MongoDB: create two Kafka producers (one for Oracle and one for Mongo) that write the data as streams to the Kafka topic. Next, create a consumer group to receive the streams from Kafka. You can then aggregate the real-time streams on a Spark cluster (see the Spark Streaming API for Kafka [1]) and save the results back to MongoDB (using the MongoDB Spark Connector [2]) or any other distributed database. Finally, build your visualizations/reports on the results stored in MongoDB.
Another suggestion would be to use Apache Drill: https://drill.apache.org
You can configure Drill's MongoDB and JDBC (RDBMS) storage plugins and then join Oracle tables and Mongo collections together in a single query.
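Here is a sketch of what such a cross-source join could look like when submitted through Drill's REST API, as a POST to /query.json on the Drillbit's web port, 8047 by default. The plugin names oracle and mongo are hypothetical placeholders for whatever you name your storage plugins, and so is every schema, table, collection and column name in the SQL. The function only builds the request; you would send it with urllib.request.urlopen.

```python
import json
import urllib.request

# Hypothetical plugin/schema/table/collection/column names; adjust to your setup.
JOIN_SQL = """
SELECT o.ID, o.AMOUNT, m.status
FROM oracle.REPORTS.ORDERS o
JOIN mongo.reports.order_details m ON m.order_key = o.ORDER_KEY
"""


def drill_query_request(sql: str, base_url: str = "http://localhost:8047") -> urllib.request.Request:
    """Build (but do not send) a POST request running `sql` through Drill's REST API."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```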