JDBC Sink Connector - insert multiple topics to multiple tables with renaming - apache-kafka-connect

I'm trying CDC on Confluent Cloud with the Debezium source connector and the JDBC sink connector. Both connectors are the fully managed type. I'm having trouble with the mapping between topic names and table names.
In my CDC pipeline, topic names must be converted to table names like this:
shop.public.table1 --> table1
shop.public.table2 --> table2
shop.public.table3 --> table3
My question is very similar to this older, already-answered question:
Upserting into multiple tables from multiples topics using kafka-connect
But my CDC pipeline runs on Confluent Cloud, where RegexRouter is not supported:
https://docs.confluent.io/platform/current/connect/transforms/regexrouter.html
Is there any way to route the topics to the proper tables?
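For comparison, on self-managed Kafka Connect this renaming would normally be handled by the RegexRouter transform from the page linked above; a minimal sketch of that (here unavailable) configuration, added to the sink connector:
# sketch only: strips the "shop.public." prefix so the topic name becomes the table name
transforms=route
transforms.route.type=org.apache.kafka.connect.transforms.RegexRouter
transforms.route.regex=shop\\.public\\.(.*)
transforms.route.replacement=$1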

Related

I need Update or Insert functionality in Kafka JDBC Sink Connector

Kafka JDBC Sink Connector
The Kafka JDBC sink connector provides three insert.mode options, but I need update and insert functionality together. Can anyone help me achieve this?
upsert literally means both: it inserts new keys and updates keys that have already been inserted
You can consider the following steps:
separating events into two topics on the source connector (one topic for inserts and one for updates)
processing these topics with independent sink connectors with different configurations.
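Alternatively, if upsert alone covers the need, a minimal JDBC sink sketch with insert.mode=upsert, assuming a single ID primary-key column (topic name and connection details are placeholders):
# hedged sketch: upsert keyed on an assumed ID column
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=orders
connection.url=URL
connection.user=UID
connection.password=PASSWD
insert.mode=upsert
pk.mode=record_value
pk.fields=ID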

How to ingest CDC events produced by Oracle CDC Source Connector into Snowflake

Our current pipeline follows a structure similar to the one outlined here, except we are pulling events from Oracle and pushing them to Snowflake. The flow goes something like this:
Confluent Oracle CDC Source Connector mining the Oracle transaction log
Pushing these change events to a Kafka topic
Snowflake Sink Connector reading off the Kafka topic and pulling raw messages into a Snowflake table.
In the end I have a table of record_metadata and record_content fields that contain the raw Kafka messages.
I'm having to build a set of procedures that handle the merge/upsert logic, operating on a stream on top of the raw table. The tables I'm trying to replicate in Snowflake are very wide and there are around 100 of them, so writing the SQL merge statements by hand is infeasible.
Is there a better way to ingest the Kafka topic containing all of the CDC events generated from the Oracle connector straight into Snowflake, handling auto-creation of nonexistent tables, auto-updates/deletes, etc., as events come across the stream?
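For a sense of the manual effort being avoided: each hand-written merge over a stream on the raw table would look roughly like the sketch below (target table, columns, and stream name are made up), repeated for each of the ~100 wide tables:
-- hedged sketch; real tables are much wider and all names here are assumptions
MERGE INTO customers AS tgt
USING (
    SELECT
        record_content:ID::NUMBER    AS id,
        record_content:NAME::VARCHAR AS name
    FROM raw_cdc_events_stream  -- a stream on the raw landing table
) AS src
ON tgt.id = src.id
WHEN MATCHED THEN UPDATE SET name = src.name
WHEN NOT MATCHED THEN INSERT (id, name) VALUES (src.id, src.name);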

Send multiple oracle tables into single kafka topic

I'm using the JDBC source connector to transfer data from Oracle to a Kafka topic. I want to transfer 10 different Oracle tables to the same Kafka topic using the JDBC source connector, with the table name mentioned somewhere in the message (e.g. in a header). Is it possible?
with table name mentioned somewhere in message
You can use an ExtractTopic transform to read the topic name from a column in the tables
Otherwise, if that data isn't in the table, you can use the InsertField transform with static.value before the extract transform to force the topic name to be the same.
Note: If you use Avro or another record type with schemas, and your tables do not have the same schema (column names and types), then you should expect all but the first producer to fail, because the schemas would be incompatible.
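A hedged sketch of that transform chain on the source connector (the field name and forced topic name are made up); InsertField$Value adds a constant field and ExtractTopic$Value then routes every record to the topic named by it (ExtractTopic is a Confluent-provided transform rather than part of Apache Kafka):
# sketch only: force every record onto one topic
transforms=addTopic,extract
transforms.addTopic.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.addTopic.static.field=target_topic
transforms.addTopic.static.value=all_oracle_tables
transforms.extract.type=io.confluent.connect.transforms.ExtractTopic$Value
transforms.extract.field=target_topic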

Can kafka connect create stream directly?

I have a scenario where I need to import an entire DB into Kafka and create, in DB terms, some views on those tables that users can query afterwards. My requirement is to rebuild the logical model via views out of the physical model (the tables).
Hence I am wondering about the steps to do that.
My ideal would be for Kafka Connect to create the topics corresponding to the tables, and then, right after that, for me to declaratively create the views (using KSQL).
While what I describe here sounds feasible at first, I have an issue with the structure (schema) of the data within the topics. It seems I might have to do an extra step, but I wonder if it can be avoided or is actually necessary.
More specifically, views usually represent joins on tables. I imagine that if I want to do a join on tables, I need to have a KTable or KStream already created, which gives the structure on which to do the joins. But if Kafka Connect just creates topics and no KTable or KStream, it seems an extra step needs to happen to automatically make those topics available as KTables or KStreams. At that point, I can use KSQL to create the views that will represent the logical model.
1 - Hence the question: is there a way for Kafka Connect to create a KStream or KTable automatically?
2 - Kafka Connect has the notion of a schema; how does that relate to the KStream/KTable structure (schema) and format (JSON/Avro/delimited)?
3 - If Kafka Connect can't create KStreams and KTables directly, can KSQL operate a join on the topics that Kafka Connect creates, directly? Will it be able to interpret the structure of the data in those topics (i.e. the Kafka Connect generated schema), perform a join on it, and make the result available as a KStream?
4 - If all my assumptions are wrong, can someone give me the steps of what my problem would entail in terms of KSQL/Kafka Streams/Kafka Connect?
1 - Hence the question: is there a way for Kafka Connect to create a KStream or KTable automatically?
No, you need to do so manually. But if you're using Avro then it's just a simple statement:
CREATE STREAM foo WITH (KAFKA_TOPIC='bar', VALUE_FORMAT='AVRO');
2 - Kafka Connect has the notion of a schema; how does that relate to the KStream/KTable structure (schema) and format (JSON/Avro/delimited)?
KSQL Stream (or Table) = Kafka Topic plus Schema.
So you have a Kafka topic (loaded by Kafka Connect, for example), and you need a schema. The best thing is to just use Avro when you produce the data (e.g. from Kafka Connect), because the schema then exists in the Schema Registry and KSQL can use it automagically.
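For example, a per-connector converter override along these lines (a sketch; the Schema Registry URL is a placeholder) is what registers the schema that KSQL then picks up:
# sketch: per-connector converter override to produce Avro
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://schema-registry:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://schema-registry:8081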
If you want to use JSON or [shudder] delimited, then you have to provide the schema in KSQL when you declare the stream/table. Instead of the above statement you'd have something like:
CREATE STREAM foo (COL1 INT, COL2 VARCHAR, COL3 INT, COL4 STRUCT<S1 INT,S2 VARCHAR>)
WITH (KAFKA_TOPIC='bar_json',VALUE_FORMAT='JSON');
3 - If Kafka Connect can't create KStreams and KTables directly, can KSQL operate a join on the topics that Kafka Connect creates, directly?
KSQL can join streams and tables, yes. A stream/table is just a Kafka topic, with a schema.
Will it be able to interpret the structure of the data in those topics (i.e. the Kafka Connect generated schema), perform a join on it, and make the result available as a KStream?
Yes. The schema is provided by Kafka Connect, and if you're using Avro it 'just works'. If using JSON you need to manually enter the schema as shown above.
The output of a KSQL join is a Kafka topic. For example:
CREATE STREAM A WITH (KAFKA_TOPIC='A', VALUE_FORMAT='AVRO');
CREATE TABLE B WITH (KAFKA_TOPIC='B', VALUE_FORMAT='AVRO', KEY='ID');
CREATE STREAM foobar AS
SELECT A.*, B.* FROM
A LEFT OUTER JOIN B ON A.ID = B.ID;
4 - If all my assumptions are wrong, can someone give me the steps of what my problem would entail in terms of KSQL/Kafka Streams/Kafka Connect?
I don't think your assumptions are wrong. Use Kafka Connect + KSQL, and use Avro :)
These references might help you further:
http://rmoff.dev/vienna19-ksql-intro
http://go.rmoff.net/devoxx18-build-streaming-pipeline

Upserting into multiple tables from multiples topics using kafka-connect

I am trying to read 2 Kafka topics using the JDBC sink connector and upsert into 2 Oracle tables which I created manually. Each table has 1 primary key, which I want to use in upsert mode. The connector works fine if I use only 1 topic and only 1 field in pk.fields, but if I enter multiple columns in pk.fields, one from each table, it fails to recognize the schema. Am I missing anything? Please suggest.
name=oracle_sink_prod
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=KAFKA1011,JAFKA1011
connection.url=URL
connection.user=UID
connection.password=PASSWD
auto.create=false
table.name.format=KAFKA1011,JAFKA1011
pk.mode=record_value
pk.fields= ID,COMPANY
auto.evolve=true
insert.mode=upsert
# ID is the PK of the KAFKA1011 table and COMPANY is the PK of the other
If the PKs are different, just create two different sink connectors. They can both run on the same Kafka Connect worker.
You also have the option of using the key of the Kafka message itself. See the docs for more info. This is the more scalable option; you would then just need to ensure that your messages are keyed correctly for this to flow down to the JDBC sink.
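A sketch of the two-connector approach built from the config above (only the first connector shown; the second would differ only in name, topics, table.name.format, and pk.fields=COMPANY):
name=oracle_sink_kafka1011
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=KAFKA1011
connection.url=URL
connection.user=UID
connection.password=PASSWD
auto.create=false
auto.evolve=true
table.name.format=KAFKA1011
pk.mode=record_value
pk.fields=ID
insert.mode=upsert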

Resources