According to the documentation for the Debezium Oracle connector, when a row is updated the connector should emit a Kafka message containing the full state of the row both before and after the update. However, I am getting zeros in almost all of the fields; the only exceptions are the field that was updated and one other field that has a unique constraint but is not used by Debezium as the key. The key Debezium uses is a combination of four fields that are unique together. Here is how I created the connector. How can I get Debezium to give me data for all of the fields, not just the one that was updated, or is this not possible?
{
"name": "bom-tables",
"config": {
"name": "bom-tables",
"connector.class": "io.debezium.connector.oracle.OracleConnector",
"database.server.name": "fake.example.com",
"database.hostname": "fake2.example.com",
"snapshot.mode": "initial",
"database.port": "1521",
"database.user": "XSTRM",
"database.password": "FAKE_PASS",
"database.dbname": "FAKE_DBNAME",
"database.out.server.name": "DBZXOUT",
"database.history.kafka.bootstrap.servers": "localhost:9092",
"database.history.kafka.topic": "schema-changes.inventory",
"database.tablename.case.insensitive": "true",
"database.oracle.version": "11",
"include.schema.changes": "true",
"table.whitelist": "XXX,YYY",
"errors.log.enable": "true"
}
}
Thanks for any help.
I have the following setup for implementing CDC using Debezium
Oracle -> Debezium Source Connector -> Kafka -> JDBC Sink Connector -> PostgreSQL
The source connector config is:
{
"name":"myfirst-connector",
"config":{
"connector.class":"io.debezium.connector.oracle.OracleConnector",
"tasks.max":"1",
"database.hostname":"192.168.29.102",
"database.port":"1521",
"database.user":"c##dbzuser",
"database.password":"dbz",
"database.dbname":"ORCLCDB",
"database.pdb.name":"ORCLPDB1",
"database.server.name":"oracle19",
"database.connection.adapter":"logminer",
"database.history.kafka.topic":"schema_changes",
"database.history.kafka.bootstrap.servers":"192.168.29.102:9092",
"database.tablename.case.insensitive":"true",
"snapshot.mode":"initial",
"tombstones.on.delete":"true",
"include.schema.changes": "true",
"sanitize.field.names":"true",
"key.converter":"org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable":"true",
"value.converter":"org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable":"true",
"time.precision.mode": "connect",
"database.oracle.version":19
}
}
The sink connector config is:
{
"name": "myjdbc-sink-testdebezium",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics.regex": "oracle19.C__DBZUSER.*",
"connection.url": "jdbc:postgresql://192.168.29.102:5432/postgres?user=puser&password=My19cPassword",
"key.converter":"org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable":"true",
"value.converter":"org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable":"true",
"dialect.name": "PostgreSqlDatabaseDialect",
"auto.create": "true",
"auto.evolve": "true",
"insert.mode": "upsert",
"delete.enabled": "true",
"transforms": "unwrap, RemoveString, TimestampConverter",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.delete.handling.mode": "none",
"transforms.RemoveString.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.RemoveString.regex": "(.*)\\.C__DBZUSER\\.(.*)",
"transforms.RemoveString.replacement": "$2",
"transforms.TimestampConverter.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.TimestampConverter.target.type": "Timestamp",
"transforms.TimestampConverter.field": "dob",
"pk.mode": "record_key"
}
}
Now, when I drop a table in Oracle I get an entry in the schema_changes topic, but the table is not dropped from PostgreSQL. I need help figuring out why the DROP is not being propagated. Just FYI, all the other operations (CREATE TABLE, ALTER TABLE, INSERT, UPDATE, DELETE) work fine. Only DROP does not work, and I am not getting any exception in the logs either.
I have this connector and sink, which basically creates a topic named "Test.dbo.TEST_A" and writes to the ES index "Test". I have set "key.ignore": "false" so that row updates are also reflected in ES, and
"transforms.unwrap.add.fields": "table" to keep track of which table each document belongs to.
{
"name": "Test-connector",
"config": {
"connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
"tasks.max": "1",
"database.hostname": "192.168.1.234",
"database.port": "1433",
"database.user": "user",
"database.password": "pass",
"database.dbname": "Test",
"database.server.name": "MyServer",
"table.include.list": "dbo.TEST_A",
"database.history.kafka.bootstrap.servers": "kafka:9092",
"database.history.kafka.topic": "dbhistory.testA",
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false",
"transforms.unwrap.delete.handling.mode": "rewrite",
"transforms.unwrap.add.fields":"table"
}
}
{
"name": "elastic-sink-test",
"config": {
"connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"tasks.max": "1",
"topics": "TEST_A",
"connection.url": "http://localhost:9200/",
"string.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schema.enable": "false",
"schema.ignore": "true",
"transforms": "topicRoute,unwrap,key",
"transforms.topicRoute.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.topicRoute.regex": "(.*).dbo.TEST_A", /* Use the database name */
"transforms.topicRoute.replacement": "$1",
"transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",
"transforms.unwrap.drop.tombstones": "false",
"transforms.key.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.key.field": "Id",
"key.ignore": "false",
"type.name": "TEST_A",
"behavior.on.null.values": "delete"
}
}
But when I add another connector/sink to include another table, "TEST_B", from the same database, it seems that whenever TEST_A and TEST_B rows share the same id, one of the documents is deleted from ES.
Is it possible with this setup to have one index per database, or is the only solution to have one index per table?
The reason I want one index per database is to keep the number of indices down as more databases are added to ES.
You are reading data changes from different databases/tables and writing them into the same Elasticsearch index, with the ES document ID set to the DB record ID. As you can see, if the DB record IDs collide, the index document IDs also collide, causing old documents to be deleted.
You have a few options here:
1. An Elasticsearch index per DB/table name: you can implement this with separate connectors or with a custom Single Message Transform (SMT); a sketch using the built-in RegexRouter follows this list.
2. Globally unique DB records: if you control the schema of the source tables, you can set the primary key to a UUID. This prevents ID collisions.
3. As you mentioned in the comments, set the ES document ID to DB/Table/ID. You can implement this change with an SMT.
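For the first option, here is a minimal sketch of the sink-side routing, assuming topic names of the form <server>.dbo.<table> as in your setup; only the transform keys of the sink config are shown, and the regex/replacement are illustrative rather than taken from your config:
{
  "transforms": "topicRoute,unwrap,key",
  "transforms.topicRoute.type": "org.apache.kafka.connect.transforms.RegexRouter",
  "transforms.topicRoute.regex": "(.*)\\.dbo\\.(.*)",
  "transforms.topicRoute.replacement": "$1_$2"
}
With this routing, documents from TEST_A and TEST_B land in different indices, so rows that happen to share the same Id no longer overwrite or delete each other's documents.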
I've used Debezium for MySQL -> Elasticsearch CDC.
Now, the issue is that when I delete data from MySQL, it still reappears in Elasticsearch, even though the data is no longer present in the MySQL DB. UPDATE and INSERT work fine, but DELETE doesn't.
I also did the following:
1. Delete the data in MySQL
2. Delete the Elasticsearch index and the ES Kafka sink connector
3. Create a new ES sink connector in Kafka
Now, the weird part is that all of my deleted data reappears here as well! When I checked the ES data before step (3), the data wasn't there, but afterwards this behaviour is observed.
Please help me fix this issue!
MySQL config:
"config": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"database.allowPublicKeyRetrieval": "true",
"database.user": "cdc-reader",
"tasks.max": "1",
"database.history.kafka.bootstrap.servers": "X.X.X.X:9092",
"database.history.kafka.topic": "schema-changes.mysql",
"database.server.name": "data_test",
"schema.include.list": "data_test",
"database.port": "3306",
"tombstones.on.delete": "true",
"delete.enabled": "true",
"database.hostname": "X.X.X.X",
"database.password": "xxxxx",
"name": "slave_test",
"database.history.skip.unparseable.ddl": "true",
"table.include.list": "search_ai.*"
},
Elasticsearch config:
"config": {
"connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"type.name": "_doc",
"behavior.on.null.values": "delete",
"transforms.extractKey.field": "ID",
"tasks.max": "1",
"topics": "search_ai.search_ai.slave_data",
"transforms.InsertKey.fields": "ID",
"transforms": "unwrap,key,InsertKey,extractKey",
"key.ignore": "false",
"transforms.extractKey.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.key.field": "ID",
"transforms.key.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"name": "esd_2",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"connection.url": "http://X.X.X.X:9200",
"transforms.InsertKey.type": "org.apache.kafka.connect.transforms.ValueToKey"
},
Debezium reads the transaction log, not the source table, so the inserts and updates are always going to be read first, causing inserts and document updates in Elasticsearch...
Secondly, did you re-create the sink connector with the same name or a different one?
If the same name, the original consumer group offsets would not have changed, so the consumer group picks up at the offsets from before you deleted the original connector.
If a new name, then depending on the auto.offset.reset value of the sink connector's consumer, you could be consuming the Debezium topic from the beginning, causing data to get re-inserted into Elasticsearch, as mentioned. You also need to check whether your MySQL delete events are actually being produced and consumed as tombstone values so that they trigger deletes in Elasticsearch; a sketch of the relevant settings follows.
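On that last point, here is a minimal sketch of the settings that usually have to line up for deletes to propagate. The property names come from Debezium's ExtractNewRecordState SMT and the Confluent Elasticsearch sink; whether they match your connector versions is an assumption, and only the delete-related keys of the sink config are shown. Your MySQL source already sets tombstones.on.delete to true; on the sink side the tombstones must not be dropped before they reach Elasticsearch:
{
  "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
  "transforms.unwrap.drop.tombstones": "false",
  "transforms.unwrap.delete.handling.mode": "drop",
  "behavior.on.null.values": "delete",
  "key.ignore": "false"
}
ExtractNewRecordState drops tombstones by default, so without drop.tombstones set to false the sink never sees a null value for the deleted key and behavior.on.null.values never fires.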
I am using the JDBC source connector to move data from my Postgres database table to a Kafka topic. I have an orders table with a foreign key to the customers table on the customerNumber field.
The connector below copies the orders to Kafka, but without the customers data in the JSON. I am looking for how to construct the complete orders-with-customers object in the JSON.
The connector is:
{
"name": "SOURCE_CONNECTOR",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
"connection.password": "postgres_pwd",
"transforms.cast.type": "org.apache.kafka.connect.transforms.Cast$Value",
"transforms.cast.spec": "amount:float64",
"tasks.max": "1",
"transforms": "cast,createKey,extractInt",
"transforms.extractInt.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"batch.max.rows": "25",
"table.whitelist": "orders",
"mode": "bulk",
"topic.prefix": "data_",
"transforms.extractInt.field": "uuid",
"connection.user": "postgres_user",
"transforms.createKey.fields": "uuid",
"poll.interval.ms": "3600000",
"sql.quote.identifiers": "false",
"name": "SOURCE_CONNECTOR",
"numeric.mapping": "best_fit",
"connection.url": "url"
}
}
You can use query-based ingest: instead of table.whitelist, specify the query config option with a join across the two tables.
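A minimal sketch of that approach, showing only the keys that change relative to the config above; the joined column names (customerName, phone) are hypothetical, and with query set the table.whitelist option has to be removed:
{
  "query": "SELECT o.*, c.customerName, c.phone FROM orders o JOIN customers c ON o.customerNumber = c.customerNumber",
  "mode": "bulk",
  "topic.prefix": "data_orders_with_customers"
}
In query mode the JDBC source connector publishes to a single topic named exactly topic.prefix, so the prefix here acts as the full topic name.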
I have topics being created in Kafka (test1, test2, test3) and I want to sink them to Elasticsearch as they are created. I tried topics.regex, but it only creates indices for topics that already exist. How can I sink a new topic into an index when the topic is created dynamically?
Here is the connector config that I am using for kafka-sink:
{
"name": "elastic-sink-test-regex",
"config": {
"connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"tasks.max": "1",
"topics.regex": "test[0-9]+",
"type.name": "kafka-connect",
"connection.url": "http://192.168.0.188:9200",
"key.ignore": "true",
"schema.ignore": "true",
"schema.enable": "false",
"batch.size": "100",
"flush.timeout.ms": "100000",
"max.buffered.records": "10000",
"max.retries": "10",
"retry.backoff.ms": "1000",
"max.in.flight.requests": "3",
"is.timebased.indexed": "False",
"time.index": "at"
}
}
A sink connector won't read new topics until the connector is restarted (or a rebalance occurs). One workaround is to run a Kafka Streams application that reads messages from the new topics and writes them into a single result topic, and have the sink connector read from that result topic.
To preserve the "message - topic" mapping you can use Kafka record headers.
Make sure this meets your requirements!
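For reference, one knob that can shorten the pickup delay: with topics.regex, the sink's consumer only discovers new matching topics when it refreshes its metadata (metadata.max.age.ms, 5 minutes by default). A sketch of lowering that interval, assuming your Connect worker permits client config overrides (connector.client.config.override.policy=All), which is an assumption about your setup:
{
  "topics.regex": "test[0-9]+",
  "consumer.override.metadata.max.age.ms": "30000"
}
This only shortens the delay before new topics are picked up; the Kafka Streams approach above is still the way to funnel new topics through a single, already-subscribed topic.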