Auto sinking of topics being created in Kafka to Elasticsearch

I have topics being created in Kafka (test1, test2, test3) and I want to sink them to Elasticsearch at creation time. I tried topics.regex, but it only creates indices for topics that already exist. How can I sink a new topic into an index when the topic gets created dynamically?
Here is the connector config that I am using for the Kafka sink:
{
  "name": "elastic-sink-test-regex",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "tasks.max": "1",
    "topics.regex": "test[0-9]+",
    "type.name": "kafka-connect",
    "connection.url": "http://192.168.0.188:9200",
    "key.ignore": "true",
    "schema.ignore": "true",
    "schema.enable": "false",
    "batch.size": "100",
    "flush.timeout.ms": "100000",
    "max.buffered.records": "10000",
    "max.retries": "10",
    "retry.backoff.ms": "1000",
    "max.in.flight.requests": "3",
    "is.timebased.indexed": "False",
    "time.index": "at"
  }
}

A sink connector won't pick up new topics until the connector is restarted (or a scheduled rebalance occurs). You can run a Kafka Streams application that reads messages from the new topics and puts them into a single result topic, and have the sink connector read from that result topic. To preserve the message-to-topic mapping you can use Kafka record headers; a sketch of this approach follows below.
Make sure it meets your requirements!
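Below is a minimal Kafka Streams sketch of that idea, using the (older but widely available) transformValues API. The application id, the result topic name (es-sink-input), the header name (source.topic) and the String serdes are illustrative assumptions, not anything prescribed above; adjust them and the broker address to your setup.

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.ValueTransformerWithKey;
import org.apache.kafka.streams.kstream.ValueTransformerWithKeySupplier;
import org.apache.kafka.streams.processor.ProcessorContext;

import java.nio.charset.StandardCharsets;
import java.util.Properties;
import java.util.regex.Pattern;

public class FanInToSinkTopic {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "fan-in-to-es-sink");      // illustrative app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.0.188:9092");  // adjust to your brokers
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Stamps the source topic into a record header so the origin is not lost
        // once everything is merged into one topic.
        ValueTransformerWithKeySupplier<String, String, String> addSourceTopicHeader =
            () -> new ValueTransformerWithKey<String, String, String>() {
                private ProcessorContext context;

                @Override
                public void init(ProcessorContext context) {
                    this.context = context;
                }

                @Override
                public String transform(String key, String value) {
                    context.headers().add("source.topic",
                            context.topic().getBytes(StandardCharsets.UTF_8));
                    return value;
                }

                @Override
                public void close() {
                }
            };

        StreamsBuilder builder = new StreamsBuilder();

        // Subscribe by pattern: the Streams consumer also picks up topics that are
        // created later and match the pattern (after its next metadata refresh).
        builder.stream(Pattern.compile("test[0-9]+"),
                       Consumed.with(Serdes.String(), Serdes.String()))
               .transformValues(addSourceTopicHeader)
               // The single "result" topic that the Elasticsearch sink connector reads from.
               .to("es-sink-input", Produced.with(Serdes.String(), Serdes.String()));

        new KafkaStreams(builder.build(), props).start();
    }
}

The sink connector's topics setting would then point only at es-sink-input, and the source.topic header is available if the original topic still matters downstream.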

Related

Debezium: Data deleted from MySQL reappearing in Elasticsearch

I've used Debezium for MySQL -> Elasticsearch CDC.
The issue is that when I delete data from MySQL, it still reappears in Elasticsearch, even though the data is no longer present in the MySQL DB. UPDATE and INSERT work fine, but DELETE doesn't.
I also did the following:
1. Delete the data in MySQL
2. Delete the Elasticsearch index and the ES Kafka sink
3. Create a new connector for ES in Kafka
Now, the weird part is that all of my deleted data reappears here as well! When I checked the ES data before step (3), the data wasn't there, but afterwards this behaviour is observed.
Please help me fix this issue!
MySQL config:
"config": {
  "connector.class": "io.debezium.connector.mysql.MySqlConnector",
  "database.allowPublicKeyRetrieval": "true",
  "database.user": "cdc-reader",
  "tasks.max": "1",
  "database.history.kafka.bootstrap.servers": "X.X.X.X:9092",
  "database.history.kafka.topic": "schema-changes.mysql",
  "database.server.name": "data_test",
  "schema.include.list": "data_test",
  "database.port": "3306",
  "tombstones.on.delete": "true",
  "delete.enabled": "true",
  "database.hostname": "X.X.X.X",
  "database.password": "xxxxx",
  "name": "slave_test",
  "database.history.skip.unparseable.ddl": "true",
  "table.include.list": "search_ai.*"
},
Elasticsearch config:
"config": {
  "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
  "type.name": "_doc",
  "behavior.on.null.values": "delete",
  "transforms.extractKey.field": "ID",
  "tasks.max": "1",
  "topics": "search_ai.search_ai.slave_data",
  "transforms.InsertKey.fields": "ID",
  "transforms": "unwrap,key,InsertKey,extractKey",
  "key.ignore": "false",
  "transforms.extractKey.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
  "transforms.key.field": "ID",
  "transforms.key.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
  "name": "esd_2",
  "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
  "connection.url": "http://X.X.X.X:9200",
  "transforms.InsertKey.type": "org.apache.kafka.connect.transforms.ValueToKey"
},
Debezium reads the transaction log, not the source table, so the inserts and updates are always going to be read first, causing inserts and doc updates in Elasticsearch.
Secondly, did you re-create the sink connector with the same name or a new one?
If the same name, the original consumer group offsets would not have changed, so the consumer group picks up at the offsets it had before you deleted the original connector.
If a new name, then depending on the auto.offset.reset value of the sink connector's consumer, you could be consuming the Debezium topic from the beginning and causing data to get re-inserted into Elasticsearch, as mentioned. You also need to check whether your MySQL delete events are actually getting produced/consumed as tombstone values, since those are what cause deletes in Elasticsearch; one way to check is sketched below.
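A rough sketch of that check: consume the Debezium topic with a plain Java consumer and print any record whose value is null. The group id and the use of StringDeserializer are illustrative assumptions; the topic name and broker address are taken from the configs above.

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class TombstoneCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "X.X.X.X:9092");   // same brokers as the connectors
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "tombstone-check");         // throwaway group id (assumption)
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");       // read the topic from the beginning
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("search_ai.search_ai.slave_data"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> r : records) {
                    // A DELETE in MySQL should eventually show up as a tombstone:
                    // a record with the row's key and a null value. If none appear,
                    // the sink has nothing to translate into an Elasticsearch delete.
                    if (r.value() == null) {
                        System.out.printf("tombstone at offset %d, key=%s%n", r.offset(), r.key());
                    }
                }
            }
        }
    }
}

If tombstones do show up but documents still reappear, the sink consumer group's offsets (or its auto.offset.reset setting) are the more likely culprit.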

Debezium Connector Sends Mostly Zeros When Updating Row in Oracle

From the documentation for the Debezium Oracle connector, it seems that when an update is performed on a row, the connector should send a Kafka message with all of the data for the state of the row before the update and all of the data for the state of the row after the update. However, I am getting zeros in almost all of the fields, except the field that was updated and one other field that has a unique constraint but is not used by Debezium as the key. The key used by Debezium is a combination of four fields, which together are unique. Here is how I created the connector. How can I get Debezium to give me data for all of the fields, not just the one that was updated, or is this not possible?
{
  "name": "bom-tables",
  "config": {
    "name": "bom-tables",
    "connector.class": "io.debezium.connector.oracle.OracleConnector",
    "database.server.name": "fake.example.com",
    "database.hostname": "fake2.example.com",
    "snapshot.mode": "initial",
    "database.port": "1521",
    "database.user": "XSTRM",
    "database.password": "FAKE_PASS",
    "database.dbname": "FAKE_DBNAME",
    "database.out.server.name": "DBZXOUT",
    "database.history.kafka.bootstrap.servers": "localhost:9092",
    "database.history.kafka.topic": "schema-changes.inventory",
    "database.tablename.case.insensitive": "true",
    "database.oracle.version": "11",
    "include.schema.changes": "true",
    "table.whitelist": "XXX,YYY",
    "errors.log.enable": "true"
  }
}
Thanks for any help.

kafka-connect-elasticsearch How to route multiple topics to same elasticsearch index in same connector?

I am trying to create an Elasticsearch sink connector with the following config. The creation is successful, but when a message is produced on "my.topic.one", the ES sink connector fails while trying to create an index named "my.topic.one": "Could not create index 'my.topic.one'" (the user I am connecting to ES with intentionally does not have the create-index permission). Why is it trying to create a new index, and how do I get the connector to index into the previously created "elasticsearch_index_name"?
{
  "type.name": "_doc",
  "tasks.max": "1",
  "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
  "connection.url": "http://elasticsearch-service:9200",
  "behavior.on.null.values": "delete",
  "key.ignore": "false",
  "write.method": "upsert",
  "key.converter": "org.apache.kafka.connect.storage.StringConverter",
  "key.converter.schemas.enable": "false",
  "value.converter": "org.apache.kafka.connect.storage.StringConverter",
  "value.converter.schemas.enable": "false",
  "topics": "my.topic.one,my.topic.two",
  "transforms": "renameTopic",
  "transforms.renameTopic.type": "org.apache.kafka.connect.transforms.RegexRouter",
  "transforms.renameTopic.regex": ".*",
  "transforms.renameTopic.replacement": "elasticsearch_index_name"
}
UPDATE: The ES sink connector throws the error even if I use just one topic in the "topics" attribute and the same topic name in "renameTopic.regex", as below, with all other attributes the same.
  "topics": "my.topic.one",
  "transforms.renameTopic.regex": "my.topic.one"
Adding the following property to the ES sink connector config solved the issue at hand:
  "auto.create.indices.at.start": "false"

Confluent Elasticsearch sink regex to assign to all topics and send to index

I have multiple topics like this:
client1-table1
client1-table2
client1-table3
client1-table4
I want my Elasticsearch sink to listen for any incoming messages and send them to an index accordingly. However, my current configuration is not working. What can I do? Below is my Elasticsearch sink config:
{
  "name": "es-data",
  "config": {
    "_comment": "-- standard converter stuff -- this can actually go in the worker config globally --",
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter.schema.registry.url": "http://localhost:8081",
    "value.converter.schema.registry.url": "http://localhost:8081",
    "_comment": "--- Elasticsearch-specific config ---",
    "_comment": "Elasticsearch server address",
    "connection.url": "http://127.0.0.1:9200",
    "_comment": "If the Kafka message doesn't have a key (as is the case with JDBC source) you need to specify key.ignore=true. If you don't, you'll get an error from the Connect task: 'ConnectException: Key is used as document id and can not be null.'",
    "key.ignore": "true"
  }
}

Confluent Kafka connect ElasticSearch ID document creation

I'm using the Kafka Connect Elasticsearch connector to write data from a topic to an Elasticsearch index. Both the key and the value of the topic messages are in JSON format. The connector is not able to start because of the following error:
org.apache.kafka.connect.errors.DataException: MAP is not supported as the document id.
Following is the format of my messages (key | value):
{"key":"OKOK","start":1517241690000,"end":1517241695000} | {"measurement":"responses","count":9,"sum":1350.0,"max":150.0,"min":150.0,"avg":150.0}
And following is the body of the POST request I'm using to create the connector:
{
  "name": "elasticsearch-sink-connector",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "tasks.max": "1",
    "topics": "output-topic-elastic",
    "connection.url": "http://elasticsearch:9200",
    "type.name": "aggregator",
    "schemas.enable": "false",
    "topic.schema.ignore": "true",
    "topic.key.ignore": "false",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "false",
    "key.ignore": "false",
    "topic.index.map": "output-topic-elastic:aggregator",
    "name": "elasticsearch-sink",
    "transforms": "InsertKey",
    "transforms.InsertKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
    "transforms.InsertKey.fields": "key"
  }
}
Any help would be really appreciated. I've found a similar question on Stack Overflow (ES document ID creation), but I've had no luck with the answers.
You also need ExtractField in there:
  "transforms": "InsertKey,extractKey",
  "transforms.InsertKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
  "transforms.InsertKey.fields": "key",
  "transforms.extractKey.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
  "transforms.extractKey.field": "key"
Check out this post for more details.
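As a rough standalone illustration of why the extra transform matters (assuming the connect-api and connect-transforms jars are on the classpath; the record below mirrors the sample message in the question): ValueToKey alone turns the record key into a map, which is exactly the MAP the connector rejects as a document id, while chaining ExtractField$Key reduces it to the plain "OKOK" string.

import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.transforms.ExtractField;
import org.apache.kafka.connect.transforms.ValueToKey;

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class DocumentIdDemo {
    public static void main(String[] args) {
        // Configure the two SMTs the same way as in the connector config.
        ValueToKey<SinkRecord> insertKey = new ValueToKey<>();
        insertKey.configure(Collections.singletonMap("fields", "key"));

        ExtractField.Key<SinkRecord> extractKey = new ExtractField.Key<>();
        extractKey.configure(Collections.singletonMap("field", "key"));

        // Schemaless value shaped like the sample message from the question.
        Map<String, Object> value = new HashMap<>();
        value.put("key", "OKOK");
        value.put("start", 1517241690000L);
        value.put("end", 1517241695000L);

        SinkRecord record = new SinkRecord("output-topic-elastic", 0, null, null, null, value, 0);

        // After ValueToKey the key is a one-entry map {key=OKOK}: the MAP the
        // connector refuses to use as a document id.
        SinkRecord withMapKey = insertKey.apply(record);
        System.out.println(withMapKey.key());

        // ExtractField$Key pulls the single field out, leaving a plain string
        // the Elasticsearch connector can use as the document id.
        SinkRecord withStringKey = extractKey.apply(withMapKey);
        System.out.println(withStringKey.key());

        insertKey.close();
        extractKey.close();
    }
}

The same chain, spelled as connector config, is what the answer above adds.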
