Elasticsearch Sink Connector not working in new version

I am using the "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector" connector, version 13.0.0.
I am applying an SMT to this connector, but I get the error below:
Found a topic name 'es.contact3.model' that doesn't match the assigned partitions. The connector doesn't support topic mutating SMTs
I get the error even though I set "flush.syncronously": "true".
My config is as follows:
{
"type.name": "_doc",
"connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"topics": "es.contact.model",
"tasks.max": "1",
"transforms": "Dealership",
"key.ignore": "true",
"input.data.format": "AVRO",
"transforms.Dealership.type": "io.confluent.connect.transforms.ExtractTopic$Value",
"transforms.Dealership.field": "indexTopicName",
"schema.ignore": "true",
"name": "ContactElasticSinkConnector",
"flush.syncronously": "true",
"connection.url": "http://192.168.1.7:19200",
"transforms.Dealership.skip.missing.or.null": "true"
}

I faced the same issue and decided to try what you did in your question. There is a typo in:
"flush.syncronously": "true"
It should be:
"flush.synchronously": "true"
Simply correcting this worked for me.
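Kafka Connect generally passes unrecognized connector properties through without rejecting them, so a misspelled key like this is silently ignored and flush.synchronously appears to keep its default (false). A quick way to catch such typos before submitting a config is sketched below in plain Python. The KNOWN_KEYS set is a hand-picked assumption, not the connector's full property list, and find_suspect_keys is a hypothetical helper.

```python
import difflib

# Hand-picked subset of Elasticsearch sink property names (assumption: this is
# NOT the connector's full list; extend it from the connector documentation).
KNOWN_KEYS = {
    "connector.class", "topics", "tasks.max", "transforms", "key.ignore",
    "schema.ignore", "connection.url", "type.name", "flush.synchronously",
    "behavior.on.null.values", "write.method", "auto.create.indices.at.start",
}


def find_suspect_keys(config: dict) -> dict:
    """Map each unknown key to the closest known key, if one is very similar."""
    suspects = {}
    for key in config:
        # Per-transform settings use free-form aliases, so skip them here.
        if key in KNOWN_KEYS or key.startswith("transforms."):
            continue
        matches = difflib.get_close_matches(key, KNOWN_KEYS, n=1, cutoff=0.8)
        if matches:
            suspects[key] = matches[0]
    return suspects


config = {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "topics": "es.contact.model",
    "flush.syncronously": "true",  # the typo from the question
}
print(find_suspect_keys(config))  # {'flush.syncronously': 'flush.synchronously'}
```

The similarity cutoff of 0.8 is an arbitrary choice that flags one-letter typos without matching unrelated keys.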

Related

Debezium: Data deleted from MySQL reappearing in Elasticsearch

I've used Debezium for MySQL -> Elasticsearch CDC.
The issue is that when I delete data from MySQL, it still reappears in Elasticsearch, even though the data is no longer present in the MySQL DB. UPDATE and INSERT work fine, but DELETE doesn't.
I also did the following:
1. Delete data in MySQL
2. Delete the Elasticsearch index and the ES Kafka sink
3. Create a new connector for ES in Kafka
Now the weird part is that all of my deleted data reappears here as well! When I checked the ES data before step 3, the data wasn't there, but afterwards it was back.
Please help me fix this issue!
MySQL config:
"config": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"database.allowPublicKeyRetrieval": "true",
"database.user": "cdc-reader",
"tasks.max": "1",
"database.history.kafka.bootstrap.servers": "X.X.X.X:9092",
"database.history.kafka.topic": "schema-changes.mysql",
"database.server.name": "data_test",
"schema.include.list": "data_test",
"database.port": "3306",
"tombstones.on.delete": "true",
"delete.enabled": "true",
"database.hostname": "X.X.X.X",
"database.password": "xxxxx",
"name": "slave_test",
"database.history.skip.unparseable.ddl": "true",
"table.include.list": "search_ai.*"
},
Elasticsearch config:
"config": {
"connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"type.name": "_doc",
"behavior.on.null.values": "delete",
"transforms.extractKey.field": "ID",
"tasks.max": "1",
"topics": "search_ai.search_ai.slave_data",
"transforms.InsertKey.fields": "ID",
"transforms": "unwrap,key,InsertKey,extractKey",
"key.ignore": "false",
"transforms.extractKey.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.key.field": "ID",
"transforms.key.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"name": "esd_2",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"connection.url": "http://X.X.X.X:9200",
"transforms.InsertKey.type": "org.apache.kafka.connect.transforms.ValueToKey"
},
Debezium reads the transaction log, not the source table, so the inserts and updates are always read first, causing inserts and document updates in Elasticsearch.
Secondly, did you create the sink connector with the same name or a different one?
If the same one, the original consumer group offsets would not have changed, so the consumer group picks up at the offsets from before you deleted the original connector.
If a new name, then depending on the auto.offset.reset value of the sink connector's consumer, you could be consuming the Debezium topic from the beginning, causing data to be re-inserted into Elasticsearch, as mentioned. You need to check whether your MySQL delete events are actually being produced and consumed as tombstone values, which is what triggers deletes in Elasticsearch.
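The tombstone point can be illustrated with a simplified model in plain Python (not Kafka or Debezium code): each event is a (key, value) pair in topic order, a None value stands for a tombstone, and the sink deletes on tombstones as with "behavior.on.null.values": "delete". Replaying the whole topic from the beginning still ends with the document gone if the tombstone is in the topic, but the document reappears if the delete never made it there. The replay helper and the sample events are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (key, value) pairs in topic order; a value of None stands for a tombstone.
Event = Tuple[str, Optional[dict]]


def replay(events: List[Event], start_offset: int = 0) -> Dict[str, dict]:
    """Apply events to an in-memory 'index', deleting on tombstones."""
    index: Dict[str, dict] = {}
    for key, value in events[start_offset:]:
        if value is None:
            index.pop(key, None)  # tombstone -> delete the document
        else:
            index[key] = value  # insert/update -> upsert the document
    return index


with_tombstone = [
    ("1", {"ID": 1, "name": "a"}),  # INSERT
    ("1", {"ID": 1, "name": "b"}),  # UPDATE
    ("1", None),                    # DELETE arrives as a tombstone
]
without_tombstone = with_tombstone[:2]  # the delete never reached the topic

print(replay(with_tombstone))     # {}
print(replay(without_tombstone))  # {'1': {'ID': 1, 'name': 'b'}}
```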

kafka-connect-elasticsearch: How to route multiple topics to the same Elasticsearch index in the same connector?

I am trying to create an Elasticsearch sink connector with the following config. Creation succeeds, but when a message is produced on "my.topic.one", the ES sink connector fails while trying to create an index named "my.topic.one": "Could not create index 'my.topic.one'" (the user I use to connect to ES intentionally does not have create-index permission). Why is it trying to create a new index, and how do I get the connector to index into the previously created "elasticsearch_index_name"?
{
"type.name": "_doc",
"tasks.max": "1",
"connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"connection.url": "http://elasticsearch-service:9200",
"behavior.on.null.values": "delete",
"key.ignore": "false",
"write.method": "upsert",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"key.converter.schemas.enable": "false",
"value.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter.schemas.enable": "false",
"topics": "my.topic.one,my.topic.two",
"transforms": "renameTopic",
"transforms.renameTopic.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.renameTopic.regex": ".*",
"transforms.renameTopic.replacement": "elasticsearch_index_name"
}
UPDATE: The ES sink connector throws the error even if I use just one topic in the "topics" attribute and the same topic name in "renameTopic.regex", as below, with all other attributes unchanged.
"topics": "my.topic.one",
"transforms.renameTopic.regex": "my.topic.one"
Adding the following property to the ES sink connector config solved the issue at hand:
"auto.create.indices.at.start": "false"
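What RegexRouter does to each record's topic can be modeled in plain Python as below (a simplified sketch, not the transform's source; regex_route is a hypothetical helper). The regex must match the whole topic name, and only then is the topic replaced. With ".*" every record is routed to elasticsearch_index_name, which suggests the failing create-index call targeted the configured topic names rather than the renamed records; that is consistent with auto.create.indices.at.start being the fix.

```python
import re


def regex_route(topic: str, regex: str, replacement: str) -> str:
    """Simplified model of RegexRouter: rename only on a whole-name match."""
    match = re.fullmatch(regex, topic)
    if match is None:
        return topic  # no full match -> the topic passes through unchanged
    # Java replacement strings use $1 for groups; Python's expand() uses \1.
    return match.expand(replacement)


print(regex_route("my.topic.one", ".*", "elasticsearch_index_name"))
# elasticsearch_index_name
print(regex_route("my.topic.two", "my.topic.one", "elasticsearch_index_name"))
# my.topic.two
```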

Timestamp in avro schema produces incompatible value validation in Kafka Connect JDBC

Error produced by JDBC sink connector:
org.apache.kafka.connect.errors.DataException: Invalid Java object for schema type INT64: class java.util.Date for field: "some_timestamp_field"
at org.apache.kafka.connect.data.ConnectSchema.validateValue(ConnectSchema.java:242)
at org.apache.kafka.connect.data.Struct.put(Struct.java:216)
at org.apache.kafka.connect.transforms.Cast.applyWithSchema(Cast.java:151)
at org.apache.kafka.connect.transforms.Cast.apply(Cast.java:107)
at org.apache.kafka.connect.runtime.TransformationChain.apply(TransformationChain.java:38)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:480)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:301)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:205)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:173)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:170)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:214)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
The Avro schema registered by the source JDBC connector (MySQL):
{
"type":"record",
"name":"ConnectDefault",
"namespace":"io.confluent.connect.avro",
"fields":[
...
{
"name":"some_timestamp_field",
"type":{
"type":"long",
"connect.version":1,
"connect.name":"org.apache.kafka.connect.data.Timestamp",
"logicalType":"timestamp-millis"
}
},
...
]
}
It looks like the exception is due to this code block: https://github.com/apache/kafka/blob/f0282498e7a312a977acb127557520def338d45c/connect/api/src/main/java/org/apache/kafka/connect/data/ConnectSchema.java#L239
So in the Avro schema, the timestamp field is registered as INT64 with the correct (timestamp) logical type, but Connect reads the schema type as INT64 and compares it with the value type java.util.Date.
Is this a bug, or is there a workaround? Maybe I am missing something, as this looks like a standard Connect model.
Thanks in advance.
UPDATE
Sink connector config:
{
"name": "sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics": "topic",
"connection.url": "jdbc:postgresql://host:port/db",
"connection.user": "user",
"connection.password": "password",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url": "http://host:port",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://host:port",
"auto.create": "true",
"insert.mode": "upsert",
"pk.mode": "record_value",
"pk.fields": "id"
}
}
Deserialised data in Kafka:
{
"id":678148,
"some_timestamp_field":1543806057000,
...
}
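The mismatch described in the question can be modeled in plain Python (an illustration, not Connect's code): on the wire the field is a long of epoch milliseconds, but with the Timestamp logical type Connect hands the value to transforms as a date object (java.util.Date in Java; Python's datetime stands in for it here). A check against a plain INT64 schema, one that has lost its Timestamp name, then rejects the value. validate_int64 is a hypothetical simplification of ConnectSchema.validateValue.

```python
from datetime import datetime, timezone

# Value from the deserialised record above: epoch milliseconds on the wire.
wire_value = 1543806057000

# With the Timestamp logical type, the in-memory value is a date object
# (java.util.Date in Connect); datetime is only a stand-in here.
logical_value = datetime.fromtimestamp(wire_value / 1000, tz=timezone.utc)


def validate_int64(value) -> bool:
    """Simplified plain-INT64 check: accept integers only (not bools)."""
    return isinstance(value, int) and not isinstance(value, bool)


print(validate_int64(wire_value))     # True
print(validate_int64(logical_value))  # False, like the DataException above
print(logical_value.isoformat())      # 2018-12-03T03:00:57+00:00
```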
We have worked out a workaround for the problem. Our goal was to convert the id from BIGINT to STRING (TEXT/VARCHAR) and save the record in the downstream DB.
But due to an issue (probably https://issues.apache.org/jira/browse/KAFKA-5891), casting the id field was not working. Kafka was also validating the timestamp fields in the casting chain, but was reading the schema type/name incorrectly, resulting in a type mismatch (see the record body and error log above).
So we made a workaround as follows:
extract only the id field as the key -> execute the Cast transform on the key -> this works because the key does not contain the timestamp field.
Here is the workaround configuration:
{
"name": "sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics": "topic",
"connection.url": "jdbc:postgresql://host:port/db",
"connection.user": "user",
"connection.password": "password",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url": "http://host:port",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://host:port",
"transforms": "createKey,castKeyToString",
"transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
"transforms.createKey.fields": "id",
"transforms.castKeyToString.type": "org.apache.kafka.connect.transforms.Cast$Key",
"transforms.castKeyToString.spec": "id:string",
"auto.create": "true",
"insert.mode": "upsert",
"pk.mode": "record_key",
"pk.fields": "id"
}
}
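The key-only chain in this configuration can be modeled in plain Python (hypothetical helpers, not the SMT source): ValueToKey builds a key holding only id, and Cast$Key then casts fields of that key alone, so some_timestamp_field is never revalidated.

```python
# Record value as deserialised in the question (other fields elided).
value = {"id": 678148, "some_timestamp_field": 1543806057000}


def value_to_key(record_value: dict, fields: list) -> dict:
    """Model of ValueToKey: build the key from the listed value fields only."""
    return {name: record_value[name] for name in fields}


def cast_key(key: dict, spec: dict) -> dict:
    """Model of Cast$Key: cast the listed fields; only key fields are visited."""
    casters = {"string": str}
    return {
        name: casters[spec[name]](field) if name in spec else field
        for name, field in key.items()
    }


key = cast_key(value_to_key(value, ["id"]), {"id": "string"})
print(key)  # {'id': '678148'}
```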
Disclaimer: This is not a proper solution, just a workaround. The bug in the Cast transform should be fixed. In my opinion, the Cast transform should only be concerned with the fields designated for casting, not with other fields in the message.
Have a good day.

Auto-sinking topics created in Kafka to Elasticsearch

I have topics being created in Kafka (test1, test2, test3) and I want to sink them to Elasticsearch as they are created. I tried topics.regex, but it only creates indices for topics that already exist. How can I sink a new topic into an index when it gets created dynamically?
Here is the connector config I am using for the Kafka sink:
{
"name": "elastic-sink-test-regex",
"config": {
"connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"tasks.max": "1",
"topics.regex": "test[0-9]+",
"type.name": "kafka-connect",
"connection.url": "http://192.168.0.188:9200",
"key.ignore": "true",
"schema.ignore": "true",
"schema.enable": "false",
"batch.size": "100",
"flush.timeout.ms": "100000",
"max.buffered.records": "10000",
"max.retries": "10",
"retry.backoff.ms": "1000",
"max.in.flight.requests": "3",
"is.timebased.indexed": "False",
"time.index": "at"
}
}
A sink connector won't read new topics until the connector is restarted (or a scheduled rebalance occurs). You can run a Kafka Streams application that reads messages from the new topics and writes them into a single result topic; the sink connector then reads from that result topic.
To preserve the mapping from each message to its original topic, you can use Kafka record headers.
Make sure this meets your requirements!
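The fan-in approach from the answer can be modeled in plain Python: the dictionary stands in for Kafka topics, the returned list for the single result topic, and each message carries its origin in a record header. This is not Kafka Streams code, and the header name source.topic is a hypothetical choice. The regex is matched against the whole topic name, as with topics.regex.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical header name used to remember where each message came from.
SOURCE_HEADER = "source.topic"


def fan_in(topics: Dict[str, List[str]], pattern: str) -> List[Tuple[str, Dict[str, str]]]:
    """Copy messages from every topic whose whole name matches `pattern`
    into one result list, tagging each message with its origin topic."""
    result = []
    for topic in sorted(topics):
        if re.fullmatch(pattern, topic):
            for message in topics[topic]:
                result.append((message, {SOURCE_HEADER: topic}))
    return result


topics = {"test1": ["a"], "test2": ["b"], "other": ["c"], "test3": []}
print(fan_in(topics, "test[0-9]+"))
# [('a', {'source.topic': 'test1'}), ('b', {'source.topic': 'test2'})]
```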

Confluent Kafka Connect Elasticsearch document ID creation

I'm using the Kafka Connect Elasticsearch connector to write data from a topic to an Elasticsearch index. Both the key and value of the topic messages are in JSON format. The connector fails to start because of the following error:
org.apache.kafka.connect.errors.DataException: MAP is not supported as the document id.
The format of my messages (key | value) is:
{"key":"OKOK","start":1517241690000,"end":1517241695000} | {"measurement":"responses","count":9,"sum":1350.0,"max":150.0,"min":150.0,"avg":150.0}
And here is the body of the POST request I'm using to create the connector:
{
"name": "elasticsearch-sink-connector",
"config": {
"connector.class":"io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"tasks.max": "1",
"topics": "output-topic-elastic",
"connection.url": "http://elasticsearch:9200",
"type.name": "aggregator",
"schemas.enable": "false",
"topic.schema.ignore": "true",
"topic.key.ignore": "false",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "false",
"key.ignore":"false",
"topic.index.map": "output-topic-elastic:aggregator",
"name": "elasticsearch-sink",
"transforms": "InsertKey",
"transforms.InsertKey.type":"org.apache.kafka.connect.transforms.ValueToKey",
"transforms.InsertKey.fields":"key"
}}
Any help would be really appreciated. I've found a similar question on Stack Overflow ("ES document ID creation"), but I had no luck with the answers.
You also need ExtractField in there:
"transforms": "InsertKey,extractKey",
"transforms.InsertKey.type":"org.apache.kafka.connect.transforms.ValueToKey",
"transforms.InsertKey.fields":"key",
"transforms.extractKey.type":"org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.extractKey.field":"key"
Check out this post for more details.
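Why ExtractField fixes the error can be modeled in plain Python (hypothetical helpers, not the connector's code): a JSON-object key without a schema becomes a MAP, which the connector refuses as a document ID, while ExtractField$Key replaces the key with one primitive field. The sketch applies it directly to the key from the question.

```python
from typing import Any

# Record key from the question: a JSON object, i.e. a MAP once converted.
record_key = {"key": "OKOK", "start": 1517241690000, "end": 1517241695000}


def extract_field_key(key: dict, field: str) -> Any:
    """Model of ExtractField$Key: replace the whole key with one of its fields."""
    return key[field]


def document_id(key: Any) -> str:
    """Simplified stand-in for the sink's check: primitive keys only."""
    if isinstance(key, (dict, list)):
        raise ValueError("MAP is not supported as the document id.")
    return str(key)


print(document_id(extract_field_key(record_key, "key")))  # OKOK
```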
