Confluent Kafka Connect Elasticsearch document ID creation

I'm using the Kafka Connect Elasticsearch connector to write data from a topic to an Elasticsearch index. Both the key and the value of the topic messages are in JSON format. The connector fails to start with the following error:
org.apache.kafka.connect.errors.DataException: MAP is not supported as the document id.
Following is the format of my messages (key | value):
{"key":"OKOK","start":1517241690000,"end":1517241695000} | {"measurement":"responses","count":9,"sum":1350.0,"max":150.0,"min":150.0,"avg":150.0}
And following is the body of the POST request I'm using to create the connector:
{
  "name": "elasticsearch-sink-connector",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "tasks.max": "1",
    "topics": "output-topic-elastic",
    "connection.url": "http://elasticsearch:9200",
    "type.name": "aggregator",
    "schemas.enable": "false",
    "topic.schema.ignore": "true",
    "topic.key.ignore": "false",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "false",
    "key.ignore": "false",
    "topic.index.map": "output-topic-elastic:aggregator",
    "name": "elasticsearch-sink",
    "transforms": "InsertKey",
    "transforms.InsertKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
    "transforms.InsertKey.fields": "key"
  }
}
Any help would be really appreciated. I've found a similar question on Stack Overflow, but I've had no luck with the answers.
ES document ID creation

You also need ExtractField in there:
"transforms": "InsertKey,extractKey",
"transforms.InsertKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
"transforms.InsertKey.fields": "key",
"transforms.extractKey.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.extractKey.field": "key"
Check out this post for more details.

Related

Write same value to 2 different columns with JDBC Sink Connector

There are two columns that need to be the same in the table I want to sink to. Let's say the columns are named ID and PAYLOADID. But on the Kafka side, there are no separate fields for these columns. So, how can I configure my sink connector to write to these two columns from the same field in Kafka?
This is my connector config:
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"connection.user": "${file:/pass.properties:alm_user}",
"connection.password": "${file:/pass.properties:alm_pwd}",
"connection.url": "jdbc:oracle:thin:#(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=****)(PORT=****))(CONNECT_DATA=(SERVICE_NAME=****)))",
"table.name.format": "SCHEMA_NAME.TABLE_NAME",
"topics": "MY_TOPIC",
"transforms": "TimestampConverter1",
"transforms.TimestampConverter1.target.type": "Timestamp",
"transforms.TimestampConverter1.field": "RECORDDATE",
"transforms.TimestampConverter1.format": "MM.dd.yyyy hh:mm:ss",
"transforms.TimestampConverter1.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"key.converter.schemas.enable": "false",
"value.converter.schemas.enable": "true",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"auto.create": "false",
"insert.mode": "insert",
"transforms": "rename",
"transforms.rename.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.rename.renames": "payload:PAYLOADID, type:TYPE"
You'd need to write your own transform, or otherwise pre-process the topic, so that you can copy one field to another name while keeping the same value.

Elastic Sink Connector not working in new version

I am using "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector" version 13.0.0.
I am applying an SMT to this connector, but I get the error below:
Found a topic name 'es.contact3.model' that doesn't match the assigned partitions. The connector doesn't support topic mutating SMTs
I get the error even though I set "flush.syncronously": "true".
My config is as follows:
{
  "type.name": "_doc",
  "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
  "topics": "es.contact.model",
  "tasks.max": "1",
  "transforms": "Dealership",
  "key.ignore": "true",
  "input.data.format": "AVRO",
  "transforms.Dealership.type": "io.confluent.connect.transforms.ExtractTopic$Value",
  "transforms.Dealership.field": "indexTopicName",
  "schema.ignore": "true",
  "name": "ContactElasticSinkConnector",
  "flush.syncronously": "true",
  "connection.url": "http://192.168.1.7:19200",
  "transforms.Dealership.skip.missing.or.null": "true"
}
I faced the same issue and decided to try what you did in your question. There is a typo in:
"flush.syncronously": "true"
It should be:
"flush.synchronously": "true"
Simply making this correction worked for me.

kafka-connect-elasticsearch How to route multiple topics to same elasticsearch index in same connector?

I'm trying to create an Elasticsearch sink connector with the following config. The creation is successful, but when a message is produced on "my.topic.one", the ES sink connector fails while trying to create an index named "my.topic.one": "Could not create index 'my.topic.one'" (the user I am using to connect to ES intentionally does not have create-index permission). Why is it trying to create a new index, and how do I get the connector to write to the previously created "elasticsearch_index_name"?
{
  "type.name": "_doc",
  "tasks.max": "1",
  "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
  "connection.url": "http://elasticsearch-service:9200",
  "behavior.on.null.values": "delete",
  "key.ignore": "false",
  "write.method": "upsert",
  "key.converter": "org.apache.kafka.connect.storage.StringConverter",
  "key.converter.schemas.enable": "false",
  "value.converter": "org.apache.kafka.connect.storage.StringConverter",
  "value.converter.schemas.enable": "false",
  "topics": "my.topic.one,my.topic.two",
  "transforms": "renameTopic",
  "transforms.renameTopic.type": "org.apache.kafka.connect.transforms.RegexRouter",
  "transforms.renameTopic.regex": ".*",
  "transforms.renameTopic.replacement": "elasticsearch_index_name"
}
UPDATE: The ES sink connector throws the error even if I use just one topic in the "topics" attribute and the same topic name in "renameTopic.regex", as below, with all other attributes the same.
"topics": "my.topic.one",
"transforms.renameTopic.regex": "my.topic.one"
Adding the following property to the ES sink connector config solved the issue:
"auto.create.indices.at.start": "false"

Timestamp in avro schema produces incompatible value validation in Kafka Connect JDBC

Error produced by JDBC sink connector:
org.apache.kafka.connect.errors.DataException: Invalid Java object for schema type INT64: class java.util.Date for field: "some_timestamp_field"
at org.apache.kafka.connect.data.ConnectSchema.validateValue(ConnectSchema.java:242)
at org.apache.kafka.connect.data.Struct.put(Struct.java:216)
at org.apache.kafka.connect.transforms.Cast.applyWithSchema(Cast.java:151)
at org.apache.kafka.connect.transforms.Cast.apply(Cast.java:107)
at org.apache.kafka.connect.runtime.TransformationChain.apply(TransformationChain.java:38)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:480)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:301)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:205)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:173)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:170)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:214)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
The Avro schema registered by the source JDBC connector (MySQL):
{
  "type": "record",
  "name": "ConnectDefault",
  "namespace": "io.confluent.connect.avro",
  "fields": [
    ...
    {
      "name": "some_timestamp_field",
      "type": {
        "type": "long",
        "connect.version": 1,
        "connect.name": "org.apache.kafka.connect.data.Timestamp",
        "logicalType": "timestamp-millis"
      }
    },
    ...
  ]
}
Looks like the exception is due to this code block: https://github.com/apache/kafka/blob/f0282498e7a312a977acb127557520def338d45c/connect/api/src/main/java/org/apache/kafka/connect/data/ConnectSchema.java#L239
So, in the Avro schema, the timestamp field is registered as INT64 with the correct (timestamp) logical type. But Connect reads the schema type as INT64 and compares it with the value type java.util.Date.
Is this a bug, or is there a workaround for this? Maybe I am missing something, as this looks like a standard Connect model.
Thanks in advance.
UPDATE
Sink connector config:
{
  "name": "sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics": "topic",
    "connection.url": "jdbc:postgresql://host:port/db",
    "connection.user": "user",
    "connection.password": "password",
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter.schema.registry.url": "http://host:port",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://host:port",
    "auto.create": "true",
    "insert.mode": "upsert",
    "pk.mode": "record_value",
    "pk.fields": "id"
  }
}
Deserialised data in Kafka:
{
  "id": 678148,
  "some_timestamp_field": 1543806057000,
  ...
}
We have worked out a workaround for the problem. Our goal was to convert the id from BIGINT to STRING (TEXT/VARCHAR) and save the record in the downstream DB.
But due to an issue (probably https://issues.apache.org/jira/browse/KAFKA-5891), casting the id field was not working. Kafka was also trying to validate the timestamp fields in the casting chain, but was reading the schema type/name wrong, resulting in a type mismatch (see the record body and error log above).
So we worked around it as follows:
extract only the id field as the key -> apply the Cast transform to the key -> this works because the key does not contain the timestamp field.
Here is the workaround configuration:
{
  "name": "sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics": "topic",
    "connection.url": "jdbc:postgresql://host:port/db",
    "connection.user": "user",
    "connection.password": "password",
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter.schema.registry.url": "http://host:port",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://host:port",
    "transforms": "createKey,castKeyToString",
    "transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
    "transforms.createKey.fields": "id",
    "transforms.castKeyToString.type": "org.apache.kafka.connect.transforms.Cast$Key",
    "transforms.castKeyToString.spec": "id:string",
    "auto.create": "true",
    "insert.mode": "upsert",
    "pk.mode": "record_key",
    "pk.fields": "id"
  }
}
Disclaimer: This is not a proper solution, just a workaround. The bug in the Cast transform should be fixed. In my opinion, the Cast transform should only concern itself with the fields designated for casting, not the other fields in the message.
Have a good day.

confluent elasticsearch sink regex to assign to all topics and send to index

I have multiple topics, like this:
client1-table1
client1-table2
client1-table3
client1-table4
I want my Elasticsearch sink to listen for any incoming messages and send them to an index accordingly. However, my current configuration is not working. What can I do? Below is my Elasticsearch sink config:
{
  "name": "es-data",
  "config": {
    "_comment": "-- standard converter stuff -- this can actually go in the worker config globally --",
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter.schema.registry.url": "http://localhost:8081",
    "value.converter.schema.registry.url": "http://localhost:8081",
    "_comment": "--- Elasticsearch-specific config ---",
    "_comment": "Elasticsearch server address",
    "connection.url": "http://127.0.0.1:9200",
    "_comment": "If the Kafka message doesn't have a key (as is the case with JDBC source) you need to specify key.ignore=true. If you don't, you'll get an error from the Connect task: 'ConnectException: Key is used as document id and can not be null.",
    "key.ignore": "true"
  }
}
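One approach to try, sketched here as an assumption rather than a confirmed fix, is to subscribe with the standard sink-connector property topics.regex instead of a fixed topics list, so that every client1-* topic is picked up; by default the connector then writes each topic to an index of the same name:
{
  "name": "es-data",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter.schema.registry.url": "http://localhost:8081",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://localhost:8081",
    "connection.url": "http://127.0.0.1:9200",
    "key.ignore": "true",
    "topics.regex": "client1-.*"
  }
}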
