Confluent Elasticsearch Sink connector, write.method: "UPSERT" on a different key - elasticsearch

In the Confluent Elasticsearch Sink connector, I am trying to write to the same Elasticsearch index from two different topics. The first topic is INSERT and the other topic is UPSERT. For UPSERT, I want to update the JSON document based on some other field instead of "_id". Is that possible? If yes, how can I do that?

Use key.ignore=false and use the existing primary key field as the _id for each JSON document.
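A minimal sketch of what that could look like for the UPSERT topic, assuming the documents carry a field named id that should become the _id (the connector name, topic, URL, and field name here are placeholders, not from the question):

name=es-sink-upsert
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
topics=upsert-topic
connection.url=http://localhost:9200
# Update existing documents rather than overwriting them
write.method=UPSERT
# Use the record key as the Elasticsearch _id instead of topic+partition+offset
key.ignore=false
# Promote the `id` field from the value into the key, then flatten the key to a plain value
transforms=InsertKey,ExtractId
transforms.InsertKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.InsertKey.fields=id
transforms.ExtractId.type=org.apache.kafka.connect.transforms.ExtractField$Key
transforms.ExtractId.field=id

Elasticsearch itself only updates documents by _id, so the practical approach is to make that other field the document _id rather than trying to update on a different field directly.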

Related

How to put two KSQLDB tables in the same index in Elasticsearch

I have two tables in KSQLDB that I want to put in the same index in Elasticsearch,
but the Elasticsearch Service Sink Connector for Confluent Platform does not support
topic-modifying transforms such as:
io.confluent.connect.transforms.ExtractTopic$Key
io.confluent.connect.transforms.ExtractTopic$Value
as seen in the documentation https://docs.confluent.io/kafka-connect-elasticsearch/current/overview.html#limitations
Are there other ways of doing it?
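For reference, the kind of topic-renaming configuration that limitation rules out would look roughly like this (a sketch only; index_name is a hypothetical field, and the field setting is assumed from the ExtractTopic transform's documentation):

# Not supported by the Elasticsearch sink: replacing the topic name (and hence the target index)
# with the value of a field from each record
transforms=SetIndexFromField
transforms.SetIndexFromField.type=io.confluent.connect.transforms.ExtractTopic$Value
transforms.SetIndexFromField.field=index_name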

How to match data between two indexes in Elasticsearch

I've got two indexes, one with customer data and the other with netflow.
I want to match the data as it enters the netflow index against the customer index, and if there is a match, mutate the data and add the customer id.
I tried using Logstash but nothing works :|
Any ideas?
Thanks in advance.
Logstash looks to be the best strategy.
You can use a Logstash input to read your netflow index (or use Logstash to ingest your netflow directly).
Then, in an elasticsearch filter, you query your customer index, find the matching customer document, and add its data to your netflow event.
In an elasticsearch output, you update (or ingest) your enriched netflow document.
I use this strategy for data fixes and data enrichment when an enrich processor is not the right fit.
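A rough Logstash pipeline sketch of that flow, assuming an ip field links netflow events to customer documents (hosts, index names, and field names are placeholders):

input {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "netflow"
    query => '{ "query": { "match_all": {} } }'
  }
}
filter {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "customers"
    # Find the customer document whose ip matches the netflow source address
    query => "ip:%{[source_ip]}"
    # Copy customer_id from the matched customer document onto the netflow event
    fields => { "customer_id" => "customer_id" }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "netflow-enriched"
  }
}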

Kafka-connect elasticsearch auto-lowercase topic name for index

I'm using the Elasticsearch sink kafka-connector to index messages from multiple Kafka topics into Elasticsearch.
My topics use camelCase naming, and I can't change that. So when the ES sink connector starts up, it does not index anything, because Elasticsearch has problems with non-lowercase index names.
I know I can use the topic.index.map property to manually map topic names to index names:
topic.index.map=myTopic1:mytopic1, myTopic2:mytopic2,...
Is there a way to convert to lowercase automatically? I have dozens of topics to convert, and I expect that number to reach around a hundred soon.
Found out that since 5.1 the connector does this automatically if no mapping is specified for the topic.
From here:
// Use the explicit topic-to-index mapping if one exists; otherwise fall back to the lowercased topic name
final String indexOverride = topicToIndexMap.get(topic);
String index = indexOverride != null ? indexOverride : topic.toLowerCase();
See this commit for details.
As of recent versions of the Elasticsearch sink connector, this is done automatically. The PR that fixed this was https://github.com/confluentinc/kafka-connect-elasticsearch/pull/251

Using Elasticsearch-generated IDs in the Kafka Elasticsearch connector

I noticed that documents indexed in Elasticsearch using the Kafka Elasticsearch connector have their IDs in the following format: topic+partition+offset.
I would prefer to use IDs generated by Elasticsearch. It seems topic+partition+offset is not always unique, so I am losing data.
How can I change that?
As Phil says in the comments -- topic-partition-offset should be unique, so I don't see how this is causing data loss for you.
Regardless - you can either let the connector generate the key (as you are doing), or you can define the key yourself (key.ignore=false). There is no other option.
You can use Single Message Transformations with Kafka Connect to derive a key from the fields in your data. Based on your message in the Elasticsearch forum it looks like there is an id in your data - if that's going to be unique you could set that as your key, and thus as your Elasticsearch document ID too. Here's an example of defining a key with SMT:
# Add the `id` field as the key using Single Message Transforms
transforms=InsertKey, ExtractId
# `ValueToKey`: push an object of one of the column fields (`id`) into the key
transforms.InsertKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.InsertKey.fields=id
# `ExtractField`: convert key from an object to a plain field
transforms.ExtractId.type=org.apache.kafka.connect.transforms.ExtractField$Key
transforms.ExtractId.field=id
(via https://www.confluent.io/blog/building-real-time-streaming-etl-pipeline-20-minutes/)
@Robin Moffatt, as far as I can see, topic-partition-offset can cause duplicates if you upgrade your Kafka cluster not in a rolling-upgrade fashion but by replacing the cluster with a new one (which is sometimes easier). In that case you will experience data loss because of overwriting data.
Regarding your excellent example, it can be the solution for many of the cases, but I'd add another option. Maybe you could append an epoch timestamp element to the topic-partition-offset, so the ID becomes topic-partition-offset-current_timestamp.
What do you think?

How to move unique IDs from Kafka to Elasticsearch?

How do I move unique IDs from Kafka to Elasticsearch?
I have done this using the Elasticsearch connector from Kafka to Elasticsearch, but it sends the entire data. I need to send only the unique IDs from Kafka to ES.
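One possible sketch, assuming the records carry a unique id field and that dropping every other field is acceptable: add a ReplaceField transform to the sink connector config (the field and transform names here are placeholders; older Connect versions call the setting whitelist instead of include):

# Keep only the `id` field in the value that gets written to Elasticsearch
transforms=KeepIdOnly
transforms.KeepIdOnly.type=org.apache.kafka.connect.transforms.ReplaceField$Value
transforms.KeepIdOnly.include=id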
