Elasticsearch Kafka Connector - setting index based on message value

I have messages coming through a Kafka topic in the following format:
{"elasticsearch_index": "index_1", "first_name": "Jane"}
{"elasticsearch_index": "index_2", "first_name": "John"}
Note that each message contains the desired Elasticsearch index that we would like to route the record to. Is it possible to use Confluent's Elasticsearch Kafka Connector to route these records to the appropriate index name (e.g. whatever is listed under the elasticsearch_index key)?
It doesn't look like Single Message Transforms (SMTs) support this behavior currently, but maybe I am misreading. Any info would be greatly appreciated.

Two options:
1. Write your own Transform using the Single Message Transform API.
2. Use KSQL (or Kafka Streams) to route the messages to the required topics first, and then use the new (Apache Kafka 1.1) regex capabilities (the topics.regex setting) to land those topics in Elasticsearch from Kafka Connect.
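
As a rough illustration of the routing step that both options share, here is a minimal Python sketch. It is not Kafka Connect code: a real transform would be written in Java against the Single Message Transform API. The function name route_by_field, the dict-based record layout and the source topic name "people" are all hypothetical.

```python
import json

def route_by_field(record, field="elasticsearch_index"):
    """Return a copy of the record whose topic is taken from `field` in the value.

    Records that lack the field (or hold a non-string there) keep their topic.
    """
    target = record["value"].get(field)
    if not isinstance(target, str) or not target:
        return dict(record)
    return {**record, "topic": target}

# The two sample messages from the question, arriving on a hypothetical topic.
messages = [
    '{"elasticsearch_index": "index_1", "first_name": "Jane"}',
    '{"elasticsearch_index": "index_2", "first_name": "John"}',
]
routed = [route_by_field({"topic": "people", "value": json.loads(m)}) for m in messages]
for r in routed:
    print(r["topic"], r["value"]["first_name"])
```

Since the Elasticsearch sink derives the index from the topic name, changing the topic this way is what steers each record to its index.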

Related

With a Kafka sink connector, can I send a single message to multiple index documents in Elasticsearch?

I am receiving a very complex JSON document inside a topic message, so I want to do some computations on it using SMTs and send the results to documents in different Elasticsearch indices. Is that possible?
I am not able to find a solution for this.

The Elasticsearch sink connector writes each record to only one index, based on the topic name. The Confluent documentation explicitly states that topic-altering transforms such as RegexRouter will not work as expected.
I'd suggest looking at the Logstash Kafka input and Elasticsearch output as an alternative; however, I'm still not sure how you'd "split" a record into multiple documents there either.
You may need an intermediate Kafka consumer such as Kafka Streams or ksqlDB to extract your nested JSON and emit multiple records that you expect in Elasticsearch.
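
To make the "emit multiple records" idea concrete, here is a small Python sketch of the fan-out step such an intermediate consumer would perform. The message layout (order_id, customer, items) and the index names are invented for illustration, since the question does not show the actual JSON; the function name split_record is likewise hypothetical.

```python
def split_record(message):
    """Fan one nested message out into (index_name, document) pairs."""
    order_id = message["order_id"]
    # One document for the customer part of the message...
    out = [("customers", {"order_id": order_id, **message["customer"]})]
    # ...and one document per nested item.
    for item in message["items"]:
        out.append(("order-items", {"order_id": order_id, **item}))
    return out

nested = {
    "order_id": 7,
    "customer": {"name": "Jane"},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}],
}
for index_name, doc in split_record(nested):
    print(index_name, doc)
```

Each pair would then be produced to its own topic, so that the sink connector can write it to the matching index.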

Running awk in logstash

All I am able to do is receive unstructured syslogs from Kafka, which have been produced with Logstash.
When I attach Logstash as a consumer, these syslogs are all over the place and contain half a dozen patterns or more, which vary wildly. This seems better suited to being streamed through something like an awk filter, since parsing incoming messages programmatically is actually quite straightforward with such a tool.
Does anyone have any input on how one could attach a consumer to a Kafka topic, pick up the incoming logs, and ship them in an intelligent way to an Elasticsearch cluster?

Try using grok expressions in your Logstash config to parse the logs: https://logz.io/blog/logstash-grok/ . This should allow you to filter, transform or drop data.
Or use something like Cribl in between Kafka and Elasticsearch: https://docs.cribl.io/stream/about/
Note on the Cribl page that Kafka is listed as one of the supported sources and Elasticsearch as one of the supported destinations. This should allow you to transform your data before ingesting it into Elasticsearch.
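
Grok patterns are essentially named regular expressions tried in turn, so the approach can be sketched in plain Python. This is only an illustration of the idea, not Logstash configuration; the two patterns and the function name parse_line are made up, and real syslog lines would need patterns written against the actual data.

```python
import re

# Hypothetical patterns; each one extracts named fields from one log shape.
PATTERNS = [
    ("ssh_failed_login",
     re.compile(r"Failed password for (?P<user>\S+) from (?P<ip>\S+)")),
    ("disk_usage",
     re.compile(r"disk (?P<device>\S+) at (?P<percent>\d+)%")),
]

def parse_line(line):
    """Return the fields of the first matching pattern, or None if none match."""
    for name, pattern in PATTERNS:
        match = pattern.search(line)
        if match:
            return {"pattern": name, **match.groupdict()}
    return None  # the caller can drop the line or send it to a catch-all index

print(parse_line("sshd[1]: Failed password for root from 10.0.0.5 port 22"))
print(parse_line("monitor: disk /dev/sda1 at 91% capacity"))
```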

Kafka JDBC Sink handle array datatype

I know that the Kafka JDBC Sink Connector has some drawbacks with the array datatype. However, is it possible to combine the Sink Connector with a simple Kafka Connector which can support the array datatype? How can I filter these fields out of the Kafka configuration and switch them into a simple Kafka Connector configuration? What does a simple Kafka configuration mean? How can Kafka Connect support array fields?
name: topic_name
type: array
item: Topic file
Is this possible where it will consume to the db as a string not an array
"fields":[{
"name":"item_id",
"type":{
"type":"array",
"items":["null", "string"]
},
"default":[]
}]
}

The Kafka Connect framework itself doesn't impose limitations around types; it's in the source code of the JDBC sink that arrays are rejected.
There's an outstanding PR to support it for Postgres - https://github.com/confluentinc/kafka-connect-jdbc/pull/805
It's unclear what you mean by "simple", but if you want to use a different connector, then you'd need to install it and then change the connector class. For example, maybe the MongoDB sink handles arrays. I know that the S3 and HDFS sinks do.
"is it possible to combine the Sink Connector with a simple Kafka Connector"
Again, I'm unsure what you mean by this, but connectors generally don't "chain together". While you could use MirrorMaker2 with a transform to effectively do the same work as Kafka Streams, it's best to use the more appropriate tool for that.
"Is this possible where it will consume to the db as a string not an array"
Sure, if the message field is actually a string. As suggested, you're going to need to process the message before the sink connector consumes it.
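
The pre-processing step mentioned above could look roughly like the following Python sketch, which serializes every array-valued field to a JSON string before the record reaches the sink. This is an assumption about one way to do it, not part of the JDBC connector; the function name stringify_arrays is hypothetical, and in practice the logic would live in a stream processor or a custom transform.

```python
import json

def stringify_arrays(value):
    """Return a copy of the record value with list fields encoded as JSON strings."""
    return {
        key: json.dumps(field) if isinstance(field, list) else field
        for key, field in value.items()
    }

record_value = {"item_id": ["a", None], "name": "topic_name"}
print(stringify_arrays(record_value))
```

The record schema would also have to declare the field as a string for the sink to accept it.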

Kafka Connect to persist topic to Elasticsearch index using field from (json) message

I'm attempting to index messages in Elasticsearch using only SMTs from Kafka's Connect API.
So far I had luck with simply using the topic and timestamp router functionality.
However, now I'd like to create separate indices based on a certain field in the message.
Suppose the messages are formatted as such:
{"productId": 1, "category": "boat", "price": 135000}
{"productId": 1, "category": "helicopter", "price": 300000}
{"productId": 1, "category": "car", "price": 25000}
Would it somehow be possible to index these to the following indices based on product category?
product-boat
product-helicopter
product-car
or would I have to create separate topics for every single category (knowing that it could become hundreds or thousands of them)?
Am I overlooking a transform that could do this, or is this simply not possible, so that a custom component will have to be built?

There's nothing out of the box with Kafka Connect that will do this. You have a few options:
1. The Elasticsearch sink connector routes each message to a target index based on its topic, so you could write a custom SMT that inspects a message and routes it to a different topic accordingly.
2. Use a stream processor, for example Kafka Streams or KSQL, to pre-process the messages so that they're already on different topics by the time they are consumed by the Elasticsearch sink connector.
   - With KSQL you would need to hard-code each category (CREATE STREAM product_boat WITH (KAFKA_TOPIC='product-boat') AS SELECT * FROM messages WHERE category='boat'; and so on).
   - Kafka Streams now has Dynamic Routing (KIP-303), which would be a more flexible way of doing it.
3. Hand-code a bespoke Elasticsearch sink connector with the logic built in to route messages to indices based on message contents. This feels like the worst of the three approaches, IMO.
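
Whichever option is chosen, the core of it is a function from a message to a topic name. The Python sketch below shows that function for the product messages in the question; it is an illustration only (a Kafka Streams topology would express this in Java), and the name topic_for and the sanitizing rule are assumptions. Lowercasing matters because Elasticsearch index names must be lowercase, and Kafka topic names only allow ASCII letters, digits, '.', '_' and '-'.

```python
import re

def topic_for(value, prefix="product-"):
    """Derive a target topic (and hence index) name from the category field."""
    category = str(value["category"]).lower()
    # Replace anything that is not legal in a topic name with an underscore.
    safe = re.sub(r"[^a-z0-9._-]", "_", category)
    return prefix + safe

for msg in [
    {"productId": 1, "category": "boat", "price": 135000},
    {"productId": 1, "category": "helicopter", "price": 300000},
    {"productId": 1, "category": "car", "price": 25000},
]:
    print(topic_for(msg))
```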

If you are using Confluent Platform, you can do some kind of routing that depends on a field value in the message.
To do that, you have to use the ExtractTopic SMT from Confluent. More details regarding that SMT can be found at https://docs.confluent.io/current/connect/transforms/extracttopic.html#extracttopic
A Kafka sink connector processes messages that are represented by SinkRecord. Each SinkRecord consists of several fields: topic, partition, value, key, etc. Those fields are set by Kafka Connect, and using a transformation you can change their values. The ExtractTopic SMT changes the value of topic based on the value or key of the message.
The transformation configuration would look something like this, where "field" is the name of the message field whose value will be used as the index name:
{
...
"transforms": "ExtractTopic",
"transforms.ExtractTopic.type": "io.confluent.connect.transforms.ExtractTopic$Value",
"transforms.ExtractTopic.field": "name",
...
}
One limitation is that you have to create the indices in advance.
I assume you are using the Elasticsearch Sink Connector. The Elasticsearch connector is able to create an index, but it does so in its open method, which creates writers for particular partitions (ElasticsearchSinkTask::open). In your use case, not all indices can be created at that moment, because the values of all the messages are not yet available.
Maybe it isn't the purest approach, because ExtractTopic should rather be used for source connectors, but in your case it might work.

Topic mapping when streaming from Kafka to Elasticsearch

When I stream two or three tables, I can easily map them in Elasticsearch by hand, but can I map topics to indices automatically?
I have streamed data from PostgreSQL to ES by mapping manually with topic.index.map=topic1:index1,topic2:index2, etc.
Can whatever topics the producer sends be mapped automatically when they are consumed by the ES connector?

By default, the topics map directly to an index of the same name.
If you want "better" control, you can use RegexRouter in a transforms property.
To quote the docs:
topic.index.map
This option is now deprecated. A future version may remove it completely. Please use single message transforms, such as RegexRouter, to map topic names to index names.
If you cannot capture all of the topics with a single regex in one connector, then run more connectors, each with a different pattern.
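
To show what a regex-based topic-to-index mapping does, here is a small Python emulation. It is not the RegexRouter implementation: as I understand it, RegexRouter rewrites the topic only when the regex matches the whole topic name, and that is the behavior assumed here. The function name regex_route and the topic names are hypothetical, and the replacement uses Python's \1 group syntax rather than the $1 syntax used in Connect configuration.

```python
import re

def regex_route(topic, regex, replacement):
    """Rewrite the topic if the regex matches it entirely; otherwise keep it."""
    match = re.fullmatch(regex, topic)
    if match is None:
        return topic
    return match.expand(replacement)

# Strip a hypothetical "pg.public." prefix so the index is named after the table.
for topic in ["pg.public.orders", "pg.public.customers", "other-topic"]:
    print(topic, "->", regex_route(topic, r"pg\.public\.(.*)", r"\1"))
```

One pattern can therefore cover any number of topics, which is what makes it a replacement for listing every pair in topic.index.map.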
