confluent kafka connect elasticsearch sink throughput drops permanently after task restart

I have a topic with 7 million records (3 partitions) and deploy an Elasticsearch sink with 1 task, using mostly the default configurations. The sink starts by creating the index in Elasticsearch and then writes at a rate of 10,000 msgs/second. If I make any changes to the connector's tasks, for example:
pause the connector, restart the task, then start the connector
leave the connector running but restart the task
The throughput drops to 400 msgs/second and never recovers to the original 10,000/sec.
If I stop the connector, delete the index from Elasticsearch, and resume the connector, it goes back to sinking 10k messages/sec.
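For clarity, the restart sequences above map to Kafka Connect REST calls roughly like this sketch (the worker address is a placeholder; the task id is 0 since tasks.max=1):

# pause the connector, restart its task, then resume
curl -X PUT  http://connect-worker:8083/connectors/elasticsearch-sink-d8.qa.id.log.transformed/pause
curl -X POST http://connect-worker:8083/connectors/elasticsearch-sink-d8.qa.id.log.transformed/tasks/0/restart
curl -X PUT  http://connect-worker:8083/connectors/elasticsearch-sink-d8.qa.id.log.transformed/resume

# or leave the connector running and just restart the task
curl -X POST http://connect-worker:8083/connectors/elasticsearch-sink-d8.qa.id.log.transformed/tasks/0/restart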
I've tried changing these connector configs away from the defaults, with no improvement:
connection.timeout.ms=1000
batch.size=2000
max.retries=5
max.in.flight.requests=5
retry.backoff.ms=100
max.buffered.records=20000
flush.timeout.ms=10000
read.timeout.ms=3000
My connector config:
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
type.name=logdata
errors.log.include.messages=true
tasks.max=1
topics=d8.qa.id.log.sso.transformed.0
key.ignore=true
schema.ignore=true
value.converter.schemas.enable=false
elastic.security.protocol=PLAINTEXT
name=elasticsearch-sink-d8.qa.id.log.transformed
connection.url=http://172.30.2.23:9200,http://172.30.0.158:9200,http://172.30.1.63:9200
client.id=elasticsearch-sink-d8.qa.id.log.transformed
Environment Details
Elasticsearch 6.8 (10 data nodes, 3 master)
Elasticsearch connector (version 2.2.1)
Kafka Connect (2 workers with 16GB memory, version 2.2.1)
Kafka Broker (3 brokers with 32GB memory, version 2.2.1)
NOTES:
Same behaviour with ES 7.2 and Elasticsearch connector version 2.3.1
This is the only connector deployed to the Connect cluster

This is a known issue in Confluent Platform 5.3.x and below, caused by the index not being cached when it isn't created by JestElasticsearchClient. The fixes PR-340 and PR-309 have been merged and will ship with Confluent Platform 5.4.

Related

Sending data from elasticsearch to kafka and finally to influxdb?

I would like to know how I can send data from Elasticsearch to Kafka and then to InfluxDB.
I've already tried using Confluent Platform with a source connector for Elasticsearch and a sink connector for InfluxDB, but the problem is that I'm stuck on sending data from Elasticsearch to Kafka.
Moreover, once my computer is off I no longer have a backup of the connectors and have to start from scratch.
Hence my questions:
How do I send data from Elasticsearch to Kafka? Using Confluent Platform?
Do I really have to use Confluent Platform if I want to use Kafka Connect?
Kafka Connect is Apache 2.0 licensed and is included with the Apache Kafka download.
Confluent (among other companies) writes plugins for it, such as sinks for Elasticsearch or InfluxDB.
It appears the Elasticsearch source on Confluent Hub is not built by Confluent, for example.
Related - Use Confluent Hub without Confluent Platform installation
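As a rough sketch (the plugin directory path and connector version below are placeholders), installing a Confluent Hub plugin into a plain Apache Kafka Connect worker looks something like this:

# connect-distributed.properties - point the worker at a plugin directory
plugin.path=/opt/connect-plugins
# install a connector from Confluent Hub into that directory using the standalone confluent-hub client
confluent-hub install confluentinc/kafka-connect-elasticsearch:latest --component-dir /opt/connect-plugins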
once my computer is off I no longer have a backup of the connectors and have to start from scratch
Kafka Connect distributed mode stores its config data in Kafka topics. Kafka defaults to storing topic data in /tmp, which is deleted when you shut down your computer.
Similarly, if you are using Docker for any of these systems without mounted volumes, Docker is also not persistent by default.
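A minimal sketch of the properties to change so the data survives a reboot (the paths are just examples you would pick yourself):

# config/server.properties - move broker data off /tmp
log.dirs=/var/lib/kafka-logs
# config/zookeeper.properties - same for ZooKeeper's snapshot directory
dataDir=/var/lib/zookeeper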

How to connect Flink and Elasticsearch in Pyflink?

I aim to create a project around Kafka > Flink > Elasticsearch > Kibana with real-time processing.
I can consume messages from Kafka in Flink but cannot connect Flink and Elasticsearch. How can I send the Kafka messages Flink consumed to Elasticsearch?
My Python 3.8 environment includes: apache-flink==1.15.0
You can use the Table API to create an Elasticsearch Sink table:
https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/python/table/python_table_api_connectors/#how-to-use-connectors
https://nightlies.apache.org/flink/flink-docs-stable/docs/connectors/table/elasticsearch/#how-to-create-an-elasticsearch-table
If you need to convert your DataStream to the Table API you can find some help in here: https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/table/data_stream_api/
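A rough sketch along the lines of those docs; the topic, index, addresses, field name, and jar paths below are placeholders, and it assumes the flink-sql-connector-kafka and flink-sql-connector-elasticsearch7 jars for 1.15 have been downloaded:

from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
# Make the Kafka and Elasticsearch SQL connector jars available (placeholder paths).
t_env.get_config().get_configuration().set_string(
    "pipeline.jars",
    "file:///path/to/flink-sql-connector-kafka-1.15.0.jar;"
    "file:///path/to/flink-sql-connector-elasticsearch7-1.15.0.jar")

# Source table reading JSON records from Kafka.
t_env.execute_sql("""
    CREATE TABLE kafka_source (
        message STRING
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'input-topic',
        'properties.bootstrap.servers' = 'localhost:9092',
        'properties.group.id' = 'pyflink-es-demo',
        'scan.startup.mode' = 'earliest-offset',
        'format' = 'json'
    )
""")

# Sink table writing the same field into an Elasticsearch 7 index.
t_env.execute_sql("""
    CREATE TABLE es_sink (
        message STRING
    ) WITH (
        'connector' = 'elasticsearch-7',
        'hosts' = 'http://localhost:9200',
        'index' = 'flink-demo'
    )
""")

# Continuously copy the Kafka records into Elasticsearch.
t_env.execute_sql("INSERT INTO es_sink SELECT message FROM kafka_source").wait()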

Confluent cloud elastic search sink connector

I want to connect Confluent Cloud to Elasticsearch (local environment). Is it possible to connect the local Elasticsearch to Confluent Cloud Kafka?
Thanks,
Bala
Yes; a local instance of Kafka Connect with the Elasticsearch sink connector can be installed and configured to consume from Confluent Cloud (or any Kafka cluster).
A connector running in the cloud is unlikely to be able to connect and write to your local instance without port forwarding on your router, which is why you should consume from the remote cluster and write to Elasticsearch locally.
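A rough sketch of the local worker properties for pointing Connect at Confluent Cloud (the bootstrap endpoint and API key/secret are placeholders):

# connect-distributed.properties (or connect-standalone.properties)
bootstrap.servers=pkc-xxxxx.region.provider.confluent.cloud:9092
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="<API_KEY>" password="<API_SECRET>";
# sink connectors read through the worker's embedded consumer, so repeat the settings with the consumer. prefix
consumer.security.protocol=SASL_SSL
consumer.sasl.mechanism=PLAIN
consumer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="<API_KEY>" password="<API_SECRET>";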

How to configure a fallback in Logstash if Elasticsearch is disconnected?

We are going to deploy Elasticsearch in a VM and configure our Logstash output to point to it. We don't plan on a multi-node cluster or cloud hosting for Elasticsearch, but we are checking whether there is any possibility of falling back to a locally running Elasticsearch service in case of a connection failure to the VM-hosted Elasticsearch.
Is it possible to configure Logstash in any way to have such a fallback when the connection to Elasticsearch is not available?
We use version 5.6.5 of Logstash and Elasticsearch. Thanks!

Upgrading consumers from zk based offset storage to kafka based storage

I am using Go with the Sarama client. The Kafka version is 0.9, which I plan to upgrade.
I am planning to upgrade the Sarama client to the latest version and use sarama-cluster instead of wvanbergen/kafka. I see that offsets will now be committed to Kafka.
The Apache Kafka documentation says that to migrate from ZooKeeper-based offset storage to Kafka-based storage you need to do the following:
Set offsets.storage=kafka and dual.commit.enabled=true in your consumer config.
There is no such property in the wvanbergen/kafka library, and they don't have plans to add it either.
Has anyone performed a similar upgrade from wvanbergen/kafka to sarama-cluster without the dual.commit.enabled setting on a production system? How did you migrate offsets from ZooKeeper to Kafka?
