I'm building a real-time processing pipeline along the lines of Kafka > Flink > Elasticsearch > Kibana.
I can consume messages from Kafka in Flink, but I cannot connect Flink to Elasticsearch. How can I send the messages Flink consumes from Kafka to Elasticsearch?
My Python 3.8 environment includes apache-flink==1.15.0.
You can use the Table API to create an Elasticsearch Sink table:
https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/python/table/python_table_api_connectors/#how-to-use-connectors
https://nightlies.apache.org/flink/flink-docs-stable/docs/connectors/table/elasticsearch/#how-to-create-an-elasticsearch-table
If you need to convert your DataStream to the Table API, you can find some help here: https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/table/data_stream_api/
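For example, a minimal Table API job following those docs could look like the sketch below. The topic name, group id, index name, hosts, and JSON schema are placeholders, and the Kafka and Elasticsearch SQL connector JARs have to be on the classpath (e.g. via pipeline.jars):

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

# Streaming Table API environment
t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# The Kafka and Elasticsearch SQL connector JARs must be available, e.g.:
# t_env.get_config().get_configuration().set_string(
#     "pipeline.jars",
#     "file:///path/to/flink-sql-connector-kafka-1.15.0.jar;"
#     "file:///path/to/flink-sql-connector-elasticsearch7-1.15.0.jar")

# Source table backed by the Kafka topic (placeholder names and schema)
t_env.execute_sql("""
    CREATE TABLE kafka_source (
        user_id STRING,
        message STRING
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'my_topic',
        'properties.bootstrap.servers' = 'localhost:9092',
        'properties.group.id' = 'flink-consumer',
        'scan.startup.mode' = 'earliest-offset',
        'format' = 'json'
    )
""")

# Sink table backed by an Elasticsearch index
t_env.execute_sql("""
    CREATE TABLE es_sink (
        user_id STRING,
        message STRING
    ) WITH (
        'connector' = 'elasticsearch-7',
        'hosts' = 'http://localhost:9200',
        'index' = 'my_index'
    )
""")

# Continuously copy rows from Kafka into Elasticsearch
t_env.execute_sql("INSERT INTO es_sink SELECT user_id, message FROM kafka_source").wait()
```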
Related
I have several Flink data streams that will ultimately end up in Elasticsearch. Is it better to use Flink's Elasticsearch sink or Flink's Kafka sink combined with Kafka's Elasticsearch sink? What would the tradeoffs be?
I would like to know how I can send data from Elasticsearch to Kafka and then on to InfluxDB.
I've already tried using Confluent Platform with a source connector for Elasticsearch and a sink connector for InfluxDB, but I'm stuck on getting data from Elasticsearch into Kafka.
Moreover, once my computer is off I no longer have the connectors and I have to start from scratch.
Hence my questions:
How do I send data from Elasticsearch to Kafka? Using Confluent Platform?
Do I really have to use Confluent Platform if I want to use Kafka Connect?
Kafka Connect is Apache 2.0 licensed and is included in the Apache Kafka download.
Confluent (among other companies) writes plugins for it, such as sinks for Elasticsearch or InfluxDB.
It appears the Elasticsearch source on Confluent Hub is not built by Confluent, for example.
Related - Use Confluent Hub without Confluent Platform installation
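To make that concrete: with plain Apache Kafka plus Kafka Connect, you install the connector plugins (e.g. from Confluent Hub) and register them against Connect's REST API. The sketch below assumes Connect is running in distributed mode on localhost:8083; the connector classes and their options are placeholders you would take from the Elasticsearch source and InfluxDB sink connectors' documentation:

```python
import requests

# Kafka Connect REST API (assumed to be on localhost:8083)
CONNECT_URL = "http://localhost:8083/connectors"

# Placeholder configs: the connector.class values and option names come from
# the docs of the Elasticsearch source / InfluxDB sink plugins you install.
connectors = [
    {
        "name": "elasticsearch-source",
        "config": {
            "connector.class": "<Elasticsearch source connector class>",
            "tasks.max": "1",
            # Elasticsearch URL, index pattern, poll interval, target topic, ...
        },
    },
    {
        "name": "influxdb-sink",
        "config": {
            "connector.class": "<InfluxDB sink connector class>",
            "tasks.max": "1",
            # source topic, InfluxDB URL, database/measurement settings, ...
        },
    },
]

for connector in connectors:
    resp = requests.post(CONNECT_URL, json=connector)
    resp.raise_for_status()
    print(resp.json()["name"], "created")
```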
once my computer is off I no longer have the backup of the connectors and I have to start from scratch
Kafka Connect distributed mode stores its config data in Kafka topics, and Kafka defaults to storing topic data in /tmp, which is deleted when you shut down your computer.
Similarly, if you are running any of these systems in Docker without mounted volumes, the data is not persistent by default either.
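As a hedged illustration of the fix, point the data directories somewhere that survives a reboot instead of /tmp (the paths below are only examples), or mount volumes if the broker/ZooKeeper run in Docker:

```properties
# config/server.properties: keep broker data (including Connect's internal topics) off /tmp
log.dirs=/var/lib/kafka-logs

# config/zookeeper.properties: same idea for ZooKeeper's snapshot directory
dataDir=/var/lib/zookeeper
```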
I'm trying to integrate Apache Kafka with the Elastic Stack (Beats, Logstash, Elasticsearch, and Kibana).
In the diagram, Kafka sits between Beats and Logstash. I was wondering if I can put another Kafka between Logstash and Elasticsearch (where I drew with a red pen).
Does having two Kafkas sound okay?
Any ideas or thoughts to share?
Yes.
Logstash can write to Kafka as an output.
You can use Kafka Connect Elasticsearch for streaming from Kafka into Elasticsearch.
If you want to buffer/scale the output from Logstash by using Kafka here, it is possible and would make sense.
But bear in mind that you could also:
(a) write from Beats to Kafka, do any processing with KSQL/Kafka Streams etc., write back to Kafka, and then use Kafka Connect to stream into Elasticsearch
or
(b) just write from Logstash to Elasticsearch
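As a sketch of the Kafka-to-Elasticsearch leg, here is one way to register the Confluent Elasticsearch sink connector against the Kafka Connect REST API. The addresses, topic name, and the key.ignore/schema.ignore choices are assumptions; check the connector docs for your versions:

```python
import requests

# Kafka Connect REST API (assumed to be on localhost:8083)
CONNECT_URL = "http://localhost:8083/connectors"

es_sink = {
    "name": "kafka-to-elasticsearch",
    "config": {
        "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
        "tasks.max": "1",
        "topics": "logstash_topic",                # topic that Logstash (or Beats/KSQL) writes to
        "connection.url": "http://localhost:9200",
        "key.ignore": "true",                      # derive document IDs from topic+partition+offset
        "schema.ignore": "true",                   # index plain JSON values without a Connect schema
    },
}

resp = requests.post(CONNECT_URL, json=es_sink)
resp.raise_for_status()
print(resp.json())
```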
I'm looking to consume from Kafka and save data into Hadoop and Elasticsearch.
I've seen two ways of doing this so far: using Filebeat to consume from Kafka and send it to ES, or using the Kafka Connect framework, which has Kafka-Connect-HDFS and Kafka-Connect-Elasticsearch modules.
I'm not sure which one to use to send streaming data. I think that if at some point I want to take data from Kafka and place it into Cassandra, I can use a Kafka Connect module for that, but no such feature exists for Filebeat.
Kafka Connect can handle streaming data and is a bit more flexible. If you are only going to Elasticsearch, Filebeat is a clean integration for log sources. However, if you are going from Kafka to a number of different sinks, Kafka Connect is probably what you want. I'd recommend checking out the connector hub to see some examples of the open source connectors currently at your disposal: http://www.confluent.io/product/connectors/
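For instance, the fan-out case is just a second connector registered on the same topic: the Elasticsearch sink sketched above plus an HDFS sink. The config below is a hedged example based on Confluent's kafka-connect-hdfs quickstart; the HDFS URL, topic name, and flush size are placeholders:

```python
import requests

# Kafka Connect REST API (assumed address); the same topic can feed several sinks
CONNECT_URL = "http://localhost:8083/connectors"

hdfs_sink = {
    "name": "kafka-to-hdfs",
    "config": {
        "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
        "tasks.max": "1",
        "topics": "logs_topic",              # same topic the Elasticsearch sink consumes
        "hdfs.url": "hdfs://namenode:8020",
        "flush.size": "1000",                # records per file before committing to HDFS
    },
}

resp = requests.post(CONNECT_URL, json=hdfs_sink)
resp.raise_for_status()
print(resp.json())
```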
Is it possible to receive live input streams of logs from Logstash or Elasticsearch into Spark Streaming?
I see there's a built-in Flume receiver, but are there any existing custom receivers for Logstash or Elasticsearch?
Currently the best solution seems to be to use the Logstash output plugin for Kafka and then read the Kafka topic using the Spark Kafka receiver:
http://spark.apache.org/docs/latest/streaming-kafka-integration.html
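A minimal sketch of the reading side, using the older spark-streaming-kafka-0-8 Python API described at that link (the topic and broker address are assumptions, and the matching spark-streaming-kafka package has to be supplied, e.g. via --packages when submitting):

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

# 5-second micro-batches
sc = SparkContext(appName="logstash-kafka-spark")
ssc = StreamingContext(sc, 5)

# Direct stream over the topic that the Logstash kafka output plugin writes to (assumed name)
stream = KafkaUtils.createDirectStream(
    ssc,
    ["logstash_topic"],
    {"metadata.broker.list": "localhost:9092"},
)

# Records arrive as (key, value) pairs; the values are the events from Logstash
stream.map(lambda kv: kv[1]).pprint()

ssc.start()
ssc.awaitTermination()
```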