Logstash - prevent to loading logdata - elasticsearch

I have log data from a Tomcat server (logback) and would like to analyze it.
This is my plan:
logback log data -> logstash -> elasticsearch -> Elasticsearch queries.
In some other architectures there is Redis in front of logback, because logback has no buffer.
If I want to do this in real time, is Redis required?
Or is it unnecessary because all the data is already stored in Elasticsearch? If so, why do they use Redis?
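As a rough illustration of why a queue such as Redis is often placed between the shipper and the indexer: it absorbs bursts that the downstream stage cannot index immediately. The sketch below is a plain Python simulation, not Redis or Logstash code. The function name `simulate` and the numbers are made up for the example, and it assumes events arriving at a full buffer are simply dropped.

```python
from collections import deque


def simulate(burst_sizes, drain_per_tick, buffer_capacity):
    """Simulate bursts of log events flowing into a bounded buffer that a
    slower consumer drains at a fixed rate. Returns (delivered, dropped)."""
    buffer = deque()
    delivered = 0
    dropped = 0
    for burst in burst_sizes:
        # Producer side: enqueue each event, dropping it if the buffer is full.
        for _ in range(burst):
            if len(buffer) < buffer_capacity:
                buffer.append(1)
            else:
                dropped += 1
        # Consumer side: index at most drain_per_tick events in this tick.
        for _ in range(min(drain_per_tick, len(buffer))):
            buffer.popleft()
            delivered += 1
    # Once the producer stops, the consumer finishes whatever is still queued.
    delivered += len(buffer)
    return delivered, dropped


bursts = [10, 0, 0, 10, 0, 0]  # two bursts of 10 events with quiet ticks between
small = simulate(bursts, drain_per_tick=4, buffer_capacity=4)
large = simulate(bursts, drain_per_tick=4, buffer_capacity=100)
print(small)  # (8, 12)
print(large)  # (20, 0)
```

With the same consumer speed, the larger buffer delivers every event, only later. So the buffer is about not losing data during bursts or downstream outages, not about making the pipeline faster.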

Related

Is there any Elastic Search appender for directly sending(storing) spring-boot application logs to Elastic Search without using ELK stack

We are planning to store our (Spring Boot) application logs in Elasticsearch. I am aware of the ELK stack, which uses Filebeat + Logstash to collect and process the logs.
What is desired: an appender in logback.xml that sends the logs directly to Elasticsearch. The basic idea is an appender like the file appenders, except that the target for storing logs is Elasticsearch. At the same time, we want to do it asynchronously. FYI, we are using slf4j with the logback implementation for logging.
More specifically: we want to remove the intermediaries (Logstash or Beats), as they need more infrastructure and may bring unwanted overhead. Sending logs to Elasticsearch asynchronously would be really great, so that the application does not suffer latency due to logging.
What I have already tried:
Sending Spring Boot logs directly to Logstash. But that does not seem to help much, since it internally uses file appenders and the logs are then sent to Logstash.
Are any such appenders available? Or is there a workaround?
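To make the "asynchronous, direct to Elasticsearch" idea concrete, here is a minimal, language-neutral sketch written in Python. It is not a logback appender. It only shows the two pieces such an appender would need: a queue that the application thread writes to without blocking, and a batch serializer that produces the newline-delimited JSON body accepted by Elasticsearch's `_bulk` endpoint. The index name `app-logs`, the helper names, and the event fields are hypothetical, and the HTTP POST itself (with `Content-Type: application/x-ndjson`) is left out.

```python
import json
import queue


def drain_batch(pending, max_batch):
    """Take up to max_batch events off the queue without blocking."""
    batch = []
    while len(batch) < max_batch:
        try:
            batch.append(pending.get_nowait())
        except queue.Empty:
            break
    return batch


def build_bulk_body(index_name, log_events):
    """Serialize events as a _bulk body: one action line, then one document
    line per event, with a trailing newline at the end."""
    if not log_events:
        return ""
    lines = []
    for event in log_events:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(event))
    return "\n".join(lines) + "\n"


# The application thread only enqueues; a background worker would drain and send.
pending = queue.Queue()
pending.put({"level": "INFO", "message": "started"})
pending.put({"level": "WARN", "message": "slow request"})
pending.put({"level": "ERROR", "message": "failed"})

batch = drain_batch(pending, max_batch=2)
print(build_bulk_body("app-logs", batch), end="")
```

A real appender would also have to decide what happens when Elasticsearch is unreachable and the queue fills up, which is the part that Logstash, Beats, or a broker normally handles.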

How to configure the Kafka Cluster to work with Elastic Search Cluster?

I have to build a log cluster and a monitoring cluster (for high availability) like this topology. I would like to know how to configure the log-shipper clusters. (I have 2 topologies in the image.)
If I use Kafka with Filebeat in the Kafka cluster, will Elasticsearch receive duplicate data because Kafka keeps replicas of the data?
If I use Logstash (in the Elasticsearch cluster) to get logs from the Kafka cluster, what should the config look like? I think Logstash will not know where to read the logs efficiently on the Kafka cluster.
Cluster topology
Thanks for reading. If you have any idea, please discuss with me ^^!
As I see it, both configurations are compatible with Kafka; you can use Filebeat, Logstash, or a mix of them in the consumer and producer stages.
IMHO it all depends on your needs. For example, sometimes we use filters to enrich the data before ingesting it into Kafka (producer stage) or before indexing it into Elasticsearch (consumer stage). In that case it is better to work with Logstash, because filters are easier to use there than in Filebeat.
But if you want to work with raw data, Filebeat may be better, because the agent is lighter.
About your questions:
Kafka replicates the data, but only for HA purposes; consumers in the same consumer group read each record only once.
To read the logs from Kafka with Logstash, you can use the Logstash Kafka input plugin; it is easy and works fine.
https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html
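The point about replicas and consumer groups can be sketched with a small simulation. This is not Kafka client code and it simplifies Kafka's real partition assignors. The broker names, consumer names, and records are invented. It only illustrates that each partition has exactly one owner within a group, so the replication factor never multiplies what the consumers read.

```python
def assign_partitions(partition_ids, consumers):
    """Give every partition exactly one owner within the consumer group,
    handing partitions out in round-robin order."""
    assignment = {consumer: [] for consumer in consumers}
    for position, partition_id in enumerate(sorted(partition_ids)):
        owner = consumers[position % len(consumers)]
        assignment[owner].append(partition_id)
    return assignment


def consume_topic(topic, consumers):
    """Return the assignment and every record read by the group."""
    assignment = assign_partitions(topic.keys(), consumers)
    consumed = []
    for consumer, partition_ids in assignment.items():
        for partition_id in partition_ids:
            # Records are read once per partition, no matter how many
            # brokers hold a replica of that partition.
            consumed.extend(topic[partition_id]["records"])
    return assignment, consumed


# Hypothetical topic: 3 partitions, each replicated on 3 brokers.
topic = {
    0: {"replicas": ["broker-1", "broker-2", "broker-3"], "records": ["a", "b"]},
    1: {"replicas": ["broker-2", "broker-3", "broker-1"], "records": ["c"]},
    2: {"replicas": ["broker-3", "broker-1", "broker-2"], "records": ["d", "e"]},
}

assignment, consumed = consume_topic(topic, ["logstash-1", "logstash-2"])
print(assignment)     # {'logstash-1': [0, 2], 'logstash-2': [1]}
print(len(consumed))  # 5
```

Five records were written and five are read, even though fifteen copies exist across the brokers. In Logstash terms, this corresponds to running several instances of the Kafka input with the same consumer group.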

Fastest way to send logs from Kafka to Elasticsearch

I am looking for the fastest log shipper which can directly transfer my logs to elasticsearch from kafka.
I can name some ways to do this:
Kafka -> Elasticsearch
Kafka -> Logstash -> Elasticsearch
Kafka -> Golang -> Elasticsearch
Kafka -> rsyslog -> Elasticsearch
Kafka -> java/c/c++ -> Elasticsearch ...
Can someone tell me which is the fastest way (Highest EPS with the same resource) to do the job?
Thanks in advance!
Fastest is kinda hard to say; there are several good options and it's going to come down to factors including your hardware, message size, and so on.
For any integration in and out of Kafka, my starting point is always Kafka Connect—since it's part of Apache Kafka itself. There is a connector for Elasticsearch which you can download standalone for use with an existing Kafka Connect cluster, or indeed obtain as part of Confluent Platform.
Disclaimer: I work for Confluent.
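For readers who have not used Kafka Connect: a connector is created by sending a JSON document to the Connect REST API (`POST /connectors`). The sketch below only builds that document in Python and does not send it. The connector name, topic, and URL are placeholders, and the property names are the ones I recall from the Confluent Elasticsearch sink connector documentation, so treat them as assumptions and check them against the version you install.

```python
import json


def elasticsearch_sink_request(name, topics, es_url, tasks_max=1):
    """Build the JSON body for creating an Elasticsearch sink connector.
    Property names are assumed from the Confluent connector docs."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
            "topics": ",".join(topics),
            "connection.url": es_url,
            "tasks.max": str(tasks_max),
            # Assumes the messages are plain JSON without keys or schemas.
            "key.ignore": "true",
            "schema.ignore": "true",
        },
    }


request_body = elasticsearch_sink_request(
    name="logs-to-es",                # placeholder connector name
    topics=["app-logs"],              # placeholder topic
    es_url="http://localhost:9200",   # placeholder Elasticsearch URL
    tasks_max=2,
)
print(json.dumps(request_body, indent=2))
```

Raising `tasks.max` is the usual way to add parallelism, up to the number of partitions in the topic.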

Kafka to Elasticsearch, HDFS with Logstash or Kafka Streams/Connect

I use Kafka for message queue/processing. My question is about performance/best practice. I will do my own performance tests but maybe someone has results/experience already.
The data is raw in a Kafka (0.10) topic and I want to transfer it structured to ES and HDFS.
Now I see 2 possibilities:
Logstash (Kafka input plugin, grok filter (parsing), ES/webhdfs output plugin)
Kafka Streams (parsing), Kafka Connect (ES sink, HDFS sink)
Without any tests I would say that the second option is better/cleaner and more reliable?
Logstash is "best practice" for getting data into Elasticsearch. WebHDFS won't have the raw performance of the Java API that is part of the Kafka Connect plugin, however.
Grok could be done in a Kafka Streams process, so your parsing could be done in either location.
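What grok does is essentially named-group pattern matching, which is why the parsing step is portable between Logstash and a stream processor. A minimal sketch in Python, assuming a hypothetical logback-style line layout (timestamp, level, thread, logger, message); the pattern and the sample line are illustrative, not taken from the question:

```python
import re

# Assumed line layout: "<date> <time>,<ms> <LEVEL> [<thread>] <logger> - <message>"
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<thread>[^\]]+)\] "
    r"(?P<logger>\S+) - "
    r"(?P<message>.*)"
)


def parse_line(line):
    """Turn a raw log line into a dict of named fields, or None if the
    line does not match the assumed layout."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


sample = (
    "2024-01-15 10:23:45,123 INFO  [http-nio-8080-exec-1] "
    "com.example.OrderService - Order 42 created"
)
event = parse_line(sample)
print(event["level"], event["logger"], event["message"])
```

Lines that do not match return `None`; a real pipeline would route those to a dead-letter topic or tag them, much as Logstash tags grok failures.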
If you are on an Elastic subscription, then they would like to sell Logstash. Confluent would like to sell Kafka Streams + Kafka Connect.
Avro seems to be the best medium for data transfer, and the Schema Registry is a popular way to do that. IIUC, Logstash doesn't work well with a Schema Registry or Avro, and prefers JSON.
In the Hadoop landscape, I would offer the intermediate options of Apache Nifi or Streamsets.
In the end, it really depends on your priorities, and how well you (and your team) can support these tools.

Kafka-Connect vs Filebeat & Logstash

I'm looking to consume from Kafka and save data into Hadoop and Elasticsearch.
I've seen 2 ways of doing this currently: using Filebeat to consume from Kafka and send it to ES and using Kafka-Connect framework. There is a Kafka-Connect-HDFS and Kafka-Connect-Elasticsearch module.
I'm not sure which one to use to send streaming data. Though I think that if I want at some point to take data from Kafka and place it into Cassandra I can use a Kafka-Connect module for that but no such feature exists for Filebeat.
Kafka Connect can handle streaming data and is a bit more flexible. If you are just going to elastic, Filebeat is a clean integration for log sources. However, if you are going from Kafka to a number of different sinks, Kafka Connect is probably what you want. I'd recommend checking out the connector hub to see some examples of open source connectors at your disposal currently http://www.confluent.io/product/connectors/
