Setting up an ELK cluster - elasticsearch

I am trying to build a log pipeline using RabbitMQ + ELK on Windows servers:
RabbitMQ --> Logstash --> Elasticsearch --> Kibana.
Ideally I want to have 2 instances of RabbitMQ, 2 of Logstash, 3 of Elasticsearch, and 1 of Kibana.
Has anyone set up something like this? I know we can set up an Elasticsearch cluster easily by setting the cluster name in the yml. What is the mechanism for Logstash to write to the ES cluster?
Should I set up RabbitMQ+Logstash combos in each instance, so that if the MQs are behind a load balancer, each MQ will have its own Logstash output instance and from there the data goes to the cluster?

Technically you could write directly from Logstash to ES using the elasticsearch output plugin, or the elasticsearch_http output plugin (if your ES version is not compatible with the one embedded in Logstash). That said, for an enterprise scenario where you need fault tolerance and have to handle volume, it's a good idea to have RabbitMQ/Redis in the middle.
Your above config looks good, although the input to your Rabbit cluster would come from one or many Logstash shippers (instances running on the client machines where the logs live) that point to an HA RabbitMQ cluster. Then a Logstash indexer's input would be configured to read from the RabbitMQ queue(s) and its output would go to the Elasticsearch cluster.
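A minimal sketch of what such an indexer config could look like (host names, queue name, and ES node addresses are all hypothetical; the elasticsearch output's `hosts` option is the newer form, older versions used `host`):

```
# indexer.conf -- reads from the HA RabbitMQ cluster, writes to the ES cluster
input {
  rabbitmq {
    host => "rabbitmq-lb.example.com"   # load-balanced RabbitMQ endpoint
    queue => "logstash"
    durable => true
  }
}
output {
  elasticsearch {
    # listing several nodes gives the output a fallback if one goes down
    hosts => ["es1:9200", "es2:9200", "es3:9200"]
  }
}
```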
Hope that helps.

It's not recommended to send data directly from Logstash to ES.
ES writes are slow, so under heavy load you can lose data.
The idea is to add a proxy between Logstash and ES:
Logstash --> Proxy --> Elasticsearch
Logstash supports Redis and RabbitMQ as such a proxy.
The proxy can handle large bursts of input and works as a queueing mechanism.
The Logstash documentation puts Redis forward as the primary choice (because of its simplicity of setup and monitoring).
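With Redis as the proxy, the pipeline splits into a shipper and an indexer. A minimal sketch (host name and list key are hypothetical):

```
# shipper.conf -- on the machines producing logs; pushes events to Redis
output {
  redis {
    host => "redis.example.com"
    data_type => "list"
    key => "logstash"
  }
}

# indexer.conf -- drains the Redis list and indexes into Elasticsearch
input {
  redis {
    host => "redis.example.com"
    data_type => "list"
    key => "logstash"
  }
}
output {
  elasticsearch { hosts => ["es1:9200"] }
}
```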

Related

How to configure filebeat for logstash cluster environment?

I am missing something very basic when I think of how Filebeat will be configured in a clustered logstash setup.
As per the article
https://www.elastic.co/guide/en/logstash/current/deploying-and-scaling.html
and this architecture diagram
I think that there is some kind of load balancer in front of the Logstash cluster. However, the Filebeat output documentation suggests that an array of all the Logstash nodes must be specified; using this list of nodes, Filebeat does the load balancing from the client side.
Also as per this GitHub issue, there is no native logstash clustering available yet.
So, my question is: what kind of setup do I need to be able to point my multiple Filebeat instances at one Logstash service endpoint, without specifying the Logstash nodes in the cluster?
Is it possible?
Would having load balancer in front of Logstash cluster be of any help?
Thanks,
Manish
Since the Logstash clustering feature is still in the works and you don't want to specify all the Logstash hosts inside all your Beats configurations, then the only solution I see is to use a TCP load balancer in front of Logstash.
All your Beats would point to that load balancer endpoint and you can manage your Logstash cluster behind that load balancer as you see fit. Be aware, though, that you're adding a hop (hence latency) between your Beats and your Logstash cluster.
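For comparison, here is a sketch of the two options in filebeat.yml (use one or the other; the host names are hypothetical):

```
# Option (a): client-side load balancing -- every Logstash node is listed
output.logstash:
  hosts: ["logstash1:5044", "logstash2:5044"]
  loadbalance: true

# Option (b): a single TCP load balancer endpoint in front of the cluster,
# so Beats configs never need to change when Logstash nodes are added/removed
output.logstash:
  hosts: ["logstash-lb.example.com:5044"]
```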

How to generate huge random data and populate Elastic search running on K8S cluster?

I have a K8S cluster up and running, with Elasticsearch and Kibana deployed on it.
I need to populate ES with almost 25 to 50 GB of random data for testing. Is there an easy way to achieve this? I'm a newbie to ES and K8S. Any inputs or pointers will be of great help.
You can use Logstash for ingesting data into Elasticsearch. Logstash supports various input plugins, from elasticsearch and log4j to S3. You can try ingesting data from any one of the sources that Logstash supports as an input plugin:
https://www.elastic.co/guide/en/logstash/current/input-plugins.html
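Another option, if you don't want to set up Logstash just for test data, is a small script that generates random documents and feeds them to Elasticsearch's _bulk API. A minimal sketch in Python (the index name, endpoint URL, and document fields are all made up for illustration):

```python
import json
import random
import string

def random_doc():
    """Build one synthetic log-like document."""
    return {
        "user": "".join(random.choices(string.ascii_lowercase, k=8)),
        "level": random.choice(["INFO", "WARN", "ERROR"]),
        "value": random.random(),
    }

def bulk_payload(index, n):
    """Render n documents as an NDJSON body for the ES _bulk API:
    one action line followed by one source line per document."""
    lines = []
    for _ in range(n):
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(random_doc()))
    return "\n".join(lines) + "\n"

# To actually load the data, POST batches until you reach your target size,
# e.g. with the requests library (service name depends on your K8S setup):
# import requests
# for _ in range(1000):
#     requests.post("http://elasticsearch:9200/_bulk",
#                   data=bulk_payload("testdata", 5000),
#                   headers={"Content-Type": "application/x-ndjson"})
```

Tune the batch size and count toward the 25-50 GB target; very large single requests will be rejected, so many medium-sized batches work better.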

Centralized logging with Kafka and ELK stack

There are more than 50 Java applications (they are not microservices, so we don't have to worry about multiple instances of a service). Now my architect designed a solution to get the log files, feed them into a Kafka topic, feed that from Kafka into Logstash, and push it to Elasticsearch so we can view the logs in Kibana. Now I am new to Kafka and the ELK stack. Will someone point me in the right direction on how to do this task? I learnt that Log4j and SLF4J can be configured to push logs to a Kafka topic.
1. Now how do I consume from Kafka and load it into Logstash? Do I have to write a Kafka consumer, or can we do that just by configuration?
2. How will Logstash feed the logs to Elasticsearch?
3. How can I differentiate all the 50 application logs? Do I have to create a topic for each and every application?
I put the business problem; now I need step-by-step expert advice. - Thanks in advance.
Essentially, what your architect has laid out for you can be divided into two major components based upon their function (at the architecture level):
Log Buffer (Kafka)
Log Ingester (ELK)
[Java Applications] =====> [Kafka] ------> [ELK]
If you study ELK you might feel it is sufficient for your solution and that Kafka is surplus. However, Kafka has an important role to play when it comes to scale. When many of your Java applications send logs to ELK, ELK may become overloaded and break.
To keep ELK from being overloaded, your architect has set up a buffer (Kafka). Kafka will receive logs from the applications and queue them up in case ELK is under load. In this way you do not break ELK, and you also do not lose logs while ELK is struggling.
Answers to your questions, in the same order:
(1) Logstash has 'input' plugins that can be used to set up a link between Kafka and Logstash. Read up on Logstash and its plugins:
i- Logstash Guide or Reference
ii- Input Plugins (scroll down to find the Kafka plugin)
(2) Logstash will feed the received logs to Elasticsearch via its output plugin for Elasticsearch. See the Logstash output plugin for Elasticsearch.
(3) I may not be spot-on on this, but I think you would be able to filter and distinguish the logs at the Logstash level once you receive them from Kafka. You could apply tags or fields to each log message on reception. This additional info would then be used by Elasticsearch to distinguish the applications from one another.
Implementation Steps
As somebody who is new to Kafka and ELK, follow these steps toward your solution:
Step 1: Start by setting up ELK first. Once you do that, you will be able to see how the logs are visualized, and it will become clearer what the end solution may look like.
Guide to ELK Stack
Step 2: Set up Kafka to link your application logs to ELK.
Caveats:
You may find ELK to have a decent learning curve. Much time is required to understand how each element in the ELK stack works and what its individual configuration options and languages are.
To gain a deep understanding of ELK, take the local deployment path where you set up ELK on your own system. Avoid the cloud ELK services for that purpose.
Logstash has a kafka input and an elasticsearch output, so this is configuration on the Logstash side. You could differentiate the applications using configuration on the log4j side (although using many topics is another possibility).
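A minimal sketch of that Logstash-side configuration (broker addresses, topic name, and index pattern are hypothetical; the exact accepted values for decorate_events vary between plugin versions):

```
# logstash.conf -- Kafka in, Elasticsearch out
input {
  kafka {
    bootstrap_servers => "kafka1:9092,kafka2:9092"
    topics => ["app-logs"]
    decorate_events => true   # attaches topic/partition metadata to each event
  }
}
output {
  elasticsearch {
    hosts => ["es1:9200"]
    index => "applogs-%{+YYYY.MM.dd}"
  }
}
```

With decorate_events on, the originating topic is available in the event metadata, which is one way to tell the applications apart if you do go with multiple topics.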

Logstash cluster output to Elasticseach cluster without multicast

I want to run logstash -> elasticsearch with high availability and cannot find an easy way to achieve it. Please review how I see it and correct me:
Goal:
5 machines each running elasticsearch united into a single cluster.
5 machines each running logstash server and streaming data into elasticsearch cluster.
N machines under monitoring each running lumberjack and streaming data into logstash servers.
Constraint:
It is supposed to be run on PaaS (CoreOS/Docker), so multicast discovery does not work.
Solution:
Lumberjack allows you to specify a list of Logstash servers to forward data to. Lumberjack will randomly select a target server and switch to another one if that server goes down. It works.
I can use zookeeper discovery plugin to construct elasticsearch cluster. It works.
With multicast, each Logstash server discovers and joins the Elasticsearch cluster. Without multicast, it only lets me specify a single Elasticsearch host, but that is not highly available. I want to output to the cluster, not to a single host that can go down.
Question:
Is it realistic to add a zookeeper discovery plugin to logstash's embedded elasticsearch? How?
Is there an easier (natural) solution for this problem?
Thanks!
You could potentially run a separate (non-embedded) Elasticsearch instance within the Logstash container, but configure Elasticsearch not to store data, maybe set these as the master nodes.
node.data: false
node.master: true
You could then add your Zookeeper plugin to all Elasticsearch instances so they form the cluster.
Logstash then logs over HTTP to the local Elasticsearch instance, which works out where among the five data-storing nodes to actually index the data.
Alternatively, this question explains how to get plugins working with the embedded version of Elasticsearch: Logstash output to Elasticsearch on AWS EC2
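If the Zookeeper plugin proves awkward, note that plain unicast discovery also avoids multicast. A sketch of the Logstash-side client node's elasticsearch.yml using ES 1.x-era settings (cluster name and host names are hypothetical):

```
# elasticsearch.yml on the non-data node next to Logstash
node.data: false
node.master: true
cluster.name: logging
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["es1", "es2", "es3", "es4", "es5"]
```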

Logstash output to server with elasticsearch

I intend to run Logstash on multiple clients, which in turn would submit their Logstash output to Elasticsearch on a server (an Ubuntu machine, say).
Thus there are several clients running Logstash, outputting their logs to Elasticsearch on a COMMON server.
Is this output redirection to a server possible with Logstash on the various clients?
If yes, what would the configuration file be?
You need a "broker" to collect the outputs from each of the servers.
Here's a good tutorial:
http://logstash.net/docs/1.1.11/tutorials/getting-started-centralized
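For a small setup without a broker, the simplest client-side configuration would just ship straight to the common server using the elasticsearch_http output of that era. A sketch (log path and server name are hypothetical):

```
# client logstash.conf -- sends local logs to the shared server's Elasticsearch
input {
  file {
    path => "/var/log/myapp/*.log"
  }
}
output {
  elasticsearch_http {
    host => "common-server.example.com"   # the COMMON Ubuntu server
  }
}
```

The broker approach in the linked tutorial scales better once several clients are involved, but this is enough to confirm the output redirection works.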
