How to configure filebeat for logstash cluster environment? - elasticsearch

I am missing something very basic when I think of how Filebeat will be configured in a clustered logstash setup.
As per the article
https://www.elastic.co/guide/en/logstash/current/deploying-and-scaling.html
and this architecture diagram
I think that there is some kind of load balancer in front of the logstash cluster. However, the Filebeat output documentation suggests that an array of all the Logstash nodes must be specified. Using this list of nodes, Filebeat will do the load balancing from the client side.
Also as per this GitHub issue, there is no native logstash clustering available yet.
So, my question is, what kind of setup do I need to be able to point my multiple Filebeat to one logstash service endpoint without specifying the logstash nodes in the cluster?
Is it possible?
Would having load balancer in front of Logstash cluster be of any help?
Thanks,
Manish

Since the Logstash clustering feature is still in the works and you don't want to specify all the Logstash hosts in every Beats configuration, the only solution I see is to use a TCP load balancer in front of Logstash.
All your Beats would point to that load balancer endpoint, and you can manage your Logstash cluster behind that load balancer as you see fit. Be aware, though, that you're adding a hop (and hence latency) between your Beats and your Logstash cluster.
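For comparison, the client-side alternative the question mentions is a small change in filebeat.yml; a sketch, with hostnames and ports as placeholder assumptions:

output.logstash:
  hosts: ["logstash1.example.com:5044", "logstash2.example.com:5044"]
  loadbalance: true

With a TCP load balancer in front of Logstash instead, hosts would contain only the balancer's single endpoint and loadbalance could be omitted.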

Related

Can I use a single elasticsearch/kibana for multiple k8 clusters?

Do you know of any gotchas or requirements that would prevent using a single ES/Kibana instance as a target for Fluentd in multiple k8s clusters?
We are rolling out a new Kubernetes model. I have requirements to run multiple Kubernetes clusters, let's say 4-6. Even though the workload is split across multiple k8s clusters, I have no requirement to split the logging, and I believe it would be easier to find the logs for pods in all clusters in one centralized location. It also means less maintenance for Kibana/Elasticsearch.
Using EFK for Kubernetes, can I point Fluentd from multiple k8s clusters at a single Elasticsearch/Kibana? I don't think I'm the first one with this thought, but I haven't been able to find any discussion of doing this. I found lots of discussions of setting up EFK, but all of them only cover a single k8s cluster with its own Elasticsearch/Kibana.
Has anyone else gone down the path of using a single ES/Kibana to serve logs from multiple Kubernetes clusters? We'll plunge ahead with testing it out, but I'm checking whether anyone else has already gone down this road.
I don't think you should create an Elasticsearch instance for each Kubernetes cluster; you can run one main Elasticsearch instance and index all the logs into it.
But even if you don't have an Elasticsearch instance for each Kubernetes cluster, I think you should have a DRP (disaster recovery plan). So, instead of moving the logs of all pods to Elasticsearch directly, perhaps move them to Kafka first and then fan them out to two Elasticsearch clusters.
It also depends heavily on the use case: if the Kubernetes clusters are in different regions and you need the pods' logs at low latency (<1s), then a single Elasticsearch instance may not be the right answer.
Based on [1] we can read:
Fluentd collects logs from pods running on cluster nodes, then routes them to a centralized Elasticsearch.
Then Elasticsearch ingests these logs from Fluentd and stores them in a central location. It is also used to efficiently search text files.
Kibana is the UI; the user can visualize the collected logs and metrics and create custom dashboards based on queries.
There are several ways to solve your dilemma:
a) Create a centralized dashboard and use each cluster's Elasticsearch as a backend. That way you can see all your clusters' logs in one place.
b) Create an Elasticsearch cluster and add each Elasticsearch instance to it. This is NOT the best option, since you will duplicate your data several times, you will need to manage each index's shards, and you will have to contend with the split-brain problem, but it is great for data resiliency.
c) Use another solution, such as an APM (New Relic, Instana, etc.), to fully centralize your logs in one place.
[1] https://techbeacon.com/enterprise-it/9-top-open-source-tools-monitoring-kubernetes
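If you do point every cluster's Fluentd at one central Elasticsearch (the setup the question describes), giving each cluster a distinct index prefix keeps the sources separable in Kibana. A sketch of a fluent-plugin-elasticsearch match block, with the hostname and prefix as placeholder assumptions:

<match kubernetes.**>
  @type elasticsearch
  host central-es.example.com
  port 9200
  logstash_format true
  logstash_prefix cluster-a   # use a different prefix in each cluster's Fluentd
</match>

Filtering on the index prefix (or an added cluster-name field) then lets one Kibana serve all clusters.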

Tuning logstash performance

I use Logstash to connect Elasticsearch and ntopng (a flow collector),
but many flows are being dropped, so I think the bottleneck is Logstash, because my machine has 20 GB of RAM and 8 CPU cores, which should be sufficient.
But I am not sure which parameters I should edit in logstash.yml to tune Logstash.
Thank you in advance!
It seems like one step of working out a solution to your problem is to supply decent Logstash monitoring. One good way to achieve this is by installing X-Pack which provides Logstash monitoring in the X-Pack monitoring ui in Kibana.
Please refer to https://www.elastic.co/guide/en/logstash/6.1/logstash-monitoring-ui.html for more information about the Logstash monitoring ui and https://www.elastic.co/guide/en/logstash/6.1/installing-xpack-log.html for information on how to install and configure X-Pack for Logstash.
Apart from Logstash monitoring, you should of course also monitor the resources used on the systems you are running Logstash on. There are several ways to do this, for example with active monitoring solutions such as Nagios, or passive monitoring solutions such as Elasticsearch with Metricbeat.
Once you know what the bottleneck is, you can go through https://www.elastic.co/guide/en/logstash/6.1/performance-troubleshooting.html and tune Logstash settings or if necessary add more Logstash instances for distributing load.
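Once monitoring points at Logstash itself, the usual first knobs in logstash.yml are the pipeline worker and batch settings; a sketch with illustrative values, not recommendations:

pipeline.workers: 8        # often set to the number of CPU cores
pipeline.batch.size: 250   # events per worker batch; larger batches trade latency for throughput
pipeline.batch.delay: 50   # ms to wait before flushing an undersized batch

Heap size is configured separately in jvm.options (e.g. -Xms4g / -Xmx4g); an undersized heap is another common cause of dropped events.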

Where will E L K and filebeat reside

I am working in a distributed environment. I have a central machine which needs to monitor some 100 machines.
So I need to use the ELK stack to keep monitoring the data.
Since Elasticsearch, Logstash, Kibana, and Filebeat are independent pieces of software, I want to know where I should ideally place them in my distributed environment.
My approach was to keep Kibana and Elasticsearch on the central node, and Logstash and Filebeat on the individual nodes.
Logstash will send data to the central node's Elasticsearch, which Kibana displays.
Please let me know if this design is right.
Your design is not that bad but if you install elasticsearch on only one server, with time you will face the problem of availability.
You can do this:
Install filebeat and logstash on all the nodes.
Install elasticsearch as a cluster. That way if one node of elasticsearch goes down, another node can easily take over.
Install Kibana on the central node.
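The second step above amounts to giving every Elasticsearch node the same cluster name in its elasticsearch.yml; a minimal sketch, with names as placeholder assumptions:

cluster.name: central-logging   # identical on every node so they join one cluster
node.name: es-node-1            # unique per node
network.host: 0.0.0.0           # listen beyond localhost so other nodes can reach it

Nodes sharing cluster.name and able to reach each other over the transport port will form the cluster.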
NB:
Make sure you configure Filebeat to point to more than one Logstash server. That way, if one Logstash instance fails, Filebeat can still ship logs to another server.
Also make sure your Logstash configuration points to all the data nodes of your Elasticsearch cluster.
You can also go further by installing Kibana on, say, 3 nodes and putting a load balancer in front of them. That way the load balancer will route users to whichever Kibana instance is healthy.
UPDATE
With elasticsearch configured, we can configure logstash as follows:
output {
  elasticsearch {
    hosts => ["http://123.456.789.1:9200","http://123.456.789.2:9200"]
    index => "indexname"
  }
}
You don't need to add stdout { codec => rubydebug } in your configuration.
Hope this helps.

Setting up an ELK cluster

I am trying to build a log pipeline using RabbitMQ + ELK on Windows servers.
RabbitMQ --> Logstash --> Elasticsearch --> Kibana.
Ideally I want to have 2 instances of RabbitMQ, 2 of Logstash, 3 of Elasticsearch, and 1 of Kibana.
Has anyone set up something like this? I know we can set up an Elasticsearch cluster easily by setting the cluster name in the yml. What is the mechanism for Logstash to write to the ES cluster?
Should I set up RabbitMQ+Logstash combos in each instance, so that if the MQs are behind a load balancer, each MQ will have its own Logstash output instance and the data goes from there to the cluster?
Technically you could write directly from Logstash to ES using the elasticsearch output plugin, or the elasticsearch_http output plugin (if using an ES version not compatible with Logstash). That said, for an enterprise scenario where you need fault tolerance and have to handle volume, it's a good idea to have RabbitMQ/Redis in between.
Your config above looks good, although the input to your Rabbit cluster would come from one or many Logstash shippers (instances running on the client machines where the logs live) pointing to an HA RabbitMQ cluster. Then a Logstash indexer, whose input is configured to read from the RabbitMQ queue(s), outputs to the Elasticsearch cluster.
Hope that helps.
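The shipper/indexer split described above can be sketched as the indexer's Logstash pipeline config; the host, queue, and ES addresses are placeholder assumptions:

input {
  rabbitmq {
    host => "rabbitmq.example.com"   # HA RabbitMQ endpoint (or load-balanced VIP)
    queue => "logstash"
    durable => true                   # survive broker restarts
  }
}
output {
  elasticsearch {
    hosts => ["http://es1:9200","http://es2:9200"]
  }
}

The shippers on the client machines would be the mirror image: a file (or beats) input and a rabbitmq output pointing at the same queue.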
It's not recommended to put the data directly from Logstash into ES.
ES writes are slow, so under heavy load you can lose data.
The idea is to add a broker between Logstash and ES:
Logstash --> Broker --> Elasticsearch
Logstash supports Redis and RabbitMQ as brokers.
The broker can absorb large bursts of input and works as a queue mechanism.
The Logstash documentation puts Redis forward as the primary choice (because of its simplicity of setup and monitoring).

Logstash cluster output to Elasticseach cluster without multicast

I want to run logstash -> elasticsearch with high availability and cannot find an easy way to achieve it. Please review how I see it and correct me:
Goal:
5 machines each running elasticsearch united into a single cluster.
5 machines each running logstash server and streaming data into elasticsearch cluster.
N machines under monitoring each running lumberjack and streaming data into logstash servers.
Constraint:
It is supposed to run on a PaaS (CoreOS/Docker), so multicast discovery does not work.
Solution:
Lumberjack allows you to specify a list of Logstash servers to forward data to. Lumberjack will randomly select a target server and switch to another one if that server goes down. It works.
I can use zookeeper discovery plugin to construct elasticsearch cluster. It works.
With multicasting, each Logstash server discovers and joins the Elasticsearch cluster. Without multicasting, it only lets me specify a single Elasticsearch host. But that is not high availability: I want to output to the cluster, not to a single host that can go down.
Question:
Is it realistic to add a zookeeper discovery plugin to logstash's embedded elasticsearch? How?
Is there an easier (natural) solution for this problem?
Thanks!
You could potentially run a separate (non-embedded) Elasticsearch instance within the Logstash container, but configure Elasticsearch not to store data, maybe set these as the master nodes.
node.data: false
node.master: true
You could then add your Zookeeper plugin to all Elasticsearch instances so they form the cluster.
Logstash then logs over HTTP to the local Elasticsearch, which works out where among the 5 data-storing nodes to actually index the data.
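If the environment allows static addressing, unicast discovery in elasticsearch.yml is a plugin-free alternative to Zookeeper for forming the cluster without multicast; a sketch using the zen discovery settings of Elasticsearch 1.x (hostnames are placeholders):

discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["es1.internal:9300", "es2.internal:9300", "es3.internal:9300"]

Every node (including the non-data local nodes) would carry the same unicast host list, so each Logstash-side instance can join the cluster at startup.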
Alternatively, this question explains how to get plugins working with the embedded version of Elasticsearch: Logstash output to Elasticsearch on AWS EC2
