metricbeat agent running on ELK cluster? - elasticsearch

Does metricbeat need always an agent running separately from the ELK cluster or it provides a plugin/agent/approach to run metricbeat on the cluster side?

If I understand your question, you want to know if their is a way to monitor your cluster without installing a beat.
You can enable monitoring in the stack monitoring tab of Kibana.
If you want more, beats are standalone objects pluggables with logstash or Elasticsearch.
Latest versions of Elastic Stack (formally known as ELK ) offer more centralized configurations in Kibana, and the 7.9 version introduce a unified elastic agent in Beta to gather several beats in one and manage you "fleet" on agent within Kibana.
But information used by your beats are not directly part of Elastic (CPU, RAM, Logs, etc...)
So you'll still have to install a daemon on your system.

Related

Need to ship logs to elastic from EKS

We have an EKS cluster running and we are looking for best practices to ship application logs from pods to Elastic.
In the EKS workshop there is an option to ship the logs to cloudwatch and then to Elastic.
Wondered if there is an option to ship the logs directly to Elastic, or to understand best practices.
Additional requirement:
We need the logs to determine from which namespace the logs is coming from and to deliver a dedicated index
You can deploy EFK stack in kubernetes cluster. Follow the reference --> https://github.com/acehko/kubernetes-examples/tree/master/efk/production
Fluentd would be deployed as DaemonSet so that one replica is run on each node collecting the logs from all pods and push them to elasticsearch

Elastic search cluster on Kubernetes Cluster vs VM

I want to setup elastic stack (elastic search, logstash, beats and kibana) for monitoring my kubernetes cluster which is running on on-prem bare metals. I need some recommendations on the following 2 approaches, like which one would be more robust,fault-tolerant and of production grade. Let's say I have a K8 cluster named as K8-abc.
Approach 1- Will be it be good to setup the elastic stack outside the kubernetes cluster?
In this approach, all the logs from pods running in kube-system namespace and user-defined namespaces would be fetched by beats(running on K8-abc) and put into into the ES Cluster which is configured on Linux Bare Metals via Logstash (which is also running on VMs). And for fetching the kubernetes node logs, the beats running on respective VMs (which are participating in forming the K8-abc) would fetch the logs and put it into the ES Cluster which is configured on VMs. The thing to note here is the VMs used for forming the ES Cluster are not the part of the K8-abc.
Approach 2- Will be it be good to setup the elastic stack on the kubernetes cluster k8-abc itself?
In this approach, all the logs from pods running in kube-system namespace and user-defined namespaces would be send to Elastic search cluster configured on the K8-abc via logstash and beats (both running on K8-abc). For fetching the K8-abc node logs, the beats running on VMs (which are participating in forming the K8-abc) would put the logs into ES running on K8-abc via logstash which is running on k8-abc.
Can some one help me in evaluating the pros and cons of the before mentioned two approaches? It will be helpful even if the relevant links to blogs and case studies is provided.
I would be more inclined to the second solution. It has many advantages over the first one however it may seem more complex as it comes to the initial setup. You can actually ask similar question when it comes to migrate any other type of workload to Kubernetes. It has many advantages over VM. To name just a few:
self-healing cluster,
service discovery and integrated load balancing,
Such solution is much easier to scale (HPA) in comparison with VMs,
Storage orchestration. Kubernetes allows you to automatically mount a storage system of your choice, such as local storage, public cloud providers, and many more including Dynamic Volume Provisioning mechanism.
All the above points could be easily applied to any other workload and may bee seen as Kubernetes advantages in general so let's look why to use it for implementing Elastic Stack:
It looks like Elastic is actively promoting use of Kubernetes on their website. See also this article.
They also provide an official elasticsearch helm chart so it is already quite well supported by Elastic.
Probably there are many other reasons in favour of Kubernetes solution I didn't mention here. Here you can find a hands-on article about setting up Highly Available and Scalable Elasticsearch on Kubernetes.

What is a most benefit way to gather server hardware utilization, app logs, app jvm metrics, using Elastic-Stack?

Besides ELK standard goal for gathering application logs data i want to leverage this stack for advanced data collection such as JVM metrics (via JMX) and host's cpu/ram/disk/network utilization.
The most suitable one i thought is using metricbeat, but i doubt if metricbeat is enough for purposes described above.
Since i aiming at minimal stack of things to configure, will Metricbeat-Elasticsearch-Kibana be enough for collecting app logs,app jvm metrics,host's hardware utilization or there are some more suitable alternatives ?
UPDATE
Oh, i see now, that i need also filebeat besides metricbeat for gathering app logs.
Is there any out of the box single solution that combines filebeat and metricbeat agents ?
Currently Filebeat and Metricbeat are separate binaries and you need to run both:
Filebeat to collect your logs (and potentially parse them with Elasticsearch Ingest node).
Metricbeat with the system module for cpu/ram/disk/network and we also have a JMX / Jolokia module for that functionality.

Tuning logstash performance

I use logstash to connect between elasticsearch and ntopng(a flow collector).
but there are many drop flows, so I think the bottle neck is on logstash because my RAM is 20G and CPU 8 cores.
But I am not sure which parameter could I edit to tune the logstash in the logstash.yml
thank you in advance!
It seems like one step of working out a solution to your problem is to supply decent Logstash monitoring. One good way to achieve this is by installing X-Pack which provides Logstash monitoring in the X-Pack monitoring ui in Kibana.
Please refer to https://www.elastic.co/guide/en/logstash/6.1/logstash-monitoring-ui.html for more information about the Logstash monitoring ui and https://www.elastic.co/guide/en/logstash/6.1/installing-xpack-log.html for information on how to install and configure X-Pack for Logstash.
Apart from Logstash monitoring, you should of course also monitor the used resources on the systems you are running Logstash on. There are several ways to do this, for example with active monitoring solutions, such as Nagios, our passive monitoring solutions such as Elasticsearch with Metricbeat.
Once you know what the bottleneck is, you can go through https://www.elastic.co/guide/en/logstash/6.1/performance-troubleshooting.html and tune Logstash settings or if necessary add more Logstash instances for distributing load.

Logstash cluster output to Elasticseach cluster without multicast

I want to run logstash -> elasticsearch with high availability and cannot find an easy way to achieve it. Please review how I see it and correct me:
Goal:
5 machines each running elasticsearch united into a single cluster.
5 machines each running logstash server and streaming data into elasticsearch cluster.
N machines under monitoring each running lumberjack and streaming data into logstash servers.
Constraint:
It is supposed to be run on PaaS (CoreOS/Docker) so multi-casting
discovery does not work.
Solution:
Lumberjack allows to specify a list of logstash servers to forward data to. Lumberjack will randomly select the target server and switch to another one if this server goes down. It works.
I can use zookeeper discovery plugin to construct elasticsearch cluster. It works.
With multi-casting each logstash server discovers and joins the elasticsearch cluster. Without multicasting it allows me to specify a single elasticsearch host. But it is not high availability. I want to output to the cluster, not a single host that can go down.
Question:
Is it realistic to add a zookeeper discovery plugin to logstash's embedded elasticsearch? How?
Is there an easier (natural) solution for this problem?
Thanks!
You could potentially run a separate (non-embedded) Elasticsearch instance within the Logstash container, but configure Elasticsearch not to store data, maybe set these as the master nodes.
node.data: false
node.master: true
You could then add your Zookeeper plugin to all Elasticsearch instances so they form the cluster.
Logstash then logs over http to the local Elasticsearch, who works out where in the 5 data storing nodes to actually index the data.
Alternatively this Q explains how to get plugins working with the embedded version of Elasticsearch Logstash output to Elasticsearch on AWS EC2

Resources