Tuning Logstash performance - elasticsearch

I use Logstash to connect Elasticsearch and ntopng (a flow collector),
but many flows are being dropped, so I think the bottleneck is Logstash, because the machine has 20 GB of RAM and 8 CPU cores.
I am not sure which parameters I should edit in logstash.yml to tune Logstash.
Thank you in advance!

One step towards solving your problem is to set up proper Logstash monitoring. A good way to achieve this is to install X-Pack, which provides Logstash monitoring in the X-Pack monitoring UI in Kibana.
Please refer to https://www.elastic.co/guide/en/logstash/6.1/logstash-monitoring-ui.html for more information about the Logstash monitoring UI, and https://www.elastic.co/guide/en/logstash/6.1/installing-xpack-log.html for information on how to install and configure X-Pack for Logstash.
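For reference, a minimal sketch of what the relevant logstash.yml settings might look like in the 6.x era, assuming X-Pack is installed; the host, username, and password below are placeholders you would replace with your own:

    # logstash.yml -- X-Pack monitoring settings (Logstash 6.x)
    xpack.monitoring.enabled: true
    # Placeholder address; point this at the cluster that stores monitoring data.
    xpack.monitoring.elasticsearch.url: "http://localhost:9200"
    xpack.monitoring.elasticsearch.username: "logstash_system"
    xpack.monitoring.elasticsearch.password: "changeme"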
Apart from Logstash monitoring, you should of course also monitor resource usage on the systems you are running Logstash on. There are several ways to do this, for example with active monitoring solutions such as Nagios, or passive monitoring solutions such as Elasticsearch with Metricbeat.
Once you know what the bottleneck is, you can go through https://www.elastic.co/guide/en/logstash/6.1/performance-troubleshooting.html and tune the Logstash settings, or if necessary add more Logstash instances to distribute the load.
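As a rough illustration, these are the pipeline settings in logstash.yml that the performance-troubleshooting guide has you experiment with; the values below are starting points to measure against, not recommendations, and note that the JVM heap itself is set in jvm.options rather than logstash.yml:

    # logstash.yml -- pipeline settings commonly tuned for throughput
    pipeline.workers: 8        # defaults to the number of CPU cores
    pipeline.batch.size: 250   # events per worker batch (default is 125)
    pipeline.batch.delay: 50   # ms to wait for a batch to fill before flushing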

Related

Elastic Uptime monitors using Heartbeat -- a few monitors are missing in Kibana

I have an ELK setup on an EC2 server, with Beats such as Metricbeat, Filebeat, and Heartbeat.
I have set up Elastic APM for some applications like Jenkins and SonarQube.
Now in Uptime I can only see a few monitors, such as SonarQube and Jenkins.
The other applications are missing.
When I look at yesterday's data, it is not available in Elasticsearch for a particular application.
The best way to troubleshoot what is going on is to check whether the events from Heartbeat are being collected. The Uptime application only displays events from Heartbeat, so this is the Beat that you need to check.
First, check the connectivity of Heartbeat and the configured output:
heartbeat test output
Second, check whether events are being generated. You can do this by commenting out your existing output (likely Elasticsearch/Elastic Cloud) and enabling either the console output or the file output. Then start Heartbeat and check whether events are being generated. If they are, the problem might be on the backend side; maybe Elasticsearch is rejecting the documents sent and refusing to index them.
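As an illustration, a minimal heartbeat.yml sketch of that debugging step might look like this (the host is a placeholder); Beats allow only one output at a time, which is why the real output has to be commented out first:

    # heartbeat.yml -- temporarily divert events to the console for debugging
    # output.elasticsearch:          # the real output, commented out for the test
    #   hosts: ["localhost:9200"]
    output.console:                  # print generated events to stdout instead
      pretty: true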
As an aside, Elastic is implementing a native Jenkins plugin that allows you to observe your CI pipeline using OpenTelemetry-compatible backends such as Elastic APM. You can learn more about this plugin here.

Metricbeat agent running on ELK cluster?

Does Metricbeat always need an agent running separately from the ELK cluster, or does it provide a plugin/agent/approach to run Metricbeat on the cluster side?
If I understand your question correctly, you want to know whether there is a way to monitor your cluster without installing a Beat.
You can enable monitoring in the Stack Monitoring tab of Kibana.
If you want more, Beats are standalone shippers that can be plugged into Logstash or Elasticsearch.
The latest versions of the Elastic Stack (formerly known as ELK) offer more centralized configuration in Kibana, and version 7.9 introduced a unified Elastic Agent (in beta) that gathers several Beats into one and lets you manage your "fleet" of agents within Kibana.
But the information your Beats collect (CPU, RAM, logs, etc.) does not come from Elastic itself,
so you will still have to install a daemon on your system.
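For example, if you only want the cluster's own self-monitoring data without any Beat, a single legacy setting is enough. A minimal sketch, assuming you can edit elasticsearch.yml (the same setting can also be applied dynamically through the cluster settings API):

    # elasticsearch.yml -- enable (legacy) self-monitoring collection
    xpack.monitoring.collection.enabled: true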

How to configure Filebeat for a Logstash cluster environment?

I am missing something very basic when I think about how Filebeat should be configured in a clustered Logstash setup.
As per the article
https://www.elastic.co/guide/en/logstash/current/deploying-and-scaling.html
and this architecture diagram,
I think that there is some kind of load balancer in front of the Logstash cluster. However, the Filebeat output documentation suggests that an array of all the Logstash nodes must be specified. Using this list of nodes, Filebeat will do the load balancing from the client side.
Also, as per this GitHub issue, there is no native Logstash clustering available yet.
So my question is: what kind of setup do I need to be able to point my multiple Filebeats at one Logstash service endpoint, without specifying the individual Logstash nodes in the cluster?
Is it possible?
Would having a load balancer in front of the Logstash cluster be of any help?
Thanks,
Manish
Since the Logstash clustering feature is still in the works and you don't want to specify all the Logstash hosts in every Beat's configuration, the only solution I see is to use a TCP load balancer in front of Logstash.
All your Beats would point to that load balancer endpoint, and you can manage the Logstash cluster behind the load balancer as you see fit. Be aware, though, that you are adding a hop (and hence latency) between your Beats and your Logstash cluster.
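To make the two options concrete, here is a hedged filebeat.yml sketch; all the host names are hypothetical:

    # filebeat.yml -- option A: client-side load balancing across known nodes
    output.logstash:
      hosts: ["logstash1:5044", "logstash2:5044", "logstash3:5044"]
      loadbalance: true   # otherwise Filebeat picks one host and only fails over

    # option B: a single endpoint backed by a TCP load balancer
    # output.logstash:
    #   hosts: ["logstash-lb.internal:5044"]

With option B, adding or removing Logstash nodes only requires changing the load balancer pool, not every Beat's configuration.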

What is the most beneficial way to gather server hardware utilization, app logs, and app JVM metrics using the Elastic Stack?

Besides the standard ELK goal of gathering application log data, I want to leverage this stack for advanced data collection such as JVM metrics (via JMX) and the host's CPU/RAM/disk/network utilization.
The most suitable tool seemed to be Metricbeat, but I doubt whether Metricbeat alone is enough for the purposes described above.
Since I am aiming at a minimal stack of things to configure, will Metricbeat-Elasticsearch-Kibana be enough for collecting app logs, app JVM metrics, and the host's hardware utilization, or are there more suitable alternatives?
UPDATE
I see now that I also need Filebeat besides Metricbeat for gathering app logs.
Is there any out-of-the-box single solution that combines the Filebeat and Metricbeat agents?
Currently Filebeat and Metricbeat are separate binaries, and you need to run both:
Filebeat to collect your logs (and potentially parse them with an Elasticsearch ingest node);
Metricbeat with the system module for CPU/RAM/disk/network, and we also have a JMX/Jolokia module for the JVM metrics.
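As a sketch of how that could look in Metricbeat's configuration; the Jolokia endpoint, MBean, and field names are placeholders for your own setup:

    # metricbeat.yml -- system metrics plus JVM metrics via the Jolokia module
    metricbeat.modules:
      - module: system
        metricsets: ["cpu", "memory", "network", "filesystem"]
        period: 10s
      - module: jolokia
        metricsets: ["jmx"]
        period: 10s
        hosts: ["localhost:8778"]    # Jolokia agent endpoint on the JVM (placeholder)
        namespace: "jvm"
        jmx.mappings:
          - mbean: "java.lang:type=Memory"
            attributes:
              - attr: HeapMemoryUsage
                field: memory.heap

This assumes the JVMs expose JMX through a Jolokia agent, which is what Metricbeat's jolokia module talks to.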

Why do we need Filebeat when we can ship logs to Logstash?

Hi, as a newbie to Elastic I have a doubt about why we need Filebeat to ship logs to Elasticsearch (ES) or Logstash.
As far as I know, we can directly read logs from files and send them to Logstash, and from there to ES. If the former is possible, why do we need Filebeat as an intermediary layer between the logs and Logstash?
What I knew: xyzlogfile--->logstash-file--->ES--->kibana
Why do we need Filebeat in between: xyzlogfile--->fileBeat--->logstash-file--->ES--->kibana
I assume you are talking about the File Input Plugin vs. Filebeat.
Some points to note:
Logstash is much heavier in terms of memory and CPU usage than Filebeat. It requires a JVM, which might be fine if you deploy Java software, but for many projects a JVM is unnecessary overhead. Filebeat is just a light native executable.
You might not need Logstash at all:
If your logs are JSON.
If you don't need any parsing and you are OK with the timestamps generated by Filebeat. ([EDIT 2021-01-01] Filebeat has various processors and can even do arbitrary script execution, via a pure-Go implementation of ECMAScript 5.1: https://www.elastic.co/guide/en/beats/filebeat/current/processor-script.html; see the sketch after this list.)
If you only need simple regex parsing (e.g. a grok filter), you can just use Ingest Nodes (https://www.elastic.co/guide/en/elasticsearch/reference/5.0/ingest.html).
For more complex parsing, event cloning, or grouping, Logstash will probably be needed. Writing a Ruby filter, for example, is super easy, and you can prototype fast. For optimizing very high production loads you might need to write a custom filter plugin, or perhaps you could try writing a custom processor to be used with Ingest Nodes (I haven't tried that yet, but I can tell you that writing a custom Logstash filter is pretty straightforward).
All the above points relate to ingesting file contents, but Logstash has many input/output plugins that you might need, and they are only available with Logstash.
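To illustrate the script processor mentioned in the edit above, here is a minimal hedged sketch; the field check and tag are made up for the example:

    # filebeat.yml -- script processor (pure-Go ECMAScript 5.1, no JVM required)
    processors:
      - script:
          lang: javascript
          source: |
            function process(event) {
              // hypothetical rule: tag events whose message mentions ERROR
              var msg = event.Get("message");
              if (msg && msg.indexOf("ERROR") !== -1) {
                event.Tag("error");
              }
            }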
If all your files are located on the same node as the Logstash process, then using the File Input Plugin could be an option ("xyzlogfile--->logstash-file--->ES--->kibana").
However, for most deployments you want to collect data from many nodes with different roles and software stacks deployed on them. You do not want to deploy a Logstash instance on all those nodes, so "xyzlogfile--->fileBeat--->logstash-beats--->ES--->kibana" should be used (another option is "xyzlogfile--->fileBeat--->ES--->kibana" with an Ingest Node).
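For completeness, the second path might look like this on the Filebeat side; the log path and Logstash host below are placeholders:

    # filebeat.yml -- the "xyzlogfile--->fileBeat--->logstash-beats" leg
    filebeat.inputs:
      - type: log
        paths:
          - /var/log/xyz/*.log            # placeholder path
    output.logstash:
      hosts: ["logstash.internal:5044"]   # placeholder host

The same Filebeat could instead point output.elasticsearch at the cluster for the Ingest Node variant.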
Based on Mastering Elastic Stack by Packt:
Beats are data shippers, shipping data from a variety of inputs such as files, data streams, or logs, whereas Logstash is a data parser. Though Logstash can ship data, that is not its primary usage.
Logstash consumes a lot of memory and requires a higher amount of resources, whereas Beats require fewer resources and consume little memory.
