Monitoring health of logstash - ruby

I am going to be using Logstash to send a high volume of events to a broker. I have monitoring on the broker to check its health status, but I can't find much information on how to tell whether the Logstash process itself is healthy, or what the indicators of a failing process are.
For those of you who use Logstash: what are some ways you monitor it?

You can have a cronjob inject a heartbeat message and route such messages to some kind of monitoring system. If you already use Elasticsearch you could use it for this as well, writing a script to ensure that you have reasonably recent heartbeat messages from all hosts that should be sending them, but I'd prefer using e.g. Nagios or lovebeat-go.
This could be used to monitor the health of a single Logstash instance (i.e. you inject the heartbeat message into the same instance that feeds the monitoring software) but you could just as well use it to check the overall health of the whole pipeline.

Update: This got built into Logstash in 2015. See the announcement of the Logstash heartbeat plugin.
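For example, a minimal sketch of the built-in heartbeat input; the interval, message, and index name here are illustrative, not a recommended setup:

    input {
      heartbeat {
        interval => 10        # emit one heartbeat event every 10 seconds
        message  => "ok"
      }
    }

    output {
      # Route heartbeats wherever your monitoring reads from; an
      # Elasticsearch index is just one option.
      elasticsearch {
        hosts => ["localhost:9200"]
        index => "heartbeats"
      }
    }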

If you're trying to monitor Logstash as a shipper, it's easy to write a script that compares the contents of the .sincedb* file to the actual file on disk to make sure they're in sync.
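A rough Ruby sketch of that idea, assuming the older four-column sincedb format (inode, major device, minor device, offset) and a single tailed file; the paths and lag threshold are hypothetical:

    # Compare the sincedb offset for a tailed file against its current size.
    SINCEDB = File.expand_path("~/.sincedb_app")  # hypothetical sincedb path
    LOGFILE = "/var/log/app.log"                  # hypothetical tailed file

    File.foreach(SINCEDB) do |line|
      inode, _major, _minor, offset = line.split
      next unless File.stat(LOGFILE).ino == inode.to_i
      lag = File.size(LOGFILE) - offset.to_i
      puts "#{LOGFILE}: #{lag} bytes not yet shipped"
      abort "CRITICAL: shipper is falling behind" if lag > 50 * 1024 * 1024
    end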
As an indexer, I'd probably skip ahead and query Elasticsearch for the number of documents being inserted.
@magnus' idea of a latency check is also good. I've used the log's timestamp and compared it to Elasticsearch's timestamp to compute the latency.
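One way to approximate that check is to compare the newest indexed event's @timestamp to the wall clock. A sketch in Ruby, assuming a standard logstash-* index (the host and index pattern are illustrative):

    require "net/http"
    require "json"
    require "time"

    # Ask Elasticsearch for the most recently indexed event...
    uri = URI("http://localhost:9200/logstash-*/_search")
    req = Net::HTTP::Post.new(uri, "Content-Type" => "application/json")
    req.body = { size: 1, sort: [{ "@timestamp" => "desc" }] }.to_json
    res = Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }

    # ...and compare its timestamp to the current time.
    hit = JSON.parse(res.body)["hits"]["hits"].first
    latest = Time.parse(hit["_source"]["@timestamp"])
    puts "indexing latency: #{(Time.now - latest).round(1)}s"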

Related

How to do Real Time Alerting in ELK

We have ELK (+X-Pack) as the syslog server for our network devices (source/destination IP and port). I'm trying to implement real-time alerting that fires when the source_ip field equals a specific IP address. How can I accomplish this with ELK?
I tried to do it with Watcher, but it isn't real-time, and low check intervals may cause performance problems(?).
Note: the log rate is ~500 logs per second.
If Watcher is not fast enough, then you need something that fires the moment the data comes in. Ingest pipelines can't execute external actions, but if you have Logstash in place, you can clone (https://www.elastic.co/guide/en/logstash/current/plugins-filters-clone.html) the relevant event and issue an alert via email (https://www.elastic.co/guide/en/logstash/current/plugins-outputs-email.html) or whatever suits you.
That way the original event still lands in Elasticsearch, and the clone can be processed in a separate alert pipeline.
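A sketch of that idea; the IP address and recipient are hypothetical, and note that the clone filter marks its copies via the type field (or tags, in ECS-compatibility mode):

    filter {
      if [source_ip] == "10.0.0.42" {      # hypothetical IP to alert on
        clone {
          clones => ["alert"]              # the copy gets type => "alert"
        }
      }
    }

    output {
      if [type] == "alert" {
        email {
          to      => "noc@example.com"     # hypothetical recipient
          subject => "Alert: matched source_ip %{source_ip}"
          body    => "Event: %{message}"
        }
      } else {
        elasticsearch { hosts => ["localhost:9200"] }
      }
    }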

Why use Beats if I can post directly to Elasticsearch?

Recently I have been reading up on the Elastic Stack and found out about this thing called Beats, which are basically lightweight shippers.
So the question is: if my service can post directly to Elasticsearch, do I actually need Beats? From what I know, they're just kind of a proxy (?).
Hopefully my question is clear enough.
Not sure which Beat you are specifically referring to, but let's take Filebeat as an example.
Suppose application logs need to be indexed into Elasticsearch. The options are:
1. Post the logs directly to Elasticsearch.
2. Save the logs to a file, then use Filebeat to index them.
3. Publish the logs to an AMQP service like RabbitMQ or Kafka, then use Logstash input plugins to read from RabbitMQ or Kafka and index into Elasticsearch.
Option 2 Benefits
Filebeat ensures that each log message is delivered at least once. It achieves this by storing the delivery state of each event in its registry file: if the configured output is blocked and has not confirmed all events, Filebeat keeps trying to send them until the output acknowledges that it has received them.
Before shipping data to Elasticsearch, we can also do some additional processing or filtering: drop some logs based on text in the log message, or add an extra field (e.g. an application name, so that multiple applications' logs can go into a single index and be filtered by application on the consumption side).
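For example, a minimal filebeat.yml sketch that tags events with an application name and drops debug noise before shipping (the paths and field names are illustrative):

    filebeat.inputs:
      - type: log
        paths:
          - /var/log/myapp/*.log    # hypothetical log location
        fields:
          app_name: myapp           # lets several apps share one index
        fields_under_root: true

    processors:
      - drop_event:
          when:
            contains:
              message: "DEBUG"      # drop noisy debug lines

    output.elasticsearch:
      hosts: ["localhost:9200"]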
Essentially, Beats provide a reliable way of indexing data without adding much overhead to the system, since they are lightweight shippers.
Option 3 provides the same benefits as Option 2. It is more useful when we want to ship the logs directly to an external system instead of storing them in a file on the local machine, e.g. for applications deployed on Docker/Kubernetes, where we may not have much access to the local filesystem or enough space to store files on it.
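A minimal sketch of Option 3 on the Logstash side, reading from Kafka and indexing into Elasticsearch (the broker, topic, and index names are illustrative):

    input {
      kafka {
        bootstrap_servers => "kafka:9092"
        topics            => ["app-logs"]
      }
    }

    output {
      elasticsearch {
        hosts => ["localhost:9200"]
        index => "app-logs-%{+YYYY.MM.dd}"
      }
    }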
Beats are good as lightweight agents for collecting streaming data like log files, OS metrics, etc., where you need some sort of agent to collect and send. If you have a service that wants to put things into Elasticsearch, then yes, by all means it can just use the REST/Java/etc. APIs directly.
Filebeat offers a way to centralize live logs from multiple servers.
Let's say you are running multiple instances of an application on different servers, and they are all writing logs.
You can ship all these logs to a single Elasticsearch index and analyze or visualize them from there.
A single static file doesn't need Filebeat to be moved into Elasticsearch.

Multiple Logstash instances vs Filebeats

I'm trying to establish the best architecture for our Elastic Stack implementation.
We have two distinct networks (let's call them internal and external) and several web/db/application servers (approx. 10) on each of these networks.
I would like to consume IIS logs, our RabbitMQ messages and some other bits and bobs from machines on both networks and send them to a single server on the internal network, where my Elasticsearch and Kibana installation is located.
For the servers on both the internal and external networks I can see two main ways to get the logs into Elasticsearch:
1. Set up Logstash on each server and send the output to the Elasticsearch server on the internal network.
2. Set up Filebeat on each server and send the logs to a single server running Logstash (this could be the same box that hosts Elasticsearch and Kibana).
I'm unsure of the pros and cons of these approaches at the moment. I believe the correct approach is to use Filebeat, but I'm not sure why I wouldn't just put Logstash in multiple places, since that seems like it would better distribute the processing of the logs.
Then again, perhaps having one Logstash instance with 20-30 inputs isn't a problem?
Interested in any thoughts or guidance in this area.
From what I read in the documentation, Logstash is much more demanding in terms of memory than Filebeat, especially if you do some kind of processing on the logs (like grok parsing): Logstash means running at least a JVM (with JRuby). Filebeat's footprint should be much smaller, since it's optimized for shipping logs (I never used it, so I can't say for sure).
Running Logstash on every server also complicates any update you would want to make to the Logstash instances or their configurations.
For a centralized Logstash, the advantage is that it is easy to change the address of the Elasticsearch instance, redirect to a cache like Redis, or add another output. I also found that Logstash (in version 2.x) required frequent restarts, which is easier to handle when you only have one instance to deal with.
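A sketch of what that centralized instance might look like, with every Filebeat agent pointing at a single beats input (the port, grok pattern, and host name are illustrative):

    input {
      beats {
        port => 5044      # every Filebeat agent points here
      }
    }

    filter {
      grok {
        match => { "message" => "%{GREEDYDATA:msg}" }  # placeholder pattern
      }
    }

    output {
      # Swapping Elasticsearch for a Redis buffer is a one-file change here.
      elasticsearch { hosts => ["internal-elastic:9200"] }
    }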
I have never used Logstash with multiple inputs, so I can't say.
At the job where I was responsible for a log centralisation system, we used Beaver (a Filebeat equivalent) to ship the logs to a Redis server, and we had two or three Logstash servers sending everything to Elasticsearch. All of the comments above come from that period.

When Logstash sends ACK to input source

I have read about Filebeat's at-least-once delivery commitment, and what I understand is that until Filebeat receives an ACK for a sent log line, that line will be sent again (e.g. after a Filebeat restart).
Now suppose that in my solution I am using Filebeat, Logstash, and one other component that Logstash uses for filtering, and that after filtering Logstash sends the line to Elasticsearch.
Here are the checkpoints where we can lose data:
1. Filebeat shuts down without receiving an ACK from Logstash. In this case we know the line will be sent again by Filebeat.
2. Filebeat sends a line, Logstash applies its filtering with the external component, and just as Logstash tries to send the result to Elasticsearch, Logstash/Elasticsearch crashes. Do we lose this data?
My question is:
Basically, Logstash processes data in the sequence below:
INPUT --> FILTER --> OUTPUT
So I want to know at which step Logstash sends the ACK back to Filebeat. Basically, I want to understand how and when the ACKs are sent. I tried searching Google and the official Elastic websites but didn't find the details.
Can somebody help me understand this?
Thanks in advance.
The input will ACK when it pushes the events onto the internal queue for the pipeline workers. That's when the input plugin's thread considers the event completed.
What happens with the pipeline workers depends on your configuration. If you have persistent queues configured and enabled, those events will be picked up again once Logstash restarts and no data should be lost (if it is, that's a bug). If you don't have persistent queues, that data will be lost.
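A minimal logstash.yml sketch for enabling the persistent queue (the path and size cap are illustrative; by default the queue lives under path.data):

    queue.type: persisted                 # default is "memory"
    path.queue: /var/lib/logstash/queue   # where queued events are stored
    queue.max_bytes: 1gb                  # disk cap before back pressure kicks in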

Filebeat vs Rsyslog for forwarding logs

I am currently using Filebeat to forward logs to Logstash and then on to Elasticsearch.
Now I am thinking about forwarding logs to Logstash with rsyslog instead. The benefit would be that I would not need to install and configure Filebeat on every server, and I could also forward logs in JSON format, which is easy to parse and filter.
I can use TCP/UDP to forward logs to Logstash with rsyslog.
I want to know the benefits and drawbacks of rsyslog compared to Filebeat, in terms of performance, reliability and ease of use.
When you couple Beats with Logstash you get something called back-pressure management: Beats will stop flooding the Logstash server with messages if something goes wrong on the network, for instance.
Another advantage of using Beats is that Logstash can use persistent queues, which prevent you from losing log messages if your Elasticsearch cluster goes down: Logstash will persist the messages on disk. Be careful, though: Logstash can't ensure you won't lose messages if you are using UDP; this link will be helpful.
rsyslog has in-memory and disk queues; those should take care of buffering messages.
Rsyslog queue-modes
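For example, a disk-assisted queue in front of a TCP forward to Logstash might look like this, using the legacy rsyslog directive syntax (the host, port, and sizes are illustrative):

    # In-memory queue that spills to disk when Logstash is unreachable
    $ActionQueueType LinkedList
    $ActionQueueFileName fwd_logstash    # a file name enables disk assistance
    $ActionQueueMaxDiskSpace 1g
    $ActionQueueSaveOnShutdown on
    $ActionResumeRetryCount -1           # retry forever
    *.* @@logstash.example.com:5000      # @@ = TCP, @ = UDP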
