Is Elastic/Metricbeat suitable for process monitoring and alerting?

Do you use Elastic and Metricbeat for process monitoring and alerting? How did you configure your data gathering and alerting?
I am currently trying to set this up, and I am running into some basic issues. These issues are making me question whether Elastic is a suitable tool for alerting. Here is my planned setup:
Use Metricbeat to gather process data
Create an Elastic dashboard/Lens for certain processes
If the process.cpu.start_time reported by Metricbeat is very recent (e.g. the process has only been running for under 5 minutes), alert!
I have been working my way through this using the following approach:
Metricbeat reports process.cpu.start_time as a text string in ISO date format, and Lens queries are very limited with dates.
Workaround: use Logstash to create a filter field process.cpu.start_epoch, which is an integer: the Unix epoch, i.e. "seconds since January 1, 1970" (a sketch of such a filter is below).
Create a dashboard Lens, querying only my process and only the latest metric. This works and gives me "the time that the process started, as a Unix epoch".
I next need to calculate the time difference between now and that integer. However, I don't see anything in the Lens documentation about doing date math. So I'm stuck.
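For reference, the Logstash workaround was along these lines; a minimal sketch, assuming the event carries the nested field [process][cpu][start_time] (the exact field path may differ in your pipeline):

  filter {
    ruby {
      # Parse the ISO 8601 start time and store it as integer seconds since the epoch.
      code => "
        require 'time'
        start = event.get('[process][cpu][start_time]')
        event.set('[process][cpu][start_epoch]', Time.parse(start).to_i) if start
      "
    }
  }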
The difficulties I am encountering are making me wonder if I am "doing it wrong"? Is Elastic/Metricbeats a suitable tool for what I am trying to achieve?

Answer: find the right hammer!
What I needed is called "Elastic runtime fields". There's a step-by-step writeup here: https://elastic-content-share.eu/elastic-runtime-field-example-repository/
Summary:
open the index pattern
click the "dots" menu
choose "add field to index pattern"
set the output field name as desired; for me this is process.cpu.start.age
set the output type; for me this is "long"
write your script in Painless; for me this is:
emit(new Date().getTime() - doc['process.cpu.start_time'].value.toEpochMilli());
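With that runtime field in place, the original alert condition becomes a plain numeric comparison. Assuming the field holds milliseconds as above, a Lens filter or alert threshold for "started less than 5 minutes ago" can be as simple as:

  process.cpu.start.age < 300000

(300000 ms = 5 minutes; the field name is just the one I chose above.)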
PS: I deleted my logstash filters, because they were superfluous.

Related

How to monitor log files and set up an alert based on keywords using Elastic Search?

Recently I started to learn Elasticsearch, and I found that it provides several monitoring and alerting functions. For example, users can create a watcher to monitor a certain metric; an alert will be triggered if it exceeds the threshold.
Meanwhile, Elasticsearch is designed for full-text search.
I wonder whether I can monitor log files and set up an alert based on keywords? For example, an alert triggered by the keyword "test": if the incoming log contains the word "test", the alert will be triggered.
If somebody has done anything related or has a clue about this, please give me a hint!
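This is the sort of thing Watcher can do. A minimal sketch of such a watch, assuming logs are indexed into logs-* with the text in a message field (both names are assumptions to adapt):

  PUT _watcher/watch/keyword_alert
  {
    "trigger": { "schedule": { "interval": "1m" } },
    "input": {
      "search": {
        "request": {
          "indices": [ "logs-*" ],
          "body": {
            "query": {
              "bool": {
                "filter": [
                  { "match": { "message": "test" } },
                  { "range": { "@timestamp": { "gte": "now-1m" } } }
                ]
              }
            }
          }
        }
      }
    },
    "condition": { "compare": { "ctx.payload.hits.total": { "gt": 0 } } },
    "actions": {
      "log_match": {
        "logging": { "text": "keyword 'test' matched {{ctx.payload.hits.total}} times in the last minute" }
      }
    }
  }

The range clause restricts each run to the last minute, so the same log line does not keep firing the alert.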

Connecting NiFi to ElasticSearch

I'm trying to solve one task and will appreciate any help: links to documentation, forums, or other FAQs besides https://cwiki.apache.org/confluence/display/NIFI/FAQs, or any meaningful answer in this post =).
So, I have the following task:
The initial part of my system collects data every 5-15 minutes from different DB sources. Then I remove duplicates and junk, combine data from different sources according to some logic, and redirect it to the second part of the system as several streams.
As far as I know, NiFi can do this task in the best way =).
Currently I can successfully get information from InfluxDB with the GetHTTP processor. However, I can't configure the same kind of processor, with all the necessary options, for getting information from Elasticsearch. I'd like to receive data every 5-15 minutes for the time period from "now minus 5-15 minutes" to "now" (depending on the scheduler period), with several additional filters. If I understand it right, this can be achieved either by a subscription to "_index" or by regular requests to the DB at the desired interval.
I know that NiFi has several processors designed specifically for Elasticsearch (FetchElasticsearch5, FetchElasticsearchHttp, QueryElasticsearchHttp, ScrollElasticsearchHttp) as well as the GetHTTP and PostHTTP processors. Unfortunately, however, I lack information - or better, examples - of how to configure their Properties for my purposes =(.
What's the difference between FetchElasticsearchHttp and QueryElasticsearchHttp? Which one fits my task better? What's the difference between GetHTTP and QueryElasticsearchHttp besides several specific fields? Will GetHTTP perform the same way if I tune it as needed?
Any advice?
I will be grateful for any help.
The ElasticsearchHttp processors try to make it easier to interact with ES by generating the appropriate REST API call based on the properties you set. If you know the full URL you need, you could use GetHttp or InvokeHttp. However the ESHttp processors let you put in just the stuff you're looking for, and it will generate the URL and return the results.
FetchElasticsearch (and its variants) is used to get a particular document when you know the identifier. This is sometimes used after a search/query, to return documents one at a time after you know which ones you want.
QueryElasticsearchHttp is for when you want to do a Lucene-style query of the documents, when you don't necessarily know which documents you want. It will only return up to the value of index.max_result_window for that index. To get more records, you can use ScrollElasticsearchHttp afterwards. NOTE: QueryElasticsearchHttp expects a query that will work as the "q" parameter of the URL. This "mini-language" does not support all fields/operators (see here for more details).
For your use case, you likely need InvokeHttp in order to issue the kind of query you describe. This article describes how to issue a query for the last 15 minutes. Once your results are returned, you might need some combination of EvaluateJsonPath and/or SplitJson to work with the individual documents, see the Elasticsearch REST API documentation (and NiFi processor documentation) for more details.
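For illustration, the kind of request body such an InvokeHttp call could POST to http://<es-host>:9200/<your-index>/_search; the index name and timestamp field here are assumptions:

  {
    "query": {
      "range": {
        "@timestamp": {
          "gte": "now-15m",
          "lte": "now"
        }
      }
    }
  }

Your additional filters can sit alongside the range clause inside a bool query, and scheduling the processor every 15 minutes gives you non-overlapping windows.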

Logstash aggregation based on 'temporary id'

I'm not sure if this sort of aggregation is best done after being indexed by Elasticsearch, or if Logstash is a good place to do it.
We are logging information about commands run against a server. Each set of metrics regarding a single command is logged as a single log event, and there are multiple 'metric sets' per command. Each metric is its own document type in ES (currently, at least). So we will have multiple events across multiple documents regarding one command run against the server.
Each of these events will have a 'cmdno' field, which is a temporary id given to the command we are logging about. Once the command has finished, with all events logged, the 'cmdno' may be reused for other commands.
Is it possible to use the Logstash 'aggregate' plugin (or any plugin) to link the events of a single command together using the 'cmdno'?
All events that pertain to a single command will have the same timestamp + cmdno. I would like to add a UUID to the events as a permanent unique id for that command, so that a single query will give us all events for that single command.
Was thinking along the lines of:
if [cmdno] {
  aggregate {
    task_id => "%{cmdno}"
    # Mint one UUID per command and stamp it on every event in the task.
    code => "
      require 'securerandom'
      map['cmdid'] ||= SecureRandom.uuid
      event['cmdid'] = map['cmdid']
    "
    timeout => 60  # end the task so the cmdno can be reused by a later command
  }
}
Just started learning the ELK stack, so I'm not entirely sure of the programming constructs Logstash affords me yet.
I don't know if there is a better way to relate these events; this seemed the most suitable for our needs. If there are more ELK'y methods, please let me know. They do need to stay as separate documents of different types, though.
Any help much appreciated, let me know if I am missing anything.
Cheers,
Brett

Retain tag/field across events in logstash 1.5

I'm using logstash 1.5 to analyze logs.
I want to track two events which occur one after the other.
So I would like to set a flag/field/tag when the first event occurs and retain the value across events.
I looked at this link, but it looks like grep and drop are not supported in Logstash 1.5.
Is there a way of achieving this?
The closest you can get with Logstash is the elapsed{} filter. You could use that code as a basis for your own filter if it doesn't meet your needs. I also run some external (Python) post-processing to do more than elapsed{} can (or should) do.
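For reference, a minimal elapsed{} configuration; the tag names and correlation field are assumptions that must match whatever your first and second events carry:

  filter {
    elapsed {
      start_tag => "first_event"        # tag present on the first event
      end_tag => "second_event"         # tag present on the follow-up event
      unique_id_field => "request_id"   # field both events share
      timeout => 300                    # seconds to wait for the second event
    }
  }

When a pair matches, the second event is tagged and given an elapsed_time field holding the seconds between the two.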

Solution for graphing application events metrics in real time

We have an application that parses tweets, and we want to see the activity in real time. We have tried several solutions without success. Our main problem is that the graphing solution (example: Graphite) needs a continuous flow of metrics, and when the DB aggregates the metrics it performs an average, not a sum.
We recently saw Cube from Square, which would fit our requirements, but it's too new.
Any alternatives?
I found the solution in the latest version of Graphite:
http://graphite.readthedocs.org/en/latest/config-carbon.html#storage-aggregation-conf
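In short, storage-aggregation.conf lets you choose how carbon rolls data points up into coarser retention levels. A minimal entry along the lines of the documented example, which sums (rather than averages) any metric whose name ends in .count:

  [sum_counts]
  pattern = \.count$
  xFilesFactor = 0
  aggregationMethod = sum

xFilesFactor = 0 keeps the aggregate even when only a few data points exist in the interval, which suits sparse event counts like tweets.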
If I understood correctly, you cannot feed Graphite in real time, for instance as soon as you discover a new tweet?
If that's the case, it looks like you can specify a Unix timestamp when updating Graphite (metric_path value timestamp\n), so you could pass in the time of discovery/publication/whatever, regardless of when you process it.
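For example, over carbon's plaintext protocol (port 2003 by default; the metric name is an assumption):

  echo "tweets.discovered 1 1362468000" | nc graphite-host 2003

Here 1362468000 is the Unix timestamp of when the tweet was discovered, not when the line was sent.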
