How to approach metrics and alerting for produced and consumed messages in Grafana with Elasticsearch

I have a problem with creating metrics and later triggering alerts based on those metrics. I have two data sources, both Elasticsearch. One contains documents (logs from a service) saying that a message was produced to Kafka; the second contains documents (also logs from a service) saying that a message was consumed. What I want to achieve is to trigger an alert if the ratio of produced to consumed messages drops below 1.
Unfortunately it is impossible to use Prometheus, for two reasons:
1) the counter resets each time the service is restarted;
2) the second service doesn't have (and won't have in a reasonable time) Prometheus integration.
The question is how to approach metrics and alerting based on these data sources. Is it possible? Maybe there is another way to achieve my goal?

The question is somewhat generic (no mapping or code is provided), so I'll outline an approach.
You can use a watcher on top of an aggregation that you create.
It's relatively straightforward to compute a consumed/produced percentage, and based on that percentage you can trigger an alert via the watcher.
Take a look at this tutorial (official Elasticsearch channel) on how to do this. Also check the tutorials for your specific version of Elasticsearch: setting up alerts improved significantly between 5.x and 7.x. This means that on 7.x you might be able to do this via the Kibana UI, while on 5.x you'll probably need to add the alert by indexing JSON into the appropriate .watcher indices.
I haven't used Grafana, but I believe the same approach can be applied: you'll need an aggregation as mentioned before, and then you add the alert: https://grafana.com/docs/grafana/latest/alerting/rules/
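To make the ratio check concrete, here is a minimal Python sketch using the official elasticsearch client. The index names (kafka-producer-logs, kafka-consumer-logs), the 5-minute window, and the local cluster URL are assumptions for illustration; in practice the equivalent queries would live inside a watcher condition or a Grafana alert rule rather than a standalone script.

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://localhost:9200"])  # assumed local cluster

    # Count "produced" and "consumed" log documents over the last 5 minutes.
    # Index names and the @timestamp field are hypothetical placeholders.
    window = {"range": {"@timestamp": {"gte": "now-5m"}}}

    produced = es.count(index="kafka-producer-logs", body={"query": window})["count"]
    consumed = es.count(index="kafka-consumer-logs", body={"query": window})["count"]

    # Ratio of produced to consumed messages, as described in the question.
    ratio = produced / consumed if consumed else float("inf")

    if ratio < 1:
        # A real setup would notify here (email, Slack, ...); in a watcher
        # this corresponds to the "actions" section.
        print(f"ALERT: produced/consumed ratio dropped to {ratio:.2f}")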

Related

Getting Web Session Details by using Scripted Metric Aggregation

I've been reviewing different ways to aggregate log messages that have a start event but no end event. I've been struggling with the Logstash aggregate filter plugin not sorting correctly, and was looking at retrofitting an old entity-centric model built for a previous version of Elasticsearch (Entity-Centric Indexing - Mark Harwood | Elastic Videos) when I realized that Elasticsearch 7.13 transforms introduce the concept of 'latest', which (hopefully) removes my need for a bunch of external scripts to do this.
I am looking at the "Getting Web Session Details by using Scripted Metric Aggregation" sample Painless script (https://www.elastic.co/guide/en/elasticsearch/reference/current/transform-painless-examples.html#painless-web-session), which produces session details, including session duration. Because the logs do not have an end time, I need to use a timeout interval, something like a 30-minute window, for aggregating message events based on my group-by.
Is it possible to do this within the transform by adjusting that script, and could anyone help?
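To make the timeout idea concrete, here is a minimal Python sketch of the sessionization logic that the adjusted Painless script would need to express: sort one key's events by timestamp and start a new session whenever the gap to the previous event exceeds 30 minutes. The field name (@timestamp) and the sample events are hypothetical.

    from datetime import datetime, timedelta

    TIMEOUT = timedelta(minutes=30)  # assumed session timeout

    def sessionize(events):
        """Split one key's events (dicts with an ISO '@timestamp') into sessions,
        starting a new session when the gap between consecutive events exceeds
        TIMEOUT. Field names are hypothetical placeholders."""
        stamps = sorted(datetime.fromisoformat(e["@timestamp"]) for e in events)
        sessions, current = [], []
        for ts in stamps:
            if current and ts - current[-1] > TIMEOUT:
                sessions.append(current)
                current = []
            current.append(ts)
        if current:
            sessions.append(current)
        # Session duration = last event time minus first event time.
        return [{"start": s[0], "end": s[-1], "duration": s[-1] - s[0]} for s in sessions]

    # Example: two events 10 minutes apart, then one 2.5 hours later -> 2 sessions.
    events = [{"@timestamp": "2021-06-01T10:00:00"},
              {"@timestamp": "2021-06-01T10:10:00"},
              {"@timestamp": "2021-06-01T12:40:00"}]
    print(sessionize(events))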

Elasticsearch + Logstash - Indexing data from different sources when receiving an event

Good day,
I have gotten into a bit of a headache while working on indexing some data in Elasticsearch, and I have some questions about a good approach.
As of now, an event is received on a Kafka topic with just a part of the data that should be stored in the document. The rest of the data needs to be collected after the event is received and is available from different APIs. To reduce the amount of work, it seems that Logstash could be a good approach.
Is there a way to configure Logstash to initiate data collection from different APIs and DBs when an event is received, and then assemble the document with the combined data, or am I stuck with writing time-consuming custom logic for the indexing? I have searched around a bit, but couldn't find a good answer to the problem.
What you need in Logstash is to look up/enrich your message with data from external APIs, right?
You could use Logstash's http filter plugin.
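For comparison with the custom-logic route the question mentions, here is a minimal Python sketch that consumes an event from Kafka, enriches it from an external API, and indexes the assembled document; the topic, endpoint URL, field names, and index name are all hypothetical. A Logstash pipeline with a Kafka input and the http filter expresses the same flow declaratively.

    import json

    import requests                                         # external API lookups
    from kafka import KafkaConsumer                         # kafka-python
    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://localhost:9200"])           # assumed local cluster
    consumer = KafkaConsumer(
        "orders",                                           # hypothetical topic
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )

    for message in consumer:
        event = message.value                               # partial document from Kafka
        # Enrich with data from an external API (hypothetical endpoint and field).
        details = requests.get(
            f"https://internal-api.example.com/customers/{event['customer_id']}",
            timeout=5,
        ).json()
        # Assemble and index the combined document.
        es.index(index="orders-enriched", body={**event, "customer": details})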

Elastic search API Vs Spring data Vs logstash

I am planning to use Elasticsearch for our dashboard with Spring Boot based REST services. After some research I see the top 3 options:
Option A:
Use the Elasticsearch Java API (from the comments it looks like this is going away)
Use the Elasticsearch Java REST client
Use spring-data-elasticsearch (planning to use ES 5.6, but this is challenging for the latest ES 6 as I don't see support for it right now)
Option B:
Or shall I use the Logstash approach and
sync data between PostgreSQL and Elasticsearch using Logstash?
Which one among them will be the long-term approach for getting near real-time data from ES under high load?
Use case: I need to sync some data from a PostgreSQL table to Elasticsearch for my dashboard (near real time).
Updates are frequent for both the tables and ES, to maintain the current state.
Load is going to increase in a couple of weeks.
The options you listed boil down to: should you go with a ready-to-use solution (Logstash), or should you implement your own?
Try Logstash first to see if it works for you. It'll take less time than implementing your own solution, and you can get a working setup in minutes (if it's not hundreds of tables).
If you want near real time, then you need to figure out whether it allows you to:
handle incremental updates, i.e. whether its 'tracking_column' configuration will work for your data structure so that each run loads only updated records, not the whole table;
run at the desired frequency;
and, in general, satisfy your latency requirements.
If you decide to go with your own solution, keep in mind that spring-data-elasticsearch is a higher-level wrapper around the underlying Elasticsearch client. If there are latency goals, then working at the lower level (the Elasticsearch clients) may give you better control and more options to tune the pipeline.
Otherwise, the client choice will matter less than the data feed characteristics (volume/update frequency) and the DB/ES cluster configuration.
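If the ready-made route falls short, the do-it-yourself option can still follow the same tracking-column idea. Below is a minimal Python sketch (the table, columns, index name, and connection strings are hypothetical) that repeatedly pulls rows updated since the last run and bulk-indexes them; the Java REST client or spring-data-elasticsearch mentioned in the question would follow the same pattern.

    import time

    import psycopg2
    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(["http://localhost:9200"])       # assumed local cluster
    pg = psycopg2.connect("dbname=app user=app")        # assumed connection string

    last_seen = "1970-01-01 00:00:00"                   # watermark for the tracking column

    while True:
        with pg.cursor() as cur:
            # 'updated_at' plays the role of Logstash's tracking_column: only rows
            # changed since the last run are fetched, not the whole table.
            cur.execute(
                "SELECT id, name, status, updated_at FROM orders "
                "WHERE updated_at > %s ORDER BY updated_at",
                (last_seen,),
            )
            rows = cur.fetchall()

        if rows:
            helpers.bulk(es, (
                {"_index": "orders",
                 "_id": row[0],
                 "_source": {"name": row[1], "status": row[2], "updated_at": str(row[3])}}
                for row in rows
            ))
            last_seen = str(rows[-1][3])                # advance the watermark

        time.sleep(5)                                   # poll interval sets the latency floor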

Are there conventions for naming/organizing Elasticsearch indexes which store log data?

I'm in the process of setting up Elasticsearch and Kibana as a centralized logging platform in our office.
We have a number of custom utilities and plug-ins whose usage I would like to track, along with whether users are encountering any errors. There are also servers and scheduled jobs I would like to keep track of.
So if I have a number of different sources of log data all going to the same Elasticsearch cluster, what are the conventions or best practices for organizing this into indices and document types?
The default index name used by Logstash is "logstash-%{+YYYY.MM.dd}", so it seems best to suffix index names with the current date, as this makes it easy to purge old data.
However, Kibana allows adding multiple "index patterns" that can be selected from in the UI, yet all the tutorials I've read only mention creating a single pattern like logstash-*.
How are multiple index patterns used in practice? Would I just use names for all the sources of my data? Such as:
BackupUtility-%{+YYYY.MM.dd}
UserTracker-%{+YYYY.MM.dd}
ApacheServer-%{+YYYY.MM.dd}
I'm using NLog in a number of my tools, and it has an Elasticsearch target. The convention for NLog and other similar logging frameworks is to have a "logger" for each class in the source code. Should these loggers translate to indices in Elasticsearch?
MyCompany.CustomTool.FooClass-%{+YYYY.MM.dd}
MyCompany.CustomTool.BarClass-%{+YYYY.MM.dd}
MyCompany.OtherTool.BazClass-%{+YYYY.MM.dd}
Or is this too granular for Elasticsearch index names, and would it be better to stick to a single dated index per application?
CustomTool-%{+YYYY.MM.dd}
In my environment we're working through a similar question. We have a mix of system logs, metric alerts from Prometheus, and application logs from both client and server applications. In addition, we have some shared variables between the client and server apps that let us correlate the two (e.g., we know which server logs match some operation on the client that made requests to said server). We're experimenting with the following scheme to help Kibana answer questions for us:
logs-system-{date}
logs-iis-{date}
logs-prometheus-{date}
logs-app-{applicationName}-{date}
Where:
{applicationName} is the unique name of some application we wrote (these could be client or server side)
{date} is whatever date-based scheme you use for indexes
This way we can set up Kibana searches against logs-app-* and quickly search for logs among any of our applications. This is still new for us, but we started without this type of scheme and are already regretting it. It makes searching for correlated logs across applications much harder than it should be.
In my company we have worked a lot on this topic. We agreed on the following convention:
Customer
-- Product
--- Application
---- Date
In any case, it is necessary to review both how the data is organized and how the data is queried inside the organization.
Kind Regards
Dario Rodriguez
I am not aware of such conventions, but in my environment we used to create two different types of indices, logstash-* and logstash-shortlived-*, depending on the severity level. In my case, I create the index pattern logstash-* as it will match both kinds of indices.
As these indices will be stored in Elasticsearch and Kibana will read them, I guess it should give you the option of creating index patterns for the different naming schemes.
Give it a try on your local machine. Why don't you try logstash-XYZ if you want more granularity? Otherwise, you can always create indices with your own custom names.

Summarization in Elasticsearch

I am a newbie to Elasticsearch. We are currently using the Splunk platform for our analytics application and are looking to migrate to ELK. Splunk provides options to schedule searches to run periodically in the background and to store the search results in a separate summary index. Is similar functionality available in Elasticsearch? If so, please point me to the documentation describing the process.
Thanks,
Keerthana
This is a great use case. Of course Elasticsearch can perform such tasks, but it is more manual: you have to write your own script. For example, if you want to summarize data, you can use Elasticsearch aggregations, take the result (which comes back as JSON), and store it in an index where you keep summary data. This way, even if you delete your raw data, your summary data lives on.
Elasticsearch comes with different clients. I like to use the Python Elasticsearch DSL library.
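Since the answer mentions the Python Elasticsearch DSL library, here is a minimal sketch of the summarize-and-store-back pattern with it. The index names, the service.keyword field, and the daily window are assumptions for illustration, and something external (cron, Watcher, or a task scheduler) would run the script periodically, which is the part Splunk's scheduled searches provide out of the box.

    from datetime import date, timedelta

    from elasticsearch import Elasticsearch
    from elasticsearch_dsl import Search

    es = Elasticsearch(["http://localhost:9200"])       # assumed local cluster
    yesterday = (date.today() - timedelta(days=1)).isoformat()

    # Aggregate yesterday's raw logs per service (index/field names are hypothetical).
    s = Search(using=es, index="logs-raw-*") \
        .filter("range", **{"@timestamp": {"gte": yesterday, "lt": date.today().isoformat()}}) \
        .extra(size=0)                                  # only the aggregation is needed
    s.aggs.bucket("per_service", "terms", field="service.keyword")
    response = s.execute()

    # Store the summarized result in a separate summary index, so it survives
    # even after the raw data is deleted.
    for bucket in response.aggregations.per_service.buckets:
        es.index(index="logs-summary",
                 body={"service": bucket.key,
                       "doc_count": bucket.doc_count,
                       "day": yesterday})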
