Kibana Watcher query for searching text - elasticsearch

I am looking for pointers to create a Kibana watcher where I want to look at my logs and I want to send an alert if I see the text "Security Alert" in my logs more than 10 times within any 30 mins period.
I am referring to this article
https://www.elastic.co/guide/en/kibana/current/watcher-ui.html#watcher-create-threshold-alert
It's not clear from the doc how I can 1) read through, filter, and parse the logs for that string, and 2) set up counts for it.

For this requirement you should use advanced watchers rather than the simpler (and less powerful) threshold watchers. In the Kibana Watcher UI you can choose between both types.
See
https://www.elastic.co/guide/en/kibana/current/watcher-ui.html#watcher-create-advanced-watch for an introduction and
https://www.elastic.co/guide/en/elasticsearch/reference/current/how-watcher-works.html for the syntax and the overall behaviour of advanced watchers.
So based on the requirements you described in your question, here's how you would implement the watcher (conceptually, in a nutshell; a full sketch follows after this list):
The 30 minutes would be the trigger interval.
The input section has to be an appropriate Elasticsearch query that matches the "Security Alert" text.
The condition would be something like "numberOfHits gte 10". So the watcher runs every 30 minutes, but the actions are only executed when the condition is met.
In the actions section you would choose between the available options (log, mail, Slack messages, etc.). If you want to send mails, you need to set up mail accounts first.
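To make that concrete, here is a minimal sketch of an advanced watcher body. The index pattern logs-* and the field names message and @timestamp are assumptions - adjust them to match your log mapping.

PUT _watcher/watch/security-alert-watch
{
  "trigger": {
    "schedule": { "interval": "30m" }
  },
  "input": {
    "search": {
      "request": {
        "indices": [ "logs-*" ],
        "body": {
          "query": {
            "bool": {
              "filter": [
                { "match_phrase": { "message": "Security Alert" } },
                { "range": { "@timestamp": { "gte": "now-30m" } } }
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": { "ctx.payload.hits.total": { "gte": 10 } }
  },
  "actions": {
    "log_hits": {
      "logging": {
        "text": "Found {{ctx.payload.hits.total}} 'Security Alert' entries in the last 30 minutes."
      }
    }
  }
}

The logging action is just the simplest placeholder; swap it for an email or Slack action once the corresponding notification accounts are configured.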
I hope I could help you.

Related

How to properly create Prometheus metrics with unique field

I have a system that regularly downloads files and parses them. However, sometimes something might go wrong with the parsing, and I have the task of creating a Prometheus alert for when a certain file fails. My
initial idea is to create a custom counter in Prometheus - something like
processed_files_total with status as a label, because a failed file gets the status FAILED and a successful one SUCCESS. So presumably the alert should look like
increase(processed_files_total{status="FAILED"}[24h]) > 0 and I hope that this will alert me in case there is at least 1 file with a failed status.
The problem comes from the fact that I also want to have the
exact filename in the alert message, and since each file has a unique name, I'm almost sure it is not a good idea to put it in a label, e.g. filename={filename}. According to the Prometheus docs:
Do not use labels to store dimensions with high cardinality (many different label values), such as user IDs, email addresses, or other unbounded sets of values.
Is there any other way I can get the filename into the alert, or is this the way to go?
It's a good question.
I think the correct answer is that the alert should notify you that something failed and the resolution is to go to the app's logs to identify the specific file(s) that failed.
Lightning won't strike you for using the filename as a label value in Prometheus if you really must, but, as you already sense, using an unbounded value should give you pause as to whether you're abusing the tool.
Metrics seem intrinsically (a hunch) to be about monitoring aggregate state (an unusual number of files are failing) rather than specifics (why did this one fail); logs and tracing tools help with the specific cases.
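If you go the aggregate route, a minimal alerting-rule sketch could look like the following. The metric and label names come from the question; the group name, severity label, and annotation texts are assumptions.

groups:
  - name: file-processing
    rules:
      - alert: FileProcessingFailures
        # fires when at least one file failed in the last 24h
        expr: increase(processed_files_total{status="FAILED"}[24h]) > 0
        labels:
          severity: warning
        annotations:
          summary: "At least one file failed processing in the last 24h"
          description: "Check the application logs for the exact filename(s)."

The annotations carry the "go look at the logs" step, so the alert stays low-cardinality while still pointing you to where the specific filenames live.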

Datadog event monitor aggregation

I have created a Multi alert event monitor
events('sources:rds event_source:db-instance').by('dbinstanceidentifier').rollup('count').last('1d') >= 1
I wanted it to be aggregated by "dbinstanceidentifier", but it shows the cumulative count for "* (Entire Infrastructure)". Basically it doesn't see any groups, even though I can see them in the infrastructure. Is it a problem with Datadog? Maybe it's only available in some kind of "premium" subscription?
I've tried with this query which is similar to yours:
events('sources:rds priority:all tags:event_source:db-instance').by('dbinstanceidentifier').rollup('count').last('1d') > 1
And this seems to give me counts grouped by dbinstanceidentifier.
Do you have more information to provide? Maybe an event list and a monitor result screenshot?

How to monitor log files and set up an alert based on keywords using Elastic Search?

Recently I started to learn Elasticsearch, and I found that it provides several monitoring and alerting functions. For example, users can create a watcher to monitor a certain metric; an alert is triggered if it exceeds the threshold.
Meanwhile, Elasticsearch is designed for full-text searching.
I wonder whether I can monitor log files and set up an alert based on keywords? For example, an alert triggered by the keyword "test": if an incoming log contains the word "test", the alert fires.
If somebody has done anything related or has a clue about this, please give me a hint!

What is the difference between startDate and a filter on "published" in the Okta Events API?

I've written a .NET app using the Okta.Core.Client 0.2.9 SDK to pull events from our organization's syslog for import into another system. We've got it running every 5 minutes, pulling events published since the last event received in the previous run.
We're seeing delays in some events showing up. If I do a manual run at the top of the hour for the previous hour's data, it will include more rows than the 5-minute runs did. While trying to figure out why, I remembered the startDate param, which is mutually exclusive with the filter one I've been using.
The docs don't mention much about it - just that it "Specifies the timestamp to list events after". Does it work the same as published gt "some-date"? We're capturing data for chunks of time, so I needed to include a "less than" filter and ignored startDate. But the delayed events have me looking for a workaround.
Are you facing delayed results using startDate or filter?
Yes, published gt "some-date" and startDate work the same way. The following two API calls
/api/v1/events?limit=100&startDate=2016-07-06T00:00:00.000Z
and
/api/v1/events?limit=100&filter=published gt "2016-07-06T00:00:00.000Z"
return the same result. Since they are mutually exclusive, filter might come in handy for creating more specific queries, because you can combine published with other expressions in the filter.
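One practical difference: because filter accepts compound expressions, you can bound the window on both sides, which startDate alone cannot do. For example (the timestamps are placeholders):

/api/v1/events?limit=100&filter=published gt "2016-07-06T00:00:00.000Z" and published lt "2016-07-06T01:00:00.000Z"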

Immediately Display New Metrics

I am using Graphite and Coda Hale Metrics to try to track the number of times particular APIs are called, and also the top 10 callers. I have assigned a metric to each user who calls the API and use Graphite to bring back the top 10.
The problem is, if it is a new user - i.e. a new metric - this will only be displayed in Graphite when the tool is refreshed. Has anyone come across a workaround for this? Is there some way Graphite can automatically detect new meters?
Just to be clear - I can see the top ten API callers for the last 30 minutes... unless it is a brand new user that has never logged in before.
It seems that graphite-web uses an on-disk index generated by a glorified find command. Another script is available that you can run from cron to update the metric index file.
Whenever you update the index file, the graphite-web process will detect it and reload it.
Since reloading the index might be heavy for a large number of metrics (1M+), I would advise modifying the update script a bit to conditionally update the file (only if it has changed, for instance) - see the sketch below.
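A conditional-update sketch along those lines (the paths are assumptions for a default graphite-web install - adjust WHISPER_DIR and INDEX_FILE to your setup):

#!/bin/sh
# Rebuild graphite-web's metric index from the whisper files on disk,
# but only replace the index file when its contents actually changed.
WHISPER_DIR=/opt/graphite/storage/whisper
INDEX_FILE=/opt/graphite/storage/index
TMP_INDEX=$(mktemp)

# turn ./foo/bar.wsp into the metric name foo.bar
cd "$WHISPER_DIR" && find . -name '*.wsp' \
  | sed -e 's!^\./!!' -e 's/\.wsp$//' -e 's!/!.!g' > "$TMP_INDEX"

if cmp -s "$TMP_INDEX" "$INDEX_FILE"; then
  rm -f "$TMP_INDEX"             # nothing changed, keep the old index
else
  mv "$TMP_INDEX" "$INDEX_FILE"  # changed, swap in the new index
fi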
EDIT: after testing, graphite-web does not seem to call the reloading code.
