Design an alarm system based on logs in Elasticsearch


How would you design a system that generates alarms based on conditions on data stored in Elasticsearch?
I'm thinking of a system similar to AWS CloudWatch.
The proposed alarm system should work under the following conditions:
There could be thousands of users creating alarms.
There could be thousands of alarms active at any given time.
It shouldn't have a high impact on query performance.
The data volume is large.
A naive approach would be to apply all the alarm conditions whenever a new record is added to Elasticsearch, or to have a service or Lambda function execute all the alarm rules at a fixed time interval, but I doubt that either could satisfy all of the conditions above.
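For concreteness, the interval-based variant of the naive approach could look roughly like the sketch below. Everything in it is an assumption for illustration: `AlarmRule`, `evaluate_rules` and the `count_matches` callback (which stands in for whatever count query is sent to Elasticsearch) are hypothetical names, not part of any library. The only optimization shown is that rules sharing the same query string trigger a single query per run.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class AlarmRule:
    """Hypothetical alarm definition: fire when the match count exceeds a threshold."""
    rule_id: str
    query: str       # query string, passed unchanged to count_matches
    threshold: int


def evaluate_rules(rules: List[AlarmRule],
                   count_matches: Callable[[str], int]) -> List[str]:
    """Run once per interval; returns the ids of the rules that fired.

    count_matches is an assumed callback that returns the number of
    documents matching a query. Identical queries are executed only once.
    """
    counts: Dict[str, int] = {}
    fired: List[str] = []
    for rule in rules:
        if rule.query not in counts:
            counts[rule.query] = count_matches(rule.query)
        if counts[rule.query] > rule.threshold:
            fired.append(rule.rule_id)
    return fired
```

Even with the deduplication, the number of queries per interval still grows with the number of distinct rule queries, which is the scaling concern raised above.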

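You might be interested in the Alerting feature in X-Pack. It is built around watches (Watcher): each watch is essentially the query you want to monitor, run on a schedule, plus a condition on the results and the actions to take when that condition is met.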
Take control of your alerts by viewing, creating, and managing all of
them from a single UI. Stay in the know with real-time updates on
which alerts are running and what actions were taken.
Documentation: https://www.elastic.co/guide/en/x-pack/current/xpack-alerting.html
Sales Page: https://www.elastic.co/products/x-pack/alerting
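As a rough illustration of what a watch looks like, the sketch below builds a watch definition as a Python dictionary and prints it as JSON. The `trigger`, `input`, `condition` and `actions` sections follow the watch layout as I understand it from the Watcher documentation, but the helper name `build_watch`, the index pattern and the query are made up for this example, and the exact fields should be checked against the documentation for your Elasticsearch version.

```python
import json


def build_watch(index_pattern, query, interval="5m", min_hits=0):
    """Build a watch body (hypothetical helper, not part of any client library).

    The watch runs `query` against `index_pattern` every `interval` and
    fires a logging action when more than `min_hits` documents match.
    """
    return {
        "trigger": {"schedule": {"interval": interval}},
        "input": {
            "search": {
                "request": {
                    "indices": [index_pattern],
                    "body": {"query": query},
                }
            }
        },
        "condition": {
            "compare": {"ctx.payload.hits.total": {"gt": min_hits}}
        },
        "actions": {
            "log_alarm": {
                "logging": {
                    "text": "Alarm fired: {{ctx.payload.hits.total}} matching documents"
                }
            }
        },
    }


if __name__ == "__main__":
    # Assumed example: alert on any error-level log line in the last interval.
    watch = build_watch("logs-*", {"match": {"level": "error"}})
    print(json.dumps(watch, indent=2))
```

The resulting JSON would be sent as the body of a PUT request to the watch API (the path is `_xpack/watcher/watch/<id>` in older X-Pack releases and `_watcher/watch/<id>` in later ones, as far as I know).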

Related

Experiments Feature stuck on collecting data

I am trying to split traffic from a given flow into different versions to measure statistical performance over time using the Experiment feature. However, it always shows the state "Collecting Data".
Here are the steps to reproduce the issue:
Create an Experiment on a flow and select different versions
Select Auto rollout and Select Steps option
Add steps for gradual progress of traffic increase and minimum duration
Save and Start the Experiment
Send queries to chatbot triggering the configured flow for the Experiment
The experiment should show some results in the Status tab and compare the performance of the flow versions. However, it does not produce any results: the status always shows "Collecting Data" and Auto Rollout shows "Not Started".
The only prerequisite for the Experiments feature is to enable Interaction logs, which are already enabled on my virtual agent.
About 2.5K sessions (~4K interactions) were created in the last 48 hours. Are there any minimum requirements for it to generate results, such as a minimum number of sessions?

Elasticsearch to Google BigQuery

How do we send data from Elasticsearch to Google BigQuery? Is there a specific connector?
I have been looking into various options and need the data to be available in Google BigQuery in real time.

I found the google_bigquery output plugin, which might be useful, but I have never used it personally.
Experiment with the settings depending on how much log data you generate, how "fresh" you need the data to be, and how much data you could afford to lose in the event of a crash. For instance, if you want to see recent data in BQ quickly, you could configure the plugin to upload data every minute or so (provided you have enough log events to justify that).

Log analytics using Elasticsearch & Kibana - Few queries

I have just started playing around with ELK to develop our log analytics solution.
I had a few questions regarding the best practices so that I don't make any bad choice to begin with.
This tool will analyze various types of logs to find out and correlate any issue. It will run on multiple 'devices' and each device will be uniquely identifiable with a serial number.
Question 1) Is it possible to create a dashboard where the serial number is taken as a user input?
Details: I would like to create one dashboard that analyzes various fields, and to specify the serial number of the device as an input. From what I can see, I could use a filter, but then the visualization would need to be 'edited'. So it appears to me that, right now, if I need to analyze multiple devices I have to create a dashboard for each device. This is a problem because any change to the dashboard then has to be made to all of them. The problem can be reduced by importing additional dashboards as JSON files, but it is still inconvenient.
Is there a better way that I am not aware of?
Question 2) On the main dashboard, I want to show a heatmap of various 'services' and their status as a time series. For example, if I am monitoring CPU, memory, network and our service, then I want to see something like below:
The heatmap visualization doesn't provide a way to specify the condition for each status. I generated the above image by populating dummy data where the values were one of 0, 1, 2, 3. This means that I need to create such data periodically for the visualization to use. Is there any built-in mechanism (scheduled jobs, for example) provided by ELK to do such processing? One option could be to run an external program that queries Elasticsearch, fetches all the relevant information, analyzes it and puts it back into Elasticsearch. Is that the only way?
If there are any other suggestions, please feel free to share. Thanks.
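To make the "external program" option more concrete, here is a minimal sketch of the analysis step only: it maps a raw metric value to one of the status levels 0 to 3 and builds the document that would be written back to Elasticsearch. The thresholds, field names and function names are assumptions for illustration; the querying and indexing steps are left out.

```python
from bisect import bisect_right

# Assumed thresholds (percent): below 50 -> 0, 50 to <75 -> 1, 75 to <90 -> 2, 90+ -> 3.
DEFAULT_THRESHOLDS = (50.0, 75.0, 90.0)


def status_level(value, thresholds=DEFAULT_THRESHOLDS):
    """Return the status level (0..len(thresholds)) for a metric value."""
    return bisect_right(thresholds, value)


def build_status_doc(serial_number, service, value, timestamp):
    """Build the status document for one device/service/time bucket.

    Field names are hypothetical; adapt them to your own index mapping.
    """
    return {
        "serial_number": serial_number,
        "service": service,
        "value": value,
        "status": status_level(value),
        "@timestamp": timestamp,
    }
```

A scheduled job (cron, for example) could call `build_status_doc` for each device and service and index the results, which the heatmap would then read.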

What will be the wait time before BigQuery executes a query?

Every time I execute a query in Google BigQuery, I can see in the Explanation tab that there is an average waiting time involved. Is it possible to know the wait time as a percentage or in seconds?

Since BigQuery is a managed service, many customers around the globe are using it. It has an internal scheduling system based on the billingTier (explained here https://cloud.google.com/bigquery/pricing#high-compute) and other internals of your project. The query is then scheduled for execution based on cluster availability, so there will be a minimum wait until a cluster of machines is found to execute your job.
I have never seen significant wait times. If you have this issue, contact Google support to have them look at your project. If you edit your original question and add a job ID, a Google engineer may check whether there is an issue or not.

It's currently not exposed in the UI.
But you can find a similar concept in the API (search for "wait" on the following page):
https://cloud.google.com/bigquery/docs/reference/v2/jobs#resource
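As a sketch of how those API fields could be read, the functions below take a job resource that has already been fetched and parsed into a Python dictionary (for example, the JSON returned by jobs.get). They rely on `statistics.creationTime` and `statistics.startTime` (milliseconds since the epoch, encoded as strings) and on the per-stage `waitRatioAvg` in `statistics.query.queryPlan`, which is how I read the jobs resource reference. The function names are my own, and fetching the job is not shown.

```python
def queue_wait_ms(job):
    """Milliseconds between job creation and the start of execution."""
    stats = job["statistics"]
    return int(stats["startTime"]) - int(stats["creationTime"])


def stage_wait_ratios(job):
    """Map each query stage name to its average wait ratio, if reported.

    Returns an empty dict when the job has no query plan (for example,
    a non-query job or a query served from cache).
    """
    plan = job["statistics"].get("query", {}).get("queryPlan", [])
    return {stage["name"]: stage.get("waitRatioAvg") for stage in plan}
```

The first function gives the time the job spent queued before it started; the second reports, per stage, the relative wait ratio that appears to be the basis for the waiting time shown in the Explanation tab.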

Is it possible to reduce the BigQuery execution wait time to the minimum?
Purchase more BigQuery Slots.
Contact your sales representative or support for more information.

Architectural design: building real-time free-text search functionality in an application?

I am working on an application that is basically an event management application, where users can post different types of events. Events can be short-notice events, like a user throwing a birthday party in the next 2 hours, or pre-planned events like a marathon (this is just a dummy example).
The requirement is that other users, when searching through free-text search, should be able to get both kinds of results. The search will be more like LinkedIn search; besides events, there are other categories as well.
I was thinking of using Elasticsearch or Solr, but both are Lucene-based search engines. When I write a lot of data in real time to these engines, the data is written to a cache and then periodically flushed to disk in different segments.
To optimize search performance, Lucene-based engines try to co-locate the data by merging segments. With frequent writes this process runs frequently and consumes a lot of resources, which ultimately degrades search performance.
https://www.elastic.co/guide/en/elasticsearch/guide/current/indexing-performance.html
Is there any way I could achieve real-time free-text search without paying a large performance cost for segment merging? Please suggest an architectural design that would best suit this scenario.
