Elasticsearch Watcher: simulate fires the action, but the scheduled run doesn't

I have a Slack action configured, and all aspects appear to be set up correctly. If I go to my watch's simulate section and choose execute (not ignoring the conditions), it executes fine and the message appears, correctly templated, in Slack. If I save the config and let the watcher run on its schedule, it doesn't send. If I use the email action instead, the email is sent. If I use both, neither is sent.
{
"trigger": {
"schedule": {
"interval": "1m"
}
},
"input": {
"search": {
"request": {
"search_type": "query_then_fetch",
"indices": [
"elastic"
],
"rest_total_hits_as_int": true,
"body": {
"query": {
"bool": {
"must": {
"match": {
"level": "ERROR"
}
},
"filter": {
"range": {
"#timestamp": {
"gte": "now-1500m"
}
}
}
}
}
}
}
}
},
"condition": {
"compare": {
"ctx.payload.hits.total": {
"gte": 1
}
}
},
"actions": {
"notify-slack": {
"throttle_period_in_millis": 5000,
"slack": {
"account": "monitoring",
"proxy": {
"host": "proxy.example.com"
"port": 3128
},
"message": {
"from": "watcher",
"to": [
"#elk-cluster-alerts"
],
"text": "Elk Error Alerts",
"icon": ":chuck:",
"attachments": [
{
"color": "danger",
"title": "Elk Error Alerts",
"text": "Roundhouse kick!"
}
]
}
}
}
}
}
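Since the simulated execution succeeds but the scheduled runs do not, one way to see what a scheduled run actually did (condition result, action status, and any error message) is to read the watch records from the watch history index. A minimal sketch, assuming the watch was saved with the id error-alerts-slack:

GET .watcher-history-*/_search
{
  "size": 5,
  "sort": [ { "trigger_event.triggered_time": { "order": "desc" } } ],
  "query": { "term": { "watch_id": "error-alerts-slack" } }
}

Each hit's result.actions section shows whether the slack action was executed, throttled, or failed, including any error returned when the Slack API rejected the post.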
UPDATE:
Not a fix, but the configuration works when I use a webhook action instead of the slack action.
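For anyone hitting the same thing, a webhook action posting to a Slack incoming webhook can look roughly like this. A sketch only: the /services/... path is a placeholder for your own incoming-webhook URL, and the proxy block is carried over from the original config on the assumption that outbound HTTPS still has to go through it:

"actions": {
  "notify-slack": {
    "throttle_period_in_millis": 5000,
    "webhook": {
      "scheme": "https",
      "host": "hooks.slack.com",
      "port": 443,
      "method": "post",
      "path": "/services/T00000000/B00000000/XXXXXXXXXXXXXXXX",
      "proxy": {
        "host": "proxy.example.com",
        "port": 3128
      },
      "headers": {
        "Content-Type": "application/json"
      },
      "body": "{ \"text\": \"Elk Error Alerts: {{ctx.payload.hits.total}} ERROR log entries found\" }"
    }
  }
}

If the webhook route works but the slack action does not, it is also worth re-checking that the monitoring account's webhook URL is present in the keystore (xpack.notification.slack.account.monitoring.secure_url) on every node that can execute watches, since a node-local misconfiguration may only show up when the scheduled execution lands on that node.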

Related

Elasticsearch Watcher error while trying to send email attachment, dashboard.pdf

I have created a watcher alert (via the advanced option) that sends dashboard.pdf as an email attachment when the trigger condition is met. Now, when the criteria match (the threshold is exceeded), the watcher output shows the error below.
"root_cause": [
{
"type": "connect_timeout_exception",
"reason": "Connect to mydomainname.com:443 [mydomainname.com/XX.X.XXX.XXX] failed: Connect timed out"
}
],
"type": "connect_timeout_exception",
"reason": "Connect to mydomainname.com:443 [mydomainname.com/XX.X.XXX.XXX] failed: Connect timed out",
"caused_by": {
"type": "socket_timeout_exception",
"reason": "Connect timed out"
The following is from the Elasticsearch log.
[2022-03-29T11:39:54,682][ERROR][o.e.x.w.a.e.ExecutableEmailAction] [node-1] failed to execute action [test_watcher_1_last10mins_gte5_tran_dt_accord_sof/email_admin]
org.apache.http.conn.ConnectTimeoutException: Connect to mydomainname.com:443 [mydomainname.com/XX.X.XXX.XXX] failed: Connect timed out
....
....
Caused by: java.net.SocketTimeoutException: Connect timed out
Below is the watcher script.
{
"trigger": {
"schedule": {
"interval": "6m"
}
},
"input": {
"search": {
"request": {
"search_type": "query_then_fetch",
"indices": [
"my_index"
],
"rest_total_hits_as_int": true,
"body": {
"size": 0,
"query": {
"bool": {
"must": [
{
"bool": {
"minimum_should_match": 1,
"should": [
{
"match_phrase": {
"partner.keyword": "RXGTY"
}
},
{
"match_phrase": {
"partner.keyword": "VGHUT"
}
}
]
}
},
{
"match": {
"state.keyword": {"query": "Fail"}
}
},
{
"match": {
"ops.keyword": {"query": "api_name"}
}
}
],
"filter": {
"range": {
"datetime": {
"gte": "{{ctx.trigger.scheduled_time}}||-5m",
"lte": "{{ctx.trigger.scheduled_time}}",
"format": "strict_date_optional_time||epoch_millis"
}
}
}
}
}
}
}
}
},
"condition": {
"script": {
"source": "if (ctx.payload.hits.total >= params.threshold) { return true; } return false;",
"lang": "painless",
"params": {
"threshold": 1
}
}
},
"actions": {
"email_admin": {
"email": {
"profile": "standard",
"attachments": {
"dashboard.pdf": {
"reporting": {
"url": "https://mydomainname.com/api/reporting/generate/printablePdf?jobParams= ..removing the rest portion of the url for security reason",
"auth": {"type":"basic","username":"elastic","password":"pass"}
}
},
"data.yml": {
"data": {
"format": "yaml"
}
}
},
"from": "from_email#xyz.com",
"to": [
"to_email_name <to_email#abc.com>"
],
"subject": "Elastic Watcher : Alert 1",
"body": {
"text": "Too many error in the system, see attached data."
}
}
}
},
"transform": {
"script": {
"source": "HashMap result = new HashMap(); result.result = ctx.payload.hits.total; return result;",
"lang": "painless",
"params": {
"threshold": 1
}
}
}
}
Our Elastic Stack version is 7.11.1, the license is activated, and basic stack security is enabled.
Note that when I tried the same thing from my local Kibana (7.10.1), where a trial license is activated, this alerting action works perfectly. Also note that security is not enabled in my local stack.
Please help!!
Regards,
Souvik
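The stack trace says the Elasticsearch node never manages to open a TCP connection to mydomainname.com:443, so the reporting attachment fails before authentication is even attempted; the local 7.10.1 stack works simply because it can reach its own Kibana. It is worth testing reachability from the node itself (for example with curl against the reporting URL), and if the connection is merely slow rather than blocked, the Watcher HTTP client limits can be raised. A sketch of elasticsearch.yml settings with example values, not a confirmed fix:

xpack.http.default_connection_timeout: 30s
xpack.http.default_read_timeout: 60s
# Only if outbound traffic must go through a proxy (hostname here is an example):
# xpack.http.proxy.host: proxy.mycorp.example
# xpack.http.proxy.port: 3128

If a firewall rule or missing DNS entry is the real cause, no timeout value will help and the route itself has to be fixed.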

Range on @timestamp is not giving results in Kibana Watcher

I am using the watcher JSON below.
{
"trigger": {
"schedule": {
"interval": "2m"
}
},
"input": {
"search": {
"request": {
"search_type": "query_then_fetch",
"indices": [
"<log-abc.upr-dev-{now/d}>"
],
"types": [],
"body": {
"size": 20,
"query": {
"bool": {
"must": [
{
"match": {
"trailer_message": "SUCCESS"
}
},
{
"range": {
"#timestamp": {
"gte": "now-50m"
}
}
}
]
}
}
}
}
}
},
"condition": {
"compare": {
"ctx.payload.hits.total": {
"gt": 0
}
}
},
"actions": {
"notify-pagerduty": {
"webhook": {
"scheme": "https",
"host": "********",
"port": 443,
"method": "post",
"path": "******",
"params": {},
"headers": {
"Content-Type": "application/json"
},
"body": "{\r\n \"payload\": {\r\n \"summary\": \"{{ctx.payload.hits.total}} success \",\r\n \"source\": \"TEST TEST\",\r\n \"severity\": \"error\"\r\n },\r\n \"routing_key\": \"*******************\",\r\n \"event\": \"function\",\r\n \"client\": \"Watcher\"\r\n}"
}
}
}
}
My logs have SUCCESS values, but after adding the range I am not getting any results.
If I remove the range, it produces results from that day's logs.
I want to use the range as well, but it is not working.
Please let me know where the problem is.
I was able to solve this problem.
@timestamp was not the field being used in my logs; the date field was actually called sessiontime. Once I pointed my watcher at sessiontime, it started to work.
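A quick way to confirm which date field an index actually contains, before building a range filter on it, is the field-mapping API. A sketch against the same index pattern (the wildcard stands in for the date-math name):

GET log-abc.upr-dev-*/_mapping/field/*time*

The response lists every field whose name contains time along with its type, so a mix-up between @timestamp and a field like sessiontime is visible immediately.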

Getting "expected [END_OBJECT] but found [FIELD_NAME]" in Kibana

I am working on Kibana 6.x and using SentiNL to generate email alerts. Below is my query, which should generate a mail if my application logs "CREDENTIALS ARE NOT DEFINED FOR PULL EVENT SOURCES", with a threshold of 1. When I play my watcher I get the error below.
Error: Watchers: play watcher : execute watcher : execute advanced watcher : get elasticsearch payload : search : [parsing_exception] [match] malformed query, expected [END_OBJECT] but found [FIELD_NAME], with { line=1 & col=80 }
Query:
"input": {
"search": {
"request": {
"index": [
"filebeat-2019.03.21"
],
"body": {
"query": {
"match": {
"msg": "CREDENTIALS ARE NOT DEFINED FOR PULL EVENT SOURCES"
},
"minimum_number_should_match": 1,
"bool": {
"filter": {
"range": {
"#timestamp": {
"gte": "now-15m/m",
"lte": "now/m",
"format": "epoch_millis"
}
}
}
}
},
"size": 0,
"aggs": {
"dateAgg": {
"date_histogram": {
"field": "#timestamp",
"time_zone": "Europe/Amsterdam",
"interval": "1m",
"min_doc_count": 1
}
}
}
}
}
}
}
Also I have used "minimum_number_should_match" to track threshold value. Is that correct?
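For context on the error itself: under query, Elasticsearch expects exactly one top-level clause, so match, minimum_number_should_match, and bool cannot sit side by side, which is what produces expected [END_OBJECT] but found [FIELD_NAME]. A sketch of the same search with the match clause nested inside the bool (field names taken from the original):

"query": {
  "bool": {
    "must": {
      "match": {
        "msg": "CREDENTIALS ARE NOT DEFINED FOR PULL EVENT SOURCES"
      }
    },
    "filter": {
      "range": {
        "@timestamp": {
          "gte": "now-15m/m",
          "lte": "now/m"
        }
      }
    }
  }
}

As for minimum_number_should_match (now spelled minimum_should_match): it only controls how many should clauses have to match, not how many hits are required. The hit-count threshold belongs in the watcher's condition, which is how the solution below handles it.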
Found the solution (here I have not added the threshold value):
{
"actions": {
"email_html_alarm_2daee075-0f24-408e-a362-59172b5e3a1d": {
"name": "email html alarm",
"throttle_period": "1m",
"email_html": {
"stateless": false,
"subject": "Error v1.9 conditon",
"priority": "high",
"html": "<p>{{payload.hits.hits}} test hits Hi {{watcher.username}}</p>\n<p>There are {{payload.hits.total}} results found by the watcher <i>{{watcher.title}}</i>.</p>\n\n<div style=\"color:grey;\">\n <hr />\n <p>This watcher sends alerts based on the following criteria:</p>\n <ul><li>{{watcher.wizard.chart_query_params.queryType}} of {{watcher.wizard.chart_query_params.over.type}} over the last {{watcher.wizard.chart_query_params.last.n}} {{watcher.wizard.chart_query_params.last.unit}} {{watcher.wizard.chart_query_params.threshold.direction}} {{watcher.wizard.chart_query_params.threshold.n}} in index {{watcher.wizard.chart_query_params.index}}</li></ul>\n</div>",
"to": "abc#qwe.com",
"from": "abc#qwe.com"
}
}
},
"input": {
"search": {
"request": {
"index": [
"file-2019.04.03"
],
"body": {
"query": {
"bool": {
"must": {
"query_string": {
"query": "CREDENTIALS ARE NOT FOUND",
"analyze_wildcard": true,
"default_field": "*"
}
},
"filter": [{
"range": {
"#timestamp": {
"gte": "now-1d",
"lte": "now/m",
"format": "epoch_millis"
}
}
}]
}
}
}
}
}
},
"condition": {
"script": {
"script": "payload.hits.total > 0"
}
},
"trigger": {
"schedule": {
"later": "every 2 minutes"
}
},
"disable": true,
"report": false,
"title": "watcher_title",
"save_payload": false,
"spy": false,
"impersonate": false
}

Set up watcher for alerting high CPU usage by some process

I'm trying to create a Watcher alert that triggers when some process on a node has been using more than 95% CPU (a normalized value of 0.95) for the last hour.
Here is an example of my config:
{
"trigger": {
"schedule": {
"interval": "10m"
}
},
"input": {
"search": {
"request": {
"search_type": "query_then_fetch",
"indices": [
"metricbeat*"
],
"types": [],
"body": {
"size": 0,
"query": {
"bool": {
"must": [
{
"range": {
"system.process.cpu.total.norm.pct": {
"gte": 0.95
}
}
},
{
"range": {
"system.process.cpu.start_time": {
"gte": "now-1h"
}
}
},
{
"match": {
"environment": "test"
}
}
]
}
}
}
}
}
},
"condition": {
"compare": {
"ctx.payload.hits.total": {
"gt": 0
}
}
},
"actions": {
"send-to-slack": {
"throttle_period_in_millis": 1800000,
"webhook": {
"scheme": "https",
"host": "hooks.slack.com",
"port": 443,
"method": "post",
"path": "{{ctx.metadata.onovozhylov-test}}",
"params": {},
"headers": {
"Content-Type": "application/json"
},
"body": "{ \"text\": \" ==========\nTest parameters:\n\tthrottle_period_in_millis: 60000\n\tInterval: 1m\n\tcpu.total.norm.pct: 0.5\n\tcpu.start_time: now-1m\n\nThe watcher:*{{ctx.watch_id}}* in env:*{{ctx.metadata.env}}* found that the process *{{ctx.system.process.name}}* has been utilizing CPU over 95% for the past 1 hr on node:\n{{#ctx.payload.nodes}}\t{{.}}\n\n{{/ctx.payload.nodes}}\n\nThe runbook entry is here: *{{ctx.metadata.runbook}}* \"}"
}
}
},
"metadata": {
"onovozhylov-test": "/services/T0U0CFMT4/BBK1A2AAH/MlHAF2QuPjGZV95dvO11111111",
"env": "{{ grains.get('environment') }}",
"runbook": "http://mytest.com"
}
}
This watcher doesn't work when I set the metric system.process.cpu.start_time. Perhaps this is not the correct field... Unfortunately, I don't have enough Watcher experience to solve this issue on my own.
The other issue is that I don't know how to add system.process.name to the message body.
Thanks in advance for any help!
Use the timestamp field instead of system.process.cpu.start_time to match all metricbeat-* documents from the last 10 minutes:
"range": {
"timestamp": {
"gte": "now-10m",
"lte": "now"
}
}
To include system.process.name in your message body, look at {{ctx.payload}} and use the appropriate notation to refer to the process name. For example, in one of our watcher configs we use {{_source.appname}} to refer to the application name.
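Applied to the metricbeat watch above, two details matter: the search currently uses "size": 0, so ctx.payload.hits.hits is empty and there is no _source to read from, and once size is raised the fields are addressed by hit position. A sketch of a body fragment, assuming field names in the style the question uses (they vary by Metricbeat version):

"body": "{ \"text\": \"Process {{ctx.payload.hits.hits.0._source.system.process.name}} exceeded 95% CPU on {{ctx.payload.hits.hits.0._source.host.name}}\" }"

Dumping {{ctx.payload}} into a test message first, as suggested above, is the safest way to see the exact paths present in your documents.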

How to filter on a date range for Sentinl?

So we've started to implement Sentinl to send alerts. I have managed to get a count of errors sent if it exceeds a specified threshold.
What I'm really struggling with is filtering for the last day!
Could someone please point me in the right direction?
Herewith the script:
{
"actions": {
"Email Action": {
"throttle_period": "0h0m0s",
"email": {
"to": "juan#company.co.za",
"from": "elk#company.co.za",
"subject": "ELK - ERRORS caused by CreditDecisionServiceAPI.",
"body": "{{payload.hits.total}} ERRORS caused by CreditDecisionServiceAPI. Threshold is 100."
}
},
"Slack Action": {
"throttle_period": "0h0m0s",
"slack": {
"channel": "#alerts",
"message": "{{payload.hits.total}} ERRORS caused by CreditDecisionServiceAPI. Threshold is 100.",
"stateless": false
}
}
},
"input": {
"search": {
"request": {
"search_type": "query_then_fetch",
"index": [
"*"
],
"types": [],
"body": {
"size": 0,
"query": {
"bool": {
"must": [
{
"match": {
"appName": "CreditDecisionServiceAPI"
}
},
{
"match": {
"level": "ERROR"
}
},
{
"range": {
"timestamp": {
"from": "now-1d"
}
}
}
]
}
}
}
}
}
},
"condition": {
"script": {
"script": "payload.hits.total > 100"
}
},
"transform": {},
"trigger": {
"schedule": {
"later": "every 15 minutes"
}
},
"disable": true,
"report": false,
"title": "watcher_CreditDecisionServiceAPI_Errors"
}
So to be clear, this is the part that's being ignored by the query:
{
"range": {
"timestamp": {
"from": "now-1d"
}
}
}
You need to change it and add a filter JSON key before the range one, like this:
"filter": [
{
"range": {
"timestamp": {
"gte": "now-1d"
}
}
}
]
So we've FINALLY solved the problem!
Elasticsearch has changed its DSL multiple times, so please note that you need to check which version you're using for the correct solution. We're on version 6.2.3.
The query below finally worked:
"query": {
"bool": {
"must": [
{
"match": {
"appName": "CreditDecisionServiceAPI"
}
},
{
"match": {
"level": "ERROR"
}
},
{
"range": {
"#timestamp": {
"gte": "now-1d"
}
}
}
]
}
}
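When a range clause seems to be ignored like this, it can also save time to run the search body directly against Elasticsearch before putting it back into Sentinl, so field-name problems are separated from alerting problems. A minimal sketch using the same clauses:

GET */_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        { "match": { "appName": "CreditDecisionServiceAPI" } },
        { "match": { "level": "ERROR" } },
        { "range": { "@timestamp": { "gte": "now-1d" } } }
      ]
    }
  }
}

If hits.total here looks right, whatever difference remains is in how the watcher wraps the request.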
