CloudWatch Event Rule: Disable itself - aws-lambda

I have a cloudwatch event-rule which runs a lambda-function and is configured f.e. like cron(0/2 21-22 15 4 ? 2020).
Inside the lambda function it checks on something. If that condition is fulfilled, I want to disable the rule. I cannot do it via
new CloudWatchEvents().disableRule({Name: 'MyRule'})
because of concurrency. Is there any other way to achieve it comfortably?

Related

Resilience Circuit breaker config update dynamically

I have configured the C.B for Client_Service which calls to Remote_Service, in Fallback situation, it calls to Health_Service and this (called service) provides you the Future_Timestamp of Remote_Service to come-back as up & running state.
Now I want to configure my C.B as per this Future_Timestamp, if it is 2 hours later then I want to keep my C.B as Open state (so for 2 hours, no calls to Remote_Service) and post 2 hours, request can call to Remote_Service if it succed C.B can be change into Half-Open then to Closed or if it fails again, Fallback method will call to Health_Service and same pattern will follow.
Is there anyway out for this problem statement or any alternative solution?

Firing many alerts from one rule by metric label

There is a way to fire many alerts from same rule using the metric label?
I have a prometheus counter metric with a label “client”.
How I configure one rule to fire for each client that separately satisfy a fire condition?
My version is 8.4.2
This is exactly the way alerts work in Prometheus. It will generate one alert for each label combination that satisfies a fire condition.
For example, the following rule:
- alert: InstanceIsDown
expr: probe_success{job="blackbox"} == 0
annotations:
summary: 'Instance {{ $labels.instance }} is down'
The Blackbox "probe_success" metric has an "instance" label. If the instances "xxx" and "yyy" are down, the rule will generate two alerts, one for each instance.

Firing Alerts for an activity which is supposed to happen during a particular time interval(using Prometheus Metrics and AlertManager)

I am fairly new to Prometheus alertmanager and had a doubt regarding firing alerts only during a particular period
I have a microservice which receives a file and does some processing on it, which is only invoked when it gets a message through a Kafka queue. The aforementioned is supposed to come every day between 5 am and 6 am(UTC time). The microservice has a metric which is incremented by 1 every time it receives a file. I want to raise an alert if it does not receive a file in the interval. I have created a query like this :
expr : sum(increase(metric_name[1m]) and on() hour(vector(time()))==5) < 1
for: 1h
My questions:-
1) Is it correct or is there a better way to do it
2) In case of no update, will it return 0 or "datapoints not found"
3) Is increase the correct function as it tends to give results in decimals due to extrapolation, but I understand if increase is 0, it will show 0
I can't really play around with scrape_intervals, which is set at 30s.
I have not run this expression but I expect it will cause an alert to fire at 06:00 only and then go off at 06:01. It is the only time the expression would hold true for one hour.
Answering your questions
It is correct if what you want is a single fire of alert (sending a mail by example) but then no longer firing. Even with that, the schedule is a bit tight and may get hurt by alertmanager delay causing the alert to be lost.
In case of no increase, you will get the expression will evaluate to 0. It will be empty when there is an update
Increase is the right function. It even takes into account reset of the counter.
Answering if there is a better way to do it.
Regarding your expression, you can have the same result, without for clause, with:
expr: increase(metric_name[1h])==0 and on() hour()==6 and on() minute()<1
It reads a : starting at 6am and for 1 minutes, if there was no increase of metric over the lasthour.
Alerting longer
If you want the alert to last longer (say for the day and you silence it when it is solved), you can use sub-queries;
expr: increase((metric and on() hour()==5)[18h:])==0 and on() hour()>5
It reads as : starting at 6am (hour()>5), compute the increase over 5-6am for the next 18 hours. If you like having a pending, you can drop the trailing on() hour()>5 and use a for: 1h clause.
If you want to alert until a file is submitted and thus detect a resolution, simply transform the expression to evaluate the increase until now:
expr: increase((metric and on() hour()>5)[18h:])==0 and on() hour()>5

Elastalert constant realerting.

I'm having some difficulties setting up an elastalert rule. It's quite a basic one, and I've read the documentation but clearly not understood it and I'm after some help.
I have a basic test rule that i want to alert when my data input to elastic from certain devices stops for more that 5 minutes.
es_host: localhost
es_port: 9200
name: Example rule
type: flatline
index: test_mapping-*
threshold: 1
timeframe:
minutes: 5
filter:
- term:
device: "ggYthy767b"
alert:
- command
command: ["/bin/test"]
realert:
minutes: 10
This works, so when data stops i get an alert, then that alert is silenced until 10 minutes later it realerts again. The issue is that it realerts every 10 minutes and i don't know how to stop it. Is there a way to get it to realert just once and then stop? Or have i misunderstood? Also I have 10+ different devices, and i want the same alert to apply if any of them stop sending data for 5 minutes, is that possible within one rule? Thanks very much in advance.
The question you need to ask to yourself is how often do you want to get alerted. Once a lifetime, a year, a month or fortnightly or what? So "realert" is the part you want to edit. You might want to change it to something like below. So even if the alert is triggered multiple times you'll only get it once a day. It uses simple English terms so you can update it how you like it (weeks, hours etc.).
realert:
days: 1
But if you're getting alerted much more than you want, either you're system is too unstable or your alerts are too paranoid. For example for this alert every 5 minutes you're looking for one record which actually doesn't get populated. You should raise your period or add less selective filters because it's a 'flatline' alert. You can also use it with "query_key" so it will be applied on a per key basis.

Is Parse providing 15 seconds for Cloud Code functions

I'm currently coding an app that utilizes Parse as a backend, but have run into a '124' error. I admit that I do a lot in my cloud functions, but, from what I've observed, it doesn't appear over 15 seconds. Could someone please confirm this? Below is the output.
E2015-03-06T03:49:52.644Z] v286: Ran cloud function createEvent for
user puZNjFVfSm with:
Input:
{"RSVPDate":{"__type":"Date","iso":"2015-03-06T04:49:52.000Z"},"description":"Sample event to showcase
functionality","group":{"max":5,"min":4},"max":50,"reoccur":{"day":1,"month":1,"stop":{"__type":"Date","iso":"2015-03-06T04:49:52.000Z"},"week":1},"title":"SampleFCFS"}
Failed with: Execution timed out I2015-03-06T03:49:52.716Z] begin
I2015-03-06T03:49:52.717Z] creating Event - initial checks completed
I2015-03-06T03:49:52.718Z] Finished advanced checks
I2015-03-06T03:49:52.719Z] Event creation start
I2015-03-06T03:49:52.770Z] begin event creation
I2015-03-06T03:49:52.873Z] Finding role: company_employee_z0Zx39OyuY
I2015-03-06T03:49:52.875Z] Added and secured event
I2015-03-06T03:49:52.931Z] attaching role to 425Qy9v9e4
I2015-03-06T03:49:52.934Z] Adding participant
From what I can tell, it looks like I'm only getting 300Z (is that milliseconds?) on all my runs. Shouldn't I be getting 15 seconds?
Update: I found that the issue was caused by using the addUnique function of Parse Objects with an array of pointers. By inserting ids instead of pointers, the issue was resolved.
Thank you for your help.

Resources