CloudWatch alarm for invocation of Lambdas with scheduling periods > 1 day - aws-lambda

I have a Lambda that is triggered to run every week, and I want to have a CloudWatch alarm if it ever does not run for more than 7 consecutive days.
My thinking was to alarm if there is < 1 invocation for 8 days, but it does not seem to be possible to set an evaluation period longer than 24 hours:
The alarm evaluation period (number of datapoints times the period of
the metric) must be no longer than 24 hours.
Is there another way to ensure execution of Lambdas that are triggered on a period of greater than 24 hours?

Maximum evaluation period is 24 hours.
You can get around that by creating a custom metric with the CloudWatch PutMetricData API. Publish the time elapsed since the last execution of your Lambda function, then alarm when the value rises above 8 days.
One way to do this is to have your Lambda function store its execution timestamp in DynamoDB every time it runs. Then create a second function that reads that timestamp from DynamoDB and publishes the difference between it and the current time to a custom metric (have that second function trigger every hour, for example).
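A minimal sketch of the publishing step in Python, assuming the last-run timestamp has already been read from DynamoDB as epoch seconds. The namespace, metric name, and function names here are hypothetical; `cloudwatch` is assumed to be any client exposing `put_metric_data`, such as `boto3.client("cloudwatch")`.

```python
import time

NAMESPACE = "Custom/LambdaHeartbeat"   # hypothetical namespace
METRIC_NAME = "SecondsSinceLastRun"    # hypothetical metric name


def build_metric_data(last_run_epoch, now_epoch, function_name):
    """Build the MetricData entry holding the seconds elapsed since the last run."""
    # Clamp at zero so a clock skew never publishes a negative value.
    elapsed = max(0.0, float(now_epoch) - float(last_run_epoch))
    return [{
        "MetricName": METRIC_NAME,
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Value": elapsed,
        "Unit": "Seconds",
    }]


def publish_elapsed(cloudwatch, last_run_epoch, function_name, now_epoch=None):
    """Publish the elapsed time through a client that exposes put_metric_data."""
    now_epoch = time.time() if now_epoch is None else now_epoch
    data = build_metric_data(last_run_epoch, now_epoch, function_name)
    cloudwatch.put_metric_data(Namespace=NAMESPACE, MetricData=data)
    return data[0]["Value"]
```

The client is passed in rather than created inside the function, so the hourly function can be exercised with a stub before it is wired to real AWS credentials.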
Once the new custom metric is flowing, you can create an alarm that fires if the value goes above 8 days for one 1-hour datapoint (this solves your initial issue). You can also set the "Treat missing data as" option to "bad (breaching threshold)", which will alert you if the second Lambda function stops triggering.
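The alarm settings described above could look roughly like this when created from Python. This is a sketch under assumptions: the alarm name, namespace, metric name, and SNS topic ARN are placeholders, and `cloudwatch` is any client exposing `put_metric_alarm`, such as `boto3.client("cloudwatch")`.

```python
EIGHT_DAYS_SECONDS = 8 * 24 * 60 * 60  # threshold on the elapsed-time metric


def build_alarm_params(function_name, sns_topic_arn):
    """Parameters for an alarm on the (hypothetical) elapsed-time custom metric."""
    return {
        "AlarmName": f"{function_name}-not-run-for-8-days",  # placeholder name
        "Namespace": "Custom/LambdaHeartbeat",               # hypothetical
        "MetricName": "SecondsSinceLastRun",                 # hypothetical
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Maximum",
        "Period": 3600,             # one 1-hour datapoint
        "EvaluationPeriods": 1,     # 1 x 3600 s stays within the 24-hour limit
        "Threshold": float(EIGHT_DAYS_SECONDS),
        "ComparisonOperator": "GreaterThanThreshold",
        # Missing data means the publishing function stopped running.
        "TreatMissingData": "breaching",
        "AlarmActions": [sns_topic_arn],
    }


def create_alarm(cloudwatch, function_name, sns_topic_arn):
    """Create or update the alarm through a client exposing put_metric_alarm."""
    params = build_alarm_params(function_name, sns_topic_arn)
    cloudwatch.put_metric_alarm(**params)
    return params
```

The 8-day window lives in the metric value itself, so the alarm's own evaluation period can stay at one hour.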
You should also set alarms on CloudWatch Events errors and Lambda errors, which will alert you if something goes wrong with the scheduling or with the Lambda itself. The custom metric described above will additionally alert you to human error, for example when someone disables or deletes the event or the function by mistake.

Related

Is there any way to check if a lambda function has been idle for a given amount of time?

I have a use case where I need to execute a piece of code based on the idle time of a given Lambda function. That is, if the function has been idle for, say, 5 minutes, my piece of code should run.
Is there any way to check the lambda state/status?
I assume you are looking to avoid Lambda cold starts. If so, use Provisioned Concurrency, which keeps the configured number of execution environments initialized and ready to respond:
https://aws.amazon.com/blogs/aws/new-provisioned-concurrency-for-lambda-functions/
If that is not what you meant, and by idle you mean "no requests processed" by the Lambda, then use a CloudWatch metric or alarm to monitor the number of invocations over a time frame and run your code from its action.
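As an illustration of the metric-based variant, here is a small sketch that decides idleness from invocation datapoints. It assumes the datapoints have the shape returned by CloudWatch `get_metric_statistics` for the `Invocations` metric with the `Sum` statistic (dicts with `Timestamp` and `Sum`), and that periods without invocations may produce no datapoint at all. The function name `is_idle` is made up for this example.

```python
from datetime import datetime, timedelta, timezone


def is_idle(datapoints, now, idle_for=timedelta(minutes=5)):
    """Return True if no invocation was recorded within the last `idle_for`.

    The result is only as precise as the datapoint period, because each
    timestamp marks the start of the period it summarizes.
    """
    cutoff = now - idle_for
    for point in datapoints:
        if point["Timestamp"] >= cutoff and point.get("Sum", 0) > 0:
            return False
    # No recent datapoint with invocations: treat the function as idle.
    return True


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    recent = [{"Timestamp": now - timedelta(minutes=2), "Sum": 3.0}]
    print(is_idle(recent, now))
```

An empty list counts as idle here, which matches treating missing data as breaching if you use an alarm instead.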

Why does AWS Lambda keep executing at intervals even after removing the CloudWatch event?

I have a Lambda function (written in Java) that was scheduled to execute every 15 minutes, and each execution runs for the next 15 minutes (the highest value allowed for the Lambda timeout). The architecture looks like this: CloudWatch Event (every 15 minutes) -----> Lambda (runs for 15 minutes)
I have removed the CloudWatch event from the Lambda console, but I can still see that the function has been invoked every 15 minutes for a few hours. Is this a bug, or am I misunderstanding something about the architecture?
Thanks

New Relic not dispatching NRQL alert condition for process when errors are triggered

I'm setting up monitoring for a process using New Relic. The process itself is an AWS Lambda that finishes running in around 15 seconds. Any time this process fails, I want an alert to be triggered and an email to be sent to me per the policy I've configured.
For testing purposes I'm causing the lambda to fail in a QA environment multiple times in a row to see what gets picked up by New Relic, although in production the failure would only occur a couple (less than 3) times per week, potentially a few days apart.
Here is the chart that depicts all of the failures, the NRQL query, and the thresholds. As we can see, the summed errors are well above the threshold but for some reason the alert email is not being dispatched. Any ideas?
Try increasing your evaluation offset in Condition Settings > Advanced Settings > Evaluation offset.
New Relic polls for Lambda metrics every 5 minutes so if your offset is lower than this you may find that the alert doesn't fire.
In reality I've found this quite unreliable and I'd suggest setting quite a high offset initially to test the alert - maybe 20 or 30 minutes.
As I read the chart, the red highlighted area is the time frame in which the alert condition is being violated, so the alert should have been triggered. Check your notification channel and try sending a test notification.

How can I monitor the lambda invocation metrics in cloudwatch?

I created a Lambda function in AWS and added a CloudWatch trigger event that invokes the function every minute. But after running it for a whole night, the Monitoring view of the Lambda does not show the function being called every minute.
Below is the screenshot of Invocation metric:
You can see that the maximum number of invocations is only 5 during the last hour.
The screenshot below shows the Lambda configuration, which has a CloudWatch event as its trigger source.
The definition in CloudWatch Events is shown in the screenshot below. It links to my Lambda function and its status is Enabled. I don't understand why the Invocations metric is not showing the right number of calls. Am I misunderstanding something here? Or isn't my Lambda function being called?
Your Lambda is being called correctly. The invocation graph shows the number of calls per 5 minutes. It shows 5 invocations in every 5-minute datapoint, which means that your Lambda is being called once every minute.
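The arithmetic behind that reading, as a tiny sketch (the helper name is made up for illustration): divide the sum shown for a datapoint by the length of its period in minutes.

```python
def invocations_per_minute(sum_per_datapoint, period_seconds):
    """Convert a per-datapoint invocation sum into an average per-minute rate."""
    return sum_per_datapoint * 60 / period_seconds


if __name__ == "__main__":
    # 5 invocations in a 5-minute (300 s) datapoint
    print(invocations_per_minute(5, 300))
```

Switching the graph period to 1 minute should show the same rate directly, one invocation per datapoint.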
You can also take a look at the lambda's cloudwatch logs to check if your function is being called correctly.

Amazon EC2 AutoScaling CPUUtilization Alarm- INSUFFICIENT DATA

So I've been using Boto in Python to try and configure autoscaling based on CPUUtilization, more or less exactly as specified in this example:
http://boto.readthedocs.org/en/latest/autoscale_tut.html
However both alarms in CloudWatch just report:
State Details: State changed to 'INSUFFICIENT_DATA' at 2012/11/12
16:30 UTC. Reason: Unchecked: Initial alarm creation
Auto scaling is working fine but the alarms aren't picking up any CPUUtilization data at all. Any ideas for things I can try?
Edit: The instance itself reports CPU utilisation data, just not when I try to create an alarm in CloudWatch, whether programmatically in Python or in the interface. Detailed monitoring is also enabled just in case...
Thanks!
The official answer from AWS goes like this:
Hi, There is an inherent delay in transitioning into INSUFFICIENT_DATA
state (only) as alarms wait for a period of time to compensate for
metric generation latency. For an alarm with a 60 second period, the
delay before transition into I_D state will be between 5 and 10
minutes.
John.
Apparently this is a temporary state and will likely resolve itself.
I am not sure what's going on in the backend, but if you compare the alarm history you will see that AWS removes the 'unit' column if you just modify the alarm without making any change, as at7000ft said. So remove the unit from your script.
Make sure that the alarm's Namespace is 'AWS/EC2'.
I know this is a long time after the original question, but in case others find this via Google, I had the same problem, and it turned out I set alarm's Namespace improperly.
You need to publish data with the same unit that was used to create the alarm. If you didn't specify one, the unit will be <None>.
The unit can be specified in aws cloudwatch put-metric-data and aws cloudwatch put-metric-alarm with --unit <value>.
Common values for <value> include:
Seconds
Bytes
Bits
Percent
Count
Bytes/Second (bytes per second)
Bits/Second (bits per second)
Count/Second (counts per second)
None (default when no unit is specified)
Units are also case-sensitive, so be careful about that in your scripts.
For CPUUtilization, you can use Percent.
After the first set of data reaches your alarm (this can take up to 5 minutes for an instance without detailed monitoring), the alarm will switch from INSUFFICIENT_DATA to the OK or ALARM state.
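A small sketch of the unit-matching rule described in this answer, useful as a sanity check in a deployment script. It assumes the dict shapes used by `put_metric_data` (a datum with an optional `Unit`) and `put_metric_alarm` (parameters with an optional `Unit`), and that an alarm created without a unit is evaluated against data of any unit. The function name is hypothetical.

```python
def alarm_sees_datum(datum, alarm_params):
    """Return True if the alarm's unit filter would include this datum."""
    alarm_unit = alarm_params.get("Unit")
    if alarm_unit is None:
        # No unit on the alarm: assumed to match data of any unit.
        return True
    # Data published without a unit is stored with the unit "None".
    # The comparison is case-sensitive, like the units themselves.
    return datum.get("Unit", "None") == alarm_unit


if __name__ == "__main__":
    datum = {"MetricName": "CPUUtilization", "Value": 42.0, "Unit": "Percent"}
    print(alarm_sees_datum(datum, {"Unit": "Percent"}))
    print(alarm_sees_datum(datum, {"Unit": "percent"}))
```

A mismatch here would correspond to the alarm staying in INSUFFICIENT_DATA even though the metric is visible in the console.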
I am having the same INSUFFICIENT_DATA alarm state show up in CloudWatch for an RDS CPUUtilization > 60 alarm created with CloudFormation. ("Reason: Unchecked: Initial alarm creation" shows up under details). This is a very crude fix, but I found that by selecting the alarm, clicking the Modify button, and then the Save button (without changing anything), the alarm goes to the OK state and everything is fine.
I had this problem. Make sure the metric name you use to create the alarm matches the actual metric name.
You can list your metrics with:
aws cloudwatch list-metrics --namespace=<NAMESPACE, e.g. System/Linux, etc>
Find the metric and its MetricName in the output, and make sure your alarm is configured for exactly that metric.
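The same check can be scripted. This is a sketch, assuming `cloudwatch` is a client exposing `list_metrics` with the boto3 response shape (a `Metrics` list plus an optional `NextToken`); the helper names are made up for illustration.

```python
def metric_names(cloudwatch, namespace):
    """Collect every MetricName in a namespace, following NextToken pagination."""
    names = set()
    kwargs = {"Namespace": namespace}
    while True:
        page = cloudwatch.list_metrics(**kwargs)
        names.update(m["MetricName"] for m in page.get("Metrics", []))
        token = page.get("NextToken")
        if not token:
            return names
        kwargs["NextToken"] = token


def alarm_metric_exists(cloudwatch, alarm_params):
    """Check that the alarm's MetricName is actually published in its Namespace."""
    published = metric_names(cloudwatch, alarm_params["Namespace"])
    return alarm_params["MetricName"] in published
```

Metric names are case-sensitive, so a name like `CpuUtilization` would not match `CPUUtilization` in this check.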
As far as I know, the default metric resolution is 5 minutes (which can be lowered to 1 minute by paying for detailed monitoring), so if your alarm's measurement period is shorter than that, the alarm will remain permanently in the INSUFFICIENT_DATA state. In my case, I had a 1-minute measurement period on CPU utilization, and changing it to 5 minutes fixed the state issue.
I had a similar problem: my alarm was constantly in the INSUFFICIENT_DATA state although I could see the metric in the GUI.
It turned out that this happened because I specified the wrong Unit for the metric when I created the alarm. No error was reported back, but the alarm never turned green.
It is better not to specify the unit if you are not sure; AWS will then do the correct match in the background.
There is a directory /var/tmp/aws-mon/ that contains a couple files. One is instance-id. The instance I was on was created from an AMI and this file retained the old instance id. I just edited it and made sure /var/tmp/aws-mon/placement/availability-zone was also correct. The alarms changed to OK almost instantly.
I also ran into this problem, but for a different reason: I passed the ES cluster ARN instead of the domain name in my CloudFormation template. It was pretty frustrating.
