Kinesis + AWS Lambda: monitoring the stream

I'm wondering what good metrics are for ensuring that the current number of Lambda functions processing the stream is actually coping with the load.
With Spark applications + Kinesis, one can easily look at the throughput and current checkpoint of the receivers within the stream.

The metrics you get out of the box with Lambda are not very useful for this.
We publish our own custom CloudWatch metric from our Lambda functions called 'SecondsBehind', which is the difference between the current timestamp and the approximateArrivalTimestamp of the Kinesis record.
This shows us if we are starting to fall behind.
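A minimal sketch of that idea, assuming a Python Lambda behind a Kinesis event source mapping; the 'StreamMonitoring' namespace is illustrative, not the original poster's code:

```python
import time
import boto3

cloudwatch = boto3.client("cloudwatch")

def handler(event, context):
    now = time.time()
    # approximateArrivalTimestamp is epoch seconds for each Kinesis record
    lag_seconds = [
        now - record["kinesis"]["approximateArrivalTimestamp"]
        for record in event["Records"]
    ]

    # Publish the worst lag in this batch as the custom metric
    cloudwatch.put_metric_data(
        Namespace="StreamMonitoring",            # hypothetical namespace
        MetricData=[{
            "MetricName": "SecondsBehind",
            "Value": max(lag_seconds),
            "Unit": "Seconds",
        }],
    )

    # ... process the records ...
```

A CloudWatch alarm on SecondsBehind then flags the point where the consumers stop keeping up with the stream.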

Related

How to ensure DynamoDB Stream records are not lost forever when Lambda fails for over 24 hours?

I am using a DynamoDB Stream (non-Kinesis version) and I've mapped the stream to a Lambda to process events.
Two things I understand about this stream are:
If the Lambda fails, it will automatically retry with the stream event.
DynamoDB stream will only keep the record for up to 24 hours.
My concern is that I want to be able to make sure my Lambda never misses a DynamoDB event, even if the Lambda is failing for more than 24 hours.
How can I ensure that the stream records are not lost forever if my Lambda fails for an extended period of time?
My initial thought is to treat this like I would a Lambda that reads from an SQS queue. I'd like to add a retry policy and DLQ to the Lambda, which would store failed events in a DLQ to reprocess at a later time.
Is this all that needs to be done to achieve what I want? I am struggling to find documentation on how to do this with DynamoDB Streams. Is DDB Stream behavior any different from an SQS queue?
Why would the Lambda fail for 24 hours?
My guess is that your Lambda relies on something downstream which you're anticipating might be down for a long duration. In that case I'd suggest the Lambda decide when to "give up" and toss its work items onto your own SQS queue for later processing. You can't keep items in the DynamoDB Stream for longer than 24 hours, nor does the Stream have a DLQ.
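A minimal sketch of that "give up and park it in SQS" pattern, assuming a Python Lambda; the queue URL, exception type, and process() function are illustrative placeholders:

```python
import json
import os
import boto3

sqs = boto3.client("sqs")
# Hypothetical queue holding records we gave up on, for later reprocessing
RETRY_QUEUE_URL = os.environ["RETRY_QUEUE_URL"]

class DownstreamUnavailable(Exception):
    """Raised when the downstream dependency is still down (illustrative)."""

def handler(event, context):
    for record in event["Records"]:
        try:
            process(record)  # your business logic against the downstream system
        except DownstreamUnavailable:
            # Give up on this record so the batch can move on before the
            # 24-hour stream retention runs out, and park it in SQS.
            sqs.send_message(
                QueueUrl=RETRY_QUEUE_URL,
                MessageBody=json.dumps(record),
            )

def process(record):
    ...
```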
Another option: DynamoDB can stream via Kinesis which has longer retention. The automatic lambda invocation however is only for DynamoDB Streams.

Publishing high-volume metrics from Lambdas?

I have a bunch of Lambdas written in Go that produce certain events that are pushed out to various systems. I would like to publish metrics to CloudWatch that slice these by event type. The volume is currently about 20,000 events per second, with peaks of about twice that.
Due to the load, I can't publish these metrics one by one on each Lambda invocation (each invocation produces a single event). What approaches are available that are cheap and don't hit any limits?
You can try to utilize the shutdown phase of the Lambda lifecycle to publish your metrics:
https://docs.aws.amazon.com/lambda/latest/dg/runtimes-context.html#runtimes-lifecycle-shutdown
To publish the metrics, I would suggest using EMF (Embedded Metric Format), or combining multiple data points into a single PutMetricData API call, which takes an array so it acts like a batch:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_PutMetricData.html
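The asker's Lambdas are in Go, but EMF itself is language-agnostic: you emit a structured JSON log line and CloudWatch extracts the metric asynchronously, so no PutMetricData call is made per invocation. A minimal sketch of the format in Python; the namespace, dimension, and event type are illustrative:

```python
import json
import time

def emit_event_metric(event_type, count=1):
    """Write one EMF-formatted log line; CloudWatch turns it into a metric."""
    print(json.dumps({
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # milliseconds since epoch
            "CloudWatchMetrics": [{
                "Namespace": "EventPipeline",       # hypothetical namespace
                "Dimensions": [["EventType"]],
                "Metrics": [{"Name": "EventCount", "Unit": "Count"}],
            }],
        },
        "EventType": event_type,
        "EventCount": count,
    }))

def handler(event, context):
    # ... produce and push the event ...
    emit_event_metric("order_created")              # hypothetical event type
```

The per-invocation cost is then just a log line, though CloudWatch Logs ingestion charges still apply at this volume.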

Posting to ElasticSearch using Lambda vs Firehose with CloudWatch subscription filter

I'm currently looking into real-time processing of CloudWatch logs. We are generating around 30-40 GB of logs per day (API Gateway full request/response) and have around ~100 requests/second on average. Ultimately we would like to process the logs to extract statistics from query strings as well as response headers and post the results to Elasticsearch or S3.
I'm currently looking at two options and struggling to understand the difference between them:
Create a CloudWatch subscription filter with Lambda function destination. Process data in the Lambda and post to ElasticSearch/S3.
Create a CloudWatch subscription filter and subscribe from a Firehose destination. Use a Lambda transformation function to extract the data, put it back into the stream and let Firehose post the data to ElasticSearch/S3.
The subscription filter will basically pass on records containing "Method request body before transformations" and "Endpoint response headers:" for further processing.
I don't have any insight into how the triggering of a Lambda function from a CloudWatch subscription filter happens. Is there any batching involved, or will it be triggered for every single log event passed by the subscription filter? By contrast, I do understand that Firehose offers batching, which I can control to some extent.
Can someone offer advice on this? Are there any other options that I might have overlooked? Appreciate any input.
You can connect CloudWatch Logs with Amazon Kinesis streams using subscription filters. This way, whenever a new log event matches the subscription filter, it is appended to your Kinesis stream.
Once your logs are in Kinesis streams you have a lot of options. For example, you can send them to a Kinesis Firehose delivery stream, possibly transforming them with a data-transformation Lambda, or directly to Elasticsearch, etc.
Assuming that you already have CloudWatch Logs and a Kinesis stream, the CloudFormation resource you need is the following:
KinesisToCloudwatchSubscription:
  Type: AWS::Logs::SubscriptionFilter
  Properties:
    DestinationArn: !GetAtt [your_kinesis_stream, Arn]
    FilterPattern: "your_filter_pattern"  # e.g. "Method request body before transformations"
    RoleArn: <ARN of a role that permits the "kinesis:PutRecord" action>
    LogGroupName: "/aws/lambda/your_lambda_function"
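Whichever destination you choose, the records delivered by a CloudWatch Logs subscription filter arrive base64-encoded and gzip-compressed, so the consuming Lambda has to unpack them before extracting the query strings and response headers. A minimal sketch of a Python consumer reading those records from the Kinesis stream; extract_and_forward() is a hypothetical placeholder for your own parsing and Elasticsearch/S3 posting:

```python
import base64
import gzip
import json

def handler(event, context):
    for record in event["Records"]:
        # Subscription-filter payloads are base64-encoded, gzip-compressed JSON
        payload = gzip.decompress(base64.b64decode(record["kinesis"]["data"]))
        data = json.loads(payload)

        if data.get("messageType") != "DATA_MESSAGE":
            continue  # skip CONTROL_MESSAGE records

        for log_event in data["logEvents"]:
            message = log_event["message"]
            if ("Method request body before transformations" in message
                    or "Endpoint response headers:" in message):
                extract_and_forward(message)  # hypothetical: parse stats, post to ES/S3

def extract_and_forward(message):
    ...
```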

Amazon Web Services: Spark Streaming or Lambda

I am looking for some high level guidance on an architecture. I have a provider writing "transactions" to a Kinesis pipe (about 1MM/day). I need to pull those transactions off, one at a time, validating data, hitting other SOAP or Rest services for additional information, applying some business logic, and writing the results to S3.
One approach that has been proposed is to use a Spark job that runs forever, pulling data and processing it within the Spark environment. The benefits enumerated were shareable cached data, availability of SQL, and in-house knowledge of Spark.
My thought was to have a series of Lambda functions that would process the data. As I understand it, I can have a Lambda watching the Kinesis pipe for new data. I want to run the pulled data through a bunch of small steps (lambdas), each one doing a single step in the process. This seems like an ideal use of Step Functions. With regards to caches, if any are needed, I thought that Redis on ElastiCache could be used.
Can this be done using a combination of Lambda and Step Functions (using lambdas)? If it can be done, is it the best approach? What other alternatives should I consider?
This can be achieved using a combination of Lambda and Step Functions. As you described, the Lambda would monitor the stream and kick off a new execution of a state machine, passing the transaction data to it as input. You can see more documentation on Kinesis with Lambda here: http://docs.aws.amazon.com/lambda/latest/dg/with-kinesis.html.
The state machine would then pass the data from one Lambda function to the next, where the data will be processed and written to S3. You will need to contact AWS for an increase on the default 2-per-second StartExecution API limit to support 1MM/day (which averages roughly 12 executions per second).
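A minimal sketch of that triggering Lambda, assuming Python and a hypothetical state machine ARN passed in via an environment variable; the state machine itself would chain the validate/enrich/business-logic/write-to-S3 Lambdas:

```python
import base64
import json
import os
import boto3

sfn = boto3.client("stepfunctions")
STATE_MACHINE_ARN = os.environ["STATE_MACHINE_ARN"]  # hypothetical

def handler(event, context):
    for record in event["Records"]:
        # Each Kinesis record carries one transaction as a base64-encoded payload
        transaction = json.loads(base64.b64decode(record["kinesis"]["data"]))

        # Start one state machine execution per transaction
        sfn.start_execution(
            stateMachineArn=STATE_MACHINE_ARN,
            input=json.dumps(transaction),
        )
```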
Hope this helps!

Spark Streaming with Kinesis - How to force checkpoint?

I have a streaming application that reads data from AWS Kinesis.
When you create the stream receivers, you can choose the interval at which to checkpoint; the checkpoint is stored in DynamoDB.
At a certain point I would like to stop my application (sparkStreamingContext.stop()) but before that I would like to force a checkpoint.
Is it possible to do that?
I know that if the checkpoint were on a filesystem I would call sparkStreamingContext.checkpoint(directoryName), but the checkpoint for Kinesis is in DynamoDB, so how can I do it?
Thanks!
Forcing a checkpoint isn't possible. Checkpointing is essentially an implementation detail of Spark, a means to do recovery and guarantee delivery of messages, so you can't simply "invoke a checkpoint" whenever you wish.
If you really want to control when the data is saved, you'll need to manage the state yourself as well.
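For reference, the Kinesis checkpoint interval the question mentions is fixed at receiver-creation time, and there is no call to trigger an extra checkpoint later. A minimal PySpark sketch, assuming Spark 2.x with the spark-streaming-kinesis-asl package; the application, stream, and region names are illustrative:

```python
from pyspark import SparkContext, StorageLevel
from pyspark.streaming import StreamingContext
from pyspark.streaming.kinesis import KinesisUtils, InitialPositionInStream

sc = SparkContext(appName="kinesis-consumer")
ssc = StreamingContext(sc, batchDuration=10)       # 10-second batches

# The DynamoDB-backed checkpoint interval is set once, here, when the
# receiver is created; it cannot be invoked on demand afterwards.
stream = KinesisUtils.createStream(
    ssc,
    kinesisAppName="my-consumer-app",              # also the DynamoDB lease table name
    streamName="my-kinesis-stream",
    endpointUrl="https://kinesis.eu-west-1.amazonaws.com",
    regionName="eu-west-1",
    initialPositionInStream=InitialPositionInStream.LATEST,
    checkpointInterval=10,                         # seconds between Kinesis checkpoints
    storageLevel=StorageLevel.MEMORY_AND_DISK_2,
)
```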
