AWS Kinesis stream sending data to Lambda at a slower rate - performance

I needed to implement a streaming solution using AWS Kinesis streams & Lambda.
Lambda function 1 -
It adds data to the stream and is invoked every 10 seconds, adding 100 data records (1 KB each) per invocation. I am running two instances of the script that invokes this Lambda function.
Lambda function 2 -
This Lambda uses the above stream as its trigger. With a small volume of data per interval, the second Lambda receives the data almost immediately. But at the volume described above, data arrives increasingly late (about 10 minutes behind after an hour of streaming).
I checked the logic of both Lambda functions and verified that the first Lambda does not add any latency before pushing data to the stream. I also confirmed this from the stream records in the second Lambda, where the gap between approximateArrivalTimestamp and the current time clearly keeps increasing.
Kinesis itself did not show any issues or throttling in its metrics (I am using 1 shard).
Are there any architectural changes I need to make to have it run more smoothly? I need to scale up at least 10x for later benchmarks: 20 invocations of the first Lambda with 200 records each, at intervals of 1-10 seconds.
I am using 100 as the batch size. Would increasing or decreasing it help?
UPDATE: Exploring further online, I found the idea of putting an async front-facing Lambda on the Kinesis trigger, which in turn invokes the actual processing Lambda asynchronously, so that processing time does not become the bottleneck. However, this approach also fails with the same latency issue. I checked the execution time: the front-facing Lambda finishes in 1 second, but I still see a large gap between approximateArrivalTimestamp and the current time in both Lambdas.
Please help!

For one shard, there will only be one instance of the 2nd Lambda.
So the 2nd Lambda works like this: it reads the configured number of records from the stream and processes them. It won't read further records until the previous batch has been successfully processed.
By adding a second shard, you would have 2 Lambdas processing the records. So the way I see to scale this architecture is to increase the number of shards; however, make sure data is evenly distributed across the shards.
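For context on "evenly distributed": Kinesis picks a shard by taking the MD5 hash of the record's partition key and mapping that 128-bit value onto the shards' hash-key ranges, so varied partition keys are what spread the load. A rough sketch of that mapping, assuming evenly split shards (the function name and key format are illustrative, not part of any AWS SDK):

```python
import hashlib

def shard_for_key(partition_key, num_shards):
    # Kinesis hashes the partition key with MD5 and maps the resulting
    # 128-bit integer onto the shards' hash-key ranges; this sketch
    # assumes the shards split the key space evenly.
    h = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    return min(h * num_shards // 2**128, num_shards - 1)

# Varied keys spread records across shards; a constant key would pin
# every record (and all the read/write load) onto a single shard.
counts = [0, 0]
for i in range(1000):
    counts[shard_for_key("device-%d" % i, 2)] += 1
print(counts)  # roughly even split across the 2 shards
```

With a single shard, of course, every key lands in the same place; the distribution only starts to matter once you add shards.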

Related

Design Pattern - Spring KafkaListener processing 1 million records in 1 hour

My Spring Boot application is going to consume 1 million records an hour from a Kafka broker. The entire processing logic for each message takes 1-1.5 seconds, including a database insert. The topic has 64 partitions, which is also the concurrency of my @KafkaListener.
My current code is only able to process 90 records a minute in a lower environment, where I am consuming around 50k records an hour. The code is below, and all other config parameters such as max.poll.records are left at their default values:
@KafkaListener(id = "xyz-listener", concurrency = "64", topics = "my-topic")
public void listener(String record) {
    // processing logic
}
I do get "it is likely that the consumer was kicked out of the group" 7-8 times an hour. I think both of these issues could be solved by isolating the listener method and multithreading the processing of each message, but I am not sure how to do that.
There are a few points to consider here. First, 64 consumers seems like a lot for a single application instance to handle consistently.
Considering each poll fetches up to 500 records per consumer by default, your app might be getting overloaded, causing consumers to get kicked out of the group whenever a single batch takes longer than the 5-minute default of max.poll.interval.ms to be processed.
So first, I'd consider scaling the application horizontally so that each instance handles a smaller number of partitions / threads.
A second way to increase throughput would be using a batch listener, and handling processing and DB insertions in batches, as you can see in this answer.
Using both, you should be processing a sensible amount of work in parallel per app, and should be able to achieve your desired throughput.
Of course, you should load test each approach with different figures to get proper metrics.
EDIT: Addressing your comment: if you want to achieve this throughput, I wouldn't give up on batch processing just yet. If you do the DB operations row by row, you'll need a lot more resources for the same performance.
If your rule engine doesn't do any I/O you can iterate each record from the batch through it without losing performance.
About data consistency, you can try a few strategies. For example, you can hold a lock to ensure that, even through a rebalance, only one instance processes a given batch of records at a given time - or perhaps there's a more idiomatic way of handling that in Kafka using the rebalance hooks.
With that in place, you can batch-load all the information you need to filter out duplicated / outdated records when you receive them, iterate each record through the rule engine in memory, and then batch-persist all results before releasing the lock.
Of course, it's hard to come up with an ideal strategy without knowing more details about the process. The point is that by doing this you should be able to handle around 10x more records within each instance, so I'd definitely give it a shot.
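The row-by-row vs. batched point can be sketched outside Kafka entirely; here sqlite3 stands in for the real database, and the table name and record count are made up for illustration:

```python
import sqlite3

# sqlite3 stands in for the real database here; the point is the
# shape of the access pattern, not the engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (id INTEGER, payload TEXT)")
records = [(i, "payload-%d" % i) for i in range(10000)]

# Row-by-row: one statement (and, against a real DB, one network
# round trip) per record - this is what caps throughput.
for row in records:
    conn.execute("INSERT INTO results VALUES (?, ?)", row)
conn.commit()

# Batched: the whole chunk goes in via a single executemany call
# inside one transaction, amortising the per-statement overhead.
conn.execute("DELETE FROM results")
conn.executemany("INSERT INTO results VALUES (?, ?)", records)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
print(count)  # 10000
```

The same shape applies to a Spring batch listener: collect the poll's records, run them through the rule engine in memory, then persist the whole batch in one statement.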

Why are there holes in my cloudwatch logs?

I have been running Lambdas written in C# with the serverless.com framework for some months now, and I consistently notice holes in the CloudWatch logs. So far it has only been an annoyance. I have been looking around for an explanation, but it is getting to the point where I need to understand and fix the problem.
For instance, today the Lambda monitor shows hundreds to thousands of executions between 7 AM and 8 AM, but CloudWatch shows log files up until 7:19 AM and then nothing again until 8:52 AM.
What is going on here?
Logs are written per invocation of the Lambda, but log streams correspond to concurrent executions. If you look at your Lambda metrics, you will see a stat called ConcurrentExecutions - the total number of simultaneous Lambda containers you have running at any given moment - and that is NOT the same as Invocations. The headless project I'm on does about 5k invocations an hour, and we've never been above 5 concurrent executions across our 25-ish Lambdas (it helps that, once warm, they all run in about 300 ms).
So if you have 100 invocations in 10 seconds, but each takes less than a second to run, then once a given Lambda container is spun up it will be reused as long as it keeps receiving events. This is how AWS works around the 'cold start' problem as much as possible, where a given Lambda may take 10-15 seconds or more to start up. By trying to predict traffic flow (and you can tune these settings as well), AWS attempts to have a warm Lambda ready to go whenever you need it.
These concurrent executions are gradually shut down as volume drops off, with their traffic folded back into the containers that are still active.
What this means for log group logs is twofold:
* you may see large 'gaps' in the timestamps, but if you look closely, any given log stream will have multiple invocations in it.
* logs are delayed by several seconds to several minutes depending on server load, so at any given moment you may not actually be seeing all of the logs yet.
The other possibility is that your logging is not set up correctly (Python Lambdas in particular have difficulty logging properly to CloudWatch - the default logging handler doesn't play nicely with the way the Lambda runtime attaches its own handler to the log group), or that you are getting a ton of hits that don't actually do anything - only pings / keep-alive events that never reach any of your log statements - in which case you will generally only see the container start-up/shutdown log lines (which, as stated above, are far fewer).
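On the Python logging point: the usual symptom is that logging.basicConfig has no effect, because the Lambda runtime already attached a handler to the root logger before your code ran. A minimal sketch of the common workaround (the handler body and return value are illustrative):

```python
import logging

# In the Lambda runtime the root logger already has a handler
# attached, so logging.basicConfig(level=...) is silently a no-op
# there. Setting the level on the root logger directly is the fix.
logger = logging.getLogger()
logger.setLevel(logging.INFO)

def handler(event, context):
    logger.info("processing event: %s", event)  # now reaches CloudWatch
    return {"ok": True}
```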
What do you mean by gaps in log groups?
A log group gets its logs from log streams, and invocations on the same Lambda container write to the same log stream. So the most recent log stream in your log group may not be the one with the latest log entry.
Here you can read more about it:
https://dashbird.io/blog/how-to-save-hundreds-hours-debugging-lambda/
While trying to edit my question with screenshots and tallies of the data, I came upon the answer. I thought it would be helpful for this to be a separate answer as it is extremely specific and enlightening.
The crux of the problem is that I didn't expect such huge gaps between invocation times and log write times. 12 minutes is an eternity compared to the work I have done in the past.
Consider this graph:
12:59 UTC should be 7:59AM CST. Counting the invocations between 12:59 and 13:08, I get roughly ~110.
CloudWatch shows these log streams:
Looking at these log streams, there seems to be a large gap. The timestamp on a log stream is its "file close" time: the log stream stamped 8:08:37 includes events from 12 minutes earlier.
So the timestamps on the log streams are not very useful for finding debug data. The "search all" feature has not been very helpful up to now either - slow and very limited. I will look into some other method for crunching logs.

Limit AWS SQS messages visible per second or AWS Lambda invocations per second

I am implementing a solution that involves SQS triggering a Lambda function, which uses a 3rd-party API to perform some operations.
That 3rd-party API has a limit of requests per second, so I would like to limit the rate at which SQS messages are processed by my Lambda function to a similar rate.
Is there any way to limit the number of messages visible per second on the SQS queue, or the number of invocations per second of a Lambda function?
[edited]
After some insights given in the comments about AWS Kinesis:
There is no clean solution obtained just by tuning the Kinesis parameters (Batch Window, Batch Size and payload size), because Kinesis triggers the Lambda execution as soon as ANY of the thresholds is reached:
* Let N = the max number of requests per second I can execute against the 3rd-party API.
* Configuring a Batch Window of 1 second and a Batch Size of N, back pressure can fill the batch before the window expires, triggering more than N requests per second.
* Configuring a Batch Window of 1 second and a Batch Size of MAX_ALLOWED_VALUE will underperform, and still does not guarantee fewer than N requests per second.
The simplest solution I have found is a Lambda scheduled at a fixed rate of once per second that reads a fixed number of messages N from SQS/Kinesis and writes them to another SQS/Kinesis, which has the actual worker Lambda as its consumer.
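The forwarding step of that "pump" Lambda can be sketched with in-memory queues standing in for SQS/Kinesis (the cap N and all names here are illustrative, not AWS APIs):

```python
from collections import deque

N = 5  # hypothetical cap: max 3rd-party API requests per second

def pump(inbound, worker_queue, n=N):
    # Runs once per tick (e.g. a Lambda on a 1-second schedule) and
    # forwards at most n messages, so the downstream consumer never
    # sees more than n new messages per tick.
    moved = 0
    while inbound and moved < n:
        worker_queue.append(inbound.popleft())
        moved += 1
    return moved

inbound = deque(range(12))
worker = deque()
pump(inbound, worker)   # first tick: forwards 5
pump(inbound, worker)   # second tick: forwards 5 more
pump(inbound, worker)   # third tick: only 2 left
print(len(worker))  # 12
```

In the real setup, the two deques would be the source queue/stream and the worker queue/stream, and the tick would be a scheduled (e.g. EventBridge rule) invocation.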
This is a difficult situation.
Amazon SQS can trigger multiple AWS Lambda functions in parallel, so there is no central oversight of how fast requests are made to the 3rd-party API.
From Managing concurrency for a Lambda function - AWS Lambda:
To ensure that a function can always reach a certain level of concurrency, you can configure the function with reserved concurrency. When a function has reserved concurrency, no other function can use that concurrency. Reserved concurrency also limits the maximum concurrency for the function, and applies to the function as a whole, including versions and aliases.
Therefore, concurrency can be used to limit the number of simultaneous Lambda functions executing, but this does not necessarily map to "x API calls per second". That would depend upon how long the Lambda function takes to execute (e.g. 2 seconds) and how many API calls it makes in that time (e.g. 2 API calls).
It might be necessary to introduce delays either within the Lambda function (not great because you are still paying for the function to run while waiting), or outside the Lambda function (by triggering the Lambda functions in a different way, or even doing the processing outside of Lambda).
The easiest (but not efficient) method might be:
Set a concurrency of 1
Have the Lambda function retry the API call if it is rejected
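The retry half of that approach might look like the following sketch, assuming the 3rd-party client raises an exception when a request is rejected (the function names and defaults are illustrative):

```python
import time
import random

def call_with_retry(api_call, max_attempts=5, base_delay=0.5):
    # Retries a rate-limited call with exponential backoff plus jitter;
    # api_call is assumed to raise when the 3rd-party API rejects it.
    for attempt in range(max_attempts):
        try:
            return api_call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the Lambda fail / message retry
            time.sleep(base_delay * 2**attempt * random.uniform(0.5, 1.5))
```

Combined with a reserved concurrency of 1, this gives a single worker that simply backs off whenever the API pushes back, at the cost of paying for the sleep time.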
Thanks to @John Rotenstein, who gave a comprehensive and detailed answer about the SQS part.
If your design is limited to a single consumer, then you may replace SQS with Kinesis streams. By doing so, you can use the batch window option of Kinesis to limit the requests made by the consumer. The batch window option is used to reduce the number of invocations:
Lambda reads records from a stream at a fixed cadence (e.g. once per second for Kinesis data streams) and invokes a function with a batch of records. Batch Window allows you to wait as long as 300s to build a batch before invoking a function. Now, a function is invoked when one of the following conditions is met: the payload size reaches 6MB, the Batch Window reaches its maximum value, or the Batch Size reaches its maximum value. With Batch Window, you can increase the average number of records passed to the function with each invocation. This is helpful when you want to reduce the number of invocations and optimize cost.

Set a timeout delay before each batch execution for AWS Kinesis

I am using AWS Kinesis (configured with the Serverless Framework) with a batchSize of 1.
processEvents:
  handler: ...
  events:
    - stream:
        type: kinesis
        batchSize: 1
        arn:
          Fn::GetAtt: [KinesisStream, Arn]
Is there a way to set a delay of, for example, 20 seconds before reading the next batch?
I basically want a time delay before each Lambda execution that pulls a record from the stream.
Thank you!
Your Lambda is invoked synchronously by Kinesis, and Kinesis will only move on to the next event if the Lambda returns successfully.
These circumstances give you the opportunity to implement the delay yourself in your Lambda code:
await new Promise(done => setTimeout(done, 20000))
processMyEvent(event)
With a batch size of 1, this Lambda has to finish before the next event is processed.
Note, though, that the additional Lambda run time will incur costs.
You could also go the other direction: execute your actual code first and then delay the shutdown of your Lambda. You could even use the context object to see how long the Lambda has been running, if your processing varies in duration.
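That pad-to-a-fixed-duration idea can be sketched like this (in Python rather than Node.js for illustration; TARGET_SECONDS and the handler shape are assumptions):

```python
import time

TARGET_SECONDS = 20  # hypothetical minimum duration per invocation

def remaining_delay(target, elapsed):
    # How much longer to sleep so the invocation lasts at least
    # `target` seconds; 0 when the real work already took longer.
    return max(0.0, target - elapsed)

def handler(event, context):
    start = time.monotonic()
    # ... your actual processing here ...
    # Pad the rest of the invocation with sleep (note: billed time).
    time.sleep(remaining_delay(TARGET_SECONDS, time.monotonic() - start))
```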
Lastly, I would recommend reconsidering your architecture: besides the additional costs, you're artificially slowing down your platform.

How does AWS Lambda handle concurrency?

I have a Lambda that queries a database for a count, and then submits to an SQS queue a number that represents an offset of a query that another Lambda works on. The second Lambda is triggered by a push onto the queue. If I set the concurrency to 10, does that mean the Lambda will act like a thread pool and keep restarting until the queue is empty?
Really accurate image of what I'm thinking about:
Example
Lambda A queries the DB and finds that there are 10,000 items in the table, so it submits 100 messages to the queue covering offsets 0 to 10,000 in chunks of 100.
Lambda B has a concurrency of 10 and is triggered by puts to the queue. Each instance pulls a message, does some work, puts the result somewhere else, and does whatever Lambdas do once their job is done. When the first 10 are done, there are still 90 tasks left: does another pool of 10 start, or does another Lambda take a finished one's place immediately?
Since Lambda B has a concurrency of 10, then there will be a maximum of 10 Lambda functions running at any time.
When one Lambda function has completed, another will be triggered until there is nothing left in the SQS queue.
It is likely that AWS Lambda will create 10 Lambda containers, and each container will be re-used on subsequent calls.
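Container reuse is easy to observe from inside a function: module-level state survives between invocations served by the same warm container. A minimal sketch (the counter is purely illustrative):

```python
# Module-level code runs once per container (the cold start); anything
# kept at module scope survives across invocations handled by the same
# warm container - which is also why caching DB connections here works.
invocation_count = 0

def handler(event, context):
    global invocation_count
    invocation_count += 1
    return {"invocation": invocation_count}
```

Running this under load, each of the ~10 containers would report its own independent count climbing as the queue drains.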
See: Understanding Container Reuse in AWS Lambda | AWS Compute Blog
