How to scale concurrent step function executions and avoid any maxConcurrent exceptions? - aws-lambda

Problem: I have a Lambda that produces an array of objects which, in the worst case, can contain a few thousand entries. Each object in this array should be processed by a Step Function.
I am trying to figure out the most scalable and fault-tolerant solution so that every object is processed by the Step Function.
The complete Step Function does not have a long execution time (under 5 min), but it has to wait in some steps for other services before continuing (WaitForTaskToken). The Step Function contains a few short-running Lambdas.
These are the possibilities I have at the moment:
1. Naive approach: In my head, a few thousand or even ten thousand concurrent executions are not a big deal, so why can't I just iterate over each element and start an execution directly from the Lambda (see the sketch below this list)?
2. SQS: The Lambda can put each object into SQS, and another Lambda processes a batch of 10 and starts 10 Step Function executions. I could then set a maximum concurrency on the processing Lambda to avoid too many Step Function executions. But there are known issues with this approach where messages fail to be processed, and overall I think it adds a lot of overhead.
3. Using a Map State: I could just give the array to a Map State, which runs the state machine for each object with a maximum of 40 concurrent iterations. But what if the array is larger than 40? Could I catch the error and retry, in an error-catch state, with the objects that were not processed, until every execution is either done or failed? If one execution fails, I still want the other 39 to keep running.
4. Split the array into batches and run them in parallel: Similar to 3., but instead of giving all objects to the Map State at once, another state splits the array into batches of 40, forwards each batch to the Map State, and waits until it finishes before processing the next batch. So there is one "main" state that runs for a longer time, plus 40 worker states at the same time.
All of these approaches only take the Step Function execution concurrency into account, not the Lambda concurrency. Since the Step Functions use Lambdas, there are also a lot of concurrent Lambdas running. Could this be an issue? And if so, how can I mitigate it?
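For reference, a minimal sketch of the naive approach with boto3 (the state machine ARN and the event shape are placeholders, and there is no throttling handling, which is exactly my concern):
# Sketch of approach 1: one StartExecution call per object, straight from
# the producing Lambda. ARN and event shape are placeholders.
import json
import boto3

sfn = boto3.client("stepfunctions")
STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:process-item"

def handler(event, context):
    for obj in event["objects"]:
        sfn.start_execution(
            stateMachineArn=STATE_MACHINE_ARN,
            input=json.dumps(obj),
        )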

Inline Map States can handle lots of iterations, but only up to 40 concurrently. Iterations beyond the MaxConcurrency don't cause an error; they will simply be invoked with a delay.
If your Step Function is only running ~40 concurrent iterations, Lambda concurrency should not be a constraint either.
I just tested a Map state with 1,000 items. Worked just fine. The Quotas page does not mention an upper limit.
In Distributed mode a Map State can handle 10,000 parallel child executions.
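For illustration, a minimal Inline Map state built as a Python dict and deployed with boto3 (the state names, ARNs, and ItemsPath are placeholders, not your actual setup):
# Sketch: an Inline Map state capped at 40 concurrent iterations.
# All ARNs and names below are placeholders.
import json
import boto3

definition = {
    "StartAt": "ProcessAll",
    "States": {
        "ProcessAll": {
            "Type": "Map",
            "ItemsPath": "$.objects",
            "MaxConcurrency": 40,  # at most 40 iterations run at once; the rest wait
            "Iterator": {
                "StartAt": "ProcessItem",
                "States": {
                    "ProcessItem": {
                        "Type": "Task",
                        "Resource": "arn:aws:lambda:us-east-1:123456789012:function:process-item",
                        "End": True,
                    }
                },
            },
            "End": True,
        }
    },
}

sfn = boto3.client("stepfunctions")
sfn.create_state_machine(
    name="process-objects",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/sfn-role",  # placeholder role
)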

Related

Check how many Jobs are processed in one second?

I'm trying to implement rate limiting using Redis. I need to call an API that has a TPS of 100, i.e. it can be called 100 times per second. I have a limitation that I cannot call the API for testing. How can I test that my API is called no more than 100 times in a second?
Redis::throttle('key')->allow(100)->every(1)->then(function () {
    // API called that has rate limit
}, function () {
    // Could not obtain lock...
    return $this->release(10);
});
Rate limiting is not a trivial task. Getting an exact count at the one-second level is difficult due to round-trip time. In your case, since you've specified 100 as the max count, Laravel guarantees that it will process at most 100 messages in a second.
Each method call will add an additional 1-5 ms of execution time, unless your Redis always responds within microseconds.
Your best bet is to benchmark your Redis calls and see how many locks you can acquire using the Laravel throttle primitive. As a basic test, set the throttle to 10_000, simply run a print statement, and check how many calls you can make (ideally in an environment matching production); this number gives you the maximum number of locks you can acquire in a second.
If you can acquire more than 100 locks in a second then nothing should stop you from using the primitive you have mentioned.
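As a rough illustration of that benchmark (redis-py rather than Laravel, and a localhost Redis is assumed), count how many round trips you can make in one second:
# Sketch: how many INCR round trips (one per throttle check) fit in a second.
import time
import redis

r = redis.Redis(host="localhost", port=6379)

def locks_per_second(key="bench:lock"):
    acquired = 0
    deadline = time.monotonic() + 1.0
    while time.monotonic() < deadline:
        r.incr(key)      # one round trip, roughly the cost of a throttle check
        acquired += 1
    r.delete(key)
    return acquired

# If this prints well above 100, the throttle primitive is not your bottleneck.
print(locks_per_second())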
For testing, you can do a couple of things (a sketch of the dummy endpoint follows this list):
- Find the p99/p95/p90 response time of the said API.
- Add a dummy API/method to your system that simply takes p99/p95/p90 seconds; for simplicity, it can just sleep for the required interval.
- Hit your dummy method/API and do all sorts of counting to identify whether you're exceeding the throttle limit; log with millisecond precision and aggregate over seconds to spot any issues. Run this for an hour or so.
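A hedged sketch of that dummy endpoint (Flask, the port, and the 0.15 s sleep are all assumptions standing in for your real API's p95):
# Sketch: a dummy endpoint that sleeps for a p95-like latency and logs
# millisecond timestamps so hits can be aggregated per second afterwards.
import time
from flask import Flask

app = Flask(__name__)

@app.route("/dummy")
def dummy():
    print(f"hit at {int(time.time() * 1000)} ms")  # aggregate these per second
    time.sleep(0.15)  # stand-in for the real API's p95 response time
    return "ok"

if __name__ == "__main__":
    app.run(port=8000)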

DynamoDB Stream to Lambda slow/unusable

I've connected a Lambda to a DynamoDB table via a stream. When a record is written to the table, it triggers the Lambda. The traffic is very bursty, so nothing might happen for a while, then I'll write several thousand records.
What I'm seeing is that a few Lambda instances will be triggered, but not enough to handle the burst. Then at random times, the number of Lambda instances will jump by an order of magnitude or two (from 2 to 90 or more), and it will catch up. The problem is that the jump might not occur for 30 minutes or more.
The records are written to the table very quickly (seconds). Processing 20 records in the Lambda shouldn't take more than 2 minutes. It seems like the Lambdas are spending most of their time sitting around waiting for records to show up. The record key for the table is a GUID.
Things I've tried:
- Playing with the number of records to make sure there are no Lambda timeouts (20 seems conservative, but 100 causes timeouts)
- Moving the Lambda to a different subnet
- Batching the writes to the table (~500-1000 records per batch)
- Breaking up the writes in hopes it would trigger more Lambdas (~20-100 records per batch)
- Increasing the Lambda memory to the max (3 GB)
- Reducing memory to just above what's used (1 GB allocated, ~300 MB used)
Is there a better pattern to be using? Should I skip the stream and just write SNS messages? I don't care about order, but would prefer to not run the job more than once.
So here's what I found out.
It looks like the problem is contention on the DynamoDB stream by the Lambda instances.
My solution was to skip the DynamoDB stream entirely and post to an SNS topic instead. The Lambdas pick up the messages and scale much better. Processing times have gone from hours to seconds.
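A minimal sketch of that fan-out with boto3 (the topic ARN is a placeholder, and this assumes one message per record with no ordering requirement):
# Sketch: publish each record to SNS instead of relying on the stream.
import json
import boto3

sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:record-fanout"  # placeholder

def fan_out(records):
    for record in records:
        sns.publish(TopicArn=TOPIC_ARN, Message=json.dumps(record))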

Placing custom execution rate limits on AWS Step Functions

I have a Step Functions setup that is spawning a Lambda function. The Lambda functions are getting spawned too fast and causing other services to throttle, so I would like Step Functions to enforce a rate limit on the number of jobs it kicks off at a given time.
How do I best approach this?
I would try setting this limit on the Lambda function side using the concurrent execution limit: https://docs.aws.amazon.com/lambda/latest/dg/concurrent-executions.html. This way you can limit the maximum number of concurrent executions of that specific Lambda function, while leaving the unreserved concurrency pool for the rest of your functions.
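A sketch of setting that cap with boto3 (the function name and the limit of 10 are placeholders):
# Sketch: reserve concurrency so no more than 10 copies of this function run at once.
import boto3

lam = boto3.client("lambda")
lam.put_function_concurrency(
    FunctionName="my-spawned-function",   # placeholder name
    ReservedConcurrentExecutions=10,      # invocations beyond this are throttled
)
Note that the state machine's Task state will likely also need a Retry on Lambda.TooManyRequestsException so throttled invocations are retried instead of failing the execution.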

JMeter: number of threads and loop controller

I am running JMeter with the number of threads set to 10, 60, and 140 for multiple thread groups, and we are getting high response times.
If we change the recording controller to a loop controller and give the same values as the loop count, then we get much lower response times.
Why is there a difference between them? Which response time should we consider?
Threads are executed in parallel, while a loop executes its samplers sequentially.
Executing numerous calls in parallel on the same machine, rather than sequentially, puts more stress on the server (more hits per second).
When the server is under stress, waits/locks may appear because some maximum X is reached, where X can be a database, server, or other resource limit.
Therefore your response time will be higher when using threads rather than loops.
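The difference is easy to see outside JMeter too; a hedged sketch (the URL and the counts are placeholders):
# Sketch: the same 10 requests fired in parallel (like 10 threads) versus
# sequentially (like a loop controller). URL is a placeholder target.
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "http://localhost:8080/endpoint"  # placeholder

def hit(_):
    with urllib.request.urlopen(URL) as resp:
        resp.read()

start = time.monotonic()
with ThreadPoolExecutor(max_workers=10) as pool:  # 10 concurrent "threads"
    list(pool.map(hit, range(10)))
print("parallel:", time.monotonic() - start)

start = time.monotonic()
for i in range(10):                               # one "loop" of 10 iterations
    hit(i)
print("sequential:", time.monotonic() - start)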
Instead of these approaches, you should probably try to simulate real user behavior; see this answer for more details.

Google App Engine Task Queue

I want to run 50 tasks. All these tasks execute the same piece of code; the only difference will be the data. Which will complete faster?
a. Queuing up 50 tasks in one queue
b. Queuing up 5 tasks each in 10 different queues
Is there an ideal number of tasks that can be queued up in one queue before using another queue?
The rate at which tasks are executed depends on two factors: the number of instances your app is running on, and the execution rate of the queue the tasks are on.
The maximum task queue execution rate is now 100 per queue per second, so that's not likely to be a limiting factor, and there's no harm in adding them all to the same queue. In any case, sharding between queues for more execution rate is at best a hack. Queues are designed for functional separation, not as a performance measure.
The bursting rate of task queues is controlled by the bucket size. If there is a token in the queue's bucket, the task should run immediately. So if you have:
queue:
- name: big_queue
  rate: 50/s
  bucket_size: 50
and haven't queued any tasks in the last second, all tasks should start right away.
See http://code.google.com/appengine/docs/python/config/queue.html#Queue_Definitions for more information.
Splitting the tasks into different queues will not improve response time unless a bucket hasn't had enough time to completely refill with tokens.
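The behavior that rate and bucket_size describe is a plain token bucket; a minimal, purely illustrative sketch (App Engine implements this for you):
# Sketch: token bucket mirroring rate: 50/s, bucket_size: 50 from above.
import time

class TokenBucket:
    def __init__(self, rate=50.0, size=50):
        self.rate, self.size = rate, size
        self.tokens = size           # a full bucket allows an immediate burst
        self.last = time.monotonic()

    def try_run_task(self):
        now = time.monotonic()
        # Refill at `rate` tokens per second, capped at the bucket size.
        self.tokens = min(self.size, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True              # token available: the task starts now
        return False                 # bucket empty: the task waits for a refill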
I'd add another factor into the mix: concurrency. If you have slow-running tasks (more than 30 seconds or so), then App Engine seems to struggle to scale up the right number of instances to deal with the requests (it seems to max out at about 7-8 for me).
As of SDK 1.4.3, there's a setting in your queue.xml and your appengine-web.xml you can use to tell App Engine that each instance can handle more than one task at a time:
<threadsafe>true</threadsafe> (in appengine-web.xml)
<max-concurrent-requests>10</max-concurrent-requests> (in queue.xml)
This solved all my problems with tasks executing too slowly (despite setting all other queue params to the maximum).
More details: http://blog.crispyfriedsoftware.com
Queue up 50 tasks and set your queue to process 10 at a time, or whatever you like, if they can run independently of each other. I had a similar problem, and I just run 10 tasks at a time to process the 3,300 or so that I need to run. It takes 45 minutes or so to process all of them, but surprisingly the CPU time used is negligible.
