Why do I find so few examples of the KCL (Kinesis Client Library) being used with AWS Lambda?
It provides a good implementation for keeping track of your position in the stream (checkpointing).
I want to use the KCL as a consumer. My setup is a stream with multiple shards, and a Lambda function consumes each shard. I want to use the KCL inside the Lambda functions to track the position of the iterator on each shard.
Why can't I find anyone who uses the KCL with Lambda?
Since you can consume from Kinesis directly in your Lambda functions (using Kinesis as an event source), it doesn't make sense to use the KCL within a Lambda. The event source integration that AWS has built presumably uses something like the KCL to invoke Lambdas in response to Kinesis records.
It would be very odd to start a Lambda, initialize the KCL in the handler, and wait for records during the Lambda's runtime. A Lambda invocation can run for at most 15 minutes, after which you would have to do the same thing again. Doing this from an EC2 instance makes sense, but then you're reimplementing the Lambda-Kinesis integration yourself. Behind the scenes, that integration is exactly what Lambda provides.
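For comparison, this is roughly what consuming Kinesis through the Lambda event source mapping looks like: Lambda reads the shard and tracks the position, and the function only handles batches. It is a minimal sketch; `process` is a hypothetical placeholder for your business logic, and the returned `batchItemFailures` only has an effect if the event source mapping is configured with the `ReportBatchItemFailures` function response type.

```python
import base64
import json


def process(payload):
    # Hypothetical business logic: reject records without an "id" field.
    if "id" not in payload:
        raise ValueError("record has no id")


def handler(event, context):
    # The Kinesis event source mapping invokes this with a batch of records
    # from one shard and tracks the shard position itself (no KCL involved).
    for record in event["Records"]:
        seq = record["kinesis"]["sequenceNumber"]
        try:
            # Kinesis record data arrives base64-encoded.
            payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
            process(payload)
        except Exception:
            # With ReportBatchItemFailures enabled, Lambda retries from this
            # record; records before it in the batch count as processed.
            return {"batchItemFailures": [{"itemIdentifier": seq}]}
    # An empty list tells Lambda the whole batch succeeded.
    return {"batchItemFailures": []}
```

The checkpointing the question asks about is handled by the service: the function's only say in it is which sequence number to resume from after a failure.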

I do not work for AWS, so obviously I do not know the exact reason why there is no documentation, but here are my thoughts.
First of all, to run the KCL, you need a running JVM. This means you can only do this in a Java Lambda because (to my knowledge at this point) there is no way to pull other SDKs, runtimes, etc. into a Lambda; you choose one runtime at setup. So AWS would only be writing documentation for Java Lambdas.
Now for the more technical reason. You need to think about what a lambda is doing, and then what the KCL is doing.
Let's start with the Lambda. Lambdas are, by design, ephemeral. They can (and will) spin up and shut down continually throughout the day. Of course, you could set up a warming scheme so the Lambdas stay up, but they are still ephemeral, and this is completely out of your control. In other words, AWS controls when and whether a Lambda stays active, and the exact mechanism for this is not published. So you can only try to keep things warm.
What does the KCL do?
Connects to the stream
Enumerates the shards
Coordinates shard associations with other workers (if any)
Instantiates a record processor for every shard it manages
Pulls data records from the stream
Pushes the records to the corresponding record processor
Checkpoints processed records
Balances shard-worker associations when the worker instance count changes
Balances shard-worker associations when shards are split or merged
After reading through this list, let's go back to the ephemeral nature of Lambdas. Every single time a Lambda comes up or goes down, all of this work needs to happen again, including a complete rebalance between the shards and workers, pulling data records from the stream, setting checkpoints, etc. You would also need to make sure you never have more Lambdas running than the number of shards, because the extra ones would be worthless: in the best case they are never used, and in the worst case they register as workers and potentially cause lost messages (think about what would happen in this scenario during a rebalance).
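To see why this churn hurts, the sketch below models lease assignment as a naive round-robin of shards over workers. This is not the KCL's actual lease-balancing algorithm (which rebalances leases incrementally); `assign_shards`, `moved`, and `idle_workers` are hypothetical helpers that only illustrate two points from the answer: workers beyond the shard count sit idle, and every worker that appears or disappears forces shards to change owner.

```python
def assign_shards(shards, workers):
    # Naive round-robin: the i-th shard goes to worker i mod len(workers).
    assignment = {w: [] for w in workers}
    for i, shard in enumerate(sorted(shards)):
        assignment[workers[i % len(workers)]].append(shard)
    return assignment


def owners(assignment):
    # Invert {worker: [shards]} into {shard: worker}.
    return {s: w for w, shards in assignment.items() for s in shards}


def moved(before, after):
    # Shards whose owner changed: each one means a lease handoff and a
    # restart from the last checkpoint on the new worker.
    old, new = owners(before), owners(after)
    return sorted(s for s in old if old[s] != new.get(s))


def idle_workers(assignment):
    # Workers that hold no shard at all.
    return sorted(w for w, shards in assignment.items() if not shards)
```

With four shards, a fifth and sixth worker hold nothing, and when one of four workers disappears (for example, a Lambda being recycled), three of the four shards change owner in this naive model.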
OK, could you technically pull it off? If you used Java and did everything in your power to keep your Lambdas warm, it might be technically possible. But back to your question: why is there no documentation? I never want to say 'never', but generally speaking, Lambdas, with their ephemeral nature, are just not a good fit for the KCL. And if you don't go deep into the weeds on how the KCL works, you'll probably miss something, causing rebalancing issues and potentially lost messages.
If anything here is inaccurate, please let me know so I can update it. Thanks, and I hope this helps somebody.


Rate-Limiting / Throttling SQS Consumer in conjunction with Step-Functions

Given the following architecture:
The issue is that we hit throttling because of the maximum number of concurrent Lambda executions (by default 1,000 per account per Region).
How can this be addressed or circumvented?
We want to have full control of the rate-limiting.
1) Request a concurrency increase.
This would probably be the easiest solution, but it would increase the potential workload quite a lot. It neither resolves the root cause nor gives us any flexibility or room for custom rate limiting.
2) Rate Limiting API
This would only address one component, as the API is not the only trigger of the step functions. Besides, it would affect the clients, as they would receive 4xx responses.
3) Adding SQS in front of SFN
This will be one of our choices nevertheless, as it is always good to have a queue in front of such a volume of events. However, a simple queue in front does not provide rate limiting.
As SQS can't be configured to start SFN executions directly, a Lambda in between would be required to start them in code. Without any additional logic, this would not solve the concurrency issues.
4) FIFO-SQS in front of SFN
Something along the lines of what this blog post explains.
Summary: by assigning items to virtual message groups, we can define how many items are processed concurrently. While this solution works quite well for their use case, I am not convinced it would be a good approach for ours, because the SQS consumer is not an indicator of the workload; it only triggers the step functions.
Because the workload is uneven, this is not optimal: it would be better to distribute concurrency by actual workload rather than by chance.
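The virtual-group idea from option 4 can be sketched as follows. For FIFO queues, Lambda scales up to the number of active message groups, so hashing items into a fixed number of groups caps consumer concurrency at that number. `virtual_group_id`, `enqueue`, and `num_groups` are hypothetical names; the SQS call uses boto3's `send_message` parameters for FIFO queues, and the client is passed in so the logic can be exercised without AWS.

```python
import hashlib


def virtual_group_id(item_key: str, num_groups: int) -> str:
    # Stable hash of the item key into one of num_groups virtual groups.
    digest = hashlib.sha256(item_key.encode("utf-8")).digest()
    return f"group-{int.from_bytes(digest[:4], 'big') % num_groups}"


def enqueue(sqs_client, queue_url, item_key, body, num_groups=10):
    # FIFO queues require a MessageGroupId; the deduplication id here is a
    # content hash (64 hex chars, within SQS's 128-character limit).
    return sqs_client.send_message(
        QueueUrl=queue_url,
        MessageBody=body,
        MessageGroupId=virtual_group_id(item_key, num_groups),
        MessageDeduplicationId=hashlib.sha256(body.encode("utf-8")).hexdigest(),
    )
```

The hash spreads items over groups by key, not by how expensive each item's step-function execution will be, which is exactly the unevenness described above.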
5) Kinesis Data Stream
By using a Kinesis data stream with a predefined number of shards and batch size, we can implement rate limiting. However, this leaves us with exactly the same issues described in (3).
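As a rough sizing aid for option 5: with a Kinesis event source, Lambda runs one concurrent batch per shard by default, and the `ParallelizationFactor` setting (1 to 10) raises that per shard. The helpers below are hypothetical and only do the arithmetic. The cap applies to the consumer Lambda, not to the step-function executions it starts, which is why the issue from (3) remains.

```python
def max_consumer_concurrency(shard_count: int, parallelization_factor: int = 1) -> int:
    # One concurrent batch per shard, multiplied by the ParallelizationFactor.
    if shard_count < 1:
        raise ValueError("need at least one shard")
    if not 1 <= parallelization_factor <= 10:
        raise ValueError("ParallelizationFactor must be between 1 and 10")
    return shard_count * parallelization_factor


def max_records_per_second(shard_count, parallelization_factor, batch_size, seconds_per_batch):
    # Upper bound on consumer throughput if every batch is full and takes
    # seconds_per_batch to process. The step functions started from those
    # records are NOT bounded by this number.
    concurrency = max_consumer_concurrency(shard_count, parallelization_factor)
    return concurrency * batch_size / seconds_per_batch
```

For example, 4 shards with the default factor and full batches of 100 records that take 2 seconds each allow at most 200 records per second through the consumer.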
6) Provisioned Concurrency
Assuming we have an SQS queue in front of the SFN, the SQS consumer can be configured with a fixed provisioned concurrency. The value could be calculated from the account's maximum allowed concurrency in conjunction with the number of parallel tasks in the step functions. It looks like we could find a suitable value this way.
But once the quota is reached, SQS will still retry delivering messages, and once the maximum receive count is reached, the message ends up in the DLQ. This blog post explains it quite well.
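The calculation in option 6 might look like the sketch below. It assumes, as a simplification, that each in-flight consumer invocation corresponds to at most one running execution (for example, because the consumer waits for its execution to finish, so the consumer itself also occupies a slot) and that each execution runs at most `lambdas_per_execution` Lambda tasks at once. `reserved_elsewhere` stands for concurrency that other functions in the account need; all names are hypothetical.

```python
def consumer_concurrency_cap(account_limit: int, reserved_elsewhere: int,
                             lambdas_per_execution: int) -> int:
    if lambdas_per_execution < 1:
        raise ValueError("lambdas_per_execution must be at least 1")
    # Concurrency left for this workflow after other functions take their share.
    available = account_limit - reserved_elsewhere
    # Each consumer slot costs itself (+1) plus its execution's parallel tasks.
    cap = available // (lambdas_per_execution + 1)
    if cap < 1:
        raise ValueError("not enough concurrency left for even one execution")
    return cap
```

With the default 1,000 limit, 100 kept for other functions, and 5 parallel tasks per execution, this gives a cap of 150 consumer invocations. It does not change the retry and DLQ behavior described above once that cap is hit.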
7) EventSourceMapping toggle by CloudWatch metrics (sort of circuit breaker)
Assuming we have an SQS queue in front of SFN and a consumer Lambda.
We could create CloudWatch metrics and trigger the execution of a Lambda once a metric threshold is hit. That event Lambda could then temporarily disable the event source mapping between the SQS queue and the consumer Lambda. Once the system's workload eases, another event could be sent to enable the source mapping again.
However, I wasn't able to determine suitable metrics to react to before throttling kicks in. Additionally, CloudWatch metrics work in 1-minute periods, so the event might already arrive too late.
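The toggle in option 7 behaves like a circuit breaker with hysteresis: disable the event source mapping above a high-water mark and re-enable it only once load falls below a lower one, so it does not flap. This is a minimal sketch, assuming some measure of concurrent executions is fed to `observe` (Lambda's `ConcurrentExecutions` CloudWatch metric is one candidate, subject to the 1-minute delay noted above). `EventSourceBreaker`, `high`, and `low` are hypothetical; in AWS, `set_enabled` could wrap boto3's `update_event_source_mapping(UUID=..., Enabled=...)`.

```python
class EventSourceBreaker:
    """Disables an event source mapping under load and re-enables it later."""

    def __init__(self, set_enabled, high, low):
        if low >= high:
            raise ValueError("low must be below high to avoid flapping")
        # set_enabled(flag) performs the actual toggle, e.g. (hypothetical wiring):
        #   client = boto3.client("lambda")
        #   set_enabled = lambda flag: client.update_event_source_mapping(
        #       UUID=MAPPING_UUID, Enabled=flag)
        self.set_enabled = set_enabled
        self.high = high
        self.low = low
        self.enabled = True

    def observe(self, concurrent_executions):
        # Open the breaker (stop polling SQS) at or above the high-water mark.
        if self.enabled and concurrent_executions >= self.high:
            self.set_enabled(False)
            self.enabled = False
        # Close it again only once load has dropped to the low-water mark.
        elif not self.enabled and concurrent_executions <= self.low:
            self.set_enabled(True)
            self.enabled = True
        return self.enabled
```

Messages keep accumulating in the queue while the mapping is disabled, so the queue's retention period and visibility timeout should comfortably exceed the longest expected pause.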
8) ???
The question itself is a nice overview of all the major options. Well done.
You could implement throttling directly with API Gateway. This is the easiest option if you can afford to reject clients every once in a while.
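API Gateway's throttling is documented as a token-bucket scheme with a steady-state rate and a burst capacity, and requests beyond it are rejected with 429. The local model below only illustrates those semantics; it is not API Gateway code. `TokenBucket` is a hypothetical class, and the clock is injectable so the behavior can be tested deterministically.

```python
import time


class TokenBucket:
    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate            # tokens added per second (steady-state limit)
        self.capacity = burst       # maximum tokens, i.e. the burst size
        self.tokens = float(burst)  # start full
        self.clock = clock
        self.last = clock()

    def allow(self):
        # Refill in proportion to the time elapsed since the last call.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True   # request accepted
        return False      # request throttled (API Gateway would answer 429)
```

A burst of 3 lets three back-to-back requests through; after that, a rate of 2 per second admits one more request every half second.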
If you need stream and buffer control, go for Kinesis. You can even put all your events in an S3 bucket and trigger Lambdas or Step Functions when a new event has been stored. Yes, you will ingest events differently, and you will need a bridge Lambda function to start Step Functions executions based on Kinesis events. But this is relatively low implementation effort.

API waiting for a specific record on DynamoDB without polling

I am inheriting a workflow that has a reasonable amount of data stored in DynamoDB. The data is periodically refreshed by Lambdas that call third parties when needed. The Lambdas are triggered by both SQS and DynamoDB streams and go through four or five steps before the data is updated.
I've been given the task of writing an API that can forcibly update N items and return their status. The obvious way to do this, without reinventing the wheel and while honoring DRY, is to trigger an event that spawns a refresh for each item so that the Lambdas can do their thing.
The trouble is that I'm not sure of the best pub/sub approach for being notified when each workflow reaches its end state. Do I read from an update/insert stream of DynamoDB to see whether the records are updated? Do I create some sort of pub/sub model, like Redis or SNS, to listen for the end state of each triggered Lambda?
Since I'm writing a REST API, timeouts are fine if there are failures along the way. But at the same time, I want to make sure I can handle the following:
Be guaranteed that I can be notified that an update occurred for my targets after my call (in the case of multiple forced updates being called at once I only care about the first one to arrive).
Not be bogged down by listening for updates for record updates that are not contextually relevant to the API call in question.
Have an amortized time complexity of O(1)
In other words, in terms of the CAP theorem, I care about C and A but not P (because a 502 isn't that big a deal). But getting the timing wrong or missing a subscription is a problem.
I know I can just listen to a DynamoDB event stream, but I'm concerned that when things get noisy, irrelevant records will slow me down. And I'm not sure whether giving every single record its own topic is scalable (or how messy that would be).
You can use DynamoDB Streams in combination with Lambda event filtering so that the Lambda function only executes for the changes you are interested in. More information is available in the AWS Lambda documentation on event filtering.
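Here is a minimal sketch of that setup. It assumes the refresh workflow writes a hypothetical `status` attribute with the value `"REFRESHED"` on its final step, that the table's partition key is a string attribute named `id` (also hypothetical), and that the stream includes new images (`NEW_IMAGE` or `NEW_AND_OLD_IMAGES`). The pattern goes into the `FilterCriteria` argument of the event source mapping (for example via boto3's `create_event_source_mapping`), so irrelevant stream records never invoke the function.

```python
import json

# Only INSERT/MODIFY records whose new image has status "REFRESHED".
# Each Pattern must be passed as a JSON string inside FilterCriteria.
FILTER_CRITERIA = {
    "Filters": [
        {
            "Pattern": json.dumps({
                "eventName": ["INSERT", "MODIFY"],
                "dynamodb": {"NewImage": {"status": {"S": ["REFRESHED"]}}},
            })
        }
    ]
}


def handler(event, context):
    # Only records that matched the filter reach this function.
    refreshed_ids = []
    for record in event["Records"]:
        item_id = record["dynamodb"]["Keys"]["id"]["S"]  # hypothetical key name
        if item_id not in refreshed_ids:  # keep only the first update per item
            refreshed_ids.append(item_id)
    # Forward refreshed_ids to whatever the waiting API call listens on.
    return refreshed_ids
```

Filtering happens before invocation, which addresses the concern about noisy streams; how the handler hands results back to the waiting API request is a separate design choice.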

Turn recovery on after first message

I have a persistent actor that receives many messages. The first message is CREATE (a case class) and the following messages are UPDATEs (case classes). So if it receives CREATE, it should not go to persistence to run recovery, because the storage is empty for this actor. From my perspective, this wastes performance.
Is there any way to skip recovery for a particular input message (the first one, which is CREATE)?
A persistent actor will always have to hit the database, because there is no other way to know whether it has existed before: it could have been created in a previous instance of the application that was stopped, or on a different node in a cluster.
In general, a good pattern for performance is to keep the actor in memory after it has been hit the first time, as that allows the fastest possible responses. The most common way to do this is Cluster Sharding (which you can read more about in the Akka documentation).
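The pattern can be modelled without Akka as a registry that keeps each entity's state in memory after its first message. This is a language-neutral sketch, not Akka's API: `EntityRegistry` and the dict-based `journal` are hypothetical stand-ins for sharded persistent actors and the journal. The first message for every entity, including a CREATE, still reads the journal, which is the unavoidable cost described above; subsequent messages are served from memory.

```python
class EntityRegistry:
    """Keeps each entity's state in memory once it has been recovered."""

    def __init__(self, journal):
        self.journal = journal  # hypothetical store: entity id -> persisted events
        self.live = {}          # entity id -> in-memory state (list of events)
        self.recoveries = 0     # how many times the journal was read

    def handle(self, entity_id, event):
        if entity_id not in self.live:
            # First message for this entity in this process: recover from the
            # journal, even if that turns out to be empty (e.g. for a CREATE).
            self.recoveries += 1
            self.live[entity_id] = list(self.journal.get(entity_id, []))
        # Persist first, then apply to the in-memory state.
        self.journal.setdefault(entity_id, []).append(event)
        self.live[entity_id].append(event)
        return len(self.live[entity_id])
```

Each entity pays one journal read per process lifetime; keeping entities alive (and routed to one node) is what makes the later UPDATEs cheap.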
I have never heard of anyone treating the hit for an empty persistent actor as a performance problem, and I'm not sure it can be solved in a general way. If you do have such a problem and can somehow know that the actor was never created before, you cannot express that with Akka Persistence; you would have to build a special solution for it yourself.

What is the purpose of spring cloud stream instanceCount?

In Spring Cloud Stream, what exactly is the instanceCount property used for?
I mean, if that value becomes wrong because one or more microservice instances are down at some moment, how could this affect the behavior of our infrastructure?
instanceCount is used to partition data across different consumers. Having one or more services down should not really impact your producers, that's the job of the broker.
So let's say you have a source that sends data to 3 partitions: you'd have instanceCount=3, and each instance would have its own partition assigned via instanceIndex.
Each instance would be consuming data, but if instance 2 crashes, instances 0 and 1 would still be reading data from their partitions, and the source would still be sending data as usual.
Assuming your platform has some sort of recoverability in place, your crashed instance should come back to life and resume its operations.
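This is a simplified model of the arithmetic described above. The producer maps each key to one of the partitions, and each consumer instance takes the partitions whose number modulo `instanceCount` equals its `instanceIndex`. The real binders differ in detail (by default the partition selector hashes the key with Java's `hashCode`; `crc32` stands in for it here). `select_partition`, `partitions_for`, and `unconsumed_partitions` are hypothetical helpers.

```python
import zlib


def select_partition(key: str, partition_count: int) -> int:
    # Producer side: a stable hash of the key picks the partition
    # (crc32 stands in for the binder's default hashCode-based selector).
    return zlib.crc32(key.encode("utf-8")) % partition_count


def partitions_for(instance_index: int, instance_count: int, partition_count: int) -> list:
    # Consumer side: instance i takes every partition p with p % instanceCount == i.
    return [p for p in range(partition_count) if p % instance_count == instance_index]


def unconsumed_partitions(down_instances, instance_count, partition_count):
    # Partitions that still receive data from the source but have no live consumer.
    return sorted(p for i in down_instances
                  for p in partitions_for(i, instance_count, partition_count))
```

With instanceCount=3 and instance 2 down, partition 2 keeps receiving data from the source but is only read again once an instance with instanceIndex=2 comes back.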
What we still don't support is dynamic allocation of partitions at runtime; we are investigating this as a story for a future release.

Amazon Web Services: Spark Streaming or Lambda

I am looking for some high-level guidance on an architecture. I have a provider writing "transactions" to a Kinesis pipe (about 1MM/day). I need to pull those transactions off one at a time, validate the data, call other SOAP or REST services for additional information, apply some business logic, and write the results to S3.
One approach that has been proposed is to use a Spark job that runs forever, pulling data and processing it within the Spark environment. The benefits cited were shareable cached data, availability of SQL, and in-house knowledge of Spark.
My thought was to have a series of Lambda functions that would process the data. As I understand it, I can have a Lambda watching the Kinesis pipe for new data. I want to run the pulled data through a bunch of small steps (lambdas), each one doing a single step in the process. This seems like an ideal use of Step Functions. With regards to caches, if any are needed, I thought that Redis on ElastiCache could be used.
Can this be done using a combination of Lambda and Step Functions (using lambdas)? If it can be done, is it the best approach? What other alternatives should I consider?
This can be achieved using a combination of Lambda and Step Functions. As you described, the Lambda would monitor the stream and start a new execution of a state machine, passing the transaction data to it as input. The AWS Lambda documentation on using Lambda with Kinesis covers this in more detail.
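A minimal sketch of that bridge function follows. It assumes each Kinesis record carries one JSON-encoded transaction and that the state machine ARN comes from a hypothetical `STATE_MACHINE_ARN` environment variable. It uses boto3's Step Functions `start_execution`; the client is injected so the logic can be tested without AWS.

```python
import base64
import json
import os

# Hypothetical configuration: the target state machine's ARN.
STATE_MACHINE_ARN = os.environ.get("STATE_MACHINE_ARN", "")


def start_executions(event, sfn_client, state_machine_arn):
    """Start one state machine execution per Kinesis record (one transaction each)."""
    execution_arns = []
    for record in event["Records"]:
        # Kinesis data is base64-encoded; each record is assumed to hold one JSON transaction.
        transaction = json.loads(base64.b64decode(record["kinesis"]["data"]))
        response = sfn_client.start_execution(
            stateMachineArn=state_machine_arn,
            input=json.dumps(transaction),
        )
        execution_arns.append(response["executionArn"])
    return execution_arns


def handler(event, context):
    import boto3  # bundled with the AWS Lambda Python runtime

    return start_executions(event, boto3.client("stepfunctions"), STATE_MACHINE_ARN)
```

Without an explicit `name`, Step Functions generates one, so a retried batch can start duplicate executions; passing a deterministic execution name derived from the record is one way to guard against that.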
The state machine would then pass the data from one Lambda function to the next, where it is processed and finally written to S3. Since 1MM/day averages about 11.6 executions per second (1,000,000 / 86,400 s), you would need to contact AWS to raise the default StartExecution API limit of 2 per second.
Hope this helps!
