I am using Kinesis to process events in a micro-service architecture. Events are partitioned at a client-project level to ensure all events related to the same project are processed in the correct sequence. Currently, if there is an error processing one of the events, events from other partitions can also become blocked. I had hoped that by increasing the parallelisation factor and bisecting the batch on error, the other Lambda processors would be able to continue processing events from other partitions. This is largely the case, but there are still times when multiple partitions become stuck, presumably because Kinesis sometimes decides to always allocate several partitions to the same Lambda processor.
My question is: is there any way to avoid this in Kinesis, or will I need to start making use of a dead-letter queue and removing events that are repeatedly failing? The downside to this is that I don't really want to continue processing further events for the same partition once there is a failure, as the state of the micro-service is likely to be corrupt at that point, and I would rather our team manually address whatever issue has occurred before continuing to replay events from the failed partition.
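For reference, the knobs mentioned above (parallelisation factor, bisecting the batch on error) and the retry/on-failure settings all live on the Lambda event source mapping. A rough sketch of setting them with the AWS SDK for Go v2, assuming the SDK field names mirror the Lambda API; the mapping UUID and queue ARN are placeholders:

    package main

    import (
        "context"
        "log"

        "github.com/aws/aws-sdk-go-v2/aws"
        "github.com/aws/aws-sdk-go-v2/config"
        "github.com/aws/aws-sdk-go-v2/service/lambda"
        "github.com/aws/aws-sdk-go-v2/service/lambda/types"
    )

    func main() {
        ctx := context.Background()
        cfg, err := config.LoadDefaultConfig(ctx)
        if err != nil {
            log.Fatal(err)
        }
        client := lambda.NewFromConfig(cfg)

        // Placeholder mapping UUID and SQS ARN; substitute your own values.
        _, err = client.UpdateEventSourceMapping(ctx, &lambda.UpdateEventSourceMappingInput{
            UUID:                       aws.String("<event-source-mapping-uuid>"),
            ParallelizationFactor:      aws.Int32(10),  // up to 10 concurrent batches per shard
            BisectBatchOnFunctionError: aws.Bool(true), // split the batch on error to isolate the bad record
            MaximumRetryAttempts:       aws.Int32(5),   // stop retrying after this many attempts...
            DestinationConfig: &types.DestinationConfig{ // ...and report the failure here
                OnFailure: &types.OnFailureDestinationConfig{
                    Destination: aws.String("arn:aws:sqs:eu-west-1:123456789012:failed-events-dlq"),
                },
            },
        })
        if err != nil {
            log.Fatal(err)
        }
    }

Note that the on-failure destination receives metadata about the failed batch rather than the records themselves, and once retries are exhausted the shard moves on to later records, which is exactly the trade-off described above.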
I have multiple producers that stage objects (jobs) for processing, and a single consumer that takes objects one by one. I need to design a sort of scheduler for this in Go.
Scheduling is asynchronous, i.e. each producer works in a separate goroutine.
The scheduler interface should be "good" in the sense of idiomatic Go (I'm new to Go).
A producer can remove or replace its staged object (if not yet consumed) with zero or minimal loss of its position in the queue. If a producer misses its slot because it cancelled and then re-staged an object, it still keeps the privilege to be scheduled as early as possible before the end of the current round.
"Fair" scheduling between producers.
Customizable multi-level weighting/prioritization
I'd like some hints and examples on the right design of such a scheduler.
I feel I need every producer to wait for a token in a channel, then write (or not write) an object to a shared consumer channel, then release the token so it is routed to the next producer. Still, I'm not sure this is the best approach. Besides, it takes three sequential synchronous operations per producer, so I'm afraid I'll hit performance pitfalls because the token travels too slowly between producers. Also, three steps for one operation is probably not very idiomatic Go.
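For what it's worth, here is a minimal sketch (hypothetical types and names, not a full solution) of one way to shape this in Go. Instead of a token travelling between producers, each producer owns a single staging slot it can replace or clear at any time, and one scheduler goroutine visits the slots round-robin and forwards whatever is staged to the consumer channel; weighting could be layered on by visiting a slot more than once per round.

    package main

    import (
        "fmt"
        "sync"
    )

    // Job is whatever the producers stage; a stand-in type for illustration.
    type Job struct {
        Producer int
        Payload  string
    }

    // slot holds at most one staged job per producer; the producer may
    // replace or remove it at any time before the scheduler picks it up.
    type slot struct {
        mu  sync.Mutex
        job *Job
    }

    func (s *slot) Stage(j *Job) { s.mu.Lock(); s.job = j; s.mu.Unlock() }
    func (s *slot) Unstage()     { s.mu.Lock(); s.job = nil; s.mu.Unlock() }

    func (s *slot) take() *Job {
        s.mu.Lock()
        defer s.mu.Unlock()
        j := s.job
        s.job = nil
        return j
    }

    // schedule visits the slots round-robin and forwards staged jobs to out.
    func schedule(slots []*slot, out chan<- *Job, rounds int) {
        for r := 0; r < rounds; r++ {
            for _, s := range slots {
                if j := s.take(); j != nil {
                    out <- j
                }
            }
        }
        close(out)
    }

    func main() {
        slots := []*slot{{}, {}, {}}
        out := make(chan *Job)

        // Each producer stages a job; producer 1 replaces its staged job and
        // producer 2 removes its job before the scheduler reaches them.
        for i, s := range slots {
            s.Stage(&Job{Producer: i, Payload: "v1"})
        }
        slots[1].Stage(&Job{Producer: 1, Payload: "v2"}) // replace, keeps its place in the round
        slots[2].Unstage()                               // remove before consumption

        go schedule(slots, out, 1)

        // Single consumer takes jobs one by one.
        for j := range out {
            fmt.Printf("consumed producer=%d payload=%s\n", j.Producer, j.Payload)
        }
    }

The producer side is then a single lock-protected write per stage/replace/remove, and the consumer sees a plain channel, which avoids the three-step token handshake you were worried about.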
We have a Spark Streaming micro-batch process which consumes data from a Kafka topic with 20 partitions. The data in the partitions is independent and can be processed independently. The current problem is that the micro-batch waits for processing to complete in all 20 partitions before starting the next micro-batch. So if one partition completes processing in 10 seconds and another partition takes 2 minutes, the first partition has to wait 110 seconds before consuming its next offsets.
I am looking for a streaming solution where we can process the 20 partitions independently, without having to wait for the other partitions to complete. The streaming solution should consume data from each partition and progress offsets at its own rate, independent of the other partitions.
Does anyone have a suggestion on which streaming architecture would allow me to achieve this?
Any of Flink (AFAIK), KStreams, and Akka Streams will be able to progress through the partitions independently: none of them does Spark-style batching unless you explicitly opt in.
Flink is similar to Spark in that it has a job server model; KStreams and Akka are both libraries that you just integrate into your project and deploy like any other JVM application (e.g. you can build a container and run on a scheduler like kubernetes). I personally prefer the latter approach: it generally means less infrastructure to worry about and less of an impedance mismatch to integrate with observability tooling used elsewhere.
Flink is an especially good choice when it comes to time-window based processing and joins.
KStreams fundamentally models everything as a transformation from one kafka topic to another: the topic topology is managed by KStreams, but there can be some gotchas there (especially if you're dealing with anything time-seriesy).
Akka is the most general and (in some senses) the least opinionated of the toolkits: you will have to make more decisions with less handholding (I'm saying this as someone who could probably fairly be called an Akka cheerleader); as a pure stream processing library, it may not be the ideal choice (though in terms of resource consumption, being able to more explicitly manage backpressure (basically, what happens when data comes in faster than it can be processed) may make it more efficient than the alternatives). I'd probably tend to only choose it if you were going to also take advantage of cluster sharded (and almost certainly event-sourced) actors: the benefit of doing that is that you can completely decouple your processing parallelism from the number of input Kafka partitions (e.g. you may be able to deploy 40 instances of processing and have each working on half of the data from Kafka).
We are working on parallelising our Kafka consumer to process more records and handle the peak load. One thing we are already doing is spinning up as many consumers as there are partitions, within the same consumer group.
Our consumer makes an API call, which is synchronous as of now. We felt that making this API call asynchronous would let our consumer handle more load, so we are trying to make the API call asynchronous and advance the offset in its response callback. However, we are seeing an issue with this:
By making the API call asynchronous, we may get the response for the last record first, while none of the earlier records' API calls have started or completed by then. If we commit the offset as soon as we receive the response for the last record, the committed offset moves to that last record. If the consumer restarts or a partition rebalance happens in the meantime, we will never again receive any record before the last committed offset, so we will miss the unprocessed records.
As of now we already have 25 partitions. We would like to understand whether anyone has achieved parallelism without increasing the number of partitions, or whether increasing the partitions is the only way (so as to avoid these offset issues).
First, you need to decouple (if only at first) the reading of the messages from the processing of those messages. Next, look at how many concurrent calls you can make to your API, as it doesn't make sense to call it more frequently than the server can handle, asynchronously or not. If the number of concurrent API calls is roughly equal to the number of partitions you have in your topic, then it doesn't make sense to call the API asynchronously.
If the number of partitions is significantly less than the max number of possible concurrent API calls, then you have a few choices. You could try to make the max number of concurrent API calls with fewer threads (one per consumer) by calling the APIs asynchronously as you suggest, or you can create more threads and make your calls synchronously. Of course, then you get into the problem of how your consumers can hand their work off to a greater number of shared threads, but that's exactly what streaming execution platforms like Flink or Storm do for you. Streaming platforms (like Flink) that offer checkpoint processing can also handle your problem of how to commit offsets when messages are processed out of order. You could roll your own checkpoint processing and roll your own shared thread management, but you'd have to really want to avoid using a streaming execution platform.
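As a rough illustration of that hand-off (stand-in types only, not tied to any particular Kafka client library): a single reader feeds a channel and a fixed pool of workers makes the API calls, so the number of in-flight calls is capped by the pool size rather than by the partition count.

    package main

    import (
        "fmt"
        "sync"
    )

    // Record is a stand-in for a consumed Kafka record.
    type Record struct {
        Partition int
        Offset    int64
        Value     string
    }

    // callAPI is a stand-in for the downstream API call.
    func callAPI(r Record) error {
        fmt.Printf("processed partition=%d offset=%d\n", r.Partition, r.Offset)
        return nil
    }

    func main() {
        const workers = 8 // cap on concurrent API calls, independent of partition count

        records := make(chan Record)
        var wg sync.WaitGroup

        // Fixed pool of workers making the API calls.
        for i := 0; i < workers; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for r := range records {
                    if err := callAPI(r); err != nil {
                        // Real code would retry, dead-letter, or record the failure
                        // so the offset is not committed past this record.
                        fmt.Println("call failed:", err)
                    }
                }
            }()
        }

        // Single reader: in real code this would be the consumer's poll loop.
        for i := 0; i < 100; i++ {
            records <- Record{Partition: i % 25, Offset: int64(i), Value: "payload"}
        }
        close(records)
        wg.Wait()
    }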
Finally, you might have more consumers than max possible concurrent API calls, but then I'd suggest that you just have fewer consumers and share partitions, not API calling threads.
And, of course, you can always change the number of your topic partitions to make your preferred option above more feasible.
Either way, to answer your specific question, you want to look at how Flink does checkpoint processing with Kafka offset commits. To oversimplify (because I don't think you want to roll your own), the Kafka consumers have to remember not only the offsets they just committed, but also the previously committed offsets, and that defines a block of messages flowing through your application. Either that block of messages is processed all the way through in its entirety, or you need to roll back the processing state of each thread to the point where the last message in the previous block was processed. Again, that's a major oversimplification, but that's kinda how it's done.
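If you did decide to roll your own, the usual bookkeeping is to commit only the low-water mark per partition: record completions as they arrive out of order, and only advance the committable offset across a contiguous prefix. A toy sketch of just that bookkeeping (hypothetical types, no real consumer attached):

    package main

    import "fmt"

    // offsetTracker remembers out-of-order completions for one partition and
    // only advances the committable offset over a contiguous prefix.
    type offsetTracker struct {
        nextToCommit int64          // first offset that has NOT yet been fully processed
        done         map[int64]bool // completed offsets at or beyond nextToCommit
    }

    func newOffsetTracker(start int64) *offsetTracker {
        return &offsetTracker{nextToCommit: start, done: map[int64]bool{}}
    }

    // complete marks an offset as processed and returns the offset that is now
    // safe to commit (i.e. everything below it has been processed).
    func (t *offsetTracker) complete(offset int64) int64 {
        t.done[offset] = true
        for t.done[t.nextToCommit] {
            delete(t.done, t.nextToCommit)
            t.nextToCommit++
        }
        return t.nextToCommit
    }

    func main() {
        t := newOffsetTracker(0)
        fmt.Println(t.complete(2)) // 0: offsets 0 and 1 still in flight
        fmt.Println(t.complete(0)) // 1: offset 1 still in flight
        fmt.Println(t.complete(1)) // 3: 0..2 done, safe to commit up to 3
    }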
You should look at Kafka batch processing. In a nutshell: you can set up a huge batch.size with a small number of partitions (or even a single one). Once a whole batch of messages has been consumed on the consumer side (i.e. is in RAM), you can parallelise processing of those messages any way you want.
I would really like to share links, but there are far too many of them around the web.
UPDATE
In terms of committing offsets, you can do this for the whole batch.
In general, Kafka doesn't reach its target performance by abusing the number of partitions, but rather by relying on batch processing.
I have already seen a lot of projects suffering from partition scaling (you may see issues later, during rebalancing for example). The rule of thumb: look at every available batch setting first.
When the StreamListener takes a long time (longer than max.poll.interval.ms) to process a message, that particular consumer is occupied, and new messages will be assigned to other partitions. Once the time exceeds max.poll.interval.ms, a rebalance happens and the same thing occurs on another consumer. So this message circulates around all the partitions and keeps hogging resources.
However, this situation does not happen very often; only a few messages somehow take such a long time to process, and it's not something we can control.
Can we commit the offset and send the message to a DLQ after a few rounds of rebalancing? If yes, how can we do that? If no, what's the proper way to handle this kind of situation?
Increasing max.poll.interval.ms will have no impact on performance (except it will take longer to detect a consumer that is really dead).
Going through a rebalance each time you process this "bad" record is much more damaging to performance.
You can, however, do what you want with a custom SeekToCurrentErrorHandler together with a recoverer such as the DeadLetterPublishingRecoverer. You would also need a rebalance listener to count the rebalances and some mechanism to share the state from the error handler across instances (the standard one only keeps state in memory).
Quite complicated, I think.
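The underlying idea is not Spring-specific, though: keep a per-record failure count somewhere that survives rebalances, and once a record has failed N times, publish it to a dead-letter topic and commit past it instead of letting it trigger yet another rebalance. A very rough sketch of that logic in Go with stand-in types (this is not the Spring Kafka mechanism; the process, toDLQ and commit hooks and the durable store for the counts are all assumptions):

    package main

    import (
        "errors"
        "fmt"
    )

    // Message is a stand-in for a consumed record.
    type Message struct {
        Topic     string
        Partition int
        Offset    int64
        Value     string
    }

    type handler struct {
        maxAttempts int
        // attempts would need to be shared/durable (e.g. a small DB or a
        // compacted topic) to survive rebalances; an in-memory map is only
        // illustrative.
        attempts map[string]int
        process  func(Message) error // the slow/failing business logic
        toDLQ    func(Message) error // publish to the dead-letter topic
        commit   func(Message) error // commit the offset past this record
    }

    func key(m Message) string {
        return fmt.Sprintf("%s/%d/%d", m.Topic, m.Partition, m.Offset)
    }

    // handle retries a record up to maxAttempts times; after that it is
    // dead-lettered and its offset committed so it stops hogging the consumer.
    func (h *handler) handle(m Message) error {
        if err := h.process(m); err != nil {
            k := key(m)
            h.attempts[k]++
            if h.attempts[k] >= h.maxAttempts {
                if dlqErr := h.toDLQ(m); dlqErr != nil {
                    return dlqErr // keep the offset uncommitted if we can't dead-letter
                }
                delete(h.attempts, k)
                return h.commit(m)
            }
            return err // not committed: the record will be redelivered and retried
        }
        return h.commit(m)
    }

    func main() {
        h := &handler{
            maxAttempts: 3,
            attempts:    map[string]int{},
            process:     func(Message) error { return errors.New("times out") },
            toDLQ:       func(m Message) error { fmt.Println("dead-lettered", key(m)); return nil },
            commit:      func(m Message) error { fmt.Println("committed", key(m)); return nil },
        }
        m := Message{Topic: "orders", Partition: 0, Offset: 42, Value: "poison"}
        for i := 0; i < 3; i++ {
            _ = h.handle(m) // third attempt dead-letters the record and commits
        }
    }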
After doing lots of reading and building a POC we are still unsure as to whether Storm Trident or Spark Streaming can handle our use case:
We have an inbound stream of sensor data for millions of devices (which have unique identifiers).
We need to perform aggregation of this stream on a per device level. The aggregation will read data that has already been processed (and persisted) in previous batches.
Key point: When we process data for a particular device we need to ensure that no other processes are processing data for that particular device. This is because the outcome of our processing will affect the downstream processing for that device. Effectively we need a distributed lock.
In addition, the device's event data needs to be processed in the order in which the events occurred.
Essentially we can’t have two batches for the same device being processed at the same time.
Can Trident/Spark Streaming handle our use case?
Any advice appreciated.
Since you have unique ids, can you divide them up? Simply divide the id by 10, for example, and depending on the remainder, send the events to different processing boxes. This should also take care of making sure each device's events are processed in order, as they will all be sent to the same box. I believe Storm/Trident allows you to guarantee in-order processing. Not sure about Spark, but I would be surprised if it didn't.
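A sketch of that routing idea (stand-in types; the same scheme works across boxes if the hash picks a host instead of a local channel): hashing the device id onto a fixed worker means only one worker ever touches a given device, which gives you in-order processing and mutual exclusion without an explicit distributed lock.

    package main

    import (
        "fmt"
        "hash/fnv"
        "sync"
    )

    // Event is a stand-in for one sensor reading.
    type Event struct {
        DeviceID string
        Value    float64
    }

    // route picks a worker for a device; all events for a device land on the
    // same worker, so they are processed in order and never concurrently.
    func route(deviceID string, workers int) int {
        h := fnv.New32a()
        h.Write([]byte(deviceID))
        return int(h.Sum32()) % workers
    }

    func main() {
        const workers = 10
        queues := make([]chan Event, workers)
        var wg sync.WaitGroup

        for i := range queues {
            queues[i] = make(chan Event, 16)
            wg.Add(1)
            go func(i int, in <-chan Event) {
                defer wg.Done()
                for e := range in {
                    // Per-device aggregation happens here, one event at a time.
                    fmt.Printf("worker %d: device %s value %.1f\n", i, e.DeviceID, e.Value)
                }
            }(i, queues[i])
        }

        events := []Event{
            {"device-a", 1.0}, {"device-b", 2.0}, {"device-a", 3.0},
        }
        for _, e := range events {
            queues[route(e.DeviceID, workers)] <- e
        }
        for _, q := range queues {
            close(q)
        }
        wg.Wait()
    }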
Pretty awesome problem to solve, I have to say though.