Dynamically adapt the number of consumer threads to the number of Kafka partitions - Spring

I have a Kafka topic with 50 partitions.
My Spring Boot application uses Spring Kafka to read those messages with a @KafkaListener.
The number of instances of my application autoscales in Kubernetes.
By default, it seems that Spring Kafka launches 1 consumer thread per topic.
org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1
So, with a single instance of the application, one thread is reading all 50 partitions.
With 2 instances, the load is balanced and each instance listens to 25 partitions, still with 1 thread per instance.
I know I can set the number of threads using the concurrency parameter on @KafkaListener.
But this is a fixed value.
Is there any way to tell Spring to dynamically adapt the number of consumer threads to the number of partitions the client is currently listening to?

I think there might be a better way of approaching this.
You should figure out how many records / partitions in parallel one instance of your application can handle optimally, through load / performance tests.
Let's say one instance can handle 10 threads / records in parallel optimally. Now if you scale out your app to 50 instances, in your approach, each instance will get one partition, and each instance will be performing below its capacity, wasting resources.
Now consider the opposite - only one instance is left, and it spawns 50 threads to consume from all partitions in parallel. The app's performance will be severely degraded; it might become unresponsive or even crash.
So, in this hypothetical scenario, what you might want to do is, for example, start with one or two instances handling all partitions with 10 threads each, and have it scale up to 5 instances if there's consumer lag, so that each partition has a dedicated thread processing it.
Again, the actual figures should be determined through load / performance testing.
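For reference, the fixed concurrency the question mentions is set on the listener itself. A minimal sketch (the topic and group names are placeholders, and 10 is the hypothetical per-instance capacity from above):

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class WorkListener {

    // Starts 10 consumer threads in this instance; Kafka assigns each thread a
    // share of the topic's partitions. The value is fixed at startup, so scaling
    // is usually done by adding or removing application instances (e.g. on lag).
    @KafkaListener(topics = "work-topic", groupId = "work-group", concurrency = "10")
    public void listen(String message) {
        // process the record
    }
}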

Related

Looking for a real time streaming solution

We have a Spark Streaming micro-batch process which consumes data from a Kafka topic with 20 partitions. The data in the partitions are independent and can be processed independently. The current problem is that the micro-batch waits for processing to complete in all 20 partitions before starting the next micro-batch. So if one partition completes processing in 10 seconds and another partition takes 2 minutes, the first partition will have to wait 110 seconds before consuming the next offset.
I am looking for a streaming solution where we can process the 20 partitions independently, without having to wait for the other partitions to complete processing. The streaming solution should consume data from each partition and progress its offsets at its own rate, independently of the other partitions.
Does anyone have a suggestion on which streaming architecture would allow me to achieve this goal?
Any of Flink (AFAIK), KStreams, and Akka Streams will be able to progress through the partitions independently: none of them does Spark-style batching unless you explicitly opt in.
Flink is similar to Spark in that it has a job-server model; KStreams and Akka are both libraries that you just integrate into your project and deploy like any other JVM application (e.g. you can build a container and run on a scheduler like Kubernetes). I personally prefer the latter approach: it generally means less infrastructure to worry about and less of an impedance mismatch when integrating with observability tooling used elsewhere.
Flink is an especially good choice when it comes to time-window based processing and joins.
KStreams fundamentally models everything as a transformation from one Kafka topic to another: the topic topology is managed by KStreams, but there can be some gotchas there (especially if you're dealing with anything time-seriesy).
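To make that topic-to-topic model concrete, here is a minimal KStreams sketch (the topic names and the uppercase transformation are made up for illustration); each input partition becomes its own task, so partitions progress and commit independently:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

import java.util.Properties;

public class UppercaseApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // One task per input partition; a slow partition does not block the others.
        builder.stream("input-topic")
               .mapValues(value -> value.toString().toUpperCase())
               .to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
    }
}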
Akka is the most general and (in some senses) the least opinionated of the toolkits: you will have to make more decisions with less handholding (I'm saying this as someone who could probably fairly be called an Akka cheerleader). As a pure stream processing library, it may not be the ideal choice, though being able to manage backpressure more explicitly (basically, what happens when data comes in faster than it can be processed) may make it more efficient than the alternatives in terms of resource consumption. I'd probably only choose it if you were also going to take advantage of cluster-sharded (and almost certainly event-sourced) actors: the benefit of doing that is that you can completely decouple your processing parallelism from the number of input Kafka partitions (e.g. you may be able to deploy 40 instances of processing and have each work on half of the data from Kafka).

Kafka-stream Threading model

Is it safe to say that, all in all, in Kafka Streams tasks represent subscriptions to partitions, while threads represent consumers?
That is, if there are 8 partitions there will always be 8 tasks. However, the number of consumers is determined by the number of threads available, and those are spread across application instances. So one application instance may represent 2 consumers, provided that it has 2 threads associated with it.
For full parallelism, with a topic with 8 partitions we could have 2 application instances with 4 threads each, or one application instance with 8 threads, and so on.
Yes, the number of tasks will be equal to the maximum number of partitions in any Kafka Streams app.
Say there are two topics "A" and "B", each having 8 partitions. The number of tasks will be max(8, 8) = 8. Now each thread acts as a consumer. If you set the number of threads to 2, those 2 threads will distribute the tasks between each other, and each thread will get 4 tasks to process.
For full parallelism, with a topic with 8 partitions we could have 2 application instances with 4 threads each, or one application instance with 8 threads, and so on.
You should always set the total number of threads to the maximum number of partitions in order to achieve full parallelism. You can spread them across several application instances or run them all in one.
Here is a nicely explained threading model for Kafka Streams:
https://docs.confluent.io/current/streams/architecture.html#parallelism-model
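As a concrete illustration, the per-instance thread count is set via num.stream.threads. A minimal sketch, assuming the 8-partition topic above and 2 application instances (the application id and broker address are placeholders):

import org.apache.kafka.streams.StreamsConfig;

import java.util.Properties;

public class StreamsThreadConfig {
    public static Properties streamsProps() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // 8 partitions -> 8 tasks. With 2 instances running 4 threads each
        // (or 1 instance running 8 threads), every task gets its own thread.
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 4);
        return props;
    }
}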

Stream thread calculation

I'm using the Streams DSL. I have three source topics with 17, 100, and 40 partitions.
I will be running three instances and 2 standby instances.
How can I calculate how many stream threads I will need so that each thread gets exactly one task or highest parallelism is achieved?
This depends on the structure of your application. You can run the application with a single thread and observe the number of created tasks. The number of tasks is the maximum number of threads you can use.
The tasks that are created are logged, or you can obtain them via KafkaStreams#localThreadMetadata().
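A rough sketch of inspecting that at runtime (assuming streams is your running KafkaStreams instance; the exact package and method names have shifted slightly across Kafka versions):

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.processor.ThreadMetadata;

public class TaskInspector {
    // Prints the active and standby tasks assigned to each stream thread.
    static void printAssignments(KafkaStreams streams) {
        for (ThreadMetadata thread : streams.localThreadMetadata()) {
            System.out.println(thread.threadName()
                    + " active tasks: " + thread.activeTasks()
                    + " standby tasks: " + thread.standbyTasks());
        }
    }
}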
I will try to outline an approach here in short.
You are asking for maximum parallelism. This can be achieved by separating out each topic into its own topology, each topology having its own thread count (one thread per consumer per topic): 17/3, 100/3, 40/3, i.e. topic partitions divided by the number of instances. This makes sure that each topology gets a separate thread count and separate parallelism, and each topology will act as a separate consumer group. A sketch of this approach follows.
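A minimal sketch of that layout, assuming three hypothetical topic names and the per-instance thread counts above (rounded up); a separate application.id per topology means a separate consumer group per topic:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

import java.util.Properties;

public class PerTopicTopologies {

    // One KafkaStreams instance (topology) per topic, each with its own
    // application.id (consumer group) and its own thread count.
    static KafkaStreams buildFor(String topic, int threads) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "app-" + topic);
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, threads);
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream(topic).foreach((key, value) -> {
            // process the record
        });
        return new KafkaStreams(builder.build(), props);
    }

    public static void main(String[] args) {
        // Per instance, with 3 app instances: 17/3 -> 6, 100/3 -> 34, 40/3 -> 14 threads
        buildFor("topic-17", 6).start();
        buildFor("topic-100", 34).start();
        buildFor("topic-40", 14).start();
    }
}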

Kafka work queue with a dynamic number of parallel consumers

I want to use Kafka to "divide the work". I want to publish instances of work to a topic, and run a cloud of identical consumers to process them. As each consumer finishes its work, it will pluck the next work item from the topic. Each work item should only be processed once, by one consumer. Processing work is expensive, so I will need many consumers running on many machines to keep up. I want the number of consumers to grow and shrink as needed (I plan to use Kubernetes for this).
I found a pattern where a unique partition is created for each consumer. This "divides the work", but the number of partitions is set when the topic is created. Furthermore, the topic must be created on the command line e.g.
bin/kafka-topics.sh --zookeeper localhost:2181 --partitions 3 --topic divide-topic --create --replication-factor 1
...
for n in range(0, 3):
    consumer = KafkaConsumer(
        bootstrap_servers=['localhost:9092'])
    partition = TopicPartition('divide-topic', n)
    consumer.assign([partition])
...
I could create a unique topic for each consumer, and write my own code to assign work to those topics. That seems gross, and I still have to create topics via the command line.
A work queue with a dynamic number of parallel consumers is a common architecture. I can't be the first to need this. What is the right way to do it with Kafka?
The pattern you found is accurate. Note that topics can also be created using the Kafka Admin API and partitions can also be added once a topic has been created (with some gotchas).
In Kafka, the way to divide work and allow scaling is to use partitions. This is because in a consumer group, each partition is consumed by a single consumer at any time.
For example, you can have a topic with 50 partitions and a consumer group subscribed to this topic:
When the throughput is low, you can have only a few consumers in the group and they should be able to handle the traffic.
When the throughput increases, you can add consumers, up to the number of partitions (50 in this example), to pick up some of the work.
In this scenario, 50 consumers is the limit in terms of scaling. Consumers expose a number of metrics (like lag), allowing you to decide if you have enough of them at any time.
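As an aside, creating the topic via the Admin API mentioned above might look roughly like this (a sketch; the broker address, topic name, and counts are placeholders):

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 50 partitions, replication factor 1 (match your cluster's setup)
            NewTopic topic = new NewTopic("divide-topic", 50, (short) 1);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}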
Thank you Mickael for pointing me in the correct direction.
https://www.safaribooksonline.com/library/view/kafka-the-definitive/9781491936153/ch04.html
Kafka consumers are typically part of a consumer group. When multiple consumers are subscribed to a topic and belong to the same consumer group, each consumer in the group will receive messages from a different subset of the partitions in the topic.
https://dzone.com/articles/dont-use-apache-kafka-consumer-groups-the-wrong-wa,
Having consumers as part of the same consumer group means providing the “competing consumers” pattern with whom the messages from topic partitions are spread across the members of the group. Each consumer receives messages from one or more partitions (“automatically” assigned to it) and the same messages won’t be received by the other consumers (assigned to different partitions). In this way, we can scale the number of the consumers up to the number of the partitions (having one consumer reading only one partition); in this case, a new consumer joining the group will be in an idle state without being assigned to any partition.
Example code for dividing the work among 3 consumers, up to a maximum of 100:
bin/kafka-topics.sh --partitions 100 --topic divide-topic --create --replication-factor 1 --zookeeper localhost:2181
...
for n in range(0, 3):
    consumer = KafkaConsumer(group_id='some-constant-group',
                             bootstrap_servers=['localhost:9092'])
...
I think you are on the right path.
Here are the steps involved:
Create the Kafka topic with the required number of partitions. The number of partitions is the unit of parallelism; in other words, you run up to that many consumers to process the work.
You can increase the partitions if the scaling requirements grow, but that comes with caveats like repartitioning. Please read the Kafka documentation about adding partitions to an existing topic.
Define a Kafka consumer group for the consumers. Kafka will assign partitions to the available consumers in the consumer group and rebalance automatically whenever a consumer is added or removed (see the consumer sketch after this list).
If the consumers are packaged as Docker containers, then Kubernetes helps in managing the containers, especially for multi-node environments. Other tools include Docker Swarm, OpenShift, Mesos, etc.
Kafka guarantees ordering within a partition.
Check out the delivery guarantees - at-least-once, exactly-once - based on your use case.
Alternatively, you can use the Kafka Streams API. Kafka Streams is a client library for processing and analyzing data stored in Kafka. It builds upon important stream processing concepts such as properly distinguishing between event time and processing time, windowing support, and simple yet efficient management and real-time querying of application state.
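For the consumer-group step above, a plain Java consumer might look roughly like this (the topic and group names are placeholders); start as many copies of this process as needed, up to the partition count, and Kafka rebalances the partitions among them automatically:

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class Worker {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // All workers share the same group id, so each partition goes to exactly one of them.
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "divide-work-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("divide-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // expensive work item processing goes here
                }
            }
        }
    }
}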
Since you have a slow-consumer use case, it's a great fit for Confluent's Parallel Consumer (PC). PC directly solves for this by sub-partitioning the input partitions by key and processing each key in parallel, so processing can take as long as you like. It also tracks per-record acknowledgement. Check out the Parallel Consumer GitHub repository (it's open source, BTW, and I'm the author).

multiple consumers per kinesis shard

I read that you can have multiple consumer apps per Kinesis stream.
http://docs.aws.amazon.com/kinesis/latest/dev/developing-consumers-with-kcl.html
However, I heard you can only have one consumer per shard. Is this true? I can't find any documentation to support this, and can't imagine how that could be if multiple consumers are reading from the same stream. Certainly, it doesn't mean the producer needs to repeat content in different shards for different consumers.
The Kinesis Client Library starts threads in the background; each listens to one shard in the stream. You cannot connect to a shard from multiple threads; that is by design.
http://docs.aws.amazon.com/kinesis/latest/dev/kinesis-record-processor-scaling.html
For example, if your application is running on one EC2 instance, and is processing one Amazon Kinesis stream that has four shards. This one instance has one KCL worker and four record processors (one record processor for every shard). These four record processors run in parallel within the same process.
In the explanation above, the term "KCL worker" refers to a Kinesis consumer application, not to its threads.
But below, the same "KCL worker" term refers to a "Worker" thread in the application, which is a Runnable.
Typically, when you use the KCL, you should ensure that the number of instances does not exceed the number of shards (except for failure standby purposes). Each shard is processed by exactly one KCL worker and has exactly one corresponding record processor, so you never need multiple instances to process one shard.
See the Worker.java class in KCL source.
Late to the party, but the answer is that you can have multiple consumers per kinesis shard. A KCL instance will only start one process per shard, but you can have another KCL instance consuming the same stream (and shard), assuming the second one has permission.
There are limits, though, as laid out in the docs, including:
Each shard can support up to 5 transactions per second for reads, up to a maximum total data read rate of 2 MB per second.
If you want a stream with multiple consumers where each message will be processed once, you're probably better off with something like Amazon Simple Queue Service.
To keep it simple, you can have multiple different Lambda functions triggered on the Kinesis data. This way, both of your Lambdas are going to get all the data from the Kinesis stream. The downside is that you will now have to increase the throughput at the Kinesis level, which is going to be pricey. Use SQS instead for your use case.
