Kafka-stream Threading model - apache-kafka-streams

Is it safe to say that, in Kafka Streams, tasks represent subscriptions to partitions, while threads represent consumers?
That is, if there are 8 partitions there will always be 8 tasks. However, the number of consumers is determined by the number of threads available, and those are spread across application instances. So one application instance may represent 2 consumers, provided that it has 2 threads associated with it.
For full parallelism with a topic with 8 partitions, we could have 2 application instances with 4 threads each, or one application instance with 8 threads, and so on.

Yes. The number of tasks equals the maximum number of partitions among the input topics of the Kafka Streams app (more precisely, of each sub-topology).
Say there are two topics, "A" and "B", each with 8 partitions. The number of tasks will be max(8, 8) = 8. Each thread runs its own consumer. If you set the number of threads to 2, the 2 threads will split the tasks between them, and each thread will get 4 tasks to process.
"For full parallelism with a topic with 8 partitions, we could have 2 application instances with 4 threads each, or one application instance with 8 threads, and so on."
To achieve full parallelism, the total number of threads should equal the number of tasks, that is, the maximum number of partitions. You can spread those threads over several application instances or run them all in one. Threads beyond the number of tasks stay idle.
The Kafka Streams threading model is explained well here:
https://docs.confluent.io/current/streams/architecture.html#parallelism-model
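As a rough illustration of the arithmetic in this answer, here is a minimal Python sketch. The function names are made up for the example, and the round-robin spread is an assumption used only to show the counts; it is not how the Kafka Streams assignor is actually implemented.

```python
def count_tasks(partitions_per_topic):
    """Tasks for one sub-topology: the largest partition count among its input topics."""
    return max(partitions_per_topic)


def assign_tasks(num_tasks, num_threads):
    """Spread task ids over threads round-robin (illustrative assumption only)."""
    assignment = {thread: [] for thread in range(num_threads)}
    for task in range(num_tasks):
        assignment[task % num_threads].append(task)
    return assignment


tasks = count_tasks([8, 8])  # topics "A" and "B", 8 partitions each
per_thread = {t: len(ids) for t, ids in assign_tasks(tasks, 2).items()}
print(tasks, per_thread)  # 8 {0: 4, 1: 4}

# With more threads than tasks, the extra threads get nothing to do.
print(assign_tasks(tasks, 10)[9])  # []
```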

Related

Dynamically adapt the number of consumer threads to the number of Kafka partitions

I have a Kafka topic with 50 partitions.
My Spring Boot application uses Spring Kafka to read those messages with a @KafkaListener.
The number of instances of my application autoscales in my Kubernetes cluster.
By default, it seems that Spring Kafka launches 1 consumer thread per topic:
org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1
So, with a single instance of the application, one thread reads all 50 partitions.
With 2 instances, the load is balanced and each instance listens to 25 partitions, still with 1 thread per instance.
I know I can set the number of threads using the concurrency parameter on @KafkaListener.
But this is a fixed value.
Is there any way to tell Spring to dynamically adapt the number of consumer threads to the number of partitions the client is currently listening to?
I think there might be a better way of approaching this.
You should figure out how many records / partitions in parallel one instance of your application can handle optimally, through load / performance tests.
Let's say one instance can optimally handle 10 threads / records in parallel. Now if you scale out your app to 50 instances, in your approach each instance will get one partition and will be performing below its capacity, wasting resources.
Now consider the opposite: only one instance is left, and it spawns 50 threads to consume from all partitions in parallel. The app's performance will be severely degraded; it might become unresponsive or even crash.
So, in this hypothetical scenario, what you might want to do is, for example, start with one or two instances handling all partitions with 10 threads each, and have it scale up to 5 instances if there is consumer lag, so that each partition has a dedicated thread processing it.
Again, the actual figures should be determined through load / performance testing.
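To make the trade-off concrete, here is a small Python sketch of the numbers in this hypothetical scenario. The `layout` function and its return shape are invented for illustration; the only Kafka behaviour it relies on is that, within one consumer group, a partition is read by at most one consumer, so surplus consumer threads sit idle.

```python
from math import ceil


def layout(partitions, instances, concurrency):
    """Return (busy_threads, idle_threads, max_partitions_per_thread) for a fixed concurrency."""
    total_threads = instances * concurrency
    busy_threads = min(partitions, total_threads)
    idle_threads = total_threads - busy_threads
    max_partitions_per_thread = ceil(partitions / total_threads)
    return busy_threads, idle_threads, max_partitions_per_thread


# 50 partitions, fixed concurrency of 10 threads per instance
for instances in (1, 2, 5, 50):
    print(instances, layout(50, instances, 10))
# 1 (10, 0, 5)
# 2 (20, 0, 3)
# 5 (50, 0, 1)
# 50 (50, 450, 1)
```

At 5 instances every partition has its own thread; scaling beyond that only adds idle threads.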

Long delays between processing of two consecutive kafka batches (using ruby/karafka consumer)

I am using Karafka to read from a topic and call an external service. Each call to the external service takes roughly 300 ms. With 3 consumers (3 pods in k8s) running in the consumer group, I expect to achieve 10 events per second. I see the log lines below, which also confirm the 300 ms expectation for processing each individual event.
However, the overall throughput doesn't add up. Each Karafka process seems stuck for a long time between processing two batches of events.
The following instrumentation around the consume method implies that the consumer code itself is not taking the time:
https://github.com/karafka/karafka/blob/master/lib/karafka/backends/inline.rb#L12
INFO Inline processing of topic production.events with 8 messages took 2571 ms
INFO 8 messages on production.events topic delegated to xyz
However, I notice two things:
When I tail logs on the 3 pods, only one of the 3 pods seems to emit logs at a time. This does not make sense to me, as all partitions have enough events and each consumer should be able to consume in parallel.
Although the above message shows roughly 321 ms (2571/8) per event, in reality I see the logs stalled for a long duration between the processing of two batches. I am curious where that time is going.
======
Edit:
There is some skew in the distribution of data across brokers, as we recently expanded our brokers from 3 to a total of 6. However, none of the brokers is under CPU or disk pressure. This is a new cluster, and hardly 4-5% CPU is used at peak times.
Our data is evenly distributed across 3 partitions. I say this because the last offset is roughly the same in each partition.
Partition  FirstOffset  LastOffset  Size     LeaderNode  ReplicaNodes  In-syncReplicaNodes  OfflineReplicaNodes  PreferredLeader  Under-replicated
0          2174152      3567554     1393402  5           5,4,3         3,4,5                -                    Yes              No
1          2172222      3566886     1394664  4           4,5,6         4,5,6                -                    Yes              No
2          2172110      3564992     1392882  1           1,6,4         1,4,6                -                    Yes              No
However, I do see that one consumer perpetually lags behind the other two.
Following table shows the lag for my consumers. There is one consumer process for each partition:
Partition  First Offset  Last Offset  Consumer Offset  Lag
0          2174152       3566320      2676120          890200
1          2172222       3565605      3124649          440956
2          2172110       3563762      3185587          378175
Combined lag: 1709331
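For reference, the lag column is simply the last offset minus the consumer offset per partition. A minimal Python check using the figures from the table (the `compute_lags` helper is a name made up for this example):

```python
# (partition, last_offset, consumer_offset), copied from the table above
ROWS = [
    (0, 3566320, 2676120),
    (1, 3565605, 3124649),
    (2, 3563762, 3185587),
]


def compute_lags(rows):
    """Lag per partition = last offset in the log minus the consumer's offset."""
    return {partition: last - consumed for partition, last, consumed in rows}


lags = compute_lags(ROWS)
print(lags)                # {0: 890200, 1: 440956, 2: 378175}
print(sum(lags.values()))  # 1709331
```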
Here is a screenshot of the logs from all 3 consumers. You can see the big difference between the time spent in each invocation of the consume function and the interval between two adjacent invocations. Basically, I want to explain and/or reduce that waiting time. There are 100k+ events in this topic, and my dummy Karafka applications are able to retrieve them quickly, so the Kafka brokers are not the issue.
Update after setting max_wait_time to 1 second (previously 5 seconds):
It seems that the issue is resolved after reducing the wait config. Now the difference between two consecutive logs is roughly equal to the time spent in consume:
2021-06-24 13:43:23.425 Inline processing of topic x with 7 messages took 2047 ms
2021-06-24 13:43:27.787 Inline processing of topic x with 11 messages took 3347 ms
2021-06-24 13:43:31.144 Inline processing of topic x with 11 messages took 3344 ms
2021-06-24 13:43:34.207 Inline processing of topic x with 10 messages took 3049 ms
2021-06-24 13:43:37.606 Inline processing of topic x with 11 messages took 3388 ms
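The claim can be checked directly from these log lines. The Python sketch below assumes each line is written when its batch finishes, so the time between two lines minus the later batch's "took" value is the time spent outside consume (waiting or fetching). The helper name is invented for the example.

```python
from datetime import datetime, timedelta

# (timestamp of the log line, "took N ms" value), copied from the lines above
LOG = [
    ("2021-06-24 13:43:23.425", 2047),
    ("2021-06-24 13:43:27.787", 3347),
    ("2021-06-24 13:43:31.144", 3344),
    ("2021-06-24 13:43:34.207", 3049),
    ("2021-06-24 13:43:37.606", 3388),
]


def idle_gaps_ms(entries):
    """Milliseconds between consecutive log lines not explained by processing the later batch."""
    stamps = [datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f") for ts, _ in entries]
    gaps = []
    for prev, cur, (_, took_ms) in zip(stamps, stamps[1:], entries[1:]):
        elapsed_ms = (cur - prev) // timedelta(milliseconds=1)
        gaps.append(elapsed_ms - took_ms)
    return gaps


print(idle_gaps_ms(LOG))  # [1015, 13, 14, 11]
```

After the first interval, only about 10 to 15 ms per batch is spent outside the consume call.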
There are a couple of problems you may be facing. This is a bit of guesswork on my side without more details, but let's give it a shot.
From the Kafka perspective
Are you sure you're evenly distributing data across partitions? Maybe it is consuming everything from one partition?
What you wrote here:
INFO Inline processing of topic production.events with 8 messages took 2571 ms
This indicates that there was a batch of 8 processed altogether by a single consumer. This could indicate that the data is not distributed evenly.
From the performance perspective
There are two performance properties that can affect your understanding of how Karafka operates: throughput and latency.
Throughput is the number of messages that can be processed in a given time.
Latency is the time from the moment a message was produced to the moment it is processed.
As far as I understand, all messages are being produced. You could try playing with the Karafka settings, in particular this one: https://github.com/karafka/karafka/blob/83a9a5ba417317495556c3ebb4b53f1308c80fe0/lib/karafka/setup/config.rb#L114
From the logger perspective
The logger that is being used flushes data from time to time, so you won't see output immediately but after a short delay. You can validate this by looking at the log timestamps.

Queue that serves producers evenly

I have multiple producers that generate tasks,
and there is a consumer (or several) that executes these tasks (e.g. counting the number of lines in a file).
My problem is that producers should be treated evenly: if one producer generates 10 tasks and another only 2, the consumer should first do 1 task from producer1, then a task from producer2, then producer1 again, and then the rest.
Basically, the system must guarantee for each producer that its tasks will not wait behind a large chunk of tasks from other producers.
Can you help me with an algorithm or ready-to-use broker/queue software that can achieve this goal?
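One common way to get this behaviour is to keep a separate FIFO queue per producer and serve the producers round-robin. The Python sketch below is a minimal in-process illustration of that idea; the `FairQueue` class and its method names are made up for the example, and it is not thread-safe or tied to any particular broker.

```python
from collections import deque


class FairQueue:
    """Round-robin over per-producer FIFO queues (single-threaded sketch)."""

    def __init__(self):
        # producer id -> deque of pending tasks; dict order is the service order
        self._queues = {}

    def put(self, producer, task):
        self._queues.setdefault(producer, deque()).append(task)

    def get(self):
        """Take one task from the producer at the front, then move that producer to the back."""
        if not self._queues:
            raise IndexError("queue is empty")
        producer = next(iter(self._queues))
        tasks = self._queues.pop(producer)
        task = tasks.popleft()
        if tasks:
            self._queues[producer] = tasks  # re-insert at the back of the rotation
        return producer, task

    def __len__(self):
        return sum(len(tasks) for tasks in self._queues.values())


queue = FairQueue()
for i in range(10):
    queue.put("producer1", i)
for i in range(2):
    queue.put("producer2", i)

order = [queue.get() for _ in range(4)]
print(order)  # [('producer1', 0), ('producer2', 0), ('producer1', 1), ('producer2', 1)]
print(len(queue))  # 8 tasks left, all from producer1
```

Once producer2 has no pending tasks it drops out of the rotation, so producer1's remaining tasks run back to back.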

Stream thread calculation

I'm using the Streams DSL. I have three source topics with 17, 100, and 40 partitions.
I will be running three instances and 2 standby instances.
How can I calculate how many stream threads I will need so that each thread gets exactly one task or highest parallelism is achieved?
This depends on the structure of your application. You can run the application with a single thread and observe the number of created tasks. The number of tasks is the maximum number of threads you can use.
The tasks that are created are logged, or you can obtain them via KafkaStreams#localThreadsMetadata().
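As a rough estimate before running anything, the two extreme cases can be computed by hand. This Python sketch assumes 3 active instances and no extra sub-topologies created by repartitioning; the function name is invented for the example, and the real task count should still be confirmed by observing the application as described above.

```python
from math import ceil


def threads_per_instance(num_tasks, instances):
    """Threads each instance needs so that no thread has to run more than one task."""
    return ceil(num_tasks / instances)


partitions = [17, 100, 40]
instances = 3  # assumption: the three active instances

# Case 1: all three topics feed the same sub-topology (for example, they are merged).
tasks_shared = max(partitions)    # 100 tasks
# Case 2: each topic is processed by its own independent sub-topology.
tasks_separate = sum(partitions)  # 157 tasks

print(threads_per_instance(tasks_shared, instances))    # 34
print(threads_per_instance(tasks_separate, instances))  # 53
```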
I will try to outline an approach here in short.
You are asking for maximum parallelism.
This can be achieved by putting each topic in a separate topology.
Each topology has its own thread count (one thread per consumer per topic): 17/3, 100/3, and 40/3 (topic partitions / instances), rounded up to 6, 34, and 14.
This makes sure that each topology gets its own thread count and its own parallelism.
Each topology will act as a separate consumer group.

Difference between ThreadCount and StepCount in TIBCO BW Engine

Can anyone explain the difference between the StepCount and ThreadCount properties of the TIBCO BW Engine? I have tried to understand them from the TIBCO docs but could not.
I would appreciate it if anyone could explain this.
Thanks in advance.
The ThreadCount property defines the number of threads (Java threads) that execute all your processes. So with the default value of 8 threads you can run 8 jobs simultaneously.
The StepCount, on the other hand, defines the number of activities executed before a thread can context switch to another job.
Sample scenario:
a process with 5 activities
ThreadCount is 2
StepCount is 4
If there are 3 incoming requests, the first two requests spawn 1 job each. The third job is spawned but is paused due to insufficient threads.
After the first job completes the fourth activity, the thread is freed and can be assigned to another paused job.
So the first job will be paused and the third job starts to execute.
When the second job reaches the fourth activity, its thread is freed and is available for reassignment. So the second job pauses and the first resumes.
After the third job reaches its fourth activity, the thread is freed again and resumes job number one (and completes it). Afterwards job number 3 is completed.
All of this is a theoretical scenario. What you usually need is to set the number of concurrent jobs (so ThreadCount). The StepCount is close to irrelevant, because the engine will take care of the pooling and mapping of physical threads to virtual BW jobs.
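To get a feel for how the two settings interact, here is a toy Python simulation of the sample scenario. It is only a sketch under simplifying assumptions (every activity takes one tick, paused jobs wait in a FIFO pool, a positive StepCount is given); it is not how the BW engine is implemented, and its exact interleaving differs slightly from the walkthrough above.

```python
from collections import deque


def simulate(num_jobs, activities_per_job, thread_count, step_count):
    """Return (completion_order, yields) for a toy cooperative scheduler."""
    remaining = {job: activities_per_job for job in range(1, num_jobs + 1)}
    ready = deque(remaining)  # jobs waiting for a thread, in arrival order
    running = {}              # thread id -> [job, activities done in this slice]
    finished = []
    yields = 0
    while ready or running:
        # Hand free threads to waiting jobs.
        for thread in range(thread_count):
            if thread not in running and ready:
                running[thread] = [ready.popleft(), 0]
        # Every busy thread executes one activity per tick.
        for thread in sorted(running):
            job, steps = running[thread]
            remaining[job] -= 1
            steps += 1
            if remaining[job] == 0:
                finished.append(job)   # job done, thread is free again
                del running[thread]
            elif steps == step_count:
                ready.append(job)      # StepCount reached: yield the thread
                yields += 1
                del running[thread]
            else:
                running[thread] = [job, steps]
    return finished, yields


# 3 jobs of 5 activities each, ThreadCount 2
print(simulate(3, 5, 2, 4))  # ([1, 2, 3], 3)
print(simulate(3, 5, 2, 5))  # ([1, 2, 3], 0)
```

With StepCount 4 every job gives up its thread once before finishing; with StepCount 5 each job runs to completion on the thread it first gets.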
ThreadCount
ThreadCount states the number of threads a TIBCO BW engine can allocate. The default number of threads is eight.
The number of threads determines the number of jobs that can be executed simultaneously in the engine. So the maximum number of jobs that can run concurrently in the engine is limited to the number of threads, eight by default. This property specifies the size of the job thread pool, and is applied to all the AppNodes in the AppSpace if set at the AppSpace level.
A thread carries out a limited number of tasks or activities uninterrupted and then yields to the next job that is ready. Starting from the default value of eight threads, the thread count can be tuned toward an optimum value, for example by doubling it until maximum CPU usage is reached.
StepCount
StepCount states the number of activities that an engine thread executes without interruption before yielding to another job that is ready in the job pool. The default value of StepCount is -1. When the value is set to -1, the engine decides the required StepCount value. Depending on the situation, a low StepCount value may degrade engine performance due to frequent thread switching.

Resources