I have built an application which consists of one publisher, several queues and several consumers for each queue. Consumers on a queue (including the queue) share the channel. Other queues use a different channel. I am observing that for different queues, tasks are being worked on in parallel, but for a specific queue this is not happening. If I publish several messages at once to a specific queue, only one consumer works while the other ones wait until the work has finished. What should I do in order for consumers to work in parallel?
workers.each do |worker|
  worker.on_delivery do |delivery_info, metadata, payload|
    perform_work(delivery_info, metadata, payload)
  end
  queue.subscribe_with(worker)
end
This is how I register all the consumers for a specific queue. The operation perform_work(_,_,_) is rather expensive and takes several seconds to complete.
RabbitMQ works off the back of the concept of channels, and channels are generally intended to not be shared between threads. Moreover, channels by default have a work thread pool size of one. A channel is an analog to a session.
In your case, you have multiple consumers sharing a queue and channel, and performing a long-duration job within the event handler for the channel.
There are two ways to work around this:
Allocate a channel per consumer, or
Set the work pool size of the channel on creation. See this documentation.
I would advocate 1 channel per consumer since it has a lower chance of causing unintended side-effects.
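To see why a work pool of one serializes the handlers, here is a minimal sketch in plain Ruby. It does not talk to RabbitMQ at all: `SimulatedChannel` and `peak_concurrency` are hypothetical names that only model a channel dispatching deliveries through a fixed-size thread pool.

```ruby
# Simulation only: models a channel whose deliveries are dispatched through a
# fixed-size worker pool. No RabbitMQ connection is involved.
class SimulatedChannel
  def initialize(pool_size)
    @jobs = Queue.new
    @threads = Array.new(pool_size) do
      Thread.new do
        # Each pool thread runs handlers one after another until told to stop.
        while (job = @jobs.pop) != :stop
          job.call
        end
      end
    end
  end

  # Queue a delivery handler for execution by the pool.
  def dispatch(&handler)
    @jobs << handler
  end

  # Let the pool drain the queued handlers, then shut it down.
  def close
    @threads.size.times { @jobs << :stop }
    @threads.each(&:join)
  end
end

# Returns the highest number of handlers that were running at the same time.
def peak_concurrency(pool_size, deliveries: 4, work_seconds: 0.05)
  channel = SimulatedChannel.new(pool_size)
  lock = Mutex.new
  running = 0
  peak = 0
  deliveries.times do
    channel.dispatch do
      lock.synchronize do
        running += 1
        peak = [peak, running].max
      end
      sleep(work_seconds) # stands in for the expensive perform_work call
      lock.synchronize { running -= 1 }
    end
  end
  channel.close
  peak
end

puts peak_concurrency(1) # a pool of one never runs two handlers at once
puts peak_concurrency(4) # a larger pool lets the handlers overlap
```

As far as I know, Bunny takes the pool size as the second argument to `create_channel`, but check the documentation for the version you are using.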
I have been exploring EventStoreDB and trying to understand more about the ordering of messages on the consumer side. Read about persistent subscriptions and also the Pinned consumer strategy here.
I have a scenario wherein inventory updates get pushed to eventstore and different streams get created by the different unique inventoryIds in the inventory event.
We have multiple consumers with the same consumerGroup name to read these inventory events. We are using Pinned Persistent Subscription with ResolveLinkTos enabled.
My question:
Will every message from a particular stream always go to the same consumer instance of the consumerGroup?
If the answer to the above question is yes, will every message from that particular stream reach the particular consumer instance in the same order as the events were ingested?
The documentation has a warning that ordered message processing using persistent subscriptions is not guaranteed. Any strategy delivers messages with the best-effort level of ordering guarantees, if applicable.
There are a few reasons for this, some of those are:
Spreading out messages across consumer groups leads to a non-linearised checkpoint commit. It means that some messages can be processed before other messages.
Persistent subscriptions attempt to buffer messages, but when a timeout happens on the client side, the whole buffer is redelivered, which can eventually break the processing order.
Built-in retry policies essentially can break the message order at any time.
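The retry case can be illustrated with a toy model. This is not the EventStoreDB client API: `deliver_with_retry` is a hypothetical helper, and it assumes that a failed event is retried only after the events that were already buffered behind it.

```ruby
# Toy model, not the EventStoreDB client: a failed event is re-queued behind
# the events that were already buffered for delivery.
def deliver_with_retry(events, fail_once:)
  pending = events.dup
  already_failed = []
  processed = []
  until pending.empty?
    event = pending.shift
    if fail_once.include?(event) && !already_failed.include?(event)
      already_failed << event
      pending << event # retried later, after the remaining buffered events
    else
      processed << event
    end
  end
  processed
end

p deliver_with_retry([1, 2, 3], fail_once: [2]) # => [1, 3, 2]
```

Event 3 is handled before the retry of event 2 succeeds, so the consumer sees the events out of order even though they all came from one stream.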
Most event log-based brokers, if not all, don't even attempt to guarantee ordered message delivery across multiple consumers. I often hear "but Kafka does it", ignoring the fact that Kafka delivers messages from one partition to at most one consumer in a group. There's no load balancing of one partition between multiple consumers due to exactly the same issue. That being said, EventStoreDB is still not a broker, but a database for events.
So, here are the answers:
Will every message from a particular stream always go to the same consumer instance of the consumer group?
No. It might work most of the time, but it will eventually break.
will every message from that particular stream reach the particular consumer instance in the same order as the events were ingested?
Most of the time, yes, but again, if a message is being retried, you might get the next message before the previous one is Acked.
Overall, load-balancing ordered processing of messages that aren't pre-partitioned on the server is not an easy task. At most, you get messages re-delivered if the checkpoint fails to persist at some point, and the consumers restart.
I'm using spring-kafka to consume messages from two Kafka topics, which send the same message format, as below.
@KafkaListener(topics = {"topic_country1", "topic_country2"}, groupId = KafkaUtils.MESSAGE_GROUP)
public void onCustomerMessage(String message, Acknowledgment ack) throws Exception {
    log.info("Message : {} is received", message);
    ack.acknowledge();
}
Can KafkaListener allocate the number of consumer threads according to the number of topics it listens to on its own and process messages from the two topics in parallel? Or does it not support parallel processing, so that messages have to wait in the topic until the previous message has been processed?
If the number of messages in a topic grows, I need to autoscale my microservice to start new instances (up to the number of partitions). Which parameters (CPU, memory) can I rely on to detect, from the KafkaListener's point of view, that the number of messages in the topics is high? (For example, for an API I can autoscale the service by monitoring HTTP latency.)
You can set the concurrency property to run more threads; but each partition can only be processed by one thread. To increase concurrency you must increase the number of partitions in each topic. When listening to multiple topics in the same listener, if those topics only have one partition, you may not get the concurrency you desire unless you change the kafka consumer partition assignor.
See https://docs.spring.io/spring-kafka/docs/2.5.0.RELEASE/reference/html/#using-ConcurrentMessageListenerContainer
When listening to multiple topics, the default partition distribution may not be what you expect. For example, if you have three topics with five partitions each and you want to use concurrency=15, you see only five active consumers, each assigned one partition from each topic, with the other 10 consumers being idle. This is because the default Kafka PartitionAssignor is the RangeAssignor (see its Javadoc). For this scenario, you may want to consider using the RoundRobinAssignor instead, which distributes the partitions across all of the consumers. Then, each consumer is assigned one topic or partition. ...
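The numbers in the quoted paragraph can be reproduced with a small simulation. This is a simplified sketch of the two assignment strategies in plain Ruby, not Kafka's own code: `range_assign` and `round_robin_assign` are hypothetical helpers, consumers are just integers, and every topic is assumed to have the same number of partitions.

```ruby
# Range-style assignment: partitions are split per topic, so the same leading
# consumers receive a partition from every topic.
def range_assign(topics, partitions_per_topic, consumer_count)
  assignment = Hash.new { |hash, consumer| hash[consumer] = [] }
  topics.each do |topic|
    base, extra = partitions_per_topic.divmod(consumer_count)
    next_partition = 0
    consumer_count.times do |consumer|
      take = base + (consumer < extra ? 1 : 0)
      take.times do
        assignment[consumer] << "#{topic}-#{next_partition}"
        next_partition += 1
      end
    end
  end
  assignment
end

# Round-robin-style assignment: all topic-partitions are laid out in one list
# and dealt to the consumers in turn.
def round_robin_assign(topics, partitions_per_topic, consumer_count)
  assignment = Hash.new { |hash, consumer| hash[consumer] = [] }
  all = topics.flat_map do |topic|
    (0...partitions_per_topic).map { |partition| "#{topic}-#{partition}" }
  end
  all.each_with_index { |tp, index| assignment[index % consumer_count] << tp }
  assignment
end

topics = %w[t1 t2 t3]
puts range_assign(topics, 5, 15).size       # => 5 consumers with work
puts round_robin_assign(topics, 5, 15).size # => 15 consumers with work
```

Only consumers that received at least one partition appear in the result, so the hash size is the number of active consumers. In a real consumer the strategy is chosen through the `partition.assignment.strategy` consumer property.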
If you want to scale horizontally beyond the partition count, and do so dynamically, consider using something like Parallel Consumer (PC). It can be used within a Spring context.
By using PC, you can process all your keys in parallel, regardless of how long each takes to process, and you can be as concurrent as you wish. This can also scale dynamically.
PC directly solves for this by sub-partitioning the input partitions by key and processing each key in parallel.
It also tracks per record acknowledgement. Check out Parallel Consumer on GitHub (it's open source BTW, and I'm the author).
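The idea of sub-partitioning by key can be sketched in a few lines of plain Ruby. This is not the Parallel Consumer API (which is a Java library); `process_by_key` is a hypothetical helper that only shows the ordering property: records with different keys run on different threads, while records with the same key stay in their original order.

```ruby
# Conceptual sketch: one thread per key, each handling its own records in order.
def process_by_key(records)
  results = Queue.new
  threads = records.group_by { |key, _value| key }.map do |key, group|
    Thread.new do
      group.each { |_key, value| results << [key, value] }
    end
  end
  threads.each(&:join)
  # Drain the thread-safe queue into a plain array for inspection.
  processed = []
  processed << results.pop until results.empty?
  processed
end

records = [["a", 1], ["b", 1], ["a", 2], ["c", 1], ["a", 3]]
processed = process_by_key(records)
p processed.select { |key, _| key == "a" }.map(&:last) # => [1, 2, 3]
```

The interleaving between keys varies from run to run, but the per-key sequence does not.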
I have ActiveMQ Artemis. A producer generates 1000 messages and a consumer processes them one by one. Now I want to process this queue with two consumers. When I start a new consumer, new messages are distributed between the two running consumers. My question: is it possible to redistribute the old messages between all started consumers?
Once messages are dispatched by the broker to a consumer then the broker can't simply recall them as the consumer may be processing them. It's up to the consumer to cancel the messages back to the queue (e.g. by closing its connection/session).
My recommendation would be to tune your consumerWindowSize (set on the client's URL) so that a suitable number of messages are dispatched to your consumers. The default consumerWindowSize is 1M (1024 * 1024 bytes). A smaller consumerWindowSize would mean that more clients would be able to receive messages concurrently, but it would also mean that clients would need to conduct more network round-trips to tell the broker to dispatch more messages when they run low. You'll need to run benchmarks to find the right consumerWindowSize value for your use-case and performance needs.
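A rough back-of-the-envelope model shows the effect of the window on how much backlog a late-joining consumer can still get. It is only a sketch under stated assumptions: `shareable_backlog` is a hypothetical helper, and the window is counted in whole messages here, whereas Artemis measures consumerWindowSize in bytes.

```ruby
# Toy model: how many messages are still on the broker (and can therefore go
# to a second consumer) at the moment that consumer joins.
def shareable_backlog(total:, window:, processed_before_join:)
  remaining = total - processed_before_join
  held_by_first_consumer = [window, remaining].min # already dispatched
  remaining - held_by_first_consumer
end

puts shareable_backlog(total: 1000, window: 1000, processed_before_join: 100) # => 0
puts shareable_backlog(total: 1000, window: 10, processed_before_join: 100)   # => 890
```

With a window large enough to hold the whole backlog, nothing is left for the second consumer; with a small window, most of the backlog is still on the broker when it joins.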
When using DirectMessageListenerContainer with consumersPerQueue property of 25, I noticed 25 rabbit channels get created per listener container's subscribed queue. The rabbit channel count quickly grows out of hand in our setup, as more queues are added to the listener container dynamically. We had to increase broker channel limit to accommodate the channel growth.
What is the relationship between channels and consumers in the DirectMessageListenerContainer? From my observations it appears to be 1 channel per consumer.
Does DirectMessageListenerContainer offer any channel pooling/recycling/rebalancing to keep channel growth under control, specifically for queues that are mostly idle?
Does the SimpleMessageListenerContainer handle channel pooling differently, since it can dynamically resize the consumer count?
The DMLC uses a separate channel for each consumer.
No; the DMLC does not pool, recycle, or rebalance channels.
The SMLC uses one channel per concurrentConsumers; since 2.0, each channel is used for multiple consumers (when there is more than one queue listened to).
However, dynamically adding or removing queues is much less efficient with the SMLC because the consumer(s) are canceled and re-created when changes are made.
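The difference in channel usage comes down to simple arithmetic, sketched here in plain Ruby. The helper names and the queue count of 40 are made up for illustration; only the consumersPerQueue value of 25 comes from the question.

```ruby
# DMLC: one channel per consumer, and consumersPerQueue consumers per queue.
def dmlc_channels(queue_count, consumers_per_queue)
  queue_count * consumers_per_queue
end

# SMLC: one channel per concurrent consumer, regardless of the queue count.
def smlc_channels(concurrent_consumers)
  concurrent_consumers
end

puts dmlc_channels(40, 25) # => 1000
puts smlc_channels(25)     # => 25
```

This is why the channel count grows with every queue added to a DMLC, while an SMLC stays at its concurrentConsumers value.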
I'm trying to configure a queue that is aware of the events that are being processed.
Questions
Does this make sense? :)
Is it possible to configure/customize ActiveMQ?
Is there any other library that can be "easily" configured to handle such cases? Kafka?
Problem
The queue contains events. Each event is associated with an object. A consumer takes the event from the queue and performs a task. Each event should be taken only by exactly one consumer.
Constraints
Events for the same object cannot be processed concurrently.
But events for different objects should be processed in parallel.
Example
The queue is
ObjectA-Event1
ObjectA-Event2
ObjectB-Event1
ObjectC-Event1
Consumer1 should receive ObjectA-Event1 from the queue. Consumer2 should receive ObjectB-Event1 from the queue and not ObjectA-Event2. ObjectA-Event2 should become available to consumers only when the first consumer completes the task for ObjectA-Event1.
It looks to me like you should use message groups. Messages for each object should be in the same group so that they are received by the same consumer and processed serially. Messages in different groups are free to be processed by different consumers.
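Applied to the example from the question, the behaviour of message groups can be modelled like this. It is a plain Ruby sketch, not broker code: `dispatch_by_group` is a hypothetical helper, and picking the owner of a new group in round-robin order is an assumption made for illustration.

```ruby
# Toy model: the first message of a group pins that group to a consumer, and
# every later message of the same group goes to the same consumer.
def dispatch_by_group(messages, consumer_count)
  owner_of = {}
  inboxes = Array.new(consumer_count) { [] }
  next_consumer = 0
  messages.each do |group, event|
    unless owner_of.key?(group)
      owner_of[group] = next_consumer
      next_consumer = (next_consumer + 1) % consumer_count
    end
    inboxes[owner_of[group]] << "#{group}-#{event}"
  end
  inboxes
end

queue = [%w[ObjectA Event1], %w[ObjectA Event2], %w[ObjectB Event1], %w[ObjectC Event1]]
inboxes = dispatch_by_group(queue, 2)
p inboxes[0] # => ["ObjectA-Event1", "ObjectA-Event2", "ObjectC-Event1"]
p inboxes[1] # => ["ObjectB-Event1"]
```

Both ObjectA events land on the same consumer in their original order, while ObjectB-Event1 can be processed by the other consumer at the same time. With JMS clients the group is normally set through the `JMSXGroupID` message property, here one value per object.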