Why does SwiftMQ show flow control behaviour even when flow control is disabled?

I'm trying to benchmark the performance of SwiftMQ 5.0.0 with a producer and a consumer application I wrote, so that I can vary the number of producer threads and consumer threads. I have added a delay on the consumer to simulate the time taken to process a message. I ran a test with the producer threads fixed at 2, varying the number of consumer threads from 20 to 92 in steps of 4.
Initially the producer rate starts high and the consumer rate is low (as expected, given the added delay and the small number of consumer threads).
As the number of consumer threads increases, the producer rate drops and the consumer rate rises, and they become equal at around 48 consumer threads.
After that, as the number of consumer threads increases further, both producer and consumer rates keep increasing linearly. I am wondering what the reason for this behaviour is.
See this image for the result graph.
Notes:
I have disabled flow control at the queue level by setting flowcontrol-start-queuesize="-1".
I also have not set a value for inbound-flow-control-enabled in the routing Swiftlet (I believe it defaults to false).
Any help on this matter is much appreciated. TIA
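For reference, a minimal sketch of the kind of delayed consumer described above (not the actual benchmark code; the JNDI names, the 1 s poll timeout and the 50 ms delay are placeholders):

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.naming.InitialContext;

public class BenchConsumer {
    public static void main(String[] args) throws Exception {
        InitialContext ctx = new InitialContext();   // SwiftMQ JNDI properties assumed to be configured
        ConnectionFactory cf = (ConnectionFactory) ctx.lookup("ConnectionFactory"); // placeholder name
        Queue queue = (Queue) ctx.lookup("testqueue");                              // placeholder name

        Connection connection = cf.createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageConsumer consumer = session.createConsumer(queue);
        connection.start();

        while (true) {
            Message msg = consumer.receive(1000);   // poll with a 1 s timeout
            if (msg != null) {
                Thread.sleep(50);                   // simulated per-message processing time
            }
        }
    }
}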

Related

How to configure Spring SimpleMessageListenerContainer receiveTimeout in order to scale up to a reasonable number of consumers

Use case
A backend consuming messages at a varying rate and inserting them into a DB.
Today in production my SimpleMessageListenerContainer scales up to maxConcurrentConsumers even when that is not necessary to handle the traffic rate.
Problem
I am trying to find the proper configuration of the Spring SimpleMessageListenerContainer so that Spring scales the number of consumers up/down to whatever is adequate to handle the incoming traffic.
With a fixed injection rate, on a single-node RabbitMQ, I have noticed that the scaling process stabilizes at
numberOfConsumers = (injectionRate * receiveTimeoutInMilliseconds) / 1000
For example:
injection rate: 100 msg/s
container.setReceiveTimeout(100L); // 100 ms
--> consumers 11
--> Consumer capacity 100%
injection rate: 100 msg/s
container.setReceiveTimeout(1000L); // 1 s - default
--> consumers 101
--> Consumer capacity 100%
Knowing that more consumers mean more threads and more AMQP channels, I am wondering why the scaling algorithm is not linked to the consumerCapacity metric, and why the default receive timeout is set to 1 second.
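For context, roughly how such a container is wired up (a sketch, not the production configuration; the queue name, the concurrency bounds and the listener body are assumptions):

import org.springframework.amqp.core.MessageListener;
import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ListenerConfig {

    @Bean
    public SimpleMessageListenerContainer container(ConnectionFactory connectionFactory) {
        SimpleMessageListenerContainer container = new SimpleMessageListenerContainer(connectionFactory);
        container.setQueueNames("backend.queue");      // assumed queue name
        container.setConcurrentConsumers(1);           // lower bound
        container.setMaxConcurrentConsumers(120);      // upper bound the container scales towards
        container.setReceiveTimeout(1000L);            // the property under discussion (1 s default)
        container.setMessageListener((MessageListener) message -> {
            // insert the message into the DB
        });
        return container;
    }
}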
See the documentation https://docs.spring.io/spring-amqp/docs/current/reference/html/#listener-concurrency
In addition, a new property called maxConcurrentConsumers has been added and the container dynamically adjusts the concurrency based on workload. This works in conjunction with four additional properties: consecutiveActiveTrigger, startConsumerMinInterval, consecutiveIdleTrigger, and stopConsumerMinInterval. With the default settings, the algorithm to increase consumers works as follows:
If the maxConcurrentConsumers has not been reached and an existing consumer is active for ten consecutive cycles AND at least 10 seconds has elapsed since the last consumer was started, a new consumer is started. A consumer is considered active if it received at least one message in batchSize * receiveTimeout milliseconds.
With the default settings, the algorithm to decrease consumers works as follows:
If there are more than concurrentConsumers running and a consumer detects ten consecutive timeouts (idle) AND the last consumer was stopped at least 60 seconds ago, a consumer is stopped. The timeout depends on the receiveTimeout and the batchSize properties. A consumer is considered idle if it receives no messages in batchSize * receiveTimeout milliseconds. So, with the default timeout (one second) and a batchSize of four, stopping a consumer is considered after 40 seconds of idle time (four timeouts correspond to one idle detection).
Practically, consumers can be stopped only if the whole container is idle for some time. This is because the broker shares its work across all the active consumers.
So, when you reduce the receiveTimeout you would need a corresponding increase in the idle/active triggers.
The default is 1 second to provide a reasonable compromise between not spinning an idle consumer and retaining responsive behavior to a container stop() operation (idle consumers are blocked for the timeout). Increasing it makes the container less responsive to stop().
It is generally unnecessary to set it lower than 1 second.
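A sketch of that last point, assuming the same SimpleMessageListenerContainer as in the sketch above (the factor of 10 is illustrative, not a recommendation):

import org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer;

public final class ScalingTuning {

    static void tuneForShortReceiveTimeout(SimpleMessageListenerContainer container) {
        container.setReceiveTimeout(100L);             // 100 ms instead of the 1 s default
        container.setConsecutiveActiveTrigger(100);    // default is 10
        container.setConsecutiveIdleTrigger(100);      // default is 10
        // startConsumerMinInterval / stopConsumerMinInterval keep their defaults (10 s / 60 s)
    }
}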

My topology's processing rate is about 2500 messages per second, but Complete latency is about 7 ms. Shouldn't it equal 1000 / 2500 = 0.4 ms?

My topology reads from RabbitMQ, and its processing rate is about 2500 messages per second, but Complete latency is about 7 ms. Shouldn't it equal 1000 / 2500 = 0.4 ms?
Topology summary:
Please help me understand what the Complete latency parameter means in my case.
The topology processes messages from a RabbitMQ queue at a rate of about 2500/sec.
RabbitMQ screenshot:
According to the Storm docs, the complete latency is just for spouts. It is the average amount of time it took for ack or fail to be called for a tuple after it was emitted.
So it is the time between when your rabbitmq-spout emitted a tuple and when the last bolt acked it.
Storm has an internal queue used for backpressure; its maximum size is defined by the topology.max.spout.pending variable in the configs. If you set it to a high value, your RabbitMQ consumer reads messages from the broker to fill this queue ahead of the real processing done by the bolts in the topology, so the measured latency overstates the real latency of your topology.
In the RabbitMQ panel you see how fast messages are consumed from the queue, not how fast they are processed; you are comparing apples and oranges.
To measure latency I would recommend running your topology for a couple of days; the 202 seconds shown in your screenshot is far too short a window.
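A sketch of capping the in-flight tuples per spout, assuming a recent Storm with the org.apache.storm packages (the value 500 is illustrative; spout/bolt wiring is omitted):

import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;

public class SubmitWithPendingCap {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // builder.setSpout("rabbitmq-spout", new MyRabbitSpout());                 // actual spout omitted
        // builder.setBolt("process", new MyBolt()).shuffleGrouping("rabbitmq-spout");

        Config conf = new Config();
        conf.setMaxSpoutPending(500);   // caps un-acked tuples per spout task (topology.max.spout.pending)

        StormSubmitter.submitTopology("my-topology", conf, builder.createTopology());
    }
}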

Decrease consume rate on RabbitMq server

We are running a single-server production RabbitMQ (3.7) to which around 500 mobile applications are connected as producers (MQTT) and around 10 server applications as consumers. Those 500 publishers push messages basically into one queue, and less often into another one.
Recently we had an issue with spikes of backlogged messages in all our queues; the number of queued messages went from 1 to 1000. This spike was caused by a decrease in the consume rate.
I tried to find out what happened and how to eliminate the spikes in the queues; I could limit the queue length or cut connections, but we can't limit anything, we have to perform better. I took a look at RabbitMQ memory and CPU usage, and the same for the consumers, and everything looks fine: RabbitMQ was running at around 50% of total load, and similarly for memory. The consumers also don't seem to be the bottleneck, because the consume rate went even higher after the queue length grew.
I have a couple of questions:
Is RabbitMQ designed for such a large amount of consumers?
I read that each queue is single-threaded; is it possible that RabbitMQ just can't handle 500 producers publishing into one queue and the throughput drops?
What else can I use to tackle the cause of the lower consume rate? The number of threads in RabbitMQ?
What do you recommend for measuring or benchmarking the performance of a RabbitMQ server?
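On the last question: the official rabbitmq-perf-test tool is the usual benchmark; a rough hand-rolled probe of the raw consume rate could look like the sketch below, assuming the 5.x Java client (the host, queue name and prefetch value are assumptions):

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import java.util.concurrent.atomic.AtomicLong;

public class ConsumeRateProbe {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");                        // assumption
        try (Connection conn = factory.newConnection()) {
            Channel channel = conn.createChannel();
            channel.basicQos(100);                           // prefetch per consumer (illustrative)
            AtomicLong count = new AtomicLong();
            channel.basicConsume("main.queue", true,         // assumed queue name, auto-ack
                    (tag, delivery) -> count.incrementAndGet(),
                    tag -> { });
            long start = System.currentTimeMillis();
            while (true) {
                Thread.sleep(5000);
                double secs = (System.currentTimeMillis() - start) / 1000.0;
                System.out.printf("consumed %.0f msg/s%n", count.get() / secs);
            }
        }
    }
}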

Number of Kafka consumer clients decreases after rebalancing

I've noticed that after a period of time, for example two days, the consumer group concurrency becomes lower than the one I configured.
I use Spring Boot and here is my code sample:
factory.setConcurrency(10);
When I use the following Kafka command after starting the consumers, it shows exactly 10 distinct consumer clients:
bin/kafka-consumer-groups.sh --describe --group samplaConsumer --bootstrap-server localhost:9092
After a period of time, when I run the above command, the number of consumer clients has dropped; for example, 6 distinct clients manage those 10 partitions.
How can I fix this so that, after rebalancing or whatever else happens, the number of clients remains constant?
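For reference, roughly the setup the code sample above implies (a sketch; the bootstrap servers, group id and deserializers are assumptions):

import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;

@Configuration
public class KafkaConsumerConfig {

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory() {
        Map<String, Object> props = new HashMap<>();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumption
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "samplaConsumer");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

        DefaultKafkaConsumerFactory<String, String> consumerFactory =
                new DefaultKafkaConsumerFactory<>(props);

        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        factory.setConcurrency(10);   // one consumer client per partition (the topic has 10 partitions)
        return factory;
    }
}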
I found out that if a consumer client takes more time than max.poll.interval.ms to process the polled data, the consumer is considered failed and the group will rebalance.
max.poll.interval.ms The maximum delay between invocations of poll() when using consumer group management. This places an upper bound on the amount of time that the consumer can be idle before fetching more records. If poll() is not called before the expiration of this timeout, then the consumer is considered failed and the group will rebalance in order to reassign the partitions to another member.
And I found out that if this happens a lot, that consumer client is considered dead and is not rebalanced back in, so the number of concurrent consumer clients decreases.
One solution I came up with is to decrease max.poll.records so that processing the polled records takes less time than max.poll.interval.ms.
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 50); // default is 500
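Alternatively (continuing the same properties map; a sketch rather than a recommendation), the poll interval itself can be raised so that processing one batch fits inside it:

props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 600000); // illustrative: 10 minutes instead of the 5-minute default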

Kafka Producer 0.9.0 performance, large number of waiting-threads

We are writing messages at a rate of about 9000 records/sec into our Kafka cluster. At times we see that producer performance degrades considerably and then never recovers. When this happens we see the following error: "unable to allocate buffer within timeout". Below are the JMX producer metrics taken when the process is running well and when it reaches the bad state; the "waiting-threads" metric is very high when the process degrades. Any inputs would be appreciated.
The producer parameters are
batch.size=1000000
linger.ms=30000
acks=-1
metadata.fetch.timeout.ms=1000
compression.type=none
max.request.size=10000000
Although the buffer is fully available, the errors are "org.apache.kafka.common.errors.TimeoutException: Failed to allocate memory within the configured max blocking time".
At one point you are starting to send batches of 1,000,000 (your batch.size); I think that is why your performance degrades. Try lowering that number or setting linger.ms lower.
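A sketch of that suggestion as producer code (the values, serializers and bootstrap servers are illustrative assumptions, not tuned recommendations):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;

public class TunedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumption
        props.put(ProducerConfig.ACKS_CONFIG, "-1");
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 65536);    // well below the original 1000000
        props.put(ProducerConfig.LINGER_MS_CONFIG, 100);       // well below the original 30000
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        // buffer.memory (default 32 MB) and the producer's max blocking time govern how long
        // send() waits for buffer space before the TimeoutException above is thrown.
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.close();
    }
}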
