How to increase MDB consumers in WebLogic? - jms

I have a problem with my MDB where I have observed newer messages getting consumed before older ones, and I suspect the round-robin load-balancing policy is the culprit. The WebLogic console shows 16 consumers, but sometimes when the 1st consumer is occupied, the 17th message remains in the queue while the 18th is picked up for processing. I plan to solve this by increasing the number of consumers, since my maximum volume is 45 messages in a short span and each message may take from 5 to 110 minutes to process.
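For context, WebLogic caps concurrent MDB instances (and hence JMS consumers) via max-beans-in-free-pool in weblogic-ejb-jar.xml; with the default self-tuning thread pool, MDB concurrency is limited to 16, which matches the 16 consumers seen in the console. A minimal sketch, assuming an MDB named MyMdb and a custom work manager named MdbWorkManager (both names are placeholders):

    <weblogic-ejb-jar xmlns="http://xmlns.oracle.com/weblogic/weblogic-ejb-jar">
      <weblogic-enterprise-bean>
        <ejb-name>MyMdb</ejb-name>
        <message-driven-descriptor>
          <pool>
            <!-- upper bound on concurrent MDB instances (consumers) -->
            <max-beans-in-free-pool>48</max-beans-in-free-pool>
          </pool>
        </message-driven-descriptor>
        <!-- hypothetical work manager with a matching max-threads-constraint;
             without one, the default pool still caps MDB concurrency at 16 -->
        <dispatch-policy>MdbWorkManager</dispatch-policy>
      </weblogic-enterprise-bean>
    </weblogic-ejb-jar>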

Related

How to configure Spring SimpleMessageListenerContainer receiveTimeout in order to scale up to a reasonable number of consumers

Use case
A backend consuming messages at varying rates and inserting them into a DB.
Today in production my SimpleMessageListenerContainer scales to maxConcurrentConsumers even when that is not necessary to handle the traffic rate.
Problem
I am trying to find the proper configuration of Spring's SimpleMessageListenerContainer so that Spring scales the number of consumers up/down to the number adequate to handle the incoming traffic.
With a fixed injection rate on a single-node RabbitMQ, I have noticed that the scaling process stabilizes at
numberOfConsumers = (injectionRate * receiveTimeoutInMilliseconds) / 1000
For example:
injection rate: 100 msg/s
container.setReceiveTimeout(100L); // 100 ms
--> consumers: 11, consumer capacity: 100%
injection rate: 100 msg/s
container.setReceiveTimeout(1000L); // 1 s (the default)
--> consumers: 101, consumer capacity: 100%
Knowing that more consumers mean more threads and more AMQP channels, I am wondering why the scaling algorithm is not linked to the consumerCapacity metric, and why the default receive timeout is set to 1 second.
See the documentation https://docs.spring.io/spring-amqp/docs/current/reference/html/#listener-concurrency
In addition, a new property called maxConcurrentConsumers has been added and the container dynamically adjusts the concurrency based on workload. This works in conjunction with four additional properties: consecutiveActiveTrigger, startConsumerMinInterval, consecutiveIdleTrigger, and stopConsumerMinInterval. With the default settings, the algorithm to increase consumers works as follows:
If the maxConcurrentConsumers has not been reached and an existing consumer is active for ten consecutive cycles AND at least 10 seconds has elapsed since the last consumer was started, a new consumer is started. A consumer is considered active if it received at least one message in batchSize * receiveTimeout milliseconds.
With the default settings, the algorithm to decrease consumers works as follows:
If there are more than concurrentConsumers running and a consumer detects ten consecutive timeouts (idle) AND the last consumer was stopped at least 60 seconds ago, a consumer is stopped. The timeout depends on the receiveTimeout and the batchSize properties. A consumer is considered idle if it receives no messages in batchSize * receiveTimeout milliseconds. So, with the default timeout (one second) and a batchSize of four, stopping a consumer is considered after 40 seconds of idle time (four timeouts correspond to one idle detection).
Practically, consumers can be stopped only if the whole container is idle for some time. This is because the broker shares its work across all the active consumers.
So, when you reduce the receiveTimeout you would need a corresponding increase in the idle/active triggers.
The default is 1 second to provide a reasonable compromise between spinning an idle consumer and retaining responsive behavior to a container stop() operation (idle consumers are blocked for the duration of the timeout). Increasing it makes the container less responsive to stop().
It is generally unnecessary to set it lower than 1 second.
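As a sketch of how those knobs fit together (queue name, connection details, and the chosen values are illustrative assumptions, not a recommendation):

    import org.springframework.amqp.rabbit.connection.CachingConnectionFactory;
    import org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer;

    // Shorter receive timeout compensated by proportionally higher
    // active/idle triggers, per the answer above.
    CachingConnectionFactory connectionFactory = new CachingConnectionFactory("localhost");
    SimpleMessageListenerContainer container = new SimpleMessageListenerContainer(connectionFactory);
    container.setQueueNames("backend.queue");    // hypothetical queue
    container.setConcurrentConsumers(2);         // lower bound
    container.setMaxConcurrentConsumers(20);     // upper bound
    container.setReceiveTimeout(100L);           // 100 ms instead of the 1 s default
    container.setConsecutiveActiveTrigger(100);  // default is 10; cycles are now 10x shorter
    container.setConsecutiveIdleTrigger(100);    // default is 10
    container.setMessageListener(message -> {
        // insert the message into the DB here
    });
    container.start();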

Long delays between processing of two consecutive Kafka batches (using a ruby/karafka consumer)

I am using Karafka to read from a topic and call an external service. Each call to the external service takes roughly 300 ms, so with 3 consumers (3 pods in the k8s cluster) running in the consumer group, I expect to achieve about 10 events per second. I see these log lines, which also confirm the 300 ms expectation for processing each individual event.
However, the overall throughput doesn't add up. Each Karafka process seems stuck for a long time between processing two batches of events.
Instrumentation around the consume method implies that the consumer code itself is not taking the time.
https://github.com/karafka/karafka/blob/master/lib/karafka/backends/inline.rb#L12
INFO Inline processing of topic production.events with 8 messages took 2571 ms
INFO 8 messages on production.events topic delegated to xyz
However, I notice two things:
When I tail logs on the 3 pods, only one of the 3 pods seems to emit logs at a time. This does not make sense to me, as all partitions have enough events and each consumer should be able to consume in parallel.
Though the above message shows roughly 321 ms (2571/8) per event, in reality I see the logs stall for a long duration between the processing of two batches. I am curious: where is that time going?
======
Edit:
There is some skew in the distribution of data across brokers, as we recently expanded our brokers from 3 to a total of 6. However, none of the brokers is under CPU or disk pressure. This is a new cluster, and hardly 4-5% CPU is used at peak times.
Our data is evenly distributed across 3 partitions; I say this because the last offset is roughly the same across each partition.
Partition  FirstOffset  LastOffset  Size     LeaderNode  ReplicaNodes  In-syncReplicaNodes  OfflineReplicaNodes  PreferredLeader  Under-replicated
0          2174152      3567554     1393402  5           5,4,3         3,4,5                -                    Yes              No
1          2172222      3566886     1394664  4           4,5,6         4,5,6                -                    Yes              No
2          2172110      3564992     1392882  1           1,6,4         1,4,6                -                    Yes              No
However, I do see that one consumer perpetually lags behind the other two.
The following table shows the lag for my consumers. There is one consumer process for each partition:
Partition  First Offset  Last Offset  Consumer Offset  Lag
0          2174152       3566320      2676120          890200
1          2172222       3565605      3124649          440956
2          2172110       3563762      3185587          378175

Combined lag: 1709331
Here is a screenshot of the logs from all 3 consumers. You can notice the big difference between the time spent in each invocation of the consume function and the interval between two adjacent invocations. Basically, I want to explain and/or reduce that waiting time. There are 100k+ events in this topic, and my dummy Karafka applications are able to retrieve them quickly, so the Kafka brokers are not the issue.
Update after setting max_wait_time to 1 second (previously 5 seconds)
It seems that the issue is resolved after reducing that wait config. Now the difference between two consecutive log lines is roughly equal to the time spent in consume:
2021-06-24 13:43:23.425 Inline processing of topic x with 7 messages took 2047 ms
2021-06-24 13:43:27.787 Inline processing of topic x with 11 messages took 3347 ms
2021-06-24 13:43:31.144 Inline processing of topic x with 11 messages took 3344 ms
2021-06-24 13:43:34.207 Inline processing of topic x with 10 messages took 3049 ms
2021-06-24 13:43:37.606 Inline processing of topic x with 11 messages took 3388 ms
There are a couple of problems you may be facing. It is a bit of guesswork on my side without more details, but let's give it a shot.
From the Kafka perspective
Are you sure you're evenly distributing data across partitions? Maybe it is consuming everything from one partition?
What you wrote here:
INFO Inline processing of topic production.events with 8 messages took 2571 ms
This indicates that there was a batch of 8 processed altogether by a single consumer. This could indicate that the data is not distributed evenly.
From the performance perspective
There are two performance properties that can affect your understanding of how Karafka operates: throughput and latency.
Throughput is the number of messages that can be processed in a given time.
Latency is the time it takes a message from the moment it was produced to the moment it is processed.
As far as I understand, all messages are being produced. You could try playing with the Karafka settings, in particular this one: https://github.com/karafka/karafka/blob/83a9a5ba417317495556c3ebb4b53f1308c80fe0/lib/karafka/setup/config.rb#L114
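As a sketch, that kind of knob is tuned in the Karafka app's setup block; the asker's update above confirms max_wait_time was the relevant one here. The broker address and client id below are placeholders, and exactly where the option lives varies by Karafka version:

    # Karafka 1.x-style setup; max_wait_time bounds how long the client
    # waits for a fuller batch before returning what it already has.
    class KarafkaApp < Karafka::App
      setup do |config|
        config.client_id = 'example_app'                        # placeholder
        config.kafka.seed_brokers = %w[kafka://127.0.0.1:9092]  # placeholder
        config.kafka.max_wait_time = 1                          # seconds; was 5
      end
    end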
From the logger perspective
The logger that is being used flushes data from time to time, so you won't see entries immediately, only after a short delay. You can validate this by looking at the log timestamps.

Why does SwiftMQ show flow-control behaviour even when flow control is disabled?

I'm trying to benchmark the performance of SwiftMQ 5.0.0 with a producer and a consumer application I wrote, so that I can vary the number of producer threads and consumer threads. I have added a delay in the consumer to simulate the time taken to process a message. I have run a test with the producer threads fixed at 2, varying the number of consumer threads from 20 to 92 in steps of 4.
Initially, the producer rate starts high and the consumer rate is low (as expected, due to the added delay and the small number of consumer threads).
As the number of consumer threads increases, the producer rate drops and the consumer rate increases; they become equal at around 48 consumer threads.
After that, as the number of consumer threads increases further, both producer and consumer rates keep increasing linearly. I am wondering what the reason for this behavior is?
See this image for the result graph.
Notes:
I have disabled flow control at the queue level by setting flowcontrol-start-queuesize="-1".
I also have not set a value for inbound-flow-control-enabled in the routing swiftlet (I believe it defaults to false).
Any help on this matter is much appreciated. TIA
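For reference, a minimal sketch of the kind of benchmark consumer described, with a simulated processing delay (the delay value and acknowledge mode are assumptions, not the asker's actual code):

    import javax.jms.JMSException;
    import javax.jms.Message;
    import javax.jms.MessageListener;

    // Benchmark consumer that simulates per-message processing time.
    public class DelayedConsumer implements MessageListener {
        private static final long PROCESSING_DELAY_MS = 100; // illustrative

        @Override
        public void onMessage(Message message) {
            try {
                Thread.sleep(PROCESSING_DELAY_MS); // simulate work
                message.acknowledge();             // assuming CLIENT_ACKNOWLEDGE
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (JMSException e) {
                e.printStackTrace();
            }
        }
    }

Running N such listeners on N sessions is what varies the consumer-thread count in the test.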

Decreased consume rate on RabbitMQ server

We are running a production single-server RabbitMQ (3.7) where around 500 mobile applications are connected as producers (MQTT) and around 10 server applications as consumers. Those 500 publishers push messages basically into one queue, and less often into another one.
Recently we had an issue with spikes of stacked messages in all our queues. The number of stacked messages went from 1 to 1000. This spike was caused by a decrease in the consume rate.
I tried to find out what happened and how to eliminate the spikes; I could limit queue length or eliminate connections, but we can't limit anything, we have to perform better. I took a look at RabbitMQ memory and CPU usage, and the same for the consumers, and everything looks fine: RabbitMQ was running at around 50% of total load, and the same for memory. The consumers don't seem to be the bottleneck either, because the consume rate went even higher after the queue length grew.
I have a couple of questions:
Is RabbitMQ designed for such a large number of consumers?
I read that each queue is single-threaded; is it possible that Rabbit just can't handle 500 producers in one queue, so throughput gets lower?
What else can I use to tackle the cause of the lower consume rate? The number of threads in Rabbit?
What do you reckon I should measure or test to benchmark the performance of the RabbitMQ server?

JMS Priority Messages Causing Starvation of Lower Priority Messages

I have a queue that is loaded with high-priority JMS messages throughout the day; I want to get them out the door quickly. The queue is also loaded periodically with lower-priority messages in large batches. The problem I see on busy days is that there are always enough high-priority messages at the front of the queue that none of the lower-priority messages get selected until that volume drops off. Often they will sit on the queue until the middle of the night. The app is distributed over a number of servers, but the CPUs are not even breathing hard; the JMS seems to be the choke point.
My hunch is to implement some sort of aging algorithm that increases the priority of messages that have been on the queue for a very long time, but of course that is what middleware is supposed to do for me. I can't imagine that the JMS provider (IBM WebSphere MQ) or the application server (TIBCO BusinessWorks) doesn't have some sort of facility to cope with this. So before I go write some code, I thought I would ask: is there any way to get either of these technologies to help me out with this problem?
The BusinessWorks activity that is reading the queue is a JMS SOAP Event Source, but I could turn it into a JMS Queue Receiver activity or whatever.
All thoughts on how to solve this are welcome :-) TIA
That's like tying one hand behind your back and then complaining that you cannot swim properly. D'oh! First off, whose bright idea was it to mix messages? Just because you can do something does not mean you should.
The app is distributed over a number of servers, but the CPUs are not
even breathing hard; the JMS seems to be the choke point.
Well then, the solution is easy. Put high-priority messages into queue "A" (the existing queue) and low-priority messages into a new queue "B". Next, start up another instance of your JMS application to read the messages off queue "B".
Also, JMS is probably not the choke point. It is what the application does with the message data after the JMS layer picks up the message that takes a long time (i.e. backend work).
Finally, how many instances of your JMS application are running against the existing queue? If you are only running 1 instance, why? If you have lots of CPU capacity, why not run 10 instances of your JMS application and do some true parallel processing of messages?
If you really want to keep your messages mixed on the same queue, have the high-priority messages processed first, and yet your message volume is such that you sometimes cannot work through it all until the middle of the night, then you quite simply do not have enough processing applications. MQ is a parallel processing system; it is designed to allow many applications to put to or get from a queue at once. Make use of this by running more of your getting applications at the same time. They will work through your high-priority messages quicker and then get back to processing the lower-priority ones.
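As a sketch of the two-queue approach suggested above (JNDI names and the routing condition are placeholders):

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import javax.naming.InitialContext;

    // Route by business priority at send time rather than relying on
    // JMS priority within a single queue. All names are illustrative.
    public class PriorityRouter {
        public static void main(String[] args) throws Exception {
            InitialContext ctx = new InitialContext();
            ConnectionFactory cf = (ConnectionFactory) ctx.lookup("jms/cf");
            Queue highQueue = (Queue) ctx.lookup("jms/QUEUE.A"); // existing queue
            Queue lowQueue  = (Queue) ctx.lookup("jms/QUEUE.B"); // new queue

            Connection conn = cf.createConnection();
            try {
                Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageProducer highProducer = session.createProducer(highQueue);
                MessageProducer lowProducer  = session.createProducer(lowQueue);

                TextMessage msg = session.createTextMessage("payload");
                boolean highPriority = true; // decided by your business logic
                (highPriority ? highProducer : lowProducer).send(msg);
            } finally {
                conn.close();
            }
        }
    }

A separate instance of the existing consumer application then reads queue "B", so low-priority work proceeds independently of the high-priority stream.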
From your description it's clear that you want the high-priority messages to be processed first. In such a case the lower-priority messages will have to wait.
MQ will not increase the priority of messages that have been sitting in a queue for a long time. How would it know that it has to change a property of a message? :) You would need to develop an application to do that.
Segregating messages based on priority, for example putting high-priority messages on one queue and lower-priority messages on another, could be one option to look at.
A second option would be to change the delivery sequence (MSGDLVSQ) to FIFO. This makes messages get delivered to consumers in the order they arrived in the queue. But note that this ignores message priority: if a lower-priority message is followed by a higher-priority message, the higher-priority message will wait until the lower-priority message is delivered.
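For reference, switching a local queue to FIFO delivery is a one-line MQSC change (the queue name is hypothetical):

    ALTER QLOCAL('APP.WORK.QUEUE') MSGDLVSQ(FIFO)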
