MQ selective dequeue speed is sometimes woeful - ibm-mq

I have a process that uses JMSTemplate to selectively dequeue from an MQ queue based on JMS header values.
When the dequeue query matches messages at the front of the queue, the dequeue rate is approximately 60-70 msg/second. However, when the query matches messages only 50, 100 or 200 messages deep the dequeue rate drops to 1 msg / 3-4 seconds.
The fast dequeue selector is ThreadId='24' OR ThreadId='PRIMARY'; the slow dequeue selector is ThreadId='24' alone.
The real reason for the slow processing might be something else, but I observe the change in processing time with nothing more than a change in the selector.
I suspect this processing speed is not normal. What could be going wrong?

Querying deep queues by header values is not really recommended, as the headers are not indexed; this might be the issue. Queries on CorrelationId and MessageId (if they are of the form 'ID:48-hex-digits') use the index and are very quick (~1 ms per query on very deep queues, depending on setup).
We faced this issue as well and chose to encode a correlation identifier in the correlation id header instead of in JMS string properties (the MQRFH2/usr folder).
This was on MQ 7.0.
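
For illustration, a minimal JMS 2.x sketch of that workaround (note JMS 2.x is newer than the MQ 7.0 setup mentioned above); the queue name and the 48-hex-digit id are placeholders, and the connection factory is assumed to come from your existing MQ setup:

import javax.jms.*;

public class CorrelationIdDequeue {
    // Receive one message by the indexed correlation id rather than by a
    // custom string property (which lands in MQRFH2/usr and is scanned linearly).
    static Message receiveByCorrelationId(ConnectionFactory cf, String hex48) {
        try (JMSContext ctx = cf.createContext()) {
            Queue queue = ctx.createQueue("DEV.QUEUE.1"); // illustrative queue name
            // 'ID:' + 48 hex digits lets MQ use its correlation-id index;
            // a selector such as ThreadId='24' cannot use an index.
            String selector = "JMSCorrelationID = 'ID:" + hex48 + "'";
            try (JMSConsumer consumer = ctx.createConsumer(queue, selector)) {
                return consumer.receive(1000); // null if no match within 1s
            }
        }
    }
}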

Related

ActiveMQ - Competing Consumers with Selector - messages starve in the queue

ActiveMQ 5.15.13
Context: I have a single queue with multiple consumers. I want to stop some consumers from processing certain messages; this has to be dynamic, and I don't want to create separate queues for it. Selectors handle this without any problems, e.g. Consumer1 ignores stocks, so Consumer1 can process all invoices and Consumer2 can process all stocks.
But if there is a large number of messages already in the Queue (of one type, e.g. stocks) and I send a message of another type (e.g. invoices), Consumer1 won't process the message of type invoices. It will instead be idle until Consumer2 has processed all Stocks messages. It does not happen every time, but quite often.
Is there any option to change the order of the new messages coming into the queue, such that an idle consumer with matching selector picks up the new message?
Things I've already tried:
using a PendingMessageLimitStrategy -> it seems like it does not work for queues
increasing maxPageSize and maxBrowsePageSize in the hope that once all messages are in RAM, the consumers will find their matching messages
Exclusive Consumers aren't an option, since I want to be able to use more than one consumer per message type.
I'm pretty sure there is some configuration which allows this type of usage. I'm aware that there are better solutions for this issue, but sadly I can't use them easily due to other constraints.
Thanks a lot in advance!
EDIT: I noticed that when I refresh the localhost queue browser, the stuck messages get consumed immediately. It seems this action performs some sort of queue refresh in which the messages are filtered against their selectors again. So I just need that action to happen whenever a new message enters the queue...
This is a 'window' problem where the next set of 'stocks' data needs to be processed before the 'invoicing' data can be processed.
The gotcha with window problems like this is that you need to account for the fact that some messages may never come through, or a consumer may never come back online either. Also, eventually you will be asked 'how many invoices or stocks are left to be processed'-- aka observability.
ActiveMQ has you covered: check out wildcard destinations and consumers.
Produce 'stocks' to:
queue://data.stocks.input
Produce 'invoices' to:
queue://data.invoices.input
You then set up consumers to connect to:
queue://data.*.input
Note the wildcard '*'.
ActiveMQ will match queues based on the wildcard pattern, and then process data accordingly. As a bonus, you can still use a selector.
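A minimal sketch of the consumer side, assuming a local broker and a 'type' property set by the producer (both illustrative):

import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;
import org.apache.activemq.command.ActiveMQQueue;

public class WildcardConsumer {
    public static void main(String[] args) throws JMSException {
        ConnectionFactory cf = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection conn = cf.createConnection();
        conn.start();
        Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
        // data.*.input matches data.stocks.input, data.invoices.input, ...
        ActiveMQQueue pattern = new ActiveMQQueue("data.*.input");
        // A selector still works on top of the wildcard destination.
        MessageConsumer consumer = session.createConsumer(pattern, "type = 'invoices'");
        Message msg = consumer.receive(); // blocks until a matching message arrives
        conn.close();
    }
}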

Retry after delay on back pressure with Spring Project Reactor?

Background
I'm trying to implement something similar to a simple non-blocking rate limiter with Spring Project Reactor version 3.3.0. For example, to limit the rate to 100 requests per second I use this implementation:
myFlux
.bufferTimeout(100, Duration.ofSeconds(1))
.delayElements(Duration.ofSeconds(1))
..
This works fine for my use case but if the subscriber doesn't keep up with the rate of the myFlux publisher it'll (rightly) throw an OverflowException:
reactor.core.Exceptions$OverflowException: Could not emit buffer due to lack of requests
at reactor.core.Exceptions.failWithOverflow(Exceptions.java:215)
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Assembly trace from producer [reactor.core.publisher.FluxLift] :
reactor.core.publisher.Flux.bufferTimeout(Flux.java:2780)
In my case it's important that all elements are consumed by the subscriber so e.g. dropping on back pressure (onBackpressureDrop()) is not acceptable.
Question
Is there a way to, instead of dropping elements on back pressure, just pause the publishing of messages until the subscriber has caught up? In my case myFlux is publishing a finite, but large, set of elements persisted in a durable database, so dropping elements should not be required imho.
bufferTimeout(int maxSize, Duration maxTime) requests an unbounded amount of messages, thus being insensitive to backpressure. That makes it unsuitable for your case.
On a conceptual level, bufferTimeout cannot be backpressure sensitive, because you clearly instruct the publisher to emit one batch (even if it is empty) for every elapsed duration. If the subscriber is too slow, this will - rightfully - cause an overflow.
Instead, try:
myFlux
.delayElements(Duration.ofMillis(10))
.buffer(100)
or
myFlux
.buffer(100)
.delayElements(Duration.ofSeconds(1))
buffer(int maxSize) requests the correct amount upstream (request * maxSize), and so is sensitive to backpressure from the subscribers.
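A self-contained sketch of the second variant, with Flux.range standing in for myFlux:

import java.time.Duration;
import reactor.core.publisher.Flux;

public class RateLimitDemo {
    public static void main(String[] args) {
        Flux.range(1, 1_000)                      // stand-in for myFlux
            .buffer(100)                          // requests 100 elements upstream per batch
            .delayElements(Duration.ofSeconds(1)) // one batch per second => ~100 elements/s
            .doOnNext(batch -> System.out.println("batch of " + batch.size()))
            .blockLast();                         // keep the demo alive until the finite source completes
    }
}

Because buffer(100) only requests from upstream what its own subscriber asked for (times the buffer size), a slow subscriber simply slows the whole chain down instead of triggering an overflow.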

How to slow down or set given speed on the Kafka stream consumer?

I am trying to control the number of messages consumed by the KStream, and I am not having much success.
I am using:
max.poll.interval.ms=100
and
max.poll.records=20
to get like 200 messages per second.
But this does not seem to work well: my statistics show around 500 messages per second.
What else shall I set on the side of the stream consumer?
I am using: max.poll.interval.ms=100 and max.poll.records=20 to get like 200 messages per second.
The max.poll.interval.ms and max.poll.records properties do not work this way.
max.poll.interval.ms is the maximum delay allowed between two consecutive calls to poll(); it is an upper bound used for failure detection, not a pacing interval.
max.poll.records is the maximum number of records the consumer can fetch in a single poll of the topic.
The interval between polls is not controlled by these two properties but by the time your consumer takes to process and acknowledge the fetched records.
For example, say a topic X exists with 1000 records in it, and the consumer takes 20 ms to process and acknowledge the fetched records. With max.poll.interval.ms = 100 and max.poll.records = 20, the consumer will poll the topic roughly every 20 ms, fetching at most 20 records per poll, i.e. up to ~1000 records per second rather than 200. If the processing time exceeds max.poll.interval.ms, the consumer is considered failed and that particular batch will be re-polled from the topic.
A KafkaConsumer (including the one used internally by Kafka Streams) reads records as fast as possible.
The parameters you mention can have an impact on performance, but you cannot control the actual data rate with them. Also note that max.poll.records only configures how many records poll() returns, but it has no impact on client-broker communication. A KafkaConsumer can fetch more records when talking to the broker and then serve buffered messages from poll() as long as records remain in the buffer (i.e., in this case poll() is a client-side operation that mainly ensures you don't time out via max.poll.interval.ms). Thus, you might be more interested in fetch.max.bytes, which determines the number of bytes fetched from the broker per request. If you reduce this parameter, the consumer is less efficient and thus throughput should decrease (though it's not recommended).
Another way to configure throughput are quotas (https://kafka.apache.org/documentation/#design_quotas) It's a broker side configuration that allows you limit the amount of data a client can read and/or write.
The best thing to do in Kafka Streams (and also when using a plain KafkaConsumer) is to throttle calls to poll() manually. For Kafka Streams, you can add a Thread.sleep() into any UDF. If you don't want to piggyback this onto an existing operator, you can add a foreach() with ephemeral state (i.e., a class member variable) to track the throughput and compute how much you need to sleep to throttle the throughput accordingly, as in the sketch below.
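A rough Kafka Streams sketch of that idea, simplified to a fixed sleep per record in peek() rather than measured-throughput tracking; topic names and target rate are illustrative:

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

public class ThrottledTopology {
    static final long RECORDS_PER_SECOND = 200; // illustrative target rate

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> stream = builder.stream("input-topic");
        stream.peek((key, value) -> {
            try {
                // ~5 ms per record caps each stream thread at ~200 records/s
                Thread.sleep(1000 / RECORDS_PER_SECOND);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).to("output-topic");
        // build the topology and start KafkaStreams with your usual config
    }
}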
You can use something like akka-stream-kafka (aka reactive-kafka) on the consumer side. akka-streams has nice throttling capabilities which will come in handy here:
http://doc.akka.io/docs/akka/snapshot/java/stream/stream-quickstart.html#time-based-processing
Kafka also has the newer concept of quotas.
All the details are in the Kafka documentation, section 4.9 Quotas.
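For completeness, quotas can also be set programmatically; a hedged sketch using the AdminClient quota API (available in newer Kafka versions; the broker address, client id and byte rate are illustrative):

import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.quota.ClientQuotaAlteration;
import org.apache.kafka.common.quota.ClientQuotaEntity;

public class ConsumerQuota {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            // Cap the consumer whose client.id is "slow-client" at ~1 MB/s per broker.
            // Note that quotas limit bytes per second, not messages per second.
            ClientQuotaEntity entity = new ClientQuotaEntity(
                Map.of(ClientQuotaEntity.CLIENT_ID, "slow-client"));
            ClientQuotaAlteration alteration = new ClientQuotaAlteration(
                entity,
                List.of(new ClientQuotaAlteration.Op("consumer_byte_rate", 1_048_576.0)));
            admin.alterClientQuotas(List.of(alteration)).all().get();
        }
    }
}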

One slow ActiveMQ consumer causing other consumers to be slow

I'm looking for help regarding a strange issue where a slow consumer on a queue causes all the other consumers on the same queue to start consuming messages at 30 second intervals. That is, all consumers except the slow one stop consuming messages as fast as they can and instead wait for some magical 30 second barrier before consuming.
The basic flow of my application goes like this:
a number of producers place messages onto a single queue. Messages can have different JMSXGroupIDs
a number of consumers listen to messages on that single queue
as standard practice the JMSXGroupIDs get distributed across the consumers
at some point one of the consumers becomes slow and can't process messages very quickly
the slow consumer ends up filling its prefetch buffer on the broker and AMQ recognises that it is slow (default behaviour)
at that point - or some 'random' but close time later - all consumers except the slow one start to only consume messages at the same 30s intervals
if the slow consumer becomes fast again then things very quickly return to normal operation and the 30s barrier goes away
I'm at a loss for what could be causing this issue, or how to fix it, please help.
More background and findings
I've managed to reliably reproduce this issue on AMQ 5.8.0, 5.9.0 (where the issue was originally noticed) and 5.9.1, on fresh installs and existing ops-managed installs, and on different machines, some virtual and some not. All Linux installs, with different distributions and Java versions.
It doesn't appear to be affected by anything prefetch related, that is: changing the prefetch value from 1 to 10 to 1000 didn't stop the issue from happening
[red herring?] Enabling debug logs on the amq instance shows logs relating to the periodic check for messages that can be expired. The queue doesn't have an expiry policy so I can only think that the scheduled expireMessagesPeriod time is just waking amq up in such a way that it then sends messages to the non-slow consumers.
If the 30s mode is entered then left then entered again the seconds-past-the-minute time is always the same, for example 14s and 44s past the minute. This is true across all consumers and all machines hosting those consumers. Those barrier points do change after restarts of amq.
While not strictly a solution to the problem, further investigation has uncovered the root cause of this issue.
TL;DR - It's known behaviour and won't be fixed before Apollo
More Details
Ultimately this is caused by the maxPageSize property and the fact that AMQ will only apply selection criteria to messages in memory. Generally these are message selectors (property = value), but in my case they are JMSXGroupID=>Consumer assignments.
As messages are received by the queue they get paged into memory and placed into a collection (named pagedInPendingDispatch in the source). To dispatch messages AMQ will scan through this list of messages and try to find a consumer that will accept it. That includes checking the group id, message selector and prefetch buffer space. For our use case we aren't using message selectors but we are using groups. If no consumer can take the message then it is left in the collection and will be checked again at the next tick.
In order to stop the pagedInPendingDispatch collection from eating up all the available resources, there is a suggested limit to its size, configured via the maxPageSize property. This property isn't actually a maximum; it's more a hint as to whether, under normal conditions, newly arriving messages should be paged into memory or paged to disk.
With these two pieces of information and a slow consumer it turns out that eventually all the messages in the pagedInPendingDispatch collection end up only being consumable by the slow consumer, and hence the collection effectively gets blocked and no other messages get dispatched. This explains why the slow consumer wasn't affected by the 30s interval, it had maxPageSize messages waiting delivery already.
This doesn't explain why I was seeing the non-slow consumers receive messages every 30s though. As it turns out, paging messages into memory has two modes, normal and forced. Normal follows the process outlined above where the size of the collection is compared to the maxPageSize property, when forced, however, messages are always paged into memory. This mode exists to allow you to browse through messages that aren't in memory. As it happens this forced mode is also used by the expiry mechanism to allow AMQ to expire messages that aren't in memory.
So what we have now is a collection of messages in memory that are all targeted for dispatch to the same consumer, a consumer that won't accept them because it is slow or blocked. We also have a backlog of messages awaiting delivery to all consumers. Every expireMessagesPeriod milliseconds, a task runs that force-pages messages into memory to check whether they should be expired. This adds those messages to the paged-in collection, which now contains maxPageSize messages for the slow consumer plus N more messages destined for any consumer. Those N messages get delivered.
QED.
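For reference, both knobs live in the destination policy in activemq.xml; a minimal sketch, with values matching the defaults as far as I know (note expireMessagesPeriod's default of 30000 ms, the 30s tick observed above):

<destinationPolicy>
  <policyMap>
    <policyEntries>
      <policyEntry queue=">"
                   maxPageSize="200"
                   expireMessagesPeriod="30000"/>
    </policyEntries>
  </policyMap>
</destinationPolicy>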
References
Ticket referring to this issue but for message selectors instead
Docs relating to the configuration properties
Somebody else with this issue but for selectors

ActiveMQ: Slow processing consumers

Concerning ActiveMQ: I have a scenario where I have one producer which sends small (around 10KB) files to the consumers. Although the files are small, the consumers need around 10 seconds to analyze them and return the result to the producer. I've researched a lot, but I still cannot find answers to the following questions:
How do I make the broker store the files (completely) in a queue?
Should I use ObjectMessage (because the files are small) or blob messages?
Because the consumers are slow processing, should I lower their prefetchLimit or use a round-robin dispatch policy? Which one is better?
And finally, in the ActiveMQ FAQ, I read this - "If a consumer receives a message and does not acknowledge it before closing then the message will be redelivered to another consumer.". So my question here is, does ActiveMQ guarantee that only 1 consumer will process the message (and therefore there will be only 1 answer to the producer), or not? When does the consumer acknowledge a message (in the default, automatic acknowledge settings) - when receiving the message and storing it in a session, or when the onMessage handler finishes? And also, because the consumers are so slow in processing, should I change some "timeout limit" so the broker knows how much to wait before giving the work to another consumer (this is kind of related to my previous questions)?
Not sure about others, but here are some thoughts.
First: I am not sure what your exact concern is. ActiveMQ does store messages in a data store; all data need NOT reside in memory in any single place (either broker or client). So you should actually be good in that regard; earlier versions did require that all message ids fit in memory (not sure if that has been resolved), but even that memory usage would be low enough unless you had tens of millions of in-queue messages.
As to ObjectMessage vs. blob: a raw byte array (blob) should be the most compact representation, but since all of these get serialized for storage, it only affects memory usage on the client. Prefetch mostly helps with access latency, but given that your consumers are slow to process, you probably don't need any prefetching; so yes, either set it to 1 or 2, or disable it altogether, as sketched below.
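A sketch of both ways to lower the prefetch in ActiveMQ, per connection URI or per destination (broker address and queue name are placeholders; 0 disables prefetching entirely):

import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;

public class LowPrefetchConsumer {
    public static void main(String[] args) throws JMSException {
        // Option 1: set the default queue prefetch for the whole connection.
        ConnectionFactory cf = new ActiveMQConnectionFactory(
            "tcp://localhost:61616?jms.prefetchPolicy.queuePrefetch=1");
        Connection conn = cf.createConnection();
        conn.start();
        Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
        // Option 2: override per destination via a destination option.
        Destination queue = session.createQueue("FILE.QUEUE?consumer.prefetchSize=1");
        MessageConsumer consumer = session.createConsumer(queue);
    }
}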
As to guarantees: the best that distributed message queues can guarantee is either at-least-once delivery (with possible duplicates) or at-most-once (no duplicates, but messages can be lost). It is usually better to take at-least-once and have clients de-duplicate using client-provided ids. How acknowledgement is sent is defined by the JMS specification, so you can read more about JMS; this is not ActiveMQ specific.
And yes, you should set the timeout high enough that a worker can typically finish its work, including all network latencies. This can slow down re-transmission of dropped messages (if a worker dies), but that is probably not a problem for you.
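
A hedged sketch of acknowledging only after the slow analysis completes, so an unacknowledged message is redelivered if the consumer dies mid-work (queue name and analyze() are placeholders):

import javax.jms.*;

public class AckAfterProcessing {
    static void consume(Connection conn) throws JMSException {
        // CLIENT_ACKNOWLEDGE defers the ack to the application; with
        // AUTO_ACKNOWLEDGE and a listener, the ack happens after onMessage() returns.
        Session session = conn.createSession(false, Session.CLIENT_ACKNOWLEDGE);
        MessageConsumer consumer = session.createConsumer(session.createQueue("FILE.QUEUE"));
        consumer.setMessageListener(msg -> {
            try {
                analyze(msg);      // the slow ~10 second step
                msg.acknowledge(); // ack only after successful processing
            } catch (JMSException e) {
                // no ack: the broker will eventually redeliver the message
            }
        });
    }

    static void analyze(Message msg) { /* placeholder for the real work */ }
}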
