503: Max Client Queue and Topic Endpoint Flow Exceeded - jms

Sometimes I am getting the following error:
503: Max Client Queue and Topic Endpoint Flow Exceeded
What do I need to configure to prevent this issue?

The number of "flows" is, roughly speaking, the number of endpoints to which you are subscribed. There are two types: ingress (for messages from your application into Solace) and egress (for messages from Solace into your application). You violated one of those limits. You can tell which by looking at the stack trace.
By default the limit on flows is 100. Before you increase this limit, ask yourself: are you really supposed to be subscribed to more than 100 queues/topics? If not, you may have a leak. Just as you wouldn't fix a memory leak by increasing memory, you shouldn't fix this leak by increasing the max flow. Are you forgetting to close your subscriptions? Are you using temporary queues? Despite their name, temporary queues last for the life of the client session unless you close them.
But if you really are supposed to be subscribed to that many endpoints, you can increase the max ingress and/or max egress. This can be done in SolAdmin by editing the Client Profile and selecting the Advanced Properties tab, or in solacectl by setting max-ingress or max-egress under configure/client-profile/message-spool (as explained here). (There is also a per-message-spool maximum, but you are unlikely to have hit that.)
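If the cause is a leak rather than a genuinely large number of subscriptions, the fix belongs on the application side. A minimal JMS sketch of releasing flows explicitly (generic JMS API; the structure and timeout are illustrative):

    import javax.jms.Connection;
    import javax.jms.JMSException;
    import javax.jms.MessageConsumer;
    import javax.jms.Session;
    import javax.jms.TemporaryQueue;

    public class ReplyConsumer {
        // Each consumer bound to an endpoint holds an egress flow until it is closed.
        public static void consumeOnce(Connection connection) throws JMSException {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            TemporaryQueue replyQueue = session.createTemporaryQueue();
            MessageConsumer consumer = session.createConsumer(replyQueue);
            try {
                consumer.receive(5000); // wait up to 5 seconds for a reply
            } finally {
                consumer.close();    // releases the flow
                replyQueue.delete(); // temporary queues otherwise live for the whole client session
                session.close();
            }
        }
    }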

It looks like the "Max Egress Flows" setting in your client-profile has been exceeded. One egress flow is used for each endpoint your application binds to.
The "Max Egress Flows" setting is located under the "Advanced Properties" tab when you edit the client-profile.

We hit the same issue during our load test. With a mere few hundred messages we started getting the 503 error. We identified that the issue was in our producer's topic creation: once we added caching for the topic destination object, the issue was resolved.
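A sketch of what that caching can look like (generic JMS; the class and field names are made up for illustration), creating each topic Destination once and reusing a single anonymous producer instead of recreating them per message:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import javax.jms.Destination;
    import javax.jms.JMSException;
    import javax.jms.Message;
    import javax.jms.MessageProducer;
    import javax.jms.Session;

    public class CachingPublisher {
        private final Session session;
        private final MessageProducer producer; // anonymous producer, reused for every send
        private final Map<String, Destination> topicCache = new ConcurrentHashMap<>();

        public CachingPublisher(Session session) throws JMSException {
            this.session = session;
            this.producer = session.createProducer(null); // null destination: chosen per send
        }

        public void publish(String topicName, Message message) throws JMSException {
            // Cache the Destination instead of calling createTopic() on every send.
            Destination topic = topicCache.computeIfAbsent(topicName, name -> {
                try {
                    return session.createTopic(name);
                } catch (JMSException e) {
                    throw new IllegalStateException(e);
                }
            });
            producer.send(topic, message);
        }
    }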

Related

Apache.NMS.AMQP setting prefetch size

I am using Apache.NMS.AMQP (v1.8.0) to connect to an AWS managed ActiveMQ (v5.15.9) broker, but I am having problems setting the prefetch size for the connection/consumer/destination (I couldn't set a custom value on any of them).
While digging through the source code I found that the default prefetch value (DEFAULT_CREDITS) is set to 200.
To test this behaviour I wrote a test that enqueues 220 messages on a single queue, creates two consumers and then consumes the messages. The result was, as expected, that the first consumer dequeued 200 messages and the second dequeued 20.
After that I looked for a way to set the prefetch size on my consumer, without any success, since the LinkCredit property of the ConsumerInfo class is read-only.
Since my use case requires one prefetch size for the whole connection, that is what I tried next, following this documentation page, but with no success. These are the URLs I tried:
amqps://*my-broker-url*.amazonaws.com:5671?transport.prefetch=50
amqps://*my-broker-url*.amazonaws.com:5671?jms.prefetchPolicy.all=50
amqps://*my-broker-url*.amazonaws.com:5671?jms.prefetchPolicy.queuePrefetch=50
After trying everything stated above, I tried setting the prefetch for my queue destinations by appending ?consumer.prefetchSize=50 to the queue name, resulting in something like this:
queue://TestQueue?consumer.prefetchSize=50
All of the above attempts resulted in an effective prefetch size of 200 (determined with the test described above).
Is there any way to set a custom prefetch size per connection when connecting to the broker using AMQP? Is there any other way to configure the broker than through the query parameters listed on this documentation page?
From a quick read of the code there isn't any means of setting the consumer link credit in the NMS.AMQP client implementation at this time. This seems to be something that would need to be added, as it currently just uses a default value to supply to the AmqpNetLite receiver link for auto refill.
Their issue reporter is here.

Kafka Producer is not retrying after Timeout

Intermittently (once or twice a month) I see the error
org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for cart-topic-0: 5109 ms has passed since batch creation plus linger time
in my logs, and the corresponding message is not sent by the Kafka producer.
All the brokers are up and available, and the load is not high during this period, so I am not sure why this error is observed.
I have set the retries property to 10 in the producer configs, but the message was still not retried. Is there anything else I need to add for the Kafka send method? I have gone through similar issues that were raised, but there is no proper conclusion for this error.
Can someone please help with how to fix this?
From the KIP proposal, which has since been implemented:
We propose adding a new timeout delivery.timeout.ms. The window of enforcement includes batching in the accumulator, retries, and the inflight segments of the batch. With this config, the user has a guaranteed upper bound on when a record will either get sent, fail or expire from the point when send returns. In other words we no longer overload request.timeout.ms to act as a weak proxy for accumulator timeout and instead introduce an explicit timeout that users can rely on without exposing any internals of the producer such as the accumulator.
So basically, with this change you can now configure a delivery timeout (delivery.timeout.ms) in addition to retries for every async send you execute.
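A minimal sketch of producer settings along these lines (broker address, topic and values are illustrative, not recommendations); note that delivery.timeout.ms must be at least linger.ms + request.timeout.ms:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class ProducerTimeoutConfig {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

            // Upper bound on the whole send: batching in the accumulator, retries and in-flight time.
            props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, 120_000);
            // Per-request timeout, no longer overloaded as a proxy for the accumulator timeout.
            props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 30_000);
            props.put(ProducerConfig.RETRIES_CONFIG, 10);
            props.put(ProducerConfig.LINGER_MS_CONFIG, 5);

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("cart-topic", "key", "value"));
            }
        }
    }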
I had an issue where retries were not being obeyed, but in my particular case it was because we were calling the get() method on send for synchronous behaviour. We hadn't realized it would impact retries.
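If the blocking get() pattern is in play, the usual alternative is an asynchronous send with a callback, which lets the producer's internal retry logic run and reports only the final outcome. A sketch (topic name and handling are illustrative):

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class AsyncSender {
        // Fire-and-forget send: the callback fires only once retries / delivery.timeout.ms
        // are exhausted, rather than surfacing the first failed attempt to the caller.
        static void sendAsync(KafkaProducer<String, String> producer, String value) {
            producer.send(new ProducerRecord<>("cart-topic", null, value), (metadata, exception) -> {
                if (exception != null) {
                    System.err.println("Send ultimately failed: " + exception.getMessage());
                } else {
                    System.out.println("Sent to partition " + metadata.partition() + " @ offset " + metadata.offset());
                }
            });
        }
    }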
In investigating the issue through various paths I came across the definition of the sorts of errors that are retriable:
https://kafka.apache.org/11/javadoc/org/apache/kafka/common/errors/RetriableException.html
What had confused me is that the timeout was listed as a retriable one.
I would normally have suggested looking into whether the delivery of your batches was taking too long and messages in your buffer were expiring due to increased volume, but you've mentioned that the volume isn't particularly high.
Did you determine whether increasing request.timeout.ms has an impact on the frequency of occurrence? It might be more a case of treating the symptom than the cause.

Spring JMS Websphere MQ open input count issue

I am using Spring 3.2.8 with JDK 6 and WebSphere MQ 7.5.0.5. In my application I make JMS calls using JmsTemplate via a thread pool. First I faced a condition where the "Current queue depth" count increased as I made JMS calls. I tracked all the objects I create via the thread pool and interrupt or cancel all threads/future objects, so the "Current queue depth" count is now under control.
Now the problem is that the "Open input count" value increases to nearly the number of requests I am sending. When I stop my server this count drops to 0.
In all of this I am able to send requests and get responses up to a count of about 80, with a thread pool size of 30. After the request count reaches roughly 80, I keep getting future-object rejection errors and am no longer able to receive responses; in fact the remaining calls receive null responses.
Please suggest.
I am using a queue in my application and a filter on correlation ID has been applied. Reading more on this, I found that calling jmsTemplate.receiveSelected(queue, filter) has a serious impact on performance. Once I removed this filter, the thread-hang issue was resolved, but filtering is still a requirement for me.
For now I am applying the filter in a different way, with some limitations in the application: instead of receiveSelected I am using jmsTemplate.receive.
Update on 14-Sep
All - I found this to be the solution and would like to post it here.
One of my colleagues helped rectify this issue, which was a great help. What we observed after debugging is that if cacheConsumer is true, then consumers are cached by Spring based on the combination of
queue + message-selector + session
and even calling close() does nothing; it is basically an empty method, which causes the thread to hang/get stuck.
After setting cacheConsumer to false, I reverted my code back to the original, i.e. jmsTemplate.receiveSelected(destination, messageSelector); now when I hit a count of 100 requests, the thread count only increased by 5 to 10 during multiple iterations of the test.
So this property needs to be used carefully.
hope this helps !!
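For reference, a sketch of turning consumer caching off, assuming the cacheConsumer setting above refers to Spring's CachingConnectionFactory (the WebSphere MQ connection factory setup and cache sizes are illustrative):

    import javax.jms.ConnectionFactory;
    import org.springframework.jms.connection.CachingConnectionFactory;
    import org.springframework.jms.core.JmsTemplate;

    public class JmsConfig {
        // targetFactory is the vendor (WebSphere MQ) connection factory, created elsewhere.
        public static JmsTemplate buildTemplate(ConnectionFactory targetFactory) {
            CachingConnectionFactory cachingFactory = new CachingConnectionFactory(targetFactory);
            cachingFactory.setCacheConsumers(false); // consumers no longer cached per queue + selector + session
            cachingFactory.setSessionCacheSize(30);  // illustrative; sized to match the thread pool
            return new JmsTemplate(cachingFactory);
        }
    }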
First I faced condition that "Current queue depth" count increases as I hit jms calls. I tracked all objects I am initiating via ThreadPool and interrupt or cancel all threads/future objects.
I have no idea what you are talking about but you should NOT be using/monitoring the 'current queue depth' value from your application. Bad, bad design. Only MQ monitoring tools should be using it.
Now problem is "Open input count" value increases nearly to the number of requests I am sending. When I stops my server this count becomes 0.
Bad programming. You are 'opening a queue then putting a message' over and over and over again. How about you put some code to CLOSE the queue or better yet, REUSE the open queue!!!!!!!
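The same point as a JMS sketch (names are illustrative): open the queue and create the producer once, then reuse them for every request instead of opening per message.

    import javax.jms.Connection;
    import javax.jms.JMSException;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;

    public class RequestSender implements AutoCloseable {
        private final Session session;
        private final MessageProducer producer; // the queue is opened once and reused

        public RequestSender(Connection connection, String queueName) throws JMSException {
            this.session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue queue = session.createQueue(queueName);
            this.producer = session.createProducer(queue);
        }

        public void send(String body) throws JMSException {
            producer.send(session.createTextMessage(body));
        }

        @Override
        public void close() throws JMSException {
            producer.close(); // closes the underlying queue handle
            session.close();
        }
    }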

One slow ActiveMQ consumer causing other consumers to be slow

I'm looking for help regarding a strange issue where a slow consumer on a queue causes all the other consumers on the same queue to start consuming messages only at 30-second intervals. That is, all consumers except the slow one stop consuming messages as fast as they can and instead wait for some magical 30-second barrier before consuming.
The basic flow of my application goes like this:
a number of producers place messages onto a single queue. Messages can have different JMSXGroupIDs
a number of consumers listen to messages on that single queue
as standard practice the JMSXGroupIDs get distributed across the consumers
at some point one of the consumers becomes slow and can't process messages very quickly
the slow consumer ends up filling its prefetch buffer on the broker and AMQ recognises that it is slow (default behaviour)
at that point - or some 'random' but close time later - all consumers except the slow one start to only consume messages at the same 30s intervals
if the slow consumer becomes fast again then things very quickly return to normal operation and the 30s barrier goes away
I'm at a loss for what could be causing this issue, or how to fix it, please help.
More background and findings
I've managed to reliably reproduce this issue on AMQ 5.8.0, 5.9.0 (where the issue was originally noticed) and 5.9.1, on fresh installs and existing ops-managed installs, and on different machines, some VMs and some not. All are Linux installs, with different OS versions and Java versions.
It doesn't appear to be affected by anything prefetch related; that is, changing the prefetch value from 1 to 10 to 1000 didn't stop the issue from happening.
[red herring?] Enabling debug logs on the amq instance shows logs relating to the periodic check for messages that can be expired. The queue doesn't have an expiry policy so I can only think that the scheduled expireMessagesPeriod time is just waking amq up in such a way that it then sends messages to the non-slow consumers.
If the 30s mode is entered then left then entered again the seconds-past-the-minute time is always the same, for example 14s and 44s past the minute. This is true across all consumers and all machines hosting those consumers. Those barrier points do change after restarts of amq.
While not strictly a solution to the problem, further investigation has uncovered the root cause of this issue.
TL;DR - It's known behaviour and won't be fixed before Apollo
More Details
Ultimately this is caused by the maxPageSize property and the fact that AMQ will only apply selection criteria to messages in memory. Generally these are message selectors (property = value), but in my case they are JMSXGroupID=>Consumer assignments.
As messages are received by the queue they get paged into memory and placed into a collection (named pagedInPendingDispatch in the source). To dispatch messages AMQ will scan through this list of messages and try to find a consumer that will accept it. That includes checking the group id, message selector and prefetch buffer space. For our use case we aren't using message selectors but we are using groups. If no consumer can take the message then it is left in the collection and will be checked again at the next tick.
In order to stop the pagedInPendingDispatch collection from eating up all the resources available there is a suggested limit to the size of this queue configured via the maxPageSize property. This property isn't actually a maximum, it's more a hint as to whether, under normal conditions, new message arrivals should be paged in memory or paged to disk.
With these two pieces of information and a slow consumer, it turns out that eventually all the messages in the pagedInPendingDispatch collection end up only being consumable by the slow consumer, and hence the collection effectively gets blocked and no other messages get dispatched. This explains why the slow consumer wasn't affected by the 30s interval: it already had maxPageSize messages awaiting delivery.
This doesn't explain why I was seeing the non-slow consumers receive messages every 30s though. As it turns out, paging messages into memory has two modes, normal and forced. Normal follows the process outlined above where the size of the collection is compared to the maxPageSize property, when forced, however, messages are always paged into memory. This mode exists to allow you to browse through messages that aren't in memory. As it happens this forced mode is also used by the expiry mechanism to allow AMQ to expire messages that aren't in memory.
So what we have now is a collection of messages in memory that are all targeted for dispatch to the same consumer, a consumer that won't accept them because it is slow or blocked. We also have a backlog of messages awaiting delivery to all consumers. Every expireMessagesPeriod milliseconds a task runs that force-pages messages into memory to check whether they should be expired or not. This adds those messages to the paged-in collection, which now contains maxPageSize messages for the slow consumer and N more messages destined for any consumer. Those messages get delivered.
QED.
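Not a fix for the underlying behaviour, but the two knobs involved can be tuned. A sketch using an embedded broker for illustration (the same attributes exist as policyEntry settings in activemq.xml; the values are examples, not recommendations):

    import org.apache.activemq.broker.BrokerService;
    import org.apache.activemq.broker.region.policy.PolicyEntry;
    import org.apache.activemq.broker.region.policy.PolicyMap;

    public class BrokerConfig {
        public static BrokerService configure() throws Exception {
            BrokerService broker = new BrokerService();

            PolicyEntry policy = new PolicyEntry();
            policy.setQueue(">");                   // apply to all queues
            policy.setMaxPageSize(400);             // how many messages are paged into memory for dispatch
            policy.setExpireMessagesPeriod(30_000); // the periodic task that force-pages messages in

            PolicyMap policyMap = new PolicyMap();
            policyMap.setDefaultEntry(policy);
            broker.setDestinationPolicy(policyMap);
            return broker;
        }
    }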
References
Ticket referring to this issue but for message selectors instead
Docs relating to the configuration properties
Somebody else with this issue but for selectors

ActiveMQ: Is MessageConsumer's selector process on the broker or client side?

Could someone please confirm whether I'm right or wrong about this. It seems to me that the "selector" operation is done within the MessageConsumer implementation (i.e. ALL messages are still dispatched from the message broker to the MessageConsumer, and then the "selector" operation is performed against those messages). The problem occurs when we have a bunch of messages that we are not interested in (i.e. that do not match our selector): those messages will eventually fill up the MessageConsumer's internal queue due to the prefetch or cache limit. As a result, we will not be able to receive any new messages, particularly the ones we are interested in via the selector.
So, is there a way to configure AMQ to perform the selector operation on the message broker side? Should I start looking at "interceptor" and create my own BrokerPlugin? Any advice on how to work around this issue?
I would really appreciate any answer.
Thanks,
Soonthorn A.
Selectors are actually applied at the broker, not on the client side. If your selector is sparse and the destination sees a lot of traffic, it's likely that the broker has not paged in messages that match the selector, and your consumer won't see any matches until more messages are consumed from the destination.
The issue lies in the DestinationPolicy in play for your broker. By default the broker will only page in 200 messages for a browser, to avoid using up all available memory and impacting overall performance. You can increase this number via your own DestinationPolicy in activemq.xml; see the documentation page here.
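To illustrate the first point, the selector string is handed to the broker when the consumer is created, so the filtering itself already happens broker-side. A sketch (URL, queue name and selector are illustrative):

    import javax.jms.Connection;
    import javax.jms.JMSException;
    import javax.jms.Message;
    import javax.jms.MessageConsumer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import org.apache.activemq.ActiveMQConnectionFactory;

    public class SelectorConsumer {
        public static void main(String[] args) throws JMSException {
            Connection connection = new ActiveMQConnectionFactory("tcp://localhost:61616").createConnection();
            connection.start();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue queue = session.createQueue("ORDERS");
            // The selector is registered on the broker with the consumer; only matching
            // (paged-in) messages are dispatched to the client.
            MessageConsumer consumer = session.createConsumer(queue, "region = 'EU'");
            Message m = consumer.receive(5000);
            System.out.println("Matched: " + m);
            connection.close();
        }
    }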

Resources