What is the expire-groups-on-timeout equivalent in Java Config? - spring

As per the docs for expire-groups-on-timeout:
"When a group is completed due to a timeout (or by a MessageGroupStoreReaper), the group is expired (completely removed) by default. Late arriving messages will start a new group. Set this to false to complete the group but have its metadata remain so that late arriving messages will be discarded. Empty groups can be expired later using a MessageGroupStoreReaper together with the empty-group-min-timeout attribute. Default: 'true'."
How do I achieve that with Java Config? Basically, after a group times out, I want the late arriving messages to be discarded, and I also want the group to be expired once all the messages have arrived so that it doesn't produce a memory leak. For the latter part, I guess having the MessageGroupStoreReaper will work.

In general, hyphenated properties are converted to camelCase, so
ab-cd-ef
is generally a property
abCdEf
However, there's a typo in the reference manual: it's expire-groups-upon-timeout, not expire-groups-on-timeout.
So, you need setExpireGroupsUponTimeout().
I want the late arriving messages to be discarded and also the group to be expired once all the messages have arrived so that it doesn't produce a memory leak.
expireGroupsUponCompletion will remove the metadata for a complete group. To discard late messages after a timeout, but also clean up at some time later, you need a reaper and an appropriate setting in setMinimumTimeoutForEmptyGroups().
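For completeness, here is a minimal Java Config sketch along those lines. It assumes an in-memory SimpleMessageStore, placeholder channel names and illustrative timeouts; none of these values come from the question:

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.annotation.ServiceActivator;
import org.springframework.integration.aggregator.AggregatingMessageHandler;
import org.springframework.integration.aggregator.DefaultAggregatingMessageGroupProcessor;
import org.springframework.integration.config.EnableIntegration;
import org.springframework.integration.expression.ValueExpression;
import org.springframework.integration.store.MessageGroupStore;
import org.springframework.integration.store.MessageGroupStoreReaper;
import org.springframework.integration.store.SimpleMessageStore;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;

@Configuration
@EnableIntegration
@EnableScheduling
public class AggregatorConfig {

    @Bean
    public MessageGroupStore messageStore() {
        return new SimpleMessageStore(); // assumption: in-memory store
    }

    @Bean
    @ServiceActivator(inputChannel = "aggregatorInput") // placeholder channel name
    public AggregatingMessageHandler aggregator(MessageGroupStore messageStore) {
        AggregatingMessageHandler handler =
                new AggregatingMessageHandler(new DefaultAggregatingMessageGroupProcessor(), messageStore);
        handler.setOutputChannelName("aggregatorOutput");                   // placeholder channel name
        handler.setGroupTimeoutExpression(new ValueExpression<>(10_000L));  // illustrative group timeout
        handler.setExpireGroupsUponTimeout(false);   // keep group metadata after a timeout -> late messages are discarded
        handler.setExpireGroupsUponCompletion(true); // remove metadata when a group completes normally
        handler.setMinimumTimeoutForEmptyGroups(60_000L); // empty groups become eligible for the reaper after 1 minute
        return handler;
    }

    @Bean
    public MessageGroupStoreReaper reaper(MessageGroupStore messageStore) {
        MessageGroupStoreReaper reaper = new MessageGroupStoreReaper(messageStore);
        reaper.setTimeout(60_000L); // expire groups older than 1 minute when the reaper runs
        return reaper;
    }

    // The reaper has to be run periodically; here a simple scheduled task does it.
    @Scheduled(fixedDelay = 30_000)
    public void reap() {
        reaper(messageStore()).run();
    }
}
```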

Related

ActiveMQ - Competing Consumers with Selector - messages starve in the queue

ActiveMQ 5.15.13
Context: I have a single queue with multiple consumers. I want to stop some consumers from processing certain messages. This has to be dynamic; I don't want to create separate queues for this. Using selectors, this works without any problems, e.g. Consumer1 ignores Stocks, so Consumer1 can process all Invoices and Consumer2 can process all Stocks.
But if there is a large number of messages already in the Queue (of one type, e.g. stocks) and I send a message of another type (e.g. invoices), Consumer1 won't process the message of type invoices. It will instead be idle until Consumer2 has processed all Stocks messages. It does not happen every time, but quite often.
Is there any option to change the order of the new messages coming into the queue, such that an idle consumer with matching selector picks up the new message?
Things I've already tried:
using a PendingMessageLimitStrategy -> it seems like it does not work for queues
increasing the maxPageSize and maxBrowsePageSize in the hope that once all Messages are in RAM, the Consumers will search for their messages.
Exclusive Consumers aren't an option since I want to be able to use more than one Consumer per message type.
I'm pretty sure that there is some configuration which allows this type of usage. I'm aware that there are better solutions for this issue, but sadly I can't use them easily due to other constraints.
Thanks a lot in advance!
EDIT: I noticed that when I refresh the localhost queue browser, the stuck messages get processed immediately. It seems like this action performs some sort of queue refresh where the messages get filtered based on their selector again. So I just need this action to happen whenever a new message enters the queue...
This is a 'window' problem where the next set of 'stocks' data needs to be processed before the 'invoicing' data can be processed.
The gotcha with window problems like this is that you need to account for the fact that some messages may never come through, or that a consumer may never come back online. Also, eventually you will be asked 'how many invoices or stocks are left to be processed?', aka observability.
ActiveMQ has you covered: check out wildcard destinations and consumers.
Produce 'stocks' to:
queue://data.stocks.input
Produce 'invoices' to:
queue://data.invoices.input
You then set up consumers to connect to:
queue://data.*.input
Note the wildcard '*'.
ActiveMQ will match queues based on the wildcard pattern, and then process data accordingly. As a bonus, you can still use a selector.
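As a rough illustration of that layout with the plain ActiveMQ 5.x JMS client (broker URL, queue names and payloads are placeholders):

```java
import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;

public class WildcardConsumerExample {
    public static void main(String[] args) throws JMSException {
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);

        // Producers write to type-specific queues...
        session.createProducer(session.createQueue("data.stocks.input"))
               .send(session.createTextMessage("stock update"));
        session.createProducer(session.createQueue("data.invoices.input"))
               .send(session.createTextMessage("invoice 42"));

        // ...while a consumer subscribes to the wildcard destination and receives from both.
        // A selector could still be passed as a second argument to createConsumer if needed.
        MessageConsumer consumer = session.createConsumer(session.createQueue("data.*.input"));
        TextMessage msg = (TextMessage) consumer.receive(5000);
        System.out.println("Received: " + (msg == null ? "nothing" : msg.getText()));

        connection.close();
    }
}
```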

Kafka Producer is not retrying after Timeout

Intermittently (once or twice a month) I am seeing the error
org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for cart-topic-0: 5109 ms has passed since batch creation plus linger time
in my logs, due to which the corresponding message was not processed by the Kafka producer.
Though all the brokers are up and available, I'm not sure why this error is being observed. The load is not even particularly high during this period.
I have set the retries property to 10 in the producer configs, but the message was still not retried. Is there anything else I need to add for the Kafka send method? I have gone through similar issues that were raised, but there is no proper conclusion for this error.
Can someone please help with how to fix this?
From the KIP proposal, which has since been implemented:
We propose adding a new timeout delivery.timeout.ms. The window of enforcement includes batching in the accumulator, retries, and the inflight segments of the batch. With this config, the user has a guaranteed upper bound on when a record will either get sent, fail or expire from the point when send returns. In other words we no longer overload request.timeout.ms to act as a weak proxy for accumulator timeout and instead introduce an explicit timeout that users can rely on without exposing any internals of the producer such as the accumulator.
So basically, with this change you can now additionally configure a delivery timeout and retries for every async send you execute; for example:
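A producer configuration along these lines gives send() an explicit upper bound (the values are illustrative, not tuned recommendations, and the topic name is a placeholder):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerTimeoutExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.RETRIES_CONFIG, 10);
        props.put(ProducerConfig.LINGER_MS_CONFIG, 5);
        props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 30_000);
        // Upper bound on batching + retries + in-flight time; must be >= linger.ms + request.timeout.ms.
        props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, 120_000);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("cart-topic", "key", "value"),
                    (metadata, exception) -> {
                        // Retries happen inside the producer; this callback fires only
                        // after they are exhausted or the record is acknowledged.
                        if (exception != null) {
                            exception.printStackTrace();
                        }
                    });
        }
    }
}
```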
I had an issue where retries were not being obeyed, but in my particular case it was because we were calling the get() method on send for synchronous behaviour. We hadn't realized it would impact retries.
While investigating the issue through various paths, I came across the definition of the sorts of errors that are retriable:
https://kafka.apache.org/11/javadoc/org/apache/kafka/common/errors/RetriableException.html
What had confused me is that a timeout was listed as a retriable one.
I would normally have suggested looking into whether the delivery of your batches was taking too long and messages in your buffer were expiring due to increased volume, but you've mentioned that the volume isn't particularly high.
Did you determine whether increasing request.timeout.ms has an impact on the frequency of occurrence? It might be more of a treat-the-symptom step than a fix for the cause.

What is the default value of the apns-expiration field?

The apns-expiration field governs how long Apple will hold on to an APNs message before giving up on delivering it (for example, if the device is turned off).
According to their docs, a value of zero means "no retention": if the message can't be delivered immediately, it's discarded.
But what happens if the header isn't specified? In other words, what is the default behavior?
My information isn't based on documentation but rather on stats gathered from a multi-million-user system. The policy at this time is to retain push messages for a long time (exactly how long I don't know; we've seen 1M seconds of retention in some cases). Of course, as this isn't documented, it could change in the future.
Note that this default value is similar to Google's policy (where the default is 2419200 seconds), with the exception that Google's policy is documented.
https://developer.apple.com/library/archive/documentation/NetworkingInternet/Conceptual/RemoteNotificationsPG/CommunicatingwithAPNs.html#//apple_ref/doc/uid/TP40008194-CH11-SW1
"If this value is nonzero, APNs stores the notification and tries to deliver it at least once, repeating the attempt as needed if it is unable to deliver the notification the first time."
Taken literally, this means that the absence of the value is equivalent to 0.
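If you would rather not rely on the undocumented default, the header can always be set explicitly. A sketch using Java's built-in HTTP/2 client; the bearer token, topic and device token are placeholders, and certificate/token authentication details are omitted:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ApnsExpirationExample {
    public static void main(String[] args) throws Exception {
        String deviceToken = "<device-token-placeholder>";
        String payload = "{\"aps\":{\"alert\":\"hello\"}}";

        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_2)
                .build();

        HttpRequest request = HttpRequest.newBuilder(
                        URI.create("https://api.push.apple.com/3/device/" + deviceToken))
                .header("authorization", "bearer <provider-token-placeholder>")
                .header("apns-topic", "com.example.app")
                // 0 = deliver now or discard; omit the header to get the (undocumented) default retention.
                .header("apns-expiration", "0")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```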

Aggregator behavior on server restart - spring integration

Premise -
In Spring Integration, suppose I have an aggregator with a message group which is incomplete, and the server is restarted before the group release strategy is met.
Current Behavior ->
All the messages posted to the aggregator go to the same message group and not a new one; since the group is not marked complete, messages keep flowing in.
Expected ->
If the server is restarted, the aggregator picks up the leftover messages from the message store, marks the already persisted ones complete, and then caters to new ones.
Is my expectation incorrect? Can somebody guide me?
I think we can meet your requirements with a MessageGroupStoreReaper, which you can run just on server startup, e.g. by catching ContextRefreshedEvent:
The MessageGroupStore maintains a list of these callbacks which it applies, on demand, to all messages whose timestamp is earlier than a time supplied as a parameter (see the registerMessageGroupExpiryCallback(..) and expireMessageGroups(..) methods above).
The expireMessageGroups method can be called with a timeout value. Any message older than the current time minus this value will be expired, and have the callbacks applied. Thus it is the user of the store that defines what is meant by message group "expiry".
http://docs.spring.io/spring-integration/reference/html/messaging-routing-chapter.html#reaper
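A minimal sketch of that approach, assuming the aggregator's MessageGroupStore is available as a bean and using an illustrative 30-second threshold:

```java
import org.springframework.context.ApplicationListener;
import org.springframework.context.event.ContextRefreshedEvent;
import org.springframework.integration.store.MessageGroupStore;
import org.springframework.integration.store.MessageGroupStoreReaper;
import org.springframework.stereotype.Component;

@Component
public class ReapLeftoversOnStartup implements ApplicationListener<ContextRefreshedEvent> {

    private final MessageGroupStore messageStore; // the same store the aggregator uses

    public ReapLeftoversOnStartup(MessageGroupStore messageStore) {
        this.messageStore = messageStore;
    }

    @Override
    public void onApplicationEvent(ContextRefreshedEvent event) {
        // Expire any groups left over from before the restart; the aggregator's
        // registered expiry callback decides whether they are released or discarded.
        MessageGroupStoreReaper reaper = new MessageGroupStoreReaper(messageStore);
        reaper.setTimeout(30_000); // illustrative: anything older than 30s counts as a leftover
        reaper.run();
    }
}
```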

One slow ActiveMQ consumer causing other consumers to be slow

I'm looking for help regarding a strange issue where a slow consumer on a queue causes all the other consumers on the same queue to start consuming messages at 30-second intervals. That is, all consumers but the slow one don't consume messages as fast as they can; instead they wait for some magical 30s barrier before consuming.
The basic flow of my application goes like this:
a number of producers place messages onto a single queue. Messages can have different JMSXGroupIDs
a number of consumers listen to messages on that single queue
as standard practice the JMSXGroupIDs get distributed across the consumers
at some point one of the consumers becomes slow and can't process messages very quickly
the slow consumer ends up filling its prefetch buffer on the broker and AMQ recognises that it is slow (default behaviour)
at that point - or some 'random' but close time later - all consumers except the slow one start to only consume messages at the same 30s intervals
if the slow consumer becomes fast again then things very quickly return to normal operation and the 30s barrier goes away
I'm at a loss for what could be causing this issue, or how to fix it, please help.
More background and findings
I've managed to reliably reproduce this issue on AMQ 5.8.0, 5.9.0 (where the issue was originally noticed) and 5.9.1, on fresh installs and existing ops-managed installs, and on different machines, some VMs and some not. All Linux installs, with different OSs and Java versions.
It doesn't appear to be affected by anything prefetch related, that is: changing the prefetch value from 1 to 10 to 1000 didn't stop the issue from happening
[red herring?] Enabling debug logs on the amq instance shows logs relating to the periodic check for messages that can be expired. The queue doesn't have an expiry policy so I can only think that the scheduled expireMessagesPeriod time is just waking amq up in such a way that it then sends messages to the non-slow consumers.
If the 30s mode is entered then left then entered again the seconds-past-the-minute time is always the same, for example 14s and 44s past the minute. This is true across all consumers and all machines hosting those consumers. Those barrier points do change after restarts of amq.
While not strictly a solution to the problem, further investigation has uncovered the root cause of this issue.
TL;DR - It's known behaviour and won't be fixed before Apollo
More Details
Ultimately this is caused by the maxPageSize property and the fact that AMQ will only apply selection criteria to messages in memory. Generally these are message selectors (property = value), but in my case they are JMSXGroupID=>Consumer assignments.
As messages are received by the queue they get paged into memory and placed into a collection (named pagedInPendingDispatch in the source). To dispatch messages AMQ will scan through this list of messages and try to find a consumer that will accept it. That includes checking the group id, message selector and prefetch buffer space. For our use case we aren't using message selectors but we are using groups. If no consumer can take the message then it is left in the collection and will be checked again at the next tick.
In order to stop the pagedInPendingDispatch collection from eating up all the resources available there is a suggested limit to the size of this queue configured via the maxPageSize property. This property isn't actually a maximum, it's more a hint as to whether, under normal conditions, new message arrivals should be paged in memory or paged to disk.
With these two pieces of information and a slow consumer it turns out that eventually all the messages in the pagedInPendingDispatch collection end up only being consumable by the slow consumer, and hence the collection effectively gets blocked and no other messages get dispatched. This explains why the slow consumer wasn't affected by the 30s interval, it had maxPageSize messages waiting delivery already.
This doesn't explain why I was seeing the non-slow consumers receive messages every 30s though. As it turns out, paging messages into memory has two modes, normal and forced. Normal follows the process outlined above where the size of the collection is compared to the maxPageSize property, when forced, however, messages are always paged into memory. This mode exists to allow you to browse through messages that aren't in memory. As it happens this forced mode is also used by the expiry mechanism to allow AMQ to expire messages that aren't in memory.
So what we have now is a collection of messages in memory that are all targeted for dispatch to the same consumer, a consumer that won't accept them because it is slow or blocked. We also have a backlog of messages awaiting delivery to all consumers. Every expireMessagesPeriod milliseconds a task runs that force-pages messages into memory to check whether they should be expired or not. This adds those messages to the paged-in collection, which now contains maxPageSize messages for the slow consumer and N more messages destined for any consumer. Those messages get delivered.
QED.
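For reference, both of the properties discussed above can be tuned through a destination policy. A sketch for an embedded broker, with illustrative values rather than recommendations:

```java
import org.apache.activemq.broker.BrokerService;
import org.apache.activemq.broker.region.policy.PolicyEntry;
import org.apache.activemq.broker.region.policy.PolicyMap;

public class BrokerPolicyExample {
    public static void main(String[] args) throws Exception {
        BrokerService broker = new BrokerService();
        broker.addConnector("tcp://localhost:61616");

        PolicyEntry policy = new PolicyEntry();
        policy.setMaxPageSize(2000);            // how many pending messages are paged into memory for dispatch
        policy.setExpireMessagesPeriod(30_000); // the periodic expiry check that force-pages messages in

        PolicyMap policyMap = new PolicyMap();
        policyMap.setDefaultEntry(policy);      // apply to all destinations
        broker.setDestinationPolicy(policyMap);

        broker.start();
        broker.waitUntilStopped();
    }
}
```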
References
Ticket referring to this issue but for message selectors instead
Docs relating to the configuration properties
Somebody else with this issue but for selectors
