Filter / drop duplicate messages from ActiveMQ queue based on custom properties - jms

Problem
When my web application updates an item in the database, it sends a message containing the item ID via Camel onto an ActiveMQ queue, the consumer of which will get an external service (Solr) updated. The external service reads from the database independently.
What I want is that if the web application sends another message with the same item ID while the old one is still on queue, that the new message be dropped to avoid running the Solr update twice.
After the update request has been processed and the message with that item ID is off the queue, new request with the same ID should again be accepted.
Is there a way to make this work out of the box? I'm really tempted to drop ActiveMQ and simply implement the update request queue as a database table with a unique constraint, ordered by timestamp or a running insert id.
What I tried so far
I've read this and this page on Stackoverflow. These are the solutions mentioned there:
Idempotent consumers in Camel: Here I can specify an expression that defines what constitutes a duplicate, but that would also prevent all future attempts to send the same message, i.e. update the same item. I only want new update requests to be dropped while they are still on queue.
"ActiveMQ already does duplicate checks, look at auditDepth!": Well, this looks like a good start and definitely closest to what I want, but this determines equality based on the Message ID which I cannot set. So either I find a way to make ActiveMQ generate the Message ID for this queue in a certain way or I find a way to make the audit stuff look at my item ID field instead of the Message ID. (One comment in my second link even suggests using "a well defined property you set on the header", but fails to explain how.)
Write a custom plugin that redirects incoming messages to the deadletter queue if they match one that's already on the queue. This seems to be the most complete solution offered so far, but it feels so overkill for what I perceive as a fairly mundane and every-day task.
PS: I found another SO page that asks the same thing without an answer.

What you want is not message broker functionality, repeat after me, "A message broker is not a database, A message broker is not a database", repeat as necessary.
The broker's job is get messages reliably from point A to point B. The client offers some filtering capabilities via message selectors but this is minimal and mainly useful in keeping only specific messages that a single client is interested in from flowing there and not others which some other client might be in charge of processing.
Your use case calls for a more stateful database centric solution as you've described. Creating a broker plugin to walk the Queue to check for a message is reinventing the wheel and prone to error if the Queue depth is large as ActiveMQ might not even page in all the messages for you based on memory constraints.

Related

ActiveMQ - Competing Consumers with Selector - messages starve in the queue

ActiveMQ 5.15.13
Context: I have a single queue with multiple Consumers. I want to stop some consumers from processing certain messages. This has to be dynamic, I don't want to create separate queues for this. This works without any problems. e.g. Consumer1 ignores Stocks -> Consumer1 can process all invoices and Consumer2 can process all Stocks
But if there is a large number of messages already in the Queue (of one type, e.g. stocks) and I send a message of another type (e.g. invoices), Consumer1 won't process the message of type invoices. It will instead be idle until Consumer2 has processed all Stocks messages. It does not happen every time, but quite often.
Is there any option to change the order of the new messages coming into the queue, such that an idle consumer with matching selector picks up the new message?
Things I've already tried:
using a PendingMessageLimitStrategy -> it seems like it does not work for queues
increasing the maxPageSize and maxBrowsePageSize in the hope that once all Messages are in RAM, the Consumers will search for their messages.
Exclusive Consumers aren't an option since I want to be able to use more than one Consumer per message type.
Im pretty sure that there is some configuration which allows this type of usage. I'm aware that there are better solutions for this issue, but sadly I can't use them easily due to other constraints.
Thanks a lot in advance!
EDIT: I noticed that when I'm refreshing on the localhost queue browser, the stuck messages get executed immediately. It seems like this action performs some sort of queue refresh where the messages get filtered based on their selector again. So I just need this action whenever a new message enters the queue...
This is a 'window' problem where the next set of 'stocks' data needs to be processed before the 'invoicing' data can be processed.
The gotcha with window problems like this is that you need to account for the fact that some messages may never come through, or a consumer may never come back online either. Also, eventually you will be asked 'how many invoices or stocks are left to be processed'-- aka observability.
ActiveMQ has you covered-- check out wild-card destinations and consumers.
Produce 'stocks' to:
queue://data.stocks.input
Produce 'invoices' to:
queue://data.invoices.input
You then setup consumes to connect:
queue://data.*.input
note: the wildard '*'.
ActiveMQ will match queues based on the wildcard pattern, and then process data accordingly. As a bonus, you can still use a selector.

MassTransit MessageData Management

I have been starting to make greater use of the message data feature of masstransit and am getting to the point needing to manage the message data in the store - i.e. remove old data.
The obvious choice is to have some outside process tidy up data, but clearly a scheduled (or not) clean up could remove data still in use or referenced by error or dead letter queues.
Ideally I would like to limit stored message data retention to messages only in error or dead letter queues, and automatically remove data for messages that have been successfully processed.
What would be the best approach to achieve this with MassTransit? Perhaps with a MiddleWare approach or similar, and if that is the case what is the correct approach?
Manual cleanup is recommended, using whatever makes sense for the repository in use. Because messages may still be in queues, or in error/dead-letter queues as you pointed out, it is really up to development/operations team to know when the right time is to remove older message data.
I'd suggest monitoring and managing the error/dead-letter queues more aggressively, keeping them empty. And then, just figure a good timeframe to delete old message data - one week, ten days, whatever - and deal with it that way.
I have had a backlog item to come up with a way to automatically manage message data, but since message data can be forwarded (using the same stored data) either via publish or send, there is no good way to track references.

JMS rewrite message

I know that JMS messages are immutable. But I have a task to solve, which requires rewrite message in queue by entity id. Maybe there is a problem with system design, help me please.
App A sends message (with entity id = 1) to JMS. App B checks for new messages every minute.
App A might send many messages with entity id = 1 in a minute, but App B should see just the last one.
Is it possible?
App A should work as fast as possible, so I don't like the idea to perform removeMatchingMessages(String selector) before new message push.
IMO the approach is flawed.
Even if you did accept clearing off the queue by using a message selector to remove all messages where entity id = 1 before writing the new message, timing becomes an issue: it's possible that whichever process writes the out-dated messages would need to complete before the new message is written, some level of synchronization.
The other solution I can think of is reading all messages before processing them. Every minute, the thread takes the messages and bucketizes them. An earlier entity id = 1 message would be replaced by a later one, so that at the end you have a unique set of messages to process. Then you process them. Of course now you might have too many messages in memory at once, and transactionality gets thrown out the window, but it might achieve what you want.
In this case you could actually be reading the messages as they come in and bucketizing them, and once a minute just run your processing logic. Make sure you synchronize your buckets so they aren't changed out from under you as new messages come in.
But overall, not sure it's going to work

Spring integration - Keep messages after delivery

1) I'm interested to learn if it is possible to keep the messages that were delivered using Spring Integration. I'm already using the mongo persistent storage (ConfigurableMongoDbMessageStore), but only failed messages remain in the collection. Ideally, I want all messages to remain with the functionality to list them and retry them.
I would use a field "status" or similar to identify queued, succesful or failed messages. Not sure if this field exists already, but I'm guessing something similar must be in place.
2) Also, when a message fails and is persited, there is a lot more data in the message. This data is serialised, so I'm curious how I can extract the original message and retry it.
3) The goal is to create an interface in the webapp where all queued messages can be seen, and retried. Not only failed messages, but also succesful deliveries (useful for testing).
I looked everywhere for an answer to this, but could not find it.
Thanks
I'd say it isn't good design for queue component.
Right it returns failed messages to the queue back for the future redelivery, but good message should be removed from the queue to avoid duplication on the next poll from queue.
No, there is no "status" field on the message, because you use store as a queue.
BTW Spring Integration provides separete implementation for queue channels: MongoDbChannelMessageStore.
You can achieve it with separate parallel Mongo collection and store your message twice: for the queue and for the future analysis. Here you can introduce "status" field and control it, when message successful or not.
From here you can introduce you UI to manage that collection and provide actions like send, retry. Remove the message from here and send it again to those two collections.
HTH

Camel JMS ensuring ordering when unsidelining from dead letter channel

I am using camel to integrate with ActiveMQ JMS. I am receiving prices for products on this queue. I am using JMSXGroupID on productId to ensure ordering across a productId. Now if I fail to process this message I move it to a DeadLetterQueue. This could be because of a connection error on a dependent service or because of error with the message itself.
In case of the former I would have to manually remove it from the DLQ and put it back into the JMS queue.
Now the problem is that I dont know if any other message on that groupId has been received and processed or not. And hence unsidelining from DLQ will disrupt the order. On the other hand if I dont unsideline it and no other message has been received the product Id will not get the correct price.
1 solution that I have in mind is to use a fast key-value store(Redis) to store the last messageId or JMSTimestamp against a productId(message group). This is updated everytime I dequeue a message. Any other solution for this?
Relying on message order in JMS is a risky business - at best.
The best thing to do is to make the receiver handle messages out of sequence as a special case (but may take advantage message order during normal operation).
You may also want to distinguish between two errors: posion messages and temporary connection problems, maybe even use two different error queues for them. In the case of a posion message (invalid payload etc.) then there is nothing you can really do about it except starting a bug investigation. In such cases, you can probably send along "something else", such as dummy message to not interfere with order.
For the issues with connection problems, you can have another strategy - ActiveMQ Redelivery Policies. If there is network trouble, it's usually no use in trying to process the second message until the first has been handled. A Redelivery Policy ensures that (given you have a single consumer, that is). There is another question at SO where the poster actually has a solution to your problem and wants to avoid it. Read it. :)

Resources