Is there any way to count how many times a job has been requeued (via Reject or Nak) without manually requeuing the job?
I need to retry a job 'n' times and then drop it.
PS: Currently I requeue a job manually (drop the old job, then create a new job with the same content and an extra Counter header, as long as the Counter is missing or its value is less than 'n').
There is a redelivered message property that is set to true when the message has been redelivered one or more times.
If you want to track the redelivery count or the number of redeliveries left (like the hop limit or TTL in the IP stack), you have to store that value in the message body or headers (literally: consume the message, modify it, and publish the modified copy back to the broker).
There is also a similar question with an answer that may help you: How do I set a number of retry attempts in RabbitMQ?
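A minimal sketch of the counter bookkeeping described above, in plain Java with no broker calls. The header name x-retry-count and the class and method names are my own assumptions, not anything RabbitMQ defines; the consumer would pass the returned map to the publish call, or ack and drop the message when the result is empty.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class RetryCounter {
    // Hypothetical header name; any key the producers and consumers agree on works.
    static final String RETRY_HEADER = "x-retry-count";

    // Reads the counter, treating a missing or non-numeric header as 0.
    static int currentCount(Map<String, Object> headers) {
        Object value = headers == null ? null : headers.get(RETRY_HEADER);
        return value instanceof Number ? ((Number) value).intValue() : 0;
    }

    // Returns the headers for the republished copy, or empty when the message
    // has already been retried maxRetries times and should be dropped.
    static Optional<Map<String, Object>> headersForRetry(Map<String, Object> headers, int maxRetries) {
        int count = currentCount(headers);
        if (count >= maxRetries) {
            return Optional.empty();
        }
        // Copy instead of mutating: headers of a received message are not updated in place.
        Map<String, Object> updated = new HashMap<>();
        if (headers != null) {
            updated.putAll(headers);
        }
        updated.put(RETRY_HEADER, count + 1);
        return Optional.of(updated);
    }
}
```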
In the case that the message was actually dead-lettered, you can check the contents of the x-death message header.
This would for example be the case when you reject/nack with requeue = false and the queue has an associated dead letter exchange.
In that case, the content of this header is an array. Each element describes a failed delivery attempt and contains information such as the time of the attempt, routing information, etc.
This works for RabbitMQ - I don't know if it is applicable to AMQP in general.
EDIT
Since I originally wrote this answer, the x-death header structure has been changed.
It is generally a very bad thing when headers change format, but
in this particular case the reason was that the message size would grow indefinitely if the message was continuously dead-lettered.
I have therefore removed the piece of code that used to be here to get the number of deaths for a message.
It is still possible to get the number of deaths from the new header format.
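As a rough sketch of reading the newer format: as far as I know, each x-death entry now aggregates the deaths for one queue/reason pair and carries a numeric count field, so the total is the sum of those counts. The class and method names below are made up for illustration, and the code only assumes the header arrives as a list of maps, which is how the Java client exposes it.

```java
import java.util.List;
import java.util.Map;

final class DeathCount {
    // Sums the "count" field of every x-death entry; returns 0 if the header
    // is absent or does not have the expected list-of-maps shape.
    static long totalDeaths(Map<String, Object> headers) {
        Object xDeath = headers == null ? null : headers.get("x-death");
        if (!(xDeath instanceof List)) {
            return 0L;
        }
        long total = 0L;
        for (Object entry : (List<?>) xDeath) {
            if (entry instanceof Map) {
                Object count = ((Map<?, ?>) entry).get("count");
                if (count instanceof Number) {
                    total += ((Number) count).longValue();
                }
            }
        }
        return total;
    }
}
```

With the Java client, the map to pass in would be the one returned by the properties' getHeaders() call on the delivered message.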
ActiveMQ 5.15.13
Context: I have a single queue with multiple Consumers. I want to stop some consumers from processing certain messages. This has to be dynamic, I don't want to create separate queues for this. This works without any problems. e.g. Consumer1 ignores Stocks -> Consumer1 can process all invoices and Consumer2 can process all Stocks
But if there is a large number of messages already in the Queue (of one type, e.g. stocks) and I send a message of another type (e.g. invoices), Consumer1 won't process the message of type invoices. It will instead be idle until Consumer2 has processed all Stocks messages. It does not happen every time, but quite often.
Is there any option to change the order of the new messages coming into the queue, such that an idle consumer with matching selector picks up the new message?
Things I've already tried:
using a PendingMessageLimitStrategy -> it seems like it does not work for queues
increasing the maxPageSize and maxBrowsePageSize in the hope that once all Messages are in RAM, the Consumers will search for their messages.
Exclusive Consumers aren't an option since I want to be able to use more than one Consumer per message type.
I'm pretty sure that there is some configuration which allows this type of usage. I'm aware that there are better solutions for this issue, but sadly I can't use them easily due to other constraints.
Thanks a lot in advance!
EDIT: I noticed that when I'm refreshing on the localhost queue browser, the stuck messages get executed immediately. It seems like this action performs some sort of queue refresh where the messages get filtered based on their selector again. So I just need this action whenever a new message enters the queue...
This is a 'window' problem where the next set of 'stocks' data needs to be processed before the 'invoicing' data can be processed.
The gotcha with window problems like this is that you need to account for the fact that some messages may never come through, or a consumer may never come back online either. Also, eventually you will be asked 'how many invoices or stocks are left to be processed'-- aka observability.
ActiveMQ has you covered-- check out wild-card destinations and consumers.
Produce 'stocks' to:
queue://data.stocks.input
Produce 'invoices' to:
queue://data.invoices.input
You then set up consumers to connect to:
queue://data.*.input
Note the wildcard '*'.
ActiveMQ will match queues based on the wildcard pattern, and then process data accordingly. As a bonus, you can still use a selector.
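To make the matching rule concrete, here is a small illustration in plain Java of how a pattern like data.*.input selects queue names, with '*' standing for exactly one dot-separated element. This is not ActiveMQ's own code; the class and method names are invented for the example, and it only covers the '*' wildcard used above.

```java
final class DestinationWildcard {
    // Returns true if the dot-separated name matches the pattern, where '*'
    // stands for exactly one name element and everything else must be equal.
    static boolean matches(String pattern, String name) {
        String[] p = pattern.split("\\.");
        String[] n = name.split("\\.");
        if (p.length != n.length) {
            return false;
        }
        for (int i = 0; i < p.length; i++) {
            if (!p[i].equals("*") && !p[i].equals(n[i])) {
                return false;
            }
        }
        return true;
    }
}
```

Under this rule a consumer on data.*.input receives from both data.stocks.input and data.invoices.input, but not from a name with an extra element such as data.stocks.eu.input.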
I'm working with Spring Cloud Stream and Rabbit, and I used the config defined here to set up a dead-letter queue (DLQ) and it works very nicely.
What I'd like to do is set a maximum number of times a message goes to the DLQ before being discarded. Is it possible to set this via config? If so, how? If not, what should I do to achieve this behaviour?
I'm looking for a code sample for the best answer, preferably in Kotlin (if relevant)
That depends on whether you're using quorum queues or not. I don't believe there's a default config for that without quorum queues. However, you should be able to store the redelivery count in a custom header or in the message itself.
Then if you set a MAX_REDELIVERY_COUNT constant in your application, you can check if the message exceeds the maximum number of redeliveries.
If you're not using quorum queues, I'd take a look at this answer:
How do I set a number of retry attempts in RabbitMQ?.
That answer has some good options.
However, when using quorum queues, you can set the delivery-limit option. More info on that can be found here: https://www.rabbitmq.com/quorum-queues.html#feature-matrix.
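A small sketch of the queue arguments this involves, assuming you declare the queue yourself with the RabbitMQ Java client. x-queue-type, x-delivery-limit and x-dead-letter-exchange are the broker's argument names; the class and method names are just for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class QuorumQueueArgs {
    // Builds the arguments for a quorum queue that stops redelivering a message
    // once the delivery limit is exceeded and hands it to the given DLX instead.
    static Map<String, Object> withDeliveryLimit(int deliveryLimit, String deadLetterExchange) {
        Map<String, Object> args = new HashMap<>();
        args.put("x-queue-type", "quorum");
        args.put("x-delivery-limit", deliveryLimit);
        args.put("x-dead-letter-exchange", deadLetterExchange);
        return args;
    }
}
```

The map would then go in as the last parameter of channel.queueDeclare(queueName, true, false, false, args); quorum queues have to be durable and non-exclusive.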
Edit 1: using custom headers
In order to publish a message with custom headers:
Map<String, Object> headers = new HashMap<String, Object>();
headers.put("latitude", 51.5252949);
headers.put("longitude", -0.0905493);

// Attach the custom headers to the message properties and publish.
channel.basicPublish(exchangeName, routingKey,
        new AMQP.BasicProperties.Builder()
                .headers(headers)
                .build(),
        messageBodyBytes);
As found on https://www.rabbitmq.com/api-guide.html#publishing.
The problem is that the headers can't be simply updated. However, you could do this with a workaround. Let's say you want a maximum of 5 retries per message. If the message can't be processed, send it to a DLX. If the message doesn't exceed the maximum retries, read the original headers of the message, update the custom retry count header and resend it to the original queue.
If the message arrives at the DLX and does exceed the maximum retry count, send the message as is to the DLX with a different routing key, which is bound to a queue for the "definitive" dead messages.
That'd mean that you would get something like this in a simplified diagram:
This is just an idea, I don't know if it'll work for sure, but it's the best that I can think of in your situation.
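The routing decision in that workaround could look roughly like this. It is only a sketch of the idea above: the x-retry-count header and the class and method names are hypothetical, and the two routing keys are whatever you bind the original queue and the "definitive" dead queue to.

```java
import java.util.Map;

final class DeadLetterRouter {
    // Hypothetical custom header that carries the number of retries so far.
    static final String RETRY_HEADER = "x-retry-count";

    // Picks the routing key for a message that arrived via the DLX:
    // back to the original queue while retries remain, otherwise to the
    // queue that holds the definitively dead messages.
    static String routingKeyFor(Map<String, Object> headers, int maxRetries,
                                String retryKey, String deadKey) {
        Object value = headers == null ? null : headers.get(RETRY_HEADER);
        int retries = value instanceof Number ? ((Number) value).intValue() : 0;
        return retries < maxRetries ? retryKey : deadKey;
    }
}
```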
Edit 2: using the autoBindDlq
It seems like the Spring Cloud Stream Binder for RabbitMQ has this option. In the docs as found on https://github.com/spring-cloud/spring-cloud-stream-binder-rabbit, it says the following:
By using the optional autoBindDlq option, you can configure the binder to create and configure dead-letter queues (DLQs) (and a dead-letter exchange DLX, as well as routing infrastructure). By default, the dead letter queue has the name of the destination, appended with .dlq. If retry is enabled (maxAttempts > 1), failed messages are delivered to the DLQ after retries are exhausted. If retry is disabled (maxAttempts = 1), you should set requeueRejected to false (the default) so that failed messages are routed to the DLQ, instead of being re-queued. In addition, republishToDlq causes the binder to publish a failed message to the DLQ (instead of rejecting it). This feature lets additional information (such as the stack trace in the x-exception-stacktrace header) be added to the message in headers. See the frameMaxHeadroom property for information about truncated stack traces. This option does not need retry enabled. You can republish a failed message after just one attempt. Starting with version 1.2, you can configure the delivery mode of republished messages. See the republishDeliveryMode property.
EDIT: Solved this one while I was writing it up :P -- I love those kind of solutions. I figured I'd post it anyway, maybe someone else will have the same problem and find my solution. Don't care about points/karma, etc. I just already wrote the whole thing up, so figured I'd post it and the solution.
I have an SQS FIFO queue. It is using a dead letter queue. Here is how it had been configured:
I have a single producer microservice, and I have 10 ECS images that are running as consumers.
It is important that we process the messages close to the time they are delivered in the queue for business reasons.
We're using a fairly recent version of the AWS SDK Golang client package for both producer and consumer code (if important, I can go look up the version, but it is not terribly outdated).
I capture the logs for the producer so I know exactly when messages were put in the queue and what the messages were.
I capture aggregate logs for all the consumers, so I have a full view of all 10 consumers and when messages were received and processed.
Here's what I see under normal conditions looking at the logs:
1. Message put in the queue at time x
2. Message received by one of the 10 consumers at time x
3. Message processed by consumer successfully
4. Message deleted from queue by consumer at time x + (0-2 seconds)
5. Repeat ad infinitum for up to about 700 messages / day at various times per day
But the problem I am seeing now is that some messages are not being processed in a timely manner. Occasionally we fail processing a message deliberately b/c of the state of the system for that message (e.g. maybe users still logged in, so it should back off and retry...which it does). The problem is if the consumer fails a message it is causing the queue to stop delivering any other messages to any other consumers.
"Failure to process a message" here just means the message was received, but the consumer declared it a failure, so we just log an error, and do not proceed to delete it from the queue. Thus, the visibility timeout (here 5m) will expire and it will be re-delivered to another consumer and retried up to 10 times, after which it will go to the dead letter queue.
After delving into the logs and analyzing it, here's what I'm seeing:
1. Process begins like above (message produced, consumed, deleted).
2. New message received at time x by consumer
3. Consumer fails -- logs error and just returns (does not delete)
4. Same message is received again at time x + 5m (visibility timeout)
5. Consumer fails -- logs error and just returns (does not delete)
6. Repeat up to 10x -- message goes to dead-letter queue
7. New message received but it is now 50 minutes late!
8. Now all messages that were put in the queue between steps 2-7 are 50 minutes late (5m visibility timeout * 10 retries)
All the docs I've read tell me the queue should not behave this way, but I've verified it several times in our logs. Sadly, we don't have a paid AWS support plan, or I'd file a ticket with them. But just consider the fact that we have 10 separate consumers all reading from the same queue. They only read from this queue. We don't have any other queues they are using.
For de-duplication we are using the automated hash of the message body. Messages are small JSON documents.
My expectation would be if we have a single bad message that causes a visibility timeout, that the queue would still happily deliver any other messages it has available while there are available consumers.
OK, so turns out I missed this little nugget of info about FIFO queues in the documentation:
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/FIFO-queues.html
When you receive a message with a message group ID, no more messages
for the same message group ID are returned unless you delete the
message or it becomes visible.
I was indeed using the same Message Group ID. Hadn't given it a second thought. Just be aware, if you do that and any one of your messages fails to process, it will back up all other messages in the queue, until the time that the message is finally dealt with. The solution for me was to change the message group id. There is some business logic id I can postfix on it that will work for me.
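For illustration, deriving a per-entity group ID can be as simple as the sketch below. Our real code is Go; this is Java only to show the idea, and the class name, method name and the "-" separator are made up. The 128-character cap is the SQS limit on MessageGroupId.

```java
final class MessageGroupIds {
    // SQS allows at most 128 characters in a MessageGroupId.
    static final int MAX_LENGTH = 128;

    // Appends the business ID so that a stuck message only blocks
    // the messages that share the same business ID.
    static String forEntity(String base, String businessId) {
        String groupId = base + "-" + businessId;
        if (groupId.length() > MAX_LENGTH) {
            throw new IllegalArgumentException("MessageGroupId longer than " + MAX_LENGTH + " characters");
        }
        return groupId;
    }
}
```

Ordering is then only guaranteed within each business ID, which was all I needed.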
I know that JMS messages are immutable. But I have a task that requires replacing a message in the queue by entity id. Maybe there is a problem with the system design; please help me.
App A sends message (with entity id = 1) to JMS. App B checks for new messages every minute.
App A might send many messages with entity id = 1 in a minute, but App B should see just the last one.
Is it possible?
App A should work as fast as possible, so I don't like the idea to perform removeMatchingMessages(String selector) before new message push.
IMO the approach is flawed.
Even if you did accept clearing off the queue by using a message selector to remove all messages where entity id = 1 before writing the new message, timing becomes an issue: it's possible that whichever process writes the out-dated messages would need to complete before the new message is written, some level of synchronization.
The other solution I can think of is reading all messages before processing them. Every minute, the thread takes the messages and bucketizes them. An earlier entity id = 1 message would be replaced by a later one, so that at the end you have a unique set of messages to process. Then you process them. Of course now you might have too many messages in memory at once, and transactionality gets thrown out the window, but it might achieve what you want.
In this case you could actually be reading the messages as they come in and bucketizing them, and once a minute just run your processing logic. Make sure you synchronize your buckets so they aren't changed out from under you as new messages come in.
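A bare-bones sketch of that bucket, assuming the JMS listener calls offer for every incoming message and a once-a-minute task calls drain. The class and method names are made up, and it says nothing about acknowledgement or transactions, which you would still have to sort out.

```java
import java.util.HashMap;
import java.util.Map;

final class LatestByEntity<T> {
    private Map<Long, T> latest = new HashMap<>();

    // Called for each incoming message; a later message for the same
    // entity id replaces the earlier one.
    synchronized void offer(long entityId, T message) {
        latest.put(entityId, message);
    }

    // Called once a minute: hands back the current bucket and starts a new one,
    // so messages arriving during processing cannot change the returned map.
    synchronized Map<Long, T> drain() {
        Map<Long, T> snapshot = latest;
        latest = new HashMap<>();
        return snapshot;
    }
}
```

Swapping the map under the same lock that guards offer is what keeps the buckets from changing out from under the processing step.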
But overall, I'm not sure it's going to work.
I am using camel to integrate with ActiveMQ JMS. I am receiving prices for products on this queue. I am using JMSXGroupID on productId to ensure ordering across a productId. Now if I fail to process this message I move it to a DeadLetterQueue. This could be because of a connection error on a dependent service or because of error with the message itself.
In case of the former I would have to manually remove it from the DLQ and put it back into the JMS queue.
Now the problem is that I don't know whether any other message for that groupId has been received and processed in the meantime. Hence moving the message from the DLQ back to the queue could disrupt the order. On the other hand, if I don't move it back and no other message has been received, the productId will not get the correct price.
One solution that I have in mind is to use a fast key-value store (Redis) to store the last messageId or JMSTimestamp against a productId (message group). This is updated every time I dequeue a message. Any other solution for this?
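Roughly what I have in mind, with an in-memory map standing in for Redis so the logic is visible. The class and method names are hypothetical; in the real thing the two methods would be a Redis write and read keyed by productId.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class LastProcessedTracker {
    // Stand-in for the key-value store: productId -> newest processed JMSTimestamp.
    private final Map<String, Long> lastTimestampByProduct = new ConcurrentHashMap<>();

    // Called after a message has been processed; keeps the newest timestamp.
    void recordProcessed(String productId, long jmsTimestamp) {
        lastTimestampByProduct.merge(productId, jmsTimestamp, Long::max);
    }

    // A DLQ message may be put back only if nothing newer has been
    // processed for the same product since it was sent.
    boolean isSafeToReplay(String productId, long jmsTimestamp) {
        Long last = lastTimestampByProduct.get(productId);
        return last == null || jmsTimestamp > last;
    }
}
```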
Relying on message order in JMS is a risky business - at best.
The best thing to do is to make the receiver handle out-of-sequence messages as a special case (while still taking advantage of message order during normal operation).
You may also want to distinguish between two kinds of errors, poison messages and temporary connection problems, and maybe even use two different error queues for them. In the case of a poison message (invalid payload etc.), there is nothing you can really do about it except start a bug investigation. In such cases, you can probably send along "something else", such as a dummy message, so as not to interfere with the order.
For connection problems, you can use another strategy: ActiveMQ Redelivery Policies. If there is network trouble, there's usually no use in trying to process the second message until the first has been handled. A Redelivery Policy ensures that (given you have a single consumer, that is). There is another question at SO where the poster actually has a solution to your problem and wants to avoid it. Read it. :)