How does delivery failure affect JMS message ordering?

I am using WebLogic 11g, but this question applies to JMS messaging in general.
Let's assume I have messages in a queue in the order 5-4-3-2-1.
If message #1 fails to deliver and there is a redelivery delay of 30 seconds on the JMS queue, will the messages behind #1 be delivered during those 30 seconds, or will they also have to wait 30 seconds?

I found the answer and am leaving a reference for future readers.
The following article
http://middlewaremagic.com/weblogic/?p=6334
states, in its Poison Messages section:
"Note that messages with a redelivery delay do not prevent other messages from being delivered"

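The quoted behavior can be illustrated with a toy model. This is a sketch only, not WebLogic or JMS code: the function name `simulate`, its parameters, and the one-second processing time are assumptions made up for illustration. The model always delivers whichever message is currently deliverable and puts a failed message back with a delay.

```python
import heapq

def simulate(messages, failing, redelivery_delay=30, process_time=1):
    """Return a list of (time, message, outcome) delivery attempts.

    messages: ids in queue order (first element is delivered first).
    failing:  ids that fail on their first delivery attempt only.
    """
    # Heap entries: (time the message becomes deliverable, queue position, id)
    pending = [(0, pos, m) for pos, m in enumerate(messages)]
    heapq.heapify(pending)
    failed_once = set()
    now = 0
    log = []
    while pending:
        available_at, pos, msg = heapq.heappop(pending)
        now = max(now, available_at)  # idle until something is deliverable
        if msg in failing and msg not in failed_once:
            failed_once.add(msg)
            log.append((now, msg, "failed"))
            # The failed message is parked; it does not block the others.
            heapq.heappush(pending, (now + redelivery_delay, pos, msg))
        else:
            log.append((now, msg, "delivered"))
        now += process_time
    return log
```

With messages 1 through 5 and message 1 failing once, this model delivers 2, 3, 4 and 5 during the delay and message 1 last, at t=30. In other words, the redelivery delay does not hold up the queue, but it does change the order in which messages are processed.
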
Related

Spring JMS Message Listener - DMLC - what is benefit of polling?

I know the DefaultMessageListenerContainer polls by design. And that the receiveTimeout which sets the polling interval defaults to 1 second.
The way I understand it, the DMLC will issue a get and wait the 'receiveTimeout' interval (1 second) before it times out and issues another get.
From what I have read, we can set this receiveTimeout to a larger value with NO effect on messages getting picked up from MQ, because the active 'get' will sit on the listener until a message arrives... and once/if the timeout interval expires, it will just submit another get, which remains active on the queue until a message arrives.
So my question is: what is the benefit of a smaller receiveTimeout interval? If we are always going to process a message when it arrives, why on earth would we want to poll the queue every second?
We are running many large applications, and the polling is simply running the CPU usage/bill through the roof, and I cannot find a justification for this.
Yes - the 1 second receive timeout can be very CPU intensive with a large number of queues.
The general idea behind the DefaultMessageListenerContainer was to wait for a bit (1 second seems a very short wait period) and then, if no message arrives, tear everything down and do a full reconnect. This is a kind of poor man's error handling: "If I haven't heard from the broker, assume that something is broken, drop everything and reconnect." If the reconnect were not so expensive, it might not be a bad strategy. The same goes if you have only one queue, or if you expect 10 messages a second and do want to reconnect when a second goes by without one. With a reasonable number of destinations, the reconnect traffic can get downright abusive.
For IBM MQ, failures on the JMS connection/session are reliably picked up. You don't have the "it just sits there not getting any messages for some reason" scenario, so setting the timeout to 10 minutes (or whatever) would be fine.
Note that if you are running in a JEE application server, and your JMS connections are managed by the JCA, then that layer is responsible for detecting bad connections and you don't have to worry about it up in the application layer.
With Camel and for SpringBoot GitHub might be useful.
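To put rough numbers on the polling cost discussed above, here is a back-of-the-envelope sketch. It assumes an idle queue where every receive call runs into its timeout; the function name and the consumer counts are made up for illustration and are not part of Spring.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def idle_receives_per_day(receive_timeout_s, consumers=1):
    """Upper bound on timed-out receive calls per day on an idle queue."""
    return consumers * SECONDS_PER_DAY // receive_timeout_s

print(idle_receives_per_day(1))                 # 1-second timeout, one consumer
print(idle_receives_per_day(600))               # 10-minute timeout, one consumer
print(idle_receives_per_day(1, consumers=20))   # 1-second timeout, 20 consumers
```

Under these assumptions a single consumer with a 1-second timeout issues 86,400 receive calls a day on an idle queue, against 144 with a 10-minute timeout, while a message that does arrive is still handed to the waiting receive right away in both cases.
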

Does MassTransit's retry behavior block consuming other messages?

Given MassTransit is configured with a concurrency of 1
and has a retry policy of 1 hour for failed messages
and the queue starts with 2 messages
and consuming the first message fails:
Does MassTransit
1) wait for an hour before trying the first message again while the second message stays enqueued
or
2) wait for an hour before trying the first message again while proceeding to try the second message?
Simple answer: 1.
There are two ways to retry using MassTransit.
.UseMessageRetry(r => r.???);
This is in-memory and keeps the message locked. It is also an active message consumption, so if a prefetch count or concurrency limit is used, it will continue to count towards that limit.
.UseScheduledRedelivery(r => r.???);
This reschedules the message for delivery using a scheduler (which may be supported by the broker, or via Quartz.NET). It does not block subsequent messages and will enqueue the message for future delivery.
Both are documented here.
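The difference between the two approaches can be sketched with a toy model of a single consumer slot (concurrency of 1). This is not MassTransit code: the function `consume`, the strategy names, and the one-second handling time are assumptions chosen only to mirror the scenario in the question.

```python
def consume(messages, fails_first_time, strategy, delay=3600, handle=1):
    """Return {message: time it was successfully consumed} for one consumer slot.

    strategy: "in_memory_retry" keeps the slot busy while waiting to retry;
              "scheduled_redelivery" releases the slot and re-enqueues later.
    """
    now = 0
    done = {}
    redeliveries = []  # (due time, message) handed to a hypothetical scheduler
    for msg in messages:
        now += handle  # time spent on the first attempt
        if msg not in fails_first_time:
            done[msg] = now
        elif strategy == "in_memory_retry":
            now += delay + handle  # slot stays occupied, then the retry succeeds
            done[msg] = now
        elif strategy == "scheduled_redelivery":
            redeliveries.append((now + delay, msg))  # slot is free again
        else:
            raise ValueError("unknown strategy: " + strategy)
    for due, msg in sorted(redeliveries):
        now = max(now, due) + handle  # redelivered message succeeds
        done[msg] = now
    return done
```

With two messages A and B, where A fails once, the model gives B a completion time of 3603 seconds under in-memory retry and 2 seconds under scheduled redelivery. A completes at 3602 seconds in both cases.
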

AWS SQS - Queue not delivering any messages until Visibility Timeout expires for one message

EDIT: Solved this one while I was writing it up :P -- I love those kind of solutions. I figured I'd post it anyway, maybe someone else will have the same problem and find my solution. Don't care about points/karma, etc. I just already wrote the whole thing up, so figured I'd post it and the solution.
I have an SQS FIFO queue. It is using a dead letter queue. Here is how it had been configured:
I have a single producer microservice, and I have 10 ECS images that are running as consumers.
It is important that we process the messages close to the time they are delivered in the queue for business reasons.
We're using a fairly recent version of the AWS SDK Golang client package for both producer and consumer code (if important, I can go look up the version, but it is not terribly outdated).
I capture the logs for the producer so I know exactly when messages were put in the queue and what the messages were.
I capture aggregate logs for all the consumers, so I have a full view of all 10 consumers and when messages were received and processed.
Here's what I see under normal conditions looking at the logs:
1) Message put in the queue at time x
2) Message received by one of the 10 consumers at time x
3) Message processed by consumer successfully
4) Message deleted from queue by consumer at time x + (0-2 seconds)
5) Repeat ad infinitum for up to about 700 messages / day at various times per day
But the problem I am seeing now is that some messages are not being processed in a timely manner. Occasionally we fail processing a message deliberately b/c of the state of the system for that message (e.g. maybe users still logged in, so it should back off and retry...which it does). The problem is if the consumer fails a message it is causing the queue to stop delivering any other messages to any other consumers.
"Failure to process a message" here just means the message was received, but the consumer declared it a failure, so we just log an error, and do not proceed to delete it from the queue. Thus, the visibility timeout (here 5m) will expire and it will be re-delivered to another consumer and retried up to 10 times, after which it will go to the dead letter queue.
After delving into the logs and analyzing it, here's what I'm seeing:
1) Process begins like above (message produced, consumed, deleted).
2) New message received at time x by consumer
3) Consumer fails -- logs error and just returns (does not delete)
4) Same message is received again at time x + 5m (visibility timeout)
5) Consumer fails -- logs error and just returns (does not delete)
6) Repeat up to 10x -- message goes to dead-letter queue
7) New message received but it is now 50 minutes late!
8) Now all messages that were put in the queue between steps 2-7 are 50 minutes late (5m visibility timeout * 10 retries)
All the docs I've read tell me the queue should not behave this way, but I've verified it several times in our logs. Sadly, we don't have a paid AWS support plan, or I'd file a ticket with them. But just consider the fact that we have 10 separate consumers all reading from the same queue. They only read from this queue; we don't use any other queues.
For de-duplication we are using the automated hash of the message body. Messages are small JSON documents.
My expectation would be if we have a single bad message that causes a visibility timeout, that the queue would still happily deliver any other messages it has available while there are available consumers.
OK, so turns out I missed this little nugget of info about FIFO queues in the documentation:
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/FIFO-queues.html
"When you receive a message with a message group ID, no more messages for the same message group ID are returned unless you delete the message or it becomes visible."
I was indeed using the same Message Group ID. Hadn't given it a second thought. Just be aware, if you do that and any one of your messages fails to process, it will back up all other messages in the queue, until the time that the message is finally dealt with. The solution for me was to change the message group id. There is some business logic id I can postfix on it that will work for me.
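The effect of the message group ID can be sketched with a small model. This is not the AWS SDK: the function names, message IDs and group names below are hypothetical, and the model simplifies by handing out at most one message per group at a time.

```python
def receivable(queue, in_flight_groups):
    """Message ids a FIFO queue could hand out right now.

    queue:            list of (message_id, group_id) in queue order.
    in_flight_groups: group ids with a received-but-not-deleted message
                      whose visibility timeout has not expired yet.
    """
    blocked = set(in_flight_groups)
    available = []
    for message_id, group_id in queue:
        if group_id not in blocked:
            available.append(message_id)
            blocked.add(group_id)  # later messages of this group must wait
    return available

def worst_case_block_minutes(visibility_timeout_min, max_receive_count):
    """How long one always-failing message can hold up its group."""
    return visibility_timeout_min * max_receive_count

# One shared group: a failing message in flight blocks everything behind it.
print(receivable([("m2", "orders"), ("m3", "orders")], {"orders"}))
# Group id suffixed with a business id: other messages keep flowing.
print(receivable([("m2", "orders-17"), ("m3", "orders-18")], {"orders-16"}))
print(worst_case_block_minutes(5, 10))
```

In this model the shared group returns nothing while the failed message is in flight, whereas per-business-id groups return both waiting messages. The 5-minute timeout multiplied by 10 receives reproduces the 50-minute delay seen in the logs.
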

jms:message-driven-channel-adapter should not poll for messages older than 30 mins

I want to poll for messages in a queue which are not older than 30 mins.
How do I do that with jms:message-driven-channel-adapter ?
Please help.
Such functionality is not supported by the JMS specification.
On the producer side, you can set a time to live on the message which will cause the message to be removed if not consumed within that time.
You could use a selector to query messages based on a timestamp header, but I have to say that selectors usually don't perform well.
A topic would be more appropriate for this kind of logic (messages that expire after a while), but I don't know whether it would suit your business logic, because a message in a topic is received by every subscribed consumer/listener.
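As a rough sketch of the timestamp idea, the snippet below builds a selector string on the standard JMSTimestamp header and shows the equivalent per-message age check. The function names and the 30-minute constant are assumptions for illustration. Note that a selector string is fixed when the consumer is created, so the cutoff baked into it does not move forward on its own; the `is_fresh` check is the kind of test a listener could apply to each message instead.

```python
MAX_AGE_MS = 30 * 60 * 1000  # 30 minutes, in milliseconds

def selector_for(now_ms, max_age_ms=MAX_AGE_MS):
    """Build a JMS selector matching messages sent within max_age_ms of now_ms."""
    return "JMSTimestamp >= {}".format(now_ms - max_age_ms)

def is_fresh(jms_timestamp_ms, now_ms, max_age_ms=MAX_AGE_MS):
    """True if the message was sent no more than max_age_ms before now_ms."""
    return now_ms - jms_timestamp_ms <= max_age_ms

print(selector_for(2000000))
print(is_fresh(200000, 2000000), is_fresh(199999, 2000000))
```
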

Spring DefaultMessageListenerContainer And ActiveMQ

I have configured Spring DefaultMessageListenerContainer as ActiveMQ consumer consuming messages from a queue. Let's call it "Test.Queue"
I have this code deployed in 4 different machines and all the machines are configured to the same ActiveMQ instance to process the messages from the same "Test.Queue" queue.
I set the max consumer size to 20. As soon as all 4 machines are up and running, I see the consumer count against the queue as 80 (4 * max consumer size = 80).
Everything is fine when the number of messages produced and sent to the queue grows high.
When there are thousands of messages and, say, one of the 80 consumers gets stuck, ActiveMQ freezes and stops sending messages to the other consumers.
All messages are stuck in ActiveMQ forever.
As I have 4 machines with up to 80 consumers , I have no clue as to see which consumer failed to acknowledge.
I go stop and restart all the 4 machines and when I stop the machine that has the bad consumer which got stuck, then messages starts flowing again.
I don't know how to configure DefaultMessageListenerContainer to abandon the bad consumer and signal ActiveMQ immediately to start sending messages.
I was able to create the scenario even without Spring as follows:
I produced up to 5000 messages and sent them to the "Test.Queue" queue
I created 2 consumers (A and B). In consumer B's onMessage() method, I put the thread to sleep for a long time (Thread.sleep(Long.MAX_VALUE)) whenever current time % 13 is 0.
Ran these 2 consumers.
Went to Active MQ and found that the queue has 2 consumers.
Both A and B are processing messages
At some point consumer B's onMessage() gets called and puts the thread to sleep, once the condition current time % 13 is 0 is satisfied.
The consumer B is stuck and it can't acknowledge to the broker
I went back to Active MQ web console, still see the consumers as 2, but no messages are dequeued.
Now I created another consumer C and ran it to consume.
Only the consumer count in ActiveMQ went up to 3 from 2.
But consumer C is not consuming anything: the broker is not sending any messages and holds them all, as it is still waiting for consumer B to acknowledge.
I also noticed that consumer A is not consuming anything.
When I kill consumer B, all messages are drained.
Let's say A, B, and C are managed by Spring's DefaultMessageListenerContainer. How do I tweak the DefaultMessageListenerContainer to take the bad consumer (in my case consumer B) out of the pool after it has failed to acknowledge for X seconds, and signal the broker immediately so that it does not hold on to messages forever?
Thanks for your time.
Appreciate if I get a solution to this problem.
Here are a few options to try:
Set the queue prefetch to 0 to promote better distribution across consumers and reduce 'stuck' messages on specific consumers. See http://activemq.apache.org/what-is-the-prefetch-limit-for.html
Set "?useKeepAlive=false&wireFormat.maxInactivityDuration=20000" on the connection to time out the slow consumer after a specified period of inactivity.
Set the queue policy "slowConsumerStrategy->abortSlowConsumer", again to time out a slow consumer:
<policyEntry ...
  ...
  <slowConsumerStrategy>
    <abortSlowConsumerStrategy />
  </slowConsumerStrategy>
  ...
</policyEntry>

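To see why the prefetch setting matters when one consumer hangs, here is a deliberately simplified model. It is not ActiveMQ code: it assumes the broker deals messages to consumers round-robin, that healthy consumers acknowledge instantly, and that a stuck consumer never acknowledges. The function name and parameters are made up for illustration.

```python
def stranded_on_stuck_consumer(total_messages, consumers, prefetch):
    """Messages held by one stuck consumer that never acknowledges.

    Consumer 0 is the stuck one. With prefetch 0 the consumer pulls one
    message at a time, so it can hold at most the message it is working on.
    """
    capacity = max(prefetch, 1)
    held = 0
    for i in range(total_messages):
        # Round-robin: every `consumers`-th message is offered to consumer 0,
        # which accepts it only while its prefetch buffer has room.
        if i % consumers == 0 and held < capacity:
            held += 1
    return held

print(stranded_on_stuck_consumer(5000, 2, 1000))  # large prefetch buffer
print(stranded_on_stuck_consumer(5000, 2, 0))     # prefetch disabled
```

Under these assumptions, a prefetch of 1000 leaves 1000 of the 5000 messages parked on the stuck consumer, while a prefetch of 0 strands only the one message it is processing. The remaining messages stay on the broker for other consumers.
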