When sending SMS messages, I notice that outgoing messages tend to get delayed for 15-30 seconds in something called the "outbound queue" as noted on the Twilio logs screen.
For example:
This is concerning because it leads to out-of-order delivery. I sent the above message, waited 5 seconds, then sent this one (same from, same recipient):
and the second one was delivered first. I understand the speed and success of delivering varies by carrier load, but I am expecting a FIFO process from Twilio's side.
Is the outbound queue owned entirely by Twilio, and is it affected by carrier delivery time, i.e., is there back pressure from slow carrier delivery? Is it a single queue shared by all API clients, or one per API client? What causes a message enqueued later to be dequeued earlier?
Related
I'm running a Go service that uses the Paho Go MQTT client for subscribing to a topic. The clients that produce the MQTT messages (also Paho, but on Android devices) log when they produce and my service logs when it receives. As you can see from this graph, there seems to be a pretty consistent "cap" right below 36.000 messages per day on the receiving side. The graphs follow each other almost perfectly up to the cap, but then it seems that the go service caps out on slightly below 600 messages per minute, which means around 10 msgs per second.
Where should I look for the solution to this? I cannot find any setting (options) that could explain this cap.
As per the comments paho.mqtt.golang defaults to ordered delivery of messages (the MQTT spec provides some guarantees re message ordering and and calling handlers in a go routine may break this). The upshot of this is that messages will be delivered one-by-one and, if your handler is not keeping up, a queue may form (at QOS1+ the broker needs to retain messages as it may be necessary to resend them).
Some brokers limit the number of messages queued for a client; for example the max_queued_messages option in Mosquitto defaults to 1000 (this default was lower in Mosquitto 1.X) and, if the queue exceeds the limit, "messages will be silently dropped".
This is what appears to have been happening here; the application was not keeping up with incoming messages so the broker began dropping messages when the queue exceeded a limit.
In many cases using the paho.mqtt.golangoption ClientOptions.SetOrderMatters(false) will help; with this option set the message handler will be called in a separate go routine (so the handler must be threadsafe). Alternatively start a go routine within the handler but note that this approach results in the ACK being sent before the handler completes (which may result in message loss if your application terminates unexpectedly).
EDIT: Solved this one while I was writing it up :P -- I love those kind of solutions. I figured I'd post it anyway, maybe someone else will have the same problem and find my solution. Don't care about points/karma, etc. I just already wrote the whole thing up, so figured I'd post it and the solution.
I have an SQS FIFO queue. It is using a dead letter queue. Here is how it had been configured:
I have a single producer microservice, and I have 10 ECS images that are running as consumers.
It is important that we process the messages close to the time they are delivered in the queue for business reasons.
We're using a fairly recent version of the AWS SDK Golang client package for both producer and consumer code (if important, I can go look up the version, but it is not terribly outdated).
I capture the logs for the producer so I know exactly when messages were put in the queue and what the messages were.
I capture aggregate logs for all the consumers, so I have a full view of all 10 consumers and when messages were received and processed.
Here's what I see under normal conditions looking at the logs:
Message put in the queue at time x
Message received by one of the 10 consumers at time x
Message processed by consumer successfully
Message deleted from queue by consumer at time x + (0-2 seconds)
Repeat ad infinitum for up to about 700 messages / day at various times per day
But the problem I am seeing now is that some messages are not being processed in a timely manner. Occasionally we fail processing a message deliberately b/c of the state of the system for that message (e.g. maybe users still logged in, so it should back off and retry...which it does). The problem is if the consumer fails a message it is causing the queue to stop delivering any other messages to any other consumers.
"Failure to process a message" here just means the message was received, but the consumer declared it a failure, so we just log an error, and do not proceed to delete it from the queue. Thus, the visibility timeout (here 5m) will expire and it will be re-delivered to another consumer and retried up to 10 times, after which it will go to the dead letter queue.
After delving into the logs and analyzing it, here's what I'm seeing:
Process begins like above (message produced, consumed, deleted).
New message received at time x by consumer
Consumer fails -- logs error and just returns (does not delete)
Same message is received again at time x + 5m (visibility timeout)
Consumer fails -- logs error and just returns (does not delete)
Repeat up to 10x -- message goes to dead-letter queue
New message received but it is now 50 minutes late!
Now all messages that were put in the queue between steps 2-7 are 50 minutes late (5m visibility timeout * 10 retries)
All the docs I've read tells me the queue should not behave this way, but I've verified it several times in our logs. Sadly, we don't have a paid AWS support plan, or I'd file a ticket with them. But just consider the fact that we have 10 separate consumers all reading from the same queue. They only read from this queue. We don't have any other queues it is using.
For de-duplication we are using the automated hash of the message body. Messages are small JSON documents.
My expectation would be if we have a single bad message that causes a visibility timeout, that the queue would still happily deliver any other messages it has available while there are available consumers.
OK, so turns out I missed this little nugget of info about FIFO queues in the documentation:
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/FIFO-queues.html
When you receive a message with a message group ID, no more messages
for the same message group ID are returned unless you delete the
message or it becomes visible.
I was indeed using the same Message Group ID. Hadn't given it a second thought. Just be aware, if you do that and any one of your messages fails to process, it will back up all other messages in the queue, until the time that the message is finally dealt with. The solution for me was to change the message group id. There is some business logic id I can postfix on it that will work for me.
I am creating NATS go lang Queue Subscriber client as follows,
nc.QueueSubscribe("foo", "my_queue", func(msg *nats.Msg) {
log.Printf("Message :%s", string(msg.Data))
})
So whenever i publish any message to "foo" subject then some time it is receiving and some time not.
e.g let say i sent 10 messages to above "foo" subject then it will receive 2 or 3 max.
My requirement is as follows,
There should be Queue Subscription.
All input events should be processed.
How to implement Queue Subscribe in concurrent mode.
Any help appreciated.
If you start multiple queue subscribers with the same name (in your example my_queue), then a message published on "foo" goes to only one of those queue subscribers.
I am not sure from your statement if you imply that the queue subscriber sometimes misses messages or not. Keep in mind one thing: there is no persistence in NATS (there is in NATS Streaming). So if you publish messages before the subscriber is created, and if there is no other subscriber on that subject, the messages will be lost.
If you were experimenting and starting the queue subscriber from one connection and then in the same application sending messages from another connection, it is possible that the server did not register the queue subscription before it started to receive messages (again, if you were using 2 connections). If that is the case, you would need to flush the connection after creating the subscription and before starting sending: nc.Flush().
Finally, there is nothing special to use queue subscribers in concurrent mode. This is what they are for: load balancing processing of messages on the same subject for subscribers belonging to the same group. The only thing you have to be careful of if you are creating multiple queue subscribers in the same application is either to not share the message handler or if you do, you need to use locking since the message handler would be concurrently invoked if messages arrive fast enough.
Does the prefetch config locks away the messages so that other consumers will not be able to consume them?
Do they reflect immediately for example, if I have 1000 messages, and I have a prefetch value of 1000 on my consumers, will one consumer "reserve" all those messages to its self?
The messages in a client's prefetch buffer are not dispatched to any other client until the client holding them closes and it has some outstanding messages.
If the client comes online and it is the sole consumer on the destination it will start prefetching right away, if there are other clients the destination and it is a Queue then the messages are round robin dispatched to the clients until their prefetch buffers are full. Multiple clients on a Queue act as load balancers.
I am writing a Message Handler for an ebXML message passing application. The message follow the Request-Response Pattern. The process is straightforward: The Sender sends a message, the Receiver receives the message and sends back a response. So far so good.
On receipt of a message, the Receiver has a set Time To Respond (TTR) to the message. This could be anywhere from seconds to hours/days.
My question is this: How should the Sender deal with the TTR? I need this to be an async process, as the TTR could be quite long (several days). How can I somehow count down the timer, but not tie up system resources for large periods of time. There could be large volumes of messages.
My initial idea is to have a "Waiting" Collection, to which the message Id is added, along with its TTR expiry time. I would then poll the collection on a regular basis. When the timer expires, the message Id would be moved to an "Expired" Collection and the message transaction would be terminated.
When the Sender receives a response, it can check the "Waiting" collection for its matching sent message, and confirm the response was received in time. The message would then be removed from the collection for the next stage of processing.
Does this sound like a robust solution. I am sure this is a solved problem, but there is precious little information about this type of algorithm. I plan to implement it in C#, but the implementation language is kind of irrelevant at this stage I think.
Thanks for your input
Depending on number of clients you can use persistent JMS queues. One queue per client ID. The message will stay in the queue until a client connects to it to retrieve it.
I'm not understanding the purpose of the TTR. Is it more of a client side measure to mean that if the response cannot be returned within certain time then just don't bother sending it? Or is it to be used on the server to schedule the work and do what's required now and push the requests with later response time to be done later?
It's a broad question...