Azure Queues - Functions - Message Visibility - Workers?

I have some questions about the capabilities of Azure Queues, Functions, and Workers. I'm not really sure how they work together.
Scenario:
q-notifications is a queue in an Azure storage account.
f-process-notification is a function in Azure that is bound to q-notifications. Its job is to get the first message on the queue and process it.
In theory when a message is added to q-notifications, the function f-process-notification should be called.
Questions:
Does the triggered function replace the need to have workers? In other words, is f-process-notification called each time a message is placed in the queue?
Suppose I place a message on the queue that has a visibility timeout of 5 minutes. Basically I am queueing the message but it shouldn't be acted on until 5 minutes pass. Does the queue trigger f-process-notification immediately when the message is placed on the queue, or will it only trigger f-process-notification when the message becomes visible, i.e. 5 minutes after it is placed on the queue?

In Azure Functions, each Function App instance running your queue-triggered function will have its own listener for the target queue. It monitors the queue for new work using an exponential backoff strategy. When new items are added to the queue, the listener will pull multiple items off of the queue (batching behavior is configurable) and dispatch them in parallel to your function. If your function is successful, the message is deleted; otherwise it will remain on the queue to be reprocessed. To answer your question - yes, we respect any visibility timeout you specify. If a message is added with a 5-minute visibility timeout, it will only be processed after that time has passed.
Regarding scale out - when N instances of your Function App are running they will all cooperate in processing the queue. Each queue listener will independently pull batches of messages off the queue to process. In effect, the work will be load balanced across the N instances. Exactly what you want :) Azure Functions is implementing all the complexities of the multiple consumer/worker pattern for you behind the scenes.
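To make the behavior described above concrete, here is a toy in-memory model in Python. It is only a sketch of the semantics (initial visibility delay, batch receive, delete on success, reappearance after failure), not how Azure Functions or Azure Storage is implemented. The names ToyQueue, run_listener, initial_delay, and lease are made up for illustration, and time is passed in explicitly as a number of seconds.

```python
import itertools


class ToyQueue:
    """In-memory model of a queue with per-message visibility timeouts."""

    def __init__(self):
        self._messages = {}  # message id -> [body, visible_at]
        self._ids = itertools.count(1)

    def put(self, body, now, initial_delay=0):
        """Enqueue a message that stays invisible until now + initial_delay."""
        msg_id = next(self._ids)
        self._messages[msg_id] = [body, now + initial_delay]
        return msg_id

    def receive(self, now, batch_size=16, lease=30):
        """Return up to batch_size visible messages and hide them for `lease` seconds."""
        batch = []
        for msg_id, entry in self._messages.items():
            if len(batch) == batch_size:
                break
            if entry[1] <= now:
                entry[1] = now + lease  # invisible to other consumers while leased
                batch.append((msg_id, entry[0]))
        return batch

    def delete(self, msg_id):
        """Remove a message after it was processed successfully."""
        self._messages.pop(msg_id, None)


def run_listener(queue, handler, now):
    """One polling pass: fetch a batch and delete each message the handler accepts."""
    handled = []
    for msg_id, body in queue.receive(now):
        try:
            handler(body)
        except Exception:
            continue  # leave it on the queue; it reappears when the lease expires
        queue.delete(msg_id)
        handled.append(body)
    return handled
```

A message put with initial_delay=300 is not returned by any polling pass until 300 seconds have elapsed, which mirrors the 5-minute case in the question. Several listeners calling run_listener against the same ToyQueue would each receive different messages, because a leased message is hidden from the others.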

I typically use listener logic as opposed to triggers. The consumers constantly monitor the queue for messages. If you have multiple consumers, for example 5 instances of the consuming code in different Azure worker roles processing the same bus/queue, the first consumer to get the message wins (they are "competing"). This provides a scaling scenario common in a service-oriented architecture.
This article describes some of the ways to defer processing.
http://markheath.net/post/defer-processing-azure-service-bus-message
Good luck!

Related

Real-time monitoring of SQS queue in AWS

What's the best way to provide real-time monitoring of the total count of messages sent to an SQS queue?
I currently have a Grafana dashboard set up to monitor an SQS queue, but it seems to refresh about every two minutes. I'm looking to get something set up to update almost in real-time, e.g. refresh every second.
The queue I'm using consumes around 6,000 messages per minute.
Colleagues of mine have built something for real-time monitoring of uploads to an S3 bucket, using a lambda to populate a PostgreSQL DB and using Grafana to query this.
Is this the best way of achieving this? Is there a more efficient way?
SQS is not event driven - it must be polled. Therefore, there isn't an event each time a message is put into the queue or removed from it. With S3 to Lambda there is an event sent in pretty much real time every time an object has been created or removed.
You can change the polling interval for SQS and poll as fast as you'd like. But be aware that polling does have a cost. The first 1 million requests a month are free.
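As a rough illustration of polling, here is a small Python sketch that reads the approximate queue depth once per interval. It assumes a boto3-style SQS client is passed in (for example one created with boto3.client("sqs")); the function names queue_depth and sample_depth and the sleep parameter are invented for this example. Note that it reports the number of messages currently in the queue, not a running total of messages sent, and that every call is a billable request.

```python
import time


def queue_depth(sqs_client, queue_url):
    """Return approximate (visible, in_flight) message counts for one queue."""
    response = sqs_client.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=[
            "ApproximateNumberOfMessages",
            "ApproximateNumberOfMessagesNotVisible",
        ],
    )
    attrs = response["Attributes"]  # attribute values arrive as strings
    return (
        int(attrs["ApproximateNumberOfMessages"]),
        int(attrs["ApproximateNumberOfMessagesNotVisible"]),
    )


def sample_depth(sqs_client, queue_url, samples, interval_seconds=1.0, sleep=time.sleep):
    """Poll the queue `samples` times, pausing interval_seconds between polls."""
    readings = []
    for _ in range(samples):
        readings.append(queue_depth(sqs_client, queue_url))
        sleep(interval_seconds)
    return readings
```

The readings could then be written to whatever store the dashboard queries. The counts are approximate by design, so a one-second refresh will still show some jitter.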
I'm not sure what you're trying to accomplish (I'll address that after my ideas), but there are certainly a couple of ways you could do this. Each has pros and cons.
In every place you produce or consume messages, increment or decrement a CloudWatch metric (or Datadog, Librato, etc.). It's still polling-based, but you could get the granularity down (even with CloudWatch) to 15-60 seconds. The biggest problem here is that it's error-prone (what happens if the SQS message times out and gets reprocessed?).
Create a secondary queue. Each message that goes into this queue is either an "add" or a "delete" message. Attach a Lambda, container, or autoscaling group to process the queue and update metrics in an RDS or DynamoDB table. Query the table as needed.
Use a different queue processing system instead of SQS. I've seen RabbitMQ and Sensu used in very large environments; they will easily handle 6,000 messages per minute.
Keep in mind that there are a lot more metrics than just the number of messages in the queue. I've recently become really fond of ApproximateAgeOfOldestMessage, because it indicates whether messages are being processed without error. There is a blog post about the most helpful SQS metrics, called "How to Monitor Amazon SQS with CloudWatch".

How to handle side effects based on multiple events in a message driven microservice system?

We are currently working in a message-driven microservice environment, and some of our messages/events are event sourced (using Apache Kafka). Now we are struggling with implementing more complex business requirements, where we have to take multiple events into account to create new events and side effects.
In the current situation we are working with devices that can produce errors and we already process them and have a single topic which contains ERROR_OCCURRED and ERROR_RESOLVED events (so they are in order). We also make sure, that all messages regarding a specific device always go onto the same partition. And both messages share an ID that identifies that specific error incident. We already have a projection that consumes those events and provides an API for our customers, s.t. they can see all occurred errors and their current state.
Now we have to deal with the following requirement:
Reporting Errors
We need a push system that reports errors of devices to our external partners, but only after 15 minutes and if they have not been resolved in that timeframe. Our first approach was to consume all ERROR_RESOLVED events, store the IDs and have another consumer that is handling the ERROR_OCCURRED events in a delayed fashion (e.g. by only consuming the next ERROR_OCCURRED event on the topic if its timestamp is at least 15 minutes old). We would then be able to know if that particular error has already been resolved and does not need to be reported (since they share a common ID with the corresponding ERROR_RESOLVED event). Otherwise we send an HTTP request to our external partner and create an ERROR_REPORTED event on a new topic. Is there any better approach for delayed and conditional message processing?
We also have to take the following special use cases into account:
Service restarts: currently we are planning to keep the list of resolved errors in memory, so if a service restarts, that list has to be created from scratch. We could just replay the ERROR_RESOLVED messages, but that may take some time, and during that time no ERROR_OCCURRED events should be processed, because that could result in reporting errors that were resolved in less than 15 minutes without us being aware of it. Are there any good practices regarding replay vs. "normal" processing?
Scaling: we may increase or decrease the number of instances of our service at any time, so the partition assignment may change during runtime. That should not be a problem if we create a consumer group for each service instance when consuming the ERROR_RESOLVED events, s.t. every instance knows all resolved errors while still only handling the ERROR_OCCURRED events of its assigned partitions (in another consumer group which is shared by all instances). Is there a better approach for handling partition reassignment and internal state?
Thanks in advance!
For side effects, I would record all "side" actions in the event store. In your particular example, when it is time to send a notification, I would issue a SEND_NOTIFICATION command that emits a NOTIFICATION_SENT event. These events would be processed by some worker process that does the actual HTTP request.
Actually, I would elaborate on this even further: since notifications could fail, I would have, say, two events, NOTIFICATION_REQUIRED and NOTIFICATION_SENT, so we can retry failed notifications.
And finally your logic would be "if error was not resolved in 15 minutes and notification was not sent - send a notification (or just discard if it missed its timeframe)"
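The rule above can be sketched in a few lines of Python. This is only an illustration of the decision logic under some assumptions: events are given as (event_type, error_id, timestamp) tuples in log order, and the function names errors_to_report and pending_sends are made up here. It says nothing about how the events are read from Kafka or how state is rebuilt after a restart.

```python
from datetime import datetime, timedelta

REPORT_AFTER = timedelta(minutes=15)


def errors_to_report(events, now):
    """IDs of errors that are unresolved, not yet flagged, and at least 15 minutes old."""
    occurred_at = {}
    resolved = set()
    flagged = set()
    for event_type, error_id, timestamp in events:
        if event_type == "ERROR_OCCURRED":
            occurred_at.setdefault(error_id, timestamp)
        elif event_type == "ERROR_RESOLVED":
            resolved.add(error_id)
        elif event_type == "NOTIFICATION_REQUIRED":
            flagged.add(error_id)
    return [
        error_id
        for error_id, timestamp in occurred_at.items()
        if error_id not in resolved
        and error_id not in flagged
        and now - timestamp >= REPORT_AFTER
    ]


def pending_sends(events):
    """IDs with NOTIFICATION_REQUIRED but no NOTIFICATION_SENT yet (candidates for retry)."""
    required = []
    sent = set()
    for event_type, error_id, _ in events:
        if event_type == "NOTIFICATION_REQUIRED" and error_id not in required:
            required.append(error_id)
        elif event_type == "NOTIFICATION_SENT":
            sent.add(error_id)
    return [error_id for error_id in required if error_id not in sent]
```

For each ID returned by errors_to_report you would record NOTIFICATION_REQUIRED, and a worker would then attempt the HTTP call for everything in pending_sends and record NOTIFICATION_SENT on success. Because both functions are derived purely from the event log, they give the same answer after a replay.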

Performance and limitations of temporary queues

I want a bunch of several hundred client apps to create and use temporary queues at one instance of the middleware.
Are there some cons regarding performance why I shouldn't use temp queues? Are there limitations, for example on how many temp. queues can be created per HornetQ instance?
On a recent project we have switched from using temporary queues to using static queues on SonicMQ. We had implemented synchronous service calls over JMS where the response of each call would be delivered on a dedicated temporary queue, created by the consumer. During stress testing we noticed that the overhead of temporary queue creation and allocated resources started to play a bigger and bigger part when pushing the maximum throughput of the solution.
We changed the solution so it would use static queues between consumer and provider and use a selector to correlate on the JMSCorrelationID. This resulted in better throughput in our case. If you are planning to (re)create the temporary queues each time your client applications use them, it could start to impact performance when higher throughput rates are needed.
Note that selector performance can also start to play a role when the number of messages in a queue increases. In our case the solution was designed to hand off the messages as soon as possible and not play the role of a (storage) buffer in between consumer and provider. As such, the number of messages inside a queue would always be low.
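To illustrate the idea of one static reply queue plus correlation IDs, here is a toy Python model. It is not JMS and not how SonicMQ or HornetQ evaluate selectors; SharedReplyQueue, receive_selected, and new_correlation_id are names invented for this sketch.

```python
import uuid
from collections import deque


class SharedReplyQueue:
    """One static reply queue shared by every caller (stand-in for a broker queue)."""

    def __init__(self):
        self._messages = deque()

    def send(self, correlation_id, body):
        """Provider side: put a reply on the queue, tagged with the caller's ID."""
        self._messages.append((correlation_id, body))

    def receive_selected(self, correlation_id):
        """Mimic a selector: take the first message whose correlation ID matches."""
        for index, (cid, body) in enumerate(self._messages):
            if cid == correlation_id:
                del self._messages[index]
                return body
        return None  # nothing for this caller yet


def new_correlation_id():
    """Generate a unique ID that the caller attaches to its request."""
    return uuid.uuid4().hex
```

Each caller only ever takes the reply carrying its own ID, even when replies arrive out of order. The matching here is a linear scan over the queued messages, which is one way to picture why a selector gets more expensive as the queue gets deeper.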

SQS/SNS and Architecting For Disposable Computing ( EC2 SPOT Instances )

I have an application that reads a message from SQS (let's call the queue "p"), does computationally expensive image processing (step #1), uploads the result to S3, deletes the message from the queue "p", and then sends a notification to an SNS topic (this SNS topic routes the message to another queue called "q"). There is another application that reads from queue "q" and does the second stage of the image processing (it downloads the result of step #1 from S3 and does additional mathematical operations on that result).
I have a combination of regular instances + spot instances running the step #1 application.
I know that (because of the SQS visibility timeout concept) if the spot instances get shut down during the image processing phase, SQS makes the messages visible again to other consumers, so the non-spot EC2 instances will eventually do the work that the spot instances did not manage to complete due to the system shutdown.
Now my question is: what happens if the spot instances get shut down exactly after the delete but before a message is sent to SNS? How can we recover from such an event?
# PSEUDO CODE
msg = read message from queue
result = doWork(msg)
upload result to S3
delete msg
publish to sns about result
Cheers !
First of all, process A should not delete the message from its SQS queue until AFTER it has sent the SNS message to kick off the second process. Deleting the message from the queue is the very last thing you should do to signal that 'my work is done'. Until the SNS message is sent, the work is not done.
Secondly, one of the key things that you need to embrace when designing processes like this, (and especially when using spot instances) is the concept of Idempotence: http://en.wikipedia.org/wiki/Idempotence
A unary operation (or function) is idempotent if, whenever it is applied twice to any value, it gives the same result as if it were applied once
Furthermore: http://aws.amazon.com/sqs/faqs/#How_many_times_will_I_receive_each_message
Amazon SQS is engineered to provide “at least once” delivery of all messages in its queues. Although most of the time each message will be delivered to your application exactly once, you should design your system so that processing a message more than once does not create any errors or inconsistencies.
What this ultimately means is that, whether or not a spot instance gets shut down mid-process, there is a real possibility that a given message in an SQS queue will be delivered to multiple worker processes simultaneously, or to the same process more than once, either because SQS sent it twice or because the spot instance failed after the SNS message was sent but before the message was deleted from the SQS queue.
Without knowing exactly what your processing entails, I couldn't tell you how to make your process idempotent. But don't try to solve the problem 'what happens if the spot instance gets shut down mid-stream'; think about 'how do I design each step in the process so that it can be run multiple times with the same inputs and not cause any problems'. If you do that, you will kill two birds with one stone.
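As a rough sketch of both points (delete last, and make each step safe to repeat), here is some Python with the AWS calls replaced by plain callables and a dict standing in for S3. All names here (handle_message, handle_notification, store, seen, the "results/" key prefix) are hypothetical, and your real processing may need a different idempotency strategy.

```python
def handle_message(msg, store, publish, delete, do_work):
    """Process one queue message so that a crash at any point is safe to retry."""
    # Deterministic key: a retry overwrites the same object instead of adding a new one.
    key = f"results/{msg['id']}"
    # Skip the expensive step if an earlier attempt already uploaded the result.
    if key not in store:
        store[key] = do_work(msg["body"])
    # The notification may go out twice; the next stage must tolerate that.
    publish({"result_key": key})
    # Deleting is the last step: only now is the work "done".
    delete(msg)


def handle_notification(note, seen, stage_two):
    """Second stage: ignore notifications whose result was already processed."""
    if note["result_key"] in seen:
        return False
    stage_two(note["result_key"])
    seen.add(note["result_key"])
    return True
```

If the instance dies after publish but before delete, the message becomes visible again, the retry finds the result already stored, publishes a duplicate notification, and deletes the message. The second stage drops the duplicate because it has already seen that result key. In a real system the seen set would have to live somewhere durable, and stage_two itself should also be safe to repeat.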

Multi-Thread Processing in .NET

I already have a few ideas, but I'd like to hear some differing opinions and alternatives from everyone if possible.
I have a Windows console app that uses Exchange web services to connect to Exchange and download e-mail messages. The goal is to take each individual message object, extract metadata, parse attachments, etc. The app is checking the inbox every 60 seconds. I have no problems connecting to the inbox and getting the message objects. This is all good.
Here's where I am accepting input from you: When I get a message object, I immediately want to process the message and do all of the busy work explained above. I was considering a few different approaches to this:
Queuing the e-mail objects up in a table and processing them one-by-one.
Passing the e-mail object off to a local Windows service to do the busy work.
I don't think db queuing would be a good approach because, at times, multiple e-mail objects need to be processed. It's not fair if a low-priority e-mail with 30 attachments is processed before a high-priority e-mail with 5 attachments is processed. In other words, e-mails lower in the stack shouldn't need to wait in line to be processed. It's like waiting in line at the store with a single register for the bonehead in front of you to scan 100 items. It's just not fair. Same concept for my e-mail objects.
I'm somewhat unsure about the Windows service approach. However, I'm pretty confident that I could have an installed service listening, waiting on demand for an instruction to process a new e-mail. If I have 5 separate e-mail objects, can I make 5 separate calls to the Windows service and process without collisions?
I'm open to suggestions or alternative approaches. However, the solution must be presented using .NET technology stack.
One option is to do the processing in the console application. What you have looks like a standard producer-consumer problem with one producer (the thread that gets the emails) and multiple consumers. This is easily handled with BlockingCollection.
I'll assume that your message type (what you get from the mail server) is called MailMessage.
So you create a BlockingCollection<MailMessage> at class scope. I'll also assume that you have a timer that ticks every 60 seconds to gather messages and enqueue them:
private BlockingCollection<MailMessage> MailMessageQueue =
    new BlockingCollection<MailMessage>();
// Timer is created as a one-shot and re-initialized at each tick.
// This prevents the timer proc from being re-entered if it takes
// longer than 60 seconds to run.
System.Threading.Timer ProducerTimer = new System.Threading.Timer(
    TimerProc, null, TimeSpan.FromSeconds(60), TimeSpan.FromMilliseconds(-1));
void TimerProc(object state)
{
    var newMessages = GetMessagesFromServer();
    foreach (var msg in newMessages)
    {
        MailMessageQueue.Add(msg);
    }
    ProducerTimer.Change(TimeSpan.FromSeconds(60), TimeSpan.FromMilliseconds(-1));
}
Your consumer threads just read the queue:
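void MessageProcessor()
{
    // Blocks while the queue is empty; ends once CompleteAdding() has been
    // called and the queue has been drained.
    foreach (var msg in MailMessageQueue.GetConsumingEnumerable())
    {
        ProcessMessage(msg);
    }
}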
The timer will cause the producer to run once per minute. To start the consumers (say you want two of them):
var t1 = Task.Factory.StartNew(MessageProcessor, TaskCreationOptions.LongRunning);
var t2 = Task.Factory.StartNew(MessageProcessor, TaskCreationOptions.LongRunning);
So you'll have two threads processing messages.
If the processing is CPU-bound, it makes little sense to have more processing threads than you have available CPU cores. The producer thread presumably won't require a lot of CPU resources, so you don't have to dedicate a thread to it. It'll just slow down message processing briefly whenever it's doing its thing.
I've skipped over some detail in the description above, particularly cancellation of the threads. When you want to stop the program, but let the consumers finish processing messages, just kill the producer timer and set the queue as complete for adding:
MailMessageQueue.CompleteAdding();
The consumers will empty the queue and exit. You'll of course want to wait for the tasks to complete (see Task.Wait).
If you want the ability to kill the consumers without emptying the queue, you'll need to look into Cancellation.
The default backing store for BlockingCollection is a ConcurrentQueue, which is a strict FIFO. If you want to prioritize things, you'll need to come up with a concurrent priority queue that implements the IProducerConsumerCollection interface. .NET didn't have such a thing when this was written (or even a priority queue class; .NET 6 later added PriorityQueue<TElement, TPriority>, which is not thread-safe), but a simple binary heap that uses locks to prevent concurrent access would suffice in your situation; you're not talking about hitting this thing very hard.
Of course you'd need some way to prioritize the messages. Probably sort by number of attachments so that messages with no attachments are processed quicker. Another option would be to have two separate queues: one for messages with 0 or 1 attachments, and a separate queue for those with lots of attachments. You could have one of your consumers dedicated to the 0 or 1 queue so that easy messages always have a good chance of being processed first, and the other consumers take from the 0 or 1 queue unless it's empty, and then take from the other queue. It would make your consumers a little more complicated, but not hugely so.
If you choose to move the message processing to a separate program, you'll need some way to persist the data from the producer to the consumer. There are many possible ways to do that, but I just don't see the advantage of it.
I'm somewhat a novice here, but it seems like an initial approach could be to have a separate high-priority queue. Every time a worker is available to obtain a new message, it could do something like:
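' Serve a low-priority message first only if it has already waited too long.
If lowPriorityQueue.Count > 0 AndAlso DateTime.Now - lowPriorityQueue.Peek().AddedTime > maxWaitTime Then
    ProcessMessage(lowPriorityQueue.Dequeue())
ElseIf highPriorityQueue.Count > 0 Then
    ProcessMessage(highPriorityQueue.Dequeue())
ElseIf lowPriorityQueue.Count > 0 Then
    ProcessMessage(lowPriorityQueue.Dequeue())
End If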
In a single thread, while you can still have one message blocking the others, higher priority messages could be processed sooner.
Depending on how fast most messages get processed, the application could create a new worker on a new thread if the queues are getting too big or too old.
Please tell me if I'm completely off-base here though.

Resources