Azure Queue GetMessage Conditionally - azure-cloud-services

We have multiple job messages being queued to a cloud storage queue. Worker roles call GetMessage() on their respective job queues (we have multiple queues) and process the messages.
We are now trying to have multiple workers process messages from the same queue, using dequeue visibility and instance identifiers. However, I cannot find a way for these workers to conditionally filter on GetMessage() so that each worker only receives the job messages allotted to it.
For example, if Worker A is looking for job messages that have WorkerName='WorkerA' as one of their properties or parameters, is there a way to get only the messages with WorkerName='WorkerA' from JobQueue, instead of calling JobQueue.GetMessage() and reading through every message to check?
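As far as the storage SDK goes, there is no server-side filter on GetMessage, so the pattern ends up being the read-and-inspect loop described above. A minimal sketch of that loop in Python, assuming the azure-storage-queue SDK, a JSON message body, and a WorkerName property (the queue name, connection string, and process_job handler are all hypothetical):

import json
from azure.storage.queue import QueueClient

queue = QueueClient.from_connection_string(CONNECTION_STRING, "job-queue")

for msg in queue.receive_messages(visibility_timeout=30):
    body = json.loads(msg.content)
    if body.get("WorkerName") != "WorkerA":
        # Not ours: make the message visible again for another worker.
        queue.update_message(msg, visibility_timeout=0)
        continue
    process_job(body)  # hypothetical job handler
    queue.delete_message(msg)

Note that releasing a message this way still increments its dequeue count, which is one reason this approach scales poorly compared to one queue per worker.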

Related

Design of a system where results from a separate worker process need to be sent back to the correct producer thread

I'm designing a system where an HTTP service with multiple threads accepts requests to perform work. These requests are placed into a multiprocessing queue and sent downstream to a worker process where the work is performed (let's assume we have a reasonable expectation that the work can be handled quickly and the HTTP threads aren't blocking for long).
The issue that I can't figure out how to solve is: once the worker process is done processing the request, how are the results returned to the specific producer that produced the request?
I considered having another multiprocessing queue - a "results" queue - that each producer has a handle to, and they can wait on this queue for the results. The issue is that there's no guarantee that a specific producer will pull the results for their request from this queue, it might go to some other producer, and that other producer won't hold the open connection to the requesting HTTP client so it won't be able to do anything with the results.
I've included a simple system diagram below that shows the producer threads and worker process.
One solution here would be to have the worker process write the results to some data store, e.g. Redis, under a random key created by the producer, and the producer could watch this key for the result. However, I would prefer to avoid adding an external storage system if possible as the overhead of serializing to/from Redis would be non-trivial in some cases.
I don't think the language should matter here, but in case it does, this would be developed in Python using some standard microframework (like FastAPI).
EDIT - I thought of one possible solution: I can have another thread in the producer process that is responsible for reading from a "response" multiprocessing queue fed by the worker process. All other producer threads can then query a thread-safe data structure owned by this "response reader" thread for their specific results (which are placed under a unique key generated by the producer). A sketch of this idea follows below.
The main issue I'm struggling with now is how to scale this to multiple producer processes (each with multiple producer threads) and multiple distinct worker processes (worker A handles different jobs than worker B).
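A minimal sketch of the response-reader idea from the EDIT, using only Python's standard library (all names here are hypothetical):

import multiprocessing as mp
import threading
import uuid

work_q = mp.Queue()      # producer threads -> worker process
response_q = mp.Queue()  # worker process -> this producer process

_results = {}  # request_id -> result
_events = {}   # request_id -> threading.Event
_lock = threading.Lock()

def response_reader():
    # Single thread that owns the response queue and routes each result
    # back to whichever producer thread is waiting on that request id.
    while True:
        request_id, result = response_q.get()
        with _lock:
            _results[request_id] = result
            event = _events.get(request_id)
        if event:
            event.set()

def submit_and_wait(payload, timeout=30):
    request_id = uuid.uuid4().hex  # unique key generated by the producer
    event = threading.Event()
    with _lock:
        _events[request_id] = event
    work_q.put((request_id, payload))
    if not event.wait(timeout):
        raise TimeoutError(request_id)
    with _lock:
        _events.pop(request_id, None)
        return _results.pop(request_id)

threading.Thread(target=response_reader, daemon=True).start()

For multiple producer processes, each process would own its own response queue and reader thread, and the worker would write each result back to the response queue associated with the process the request came from.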

Make a JMS queue aware of the state of the events being processed. Is it possible to configure this in ActiveMQ?

I'm trying to configure a queue that is aware of the events that are being processed.
Questions
Does this make sense? :)
Is it possible to configure/customize ActiveMQ?
Are there any other libraries that can be "easily" configured to handle such cases? Kafka?
Problem
The queue contains events. Each event is associated with an object. A consumer takes the event from the queue and performs a task. Each event should be taken by exactly one consumer.
Constraints
Events for the same object cannot be processed concurrently.
But events for different objects should be processed in parallel.
Example
The queue is
ObjectA-Event1
ObjectA-Event2
ObjectB-Event1
ObjectC-Event1
Consumer1 should receive ObjectA-Event1 from the queue. Consumer2 should receive ObjectB-Event1, not ObjectA-Event2. ObjectA-Event2 should become available to consumers only when the first consumer completes the task for ObjectA-Event1.
It looks to me like you should use message groups. Messages for each object should be in the same group so that they are received by the same consumer and processed serially. Messages in different groups are free to be processed by different consumers.
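For illustration, a hedged sketch of the producer side over STOMP (stomp.py against an ActiveMQ broker with the STOMP connector enabled; host, credentials, and queue name are assumptions). ActiveMQ delivers all messages that share a JMSXGroupID to the same consumer, in order:

import stomp

conn = stomp.Connection([("localhost", 61613)])
conn.connect("admin", "admin", wait=True)

events = [("ObjectA", "Event1"), ("ObjectA", "Event2"),
          ("ObjectB", "Event1"), ("ObjectC", "Event1")]
for obj, event in events:
    conn.send(
        destination="/queue/events",
        body=f"{obj}-{event}",
        headers={"JMSXGroupID": obj},  # the group key is the object id
    )
conn.disconnect()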

RabbitMQ Bunny Parallel Consumers

I have built an application which consists of one publisher, several queues, and several consumers for each queue. The consumers on a queue share that queue's channel; other queues use different channels. I am observing that tasks for different queues are worked on in parallel, but for a single queue this is not happening: if I publish several messages at once to a specific queue, only one consumer works while the others wait until that work is finished. What should I do so the consumers work in parallel?
workers.each do |worker|
  # All of these consumers share one channel, whose work pool has a
  # single thread by default, so deliveries are dispatched serially.
  worker.on_delivery do |delivery_info, metadata, payload|
    perform_work(delivery_info, metadata, payload)
  end
  queue.subscribe_with(worker)
end
This is how I register all the consumers for a specific queue. The operation perform_work(_,_,_) is rather expensive and takes several seconds to complete.
RabbitMQ works off the concept of channels, and channels are generally not intended to be shared between threads. Moreover, a channel's work thread pool has a size of one by default. A channel is analogous to a session.
In your case, you have multiple consumers sharing a queue and channel, and performing a long-duration job within the event handler for the channel.
There are two ways to work around this:
Allocate a channel per consumer, or
Set the work pool size of the channel on creation (see this documentation).
I would advocate one channel per consumer, since it has a lower chance of causing unintended side effects.
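The question is Bunny-specific, but the shape of option 1 translates directly. Here is a hedged Python sketch using pika, with one connection and channel per consumer thread so a slow handler never blocks the other consumers (queue name, worker count, and the perform_work stub are assumptions):

import threading
import time
import pika

def perform_work(body):
    time.sleep(5)  # stand-in for the expensive, seconds-long job

def consumer(worker_id):
    # Each consumer gets its own connection and channel; neither is
    # shared across threads.
    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    ch = conn.channel()
    ch.queue_declare(queue="jobs", durable=True)
    ch.basic_qos(prefetch_count=1)  # at most one unacked message each

    def on_message(channel, method, properties, body):
        perform_work(body)
        channel.basic_ack(delivery_tag=method.delivery_tag)

    ch.basic_consume(queue="jobs", on_message_callback=on_message)
    ch.start_consuming()

threads = [threading.Thread(target=consumer, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()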

AWS Lambda processing stream from DynamoDB

I'm trying to create a Lambda function that consumes a stream from a DynamoDB table. However, I was wondering about the best practice for handling data that was not processed because of errors during execution. For example, if my Lambda failed and I lost part of the stream, what is the best way to reprocess the lost data?
This is handled for you. DynamoDB Streams, like Kinesis Streams, will resend records until they have been successfully processed. When you are using Lambda to process the stream, that means successfully exiting the function. If there is an error and the function exits unexpectedly, the DynamoDB stream will simply resend the record that was being processed.
The good thing is that you are guaranteed at-least-once processing; however, there are some things you need to look out for. Like Kinesis Streams, DynamoDB Streams are guaranteed to process records in order. As a side effect, when a record fails to process, it is retried until it is successfully processed or it expires from the stream (possibly days later), blocking all records behind it in the stream.
How you solve for this depends on the needs of your application. If you need at-least-once processing but don't need to guarantee that all records are processed in order, I would just drop the records into an SQS queue and do the processing off of the queue. SQS queues will also retry records that aren't successfully processed; however, unlike DynamoDB and Kinesis Streams, records will not block each other in the queue. If you encounter an error when transferring a record from the DynamoDB Stream to the SQS queue, you can just retry, though this may introduce duplicates in the SQS queue.
If order is critical or duplicates can't be tolerated, you can use an SQS FIFO queue. SQS FIFO queues are similar to (standard) SQS queues except that they are guaranteed to deliver messages to the consumer in order and have a deduplication window (5 minutes) during which any duplicates added to the queue are discarded.
In both cases, when using SQS queues to process messages, you can set up a Dead Letter Queue to which messages are automatically sent if they fail to be processed N times.
TLDR: Use SQS Queues.
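A hedged sketch of that hand-off (a Lambda handler forwarding DynamoDB Stream records into SQS with boto3; the queue URL is a placeholder):

import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/jobs"  # hypothetical

def handler(event, context):
    for record in event["Records"]:
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps(record["dynamodb"]),
        )
    # Any exception here makes the stream retry the whole batch, which is
    # exactly the at-least-once/duplicates trade-off described above.

For a FIFO queue you would additionally pass MessageGroupId (for ordering) and MessageDeduplicationId (for the 5-minute dedup window) to send_message.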
Updating this thread as all the existing answers are stale.
AWS Lambda now supports DLQs for synchronous stream reads from DynamoDB table streams.
With this feature in place, here is the flow that I would recommend (a configuration sketch follows the list):
Configure the event source mapping to include the DLQ ARN and set the retry-attempt count. After that many retries, the batch metadata is moved to the DLQ.
Set up an alarm on DLQ message visibility to get alerted about impacted records.
The DLQ message can be used to retrieve the impacted stream records using the KCL library.
ProTip: you can use the attribute "Bisect on Function Error" to enable batch splitting. With this option, Lambda can narrow down to the impacted record.
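A hedged configuration sketch for this flow with boto3 (ARNs and names are placeholders):

import boto3

lambda_client = boto3.client("lambda")
lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:dynamodb:us-east-1:123456789012:table/my-table/stream/2020-01-01T00:00:00.000",
    FunctionName="my-stream-processor",
    StartingPosition="LATEST",
    MaximumRetryAttempts=3,            # after this, batch metadata goes to the DLQ
    BisectBatchOnFunctionError=True,   # the "Bisect on Function Error" ProTip
    DestinationConfig={
        "OnFailure": {"Destination": "arn:aws:sqs:us-east-1:123456789012:stream-dlq"}
    },
)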
DynamoDB Streams invokes the Lambda function for each event until it is successfully processed (until the code calls the success callback).
If an error occurs during execution, you need to handle it in code; otherwise the Lambda won't continue with the remaining messages in the stream.
If you need to process a message separately due to an error, you can push the message to a dead letter queue (with Amazon SQS) and continue with the remaining items in the stream. You can have separate logic to process the messages in this queue.

Azure Queues - Functions - Message Visibility - Workers?

I have some questions regarding the capabilities of Azure Queues, Functions, and workers. I'm not really sure how this works.
Scenario:
q-notifications is a queue in an Azure storage account.
f-process-notification is a function in Azure that is bound to q-notifications. Its job is to get the first message on the queue and process it.
In theory when a message is added to q-notifications, the function f-process-notification should be called.
Questions:
Does the triggered function replace the need for workers? In other words, is f-process-notification called each time a message is placed on the queue?
Suppose I place a message on the queue with a visibility timeout of 5 minutes; basically, I am queueing the message but it shouldn't be acted on until 5 minutes pass. Does the queue trigger f-process-notification immediately when the message is placed on the queue, or only when the message becomes visible, i.e. 5 minutes after it is placed on the queue?
In Azure Functions, each Function App instance running your queue-triggered function has its own listener for the target queue. It monitors the queue for new work using an exponential backoff strategy. When new items are added to the queue, the listener pulls multiple items off the queue (the batching behavior is configurable) and dispatches them in parallel to your function. If your function succeeds, the message is deleted; otherwise it remains on the queue to be reprocessed. To answer your question - yes, we respect any visibility timeout you specify. If a message is added with a 5-minute timeout, it will only be processed after that.
Regarding scale out - when N instances of your Function App are running they will all cooperate in processing the queue. Each queue listener will independently pull batches of messages off the queue to process. In effect, the work will be load balanced across the N instances. Exactly what you want :) Azure Functions is implementing all the complexities of the multiple consumer/worker pattern for you behind the scenes.
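For reference, a queue-triggered function along the lines of f-process-notification might look like this hedged sketch (Azure Functions Python v2 programming model; the queue and connection-setting names are assumptions):

import azure.functions as func

app = func.FunctionApp()

@app.queue_trigger(arg_name="msg", queue_name="q-notifications",
                   connection="AzureWebJobsStorage")
def f_process_notification(msg: func.QueueMessage) -> None:
    # Runs once per dequeued message; an unhandled exception leaves the
    # message on the queue for retry (and eventually the poison queue).
    payload = msg.get_body().decode("utf-8")
    print(f"processing {payload}")

And for question 2, the delayed message itself would be enqueued with something like queue.send_message(body, visibility_timeout=300) in the storage SDK; the trigger fires only once the message becomes visible.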
I typically use listener logic as opposed to triggers: the consumer(s) constantly monitor the queue for messages. If you have multiple consumers (for example, five instances of the consuming code in different Azure worker roles processing the same bus/queue), the first consumer to get the message wins (they are "competing"). This provides a scaling scenario common in an SOA architecture.
This article describes some of the ways to defer processing.
http://markheath.net/post/defer-processing-azure-service-bus-message
good luck!
