Design of a system where results from a separate worker process need to be sent back to the correct producer thread - multiprocessing

I'm designing a system where an HTTP service with multiple threads accepts requests to perform work. These requests are placed into a multiprocessing queue and sent downstream to a worker process where the work is performed (let's assume that we have a reasonable expectation that work can be handled quickly and the HTTP threads aren't blocking for a long time).
The issue that I can't figure out how to solve is - once the worker process is done processing the request, how are the results returned to the specific producer that produced a request?
I considered having another multiprocessing queue - a "results" queue - that each producer has a handle to, and they can wait on this queue for the results. The issue is that there's no guarantee that a specific producer will pull the results for their request from this queue, it might go to some other producer, and that other producer won't hold the open connection to the requesting HTTP client so it won't be able to do anything with the results.
I've included a simple system diagram below that shows the producer threads and worker process
One solution here would be to have the worker process write the results to some data store, e.g. Redis, under a random key created by the producer, and the producer could watch this key for the result. However, I would prefer to avoid adding an external storage system if possible as the overhead of serializing to/from Redis would be non-trivial in some cases.
I don't think the language should matter here, but in case it does, this would be developed in Python using some standard microframework (like FastAPI)
EDIT - I thought of one possible solution. I can have another thread in the producer process that is responsible for reading from a "response" multiprocessing queue from the worker process. All other producer threads can then query some thread-safe data structure within this "response reader" thread for their specific results (which will be placed under some unique key generated by the producer)
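A minimal sketch of the "response reader" idea described above, under some assumptions: every request and result is a (request_id, payload) tuple, a None value on either queue means "stop", and the names ResponseRouter, worker and submit are hypothetical. The thread-safe structure here is a dict of concurrent.futures.Future objects guarded by a lock, so each producer thread blocks only on its own result.

```python
import threading
import uuid
from concurrent.futures import Future


class ResponseRouter:
    """Single reader thread that hands each result to the thread waiting for it."""

    def __init__(self, results_queue):
        self._results = results_queue
        self._pending = {}              # request_id -> Future
        self._lock = threading.Lock()
        threading.Thread(target=self._run, daemon=True).start()

    def register(self):
        """Create a unique key and the Future its result will arrive on."""
        request_id = uuid.uuid4().hex
        future = Future()
        with self._lock:
            self._pending[request_id] = future
        return request_id, future

    def forget(self, request_id):
        """Drop a pending entry, e.g. after a timeout, so the dict cannot grow forever."""
        with self._lock:
            self._pending.pop(request_id, None)

    def _run(self):
        # iter(get, None) keeps reading until a None sentinel arrives.
        for request_id, result in iter(self._results.get, None):
            with self._lock:
                future = self._pending.pop(request_id, None)
            if future is not None:      # unknown or timed-out ids are ignored
                future.set_result(result)


def worker(requests, results):
    """Stand-in for the worker process: echoes the key back with the result."""
    for request_id, payload in iter(requests.get, None):
        results.put((request_id, payload.upper()))


def submit(router, requests, payload, timeout=5.0):
    """Called from a producer (HTTP) thread; blocks until its own result arrives."""
    request_id, future = router.register()
    requests.put((request_id, payload))
    try:
        return future.result(timeout=timeout)
    finally:
        router.forget(request_id)
```

The router only needs an object with a blocking get(), so in the real system requests and results would be multiprocessing.Queue instances and worker would be started with multiprocessing.Process(target=worker, args=(requests, results)). Because submit blocks the calling thread, this sketch fits handlers that run in a thread pool rather than directly on an event loop.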
The main issue I'm struggling with now is how to scale this to multiple producer processes (each with multiple producer threads) and multiple worker processes that are distinct (worker A handles different jobs from worker B)
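One possible way to approach this, offered as a sketch rather than a settled design: give every producer process its own reply queue, give every worker type its own request queue, and put the producer's id into each request so the worker knows where to reply. The function names, the tuple layout and the None stop sentinel below are all assumptions made for illustration.

```python
def send_request(request_queues, job_type, producer_id, request_id, payload):
    """Producer side: choose the request queue of the worker that handles job_type."""
    request_queues[job_type].put((producer_id, request_id, payload))


def serve(request_queue, reply_queues, handler):
    """Worker side: reply on the queue owned by the producer named in the request."""
    # iter(get, None) keeps reading until a None sentinel arrives.
    for producer_id, request_id, payload in iter(request_queue.get, None):
        reply_queues[producer_id].put((request_id, handler(payload)))
```

The requests carry a producer id instead of a queue object because multiprocessing.Queue instances are meant to be shared with child processes when those processes are created, not sent through another queue. Both dicts would therefore be built once at startup and passed to every process. Inside each producer process, a single reader thread on that process's reply queue can then match request_id to the waiting thread.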

Related

Access ProcessorContext::forward from multiple user threads

Given: DSL topology with KStream::transform. As part of Transformer::transform execution multiple messages are generated from the input one (it could be thousands of output messages from the single input message).
New messages are generated based on the data retrieved from the database. To speed up the process I would like to create multiple user threads to access data in DB in parallel. Upon generating a new message the thread will call ProcessorContext::forward to send the message downstream.
Is it safe to call ProcessorContext::forward from the different threads?
It is not safe and not allowed to call ProcessorContext#forward() from a different thread. If you try it, an exception will be thrown.
As a workaround, you could let all threads "buffer" their result data, and collect all data in the next call to process(). As an alternative, you could also schedule a punctuation that collects and forwards the data from the different threads.
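The buffering workaround can be illustrated independently of Kafka Streams. The following is a language-neutral sketch written in Python, not Kafka Streams code: BufferingProcessor, forward and lookup are hypothetical names, and drain plays the role of either the next process() call or a scheduled punctuation. The point is that worker threads only write to a thread-safe buffer, while forward is called solely from the thread that owns the processor.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


class BufferingProcessor:
    def __init__(self, forward, lookup, max_workers=4):
        self._forward = forward          # must only run on the owning thread
        self._lookup = lookup            # slow call, e.g. a database query
        self._buffer = queue.Queue()     # thread-safe hand-off from the workers
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def process(self, record):
        """Owning thread: forward whatever is ready, then start work on this record."""
        self.drain()
        self._pool.submit(self._expand, record)

    def _expand(self, record):
        """Worker thread: generate output messages but never call forward."""
        for message in self._lookup(record):
            self._buffer.put(message)

    def drain(self):
        """Owning thread: forward everything buffered so far and return the count."""
        count = 0
        while True:
            try:
                message = self._buffer.get_nowait()
            except queue.Empty:
                return count
            self._forward(message)
            count += 1

    def close(self):
        """Wait for outstanding work, then forward the remainder."""
        self._pool.shutdown(wait=True)
        return self.drain()
```

One consequence of this design is that output for a record is emitted later than the call that received it, so ordering and delivery guarantees need separate thought in a real topology.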

Multi-Thread Processing in .NET

I already have a few ideas, but I'd like to hear some differing opinions and alternatives from everyone if possible.
I have a Windows console app that uses Exchange web services to connect to Exchange and download e-mail messages. The goal is to take each individual message object, extract metadata, parse attachments, etc. The app is checking the inbox every 60 seconds. I have no problems connecting to the inbox and getting the message objects. This is all good.
Here's where I am accepting input from you: When I get a message object, I immediately want to process the message and do all of the busy work explained above. I was considering a few different approaches to this:
Queuing the e-mail objects up in a table and processing them one-by-one.
Passing the e-mail object off to a local Windows service to do the busy work.
I don't think db queuing would be a good approach because, at times, multiple e-mail objects need to be processed. It's not fair if a low-priority e-mail with 30 attachments is processed before a high-priority e-mail with 5 attachments is processed. In other words, e-mails lower in the stack shouldn't need to wait in line to be processed. It's like waiting in line at the store with a single register for the bonehead in front of you to scan 100 items. It's just not fair. Same concept for my e-mail objects.
I'm somewhat unsure about the Windows service approach. However, I'm pretty confident that I could have an installed service listening, waiting on demand for an instruction to process a new e-mail. If I have 5 separate e-mail objects, can I make 5 separate calls to the Windows service and process without collisions?
I'm open to suggestions or alternative approaches. However, the solution must be presented using .NET technology stack.
One option is to do the processing in the console application. What you have looks like a standard producer-consumer problem with one producer (the thread that gets the emails) and multiple consumers. This is easily handled with BlockingCollection.
I'll assume that your message type (what you get from the mail server) is called MailMessage.
So you create a BlockingCollection<MailMessage> at class scope. I'll also assume that you have a timer that ticks every 60 seconds to gather messages and enqueue them:
private BlockingCollection<MailMessage> MailMessageQueue =
    new BlockingCollection<MailMessage>();
// Timer is created as a one-shot and re-initialized at each tick.
// This prevents the timer proc from being re-entered if it takes
// longer than 60 seconds to run.
System.Threading.Timer ProducerTimer = new System.Threading.Timer(
    TimerProc, null, TimeSpan.FromSeconds(60), TimeSpan.FromMilliseconds(-1));
void TimerProc(object state)
{
    var newMessages = GetMessagesFromServer();
    foreach (var msg in newMessages)
    {
        MailMessageQueue.Add(msg);
    }
    ProducerTimer.Change(TimeSpan.FromSeconds(60), TimeSpan.FromMilliseconds(-1));
}
Your consumer threads just read the queue:
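void MessageProcessor()
{
    // Blocks until a message is available; ends once CompleteAdding() is called
    // and the queue has been emptied.
    foreach (var msg in MailMessageQueue.GetConsumingEnumerable())
    {
        ProcessMessage(msg);
    }
}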
The timer will cause the producer to run once per minute. To start the consumers (say you want two of them):
var t1 = Task.Factory.StartNew(MessageProcessor, TaskCreationOptions.LongRunning);
var t2 = Task.Factory.StartNew(MessageProcessor, TaskCreationOptions.LongRunning);
So you'll have two threads processing messages.
For CPU-bound processing, there's little point in having more processing threads than you have available CPU cores. The producer thread presumably won't require a lot of CPU resources, so you don't have to dedicate a thread to it. It'll just slow down message processing briefly whenever it's doing its thing.
I've skipped over some detail in the description above, particularly cancellation of the threads. When you want to stop the program, but let the consumers finish processing messages, just kill the producer timer and set the queue as complete for adding:
MailMessageQueue.CompleteAdding();
The consumers will empty the queue and exit. You'll of course want to wait for the tasks to complete (see Task.Wait).
If you want the ability to kill the consumers without emptying the queue, you'll need to look into Cancellation.
The default backing store for BlockingCollection is a ConcurrentQueue, which is a strict FIFO. If you want to prioritize things, you'll need to come up with a concurrent priority queue that implements the IProducerConsumerCollection interface. .NET doesn't ship a concurrent one (the PriorityQueue<TElement, TPriority> class added in .NET 6 is not thread-safe), but a simple binary heap that uses locks to prevent concurrent access would suffice in your situation; you're not talking about hitting this thing very hard.
Of course you'd need some way to prioritize the messages. Probably sort by number of attachments so that messages with no attachments are processed quicker. Another option would be to have two separate queues: one for messages with 0 or 1 attachments, and a separate queue for those with lots of attachments. You could have one of your consumers dedicated to the 0 or 1 queue so that easy messages always have a good chance of being processed first, and the other consumers take from the 0 or 1 queue unless it's empty, and then take from the other queue. It would make your consumers a little more complicated, but not hugely so.
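The two-queue arrangement can be sketched as follows. This is an illustration in Python rather than C#, and everything in it is an assumption: messages are dicts with an "attachments" count, the threshold of 1 mirrors the "0 or 1 attachments" split above, and enqueue and next_message are hypothetical helper names. It shows the non-dedicated consumer, which prefers the light queue and falls back to the heavy one.

```python
import queue


def enqueue(message, light_queue, heavy_queue, threshold=1):
    """Producer side: route by attachment count."""
    target = light_queue if message["attachments"] <= threshold else heavy_queue
    target.put(message)


def next_message(light_queue, heavy_queue, timeout=0.1):
    """Consumer side: take a light message if one is ready, otherwise wait
    briefly on the heavy queue. Returns None when both are empty."""
    try:
        return light_queue.get_nowait()
    except queue.Empty:
        pass
    try:
        return heavy_queue.get(timeout=timeout)
    except queue.Empty:
        return None
```

A dedicated "light" consumer would simply block on light_queue.get(). The short timeout on the heavy queue is a compromise: the consumer re-checks the light queue at least every timeout seconds instead of blocking indefinitely on heavy work.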
If you choose to move the message processing to a separate program, you'll need some way to persist the data from the producer to the consumer. There are many possible ways to do that, but I just don't see the advantage of it.
I'm somewhat a novice here, but it seems like an initial approach could be to have a separate high-priority queue. Every time a worker is available to obtain a new message, it could do something like:
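' Serve the low-priority queue first only when its oldest message has waited too long.
If lowPriorityQueue.Count > 0 AndAlso DateTime.Now - lowPriorityQueue.Peek().AddedTime > maxWaitTime Then
    ProcessMessage(lowPriorityQueue.Dequeue())
ElseIf highPriorityQueue.Count > 0 Then
    ProcessMessage(highPriorityQueue.Dequeue())
ElseIf lowPriorityQueue.Count > 0 Then
    ProcessMessage(lowPriorityQueue.Dequeue())
End If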
In a single thread, while you can still have one message blocking the others, higher priority messages could be processed sooner.
Depending on how fast most messages get processed, the application could create a new worker on a new thread if the queues are getting too big or too old.
Please tell me if I'm completely off-base here though.

how to send response with resque

We have 3 Resque workers here that process and convert some data. Now I need to send a response to whoever sent me the data. How do I send a response? Does Resque have some asynchronous way to send a response to a client?
Resque can't (and was never designed to) do that, however, you can use redis or your database as a communication mechanism. We actually do this as we process long-running tasks. For example, just create a new key in redis when enqueueing the job, passing that key in with the job arguments. As the job processes, it may update that key in redis. Whoever enqueued that job simply needs to watch that redis key for changes.
There are more efficient "push" type solutions too (such as HTTP notifications to a controller in your app, or sockets), if polling is a problem, though with only 3 workers it doesn't sound like it should be.

Long running processing in handlers with boost::asio

I'm designing a network server based on boost::asio. I need to perform long-running processing jobs in handlers, and I think this processing should be moved from the handlers to a separate thread pool where I would have better control (e.g. to prioritize tasks). Handlers would just enqueue a new task in a job queue.
There would also be a response queue from which responses would be dequeued and sent back to the clients. (Clients send requests synchronously.)
I wonder whether this makes sense or whether I'm missing something.
The short answer is yes; the long answer is that it depends. Generally speaking, if you want higher network throughput, you should minimize the processing that is performed in the handlers and offload it to a thread. This is especially important if you have causality requirements for the data that you receive, since async_receive doesn't guarantee execution order of handlers.

Broadcasting message to multiple processes (Point-to-Point Message Queue with multiple readers)

I want to share data with multiple processes. My first attempt is to use a point-to-point message queue with multiple readers, since I read that P2P message queues are very fast.
During my test, it seems like multiple readers are reading from the same queue and once a message is fetched by one reader, other readers will not be able to fetch the same message.
What is a better IPC for sharing data to multiple processes?
The data is updated frequently (multiple times per second) so I think WM_COPYDATA is not a good choice and will interfere with the "normal" message queue.
My second attempt will probably be a shared memory + mutex + events
Point-to-point queues will work fine. Yes, when you send, only one receiver will get the message but the sender can query the queue (by calling GetMsgQueueInfo) to see how many listeners (wNumReaders member of the MSGQUEUEINFO) there are and simply repeat the message that number of times.
Finally, it's perfectly valid for more than one thread or process to open the same queue for read access or for write access. Point-to-point message queues support multiple readers and multiple writers. This practice allows, for example, one writer process to send messages to multiple client processes or multiple writer processes to send messages to a single reader process. There is, however, no way to address a message to a specific reader process. When a process, or a thread, reads the queue, it will read the next available message. There is also no way to broadcast a message to multiple readers.
Programming Windows Embedded CE 6.0 Developer Reference, Fourth Edition, Douglas Boling, Page 304
Despite the warning, ctacke's idea seems to be fine for my use cases.
Caveat:
My queue readers need to Sleep(10) after they fetch their share of message to allow other readers to go and fetch messages. Without Sleep(), only one reader process is signaled from waiting.
