Long-running processing in handlers with boost::asio

I'm designing a network server based on boost::asio. I need to perform long-running processing jobs in handlers, and I think this processing should be moved out of the handlers into a separate thread pool where I would have better control (e.g. prioritizing tasks). Handlers would just enqueue a new task in a job queue.
There would also be a response queue, from which responses would be dequeued and sent back to the clients. (Clients send requests synchronously.)
I wonder whether this makes sense or whether I'm missing something.

Short answer: yes. Long answer: it depends. Generally speaking, if you want higher network throughput, you should minimize the processing performed in the handlers and offload it to a separate thread. This is especially important if you have causality (ordering) requirements for the data you receive, since async_receive doesn't guarantee the execution order of handlers.
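The job queue described in the question could look roughly like the following. This is a minimal sketch using only the standard library, not Boost.Asio itself; the names Job, JobOrder, JobQueue and drain are hypothetical, and the convention that a larger priority value means "more urgent" is an assumption. In a real server, handlers would call push() and pool threads would run a loop like drain(), handing each result to the response queue.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical job type: a priority plus the work to run off the I/O thread.
struct Job {
    int priority;                       // larger value = more urgent (assumption)
    std::uint64_t seq;                  // arrival order, used to keep ties FIFO
    std::function<std::string()> work;  // long-running processing; returns a response
};

struct JobOrder {
    // std::priority_queue pops the "largest" element, so return true
    // when a should be served after b.
    bool operator()(const Job& a, const Job& b) const {
        if (a.priority != b.priority) return a.priority < b.priority;
        return a.seq > b.seq;  // among equal priorities, earlier arrival first
    }
};

class JobQueue {
public:
    // Called from an I/O handler: cheap, so the handler returns quickly.
    void push(int priority, std::function<std::string()> work) {
        assert(work != nullptr);
        std::lock_guard<std::mutex> lock(mutex_);
        jobs_.push(Job{priority, next_seq_++, std::move(work)});
    }

    // Called from a worker thread. Returns false when no job is waiting.
    bool try_pop(Job& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (jobs_.empty()) return false;
        out = jobs_.top();
        jobs_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::priority_queue<Job, std::vector<Job>, JobOrder> jobs_;
    std::uint64_t next_seq_ = 0;
};

// Stand-in for one worker's loop body: run jobs in priority order and
// collect the responses that would go to the response queue.
std::vector<std::string> drain(JobQueue& queue) {
    std::vector<std::string> responses;
    Job job{0, 0, nullptr};
    while (queue.try_pop(job)) responses.push_back(job.work());
    return responses;
}
```

A production pool would block on a condition variable instead of polling try_pop(), and would post each response back to the I/O thread rather than collecting it in a vector.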


Reliable Asynchronous Handling of Domain Events

In concurrent systems, domain events are typically handled asynchronously. In Go, a simple approach to asynchronous event handling can be implemented with channels, but the problem is that if something goes wrong while handling an event or, worse, the whole program crashes, the event is lost.
How can asynchronous domain events be handled properly in a Go program? Specifically:
When an event handler fails, the event should not be purged from the event queue, so that it can be handled properly at a later time.
If the whole program goes down, the events have to be recovered and processed accordingly.
The first is relatively easy; you can have an error handler within the worker that re-queues the work in the event of an error.
The second is much harder; your options are a) roll your own bulletproof mechanism for writing events to disk and purging them, in a thread-safe way, once they have completed, or b) use one of the many popular systems that have already proven reliable, e.g. RabbitMQ or Kafka, with the appropriate replication and redundancy to ensure the level of reliability you require. I would strongly recommend the latter.
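The first point (re-queue on failure) can be sketched as follows. The logic is shown in C++ rather than Go, and it is only an in-memory illustration: Event, process_all and the max_attempts cap are hypothetical names and choices, not part of any library. It does nothing for the second point, since a crash still loses everything held in the queue.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <vector>

// Hypothetical event record; attempts counts how often handling was tried.
struct Event {
    std::string name;
    int attempts;
};

// Runs the handler over the queue. A failed event goes back to the end of
// the queue until it has been tried max_attempts times (an assumed policy);
// after that it is returned to the caller instead of being retried forever.
std::vector<Event> process_all(std::deque<Event>& queue,
                               const std::function<bool(const Event&)>& handler,
                               int max_attempts) {
    assert(max_attempts > 0);
    std::vector<Event> given_up;
    while (!queue.empty()) {
        Event ev = queue.front();
        queue.pop_front();
        ++ev.attempts;

        bool ok = false;
        try {
            ok = handler(ev);
        } catch (...) {
            ok = false;  // treat an exception like a reported failure
        }

        if (ok) continue;
        if (ev.attempts < max_attempts) {
            queue.push_back(ev);      // re-queue for a later attempt
        } else {
            given_up.push_back(ev);   // keep for inspection, do not drop silently
        }
    }
    return given_up;
}
```

Capping the attempts is a design choice rather than a requirement from the question: without it, an event whose handler always fails would circulate forever.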

Latest Windows threadpool API usage for I/O

I don't understand part of the latest Windows threadpool API. I need help with that.
From the documentation, the recipe to use it for I/O (in my case, for SOCKET) can be summarized as follows:
Call CreateThreadpoolIo.
Call StartThreadpoolIo. You can find this warning there:
You must call this function before initiating each asynchronous I/O operation on the file handle bound to the I/O completion object. Failure to do so will cause the thread pool to ignore an I/O operation when it completes and will cause memory corruption.
Call the operation on the file handle (e.g., WSARecvFrom). If it fails, call CancelThreadpoolIo. Otherwise, process the result when it is available. WSARecvFrom, when used asynchronously, asks for a WSAOVERLAPPED (that you have to create beforehand) but not for any information that links it to the previous call to StartThreadpoolIo. CancelThreadpoolIo only asks for the PTP_IO, but not for any additional information to derive a specific asynchronous operation.
Repeat steps 2 and 3.
Call CloseThreadpoolIo to finish. You can find this warning there:
It may be necessary to cancel threadpool I/O notifications to prevent memory leaks. For more information, see CancelThreadpoolIo.
I usually need it for UDP, so I strive to have several reception operations queued (asynchronous WSARecvFrom operations started) at any given time. That way I don't have to rush to start another reception operation at the beginning of the callback function nor synchronize access to the reception buffers (I can have a pool of them, each one able to contain a datagram, and reissue the reception operation when I finish processing each message; in the interim, other queued operations will keep the receiver busy). Datagrams are independent and self contained. I'm aware that this approach may not be valid for TCP.
StartThreadpoolIo/CancelThreadpoolIo seem to me the source of the problem: StartThreadpoolIo and WSARecvFrom are not directly bound (they don't share any arguments). So:
How can the framework know which operation to cancel when you call CancelThreadpoolIo? How does it cancel just the operation that failed and not any of the pending ones?
You can say, "don't call StartThreadpoolIo concurrently". I can live without several concurrent WSARecvFrom's, but I can't live without concurrent WSARecvFrom and WSASendTo. So I think being unable to have several asynchronous operations at the same time can't be the way the API was designed.
You can say, "call StartThreadpoolIo only once, that will suffice to register the callback; it is an on/off process". But the documentation says:
You must call this function before initiating each asynchronous I/O operation on the file handle...
You can say, "it cancels the operation started by the same thread that just called StartThreadpoolIo". But then the advice of calling CancelThreadpoolIo in the context of calling CloseThreadpoolIo doesn't make sense (I will call CloseThreadpoolIo from the thread that triggers stopping, which will be completely independent from the threads issuing the asynchronous operations; and a single call to CancelThreadpoolIo may not be enough to cancel several operations). Being unable to trigger cancellation from a different thread is a serious limitation, anyway. I'm aware of the existence of CreateThreadpoolCleanupGroup, but my question is more fundamental. I want to understand how this API can be fundamentally right and useful.
You can say "call CreateThreadpoolIo several times, so that you have independent PTP_IO's to work with". It doesn't work. When I call CreateThreadpoolIo a second time, nullptr is returned.
Am I wrong, or is this API awkward? Normally, other asynchronous APIs work with one of these patterns:
Create an operation and receive a handle => call methods passing the handle.
Create a reusable handle => call methods (including starting operations) passing the handle.
The latest Windows threadpool API, in which the handle seems to be implicit, or there are several handles for the same operation (TP_IO, WSAOVERLAPPED, StartThreadpoolIo) and they aren't all explicitly linked together, uses neither of them.
Thank you very much for your help.
How can the framework know which operation to cancel when you call CancelThreadpoolIo? How does it cancel just the operation that failed and not any of the pending ones?
CancelThreadpoolIo() doesn't cancel I/O. It is the counterpart of StartThreadpoolIo(). StartThreadpoolIo() prepares the thread pool to accept one completion. If the thread pool doesn't expect a completion, it won't wait for it, so you may miss it. If the thread pool expects a completion that never arrives, it may waste resources.
CancelThreadpoolIo() undoes whatever StartThreadpoolIo() did.
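One way to picture this is as a count of expected completions rather than a handle to a particular operation. The following is a toy model under that assumption; it is not how Windows implements TP_IO, and ExpectedCompletions and issue_operation are hypothetical names. In real code the "failed to initiate" case for WSARecvFrom is a return value of SOCKET_ERROR with WSAGetLastError() other than WSA_IO_PENDING.

```cpp
#include <cassert>

// Toy model only: tracks how many completions the pool has been told to expect.
class ExpectedCompletions {
public:
    void start() { ++pending_; }  // models StartThreadpoolIo
    void cancel() {               // models CancelThreadpoolIo
        assert(pending_ > 0);
        --pending_;
    }
    void on_completion() {        // models the pool delivering one callback
        assert(pending_ > 0);
        --pending_;
    }
    int pending() const { return pending_; }

private:
    int pending_ = 0;
};

// Models one "start, then initiate" step. initiated is true when the
// overlapped call succeeded or reported that the operation is pending.
bool issue_operation(ExpectedCompletions& io, bool initiated) {
    io.start();        // announce one upcoming completion
    if (!initiated) {
        io.cancel();   // no completion will arrive, so take the announcement back
        return false;
    }
    return true;
}
```

In this picture, a failed call only withdraws its own announcement; operations that are already pending are untouched, which is why no per-operation argument is needed.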

C++ IRC Client design

I'm attempting to write an RFC 2812 compliant C++ IRC library.
I am having some trouble with the design of the client itself.
From what I have read IRC communication tends to be asynchronous.
I am using boost::asio::async_read and boost::asio::async_write.
From reading the documentation I have gathered that you cannot perform multiple async_write requests before one is completed. You therefore end up with rather nested callbacks. Doesn't this defeat the purpose of doing async calls? Wouldn't it just be better to use synchronous calls to prevent the nesting? If not, why?
Secondly, if I am not mistaken, each boost::asio::async_write should be followed up by a boost::asio::async_read to receive the server's response to the commands sent. My client's functions would therefore need to take a callback parameter so that a user of the class may do something after the client receives a response (e.g. send another message).
If I were to continue implementing this with async, should I keep a std::deque<std::tuple<message, callback>> and each time a boost::asio::async_write is finished, and there is a tuple in the queue, dequeue and send the message then raise the callback? Would this be the optimal way to implement this system?
I'm thinking since messages are sent all the time I'm going to have to implement some kind of listener loop that queues up responses, but how would you associate these responses with the specific command that triggered them? Or in the case that the response is just a message to the channel from another user?
The IRC protocol is a full-duplex protocol. As such, one should always be listening to the server connection, expecting commands to process. It could be argued that one should primarily use the messages received from the server to update state, rather than correlating requests and responses, as the server may not respond to a command or may respond much later than expected. For example, one may issue a WHOIS command, but receive multiple PRIVMSG commands before receiving the response to WHOIS. For a chat client, a user would likely expect to be able to receive chat messages while waiting for a response to WHOIS. Hence, an async_write() to async_read() call chain may not be ideal for handling the protocol.
For a given socket, the Asio documentation does recommend not initiating additional read operations while there is an outstanding composed read operation, and not initiating additional write operations while there is an outstanding composed write operation. Queuing up messages and having an asynchronous call chain process the queue is a great way to fulfill this recommendation. Consider reading this answer for a nice solution using a queue and an asynchronous call chain.
Also, be aware that the server may send a PING command even on an active connection. When the client is responding with a PONG command, it may be necessary to insert the PONG command near the front of the outbound queue so that it gets sent out as soon as possible.
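The outbound queue with PONG prioritization could be sketched like this. It is a minimal illustration of the bookkeeping only, with no Asio calls; OutboundQueue and its member names are hypothetical. The intent is that wherever a function returns true, the caller starts one boost::asio::async_write for current(), and calls on_write_complete() from that write's completion handler.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

class OutboundQueue {
public:
    // Queue a normal message. Returns true if the caller should start a
    // write now, i.e. no write is currently in flight.
    bool push(const std::string& line) {
        queue_.push_back(line);
        return start_if_idle();
    }

    // Queue an urgent message such as PONG. It goes as far forward as
    // possible without displacing the message that is being written.
    bool push_urgent(const std::string& line) {
        std::list<std::string>::iterator pos = queue_.begin();
        if (writing_) pos = std::next(pos);  // skip the in-flight front element
        queue_.insert(pos, line);
        return start_if_idle();
    }

    // The message whose buffer is (or is about to be) handed to the write.
    const std::string& current() const {
        assert(!queue_.empty());
        return queue_.front();
    }

    // Call from the write completion handler. Returns true if another
    // write should be started for the new current().
    bool on_write_complete() {
        assert(writing_);
        queue_.pop_front();
        writing_ = false;
        return start_if_idle();
    }

private:
    bool start_if_idle() {
        if (writing_ || queue_.empty()) return false;
        writing_ = true;
        return true;
    }

    // std::list keeps element addresses stable across inserts, so the
    // in-flight buffer is not moved when an urgent message is added.
    std::list<std::string> queue_;
    bool writing_ = false;
};
```

This keeps at most one write outstanding per socket, in line with the recommendation above, while still letting a PONG jump ahead of queued chat traffic. All calls are assumed to be made from a single thread or strand.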
Doesn't this defeat the purpose of doing async calls?
The usual solution is to use strands:
Why do I need strand per connection when using boost::asio?
You are free to queue multiple asynchronous operations on the same I/O object using an (implicit) strand.
Using a strand ensures that the completion handlers are invoked on that same logical thread.
On the Protocol
You could indeed keep a queue of commands and await responses for each command before sending the next.
You might be a little smarter about this if you can spot the correlation from the different types of reply, but then you'd need to keep a queue per type of command. I'd consider that premature optimization.

How can I monitor/manage queue in ZeroMQ?

First of all, I'm new to ZeroMQ and message queue systems, so what I'm trying to do may be solved through a different approach. I'm designing a messaging system that does the following:
Multiple clients connect to a broker and send the id of an item that needs to be processed. The client disconnects immediately and does not wait for a response.
The broker sends items to workers, one item per worker, to perform some processing. Each worker returns a signal that the processing was completed.
I have a rudimentary system set up which is processing requests/replies correctly, but I'd also like to be able to do the following:
Query the broker to see how many processes are actually running on the workers and how many are simply waiting to be run.
Have the broker ensure that only one process per id is running - if a duplicate id arrives and that item is not currently being processed by a worker, do not add it to the queue.
I'm using a poll setup with broker/dealer sockets. The code I'm using is very similar to this example from Ian Barber.
My first inclination (although I'm not sure how to implement it in zmq) is to have the broker keep track of the ids that have been received, and those that are actively being processed by workers. It seems that the broker forwards requests to workers immediately, regardless of whether or not they are available to actually run the processing. The workers then queue up the ids and process them in order. This isn't ideal since I'm looking to be able to monitor and control what is going on in the system centrally to achieve reliability.
Anyways, any hints, tips or examples of this type of setup would be greatly appreciated.
ZeroMQ is, in my opinion, best used in broker-less designs, for which the library is designed. If you want to monitor the number of items in a queue, or throughput, or whatever, you're going to have to build that into the application/device/producer yourself. Since you're new to messaging, that could get out of hand real quick. Given this, I'd suggest looking into RabbitMQ (or a similar broker), which would provide these services for you out of the box. If you do adopt RabbitMQ (or rather, AMQP), I'd suggest using a fanout exchange for the scenario you describe above.
The Python library for ZeroMQ seems to come with a pattern for dealing with this: http://zeromq.github.com/pyzmq/devices.html#monitoredqueue
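If the bookkeeping is built into the broker by hand, as the first answer suggests would be necessary, it could look roughly like this. This is a sketch of the state tracking only, with no ZeroMQ calls; BrokerState and its member names are hypothetical. It assumes workers ask for work when they are free, so that the broker holds waiting ids itself instead of forwarding them immediately, and it assumes the intended duplicate rule is "drop an id that is already waiting or running".

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <set>
#include <string>

class BrokerState {
public:
    // A client submitted an id. Returns false if the id is already
    // waiting or running, in which case it is not queued again.
    bool submit(const std::string& id) {
        assert(!id.empty());
        if (known_.count(id) != 0) return false;
        known_.insert(id);
        waiting_.push_back(id);
        return true;
    }

    // A worker reported that it is free. Returns false if nothing is
    // waiting; otherwise moves the oldest id from waiting to running.
    bool dispatch(std::string& id_out) {
        if (waiting_.empty()) return false;
        id_out = waiting_.front();
        waiting_.pop_front();
        running_.insert(id_out);
        return true;
    }

    // A worker signalled completion, so the id may be submitted again.
    void complete(const std::string& id) {
        if (running_.erase(id) != 0) known_.erase(id);
    }

    // The two numbers the question wants to query.
    std::size_t waiting() const { return waiting_.size(); }
    std::size_t running() const { return running_.size(); }

private:
    std::deque<std::string> waiting_;  // accepted, not yet handed to a worker
    std::set<std::string> running_;    // currently on a worker
    std::set<std::string> known_;      // union of the two, for duplicate checks
};
```

Because the broker's poll loop is single-threaded, this state needs no locking as long as it is only touched from that loop.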

Usage of non-blocking send and blocking receive in MPI?

I am trying to implement a master-worker program.
My master has jobs that the workers are going to do. Every time a worker completes a job, he asks for a new job from the master, and the master sends it to him. The workers are calculating minimal paths. When a worker finds a minimum that is better than the global minimum he got, he sends it to everyone including the master.
I plan for the workers and masters to send data using MPI_ISEND. Also, I think that the receive should be blocking. The master has nothing to do when no one has asked for work or has updated the best result, so he should block waiting for a receive. Also, each worker should, after he has done his work, wait on a receive to get a new one.
Nevertheless, I'm not sure of the impact of using non-blocking asynchronous send, and blocking synchronous receive.
An alternative I think is using MPI_IPROBE, but I'm not sure that this will give me any optimization.
Please help me understand whether what I'm doing is right. Is this the right solution?
You can match blocking sends with nonblocking receives and vice versa, that won't cause any problems. However, if the master really has nothing to do while the workers work, and the workers should block after completing their work unit, then there's no reason for non-blocking communication on that front. The master can post a blocking receive with MPI_ANY_SOURCE, and the workers can just use a blocking send to post back their results, since the matching receive at the master will already have been posted.
So, I'd have Send-Recv for exchanging work units between master and worker, and Isend-Irecv for broadcasting the new global minima.
