boost.asio - do I need to use locks if sharing a database-type object between different async handlers? - c++11

I'm making a little server for a project. I have a log handler class which contains a log implemented as a map, plus some methods to act on it (add entry, flush to disk, commit, etc.).
This object is instantiated in the server class, and I'm passing its address to each session so every session can add entries to it.
The sessions are async, and the log writes will happen in the async_read callback. I'm wondering if this will be an issue and if I need to use locks.
The map format is map<transactionId, map<sequenceNum, pair<head, body>>>, and each session will access a different transactionId, so there should be no clashes as far as I can figure. Also, hypothetically, if they were all writing to the same place in memory -- something large enough that the operation would not be atomic -- would I need locks? As far as I understand, each async method dispatches a thread to handle the operation, which would make me assume yes. At the same time, I've read that one of the great advantages of async functions is that synchronization primitives are not needed. So I'm a bit confused.
This is my first time using Asio or any kind of asynchronous functions at all, and I'm not a very experienced coder. I hope the question makes sense! The code seems to run fine so far, but I'm curious whether it's correct.
Thank you!

Asynchronous handlers will only be invoked in application threads processing the io_service event loop via run(), run_one(), poll(), or poll_one(). The documentation states:
Asynchronous completion handlers will only be called from threads that are currently calling io_service::run().
Hence, for a non-thread safe shared resource:
If the application code only has one thread, then there is neither concurrency nor race conditions. Thus, no additional form of synchronization is required. Boost.Asio refers to this as an implicit strand.
If the application code has multiple threads processing the event-loop and the shared resource is only accessed within handlers, then synchronization needs to occur, as multiple threads may attempt to concurrently access the shared resource. To resolve this, one can either:
Protect the calls to the shared resource via a synchronization primitive, such as a mutex. This question covers using mutexes within handlers.
Use the same strand to wrap() the ReadHandlers. A strand will prevent concurrent invocation of handlers dispatched through it; a sketch of this approach is shown after this list. For more details on the usage of strands, particularly for composed operations, such as async_read(), consider reading this answer.
Rather than posting the entire ReadHandler into the strand, one could limit interaction with the shared resource to a specific set of functions, and post those functions as CompletionHandlers to the same strand. The difference between this and the previous solution is the granularity of synchronization.
If the application code has multiple threads and the shared resource is accessed from threads processing the event loop and from threads not processing the event loop, then synchronization primitives, such as a mutex, need to be used.
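For the second case above (multiple threads calling run(), with the shared log only touched inside handlers), here is a minimal sketch of the strand approach. It assumes a LogHandler class with an add_entry() method roughly like the one the question describes; all names are illustrative, not the poster's actual code.

#include <boost/asio.hpp>
#include <array>
#include <memory>
#include <string>

struct LogHandler {                        // stand-in for the poster's log class
    void add_entry(const std::string&) { /* ...append to the map... */ }
};

class session : public std::enable_shared_from_this<session>
{
public:
    // Every session must use the *same* strand for handlers touching the log.
    session(boost::asio::io_service& io,
            boost::asio::io_service::strand& log_strand,
            LogHandler& log)
        : socket_(io), log_strand_(log_strand), log_(log) {}

    boost::asio::ip::tcp::socket& socket() { return socket_; }

    void start_read()
    {
        auto self = shared_from_this();
        socket_.async_read_some(
            boost::asio::buffer(buffer_),
            log_strand_.wrap(   // handlers wrapped by the same strand never run concurrently
                [this, self](const boost::system::error_code& ec, std::size_t n)
                {
                    if (ec) return;
                    log_.add_entry(std::string(buffer_.data(), n));
                    start_read();            // issue the next read
                }));
    }

private:
    boost::asio::ip::tcp::socket     socket_;
    boost::asio::io_service::strand& log_strand_;
    LogHandler&                      log_;
    std::array<char, 4096>           buffer_;
};

Note that a per-session strand would only serialize that session's own handlers; to protect one shared LogHandler, every handler that touches it must go through the same strand (or, per the third option, only the small functions that touch it are posted to that strand).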
Also, even if a shared resource is small enough that writes and reads are always atomic, one should prefer explicit and proper synchronization. For example, although the write and read may each be atomic, without proper memory fencing to guarantee memory visibility, a thread may not observe a change in memory even though the memory has actually changed. Boost.Asio will perform the proper memory barriers to guarantee visibility. For more details on Boost.Asio and memory barriers, consider reading this answer.

Related

GetOverlappedResultEx will create a thread to process on or do I have to create and sync the threads?

Trying to understand how this works... Do I have to create multiple threads to take advantage of GetOverlappedResultEx? Or why couldn't I just call GetOverlappedResult in a separate thread from the main thread, so the blocking I/O doesn't interfere with the main operations?
GetOverlappedResult function
https://learn.microsoft.com/en-us/windows/win32/api/ioapiset/nf-ioapiset-getoverlappedresult
Retrieves the results of an overlapped operation on the specified file, named pipe, or communications device. To specify a timeout interval or wait on an alertable thread, use GetOverlappedResultEx.
https://learn.microsoft.com/en-us/windows/win32/api/ioapiset/nf-ioapiset-getoverlappedresultex
Retrieves the results of an overlapped operation on the specified file, named pipe, or communications device within the specified time-out interval. The calling thread can perform an alertable wait.
https://learn.microsoft.com/en-us/windows/win32/fileio/alertable-i-o
You handle threads, for concurrency, yourself.
There are basically three ways to do it:
Having initiated an overlapped (i.e., async completion) I/O operation, you do something else and then every once in a while poll the handle to see if the overlapped operation has completed. This is how you can use GetOverlappedResult, looking for STATUS_PENDING to see if the operation isn't done yet.
You sit around waiting for an overlapped operation to complete. But it's not as bad as that, because you can actually sit around waiting for any of a set of overlapped operations to complete. As soon as any one completes you handle it, and then loop around to wait for the rest. Handling it may, of course, fire off another async operation; you then add that handle to the list. This is where you use WaitForSingleObject{Ex} or, better, WaitForMultipleObjects{Ex} (a sketch of this approach follows at the end of this answer).
You use I/O Completion ports. Here you pass some handles to a kernel object called an I/O Completion port - this kernel object cleverly combines a thread pool (that it manages itself) with callbacks. It is a very efficient way of dealing with multiple - in fact, very many - async operations in-flight simultaneously. In these callbacks you can do whatever you want, including initiating more async operations and adding them to the same I/O Completion port.
There is also a fourth concept: alertable I/O, which executes a callback on an "APC" on your thread that initiated the I/O, provided your thread is in an "alertable" state - which means it is executing one or another of certain APIs that wait in the kernel. But I've never used it, as it seems to have drawbacks (such as only working on the thread that initiated the I/O, and that the environment the callback runs in isn't as clear as it could be), and if you're going to go that far, just figure out I/O Completion ports and use them.
Options #2 and #3 of course involve concurrent programming - so in both cases you have to make sure your callbacks are thread-safe with respect to your other threads.
There are plenty of examples of all these methods out there on the intertubes.
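As a rough illustration of option #2, here is a minimal sketch, assuming a file handle opened with FILE_FLAG_OVERLAPPED (the file name and buffer size are placeholders). In real code you would typically keep a list of event handles and wait on them with WaitForMultipleObjects.

#include <windows.h>
#include <cstdio>

int main()
{
    HANDLE hFile = CreateFileW(L"example.dat", GENERIC_READ, 0, nullptr,
                               OPEN_EXISTING, FILE_FLAG_OVERLAPPED, nullptr);
    if (hFile == INVALID_HANDLE_VALUE) return 1;

    char buffer[4096];
    OVERLAPPED ov{};
    ov.hEvent = CreateEventW(nullptr, TRUE, FALSE, nullptr);  // manual-reset event

    if (!ReadFile(hFile, buffer, sizeof(buffer), nullptr, &ov) &&
        GetLastError() != ERROR_IO_PENDING)
        return 1;                       // the read failed outright

    // Do other useful work here, then wait for this operation (or for any of
    // a set of operations, via WaitForMultipleObjects) to complete.
    WaitForSingleObject(ov.hEvent, INFINITE);

    DWORD bytesRead = 0;
    if (GetOverlappedResult(hFile, &ov, &bytesRead, FALSE))
        printf("read %lu bytes\n", bytesRead);

    CloseHandle(ov.hEvent);
    CloseHandle(hFile);
}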

boost asio concurrent async_read and async_write

Looking at the documentation, it looks like the TCP socket object is not thread-safe. So I cannot issue async_read from one thread and async_write concurrently from another thread? I would guess the same applies to boost::asio::write() as well?
Can I issue a synchronous write() while I do an async_read from another thread?
If that is not safe, then the only way is probably to get the socket's native handle
and use synchronous Linux mechanisms to achieve concurrent reads and writes. I have an application where the reads and writes are actually independent.
It is thread-safe for the use-cases you listed. You can read in one thread, and write in another. And you can use the synchronous as well as asynchronous operations for that.
You will however run into problems if you try to do one dedicated operation type (e.g. reads) from more than one thread, especially if you are using the free-standing/composed operations (boost::asio::read(socket) instead of socket.read_some()). The reason is that only the primitive operations are atomic / thread-safe, and the composed operations work by calling into the primitives multiple times.
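A minimal sketch of the use-case described above, assuming an already-connected socket and omitting most error handling; the point is one thread reading while another thread writes, never two operations of the same kind in flight at once.

#include <boost/asio.hpp>
#include <array>
#include <string>
#include <thread>

void full_duplex(boost::asio::ip::tcp::socket& sock)
{
    std::thread reader([&sock] {
        std::array<char, 1024> buf;
        for (;;) {
            boost::system::error_code ec;
            std::size_t n = sock.read_some(boost::asio::buffer(buf), ec);
            if (ec) break;              // connection closed or error
            // ...process the n bytes received...
            (void)n;
        }
    });

    std::thread writer([&sock] {
        const std::string msg = "ping\n";
        boost::system::error_code ec;
        boost::asio::write(sock, boost::asio::buffer(msg), ec);
        // Writing here while the reader thread is blocked in read_some() is
        // the pattern described above; two concurrent reads (or two
        // concurrent writes) would not be safe.
    });

    reader.join();
    writer.join();
}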

Latest Windows threadpool API usage for I/O

I don't understand part of the latest Windows threadpool API. I need help with that.
From the documentation, the recipe to use it for I/O (in my case, for SOCKET) can be summarized as follows:
Call CreateThreadpoolIo.
Call StartThreadpoolIo. You can find this warning there:
You must call this function before initiating each asynchronous I/O operation on the file handle bound to the I/O completion object. Failure to do so will cause the thread pool to ignore an I/O operation when it completes and will cause memory corruption.
Call the operation on the file handle (e.g., WSARecvFrom). If it fails, call CancelThreadpoolIo. Otherwise, process the result when it is available. WSARecvFrom, when used asynchronously, asks for a WSAOVERLAPPED (that you have to create beforehand) but not for any information that links it to the previous call to StartThreadpoolIo. CancelThreadpoolIo only asks for the PTP_IO, but not for any additional information to derive a specific asynchronous operation.
Repeat steps 2 and 3.
Call CloseThreadpoolIo to finish. You can find this warning there:
It may be necessary to cancel threadpool I/O notifications to prevent memory leaks. For more information, see CancelThreadpoolIo.
I usually need it for UDP, so I strive to have several reception operations queued (asynchronous WSARecvFrom operations started) at any given time. That way I don't have to rush to start another reception operation at the beginning of the callback function nor synchronize access to the reception buffers (I can have a pool of them, each one able to contain a datagram, and reissue the reception operation when I finish processing each message; in the interim, other queued operations will keep the receiver busy). Datagrams are independent and self contained. I'm aware that this approach may not be valid for TCP.
StartThreadpoolIo/CancelThreadpoolIo seem to me the source of the problem: StartThreadpoolIo and WSARecvFrom are not directly bound (they don't share any arguments). So:
How can the framework know which operation to cancel when you call CancelThreadpoolIo? How does it cancel just the operation that failed and not any of the pending ones?
You can say, "don't call StartThreadpoolIo concurrently". I can live without several concurrent WSARecvFrom's, but I can't live without concurrent WSARecvFrom and WSASendTo. So I think being unable to have several asynchronous operations at the same time can't be the way the API was designed.
You can say, "call StartThreadpoolIo only once, that will suffice to register the callback; it is an on/off process". But the documentation says:
You must call this function before initiating each asynchronous I/O operation on the file handle...
You can say, "it cancels the operation started by the same thread that just called StartThreadpoolIo". But then the advice of calling CancelThreadpoolIo in the context of calling CloseThreadpoolIo doesn't make sense (I will call CloseThreadpoolIo from the thread that triggers stopping, which will be completely independent from the threads issuing the asynchronous operations; and a single call to CancelThreadpoolIo may not be enough to cancel several operations). Being unable to trigger cancellation from a different thread is a serious limitation, anyway. I'm aware of the existence of CreateThreadpoolCleanupGroup, but my question is more fundamental. I want to understand how this API can be fundamentally right and useful.
You can say "call CreateThreadpoolIo several times, so that you have independent PTP_IO's to work with". It doesn't work. When I call CreateThreadpoolIo a second time, nullptr is returned.
Am I wrong, or is this API awkward? Normally, other asynchronous APIs work with one of these patterns:
Create an operation and receive a handle => call methods passing the handle.
Create a reusable handle => call methods (including starting operations) passing the handle.
The latest Windows threadpool API follows neither of them: the handle seems to be implicit, or rather there are several handles involved in the same operation (TP_IO, WSAOVERLAPPED, StartThreadpoolIo) that aren't all explicitly linked together.
Thank you very much for your help.
How can the framework know which operation to cancel when you call CancelThreadpoolIo? How does it cancel just the operation that failed
and not any of the pending ones?
CancelThreadpoolIo() doesn't cancel I/O. It is the reciprocal of StartThreadpoolIo(). StartThreadpoolIo() prepares the threadpool to accept a completion. If the threadpool doesn't expect a completion, it won't wait for it, so you may miss it. If the threadpool expects a completion that never happens, it may waste resources.
CancelThreadpoolIo() undoes whatever StartThreadpoolIo() did.
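A minimal sketch of that pairing, one StartThreadpoolIo per operation, assuming a bound UDP socket that has already been associated with the threadpool via CreateThreadpoolIo; the callback and helper names are illustrative.

#include <winsock2.h>
#include <windows.h>

// Registered earlier with: PTP_IO io = CreateThreadpoolIo((HANDLE)s, OnIoComplete, nullptr, nullptr);
VOID CALLBACK OnIoComplete(PTP_CALLBACK_INSTANCE, PVOID /*context*/,
                           PVOID overlapped, ULONG ioResult,
                           ULONG_PTR bytesTransferred, PTP_IO)
{
    // One completion is delivered per successful StartThreadpoolIo/WSARecvFrom
    // pair; `overlapped` identifies which outstanding operation finished.
    (void)overlapped; (void)ioResult; (void)bytesTransferred;
}

bool PostReceive(SOCKET s, PTP_IO io, WSABUF* buf, WSAOVERLAPPED* ov,
                 sockaddr_in* from, INT* fromLen)
{
    DWORD flags = 0;
    StartThreadpoolIo(io);              // must precede *each* asynchronous operation
    int rc = WSARecvFrom(s, buf, 1, nullptr, &flags,
                         reinterpret_cast<sockaddr*>(from), fromLen, ov, nullptr);
    if (rc == SOCKET_ERROR && WSAGetLastError() != WSA_IO_PENDING) {
        // The operation failed synchronously, so no completion will ever be
        // queued: undo the matching StartThreadpoolIo() call.
        CancelThreadpoolIo(io);
        return false;
    }
    return true;  // completed or pending; OnIoComplete will run either way
}

Each StartThreadpoolIo() simply tells the threadpool to expect one more completion on that handle; the WSAOVERLAPPED passed to WSARecvFrom (and handed back to the callback) is what identifies the individual operation.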

WaitForSingleObject() vs RegisterWaitForSingleObject()?

What is the advantage/disadvantage over using RegisterWaitForSingleObject() instead of WaitForSingleObject()?
The differences that I know of:
RegisterWaitForSingleObject() uses the thread pool already available in the OS.
With WaitForSingleObject(), a thread of your own has to block waiting for the event.
Is the only difference polling vs. an automatic callback, or is there any considerable performance advantage between these?
It's pretty straightforward: WaitForSingleObject() blocks a thread. The thread consumes a megabyte of virtual memory and does nothing useful with it while it is blocked. It won't wake up and resume doing useful work until the handle is signaled.
RegisterWaitForSingleObject() does not block a thread. The thread can continue doing useful work. When the handle is signaled, Windows grabs a thread-pool thread to run the code you specified as the callback. The same code you would have programmed after a WFSO call. There is still a thread involved with getting that callback to run, the wait thread, but it can handle many RWFSO requests.
So the big advantage is that your program can use a lot less threads while still handling many service requests. A disadvantage is that it can take a bit longer for the completion code to start running. And it is harder to program correctly since that code runs on another thread. Also note that you don't need RWFSO when you already use overlapped I/O.
They serve two different code models. With RegisterWaitForSingleObject you get an asynchronous notification callback on a random thread from the thread pool managed by the OS. If you can structure your code like this, it might be more efficient. On the other hand, WaitForSingleObject is a synchronous wait call, blocking (and thus 'occupying') the calling thread. In most cases, such code is easier to write and would probably be less prone to various deadlock and race conditions.
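To make the contrast concrete, here is a minimal sketch, assuming an auto-reset event; the callback name and the Sleep() are purely illustrative.

#include <windows.h>
#include <cstdio>

VOID CALLBACK OnSignaled(PVOID /*param*/, BOOLEAN timedOut)
{
    // Runs on an OS thread-pool thread when the event is signaled (or times out).
    printf(timedOut ? "timed out\n" : "event signaled\n");
}

int main()
{
    HANDLE hEvent = CreateEventW(nullptr, FALSE, FALSE, nullptr);

    // Blocking style: WaitForSingleObject(hEvent, INFINITE) would park this
    // thread until the event fires.

    // Non-blocking style: the wait is handled by the OS thread pool and
    // OnSignaled is invoked there; this thread is free to keep working.
    HANDLE hWait = nullptr;
    RegisterWaitForSingleObject(&hWait, hEvent, OnSignaled, nullptr,
                                INFINITE, WT_EXECUTEONLYONCE);

    SetEvent(hEvent);       // simulate the event being signaled
    Sleep(100);             // give the callback a moment to run (demo only)

    UnregisterWait(hWait);  // stop watching the event
    CloseHandle(hEvent);
}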

Event or Mutex?

I currently have a program running with about 20 threads at a time. I am fairly new to multi-threading so I'm a little confused on proper data protection.
Currently, my threads use Events as data locks and unlocks. I opted to use this over critical sections because most data is only shared between two or three threads, so stopping all 20 threads just to prevent one thread from reading while another wrote seemed wasteful. I used an Event over a Mutex simply because I could not (easily) find a source that clearly explained how a Mutex works and how to implement one.
I'm using the Win32 API for my multi-threading. In my current setup I use Events to lock data, so my event would be something like "DataUnlock". When it is not set, I know that the data is being worked on; when it is set, I know it is OK to work on the data. So my makeshift data locks look something like this:
WaitForSingleObject(DataUnlock, INFINITE); // wait until the data is free
ResetEvent(DataUnlock);                    // signal that the data is being worked on
// ...work on the data...
SetEvent(DataUnlock);                      // signal that the data is free to use
My first question is: Is this as good (efficient) as using a Mutex when only two threads are accessing the data?
Second: If more than two threads are waiting to access the data, is there a potential that both will be triggered when the data is freed (will they both pass the wait before one reaches ResetEvent)? If so, would a mutex have the same issue?
Lastly: If a mutex is preferable, how would I go about implementing one (a link or explanation would be greatly appreciated)?
Thanks!
I don't think that the event approach is the best way of protecting the data.
Look at Mutex Objects and Using Mutex Objects to learn about mutexes.
One of your threads has to create a mutex. The CreateMutex function returns a handle to the mutex object. You can pass the handle as an argument to the threads dealing with your data.
Use the WaitForSingleObject function to wait for the mutex and then process your data. Release the mutex with a call to the ReleaseMutex function. When the mutex is released, the next waiting thread will acquire it.
In case the data is to be accessed by threads of multiple processes, named mutexes have to be used.
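A minimal sketch of the create/wait/release pattern within a single process; the worker function and the shared counter are illustrative.

#include <windows.h>

HANDLE g_hMutex;           // created once, shared by the threads
int    g_sharedData = 0;   // the data being protected

DWORD WINAPI Worker(LPVOID)
{
    for (int i = 0; i < 1000; ++i) {
        WaitForSingleObject(g_hMutex, INFINITE);  // acquire the mutex
        ++g_sharedData;                           // ...work on the data...
        ReleaseMutex(g_hMutex);                   // let the next waiter in
    }
    return 0;
}

int main()
{
    g_hMutex = CreateMutexW(nullptr, FALSE, nullptr);  // unnamed, not initially owned

    HANDLE threads[2] = {
        CreateThread(nullptr, 0, Worker, nullptr, 0, nullptr),
        CreateThread(nullptr, 0, Worker, nullptr, 0, nullptr),
    };
    WaitForMultipleObjects(2, threads, TRUE, INFINITE);

    CloseHandle(threads[0]);
    CloseHandle(threads[1]);
    CloseHandle(g_hMutex);
}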
Look at Critical Section Objects to learn about critical section synchronisation.
If all of your threads live in the same process, a critical section is a lighter-weight alternative.
A critical section has to be created by a call to the InitializeCriticalSection function.
Use the EnterCriticalSection function at all places before you handle your data.
The LeaveCriticalSection function releases the critical section object. Use this call after you're done with the data.
Once a thread has gained the critical section object, no other thread can get access to your data; other threads will block at the call to EnterCriticalSection(). However, the thread that owns the critical section can make successive calls to EnterCriticalSection() more than once. Take care to call LeaveCriticalSection() once for every call to EnterCriticalSection().
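The same illustrative sketch as above, using a critical section instead of a mutex:

#include <windows.h>

CRITICAL_SECTION g_cs;
int              g_sharedData = 0;

DWORD WINAPI Worker(LPVOID)
{
    for (int i = 0; i < 1000; ++i) {
        EnterCriticalSection(&g_cs);   // blocks other threads; the owning thread
        ++g_sharedData;                // could enter again recursively
        LeaveCriticalSection(&g_cs);   // one Leave per Enter
    }
    return 0;
}

int main()
{
    InitializeCriticalSection(&g_cs);

    HANDLE threads[2] = {
        CreateThread(nullptr, 0, Worker, nullptr, 0, nullptr),
        CreateThread(nullptr, 0, Worker, nullptr, 0, nullptr),
    };
    WaitForMultipleObjects(2, threads, TRUE, INFINITE);

    CloseHandle(threads[0]);
    CloseHandle(threads[1]);
    DeleteCriticalSection(&g_cs);
}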
Your example would let all threads waiting for the event process your data, and you would only know from the data itself whether any processing has happened. It's up to you
to determine what was done and what still needs to be done. If you have many threads waiting for your event, you can't tell the order in which they get access.
I would recommend using a critical section object. It is lightweight and relatively easy to use. See Using Critical Section Objects for an example how to use critical section objects.
