Constructor-time IO in a device mapper implementation - linux-kernel

I'm developing a device mapper driver and was wondering about delays due to performing disk initialization in the constructor context. There's a fair amount of IO to be done during some initial setup - ranges of blocks to clean out.
From looking over dm implementations, it appears that this work is done synchronously in the constructor; that is, the constructor isn't allowed to return until the operation is done. I don't know if that's a good idea, or whether there's a way to keep things asynchronous until the init-time work is done.
I was thinking that calls to 'map' might be deferred by returning DM_MAPIO_REQUEUE until the operation completes, but that may take some seconds. I've not found any docs or reference that covers the function set of the target_type structure in dm, just what I've seen of some dm drivers making use of some of these function indirects. Any hints on where there are details of the methods, or rules on what can and can't be done in the constructor?
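For reference, the deferred approach could look roughly like the sketch below. This is only a rough, unreviewed C sketch assuming the current two-argument map() signature; all the zeroer_* names are invented, and whether requeueing bios for whole seconds is acceptable is exactly the open question:

    #include <linux/device-mapper.h>
    #include <linux/workqueue.h>
    #include <linux/atomic.h>
    #include <linux/slab.h>

    /* Illustrative only: the constructor kicks the cleanup IO off to a
     * worker and returns immediately; map() pushes bios back until the
     * worker is finished. */
    struct zeroer {
        struct work_struct init_work;
        atomic_t init_done;
    };

    static void zeroer_init_fn(struct work_struct *work)
    {
        struct zeroer *z = container_of(work, struct zeroer, init_work);

        /* ... issue the block-cleanup IO synchronously here ... */
        atomic_set(&z->init_done, 1);
    }

    static int zeroer_ctr(struct dm_target *ti, unsigned int argc, char **argv)
    {
        struct zeroer *z = kzalloc(sizeof(*z), GFP_KERNEL);

        if (!z)
            return -ENOMEM;
        /* ... dm_get_device(), argument parsing, etc. ... */
        atomic_set(&z->init_done, 0);
        INIT_WORK(&z->init_work, zeroer_init_fn);
        schedule_work(&z->init_work);    /* ctr returns without waiting */
        ti->private = z;
        return 0;
    }

    static int zeroer_map(struct dm_target *ti, struct bio *bio)
    {
        struct zeroer *z = ti->private;

        if (!atomic_read(&z->init_done))
            return DM_MAPIO_REQUEUE;     /* defer until the init IO is done */
        /* ... remap the bio and return DM_MAPIO_REMAPPED ... */
        return DM_MAPIO_REMAPPED;
    }
    /* A real target would also have to flush or cancel init_work in its
     * destructor before freeing the private data. */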

Related

When to use Unconfined in Kotlin

When would I choose to use Dispatchers.Unconfined? Is it when it doesn't really matter where the coroutine runs? So you let the coroutine choose whichever thread pool suits it best?
And how does it differ from Dispatchers.Default? Is it that coroutines on the Default dispatcher always run within a specific thread pool, defined as the default one?
So you let the coroutine choose whichever thread pool suits it best?
That's not really how Unconfined works. The best way to understand it is that it is a "no-op" dispatcher that doesn't actually do any dispatch at all. Wherever you call continuation.resume(), that's where the coroutine resumes execution — within that very call. When the resume() call returns, it means the coroutine has either suspended again or completed.
In normal programming, you usually call continuation.resume() from a callback and it is not your code that runs the callback, so you don't actually have any control over the thread where your coroutine will resume. It is not advisable to use the Unconfined dispatcher when resuming from a callback provided by a library that is not under your control.
Unconfined is really a special-cased tool you can use when building a coroutine execution environment yourself, or in other custom scenarios. Basically, you should use it only when you are actively looking for a way to disable the normal dispatching mechanism.
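The contrast is easy to see outside Kotlin, too. Below is a minimal C sketch (C is used for the examples on this page); it is only an analogy, with every name invented: a "no-op dispatcher" resumes the continuation inline on the calling thread, while a real dispatcher hands it to another thread.

    #include <stdio.h>
    #include <stdlib.h>
    #include <pthread.h>

    typedef void (*continuation_fn)(void *);

    /* "Unconfined" in miniature: no dispatch at all. The continuation runs
     * inline, on whichever thread called resume, and this returns only
     * once the continuation has suspended again or completed. */
    static void resume_unconfined(continuation_fn cont, void *arg)
    {
        cont(arg);
    }

    /* A confined dispatcher, by contrast, hands the continuation to some
     * other thread (a pool in practice; one detached thread here). */
    struct job { continuation_fn fn; void *arg; };

    static void *run_job(void *p)
    {
        struct job *j = p;

        j->fn(j->arg);
        free(j);
        return NULL;
    }

    static void resume_dispatched(continuation_fn cont, void *arg)
    {
        pthread_t t;
        struct job *j = malloc(sizeof(*j));

        if (!j)
            return;
        j->fn = cont;
        j->arg = arg;
        pthread_create(&t, NULL, run_job, j);
        pthread_detach(t);
    }

    static void body(void *arg)
    {
        int local;

        /* Each thread has its own stack, so this address reveals which
         * thread the continuation resumed on. */
        printf("%s: resumed on stack near %p\n", (char *)arg, (void *)&local);
    }

    int main(void)
    {
        resume_unconfined(body, "unconfined (caller's thread)");
        resume_dispatched(body, "dispatched (other thread)");
        pthread_exit(NULL);   /* let the detached thread finish */
    }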
The unconfined dispatcher is appropriate for coroutines which neither consume CPU time nor update any shared data (like UI) confined to a specific thread.
So basically, I'd use it in situations that are neither IO-, UI-, nor computation-heavy :D.
I think the number of use cases for this is pretty low, but I'd think of an operation which isn't heavy, but which for some reason you'd still like to run on a different thread.
Here's a link for how it actually works.
Dispatchers.Default is really different, and it's mostly used for heavy CPU operations.
This is because it actually dispatches work to a thread pool with a number of threads equal to the number of CPU cores (and at least 2). This way developers can leverage the full capacity of the CPU when doing heavy computational work.
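That sizing rule is simple enough to show in code. A trivial C illustration (not Kotlin's actual implementation):

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Dispatchers.Default sizes its pool to the CPU core count,
         * but never below two threads. */
        long cores = sysconf(_SC_NPROCESSORS_ONLN);
        int pool_size = cores > 2 ? (int)cores : 2;

        printf("default pool size: %d\n", pool_size);
        return 0;
    }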

NSURLConnection synchronous request without accepting redirects

I am currently implementing code that uses the macOS API for HTTP/HTTPS requests in a Delphi/Lazarus program.
The code runs in its own thread (i.e., not the main/UI thread) and is part of a larger threading-based crawler that runs across Windows/Mac and Delphi/Lazarus. I try to implement the actual HTTP/S request using the OS API, but handle things like processing and acting upon HTTP headers myself.
This means I would like to keep using synchronous mode if possible.
I want the request to simply return to me what the server returns.
I do not want it to follow redirects.
I currently use sendSynchronousRequest_returningResponse_error.
I have tried searching Google, but it seems there is no way to do this when using synchronous requests? That just seems a bit odd.
No, NSURLConnection's synchronous functionality is very limited, and was never expanded because it is so strongly discouraged. That said, it is technically possible to implement what you're trying to do.
My recollection, from having replaced that method with an NSURLSession equivalent once (to swizzle in a less leaky replacement for that method in a binary-only library), is that you basically need to write a method that uses a shared dictionary to store a semaphore for each NSURLSessionDataTask (using the data task as a key). You create the semaphore with a count of zero so that it blocks immediately when you wait on it, start the asynchronous request from the main thread, and then wait on the semaphore (on the current thread). In the data task's completion handler block, you increment the semaphore, thus unblocking the calling thread.
The trick is to ensure that the session runs its callbacks on a thread OTHER than the current one (which is blocked waiting for the semaphore). So you'll need to dispatch_async into the main thread when you actually start the data task.
Ostensibly, if you supported converting the task into a download task or stream task in the relevant delegate method, you would also need to take appropriate action to update the shared dictionary, but I'm assuming you won't use that feature. :-)
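The heart of that pattern is independent of NSURLSession. Here is a self-contained sketch using GCD's C API (compile with clang; blocks are a C extension). The dispatch_async block stands in for the data task's completion handler, and the 200 status is obviously faked:

    #include <dispatch/dispatch.h>
    #include <stdio.h>

    int main(void)
    {
        /* The count starts at zero, so the first wait blocks immediately. */
        dispatch_semaphore_t done = dispatch_semaphore_create(0);
        __block long status = 0;

        /* Stand-in for starting the asynchronous request; the completion
         * work runs on a global queue, NOT on this (soon blocked) thread. */
        dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
            status = 200;                      /* pretend the server answered */
            dispatch_semaphore_signal(done);   /* unblock the caller */
        });

        dispatch_semaphore_wait(done, DISPATCH_TIME_FOREVER);  /* block here */
        printf("status: %ld\n", status);
        dispatch_release(done);
        return 0;
    }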

Latest Windows threadpool API usage for I/O

I don't understand part of the latest Windows threadpool API. I need help with that.
From the documentation, the recipe to use it for I/O (in my case, for a SOCKET) can be summarized as follows (a code sketch follows the list):
Call CreateThreadpoolIo.
Call StartThreadpoolIo. You can find this warning there:
You must call this function before initiating each asynchronous I/O operation on the file handle bound to the I/O completion object. Failure to do so will cause the thread pool to ignore an I/O operation when it completes and will cause memory corruption.
Call the operation on the file handle (e.g., WSARecvFrom). If it fails, call CancelThreadpoolIo. Otherwise, process the result when it is available. WSARecvFrom, when used asynchronously, asks for a WSAOVERLAPPED (that you have to create beforehand) but not for any information that links it to the previous call to StartThreadpoolIo. CancelThreadpoolIo only asks for the PTP_IO, but not for any additional information to derive a specific asynchronous operation.
Repeat steps 2 and 3.
Call CloseThreadpoolIo to finish. You can find this warning there:
It may be necessary to cancel threadpool I/O notifications to prevent memory leaks. For more information, see CancelThreadpoolIo.
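To make the recipe concrete, here is a minimal, illustrative sketch for an overlapped UDP receive; error handling is trimmed, and RECV_OP, IoDone and post_recv are invented names. Note how the completion callback receives the WSAOVERLAPPED that was passed to WSARecvFrom, which is what identifies the individual operation:

    #include <winsock2.h>
    #include <windows.h>
    #include <stdio.h>

    typedef struct {                       /* one per in-flight receive */
        WSAOVERLAPPED      ov;             /* identifies the operation */
        WSABUF             buf;
        char               data[1500];
        struct sockaddr_in from;
        INT                fromlen;
    } RECV_OP;

    static VOID CALLBACK IoDone(PTP_CALLBACK_INSTANCE inst, PVOID ctx,
                                PVOID overlapped, ULONG result,
                                ULONG_PTR bytes, PTP_IO io)
    {
        /* 'overlapped' is the WSAOVERLAPPED passed to WSARecvFrom, so the
         * enclosing RECV_OP can be recovered, the datagram processed, and
         * the receive reissued. */
        RECV_OP *op = CONTAINING_RECORD(overlapped, RECV_OP, ov);

        printf("received %lu bytes\n", (unsigned long)bytes);
        (void)inst; (void)ctx; (void)result; (void)io; (void)op;
    }

    static BOOL post_recv(SOCKET s, PTP_IO io, RECV_OP *op)
    {
        DWORD flags = 0;

        op->buf.buf = op->data;
        op->buf.len = sizeof op->data;
        op->fromlen = sizeof op->from;
        ZeroMemory(&op->ov, sizeof op->ov);

        StartThreadpoolIo(io);             /* step 2: before EVERY operation */
        if (WSARecvFrom(s, &op->buf, 1, NULL, &flags,
                        (struct sockaddr *)&op->from, &op->fromlen,
                        &op->ov, NULL) == SOCKET_ERROR &&
            WSAGetLastError() != WSA_IO_PENDING) {
            CancelThreadpoolIo(io);        /* step 3: undo the Start on failure */
            return FALSE;
        }
        return TRUE;                       /* IoDone will fire for this op */
    }

    /* Setup (step 1), once per socket:
     *   PTP_IO io = CreateThreadpoolIo((HANDLE)s, IoDone, NULL, NULL);
     * Teardown (step 5), after pending IO has completed or been cancelled:
     *   WaitForThreadpoolIoCallbacks(io, FALSE);
     *   CloseThreadpoolIo(io);
     */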
I usually need it for UDP, so I strive to have several reception operations queued (asynchronous WSARecvFrom operations started) at any given time. That way I don't have to rush to start another reception operation at the beginning of the callback function, nor synchronize access to the reception buffers (I can have a pool of them, each one able to contain a datagram, and reissue the reception operation when I finish processing each message; in the interim, other queued operations will keep the receiver busy). Datagrams are independent and self-contained. I'm aware that this approach may not be valid for TCP.
StartThreadpoolIo/CancelThreadpoolIo seem to me to be the source of the problem: StartThreadpoolIo and WSARecvFrom are not directly bound (they don't share any arguments). So:
How can the framework know which operation to cancel when you call CancelThreadpoolIo? How does it cancel just the operation that failed and not any of the pending ones?
You can say, "don't call StartThreadpoolIo concurrently". I can live without several concurrent WSARecvFrom's, but I can't live without concurrent WSARecvFrom and WSASendTo. So I think being unable to have several asynchronous operations at the same time can't be the way the API was designed.
You can say, "call StartThreadpoolIo only once, that will suffice to register the callback; it is an on/off process". But the documentation says:
You must call this function before initiating each asynchronous I/O operation on the file handle...
You can say, "it cancels the operation started by the same thread that just called StartThreadpoolIo". But then the advice of calling CancelThreadpoolIo in the context of calling CloseThreadpoolIo doesn't make sense (I will call CloseThreadpoolIo from the thread that triggers stopping, which will be completely independent from the threads issuing the asynchronous operations; and a single call to CancelThreadpoolIo may not be enough to cancel several operations). Being unable to trigger cancellation from a different thread is a serious limitation, anyway. I'm aware of the existence of CreateThreadpoolCleanupGroup, but my question is more fundamental. I want to understand how this API can be fundamentally right and useful.
You can say "call CreateThreadpoolIo several times, so that you have independent PTP_IO's to work with". It doesn't work. When I call CreateThreadpoolIo a second time, nullptr is returned.
Am I wrong, or is this API awkward? Normally, other asynchronous APIs work with one of these patterns:
Create an operation and receive a handle => call methods passing the handle.
Create a reusable handle => call methods (including starting operations) passing the handle.
The latest Windows threadpool API uses neither of them: the handle seems to be implicit, or rather there are several handles for the same operation (TP_IO, WSAOVERLAPPED, StartThreadpoolIo) that aren't all explicitly linked together.
Thank you very much for your help.
How can the framework know which operation to cancel when you call CancelThreadpoolIo? How does it cancel just the operation that failed and not any of the pending ones?
CancelThreadpoolIo() doesn't cancel I/O. It is reciprocal to StartThreadpoolIo(): StartThreadpoolIo() prepares the threadpool to accept a completion. If the threadpool doesn't expect a completion, it won't wait for it, so you may miss it. If the threadpool expects a completion that never happens, it may waste resources.
CancelThreadpoolIo() undoes whatever StartThreadpoolIo() did.
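In other words, each StartThreadpoolIo() arms the pool for exactly one more completion, and the pending operations themselves are told apart by their OVERLAPPEDs, not by the PTP_IO. So several armed operations on one PTP_IO are fine. Reusing the hypothetical post_recv() from the sketch in the question:

    /* Keep eight receives in flight on one socket and one PTP_IO. Each
     * post_recv() call arms the pool for one more completion; a failed
     * post disarms itself via CancelThreadpoolIo() inside post_recv(). */
    static RECV_OP ops[8];

    static BOOL post_all(SOCKET s, PTP_IO io)
    {
        int i;

        for (i = 0; i < 8; ++i)
            if (!post_recv(s, io, &ops[i]))
                return FALSE;
        return TRUE;
    }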

boost.asio - do I need to use locks if sharing a database-type object between different async handlers?

I'm making a little server for a project. I have a log handler class which contains a log implemented as a map, plus some methods to act on it (add entry, flush to disk, commit, etc.).
This object is instantiated in the server class, and I'm passing its address to each session so the sessions can add entries to it.
The sessions are async; the log writes will happen in the async_read callback. I'm wondering if this will be an issue and if I need to use locks?
The map format is map<transactionId, map<sequenceNum, pair<head, body>>>; each session will access a different transactionId, so there should be no clashes as far as I can figure. Also, hypothetically, if they were all writing to the same place in memory -- something large enough that the operation would not be atomic -- would I need locks? As far as I understand, each async method dispatches a thread to handle the operation, which would make me assume yes. At the same time, I read that one of the great uses of async functions is the fact that synchronization primitives are not needed. So I'm a bit confused.
This is my first time using ASIO or any type of asynchronous functions altogether, and I'm not a very experienced coder. I hope the question makes sense! The code seems to run fine so far, but I'm curious whether it's correct.
Thank you!
Asynchronous handlers will only be invoked in application threads processing the io_service event loop via run(), run_one(), poll(), or poll_one(). The documentation states:
Asynchronous completion handlers will only be called from threads that are currently calling io_service::run().
Hence, for a non-thread-safe shared resource:
If the application code only has one thread, then there is neither concurrency nor race conditions. Thus, no additional form of synchronization is required. Boost.Asio refers to this as an implicit strand.
If the application code has multiple threads processing the event loop and the shared resource is only accessed within handlers, then synchronization needs to occur, as multiple threads may attempt to concurrently access the shared resource. To resolve this, one can:
Protect the calls to the shared resource via a synchronization primitive, such as a mutex (sketched after this list). This question covers using mutexes within handlers.
Use the same strand to wrap() the ReadHandlers. A strand will prevent concurrent invocation of handlers dispatched through it. For more details on the usage of strands, particularly for composed operations, such as async_read(), consider reading this answer.
Rather than posting the entire ReadHandler into the strand, one could limit interaction with the shared resource to a specific set of functions, and post those functions as CompletionHandlers to the same strand. The subtle difference between this and the previous solution is the granularity of synchronization.
If the application code has multiple threads and the shared resource is accessed from threads processing the event loop and from threads not processing the event loop, then synchronization primitives, such as a mutex, need to be used.
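For instance, the mutex option from the first bullet looks like this in miniature. Boost.Asio itself is C++, so this plain-C sketch only mimics the situation: two "event loop" threads keep invoking a handler that guards the shared log with a mutex (all names invented):

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t log_mutex = PTHREAD_MUTEX_INITIALIZER;
    static int log_entries = 0;           /* stand-in for the shared map */

    /* What each session's async_read callback would do with the log. */
    static void on_read(long transaction_id)
    {
        pthread_mutex_lock(&log_mutex);
        ++log_entries;                    /* add an entry */
        pthread_mutex_unlock(&log_mutex);
        (void)transaction_id;
    }

    static void *event_loop_thread(void *arg)
    {
        int i;

        for (i = 0; i < 1000; ++i)
            on_read((long)arg);           /* pretend completions keep arriving */
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;

        pthread_create(&a, NULL, event_loop_thread, (void *)1L);
        pthread_create(&b, NULL, event_loop_thread, (void *)2L);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("entries: %d\n", log_entries);   /* 2000, thanks to the mutex */
        return 0;
    }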
Also, even if a shared resource is small enough that writes and reads are always atomic, one should prefer explicit and proper synchronization. For example, although the write and read may each be atomic, without proper memory fencing to guarantee memory visibility, a thread may not observe a change in memory even though the actual memory has changed. Boost.Asio will perform the proper memory barriers to guarantee visibility. For more details on Boost.Asio and memory barriers, consider reading this answer.

dynamic tasklets or work queues

Background: I'm writing a network traffic processing kernel module.
I'm getting packets using netfilter hooks. All filtering is done inside the hook function, but I don't want to do the packet processing there. So the solution is tasklets or workqueues. I know the difference between them and I can use both, but I have some problems and I need some advice.
Tasklets solution (preferable). I can create and start a tasklet for each packet, but who will delete this tasklet? The tasklet function itself? I don't think it's a good idea to dealloc a tasklet while it is executing. Create a global pool of tasklets? Well, since there can't be two executing tasklets on one processor, the pool size would be the number of processors. But how do I find out when a tasklet is available for new use? There are only two states, sched and run; there is no "done" state. OK, I could probably wrap the tasklet in some struct with a flag. But wouldn't that all be overkill?
Workqueue solution. Same problem: who will delete the work item? The same "solution" as for tasklets?
Workqueue solution 2. Just create a permanent work item at module load time, save packets to some queue, and process them inside the work function. Maybe two work items and two queues: incoming and outgoing. But I'm afraid that with this solution I would use only one (or two) processors, since it looks like a work item can't be executed on several processors simultaneously.
Any other solutions?
One can use high-priority (WQ_HIGH_PRI), unbound (WQ_UNBOUND) workqueues and stick with option 3 listed in the question.
WQ_HIGH_PRI guarantees that processing is initiated ASAP. WQ_UNBOUND eliminates the single-CPU bottleneck, as the scheduler can assign the work to any available CPU immediately.
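A sketch of that combination, i.e. the second workqueue solution plus those flags: one permanent work item draining an skb queue on a high-priority, unbound workqueue. All names are illustrative and the module boilerplate is trimmed; note that a single work item still runs one instance at a time, so separate incoming/outgoing items and queues, as suggested in the question, would buy more parallelism:

    #include <linux/workqueue.h>
    #include <linux/skbuff.h>

    static struct workqueue_struct *pkt_wq;
    static struct work_struct rx_work;
    static struct sk_buff_head rx_queue;       /* spinlock-protected skb list */

    static void rx_work_fn(struct work_struct *work)
    {
        struct sk_buff *skb;

        while ((skb = skb_dequeue(&rx_queue)) != NULL) {
            /* ... the actual packet processing, outside the hook ... */
            kfree_skb(skb);
        }
    }

    /* Called from the netfilter hook: stash the packet, kick the work. */
    static void defer_packet(struct sk_buff *skb)
    {
        skb_queue_tail(&rx_queue, skb);
        queue_work(pkt_wq, &rx_work);          /* no-op if already pending */
    }

    static int setup(void)
    {
        skb_queue_head_init(&rx_queue);
        INIT_WORK(&rx_work, rx_work_fn);
        pkt_wq = alloc_workqueue("pkt_wq", WQ_HIGH_PRI | WQ_UNBOUND, 0);
        return pkt_wq ? 0 : -ENOMEM;
    }

    static void teardown(void)
    {
        destroy_workqueue(pkt_wq);             /* flushes pending work first */
        skb_queue_purge(&rx_queue);            /* drop anything left over */
    }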
