Need help in understanding nonarbitrary thread context? - window

I was reading a MSDN doc about driver synchronization and I come across a statement that goes like this
a driver can wait if
• The driver is executing in a nonarbitrary thread context. That is,
you can identify the thread that will enter a wait state. In practice,
the only driver routines that execute in a nonarbitrary thread context
are the DriverEntry, AddDevice, Reinitialize, and Unload routines of
any driver, plus the dispatch routines of highest-level drivers. All
these routines are called directly by the system
now my question is that why dispatch routines are considered in arbitrary thread context ?? Since read, write and other routines will be invoked when a request will be raised from user space, so we can know that which thread did that in system space ?? My be I am completely messed up or it could be a silly question but still help me coz i am a newbie in windwos.

ok i found the answer in a document :) and here is what it states ..
Although the highest-level drivers receive I/O requests in the context
of the requesting thread, they often forward those requests to their
lower level drivers on different threads. Consequently, you can make
no assumptions about the contents of the user-mode address space at
the time such routines are called

Related

CoInitializeEx(COINIT_MULTITHREADED) and Goroutines using WMI

We have a monitoring agent written in Go that uses a number of goroutines to gather system metrics from WMI. We recently discovered the program was leaking memory when the go binary is run on Server 2016 or Windows 10 (also possibly on other OS using WMF 5.1). After creating a minimal test case to reproduce the issue it seems that the leak only occurs if you make a large number of calls to the ole.CoInitializeEx method (possibly something changed in WMF 5.1 but we could not reproduce the issue using the python comtypes package on the same system).
We are using COINIT_MULTITHREADED for multithread apartment (MTA) in our application, and my question is this: Because we are issuing OLE/WbemScripting calls from various goroutines, do we need to call ole.CoInitializeEx just once on startup or once in each goroutine? Our query code already uses runtime.LockOSThread to prevent the scheduler from running the method on different OS threads, but the MSDN remarks on CoInitializeEx seem to indicate it must be called at least once on each thread. I am not aware of any way to make sure new goroutines run on an already initialized OS thread, so multiple calls to CoInitializeEx seemed like the correct approach (and worked fine for the last few years).
We have already refactored the code to do all the WMI calls on a dedicated background worker, but I am curious to know if our original code should work using only one CoInitializeEx at startup instead of once for every goroutine.
AFAIK, since Win32 API is defined only in terms of native OS threads, a call to CoInitialize[Ex]() only ever affects the thread it completed on.
Since the Go runtime uses free M×N scheduling of the goroutines to OS threads, and these threads are created / deleted as needed at runtime in a manner completely transparent to the goroutines, the only way to make sure the CoInitialize[Ex]() call has any lasting effect on the goroutine it was performed on is to first bind that goroutine to its current OS thread by calling runtime.LockOSThread() and doing this for every goroutine intended to do COM calls.
Please note that this basically creates an 1×1 mapping between goroutines and OS threads which defeats much of the purpose of goroutines to begin with. So supposedly you might want to consider having just a single goroutine calling into COM and listening for requests on a channel, or having
a pool of such worker goroutines hidden behing another one which would dispatch the clients' requests onto the workers.
Update regarding COINIT_MULTITHREADED.
To cite the docs:
Multi-threading (also called free-threading) allows calls to methods
of objects created by this thread to be run on any thread. There is no
serialization of calls — many calls may occur to the same method or
to the same object or simultaneously. Multi-threaded object
concurrency offers the highest performance and takes the best
advantage of multiprocessor hardware for cross-thread, cross-process,
and cross-machine calling, since calls to objects are not serialized
in any way. This means, however, that the code for objects must
enforce its own concurrency model, typically through the use of
synchronization primitives, such as critical sections, semaphores, or
mutexes. In addition, because the object doesn't control the lifetime
of the threads that are accessing it, no thread-specific state may be
stored in the object (in Thread Local Storage).
So basically the COM threading model has nothing to do with initialization of the threads theirselves—but rather with how the COM subsystem is allowed to call the methods of the COM objects you create on the COM-initialized threads.
IIUC, if you will COM-initialize a thread as COINIT_MULTITHREADED and create some COM object on it, and then pass its reference to some outside client of that object so that it is able to call that object's methods, those methods can be called by the OS on any thread in your process.
I really have no idea how this is supposed to interact with Go runtime,
so I'd start small and would use a single thread with STA model and then
maybe try to make it more complicated if needed.
On the other hand, if you only instantiate external COM objects and not
transfer their descriptors outside (and it appears that's the case),
the threading model should not be relevant. That is, only unless some
code in the WUA API would call some "event-like" method on a COM object you
have instantiated.

What is guarded region and how it differs from critical region?

Threre is a "guarded region" concept in windows, that is similar to critical region. Who knows how is it differs from Critical?
Recall that a process on any modern OS is made of several components. The ones we're interested in are:
code: the body of the program being executed,
memory: code, as well as execution related data (heap and stack) are stored there for the duration of a process,
thread: an execution context, ie CPU state and memory bits (stack for instance) related to that context,
signal slots: a process structure holding signals emitted by threads to each other, each thread has one such structure (ie, it's one of the memory bits of a thread)
Within a process, there may be several instances of these objects; usually code and memory are accessible by all the existing threads within a process. At any time there may be as many active threads (ie, executing concurrently) as there are available cores on your CPUs. These threads, if they belong to a same process, might interfer when accessing the same memory or code sections. For that purpose, Windows implements the so called critical sections, which are basically protected code blocks (it is very similar to the concept of synchronized code blocks in Java).
However, threads may also be diverted from their current execution code path when a signal is triggered or posted to them. On Windows, APCs are one form of that signaling mechanism. Guarded sections are available to make sure that a thread will complete a given code block before being able to handle these signals (APC in this case).
So, while a critical section will prevent any other thread than the active one to execute the protected code section concurrently, a guarded section will ensure that the current thread will not execute any other code than the guarded one once it started it.
As a simple analogy, imagine a code section like a flat in a building (the process code). A person (thread) who enters a critically protected code will lock the flat's main door, thus preventing any other person to enter it while she's still inside. If the code is guarded, then the OS will lock the person's cellphone, preventing her from answering calls until she actually leaves the flat.
A typical scenario for critical sections is when a specific resource needs to be accessed exclusively (a socket, an file handle, an in memory data structure). For APCs, similarly, guarded sections would prevent a thread from interfering with itself by trying to access one such resource in an APC execution when it was already using it in its current execution code path.

how come ruby's single os thread doesn't block while copying a file?

My assumptions:
MRI ruby 1.8.X doesn't have native threads but green threads.
The OS is not aware of these green threads.
issuing an IO-heavy operation should suspend the whole process until the proper IO interruption is issued back.
With these I've created a simple ruby program that does the following:
starts a thread that prints "working!" every second.
issues an IO request to copy a large (1gb) file on the "main" thread.
Now one would guess that being the green threads invisible to the OS, it would put the whole process on the "blocked" queue and the "working!" green thread would not execute. Surprisingly, it works :S
Does anyone know what's going on there? Thanks.
There is no atomic kernel file copy operation. It's a lot of fairly short reads and writes that are entering and exiting the kernel.
As a result, the process is constantly getting control back. Signals are delivered.
Green threads work by hooking the Ruby-level thread dispatcher into low-level I/O and signal reception. As long as these hooks catch control periodically the green threads will act quite a bit like more concurrent threads would.
Unix originally had a quite thread-unaware but beautifully simple abstract machine model for the user process environment.
As the years went by support for concurrency in general and threads in particular were added bit-by-bit in two different ways.
Lots of little kludges were added to check if I/O would block, to fail (with later retry) if I/O would block, to interrupt slow tty I/O for signals but then transparently return to it, etc. When the Unix API's were merged each kludge existed in more than one form. Lots of choices.1.
Direct support for threads in the form of multiple kernel-visible processes sharing an address space was also added. These threads are dangerous and untestable but widely supported and used. Mostly, programs don't crash. As time goes on, latent bugs become visible as the hardware supports more true concurrency. I'm not the least bit worried that Ruby doesn't fully support that nightmare.
1. The good thing about standards is that there are so many of them.
When MRI 1.9 initiates, it spawns two native threads. One thread is for the VM, the other is used to handle signals. Rubinis uses this strategy, as does the JVM. Pipes can be used to communicate any info from other processes.
As for the FileUtils module, the cd, pwd, mkdir, rm, ln, cp, mv, chmod, chown, and touch methods are all, to some degree, outsourced to OS native utilities using the internal API of the StreamUtils submodule while the second thread is left to wait for a signal from the an outside process. Since these methods are quite thread-safe, there is no need to lock the interpreter and thus the methods don't block eachother.
Edit:
MRI 1.8.7 is quite smart, and knows that when a Thread is waiting for some external event (such as a browser to send an HTTP request), the Thread can be put to sleep and be woken up when data is detected. - Evan Phoenix from Engine Yard in Ruby, Concurrency, and You
The implementation basic implementation for FileUtils has not changed much sense 1.8.7 from looking at the source. 1.8.7 also uses a sleepy timer thread to wait for a IO response. The main difference in 1.9 is the use of native threads rather than green threads. Also the thread source code is much more refined.
By thread-safe I mean that since there is nothing shared between the processes, there is no reason to lock the global interpreter. There is a misconception that Ruby "blocks" when doing certain tasks. Whenever a thread has to block, i.e. wait without using any cpu, Ruby simply schedules another thread. However in certain situations, like a rack-server using 20% of the CPU waiting for a response, it can be appropriate to unlock the interpreter and allow concurrent threads to handle other requests during the wait. These threads are, in a sense, working in parallel. The GIL is unlocked with the rb_thread_blocking_region API. Here is a good post on this subject.

What are event handles?

I had a leaking handle problem ("Not enough quota available to process this command.") in some inherited C# winforms code, so I went and used Sysinternals' Handle tool to track it down. Turns out it was Event Handles that were leaking, so I tried googled it (took a couple tries to find a query that didn't return "Did you mean: event handler?"). According to Junfeng Zhang, event handles are generated by the use of Monitor, and there may be some weird rules as far as event handle disposal and the synchonization primitives.
I'm not entirely sure that the source of my leaking handles are entirely due to simply long-lived objects calling lots of synchronization stuff, as this code is also dealing with HID interfaces and lots of win32 marshaling and interop, and was not doing any synchronization that I was aware of. Either way, I'm just going to run this in windbg and start tracing down where the handles are originating from, and also spend a lot of time learning this section of the code, but I had a very hard time finding information about what event handles are in the first place.
The msdn page for the event kernel object just links to the generic synchronization overview... so what are event handles, and how are they different from mutexes/semaphores/whatever?
The NT kernel uses event objects to allow signals to transferred to entities that wait on the signal. A mutex and a semaphore are also waitable kernel objects (Kernel Dispatcher Objects), but with different semantics. The only time I ever came across them was when waiting for IO to complete in drivers.
So my theory on your problem is possibly a faulty driver, or are you relying on specialised hardware?
Edit: More info (from Windows Internals 5th Edition - Chapter 3 System Mechanics)
Some Kernel Dispatcher Objects (e.g. mutex, semaphore) have the of concept ownership. So when signalled the released one waiting thread will be released will grab these resources. And others will have to continue to wait. Events are not owned hence are available to be reset by any thread.
Also there are three types of events:
Notification : On signalled all waiting threads are released
Synchronisation : On signalled one waiting thread is released but the event is reset
Keyed : On signalled one waiting thread in the same process as the signaller is released.
Another interesting thing that I've learned is that critical sections (the lock primitive in c#) are actually not kernel objects, rather they are implemented out of a keyed event, or mutex or semaphore as required.
If you're talking about kernel Event Objects, then an event handle will be a handle (Int) that the system keeps on this object so other objects can reference it. IE Keep a 'handle' on it.
Hope this helps!

Is putting thread on hold optimal?

Application has an auxiliary thread. This thread is not meant to run all the time, but main process can call it very often.
So, my question is, what is more optimal in terms of CPU performance: suspend thread when it is not being used or keep it alive and use WaitForSingleObject function in order to wait for a signal from main process?
In terms of CPU resources used, both solutions are the same - the thread which is suspended and thread which is waiting in WaitForSingleObject for an object which is not signalled both get no CPU cycles at all.
That said, WaitForSingleObject is almost always a prefered solution because the code using it will be much more "natural" - easier to read, and easier to make right. Suspending/Resuming threads can be dangerous, because you need to take a lot of care to make sure you know you are suspending a thread in a state where suspending it will do no harm (imagine suspending a thread which is currently holding a mutex).
I would assume Andrei doesn't use Delphi to write .NET and therefore Suspend doesn't translate to System.Threading.Thread.Suspend but to SuspendThread Win32 API.
I would strongly suggest against it. If you suspend the thread in an arbitrary moment, then you don't know what's going to happen (for example, you may suspend the thread in such a state the some shared resource is blocked). If you however already know that the thread is in suspendable state, then simply use WaitForSingleObject (or any other WaitFor call) - it will be equally effective as suspending the thread, i.e. thread will use zero CPU time until it is awaken.
What do you mean by "suspend"? WaitForSingleObject will suspend the thread, i.e., it will not consume any CPU, until the signal arrives.
If it's a worker thread that has units of work given to it externally, it should definitely be using signalling objects as that will ensure it doesn't use CPU needlessly.
If it has any of its own work to do as well, that's another matter. I wouldn't suspend the thread from another thread (what happens if there's two threads delivering work to it?) - my basic rule is that threads should control their own lifetime with suggestions from other threads. This localizes all control in the thread itself.
See the excellent tutorial on multi-threading in Delphi :
Multi Threading Tutorial
Another option would be the TMonitor introduced in Delphi 2009, which has functions like Wait, Pulse and PulseAll to keep threads inactive when there is nothing to do for them, and notify them as soon as they should continue with their work. It is loosely modeled after the object locks in Java. Like there, Delphi object now have a "lock" field which can be used for thread synchrinozation.
A blog which gives an example for a threaded queue class can be found at http://delphihaven.wordpress.com/2011/05/04/using-tmonitor-1/
Unfortunately there was a bug in the TMonitor implementation, which seems to be fixed in XE2

Resources