Can SRW Lock be used as a binary semaphore? - winapi

The Slim Reader/Writer (SRW) lock is a synchronization primitive in Windows, available starting with Windows Vista.
Its name and interface suggest that it should be used as a non-timed, shared, non-recursive mutex. However, it is also commonly used as a non-shared mutex (by using only the Exclusive APIs) to avoid the overhead of a CRITICAL_SECTION.
I've noticed that it also works as a binary semaphore. This can come in handy, as the other semaphore-like primitives available in the Windows API - the Event object and the Semaphore object - always involve a kernel call, so it is probably the only lightweight semaphore readily available from the Windows API (C++ only gained semaphores in C++20, and Boost.Thread does not provide them either).
But is this reliable? I have not found explicit wording in the documentation saying that it can be used this way, but I have not found anything that prohibits such usage either. The documentation seems silent on the matter.
What I'm expecting as an answer:
Maybe someone can point me to documentation wording that permits or prohibits semaphore usage
Maybe there's some practical experience with such usage
Maybe someone directly involved with SRW lock implementation could clarify (there's some chance, I think)
Example - this does not hang
#include <Windows.h>
#include <atomic>

SRWLOCK lock = SRWLOCK_INIT;
std::atomic<bool> can_release{ false };

DWORD CALLBACK Thread1(LPVOID)
{
    for (int i = 0; i < 3; i++)
    {
        while (!can_release)
        {
            // spin
        }
        can_release = false;
        ::ReleaseSRWLockExclusive(&lock); // release on a thread that did not acquire the lock
    }
    return 0;
}

DWORD CALLBACK Thread2(LPVOID)
{
    for (int i = 0; i < 3; i++)
    {
        can_release = true;
        ::AcquireSRWLockExclusive(&lock); // blocks until Thread1 releases
    }
    return 0;
}

int main() {
    ::AcquireSRWLockExclusive(&lock);
    HANDLE h1 = ::CreateThread(nullptr, 0, Thread1, nullptr, 0, nullptr);
    HANDLE h2 = ::CreateThread(nullptr, 0, Thread2, nullptr, 0, nullptr);
    ::WaitForSingleObject(h1, INFINITE);
    ::WaitForSingleObject(h2, INFINITE);
    ::CloseHandle(h1);
    ::CloseHandle(h2);
    return 0;
}

Raymond Chen is right. Application Verifier reports errors for the code in question:
=======================================
VERIFIER STOP 0000000000000255: pid 0x1A44: The SRW lock being released was not acquired by this thread.
00007FF73979C170 : SRW Lock
00000000000025CC : Current ThreadId.
00000000000043F4 : ThreadId of the thread that acquired the SRW lock.
000001C1BEA8BF40 : Address of the acquire stack trace. Use dps <address> to see where the SRW lock was acquired.
=======================================
This verifier stop is continuable.
After debugging it use `go' to continue.
=======================================
=======================================
VERIFIER STOP 0000000000000253: pid 0x1A44: The SRW lock is being acquired recursively by the same thread.
00007FF73979C170 : SRW Lock
000001C1BEA8BF40 : Address of the first acquire stack trace. Use dps <address> to see where the SRW lock was acquired.
0000000000000000 : Not used
0000000000000000 : Not used
=======================================
This verifier stop is continuable.
After debugging it use `go' to continue.
=======================================
As of now, the documentation also explicitly prohibits releasing from a different thread; see ReleaseSRWLockExclusive and ReleaseSRWLockShared:
The SRW lock must be released by the same thread that acquired it. You can use Application Verifier to help verify that your program uses SRW locks correctly (enable Locks checker from Basic group).
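If what is actually needed is a lightweight binary semaphore that may be released from a different thread, one option (a minimal sketch of my own, not from the question; the BinarySemaphore name is illustrative) is to build it on WaitOnAddress / WakeByAddressSingle, which have no thread-affinity requirement. Requires Windows 8 or later and linking with Synchronization.lib:
#include <Windows.h>
#pragma comment(lib, "Synchronization.lib")

struct BinarySemaphore {
    volatile LONG signaled = 0;

    // May be called from any thread: set the flag and wake one waiter, if any.
    void Release() {
        ::InterlockedExchange(&signaled, 1);
        ::WakeByAddressSingle((PVOID)&signaled);
    }

    // Consume the flag, blocking until some thread has called Release.
    void Acquire() {
        while (::InterlockedExchange(&signaled, 0) == 0) {
            LONG undesired = 0;
            ::WaitOnAddress(&signaled, &undesired, sizeof(signaled), INFINITE);
        }
    }
};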

Related

KSPIN_LOCK blocks when acquiring from Driver's main thread

I have a KSPIN_LOCK which is shared among a Windows driver's main thread and some threads I created with PsCreateSystemThread. The problem is that the main thread blocks if I try to acquire the spinlock and doesn't unblock. I'm very confused as to why this happens.. it's probably somehow connected to the fact that the main thread runs at driver IRQL, while the other threads run at PASSIVE_LEVEL as far as I know.
NOTE: If I only run the main thread, acquiring/releasing the lock works just fine.
NOTE: I'm using the functions KeAcquireSpinLock and KeReleaseSpinLock to acquire/release the lock.
Here's my checklist for a "stuck" spinlock:
Make sure the spinlock was initialized with KeInitializeSpinLock. If the KSPIN_LOCK holds uninitialized garbage, then the first attempt to acquire it will likely spin forever.
Check that you're not acquiring it recursively/nested. KSPIN_LOCK does not support recursion, and if you try it, it will spin forever.
Normal spinlocks must be acquired at IRQL <= DISPATCH_LEVEL. If you need something that works at DIRQL, check out [1] and [2].
Check for leaks. If one processor acquires the spinlock, but forgets to release it, then the next processor will spin forever when trying to acquire the lock.
Ensure there's no memory-safety issues. If code randomly writes a non-zero value on top of the spinlock, that'll cause it to appear to be acquired, and the next acquisition will spin forever.
Some of these issues can be caught easily and automatically with Driver Verifier; use it if you're not using it already. Other issues can be caught if you encapsulate the spinlock in a little helper that adds your own asserts. For example:
typedef struct _MY_LOCK {
    KSPIN_LOCK Lock;
    ULONG OwningProcessor;
    KIRQL OldIrql;
} MY_LOCK;

void MyInitialize(MY_LOCK *lock) {
    KeInitializeSpinLock(&lock->Lock);
    lock->OwningProcessor = (ULONG)-1;
}

void MyAcquire(MY_LOCK *lock) {
    ULONG current = KeGetCurrentProcessorIndex();
    NT_ASSERT(KeGetCurrentIrql() <= DISPATCH_LEVEL);
    NT_ASSERT(current != lock->OwningProcessor); // check for recursion
    KeAcquireSpinLock(&lock->Lock, &lock->OldIrql);
    NT_ASSERT(lock->OwningProcessor == (ULONG)-1); // check lock was inited
    lock->OwningProcessor = current;
}

void MyRelease(MY_LOCK *lock) {
    NT_ASSERT(KeGetCurrentProcessorIndex() == lock->OwningProcessor);
    lock->OwningProcessor = (ULONG)-1;
    KeReleaseSpinLock(&lock->Lock, lock->OldIrql);
}
Wrappers around KSPIN_LOCK are common. The KSPIN_LOCK is like a race car that has all the optional features stripped off to maximize raw speed. If you aren't counting microseconds, you might reasonably decide to add back the heated seats and FM radio by wrapping the low-level KSPIN_LOCK in something like the above. (And with the magic of #ifdefs, you can always take the airbags out of your retail builds, if you need to.)
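For illustration, a small usage sketch of the wrapper above (the global names and the counter are my own, made-up example):
MY_LOCK g_Lock;                 // call MyInitialize(&g_Lock) once before first use
ULONG g_SharedCounter = 0;

VOID BumpSharedCounter(VOID)
{
    MyAcquire(&g_Lock);         // raises to DISPATCH_LEVEL via KeAcquireSpinLock
    g_SharedCounter++;          // shared state is protected here
    MyRelease(&g_Lock);         // restores the IRQL saved in the wrapper
}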

Linux Interrupt Handling

I am trying to understand the Linux interrupt handling mechanism. I tried googling a bit but couldn't find an answer to this one. Can someone please explain to me why handle_IRQ_event needs to call local_irq_disable at the end? After this, control goes back to do_IRQ, which eventually returns to the entry point. Then who enables interrupts again? Is that the responsibility of the interrupt handler? If so, why?
Edit
Code for reference
asmlinkage int handle_IRQ_event(unsigned int irq, struct pt_regs *regs, struct irqaction *action)
{
    int status = 1;
    int retval = 0;

    if (!(action->flags & SA_INTERRUPT))
        local_irq_enable();

    do {
        status |= action->flags;
        retval |= action->handler(irq, action->dev_id, regs);
        action = action->next;
    } while (action);

    if (status & SA_SAMPLE_RANDOM)
        add_interrupt_randomness(irq);

    local_irq_disable();
    return retval;
}
The version of handle_IRQ_event from LDD3 appears to come from the 2.6.8 kernel, or possibly earlier. Assuming we're dealing with x86, the processor clears the interrupt flag (IF) in the EFLAGS register before it calls the interrupt handler. The old EFLAGS register will be restored by the iret instruction.
Linux's SA_INTERRUPT IRQ handler flag (now obsolete) determines whether higher-priority interrupts are allowed while the interrupt handler runs. The SA_INTERRUPT flag is set for "fast" interrupt handlers, which leave interrupts disabled, and is not set for "slow" interrupt handlers, which re-enable them.
Regardless of the SA_INTERRUPT flag, do_IRQ itself runs with interrupts disabled and they are still disabled when handle_IRQ_event is called. Since handle_IRQ_event can enable interrupts, the call to local_irq_disable at the end ensures they are disabled again on return to do_IRQ.
The relevant source code files in the 2.6.8 kernel for i386 architecture are arch/i386/kernel/entry.S, and arch/i386/kernel/irq.c.
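For context, here is a heavily abridged sketch (my own simplification, not the actual 2.6.8 source) of the x86 do_IRQ caller, to show why handle_IRQ_event must leave interrupts disabled when it returns:
asmlinkage unsigned int do_IRQ(struct pt_regs regs)
{
        /* The CPU cleared IF before vectoring here, so interrupts are off. */
        int irq = regs.orig_eax & 0xff;         /* IRQ number pushed by entry.S */
        irq_desc_t *desc = irq_desc + irq;
        struct irqaction *action;

        spin_lock(&desc->lock);
        desc->handler->ack(irq);                /* acknowledge the PIC/APIC */
        action = desc->action;
        spin_unlock(&desc->lock);

        /*
         * handle_IRQ_event may re-enable interrupts for "slow" handlers, but
         * its final local_irq_disable() guarantees they are off again here,
         * which the spinlock below and the return path in entry.S rely on.
         */
        handle_IRQ_event(irq, &regs, action);

        spin_lock(&desc->lock);
        desc->handler->end(irq);                /* let the controller deliver the IRQ again */
        spin_unlock(&desc->lock);

        /* Back in entry.S, iret restores the saved EFLAGS and with it IF. */
        return 1;
}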

Resolving Error R6016 - Not enough space for thread data

My statically-linked Visual C++ 2012 program sporadically generates a CRTL error: "R6016 - Not enough space for thread data".
The minimal documentation from Microsoft says this error message is generated when a new thread is spawned, but not enough memory can be allocated for it.
However, my code only explicitly spawns a new thread in a couple of well-defined cases, neither of which is occurring here (although the Microsoft libraries certainly spawn threads internally as well). One user reported this problem when the program was just sitting idle in the background.
Not sure if it's relevant, but I haven't overridden the default 1MB reserved stack size or heap size, and the total memory in use by my program is usually quite small (3MB-10MB on a system with 12GB actual RAM, over half of which is unallocated).
This happens very rarely (so I can't track it down), and it's been reported on more than one machine. I've only heard about this on Windows 8.1, but I wouldn't read too much into that.
Is there some compiler setting somewhere that might influence this error? Or programming mistake?
This turned out to be caused by calling CreateThread rather than _beginthread/_beginthreadex. The Remarks section of the Microsoft documentation for CreateThread states that it causes conflicts when the thread uses the CRT library, and indeed, once we made the change, we never saw that error again.
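For reference, a minimal sketch of the change (the ThreadProc name is made up): create CRT-using threads with _beginthreadex so the runtime can set up and tear down its per-thread data:
#include <windows.h>
#include <process.h>

unsigned __stdcall ThreadProc(void* arg)
{
    // ... work that uses the CRT ...
    return 0;
}

int main()
{
    // _beginthreadex returns a handle usable with WaitForSingleObject.
    HANDLE h = reinterpret_cast<HANDLE>(
        _beginthreadex(nullptr, 0, ThreadProc, nullptr, 0, nullptr));
    if (h != nullptr)
    {
        ::WaitForSingleObject(h, INFINITE);
        ::CloseHandle(h);
    }
    return 0;
}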
You have to call TlsAlloc in DllMain and manage the thread-local data yourself if the DLL can be loaded at run time on Windows versions older than Vista:
implicit TLS handling was rewritten in Windows Vista [...] threadprivate and __declspec(thread) should function correctly in run-time loaded DLLs since then.
BOOL APIENTRY DllMain(HINSTANCE hinstDll, DWORD fdwReason, LPVOID lpvReserved)
{
    static BOOL fFirstProcess = TRUE;
    BOOL fWin32s = FALSE;
    DWORD dwVersion = GetVersion();
    static DWORD dwIndex;

    if ( !(dwVersion & 0x80000000) && LOBYTE(LOWORD(dwVersion)) < 4 )
        fWin32s = TRUE;

    if (fdwReason == DLL_PROCESS_ATTACH) {
        if (fFirstProcess || !fWin32s) {
            dwIndex = TlsAlloc();
        }
        fFirstProcess = FALSE;
    }
    return TRUE;
}
KB 118816:
When a program is started the size of the TLS is determined by taking into account the TLS size required by the executable as well as the TLS requirements of all other implicitly loaded DLLs. When you load another DLL dynamically with LoadLibrary or unload it with FreeLibrary, the system has to examine all running threads and to enlarge or compact their TLS storage accordingly.
Your DLL code should be modified to use such TLS functions as TlsAlloc, and to allocate TLS if the DLL is loaded with LoadLibrary. Or, the DLL that is using __declspec(thread) should only be implicitly loaded into the application.
Bottom line: LoadLibrary ain't thread-safe.
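For illustration, assuming the index returned by TlsAlloc in DllMain is stored somewhere the rest of the DLL can see (g_tlsIndex below is a made-up name), each thread then reads and writes its own slot explicitly:
DWORD g_tlsIndex = TLS_OUT_OF_INDEXES;   // set from TlsAlloc() in DllMain

DWORD WINAPI WorkerThread(LPVOID lpParam)
{
    // Store a per-thread pointer in this thread's copy of the slot.
    int* data = (int*)HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, sizeof(int));
    TlsSetValue(g_tlsIndex, data);

    // ... later, from anywhere on the same thread ...
    int* mine = (int*)TlsGetValue(g_tlsIndex);
    *mine = 42;

    HeapFree(GetProcessHeap(), 0, mine);
    TlsSetValue(g_tlsIndex, NULL);
    return 0;
}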
I discovered that the process was 32-bit. In that case, you can give the process a larger user-mode address space with the command:
bcdedit /set increaseuserva 3072

GetIpAddrTable() leaks memory. How to resolve that?

On my Windows 7 box, this simple program causes the memory use of the application to creep up continuously, with no upper bound. I've stripped out everything non-essential, and it seems clear that the culprit is the Microsoft Iphlpapi function "GetIpAddrTable()". On each call, it leaks some memory. In a loop (e.g. checking for changes to the network interface list), it is unsustainable. There seems to be no async notification API which could do this job, so now I'm faced with possibly having to isolate this logic into a separate process and recycle the process periodically -- an ugly solution.
Any ideas?
// IphlpLeak.cpp - demonstrates that GetIpAddrTable leaks memory internally: run this and watch
// the memory use of the app climb up continuously with no upper bound.
#include <stdio.h>
#include <windows.h>
#include <assert.h>
#include <Iphlpapi.h>

#pragma comment(lib, "Iphlpapi.lib")

void testLeak() {
    static unsigned char buf[16384];
    DWORD dwSize(sizeof(buf));
    if (GetIpAddrTable((PMIB_IPADDRTABLE)buf, &dwSize, false) == ERROR_INSUFFICIENT_BUFFER)
    {
        assert(0); // we never hit this branch.
        return;
    }
}

int main(int argc, char* argv[]) {
    for ( int i = 0; true; i++ ) {
        testLeak();
        printf("i=%d\n", i);
        Sleep(1000);
    }
    return 0;
}
@Stabledog:
I ran your example, unmodified, for 24 hours but did not observe the program's Commit Size growing without bound. It always stayed below 1024 kilobytes. This was on Windows 7 (32-bit, without Service Pack 1).
Just for the sake of completeness, what happens to memory usage if you comment out the entire if block and the sleep? If there's no leak there, then I would suggest you're correct as to what's causing it.
Worst case, report it to MS and see if they can fix it - you have a nice simple test case to work from which is more than what I see in most bug reports.
Another thing you may want to try is to check the return value against NO_ERROR rather than against one specific error code. If you get back a different error than ERROR_INSUFFICIENT_BUFFER, the leak may be on that error path:
DWORD dwRetVal = GetIpAddrTable((PMIB_IPADDRTABLE)buf, &dwSize, false);
if (dwRetVal != NO_ERROR) {
    printf("ERROR: %d\n", dwRetVal);
}
I've been all over this issue now: it appears that there is no acknowledgment from Microsoft on the matter, but even a trivial application grows without bounds on Windows 7 (not XP, though) when calling any of the APIs which retrieve the local IP addresses.
So the way I solved it -- for now -- was to launch a separate instance of my app with a special command-line switch that tells it "retrieve the IP addresses and print them to stdout". I scrape stdout in the parent app, the child exits and the leak problem is resolved.
But it wins "dang ugly solution to an annoying problem", at best.
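For what it's worth, a minimal sketch of that workaround (the --dump-ips switch name and the output handling are placeholders of my own): launch a second copy of the executable with stdout redirected to a pipe and read whatever it prints:
#include <windows.h>
#include <string>

std::string QueryAddressesViaChildProcess()
{
    SECURITY_ATTRIBUTES sa{ sizeof(sa), nullptr, TRUE };      // inheritable pipe handles
    HANDLE readPipe = nullptr, writePipe = nullptr;
    if (!::CreatePipe(&readPipe, &writePipe, &sa, 0))
        return {};
    ::SetHandleInformation(readPipe, HANDLE_FLAG_INHERIT, 0); // keep our end private

    wchar_t exePath[MAX_PATH];
    ::GetModuleFileNameW(nullptr, exePath, MAX_PATH);
    std::wstring cmdLine = L"\"" + std::wstring(exePath) + L"\" --dump-ips";

    STARTUPINFOW si{ sizeof(si) };
    si.dwFlags = STARTF_USESTDHANDLES;
    si.hStdOutput = writePipe;
    si.hStdError  = writePipe;
    PROCESS_INFORMATION pi{};

    std::string output;
    if (::CreateProcessW(nullptr, &cmdLine[0], nullptr, nullptr,
                         TRUE, 0, nullptr, nullptr, &si, &pi))
    {
        ::CloseHandle(writePipe);   // so ReadFile sees EOF once the child exits
        writePipe = nullptr;

        char buf[512];
        DWORD read = 0;
        while (::ReadFile(readPipe, buf, sizeof(buf), &read, nullptr) && read > 0)
            output.append(buf, read);

        ::WaitForSingleObject(pi.hProcess, INFINITE);
        ::CloseHandle(pi.hProcess);
        ::CloseHandle(pi.hThread);
    }
    if (writePipe) ::CloseHandle(writePipe);
    ::CloseHandle(readPipe);
    return output;                  // the parent parses the printed addresses
}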

Device Driver IRQL and Thread/Context Switches

I'm new to Windows device driver programming. I know that certain operations can only be performed at IRQL PASSIVE_LEVEL. For example, Microsoft have this sample code of how to write to a file from a kernel driver:
if (KeGetCurrentIrql() != PASSIVE_LEVEL)
    return STATUS_INVALID_DEVICE_STATE;

Status = ZwCreateFile(...);
My question is this: what prevents the IRQL from being raised after the KeGetCurrentIrql() check above? Say a context switch or thread switch occurs, couldn't the IRQL suddenly be DISPATCH_LEVEL when control comes back to my driver, which would then result in a system crash?
If this is NOT possible, then why not just check the IRQL in the DriverEntry function and be done with it once and for all?
The IRQL of a thread can only be raised by the thread itself.
Because you are called from upper/lower drivers, the IRQL of the current running context may differ. And there are a couple of functions that raise/lower the IRQL.
A couple of examples:
IRP_MJ_READ
NTSTATUS DispatchRead(
    __in struct _DEVICE_OBJECT *DeviceObject,
    __in struct _IRP *Irp
    )
{
    // this will be called at irql == PASSIVE_LEVEL
    ...
    // now we acquire a spin lock
    KSPIN_LOCK lck;
    KeInitializeSpinLock( &lck );
    KIRQL prev_irql;
    KeAcquireSpinLock( &lck, &prev_irql );
    // KeGetCurrentIrql() == DISPATCH_LEVEL
    KeReleaseSpinLock( &lck, prev_irql );
    // KeGetCurrentIrql() == PASSIVE_LEVEL
    ...
}
(Io-)Completion routines may be called at DISPATCH_LEVEL and so should behave accordingly.
NTSTATUS CompleteSth(IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp, IN PVOID Context)
{
    // KeGetCurrentIrql() >= PASSIVE_LEVEL
}
The IRQL can only change in any meaningful way under your control by setting it. There are two "thread specific" IRQLs - PASSIVE_LEVEL and APC_LEVEL. You control going in and out of these levels with things like fast mutexes, and a context switch to your thread will always leave you at the level you were in before. Above that are "processor specific" IRQLs. That is DISPATCH_LEVEL or above. In these levels a context switch cannot occur. You get into these levels using spin locks and such. ISRs will occur at higher IRQLs on your thread, but you can't see them. When they return control to you your IRQL is restored.
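To make that concrete, a small sketch (my own example): the IRQL of the current thread changes only when code running on that thread raises it, directly or through a primitive such as a fast mutex or spin lock:
FAST_MUTEX g_FastMutex;             // initialize once with ExInitializeFastMutex(&g_FastMutex)

VOID DemonstrateIrqlTransitions(VOID)
{
    NT_ASSERT(KeGetCurrentIrql() == PASSIVE_LEVEL);

    ExAcquireFastMutex(&g_FastMutex);       // raises this thread to APC_LEVEL
    NT_ASSERT(KeGetCurrentIrql() == APC_LEVEL);
    ExReleaseFastMutex(&g_FastMutex);       // back to PASSIVE_LEVEL

    KIRQL oldIrql;
    KeRaiseIrql(DISPATCH_LEVEL, &oldIrql);  // explicit raise; no context switch can preempt us now
    NT_ASSERT(KeGetCurrentIrql() == DISPATCH_LEVEL);
    KeLowerIrql(oldIrql);                   // restore the previous level
}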
DriverEntry is also called at PASSIVE_LEVEL.
If you want to have a job done at PASSIVE_LEVEL, use functions like IoQueueWorkItem.
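A minimal sketch of that (the routine names are mine): allocate a work item, queue it, and let the worker routine free it once it has run at PASSIVE_LEVEL:
IO_WORKITEM_ROUTINE MyPassiveLevelWork;

VOID MyPassiveLevelWork(PDEVICE_OBJECT DeviceObject, PVOID Context)
{
    PIO_WORKITEM workItem = (PIO_WORKITEM)Context;
    UNREFERENCED_PARAMETER(DeviceObject);

    // Runs at PASSIVE_LEVEL, so calls such as ZwCreateFile are allowed here.

    IoFreeWorkItem(workItem);       // this routine owns the work item
}

VOID QueuePassiveWork(PDEVICE_OBJECT DeviceObject)
{
    PIO_WORKITEM workItem = IoAllocateWorkItem(DeviceObject);
    if (workItem != NULL)
    {
        IoQueueWorkItem(workItem, MyPassiveLevelWork, DelayedWorkQueue, workItem);
    }
}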
