The WinAPI documentation says:
"The following synchronization functions use the appropriate barriers to ensure memory ordering:
Functions that enter or leave critical sections
Functions that signal synchronization objects
Wait functions
Interlocked functions"
Synchronization documentation
Now the question is: Do WinAPI Slim Reader/Writer (SRW) Locks also use memory barriers?
SRW Locks documentation
Note: WinAPI SRW Locks are neither critical sections nor synchronization objects.
Related
We know that the following kernel methods in Linux let us apply various locking mechanisms to shared data. But does Linux guarantee atomicity of the methods themselves? With the exception of the methods for normal and reader-writer spin locks, which cannot sleep, wouldn't it be catastrophic if a thread of execution were preempted while it had only partially executed any of the other methods listed below?
Spin Lock Methods
spin_lock();
spin_lock_irq();
spin_lock_irqsave();
spin_unlock();
spin_unlock_irq();
spin_unlock_irqrestore();
spin_lock_init();
spin_trylock();
spin_is_locked();
Reader-Writer Spin Lock Methods
read_lock();
read_lock_irq();
read_lock_irqsave();
read_unlock();
read_unlock_irq();
read_unlock_irqrestore();
write_lock();
write_lock_irq();
write_lock_irqsave();
write_unlock();
write_unlock_irq();
write_unlock_irqrestore();
write_trylock();
rwlock_init();
Semaphore Methods
sema_init();
init_MUTEX();
init_MUTEX_LOCKED();
down_interruptible();
down();
down_trylock();
up();
Reader-Writer Semaphore Methods
init_rwsem();
down_read();
up_read();
down_write();
up_write();
down_read_trylock();
down_write_trylock();
downgrade_write();
Mutex Methods
mutex_lock();
mutex_unlock();
mutex_trylock();
mutex_is_locked();
Completion Variable Methods
init_completion();
wait_for_completion();
complete();
If these functions were not atomic with respect to the lock itself, they would not work at all. And last time I looked, my Linux did work.
Most of these functions indeed disable preemption while doing their stuff.
Semaphores and reader-writer semaphores likewise manipulate their internal state atomically, even on SMP systems.
I understand that the kernel can synchronize processes via the spinlock method. However, when it comes down to one processor how does it do so? How does it use a synchronization object to ensure mutual exclusion?
Is a semaphore at the level of the executive? How does the kernel come into play here?
Are mutexes only implemented at the level of the kernel? They do not give off a signal or message when the resource is free.
You've got several questions here:
I understand that the kernel can synchronize processes via the spinlock method. However, when it comes down to one processor how does it do so? How does it use a synchronization object to ensure mutual exclusion?
On uni-processor machines, acquiring a spinlock simply raises the IRQL to DISPATCH_LEVEL or higher - a thread at such an elevated IRQL cannot be pre-empted, so synchronization is guaranteed.
Is a semaphore at the level of the executive? How does the kernel come into play here?
Semaphores, mutexes (and most waitable objects, for that matter) are Kernel Dispatcher Objects. Such objects are implemented by the kernel and made available to user-mode applications via various functions exported by KERNEL32.DLL (CreateEvent/Mutex/Semaphore, et al.). In addition, the "kernel comes into play" by scheduling thread waits and awakening threads that are waiting on synchronization objects.
Are mutexes only implemented at the level of the kernel?
Mutex objects are indeed kernel dispatcher objects (KMUTEX). A mutex object is signalled when it is un-owned. When a thread acquires a mutex, its state goes to non-signalled, which means that any other thread that attempts to acquire it will be put into a wait state until either the mutex is released and acquired by that thread, or the wait times out.
For more detailed explanations on kernel dispatcher objects, as well as Windows synchronization in general, have a peek at the latest version of "Windows Internals" - every Windows developer should have a copy of this on their desk, IMHO.
'They do not give off a signal or message when the resource is free' - sure they do - they are an inter-thread signalling mechanism! A thread waiting on the mutex is signalled and made ready when the protected resource is released, thereby acquiring the mutex.
Spinlocks are generally not used on single-core processors - there is no point. TBH, spinlocks need great care on multi-core and clustered systems too if their use is not to be counter-productive.
is there something in boost that translates to windows CRITICAL_SECTION?
CRITICAL_SECTION is a so-called "user mode" mutex that spins before blocking and so avoids expensive transitions to the kernel in the uncontended case.
Boost::Mutex is what you want. Versions up to 1.34.1 used a win32 critical section, but newer ones use a win32 event and locks. I don't know why - win32 mutexes are perfectly fine and as fast as an event (surely, he said...) unless you need the cross-process capability of a mutex rather than the single-process limitation of a critical section.
That said, chances are the performance implications of locking are mainly down to losing the rest of your thread quantum, not necessarily kernel transitions.
I can't find any function to acquire spinlock in Win32 Apis.
Is there a reason?
When I need to use spinlock, what do I do?
I know there is an CriticalSectionAndSpinCount function.
But that's not what I want.
Edit:
I want to synchronize memory which will be shared between kernel space and user space (the memory will be mapped).
I should lock it when I access the data structure, and the locking time will be very short. The data structure (suppose it is a queue) manages event handles used by the two sides to interact with each other.
What synchronization mechanism should I use?
A spinlock is clearly not appropriate for user-level synchronization. From http://www.microsoft.com/whdc/driver/kernel/locks.mspx:
All types of spin locks raise the IRQL to DISPATCH_LEVEL or higher. Spin locks are the only synchronization mechanism that can be used at IRQL >= DISPATCH_LEVEL. Code that holds a spin lock runs at IRQL >= DISPATCH_LEVEL, which means that the system's thread switching code (the dispatcher) cannot run and, therefore, the current thread cannot be pre-empted.
Imagine if it were possible to take a spin lock in user mode: Suddenly the thread would not be able to be pre-empted. So on a single-cpu machine, this is now an exclusive and real-time thread. The user-mode code would now be responsible for handling interrupts and other kernel-level tasks. The code could no longer access any paged memory, which means that the user-mode code would need to know what memory is currently paged and act accordingly. Cats and dogs living together, mass hysteria!
Perhaps a better question would be to tell us what you are trying to accomplish, and ask what synchronization method would be most appropriate.
There is a managed user-mode SpinLock as described here. Handle with care, as advised in the docs - it's easy to go badly wrong with these locks.
The only way to access this in native code is via the Win32 API you named already - CriticalSectionAndSpinCount and its siblings.
I have read so many times, here and everywhere on the net, that mutexes are slower than critical sections/semaphores/insert-your-preferred-synchronisation-method-here, but I have never seen any paper or study to back up this claim.
So, where does this idea come from? Is it a myth or a reality? Are mutexes really slower?
In the book "Multithreading applications in win32" by Jim Beveridge and Robert Wiener it says: "It takes almost 100 times longer to lock an unowned mutex than it does to lock an unowned critical section because the critical section can be done in user mode without involving the kernel"
And on MSDN here it says "critical section objects provide a slightly faster, more efficient mechanism for mutual-exclusion synchronization"
I don't believe that any of the answers hit on the key point of why they are different.
Mutexes are at operating system level. A named mutex exists and is accessible from ANY process in the operating system (provided its ACL allows access from all).
Critical sections are faster as they don't require a system call into kernel mode; however, they only work WITHIN a process - you cannot synchronize more than one process using a critical section. So depending on what you are trying to achieve and what your software design looks like, you should choose the most appropriate tool for the job.
I'll additionally point out that semaphores are separate from mutexes/critical sections because of their count. Semaphores can be used to control multiple concurrent accesses to a resource, whereas a mutex/critical section is either being accessed or not being accessed.
A CRITICAL_SECTION is implemented as a spinlock with a capped spin count. See MSDN InitializeCriticalSectionAndSpinCount for an indication of this.
When the spin count is exhausted, the critical section blocks on a kernel object (a semaphore, or whatever kernel lock it is implemented with).
So in code it works like this (not really working, should just be an example) :
void EnterCriticalSection( CRITICAL_SECTION* s )
{
    int spin_count = s->SpinCount;
    while( --spin_count >= 0 )
    {
        // InterlockedExchange returns the previous value:
        // 0 means the lock was free and we just took it
        if( InterlockedExchange( &s->Locked, 1 ) == 0 )
        {
            // we own the lock now
            s->OwningThread = GetCurrentThread();
            return;
        }
    }
    // spin count exhausted: block on the kernel object until unlocked
    WaitForSingleObject( s->KernelLock, INFINITE );
    s->OwningThread = GetCurrentThread();
}
So if your critical section is only held for a very short time, and the entering thread only has to spin a few times, the critical section can be very efficient. But if this is not the case, it wastes many cycles doing nothing and then falls back to a kernel synchronization object anyway.
So the tradeoff is:
Mutex: slow acquire/release, but no wasted cycles for long 'locked regions'.
CRITICAL_SECTION: fast acquire/release for unowned 'regions', but wasted cycles for owned ones.
Yes, critical sections are more efficient. For a very good explanation, get "Concurrent Programming on Windows".
In a nutshell: a mutex is a kernel object, so acquiring one always involves a transition to kernel mode, even if it is "free". A critical section can be acquired without a kernel transition in that case, and (on a multicore/multiprocessor machine) it will even spin for a few cycles when blocked, to avoid the expensive kernel transition.
A mutex (at least in Windows) allows synchronization between different processes in addition to threads. This means extra work must be done to ensure this. Also, as Brian pointed out, using a mutex requires a switch to kernel mode, which causes another speed hit (I believe, i.e. infer, that the kernel is required for this interprocess synchronization, but I've got nothing to back me up on that).
Edit: You can find explicit reference to interprocess synchronization here and for more info on this topic, have a look at Interprocess Synchronization