I think that TryEnterCriticalSection can be called only once. So I just want to know: what is the difference between TryEnterCriticalSection and EnterCriticalSection?
#include <windows.h>

CRITICAL_SECTION csOpsPrintData;

void CreateCriticalSectionAsNecessary()
{
    /* Initialize the critical section once: use either the plain
       initialization or the spin-count variant, not both. */
    InitializeCriticalSection(&csOpsPrintData);
    InitializeCriticalSectionAndSpinCount(&csOpsPrintData, 5);

    EnterCriticalSection(&csOpsPrintData);    /* blocks until ownership is taken */
    TryEnterCriticalSection(&csOpsPrintData); /* returns immediately either way */
}
The difference is that TryEnterCriticalSection returns immediately,
regardless of whether it obtained ownership of the critical section,
while EnterCriticalSection blocks until the thread can take ownership
of the critical section.
MSDN and Bing are your friends.
TryEnterCriticalSection also returns a value (whereas EnterCriticalSection doesn't): nonzero if the call succeeded and the thread has thus claimed ownership, zero if it did not.
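To illustrate the difference, here is a minimal sketch of the usual TryEnterCriticalSection pattern (the function and variable names are made up for the example):

#include <windows.h>

CRITICAL_SECTION cs; /* assumed to be initialized elsewhere */

void DoWorkIfLockIsFree(void)
{
    /* TryEnterCriticalSection never blocks; it returns nonzero on success */
    if (TryEnterCriticalSection(&cs))
    {
        /* ... touch the shared data ... */
        LeaveCriticalSection(&cs);
    }
    else
    {
        /* the lock is held by another thread; do useful work instead of waiting */
    }
}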
I'm using InterlockedExchange in Windows and I have two questions that put together are basically my title question.
InterlockedExchange uses type LONG (32-bits). According to Microsoft's documentation Interlocked Variable Access: "simple reads and writes to 32-bit variables are atomic operations without InterlockedExchange". According to function documentation InterlockedExchange: "This function is atomic with respect to calls to other interlocked functions". Is it or is it not atomic to read/write a LONG in Windows without the interlocked functions?
I am reviewing some code in which one thread sets a variable, and then all further accesses to that variable, by that thread or any of the threads it created, use InterlockedExchange. To keep it simple, consider a thread running main() that creates a thread running other():
#include <windows.h>

LONG foo;
DWORD WINAPI other(LPVOID);
void cleanup(void);

int main(void)
{
    foo = TRUE;
    CreateThread(NULL, 0, other, NULL, 0, NULL);
    /* do other things */
    if (InterlockedExchange(&foo, 0))
    {
        cleanup();
    }
    return 0;
}

DWORD WINAPI other(LPVOID unused)
{
    /* do other things */
    if (InterlockedExchange(&foo, 0))
    {
        cleanup();
    }
    return 0;
}

void cleanup(void)
{
    /* it's expected this is only called once */
}
Is it possible in this example that foo will not appear TRUE to either of the InterlockedExchange calls? If main is busy doing other things so that the first InterlockedExchange call is made from the other thread, does that mean foo is guaranteed to have been written by the main thread and visible to the other thread at that time?
Sorry if this is unclear; I don't know how to phrase it better.
Is it or is it not atomic to read/write a LONG in Windows without the interlocked functions?
Yes to both, assuming default alignment. This follows from the quoted statement "simple reads and writes to properly-aligned 32-bit variables are atomic operations", because LONG is a 32-bit signed integer in Windows, regardless of the bitness of the OS.
Is it possible in this example that foo will not appear TRUE to either of the InterlockedExchange calls?
No, not possible if both calls are reached. That's because within a single thread the foo = TRUE; write is guaranteed to be visible to the InterlockedExchange call that comes after it. So the InterlockedExchange call in main will see either the TRUE value previously set in main, or the FALSE value reset in the other thread. Therefore, one of the InterlockedExchange calls must read the foo value as TRUE.
However, if the /* do other things */ code in main is an infinite loop while(1); then that would leave only one InterlockedExchange outstanding in other, and it may be possible for that call to see foo as FALSE, for the same reason as...
If main is busy doing other things so that the first InterlockedExchange call is made from the other thread, does that mean foo is guaranteed to have been written by the main thread and visible to the other thread at that time?
Not necessarily. The foo = TRUE; write is visible to the main thread at the point the secondary thread is created, but may not necessarily be visible to the other thread when it starts, or even when it gets to the InterlockedExchange call.
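If you want to be conservative about that initial write, one option (a sketch, not the only possible fix) is to publish it with an interlocked operation as well, since the interlocked functions act as full memory barriers:

/* publish the initial value with full-barrier semantics
   before the other thread is created */
InterlockedExchange(&foo, TRUE);
CreateThread(NULL, 0, other, NULL, 0, NULL);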
I'm using libwebsockets v2.4.
The documentation seems unclear to me about what I have to do with the return value of the lws_write() function.
If it returns -1, it's an error and I'm invited to close the connection. That's fine for me.
But when it returns a value that is strictly less than the buffer length I passed, should I conclude that I have to write the remaining bytes later (in another WRITABLE callback occurrence)? Is it even possible for this situation to arise?
Also, should I call lws_send_pipe_choked() before using lws_write(), considering that I always use lws_write() in the context of a WRITABLE callback?
My understanding is that lws_write always returns the requested buffer length unless an error occurs.
If you look at lws_issue_raw() (whose result is returned by lws_write()) in output.c (https://github.com/warmcat/libwebsockets/blob/v2.4.0/lib/output.c#L157), you can see that if the length written by lws_ssl_capable_write() is less than the provided length, then lws allocates a buffer for the remaining bytes in wsi->trunc_alloc, so that they can be sent in the future.
Concerning your second question, I think it is safe to call lws_write() in the context of a WRITABLE callback without checking whether the pipe is choked. However, if you loop on lws_write() inside the callback, lws_send_pipe_choked() must be called to protect the subsequent calls to lws_write(). If you don't, you might trip this assertion https://github.com/warmcat/libwebsockets/blob/v2.4.0/lib/output.c#L83 and the user code will crash.
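For illustration, here is a rough sketch of a WRITABLE callback that drains an outgoing queue and checks lws_send_pipe_choked() between writes. The struct per_session type and the msg_pending()/msg_next()/msg_consumed() helpers are hypothetical user-side code, not libwebsockets API:

#include <libwebsockets.h>

static int callback_proto(struct lws *wsi, enum lws_callback_reasons reason,
                          void *user, void *in, size_t len)
{
    struct per_session *session = user; /* hypothetical per-connection state */

    switch (reason) {
    case LWS_CALLBACK_SERVER_WRITEABLE:
        while (msg_pending(session)) {
            struct msg *m = msg_next(session);
            /* the payload buffer must have LWS_PRE bytes of headroom */
            int n = lws_write(wsi, m->buf + LWS_PRE, m->len, LWS_WRITE_TEXT);
            if (n < 0)
                return -1; /* error: close the connection */
            msg_consumed(session);
            if (lws_send_pipe_choked(wsi)) {
                /* socket buffer is full: stop writing and ask
                   for another WRITABLE callback */
                lws_callback_on_writable(wsi);
                break;
            }
        }
        break;
    default:
        break;
    }
    return 0;
}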
The venerated book Linux Device Drivers says that
The flags argument passed to spin_unlock_irqrestore must be the same variable passed to spin_lock_irqsave. You must also call spin_lock_irqsave and spin_unlock_irqrestore in the same function; otherwise your code may break on some architectures.
Yet I can't find any such restriction required by the official documentation bundled with the kernel code itself. And I find driver code that violates this guidance.
Obviously it isn't a good idea to call spin_lock_irqsave and spin_unlock_irqrestore from separate functions, because you're supposed to minimize the work done while holding a lock (with interrupts disabled, no less!). But have changes to the kernel made it possible if done with care, was it never actually against the API contract, or is it still verboten to do so?
If the restriction has been removed at some point, did it apply to version 3.10.17?
This is just a guess, but the book might be obliquely referring to a potential bug which can happen if you use a non-local variable or storage location for flags.
Basically, flags has to be private to the current execution context, which is why spin_lock_irqsave is a macro that takes the name of the flags variable. While flags is being saved, you don't have the spinlock yet.
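For reference, a minimal sketch of the canonical pattern, with flags kept in a local variable (the function name is made up):

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(my_lock);

void touch_shared_state(void)
{
    unsigned long flags; /* private to this execution context */

    spin_lock_irqsave(&my_lock, flags); /* flags is passed by name, not by address */
    /* ... critical section ... */
    spin_unlock_irqrestore(&my_lock, flags);
}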
How this is related to locking and unlocking in a different function:
Consider two functions that some driver developer might write:
void my_lock(my_object *ctx)
{
    spin_lock_irqsave(&ctx->mylock, ctx->myflags); /* BUG */
}

void my_unlock(my_object *ctx)
{
    spin_unlock_irqrestore(&ctx->mylock, ctx->myflags);
}
This is a bug because at the time ctx->myflags is written, the lock is not yet held, and it is a shared variable visible to other contexts and processors. The local flags must be saved to a private location on the stack. Then, when the lock is owned by the caller, a copy of the flags can be saved into the exclusively owned object. In other words, it can be fixed like this:
void my_lock(my_object *ctx)
{
    unsigned long flags;
    spin_lock_irqsave(&ctx->mylock, flags);
    ctx->myflags = flags;
}

void my_unlock(my_object *ctx)
{
    unsigned long flags = ctx->myflags; /* probably unnecessary */
    spin_unlock_irqrestore(&ctx->mylock, flags);
}
If it couldn't be fixed like that, it would be very difficult to implement higher level primitives which need to wrap IRQ spinlocks.
How it could be arch-dependent:
Suppose that spin_lock_irqsave expands into machine code which saves the current flags in some register, then acquires the lock, and then saves that register into the specified flags destination. In that case, the buggy code is actually safe. If the expanded code first saves the flags into the actual flags object designated by the caller and only then tries to acquire the lock, then it's broken.
I have never seen that constraint anywhere aside from the book. Probably the information given in the book is just outdated, or simply wrong.
In the current kernel (and at least since 2.6.32, which I started working with), the actual locking is done through many levels of nested calls from spin_lock_irqsave (see, e.g., __raw_spin_lock_irqsave, which is called in the middle). So using different function contexts for the lock and unlock can hardly be a reason for malfunction.
I am debugging a deadlock issue, and the call stacks show that threads are waiting on some events.
The code uses a critical section as its synchronization primitive, and I think there is some issue there.
The debugger also points to a critical section that is owned by some other thread, but its lock count is -2.
As per my understanding, a lock count > 0 means that the critical section is locked by one or more threads.
So is there any possibility that I am looking at the right critical section, the one that could be the culprit in the deadlock?
In what scenarios can a critical section have a negative lock count?
Beware: since Windows Server 2003 (for client OS this is Vista and newer) the meaning of LockCount has changed and -2 is a completely normal value, commonly seen when a thread has entered a critical section without waiting and no other thread is waiting for the CS. See Displaying a Critical Section:
In Microsoft Windows Server 2003 Service Pack 1 and later versions of Windows, the LockCount field is parsed as follows:
The lowest bit shows the lock status. If this bit is 0, the critical section is locked; if it is 1, the critical section is not locked.
The next bit shows whether a thread has been woken for this lock. If this bit is 0, then a thread has been woken for this lock; if it is 1, no thread has been woken.
The remaining bits are the ones-complement of the number of threads waiting for the lock.
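As a worked example, decoding LockCount = -2 (0xFFFFFFFE) by those rules (a sketch; the right shift of a negative LONG assumes the usual arithmetic-shift behavior):

LONG lc = -2;                    /* binary ...11111110 */
BOOL locked     = !(lc & 1);     /* bit 0 is 0 -> the critical section is locked */
BOOL none_woken = (lc & 2) != 0; /* bit 1 is 1 -> no thread has been woken */
LONG waiters    = ~(lc >> 2);    /* ~(-1) == 0 -> no threads are waiting */

So -2 decodes as "locked, nobody woken, zero waiters", which is exactly the normal uncontended state described above.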
I am assuming that you are talking about the CCriticalSection class in MFC. I think you are looking at the right critical section. I have found that a critical section's lock count can go negative if the number of calls to the Lock() function is less than the number of Unlock() calls. This generally happens in the following type of code:
void f()
{
    CSingleLock lock(&m_synchronizer, TRUE);
    // Some logic here
    m_synchronizer.Unlock(); // unlocks the CCriticalSection directly
}
At first glance this code looks perfectly safe. However, note that it calls CCriticalSection's Unlock() method directly instead of CSingleLock's Unlock() method. Now what happens is that when the function exits, CSingleLock's destructor calls the critical section's Unlock() again, and its lock count goes negative. After this the application is in a bad state, and strange things start to happen. If you are using MFC critical sections, do check for this type of problem.
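A sketch of one way to avoid the double unlock is to go through CSingleLock's own Unlock(), which records that the lock has been released so that the destructor does not release it a second time:

void f()
{
    CSingleLock lock(&m_synchronizer, TRUE);
    // Some logic here
    lock.Unlock(); // CSingleLock notes the release; its destructor won't unlock again
}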
I have read so many times, here and everywhere else on the net, that mutexes are slower than critical sections/semaphores/insert-your-preferred-synchronisation-method-here, but I have never seen any paper or study to back up this claim.
So where does this idea come from? Is it a myth or a reality? Are mutexes really slower?
In the book "Multithreading Applications in Win32" by Jim Beveridge and Robert Wiener it says: "It takes almost 100 times longer to lock an unowned mutex than it does to lock an unowned critical section because the critical section can be done in user mode without involving the kernel"
And on MSDN here it says "critical section objects provide a slightly faster, more efficient mechanism for mutual-exclusion synchronization"
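If you want to check the claim on your own machine, here is a rough micro-benchmark sketch for the uncontended case (the iteration count is arbitrary, and the results will vary with hardware and OS version):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    const int N = 1000000;
    CRITICAL_SECTION cs;
    HANDLE mtx = CreateMutex(NULL, FALSE, NULL);
    LARGE_INTEGER freq, t0, t1;
    int i;

    InitializeCriticalSection(&cs);
    QueryPerformanceFrequency(&freq);

    QueryPerformanceCounter(&t0);
    for (i = 0; i < N; i++) {
        EnterCriticalSection(&cs);   /* stays in user mode when uncontended */
        LeaveCriticalSection(&cs);
    }
    QueryPerformanceCounter(&t1);
    printf("critical section: %.0f ns per lock/unlock\n",
           (double)(t1.QuadPart - t0.QuadPart) * 1e9 / freq.QuadPart / N);

    QueryPerformanceCounter(&t0);
    for (i = 0; i < N; i++) {
        WaitForSingleObject(mtx, INFINITE); /* kernel transition every time */
        ReleaseMutex(mtx);
    }
    QueryPerformanceCounter(&t1);
    printf("mutex: %.0f ns per lock/unlock\n",
           (double)(t1.QuadPart - t0.QuadPart) * 1e9 / freq.QuadPart / N);

    DeleteCriticalSection(&cs);
    CloseHandle(mtx);
    return 0;
}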
I don't believe that any of the answers hit on the key point of why they are different.
Mutexes exist at the operating-system level. A named mutex exists in, and is accessible from, ANY process in the operating system (provided its ACL allows access from all).
Critical sections are faster because they don't require a system call into kernel mode in the uncontended case; however, they only work WITHIN a process, so you cannot synchronize more than one process with a critical section. Depending on what you are trying to achieve and what your software design looks like, you should choose the most appropriate tool for the job.
I'll additionally point out that semaphores are separate from mutexes/critical sections because of their count. Semaphores can be used to control multiple concurrent accesses to a resource, whereas a mutex/critical section is either being accessed or not being accessed.
A CRITICAL_SECTION is implemented as a spinlock with a capped spin count. See MSDN's InitializeCriticalSectionAndSpinCount for an indication of this.
When the spin count has elapsed, the critical section blocks on a semaphore (or whatever kernel lock it is implemented with).
So in code it works roughly like this (not real working code, just an example):
CRITICAL_SECTION s;

void EnterCriticalSection( CRITICAL_SECTION* s )
{
    int spin_count = s->max_count;
    while( --spin_count >= 0 )
    {
        /* InterlockedExchange returns the previous value:
           0 means the lock was free and we now own it */
        if( InterlockedExchange( &s->Locked, 1 ) == 0 )
        {
            /* we own the lock now */
            s->OwningThread = GetCurrentThread();
            return;
        }
    }
    /* spinning didn't help: block on the kernel object until it is released */
    WaitForSingleObject( s->KernelLock, INFINITE );
}
So if your critical section is only held for a very short time, and the entering thread only waits for a few 'spins' (cycles), the critical section can be very efficient. But if this is not the case, the critical section wastes many cycles doing nothing and then falls back to a kernel synchronization object anyway.
So the tradeoff is:
Mutex: slow acquire/release, but no wasted cycles for long 'locked regions'.
CRITICAL_SECTION: fast acquire/release for unowned 'regions', but wasted cycles for owned ones.
Yes, critical sections are more efficient. For a very good explanation, get "Concurrent Programming on Windows".
In a nutshell: a mutex is a kernel object, so acquiring one always involves a transition into kernel mode, even if the mutex is "free". A critical section can be acquired without entering the kernel in that case, and (on a multicore/multiprocessor machine) it will even spin for a few cycles when it is contended, to avoid the expensive context switch of blocking in the kernel.
A mutex (at least in Windows) allows synchronization between different processes in addition to threads, which means extra work must be done to support this. Also, as Brian pointed out, using a mutex requires a switch to kernel mode, which causes another speed hit (I believe, i.e. infer, that the kernel is needed for this interprocess synchronization, but I've got nothing to back that up).
Edit: you can find an explicit reference to interprocess synchronization here; for more info on this topic, have a look at Interprocess Synchronization.