Deadlocks with pthreads and CreateThread - Windows

I'm using pthreads in a Windows application. I noticed my program was deadlocking--a quick inspection showed that the following had occurred:
Thread 1 spawned Thread 2. Thread 2 spawned Thread 3. Thread 2 was waiting on a mutex held by Thread 3, which wasn't unlocking it.
So, I went to debug in gdb and got the following when backtracing the third thread:
Thread 3 (thread 3456.0x880):
#0 0x7c8106e9 in KERNEL32!CreateThread ()
from /cygdrive/c/WINDOWS/system32/kernel32.dll
Cannot access memory at address 0x131
It was stuck, deadlocked, somehow, in the Windows CreateThread function! Obviously it couldn't unlock the mutex when it wasn't even able to start executing code. Yet, despite the fact that it was apparently stuck here, pthread_create returned zero (success).
What makes this particularly odd is that the same application on Linux has no such issues. What in the world would cause a thread to hang during the creation process (!?) but return successfully as if it had been created properly?
Edit: in response to the request for code, here's some code (simplified):
The creation of the thread:
if ( pthread_create( &h->lookahead->thread_handle, NULL, (void *)lookahead_thread, (void *)h->thread[h->param.i_threads] ) )
{
    log( LOG_ERROR, "failed to create lookahead thread\n");
    return ERROR;
}
while ( !h_lookahead->b_thread_active )
    usleep(100);
return SUCCESS;
Note that it waits until b_thread_active is set, and that wait does complete, so b_thread_active is being set; the new thread must therefore have started running and done something...
... here's the lookahead_thread function:
void lookahead_thread( mainstruct *h )
{
    h->lookahead->b_thread_active = 1;
    while( !h->lookahead->b_exit_thread && h->lookahead->b_thread_active )
    {
        if ( synch_frame_list_get_size( &h->lookahead->next ) > delay )
            _lookahead_slicetype_decide (h);
        else
            usleep(100); // Arbitrary number to keep thread from spinning
    }
    while ( synch_frame_list_get_size( &h->lookahead->next ) )
        _lookahead_slicetype_decide (h);
    h->lookahead->b_thread_active = 0;
}
_lookahead_slicetype_decide (h); is the actual work the thread performs.
The function that takes the mutex, synch_frame_list_get_size:
int synch_frame_list_get_size( synch_frame_list_t *slist )
{
    int fno = 0;
    pthread_mutex_lock( &slist->mutex );
    while (slist->list[fno]) fno++;
    pthread_mutex_unlock( &slist->mutex );
    return fno;
}
The backtrace of thread 2:
Thread 2 (thread 332.0xf18):
#0 0x00478853 in pthread_mutex_lock ()
#1 0x004362e8 in synch_frame_list_get_size (slist=0x3ef3a8)
at common/frame.c:1078
#2 0x004399e0 in lookahead_thread (h=0xd33150)
at encoder/lookahead.c:288
#3 0x0047c5ed in ptw32_threadStart#4 ()
#4 0x77c3a3b0 in msvcrt!_endthreadex ()
from /cygdrive/c/WINDOWS/system32/msvcrt.dll
#5 0x7c80b713 in KERNEL32!GetModuleFileNameA ()
from /cygdrive/c/WINDOWS/system32/kernel32.dll
#6 0x00000000 in ??

I would try double-checking your mutexes in thread 2 and thread 3. Pthreads are implemented for Windows on top of the standard Windows API, so there will be slight differences between the Windows and Linux versions. This is a bizarre problem, but then again, that happens a lot with threading.
Could you try posting a snippet of the code where the locking is done in thread 2, and in the function that thread 3 should start in?
Edit in response to the code:
Did you ever unlock the mutex in thread 2? Your trace shows it locking a mutex and then creating a thread to do all that work, which also tries to lock the same mutex. I'm guessing it gets unlocked after thread 2 returns SUCCESS? Also, why use flags and sleeping? Barriers or condition variables would probably be more robust for this kind of synchronization.
Another note: is the b_thread_active flag marked as volatile? Perhaps the compiler is caching its value, so the loop never sees it change.
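For example, the b_thread_active busy-wait could be replaced with a condition variable. A rough sketch (the startup_sync type and function names here are illustrative, not taken from your code):

/* Sketch: signal "thread has started" with a condition variable instead of
 * a flag plus usleep(). startup_sync and its functions are illustrative. */
#include <pthread.h>

typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int             started;
} startup_sync;

/* Called by the newly created thread as its first action. */
void signal_started( startup_sync *ss )
{
    pthread_mutex_lock( &ss->mutex );
    ss->started = 1;
    pthread_cond_signal( &ss->cond );
    pthread_mutex_unlock( &ss->mutex );
}

/* Called by the creating thread right after pthread_create(). */
void wait_until_started( startup_sync *ss )
{
    pthread_mutex_lock( &ss->mutex );
    while( !ss->started )               /* loop guards against spurious wakeups */
        pthread_cond_wait( &ss->cond, &ss->mutex );
    pthread_mutex_unlock( &ss->mutex );
}

This would also remove any need for volatile here, since the mutex/condition pair provides the required memory visibility.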

Related

std::thread::join hangs even though thread exits from thread proc

I have this weird problem where a thread I created does not terminate even after it exits from the thread function. I create the thread like this:
typedef void(*Task)(void*);

AsyncWorker(Task proc, void* arg) : thd_(NULL) {
    thd_ = new std::thread(proc, arg);
}

~AsyncWorker() {
    if (thd_) {
        if (thd_->joinable())
            thd_->join(); // does not return from here
        delete thd_;
    }
}
This is the task that the thread executes:
static void RunLoop(void* arg)
{
    if (!arg)
        return;
    SomeObject* thiz = static_cast<SomeObject*>(arg);
    while( !(thiz->done_) ) {
        thiz->DoInLoop();
    }
    return;
}
I set the member SomeObject::done_ to true from the main thread and delete the AsyncWorker. When I step through with the debugger I can see that the thread has exited from the RunLoop function, but the call to join in the destructor hangs. The call stacks for both the thread and the main thread show
[External Code]
[No symbols loaded for ntdll.dll]
What could be the problem? The SomeObject::DoInLoop method does wait on a mutex, but I signal the mutex before deleting the AsyncWorker object so that the thread can get past it; in any case, if the thread has exited from the thread proc it is clearly not holding any mutexes, right? What is frustrating is that the call stack does not tell me where it is stuck.
Initially I thought it was a problem with how I was using std::thread (I am using it for the first time), but then I tried the same thing with Windows threads and got the same problem. So I must be doing something wrong.
Edit: I initially tagged the problem as VS2012, but I am actually using VS2013 SP1.

Can a thread not block if the lock it tries to lock is not available?

In OpenMP, there is a routine, omp_test_lock, which a thread can call to attempt to set a lock without blocking if the lock is unavailable.
I wonder what the calling thread will do, if it does not block, when the lock it tries to set is not available? Thanks!
omp_test_lock indicates through its return value whether the lock could be set.
Example:
if( omp_test_lock( &a_lock ) )
{
    work_a();
    omp_unset_lock( &a_lock );
}
else
{
    work_b();
}
work_c();
If the lock can be set, work_a and then work_c will be called. If the lock cannot be set, work_b and then work_c will be called. Either way execution simply continues; this is just the normal flow of control.
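For completeness, here is a minimal compilable sketch of the same pattern with the lock initialization included (compile with OpenMP enabled, e.g. gcc -fopenmp; the printf calls merely stand in for work_a/work_b/work_c):

#include <omp.h>
#include <stdio.h>

int main(void)
{
    omp_lock_t a_lock;
    omp_init_lock( &a_lock );

    #pragma omp parallel num_threads(4)
    {
        if( omp_test_lock( &a_lock ) )   /* nonzero means the lock was acquired */
        {
            printf( "thread %d: got the lock (work_a)\n", omp_get_thread_num() );
            omp_unset_lock( &a_lock );
        }
        else
        {
            printf( "thread %d: lock busy (work_b)\n", omp_get_thread_num() );
        }
        printf( "thread %d: work_c\n", omp_get_thread_num() );  /* runs either way */
    }

    omp_destroy_lock( &a_lock );
    return 0;
}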

Multithreaded synchronization primitive

I have the following scenario:
I have multiple worker threads running that all go through a certain section of code, and they're allowed to do so simultaneously. No critical section surrounds this piece of code right now as it's not required for these threads.
I have a main thread that also, occasionally, wants to enter that section of code, but when it does, none of the other worker threads should use that section of code.
Naive solution: surround the section of code with a critical section. But that would kill a lot of parallelism between the worker threads, which is important in my case.
Is there a better solution?
Use RW locks. RW locks allow multiple readers and only a single writer. Your workers would call read-lock at the start of the critical section and the main thread would write-lock.
By definition, when calling read-lock, the calling thread will wait for any writing thread to finish. When calling write-lock, the calling thread will wait for any reading or writing threads to finish.
Example using POSIX threads:
#include <pthread.h>
#include <unistd.h>

pthread_rwlock_t lock;

/* worker threads */
void *do_work(void *args) {
    for (int i = 0; i < 100; ++i) {
        pthread_rwlock_rdlock(&lock);
        // do some work...
        pthread_rwlock_unlock(&lock);
        sleep(1);
    }
    pthread_exit(0);
}

/* main thread */
int main(void) {
    pthread_t workers[4];
    pthread_rwlock_init(&lock, NULL);
    int i;
    // spawn workers...
    for (i = 0; i < 4; ++i) {
        pthread_create(&workers[i], NULL, do_work, NULL);
    }
    for (i = 0; i < 100; ++i) {
        pthread_rwlock_wrlock(&lock);
        // do some work...
        pthread_rwlock_unlock(&lock);
        sleep(1);
    }
    for (i = 0; i < 4; ++i) {
        pthread_join(workers[i], NULL);
    }
    pthread_rwlock_destroy(&lock);
    return 0;
}
As far as I understand it, your worker threads are started asynchronously. So when the main thread wants to run this code section, you have to ensure that no worker thread is executing it. Therefore you have to stop all worker threads before the main thread can enter that code section, and allow them to enter it again afterwards.
This could be done with Grand Central Dispatch, if your worker threads were assigned to a dispatch group; see https://developer.apple.com/library/mac/#documentation/Performance/Reference/GCD_libdispatch_Ref/Reference/reference.html.
The main thread could then call dispatch_group_wait on this dispatch group, wait for all worker threads to leave the code section, execute it, and then requeue the worker threads.
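Roughly, in plain C (using the _f function-pointer variants of the dispatch calls so no blocks extension is needed; worker_body, the worker count, and the queue choice are just placeholders), the idea would look something like this:

#include <dispatch/dispatch.h>

static dispatch_group_t group;

/* worker_body() stands in for the code section the workers share. */
static void worker_body( void *context )
{
    (void)context;
    /* ... the shared code section ... */
}

static void spawn_workers( void )
{
    dispatch_queue_t q = dispatch_get_global_queue( DISPATCH_QUEUE_PRIORITY_DEFAULT, 0 );
    for( int i = 0; i < 4; ++i )
        dispatch_group_async_f( group, q, NULL, worker_body );
}

/* Called from the main thread when it wants the section to itself. */
static void main_thread_runs_section( void )
{
    /* Block until every worker block submitted to the group has finished. */
    dispatch_group_wait( group, DISPATCH_TIME_FOREVER );

    /* ... execute the protected code section exclusively ... */

    spawn_workers();   /* requeue the workers afterwards */
}

int main( void )
{
    group = dispatch_group_create();
    spawn_workers();
    main_thread_runs_section();
    return 0;
}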

pthread condition variables vs win32 events (linux vs windows-ce)

I am doing a performance evaluation between Windows CE and Linux on an ARM i.MX27 board. The code was already written for CE and measures the time it takes to make various kernel calls: using OS primitives like mutexes and semaphores, opening and closing files, and networking.
During my porting of this application to Linux (pthreads) I stumbled upon a problem which I cannot explain. Almost all tests showed a performance increase of 5 to 10 times, but not my version of Win32 events (SetEvent and WaitForSingleObject); CE actually "won" this test.
To emulate the behaviour I used pthread condition variables (I know my implementation doesn't fully emulate the CE version, but it's enough for the evaluation).
The test code uses two threads that "ping-pong" each other using events.
Windows code:
Thread 1: (the thread I measure)
HANDLE hEvt1, hEvt2;
hEvt1 = CreateEvent(NULL, FALSE, FALSE, TEXT("MyLocEvt1"));
hEvt2 = CreateEvent(NULL, FALSE, FALSE, TEXT("MyLocEvt2"));
ResetEvent(hEvt1);
ResetEvent(hEvt2);
for (i = 0; i < 10000; i++)
{
    SetEvent(hEvt1);
    WaitForSingleObject(hEvt2, INFINITE);
}
Thread 2: (just "responding")
while (1)
{
    WaitForSingleObject(hEvt1, INFINITE);
    SetEvent(hEvt2);
}
Linux code:
Thread 1: (the thread I measure)
struct event_flag *event1, *event2;
event1 = eventflag_create();
event2 = eventflag_create();
for (i = 0; i < 10000; i++)
{
    eventflag_set(event1);
    eventflag_wait(event2);
}
Thread 2: (just "responding")
while (1)
{
    eventflag_wait(event1);
    eventflag_set(event2);
}
My implementation of eventflag_*:
struct event_flag* eventflag_create()
{
    struct event_flag* ev;
    ev = (struct event_flag*) malloc(sizeof(struct event_flag));
    pthread_mutex_init(&ev->mutex, NULL);
    pthread_cond_init(&ev->condition, NULL);
    ev->flag = 0;
    return ev;
}

void eventflag_wait(struct event_flag* ev)
{
    pthread_mutex_lock(&ev->mutex);
    while (!ev->flag)
        pthread_cond_wait(&ev->condition, &ev->mutex);
    ev->flag = 0;
    pthread_mutex_unlock(&ev->mutex);
}

void eventflag_set(struct event_flag* ev)
{
    pthread_mutex_lock(&ev->mutex);
    ev->flag = 1;
    pthread_cond_signal(&ev->condition);
    pthread_mutex_unlock(&ev->mutex);
}
And the struct:
struct event_flag
{
    pthread_mutex_t mutex;
    pthread_cond_t condition;
    unsigned int flag;
};
Questions:
Why don't I see the performance boost here?
What can be done to improve performance (e.g. are there faster ways to implement CE's behaviour)?
I'm not used to coding with pthreads; are there bugs in my implementation that might cause the performance loss?
Are there any alternative libraries for this?
Note that you don't need to be holding the mutex when calling pthread_cond_signal(), so you might be able to increase the performance of your condition variable 'event' implementation by releasing the mutex before signaling the condition:
void eventflag_set(struct event_flag* ev)
{
    pthread_mutex_lock(&ev->mutex);
    ev->flag = 1;
    pthread_mutex_unlock(&ev->mutex);
    pthread_cond_signal(&ev->condition);
}
This might prevent the awakened thread from immediately blocking on the mutex.
This type of implementation only works if you can afford to miss an event. I just tested it and ran into many deadlocks. The main reason is that a condition variable only wakes up a thread that is already waiting; signals issued beforehand are lost.
No counter is associated with a condition that allows a waiting thread to simply continue if the condition has already been signalled. Windows Events support this type of use.
I can think of no better solution than using a semaphore (the POSIX version is very easy to use) initialized to zero, with sem_post() for set() and sem_wait() for wait(). You can surely think of a way to cap the semaphore's count at 1 using sem_getvalue().
That said, I have no idea whether POSIX semaphores are just a neat interface to the Linux semaphores, or what the performance penalties are.
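A minimal sketch of that semaphore-based variant (the sem_event name and functions are just illustrative):

/* An auto-reset-style event built on a POSIX semaphore initialized to zero:
 * sem_post() for set(), sem_wait() for wait(). Unlike the condition-variable
 * version, a set() issued before the wait() is not lost, because the
 * semaphore counts it. */
#include <semaphore.h>

struct sem_event
{
    sem_t sem;
};

void sem_event_init( struct sem_event *ev )
{
    sem_init( &ev->sem, 0, 0 );   /* not shared between processes, count 0 */
}

void sem_event_set( struct sem_event *ev )
{
    /* To cap the count at 1 (closer to a real auto-reset event), one could
     * check sem_getvalue() before posting, as suggested above. */
    sem_post( &ev->sem );
}

void sem_event_wait( struct sem_event *ev )
{
    sem_wait( &ev->sem );
}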

Watchdog built into the same process as the program it controls

I run a Visual C++ console test program as part of the daily build. Every now and then the test calls some function that was changed improperly by another developer, descends into an infinite loop, and hangs, thereby blocking the build.
I need a watchdog solution that is as simple as possible. Here's what I came up with: in the test program's entry point I start a separate thread that loops continuously and checks the elapsed time. If some predefined period is exceeded, it calls TerminateProcess(). Pseudocode:
DWORD WINAPI WatchDog( LPVOID param )
{
    DWORD start = GetTickCount();
    while( true ) {
        Sleep( ReasonablePeriod );
        if( GetTickCount() - start > MaxAllowed ) {
            TerminateProcess( GetCurrentProcess(), 0 );
        }
    }
    return 0;
}
Is this solution any worse than a watchdog implemented as a separate master program?
I think it's preferable to implement the watchdog as a separate process. It's easier to reuse, it's easier to detect whether your app crashed, and it's easier to get its return code.
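For illustration, a stand-alone watchdog could look roughly like this ("test.exe" and MaxAllowed are placeholders):

/* Rough sketch of a separate Win32 watchdog process: launch the test
 * program, wait up to MaxAllowed milliseconds, and kill it on timeout. */
#include <windows.h>
#include <stdio.h>

int main( void )
{
    const DWORD MaxAllowed = 10 * 60 * 1000;  /* e.g. 10 minutes */
    STARTUPINFOA si = { sizeof(si) };
    PROCESS_INFORMATION pi = { 0 };
    char cmd[] = "test.exe";                  /* CreateProcess may modify this buffer */
    DWORD exit_code = 1;

    if( !CreateProcessA( NULL, cmd, NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi ) )
        return 1;

    if( WaitForSingleObject( pi.hProcess, MaxAllowed ) == WAIT_TIMEOUT )
    {
        printf( "watchdog: test hung, terminating\n" );
        TerminateProcess( pi.hProcess, 1 );
        WaitForSingleObject( pi.hProcess, INFINITE );  /* wait for the kill to finish */
    }

    GetExitCodeProcess( pi.hProcess, &exit_code );
    CloseHandle( pi.hProcess );
    CloseHandle( pi.hThread );
    return (int)exit_code;
}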
