std::thread::join hangs even though thread exits from thread proc - windows

I have this weird problem where a thread I created does not terminate even after it returns from the thread function. I create the thread like this:
typedef void(*Task)(void*);

AsyncWorker(Task proc, void* arg): thd_(NULL) {
    thd_ = new std::thread(proc, arg);
}

~AsyncWorker() {
    if (thd_) {
        thd_->join(); // does not return from here
        delete thd_;
    }
}
This is the task that the thread executes:
static void RunLoop(void* arg)
{
    if (!arg)
        return;
    SomeObject* thiz = static_cast<SomeObject*>(arg);
    while( !(thiz->done_) ) {
        // ... (loop body not shown)
    }
}
I set the member SomeObject::done_ to true from the main thread and then delete the AsyncWorker. When I step through the debugger, I can see that the thread has returned from the RunLoop function, but the call to join in the destructor hangs. The call stack for both the thread and the main thread shows only:
[External Code]
[No symbols loaded for ntdll.dll]
What could be the problem? The SomeObject::DoInLoop method does wait on a mutex, but I signal it before deleting the AsyncWorker object so that the thread can get past that point. In any case, if the thread has returned from the thread proc, it is clearly not holding any mutexes, right? What is frustrating is that the call stack does not tell me where it is stuck.
Initially, I thought the problem was in how I was using std::thread (I am using it for the first time), but then I tried the same with Windows threads and got the same problem. So I must be doing something wrong.
Edit: I initially tagged the problem as vs2012 but I am actually using vs2013 sp1.
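A minimal sketch of the shutdown sequence described above, under a few assumptions: done_ is changed to std::atomic<bool> (if it is currently a plain bool written by one thread and read by another, that is a data race in C++11), and the worker is destroyed while main() is still running. Joining from DllMain, or with some Visual C++ runtimes from the destructor of a static object after main() returns, is a known source of exactly this kind of hang. SomeObject and its DoInLoop here are hypothetical stand-ins that just count iterations; this is not the asker's real class.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-in for the question's SomeObject. done_ is std::atomic so the
// store made by the main thread is guaranteed to become visible to the worker.
struct SomeObject {
    std::atomic<bool> done_{false};
    std::atomic<int> iterations_{0};

    void DoInLoop() {
        ++iterations_;
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
};

static void RunLoop(void* arg) {
    if (!arg)
        return;
    SomeObject* thiz = static_cast<SomeObject*>(arg);
    while (!thiz->done_.load()) {
        thiz->DoInLoop();
    }
}

typedef void (*Task)(void*);

// Holds the std::thread by value instead of through new/delete.
class AsyncWorker {
public:
    AsyncWorker(Task proc, void* arg) : thd_(proc, arg) {}

    ~AsyncWorker() {
        if (thd_.joinable())
            thd_.join();   // returns once RunLoop has returned
    }

private:
    std::thread thd_;
};
```

The order matches the question: set the flag first, then let ~AsyncWorker join. If the same code still hangs, the place where the AsyncWorker is destroyed (static object, DLL unload) is the next thing to check.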


MFC: How to use MsgWaitForMultipleObjects() from the main thread to wait for multiple threads to complete that use SendMessage()?

I have a main thread that fires off several other threads to complete various items of work based on what the user chooses in the main UI. Normally I'd use WaitForMultipleObjects() with bWaitAll set to TRUE. However, in this case those other threads log output to another window that uses a mutex to ensure that only one thread outputs at a time. Part of that process uses SendMessage() to get the text size and to send the text to the window, which hangs if the main UI thread is blocked in WaitForMultipleObjects(). So I moved over to MsgWaitForMultipleObjects() with the QS_SENDMESSAGE flag; its only problem is the logic for bWaitAll, which states that it will only return if all objects are signaled AND an input event has occurred (instead of returning when all objects are signaled OR an input event occurred). Had the logic been OR, this should have worked:
DWORD waitres;
while (1) {
    MSG msg;
    while (::PeekMessage(&msg, NULL, 0, 0, PM_NOREMOVE)) {
        // mfc message pump
        if (!theApp.PumpMessage()) {
            // program end request
            // TO DO
        }
    }
    // MFC idle processing
    LONG lidlecount = 0;
    while (theApp.OnIdle(lidlecount++));
    // our wait
    waitres = ::MsgWaitForMultipleObjects(threadcount, threadhandles, TRUE, INFINITE, QS_SENDMESSAGE);
    // check if ended due to message
    if (waitres != WAIT_OBJECT_0 + threadcount) {
        // no, exit loop
        break;
    }
}
Rather than firing off a thread that in turn fires off the other threads, I wondered what the correct way is to handle this from the main thread. I thought about using bWaitAll FALSE and then calling WaitForMultipleObjects() with bWaitAll set to TRUE and dwMilliseconds set to 0 (or 1), checking the result to see whether all threads have completed. If not, the code would need to loop back to the top and call MsgWaitForMultipleObjects() again, but with bWaitAll FALSE that call could return right away if any one of the threads has already completed (say 1 of 10 threads completed: I could check as described above whether all have completed, but going back to the wait with bWaitAll FALSE would just return immediately instead of waiting).
So what is the proper way to handle waiting for multiple threads (that use SendMessage()) to complete in the main thread of an MFC application?
You need to create a structure with a reference count and pass a pointer to it to every thread. It probably also makes sense to keep some common task data in it, plus the HWND of some window in the main (GUI) thread. When a worker thread exits, it releases its reference on the object. When the last thread exits, the object is deleted and posts a message to the window in the main thread.
This way we don't need to store the thread handles (we can just close them) or wait on multiple handles; instead, we get a window message when all threads have finished their task.
Example code:
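struct Task
{
    HWND _hwnd;
    LONG _dwRefCount = 1;
    // some common task data probably ..

    Task(HWND hwnd) : _hwnd(hwnd) {}

    ~Task() {
        PostMessageW(_hwnd, WM_USER, 0, 0); // WM_USER as demo only
    }

    void AddRef() {
        InterlockedIncrement(&_dwRefCount);
    }

    void Release() {
        if (!InterlockedDecrement(&_dwRefCount)) delete this;
    }
};

ULONG CALLBACK WorkThread(void* pTask)
{
    WCHAR sz[16];
    swprintf_s(sz, _countof(sz), L"%x", GetCurrentThreadId());
    MessageBoxW(0, L"working...", sz, MB_ICONINFORMATION|MB_OK);
    reinterpret_cast<Task*>(pTask)->Release(); // drop this thread's reference
    return 0;
}

void StartTask(HWND hwnd, ULONG n)
{
    if (Task* pTask = new Task(hwnd))
    {
        do
        {
            pTask->AddRef(); // one reference per worker thread
            if (HANDLE hThread = CreateThread(0, 0, WorkThread, pTask, 0, 0))
            {
                CloseHandle(hThread); // no need to keep the handle
            }
            else
            {
                pTask->Release(); // thread was not created
            }
        } while (--n);
        pTask->Release(); // drop the creator's initial reference
    }
}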
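The same last-one-out idea can be tried without Win32. This is a portable sketch under stated assumptions: std::thread replaces CreateThread, detach() plays the role of closing the handle, and a std::function callback stands in for PostMessageW. SharedTask, Work and StartWorkers are illustrative names, not part of the answer above.

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <utility>

// Portable analogue of the Task structure: the callback replaces PostMessageW.
struct SharedTask {
    std::atomic<long> refCount{1};      // the creator's own reference
    std::function<void()> onAllDone;    // stands in for posting a message to the GUI thread

    explicit SharedTask(std::function<void()> done) : onAllDone(std::move(done)) {}
    ~SharedTask() { onAllDone(); }      // runs exactly once, in whoever releases last

    void AddRef() { refCount.fetch_add(1); }
    void Release() {
        if (refCount.fetch_sub(1) == 1) // previous value 1: we were the last owner
            delete this;
    }
};

static void Work(SharedTask* task) {
    // ... per-thread work would go here ...
    task->Release();                    // drop this worker's reference on exit
}

void StartWorkers(std::function<void()> onAllDone, unsigned n) {
    SharedTask* task = new SharedTask(std::move(onAllDone));
    for (unsigned i = 0; i < n; ++i) {
        task->AddRef();                 // taken before the thread can possibly finish
        std::thread(Work, task).detach();
    }
    task->Release();                    // drop the creator's reference last
}
```

As in the Win32 version, each worker's reference is taken before its thread starts and the creator's initial reference is dropped only after every thread has been launched, so the count cannot reach zero early.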

What is the standard mandated behavior of std::promise's destructor after calling set_value_at_thread_exit?

If you destroy an std::promise whose shared state is not yet ready, but for which someone has called set_value_at_thread_exit (and that thread has not yet exited), what is the expected result?
As best I can tell, the destructor for the promise should store a future_error exception (with code broken_promise) into the shared state. However, this does not appear to be the behavior for GNU/libstdc++, which will yield the stored value (and not throw an exception) on a call to the future's get().
I've come to my conclusion based on my reading of cppreference's descriptions for std::promise::set_value_at_thread_exit:
Stores the value into the shared state without making the state ready immediately. The state is made ready when the current thread exits, after all variables with thread-local storage duration have been destroyed.
and for std::promise::~promise
Abandons the shared state:
if the shared state is ready, releases it.
if the shared state is not ready, stores an exception object of type std::future_error with an error condition std::future_errc::broken_promise, makes the shared state ready and releases it.
For example code:
#include <future>
#include <thread>

void foo(std::promise<int> p)
{
    p.set_value_at_thread_exit(42);
    // p is destroyed here
}

int main()
{
    std::promise<int> p;
    std::future<int> f = p.get_future();
    std::thread t(foo, std::move(p));
    t.join();
    (void)f.get(); // Throw future_error or return 42 ?
}
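For contrast, a small check of the case the quoted ~promise description covers without any ambiguity: nothing was ever stored, so destroying the promise must make get() throw future_error with broken_promise. It does not settle the set_value_at_thread_exit question; the helper name AbandonedPromiseIsBroken is made up for illustration.

```cpp
#include <future>
#include <system_error>

// Returns true if get() on an abandoned, never-satisfied promise reports
// future_errc::broken_promise, as the quoted description requires.
bool AbandonedPromiseIsBroken() {
    std::future<int> f;
    {
        std::promise<int> p;
        f = p.get_future();
    }   // ~promise() abandons the shared state here: nothing was stored
    try {
        (void)f.get();
        return false;
    } catch (const std::future_error& e) {
        return e.code() == std::make_error_code(std::future_errc::broken_promise);
    }
}
```

The open question is only whether a value already stored by set_value_at_thread_exit counts as "not ready" at the moment the promise is destroyed.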

implementing a scheduler class in Windows

I want to implement a scheduler class that any object can use to schedule timeouts and cancel them if necessary. When a timeout expires, this information will be sent asynchronously to the timeout setter/owner at that time.
So, for this purpose, I have 2 fundamental classes WindowsTimeout and WindowsScheduler.
class WindowsTimeout
{
public:
    bool mCancelled;
    int mTimerID; // Windows handle to identify the actual timer set.
    ITimeoutReceiver* mSetter;

    int cancel()
    {
        mCancelled = true;
        if ( timeKillEvent(mTimerID) == SUCCESS) // Line under question # 1
        {
            delete this; // Timeout instance is self-destroyed.
            return 0; // ok. OS Timer resource given back.
        }
        return 1; // fail. OS Timer resource not given back.
    }

    WindowsTimeout(ITimeoutReceiver* setter, int timerID)
    {
        mSetter = setter;
        mTimerID = timerID;
        mCancelled = false;
    }
};

class WindowsScheduler
{
public:
    static void CALLBACK timerFunction(UINT uID, UINT uMsg, DWORD_PTR dwUser, DWORD_PTR dw1, DWORD_PTR dw2)
    {
        WindowsTimeout* timeout = (WindowsTimeout*) dwUser; // uMsg is reserved; user data arrives in dwUser
        if (timeout->mCancelled)
            delete timeout;
        // (rest of the callback not shown)
    }

    WindowsTimeout* schedule(ITimeoutReceiver* setter, TimeUnit t)
    {
        int timerID = timeSetEvent(...);
        if (timerID == SUCCESS)
            return new WindowsTimeout(setter, timerID);
        return 0;
    }
};
My questions are:
Q.1. When WindowsScheduler::timerFunction() is called, in which context does this call run? It is simply a callback function, and I think it is executed in the OS's context, right? If so, does this call pre-empt any other tasks that are already running? I mean, do callbacks have a higher priority than any other user task?
Q.2. When a timeout setter wants to cancel its timeout, it calls WindowsTimeout::cancel().
However, there is always a possibility that the OS invokes the static timerFunction callback while the cancel operation is in progress, pre-empting it, for example just after the mCancelled = true statement. In such a case, the timeout instance will be deleted by the callback function.
When the pre-empted cancel() function resumes after the callback has finished, it will try to access an attribute (mTimerID) of the deleted instance, as you can see on the line marked "Line under question # 1" in the code.
How can I avoid such a case?
Please note that this question is an improved version of a previous question of mine:
Windows multimedia timer with callback argument
Q1 - I believe it gets called within a thread allocated by the timer API. I'm not sure, but I wouldn't be surprised if the thread ran at a very high priority. (In Windows, that doesn't necessarily mean it will completely preempt other threads, it just means it will get more cycles than other threads).
Q2 - I started to sketch out a solution for this, but then realized it was a bit harder than I thought. Personally, I would maintain a table that maps timer IDs to your WindowsTimeout object instances. The table could be a hash table or simply a std::map instance guarded by a critical section. When the timer callback occurs, it enters the critical section, tries to obtain the WindowsTimeout instance pointer, flags that instance as having been executed, exits the critical section, and then actually executes the callback. If the table doesn't contain the WindowsTimeout instance, it means the caller has already removed it. Be very careful here.
One subtle bug in your own code above:
WindowsTimeout* schedule(ITimeoutReceiver* setter, TimeUnit t)
{
    int timerID = timeSetEvent(...);
    if (timerID == SUCCESS)
        return new WindowsTimeout(setter, timerID);
    return 0;
}
In your schedule method, it's entirely possible that the callback scheduled by timeSetEvent will fire BEFORE you have created the WindowsTimeout instance.
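A sketch of the table suggested in Q2, with hypothetical names (TimeoutRegistry, Add, Fire, Cancel) and a std::mutex standing in for a Win32 critical section. It assumes entries are keyed by a caller-generated ID passed to the timer as its user data rather than by the OS timer ID, so an entry can be registered before the timer is armed, which also sidesteps the ordering bug above. Whoever removes an entry owns it, so the callback and a cancel can never both act on the same timeout.

```cpp
#include <functional>
#include <map>
#include <mutex>
#include <utility>

class TimeoutRegistry {
public:
    using Callback = std::function<void()>;

    // Register before the OS timer is armed (assumption: id is caller-generated).
    void Add(int id, Callback onExpire) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_[id] = std::move(onExpire);
    }

    // Called from the timer thread: take the entry out under the lock,
    // then notify the owner outside the lock.
    bool Fire(int id) {
        Callback cb;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            auto it = pending_.find(id);
            if (it == pending_.end())
                return false;          // already cancelled (or already fired)
            cb = std::move(it->second);
            pending_.erase(it);
        }
        cb();
        return true;
    }

    // Called by the owner; true means it won the race and the callback will never run.
    bool Cancel(int id) {
        std::lock_guard<std::mutex> lock(mutex_);
        return pending_.erase(id) == 1;
    }

private:
    std::mutex mutex_;
    std::map<int, Callback> pending_;
};
```

In the question's terms, timerFunction would call Fire with the ID it receives, and cancel() would call Cancel (and still timeKillEvent) instead of touching an object that may already have been deleted.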

application exits prematurely with OpenMP with the error code: Fatal User Error 1002: Not all work-sharing constructs executed by all threads

I added OpenMP code to some serial code in a simulator application. When I run a program that uses this application, the program exits unexpectedly with the output "The thread 'Win32 Thread' (0x1828) has exited with code 1 (0x1)". This happens in the parallel region where I added the OpenMP code.
here's a code sample:
#pragma omp parallel for private (curr_proc_info, current_writer, method_h) shared (exceptionOccured) schedule(dynamic, 1)
for (i = 0 ; i < method_process_num ; i++)
{
    current_writer = 0;
    // we need to add protection before we can dequeue a method from the methods queue,
    #pragma omp critical(dequeueMethod)
    method_h = pop_runnable_method(curr_proc_info, current_writer);

    if(method_h !=0 && exceptionOccured == false){
        try {
            // ... (body not shown)
        }
        catch( const sc_report& ex ) {
            ::std::cout << "\n" << ex.what() << ::std::endl;
            m_error = true;
            exceptionOccured = true; // we cannot jump outside the loop, so instead of return we use a flag and return somewhere else
        }
    }
}
The scheduling was static before I made it dynamic; after I switched to dynamic with a chunk size of 1, the application got a little further before it exited. Can this be an indication of what is happening inside the parallel region?
As I read it (and I'm more of a Fortran programmer than a C/C++ one), your private variable curr_proc_info is not declared (or defined?) before it first appears in the call to pop_runnable_method. But private variables are undefined on entry to the parallel region.
I also think your sharing of exceptionOccured is a little fishy, since it suggests that an exception on any thread should be noticed by every thread, not just by the thread in which it occurs. Of course, that may be your intent.
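A sketch of both points, with hypothetical names (RunOne, ProcessAll) standing in for the question's code. firstprivate gives each thread a copy initialized from the value before the region (plain private copies start uninitialized), and each exception is caught inside the iteration that raised it, since the OpenMP specification requires an exception thrown inside a parallel region to be caught within that region by the same thread. A reduction replaces the shared exceptionOccured flag. Whether this is what triggers "Fatal User Error 1002" in the question is not established here.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the per-method work; every fifth item fails.
static int RunOne(int item, int base) {
    if (item % 5 == 4)
        throw std::runtime_error("simulated sc_report");
    return base + item;
}

// 'base' is firstprivate: every thread starts with the caller's value.
// Failures are counted per thread and summed by the reduction at the end.
int ProcessAll(std::vector<int>& results, int base) {
    const int n = static_cast<int>(results.size());
    int failures = 0;
#pragma omp parallel for firstprivate(base) reduction(+:failures) schedule(dynamic, 1)
    for (int i = 0; i < n; ++i) {
        try {
            results[i] = RunOne(i, base);
        } catch (const std::exception&) {   // never let it leave the region
            results[i] = -1;
            ++failures;
        }
    }
    return failures;
}
```

Built without OpenMP support the pragma is ignored and the loop simply runs serially with the same results.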

Deadlocks with pthreads and CreateThread

I'm using pthreads in a Windows application. I noticed my program was deadlocking; a quick inspection showed that the following had occurred:
Thread 1 spawned Thread 2. Thread 2 spawned Thread 3. Thread 2 waited on a mutex from Thread 3, which wasn't unlocking.
So, I went to debug in gdb and got the following when backtracing the third thread:
Thread 3 (thread 3456.0x880):
#0 0x7c8106e9 in KERNEL32!CreateThread ()
from /cygdrive/c/WINDOWS/system32/kernel32.dll
Cannot access memory at address 0x131
It was stuck, deadlocked, somehow, in the Windows CreateThread function! Obviously it couldn't unlock the mutex when it wasn't even able to start executing code. Yet, despite the fact that it was apparently stuck here, pthread_create returned zero (success).
What makes this particularly odd is that the same application on Linux has no such issues. What in the world would cause a thread to hang during the creation process (!?) but return successfully as if it had been created properly?
Edit: in response to the request for code, here's some code (simplified):
The creation of the thread:
if ( pthread_create( &h->lookahead->thread_handle, NULL, (void *)lookahead_thread, (void *)h->thread[h->param.i_threads] ) )
{
    log( LOG_ERROR, "failed to create lookahead thread\n");
    return ERROR;
}
while ( !h_lookahead->b_thread_active )
    ; // busy-wait until the new thread sets the flag
return SUCCESS;
Note that it waits until b_thread_active is set, so somehow b_thread_active is being set, so the thread being called has to have done something...
... here's the lookahead_thread function:
void lookahead_thread( mainstruct *h )
{
    h->lookahead->b_thread_active = 1;
    while( !h->lookahead->b_exit_thread && h->lookahead->b_thread_active )
    {
        if ( synch_frame_list_get_size( &h->lookahead->next ) > delay )
            _lookahead_slicetype_decide (h);
        usleep(100); // Arbitrary number to keep thread from spinning
    }
    while ( synch_frame_list_get_size( &h->lookahead->next ) )
        _lookahead_slicetype_decide (h);
    h->lookahead->b_thread_active = 0;
}
_lookahead_slicetype_decide(h) is the actual work that the thread does.
The function that takes the mutex, synch_frame_list_get_size:
int synch_frame_list_get_size( synch_frame_list_t *slist )
{
    int fno = 0;
    pthread_mutex_lock( &slist->mutex );
    while (slist->list[fno]) fno++;
    pthread_mutex_unlock( &slist->mutex );
    return fno;
}
The backtrace of thread 2:
Thread 2 (thread 332.0xf18):
#0 0x00478853 in pthread_mutex_lock ()
#1 0x004362e8 in synch_frame_list_get_size (slist=0x3ef3a8)
at common/frame.c:1078
#2 0x004399e0 in lookahead_thread (h=0xd33150)
at encoder/lookahead.c:288
#3 0x0047c5ed in ptw32_threadStart#4 ()
#4 0x77c3a3b0 in msvcrt!_endthreadex ()
from /cygdrive/c/WINDOWS/system32/msvcrt.dll
#5 0x7c80b713 in KERNEL32!GetModuleFileNameA ()
from /cygdrive/c/WINDOWS/system32/kernel32.dll
#6 0x00000000 in ??
I would try double-checking your mutexes in thread 2 and thread 3. Pthreads for Windows is implemented on top of the standard Windows API, so there will be slight differences between the Windows and Linux versions. This is a bizarre problem, but then again, that happens a lot in threading.
Could you try posting a snippet of the code where the locking is done in thread 2, and in the function that thread 3 should start in?
Edit in response to code
Did you ever unlock the mutex in thread 2? Your trace shows it locking a mutex, then creating a thread to do all that work, which also tries to lock the mutex. I'm guessing it unlocks after thread 2 returns SUCCESS? Also, why are you using flags and sleeping? Barriers or condition variables for synchronization between threads may be more robust.
Another note: is the b_thread_active flag marked as volatile? Perhaps the compiler is caching its value so the loop never exits?
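A minimal sketch of the condition-variable suggestion, using the pthreads API the question already uses; StartupGate, WorkerContext, WorkerThread and StartWorker are hypothetical names. The creator blocks until the new thread reports that it has started, instead of spinning on b_thread_active. Note that volatile alone would not make the original flag safe: in C and C++ it gives no atomicity or inter-thread ordering guarantees.

```cpp
#include <pthread.h>

// One-shot "I have started" signal guarded by a mutex and a condition variable.
struct StartupGate {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    bool started;

    StartupGate() : started(false) {
        pthread_mutex_init(&mutex, nullptr);
        pthread_cond_init(&cond, nullptr);
    }
    ~StartupGate() {
        pthread_cond_destroy(&cond);
        pthread_mutex_destroy(&mutex);
    }
    void Signal() {
        pthread_mutex_lock(&mutex);
        started = true;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&mutex);
    }
    void Wait() {
        pthread_mutex_lock(&mutex);
        while (!started)                 // loop guards against spurious wakeups
            pthread_cond_wait(&cond, &mutex);
        pthread_mutex_unlock(&mutex);
    }
};

// Hypothetical thread context; only the handshake matters here.
struct WorkerContext {
    StartupGate gate;
    int workDone;
};

static void* WorkerThread(void* arg) {
    WorkerContext* ctx = static_cast<WorkerContext*>(arg);
    ctx->gate.Signal();                  // replaces b_thread_active = 1
    ctx->workDone = 1;                   // ... the real work would go here ...
    return nullptr;
}

// Returns 0 on success, like pthread_create.
int StartWorker(pthread_t* handle, WorkerContext* ctx) {
    int rc = pthread_create(handle, nullptr, WorkerThread, ctx);
    if (rc != 0)
        return rc;
    ctx->gate.Wait();                    // replaces while (!b_thread_active);
    return 0;
}
```

A shutdown request could use the same pattern in the other direction (a flag plus condition variable) instead of b_exit_thread and usleep.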
