MFC: troubleshooting on which thread is responsible for causing a crash - windows

I currently have my project running two separate threads (one for MFC operations, like views/formviews, application window, etc., and one for an infinite while loop in its main function). However, in certain situations, when I run my program in debug mode, I have noticed that one of the threads' exit status was 1 (in other words, it returned a non-zero number to the operating system). While running the application in Visual Studio 2005, what would be the easiest way of finding out which thread is responsible for the return value so that I can troubleshoot what's going on? Thanks in advance.
Incorporating Steve Gilham's suggestion:
After investigating which thread is responsible for the error by looking at Spy++, I have pinned it down to a single line: the hdlUninitDevice() call from the Novint Falcon SDK is responsible for this return value. Returning a failure status from this call doesn't make an error message pop up. However, I don't know whether this is a serious issue that needs to be dealt with in a larger context. It would be great to hear more suggestions. Thanks.

How is the subordinate thread being terminated? My first thought is that if it is being shut down in a non-graceful fashion by the main thread exiting, that might cause the non-zero status.
At the very least, if you can decouple the time when the two threads finish, it might help tell which is returning which status.
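One way to decouple the two exits is to ask the worker to stop and wait for it before the main thread returns, so that each status can be logged separately. The following is only a minimal sketch in portable C++; `WorkerLoop` and `RunAndJoinWorker` are hypothetical names, not part of MFC or the Novint SDK. In a Win32/MFC build, the status of a real thread handle could be read with GetExitCodeThread instead.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <thread>

// Hypothetical stand-in for the "infinite while loop" thread in the question.
// It loops until asked to stop, then reports its own exit status.
int WorkerLoop(const std::atomic<bool>& stopRequested) {
    while (!stopRequested.load()) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return 0;  // graceful exit
}

// Starts the worker, lets it run briefly, asks it to stop, and waits for it
// *before* the caller (standing in for the UI thread) finishes.
// Returns the worker's exit status so it can be logged on its own.
int RunAndJoinWorker() {
    std::atomic<bool> stopRequested{false};
    std::future<int> status =
        std::async(std::launch::async, WorkerLoop, std::cref(stopRequested));
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    stopRequested.store(true);  // request a graceful stop
    return status.get();        // blocks until the worker has really exited
}
```

If the worker's status is 0 when it is shut down this way, a non-zero status seen otherwise is more likely to come from the worker being cut off by the main thread exiting.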

Related

The thread '<No Name>' (0xb24) has exited with code 0 (0x0)

Whenever I try to run my program, for example when I want to run "frmphonebook", I type
Application.Run(new frmphonebook());
but when I run it, a different form runs instead. This happens with each and every form, and the output shows
The thread 'vshost.RunParkingWindow' (0x63c) has exited with code 0 (0x0).
The thread '<No Name>' (0xb24) has exited with code 0 (0x0).
How do I solve this?
You can give your threads names; that would also help with your debugging...
But in many apps threads are created implicitly and you have no control over the name.
So that is not an error message. Code 0 means everything went according to plan. Any non-zero code usually indicates an error.
Edit: You can also disable the display of these messages: when debugging, right-click in the Output window and choose what you want to see.
If a thread has exited with code 0, it ran successfully. On CodeProject there is a Beginners-Guide-to-Threading
This article on threading might also be helpful. This question on SO could also be of use. A list of System Error Codes
One of the things you will learn about using the Debugger is that you will see what we might call "the soft white underbelly" (an allusion to alligators' anatomy) of the system: all kinds of DLLs being loaded and unloaded, the somewhat complex arrangement of "helper" threads being started and stopped... etc.
It can be distracting to a less experienced user, to see all of these messages. However, over time, you will come to understand that the Debugger is simply being truthful and verbose. The details it is displaying for you might not really be relevant to your debugging process, but it cannot "know" that; it is only displaying factual information, and you have to sort out what is relevant and what is not.
As for Windows Forms applications, I have myself noticed that there seem to be several "helper" threads, typically with no name, or (as is frequently seen by me when debugging), they are named things like "vshost.RunParkingWindow". Typically, you have to trust that the system is creating threads on your behalf, in addition to any threads you might create yourself. As others have suggested, give your own threads meaningful names so you can tell them apart from the system's threads.
You can get further insight into the multithreaded structure of your Windows Forms app by putting a breakpoint somewhere in your UI update code, and when it hits, use Debug/Windows/Threads to bring up a view of all the threads running in your process space. You'll be quite surprised, I think, by how many there are! Try creating and .Show()-ing several forms in your app, one by one. I think you'll see that each .Show() operation creates a new window, and with it, several supporting threads for that window.
You may also see messages in the debug window such as the following: "A first chance exception of type 'System.ObjectDisposedException' occurred in System.Windows.Forms.dll". Many times there are system exception handlers that perform a reasonable default action on your behalf. This message appearing without a break in the debugger indicates that some default handler took care of this exception for you.
The system support for something like a Windows forms application is somewhat complicated, to make YOUR implementation easier and simpler. When you run the debugger, you get to see some of these details. Over time, you will learn what is "usual" and what is indicative of a problem.
Check to see if there are some files in your web app that have been rendered inaccessible. In my case, my chart control created a read-only text file, which threw an exception. I deleted the file and the folders, and voilà.
I think I found your solution. In Visual Studio, go to Project > Properties > Linker > System, look for the SubSystem line, click the down arrow, and change it to Console(....words....).
It worked for me!

File operation functions return, but are not actually committed when Windows shuts down

I am working on an MFC application that can (among other things) be used to shut Windows down. When doing this, Windows of course sends WM_QUERYENDSESSION and WM_ENDSESSION to all applications, mine included. However, the problem is that my application, as part of some destructors, deletes certain files (with CFile::Remove) that have been used during the execution. I have reason to believe that the destructors are called (but that is hard to know for certain) when the application is closed by Windows.
However, when Windows starts back up again, I do occasionally notice that the files that were supposed to be deleted are still present. This does not happen consistently, even when the execution of the program is identical (I have a script for testing this). This leads me to think that one of two things is happening: either a) the destructors are not consistently being called, or b) the Remove function returns, but the file is not actually deleted before Windows is shut down.
The only work-around I have found so far is that if I get the system to wait with the shutdown for approximately 10 seconds after my program has stopped, then the files will be properly deleted. This leads me to believe that b) may be the case.
I hope someone is able to help me with this problem.
Regards
Mort
Once your program returns from WM_ENDSESSION, Windows can terminate it at any time:
If the session is being ended, this parameter is TRUE; the session can end any time after all applications have returned from processing this message.
If the session ends quickly, then it may end before your destructors run. You must do all your cleanup before returning from WM_ENDSESSION, because there is no guarantee that you will get a chance to do it afterwards.
The problem here is that some versions of Windows report that file-handling operations have completed before they actually have. This isn't a problem unless a shutdown is triggered, in which case some pending operations, including file deletion, will be abandoned.
I would suggest that you cope with this by forcing your code to wait for a confirmed deletion of the files (have a process look for the files and raise an event when they've gone) before calling for system shutdown.
If the system is shut down properly (not a sudden power loss, etc.), then all the cached data is flushed. In particular, this includes flushing the global file descriptor table (or whatever it's called in your file system), which should commit the file deletion.
So the problem seems to be that the user-mode code doesn't call DeleteFile, or that it fails (for whatever reason).
Note that there are several ways the application (process) may exit, and d'tors are not always called. There are automatic objects, which are destroyed in the context of their call stack, plus there are global/static objects, which are initialized and destroyed by the CRT init/cleanup code.
Below is a short summary of ways to terminate the process, with the consequences:
All process threads exit conventionally (return from their procedure). The OS terminates the process that has no threads. All the d'tors are executed.
Some threads either exit via ExitThread or are killed by TerminateThread. The automatic objects of those threads are not destructed.
The process exits via ExitProcess. Automatic objects are not destructed; global objects may be destructed (this happens if the CRT is used in a DLL).
The process is terminated by TerminateProcess. No d'tors are called.
I suggest you check whether DeleteFile (or CFile::Remove, which wraps it) is indeed called, and also check whether it succeeds. For instance, you may have opened the same file twice for whatever reason.
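The "is it called, and does it succeed" check can be sketched as follows. This uses the portable `std::remove` from the C++ standard library rather than DeleteFile or CFile::Remove, and `RemoveAndVerify` is a hypothetical helper name. It only confirms that the delete request succeeded from user mode; it says nothing about when the change reaches the disk.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical helper: deletes a file, logs any failure, and then probes
// whether the file can still be opened. Returns true only if the delete
// call succeeded and the file is no longer visible.
bool RemoveAndVerify(const std::string& path) {
    if (std::remove(path.c_str()) != 0) {
        // errno is set by common implementations; the message is diagnostic only.
        std::fprintf(stderr, "remove(%s) failed: %s\n",
                     path.c_str(), std::strerror(errno));
        return false;
    }
    std::ifstream probe(path);  // try to reopen the file we just deleted
    return !probe.is_open();
}
```

In the MFC code itself, the equivalent would be to log around the CFile::Remove call and catch the CFileException it throws on failure, so that a skipped destructor and a failed delete can be told apart.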

Windows SuspendThread doesn't? (GetThreadContext fails)

We have a Windows32 application in which one thread can stop another to inspect its state [PC, etc.], by doing SuspendThread/GetThreadContext/ResumeThread.
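// SuspendThread returns a DWORD: the previous suspend count, or (DWORD)-1 on failure,
// so a "< 0" comparison on the unsigned result can never be true.
if (SuspendThread((HANDLE)hComputeThread[threadId]) == (DWORD)-1) // freeze thread
    ThreadOperationFault("SuspendThread","InterruptGranule");
CONTEXT Context, *pContext;
// Request only the integer and control registers (the latter include the PC).
Context.ContextFlags = (CONTEXT_INTEGER | CONTEXT_CONTROL);
if (!GetThreadContext((HANDLE)hComputeThread[threadId],&Context))
    ThreadOperationFault("GetThreadContext","InterruptGranule");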
Extremely rarely, on a multicore system, GetThreadContext returns error code 5 (Windows system error code "Access Denied").
The SuspendThread documentation seems to clearly indicate that the targeted thread is suspended, if no error is returned. We are checking the return status of SuspendThread and ResumeThread; they aren't complaining, ever.
How can it be the case that I can suspend a thread, but can't access its context?
This blog
http://www.dcl.hpi.uni-potsdam.de/research/WRK/2009/01/what-does-suspendthread-really-do/
suggests that SuspendThread, when it returns, may have started the suspension of the other thread, but that thread hasn't yet suspended. In this case, I can kind of see how GetThreadContext would be problematic, but this seems like a stupid way to define SuspendThread. (How would the call of SuspendThread know when the target thread was actually suspended?)
EDIT: I lied. I said this was for Windows.
Well, the strange truth is that I don't see this behavior under Windows XP 64 (at least not in the last week and I don't really know what happened before that)... but we have been testing this Windows application under Wine on Ubuntu 10.x. The Wine source for the guts of GetThreadContext contains an Access Denied return response on line 819 when an attempt to grab the thread state fails for some reason. I'm guessing, but it appears that Wine GetThreadStatus believes that a thread just might not be accessible repeatedly. Why that would be true after a SuspendThread is beyond me, but there's the code. Thoughts?
EDIT2: I lied again. I said we only saw the behavior on Wine. Nope... we have now found a Vista Ultimate system that seems to produce the same error (again, rarely). So, it appears that Wine and Windows agree on an obscure case. It also appears that the mere enabling of the Sysinternals Process Monitor program aggravates the situation and causes the problem to appear on Windows XP 64; I suspect a Heisenbug. (The Process Monitor doesn't even exist on the Wine-tasting (:-) machine or the XP 64 system I use for development).
What on earth is it?
EDIT3: Sept 15 2010. I've added careful checking to the error return status, without otherwise disturbing the code, for SuspendThread, ResumeThread, and GetContext. I haven't seen any hint of this behavior on Windows systems since I did that. Haven't gotten back to the Wine experiment.
Nov 2010: Strange. It seems that if I compile this under Visual Studio 2005, it fails on Windows Vista and 7, but not earlier OSes. If I compile under Visual Studio 2010, it doesn't fail anywhere. One might point a finger at Visual Studio 2005, but I'm suspicious of a location-sensitive problem, and the different optimizers in VS 2005 and VS 2010 place the code in slightly different places.
Nov 2012: Saga continues. We see this failure on a number of XP and Windows 7 machines, at a pretty low rate (once every several thousand runs). Our Suspend activities are applied to threads that mostly execute pure computational code but that sometimes make calls into Windows. I don't recall seeing this issue when the PC of the thread was in our computational code. Of course, I can't see the PC of the thread when it hangs because GetContext won't give it to me, so I can't directly confirm that the problem only happens when executing system calls. But, all our system calls are channeled through one point, and so far the evidence is that point was executed when we get the hang. So the indirect evidence suggests GetContext on a thread only fails if a system call is being executed by that thread. I haven't had the energy to build a critical experiment to test this hypothesis yet.
Let me quote from Richter and Nasarre's "Windows via C/C++", 5th edition, which may shed some light:
DWORD SuspendThread(HANDLE hThread);
Any thread can call this function to suspend another thread (as long as you have the thread's handle). It goes without saying (but I'll say it anyway) that a thread can suspend itself but cannot resume itself. Like ResumeThread, SuspendThread returns the thread's previous suspend count. A thread can be suspended as many as MAXIMUM_SUSPEND_COUNT times (defined as 127 in WinNT.h). Note that SuspendThread is asynchronous with respect to kernel-mode execution, but user-mode execution does not occur until the thread is resumed.
In real life, an application must be careful when it calls SuspendThread because you have no idea what the thread might be doing when you attempt to suspend it. If the thread is attempting to allocate memory from a heap, for example, the thread will have a lock on the heap. As other threads attempt to access the heap, their execution will be halted until the first thread is resumed. SuspendThread is safe only if you know exactly what the target thread is (or might be doing) and you take extreme measures to avoid problems or deadlocks caused by suspending the thread.
...
Windows actually lets you look inside a thread's kernel object and grab its current set of CPU registers. To do this, you simply call GetThreadContext:
BOOL GetThreadContext(HANDLE hThread, PCONTEXT pContext);
To call this function, just allocate a CONTEXT structure, initialize some flags (the structure's ContextFlags member) indicating which registers you want to get back, and pass the address of the structure to GetThreadContext. The function then fills in the members you've requested.
You should call SuspendThread before calling GetThreadContext; otherwise, the thread might be scheduled and the thread's context might be different from what you get back. A thread actually has two contexts: user mode and kernel mode. GetThreadContext can return only the user-mode context of a thread. If you call SuspendThread to stop a thread but that thread is currently executing in kernel mode, its user-mode context is stable even though SuspendThread hasn't actually suspended the thread yet. But the thread cannot execute any more user-mode code until it is resumed, so you can safely consider the thread suspended and GetThreadContext will work.
My guess is that GetThreadContext may fail if you have just called SuspendThread while the thread is in kernel mode and the kernel is locking the thread context block at that time.
Maybe on multicore systems, one core is handling the kernel-mode execution of the thread whose user-mode execution was just suspended, and keeps the thread's CONTEXT structure locked exactly when the other core is calling GetThreadContext.
Since this behaviour is not documented, I suggest contacting Microsoft.
There are some particular problems surrounding suspending a thread that owns a CriticalSection. I can't find a good reference to it now, but there is one mention of it on Raymond Chen's blog and another mention on Chris Brumme's blog. Basically, if you are unlucky enough to call SuspendThread while the thread is accessing an OS lock (e.g., heap lock, DllMain lock, etc.), then really strange things can happen. I would assume that this is the case that you are running into extremely rarely.
Does retrying the call to GetThreadContext work after a processor yield like Sleep(0)?
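The retry idea can be sketched generically. This is an illustration in portable C++, not a confirmed fix: `RetryWithYield` is a hypothetical helper, `std::this_thread::yield()` stands in for Sleep(0), and in the real code the callable would wrap the GetThreadContext call from the question.

```cpp
#include <cassert>
#include <functional>
#include <thread>

// Calls `attempt` up to maxAttempts times, yielding the processor after each
// failed try. Returns the 1-based number of the attempt that succeeded,
// or 0 if every attempt failed.
int RetryWithYield(const std::function<bool()>& attempt, int maxAttempts) {
    for (int i = 1; i <= maxAttempts; ++i) {
        if (attempt()) {
            return i;
        }
        std::this_thread::yield();  // give the target thread a chance to run
    }
    return 0;
}
```

Logging the returned attempt number would show whether the failure is transient (success on a later try) or persistent (0), which helps to separate a timing issue from an invalid handle.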
This is an old issue, but it's good to see you kept it updated with status changes after experiencing the problem for more than two further years.
The cause of your problem is a bug in the translation layer of the x64 version of WoW64, as per:
http://social.msdn.microsoft.com/Forums/en/windowscompatibility/thread/1558e9ca-8180-4633-a349-534e8d51cf3a
There is a rather critical bug in GetThreadContext under WoW64 that makes it return stale contents, which makes it unusable in many situations. The contents are stored in user mode. This is why you think the value is not null, but in the stale contents it is still null.
This is why it fails on newer OSes but not older ones; try running it on a 32-bit Windows 7 OS.
As for why this bug seems to happen less often with solutions built with Visual Studio 2010 / 2012, it is likely that the compiler is doing something that mitigates most of the problem. To check, you should inspect the code generated by both 2005 and 2010 and see what the differences are. For example, does the problem happen if the project is built without optimizations?
Finally, some further reading:
http://www.nynaeve.net/?p=129
Maybe a thread safety issue. Are you sure that the hComputeThread struct isn't changing out from under you? Maybe the thread was exiting when you called suspend? This may cause suspend to succeed, but by the time you call get context it is gone and the handle is invalid.
Calling SuspendThread on a thread that owns a synchronization object, such as a mutex or critical section, can lead to a deadlock if the calling thread tries to obtain a synchronization object owned by a suspended thread.
- MSDN

Disabling Windows error reporting (Dr. Watson) for my process

I have an application that is hosting some unstable third-party code which I can't control in an external process to protect my main application from nasty errors it exhibits. My parent process is monitoring the other process and doing "the right thing (tm)" when it fails.
The problem that I have is that Dr. Watson is still detecting crashes in the isolated process and attaching to the processes on the way down to take a crash dump. This has the two problems of:
1. Dramatically slowing down the time that it takes for me to detect a failure because the process stays alive while the crash dump is being taken.
2. Showing annoying popups to the user asking if they want to submit the error reports to Microsoft.
Clearly I would prefer to fix the bugs in the child process, but given that it isn't an option, I would like to be able to selectively disable Dr. Watson (and Windows Error Reporting in Vista+) for that process.
I am running some of my own code in the process before handing off to the untrusted bit, so if there is an API that I can call that affects the current process that would be fine.
I am aware of: http://support.microsoft.com/default.aspx/kb/188296 which would disable Dr. Watson for the entire machine. I don't want to do that because it would make me a bad citizen to trash a machine-wide setting.
I am also aware of the WerSetFlags option in Vista+ that would seem to disable windows error reporting for the current process, but I need something that will disable Dr.Watson on earlier OS versions.
The good doctor is invoked when a process does not handle a certain exception. Therefore, the common way to go would be to handle all exceptions yourself. In your case, it is much harder since you don't own the crashing process code. What you can do then, is to inject your code into the other process at runtime, and install an exception handler that will swallow the exception causing the crash. When caught, gracefully shut down the process.
There are quite a few questions here talking about injecting code into another process. As for the crash handler, you can either set an unhandled exception filter, or add a vectored exception handler. Note that for the latter, you'll have to be careful not to swallow legit exceptions that are in fact handled inside the other process, namely find a way to recognize the crashing exception and make sure it is the only one you handle.
You want to disable the GPF popup: http://blogs.msdn.com/oldnewthing/archive/2004/07/27/198410.aspx

How to reload a crashed process on Windows

How to reload a crashed process on Windows? Of course, I can run a custom monitoring Win service process. But, for example, Firefox: it doesn't seem to install such a thing, but still it can restart itself when it crashes.
On Vista and above, you can use the RegisterApplicationRestart API to automatically restart when it crashes or hangs.
Before Vista, you need to have a top level exception filter which will do the restart, but be aware that running code inside of a compromised process isn't entirely secure or reliable.
Firefox constantly saves its state to the hard disk, every time you open a tab or click a link, or perform some other action. It also saves a flag saying it shut down safely.
On startup, it reads this all back, and is able to "restore" based on that info.
Structured exception handling (SEH) allows you to catch program crashes and to do something when it happens.
See: __try and __except
SEH can be very dangerous though and could lead to your program hanging instead. Please see this article for more information.
If you write your program as an NT service then you can set the first, second and subsequent failure actions to "Restart the service".
For Windows 2008 server and Windows Vista and Windows 7 you can use the Win32 API RegisterApplicationRestart
Please see my answer here for more information about dealing with different types of program crashes.
If I recall correctly, Windows implements at least some subset of POSIX and so "must" have the signal interface (things like SIGKILL, SIGSEGV, SIGQUIT, etc.).
I've only ever done this on Linux, but you could try setting the unexpected-termination trap with signal() (signal.h).
From a quick scan of the docs, it seems that very few things can be done while handling a signal; it is possible that even starting a new process is on the forbidden list.
Now that I've thought about it, I'd probably go with a master/worker pattern: a very simple parent process that does nothing but spawn the worker (which does all the UI / other things). If the worker dies without first setting a specific "I'm going to die now" bit (the parent process always gets a message / notification that the spawned process died), then the master respawns the worker. The main theme is to keep the master very simple and unlikely to die from its own bugs.
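The master's loop can be sketched as below. This is a deliberately abstract illustration: `Supervise` and `SuperviseResult` are hypothetical names, and `runWorker` stands in for "spawn the worker process and wait for it to end", returning 0 when the worker announced an intentional exit and non-zero when it died unexpectedly. The launch limit is an added assumption to avoid respawning forever.

```cpp
#include <cassert>
#include <functional>

// Outcome of supervising a worker: how many times it was launched and
// whether the last launch ended with an intentional (clean) exit.
struct SuperviseResult {
    int launches;
    bool cleanExit;
};

// Keeps relaunching the worker until it exits cleanly (status 0) or the
// launch limit is reached. The master does nothing else, so there is very
// little in it that can fail.
SuperviseResult Supervise(const std::function<int()>& runWorker, int maxLaunches) {
    SuperviseResult result{0, false};
    while (result.launches < maxLaunches) {
        ++result.launches;
        if (runWorker() == 0) {  // worker set its "I'm going to die now" bit
            result.cleanExit = true;
            break;
        }
        // Non-zero status: treat as a crash and loop around to respawn.
    }
    return result;
}
```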
