OpenGL suppresses exceptions in MFC dialog-based application - debugging

I have an MFC-driven dialog-based application created with MSVS2005. Here is my problem, step by step. I have a button on my dialog and a corresponding click handler with code like this:
int* i = 0;
*i = 3;
I'm running the debug version of the program, and when I click the button, Visual Studio takes focus and reports an "Access violation writing location" exception. The program cannot recover from the error, and all I can do is stop debugging. This is the correct behavior.
Now I add some OpenGL initialization code to the OnInitDialog() method:
HDC DC = GetDC(GetSafeHwnd());
static PIXELFORMATDESCRIPTOR pfd =
{
    sizeof(PIXELFORMATDESCRIPTOR), // size of this pfd
    1,                             // version number
    PFD_DRAW_TO_WINDOW |           // support window
    PFD_SUPPORT_OPENGL |           // support OpenGL
    PFD_DOUBLEBUFFER,              // double buffered
    PFD_TYPE_RGBA,                 // RGBA type
    24,                            // 24-bit color depth
    0, 0, 0, 0, 0, 0,              // color bits ignored
    0,                             // no alpha buffer
    0,                             // shift bit ignored
    0,                             // no accumulation buffer
    0, 0, 0, 0,                    // accum bits ignored
    32,                            // 32-bit z-buffer
    0,                             // no stencil buffer
    0,                             // no auxiliary buffer
    PFD_MAIN_PLANE,                // main layer
    0,                             // reserved
    0, 0, 0                        // layer masks ignored
};
int pixelformat = ChoosePixelFormat(DC, &pfd);
SetPixelFormat(DC, pixelformat, &pfd);
HGLRC hrc = wglCreateContext(DC);
ASSERT(hrc != NULL);
wglMakeCurrent(DC, hrc);
Of course this is not exactly what I do; it is a simplified version of my code. Now strange things begin to happen: all initialization succeeds and there are no errors in OnInitDialog(), but when I click the button... no exception is thrown. Nothing happens. At all. If I set a breakpoint at *i = 3; and press F11 on it, the handler function exits immediately and focus returns to the application, which continues to work. I can click the button again and the same thing happens.
It seems as if someone handled the access violation and silently returned execution to the application's main message loop.
If I comment out the line wglMakeCurrent(DC, hrc);, everything works as before: the exception is thrown, Visual Studio catches it and shows the error window, and the program must then be terminated.
I experience this problem under Windows 7 64-bit with an NVIDIA GeForce 8800 and the latest drivers (as of 11.01.2010) from the website installed. My colleague has Windows Vista 32-bit and has no such problem: the exception is thrown and the application crashes in both cases.
Well, I hope the good guys will help me :)
PS: The problem was originally posted under this topic.

OK, I found out some more information about this. In my case it's Windows 7 that installs KiUserCallbackExceptionHandler as an exception handler before calling my WndProc and giving me execution control. This is done by ntdll!KiUserCallbackDispatcher. I suspect that this is a security measure taken by Microsoft to prevent hacking into SEH.
The solution is to wrap your WndProc (or hook procedure) in a __try/__except frame so you can catch the exception before Windows does.
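A minimal sketch of that wrapping, assuming MSVC (the __try/__except keywords are compiler-specific). RealWndProc, GuardedWndProc and IsFatalHardwareFault are hypothetical names; RealWndProc only forwards to DefWindowProc here as a placeholder for the real procedure. What the handler does is a design choice; this version terminates the process so the fault cannot be swallowed silently.

```cpp
#include <cstdint>

// Hypothetical helper: true for the hardware-fault codes this sketch treats as fatal.
// The numeric values are the documented Windows exception codes.
bool IsFatalHardwareFault(std::uint32_t code)
{
    switch (code) {
    case 0xC0000005u: // EXCEPTION_ACCESS_VIOLATION
    case 0xC00000FDu: // EXCEPTION_STACK_OVERFLOW
    case 0xC0000094u: // EXCEPTION_INT_DIVIDE_BY_ZERO
    case 0xC000001Du: // EXCEPTION_ILLEGAL_INSTRUCTION
        return true;
    default:
        return false;
    }
}

#if defined(_MSC_VER)
#include <windows.h>

// Placeholder for the application's real window procedure (assumption).
static LRESULT CALLBACK RealWndProc(HWND hwnd, UINT msg, WPARAM wp, LPARAM lp)
{
    return DefWindowProc(hwnd, msg, wp, lp);
}

// Register this procedure instead of RealWndProc. Its SEH frame sits closer to
// the fault than the handler installed by the user-mode callback dispatcher.
LRESULT CALLBACK GuardedWndProc(HWND hwnd, UINT msg, WPARAM wp, LPARAM lp)
{
    __try {
        return RealWndProc(hwnd, msg, wp, lp);
    }
    __except (IsFatalHardwareFault(GetExceptionCode())
                  ? EXCEPTION_EXECUTE_HANDLER
                  : EXCEPTION_CONTINUE_SEARCH) {
        OutputDebugStringA("Fatal exception in window procedure\n");
        TerminateProcess(GetCurrentProcess(), GetExceptionCode());
    }
    return 0; // not reached after TerminateProcess
}
#endif
```

Note that MSVC does not allow __try in a function that has local objects with destructors, so the guarded function should stay a thin forwarder.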
Thanks to Skywing at http://www.nynaeve.net/
We've contacted nVidia about this issue, but they say it's not their bug but rather Microsoft's. Could you please tell how you located the exception handler? And do you have some additional information, e.g. some feedback from Microsoft?
I used the "!exchain"-command in WinDbg to get this information.

Rather than wrapping the WndProc or hooking all WndProcs, you could use Vectored Exception Handling:
http://msdn.microsoft.com/en-us/library/ms679274.aspx
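A rough sketch of the vectored approach, under a few assumptions: FirstChanceLogger, InstallFirstChanceLogger and IsLikelyCrash are hypothetical names, and the "likely crash" test is only a heuristic (error-severity code that is not the MSVC C++ exception code). A vectored handler sees every first-chance exception in the process, including ones that are handled normally elsewhere, so it should filter before reacting.

```cpp
#include <cstdint>

// Heuristic (assumption): codes with error severity (top two bits set) are
// treated as crashes, except the code MSVC uses for C++ throw.
bool IsLikelyCrash(std::uint32_t code)
{
    const std::uint32_t kCppException = 0xE06D7363u;
    return (code >> 30) == 3u && code != kCppException;
}

#if defined(_WIN32)
#include <windows.h>

// Runs before any frame-based SEH handler, including the one Windows
// installs around user-mode callbacks.
static LONG NTAPI FirstChanceLogger(PEXCEPTION_POINTERS info)
{
    if (IsLikelyCrash(info->ExceptionRecord->ExceptionCode)) {
        OutputDebugStringA("First-chance crash seen by vectored handler\n");
        if (IsDebuggerPresent())
            DebugBreak(); // stop in the debugger at the point of the fault
    }
    return EXCEPTION_CONTINUE_SEARCH; // let normal handling proceed
}

// Call once at startup; pass the returned handle to
// RemoveVectoredExceptionHandler to uninstall.
PVOID InstallFirstChanceLogger()
{
    return AddVectoredExceptionHandler(1, FirstChanceLogger);
}
#endif
```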

First, both behaviors are correct. Dereferencing a null pointer is "undefined behavior", not a guaranteed access violation.
Next, find out whether this is related to exception throwing in general or only to accessing memory location zero (try a different exception).
If you configure Visual Studio to stop on first-chance access violations, does it break?
Call VirtualQuery(NULL, ...) before and after wglMakeCurrent and compare. Maybe the nVidia OpenGL driver calls VirtualAlloc on page zero (a bad idea, but not impossible or illegal).
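The VirtualQuery comparison could look roughly like this. PrintPageZeroState and MemStateName are hypothetical helper names; the state values are the documented MEM_COMMIT, MEM_RESERVE and MEM_FREE constants.

```cpp
#include <cstdio>
#include <string>

// Maps the State field of MEMORY_BASIC_INFORMATION to a readable name.
std::string MemStateName(unsigned long state)
{
    switch (state) {
    case 0x1000:  return "MEM_COMMIT";
    case 0x2000:  return "MEM_RESERVE";
    case 0x10000: return "MEM_FREE";
    default:      return "unknown";
    }
}

#if defined(_WIN32)
#include <windows.h>

// Prints the allocation state and protection of the page at address zero.
void PrintPageZeroState(const char* label)
{
    MEMORY_BASIC_INFORMATION mbi = {};
    if (VirtualQuery(NULL, &mbi, sizeof(mbi)) == 0) {
        std::printf("%s: VirtualQuery failed, error %lu\n", label, GetLastError());
        return;
    }
    std::printf("%s: state=%s protect=0x%lx\n",
                label, MemStateName(mbi.State).c_str(), mbi.Protect);
}
#endif
```

Calling PrintPageZeroState("before") and PrintPageZeroState("after") around wglMakeCurrent shows whether the driver changed the mapping; if both print MEM_FREE, page zero is not the explanation.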

I found this question when I was looking at a similar problem. Our problem turned out to be silent consumption of exceptions when running a 32-bit application on 64-bit Windows.
http://connect.microsoft.com/VisualStudio/feedback/details/550944/hardware-exceptions-on-x64-machines-are-silently-caught-in-wndproc-messages
There’s a fix available from Microsoft, though deploying it is somewhat challenging if you have multiple target platforms:
http://support.microsoft.com/kb/976038
Here's an article on the subject describing the behavior:
http://blog.paulbetts.org/index.php/2010/07/20/the-case-of-the-disappearing-onload-exception-user-mode-callback-exceptions-in-x64/
This thread on stack overflow also describes the problem I was experiencing:
Exceptions silently caught by Windows, how to handle manually?

Related

Why is Minidump file contents wrong for only 1 process?

Situation:
We have an MFC/C++ Visual Studio (2005) application consisting of a lot of executables and a lot of dlls, all using MFC and interconnected with DCOM. Some are running on a controller (w2012) which controls slave computers running on WES2009.
The problem:
For diagnostic purposes we are embedding minidumps in all of our processes. This mechanism works fine in all processes except for one: the GUI exe. All processes including the GUI make dmp files, but the contents of the GUI's dmp file seem to be different/wrong. When I intentionally crash our application with e.g. a null pointer dereference, all dmp files of all processes/dlls (except the GUI) point to the cause (the null pointer dereference)! The dmp file of the GUI process is created and can be opened in Visual Studio, but none of the threads point to the cause (the null pointer dereference). WinDbg does not find the cause either! The strange thing is that when we manually use WriteStackDetails() to dump the call stack, it returns the correct problematic line! So why can't MiniDumpWriteDump() do the same for only this one process? What could be the discriminating factor? Anyone any idea?
What we tried:
We tried crashes in all other processes and dlls and they all seem to work OK, except for the GUI process! Unicode / non-Unicode does not seem to matter. A separate test application works well, also when I link our production code library which contains the UnhandledExceptionFilter() and MiniDumpWriteDump() code. Crashes in sub(-sub) dlls do not seem to matter. The project settings with respect to exception handling appear to be all the same. Anyone any idea?
Some more info and remarks:
Our production code (controller and slaves) is running in separate virtual boxes for development purposes.
Yes, we understand that the minidump should ideally be created from another process (some example somewhere? with respect to process querying and security?), but doing it in-process seems to work 'always OK' for now. So we accept the risk for now that it might hang in rare situations.
What I mean with the dmp file contents is different/wrong is the following:
For our non-GUI exe we get the following OK thread / callstack information:
0xC0000005: Access violation reading location 0x00000000.
Studio automatically opens the correct source and the "breakpoint" is set to the faulty line of code.
In the Call stack tab I see my own functions in my own dll which has caused the crash: my null pointer dereference.
In the Threads tab I also see my active thread and Location which points also to the faulty function which crashed.
So all is fine and usable in this situation! Super handy functionality!
For our GUI exe, which links to the same production library code with respect to the MiniDumpWriteDump() and UnhandledExceptionFilter() code, we get the following NOK thread / callstack information:
Unhandled exception at 0x77d66e29 (ntdll.dll) in our_exe_pid_2816_tid_2820_crash.dmp: 0xC0150010: The activation context being deactivated is not active for the current thread of execution
Visual Studio 2005 does not show my faulty code as being the cause!
In the Call stack tab I don't see my own faulty function.
The Call stack tab shows that the problem is in ntdll.dll!RtlDeactivateActivationContextUnsafeFast()
The top most function call which is shown of our code is in a totally different gui helper dll, which is not related to my intentionally introduced crash!
The Threads tab also shows the same.
For both situations I use the same Visual Studio 2005 (running on W7) with the same settings for symbol paths! Visual Studio 2017 cannot analyze the 'wrong' dmp files either. Between the two tests above there is no rebuild, so no mismatch occurs between exe/dlls and pdbs. In one situation it works fine and in the other it does not!?
The stripped-down-to-essentials code we use is shown below
typedef BOOL (_stdcall *tMiniDumpWriteDump)(HANDLE hProcess, DWORD dwPid, HANDLE hFile,
    MINIDUMP_TYPE DumpType, CONST PMINIDUMP_EXCEPTION_INFORMATION ExceptionParam,
    CONST PMINIDUMP_USER_STREAM_INFORMATION UserStreamParam,
    CONST PMINIDUMP_CALLBACK_INFORMATION CallbackParam);

TCHAR CCrashReporter::s_szLogFileNameDmp[MAX_PATH];
CRITICAL_SECTION CCrashReporter::s_csGuard;
LPTOP_LEVEL_EXCEPTION_FILTER CCrashReporter::s_previousFilter = 0;
HMODULE CCrashReporter::s_hDbgHelp = 0;
tMiniDumpWriteDump CCrashReporter::s_fpMiniDumpWriteDump = 0;

CCrashReporter::CCrashReporter()
{
    LoadDBGHELP();
    s_previousFilter = ::SetUnhandledExceptionFilter(UnhandledExceptionFilter);
    ::InitializeCriticalSection(&s_csGuard);
}

CCrashReporter::~CCrashReporter()
{
    ::SetUnhandledExceptionFilter(s_previousFilter);
    ...
    if (0 != s_hDbgHelp)
    {
        FreeLibrary(s_hDbgHelp);
    }
    ::DeleteCriticalSection(&s_csGuard);
}

LONG WINAPI CCrashReporter::UnhandledExceptionFilter(PEXCEPTION_POINTERS pExceptionInfo)
{
    ::EnterCriticalSection(&s_csGuard);
    ...
    GenerateMinidump(pExceptionInfo, s_szLogFileNameDmp);
    ::LeaveCriticalSection(&s_csGuard);
    return EXCEPTION_EXECUTE_HANDLER;
}

void CCrashReporter::LoadDBGHELP()
{
    /* ... search for dbghelp.dll code ... */
    s_hDbgHelp = ::LoadLibrary(strDBGHELP_FILENAME);
    if (0 == s_hDbgHelp)
    {
        /* ... report error ... */
    }
    if (0 != s_hDbgHelp)
    {
        ...
        s_fpMiniDumpWriteDump = (tMiniDumpWriteDump)GetProcAddress(s_hDbgHelp, "MiniDumpWriteDump");
        if (!s_fpMiniDumpWriteDump)
        {
            FreeLibrary(s_hDbgHelp);
        }
        else
        {
            /* ... log ok ... */
        }
    }
}

void CCrashReporter::GenerateMinidump(const PEXCEPTION_POINTERS pExceptionInfo,
    LPCTSTR pszLogFileNameDmp)
{
    HANDLE hReportFileDmp(::CreateFile(pszLogFileNameDmp, GENERIC_WRITE, 0, 0,
        CREATE_ALWAYS, FILE_FLAG_WRITE_THROUGH, 0));
    if (INVALID_HANDLE_VALUE != hReportFileDmp)
    {
        MINIDUMP_EXCEPTION_INFORMATION stMDEI;
        stMDEI.ThreadId = ::GetCurrentThreadId();
        stMDEI.ExceptionPointers = pExceptionInfo;
        stMDEI.ClientPointers = TRUE;
        if(!s_fpMiniDumpWriteDump(::GetCurrentProcess(), ::GetCurrentProcessId(),
            hReportFileDmp, MiniDumpWithIndirectlyReferencedMemory,
            &stMDEI, 0, 0))
        {
            /* ... report error ...*/
        }
        else
        {
            /* ... report ok ... */
        }
        ::CloseHandle(hReportFileDmp);
    }
    else
    {
        /* ... report error ...*/
    }
}

Does a Message-Only Window consume fewer resources?

I'm creating a window with CreateWindowEx for the sole purpose of receiving messages. Currently the hWndParent parameter is 0:
Result := CreateWindowEx(WS_EX_TOOLWINDOW, WindowClassName, '', WS_POPUP,
0, 0, 0, 0, 0, 0, HInstance, nil);
I've read that a message-only window can be created by changing this parameter to HWND_MESSAGE.
Are there benefits in terms of performance and consumption of resources when using this option?
It's hard to answer definitively. One would imagine that a message-only window would be lighter on resources than a hidden window. But who's to say that it's not the other way around? And perhaps the answer differs with OS version. You can only tell for sure by profiling.
However, you tend not to have large numbers of message-only windows in a process. So even if there is a difference, will it ever be significant? Not likely.
More important differences are to be found in behaviour. The big one is that message-only windows don't receive broadcast messages.

LoadLibrary() fails with error 8 (ERROR_NOT_ENOUGH_MEMORY)

Later edit: After more investigation, the Windows Updates and the OpenGL DLL were red herrings. The cause of these symptoms was a LoadLibrary() call failing with GetLastError() == ERROR_NOT_ENOUGH_MEMORY. See my answer for how to solve such issues. Below is the original question for historical interest. /edit
A map viewer I wrote in Python/wxPython for Windows with a C++ backend suddenly
stopped working, without any code changes or even recompiling. The very same
executables had been working for weeks before (same Python, same DLLs, ...).
Now, when querying Windows for a pixel format to use with OpenGL (with
ChoosePixelFormat()), I get a MessageBox saying:
LoadLibrary failed with error 8:
Not enough storage is available to process this command
The error message is displayed when executing the following code fragment:
void DevContext::SetPixelFormat() {
    PIXELFORMATDESCRIPTOR pfd;
    memset(&pfd, 0, sizeof(pfd));
    pfd.nSize = sizeof(pfd);
    pfd.nVersion = 1;
    pfd.dwFlags = PFD_DRAW_TO_WINDOW | PFD_SUPPORT_OPENGL;
    pfd.iPixelType = PFD_TYPE_RGBA;
    pfd.cColorBits = 32;
    int pf = ChoosePixelFormat(m_hdc, &pfd); // <-- ERROR OCCURS IN HERE
    if (pf == 0) {
        throw std::runtime_error("No suitable pixel format.");
    }
    if (::SetPixelFormat(m_hdc, pf, &pfd) == FALSE) {
        throw std::runtime_error("Cannot set pixel format.");
    }
}
It's actually an ATI GL driver DLL showing the message box. The relevant part of the call stack is this:
... More MessageBox stuff
0027e860 770cfcf1 USER32!MessageBoxTimeoutA+0x76
0027e880 770cfd36 USER32!MessageBoxExA+0x1b
*** ERROR: Symbol file not found. Defaulted to export symbols for C:\Windows\SysWOW64\atiglpxx.dll -
0027e89c 58471df1 USER32!MessageBoxA+0x18
0027e9d4 58472065 atiglpxx+0x1df1
0027e9dc 57acaf0b atiglpxx!DrvValidateVersion+0x13
0027ea00 57acb0f3 OPENGL32!wglSwapMultipleBuffers+0xc5e
0027edf0 57acb1a9 OPENGL32!wglSwapMultipleBuffers+0xe46
0027edf8 57acc6a4 OPENGL32!wglSwapMultipleBuffers+0xefc
0027ee0c 57ad5658 OPENGL32!wglGetProcAddress+0x45f
0027ee28 57ad5dd4 OPENGL32!wglGetPixelFormat+0x70
0027eec8 57ad6559 OPENGL32!wglDescribePixelFormat+0xa2
0027ef48 751c5ac7 OPENGL32!wglChoosePixelFormat+0x3e
0027ef60 57c78491 GDI32!ChoosePixelFormat+0x28
0027f0b0 57c7867a OutdoorMapper!DevContext::SetPixelFormat+0x71 [winwrap.cpp # 42]
0027f1a0 57ce3120 OutdoorMapper!OGLContext::OGLContext+0x6a [winwrap.cpp # 61]
0027f224 1e0acdf2 maplib_sip!func_CreateOGLDisplay+0xc0 [maps.sip # 96]
0027f240 1e0fac79 python33!PyCFunction_Call+0x52
... More Python stuff
I did a Windows Update two weeks ago and noticed some glitches (e.g. when
resizing the window), but my program still worked mostly OK. Just now I
rebooted, Windows installed 1 more update, and I don't get past
ChoosePixelFormat() any more. However, the last installed update was
KB2998527, a Russia timezone update?!
Things that I already checked:
Recompiling doesn't make it work.
Rebooting and running without other programs running doesn't work.
Memory consumption of my program is only 67 MB, I'm not out of memory.
Plenty of diskspace free (~50 GB).
The HDC m_hdc is obtained from the display panel's HWND and seems to be valid.
Changing my linker commandline doesn't work.
Should I update my graphics drivers or roll back the updates? Any other ideas?
System data dump: Windows 7 Ultimate SP1 x64, 4GB RAM; HP EliteBook 8470p; Python 3.3, wxPython 3.0.1.dev76673 msw (phoenix); access to C++ data structures via SIP 4.15.4; C++ code compiled with Visual Studio 2010 Express, Debug build with /MDd.
I was running out of virtual address space.
By default, LibTIFF reads TIF images by memory-mapping them (mmap() or CreateFileMapping()). This is fine for pictures of your wife, but it turns out it's a bad idea for gigabytes worth of topographic raster-maps of the Alps.
This was difficult to diagnose, because LibTIFF silently fell back to read() if the memory mapping failed, so there never was an explicit error before. Further, mapped memory is not counted as working memory by Windows, so Task Manager was showing 67 MB when in fact nearly all of the virtual address space was used up.
This blew up now because I added more TIF images to my database recently. LoadLibrary() started failing because it couldn't find any address space to put the new library. GetLastError() returned 8, which is ERROR_NOT_ENOUGH_MEMORY. That this happened within ATI's OpenGL library was just coincidence.
The solution was to pass "m" as a flag to TIFFOpen() to disable memory-mapped I/O.
Diagnosing this is easy with the Windows SysInternals tool VMMap, which shows you how much of the virtual address space of a process is taken up by code/heap/stack/mapped files/shareable data/etc.
This should be the first thing to check if LoadLibrary() or CreateFileMapping() fails with ERROR_NOT_ENOUGH_MEMORY.
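If VMMap isn't at hand, a similar check can be made from inside the process. The sketch below is only an illustration: Region, LargestFreeBlock and SnapshotAddressSpace are hypothetical names. It walks the address space with VirtualQuery and reports the largest contiguous free block, which is the number that matters when LoadLibrary() needs room for a new module.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One entry per region of the address space, in ascending address order.
struct Region {
    std::size_t size;
    bool free;
};

// Size of the largest run of contiguous free regions.
std::size_t LargestFreeBlock(const std::vector<Region>& regions)
{
    std::size_t best = 0;
    std::size_t run = 0;
    for (const Region& r : regions) {
        run = r.free ? run + r.size : 0;
        best = std::max(best, run);
    }
    return best;
}

#if defined(_WIN32)
#include <windows.h>

// Walks the user-mode address range of the current process.
std::vector<Region> SnapshotAddressSpace()
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    const char* p = static_cast<const char*>(si.lpMinimumApplicationAddress);
    const char* end = static_cast<const char*>(si.lpMaximumApplicationAddress);

    std::vector<Region> out;
    MEMORY_BASIC_INFORMATION mbi;
    while (p < end && VirtualQuery(p, &mbi, sizeof(mbi)) != 0) {
        Region r = { mbi.RegionSize, mbi.State == MEM_FREE };
        out.push_back(r);
        p += mbi.RegionSize;
    }
    return out;
}
#endif
```

In a 32-bit process, a LargestFreeBlock(SnapshotAddressSpace()) result of only a few megabytes while Task Manager shows a small working set points to the situation described above.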

SetWindowsHook Global not very Global

I'm playing around with SetWindowsHookEx; specifically, I would like to be able to find out about any window (on my desktop) that's been activated, via mouse or keyboard.
Reading through the MSDN docs for SetWindowsHookEx, it would appear that a WH_CBT hook would do the job. I've created a dll and put all the code in there, which I control from a GUI app (which also handles the unhook).
But I only appear to get the activation code when I activate my GUI app; any other app I activate is ignored.
In my dll I have the setup code and the CBTProc like so:
LRESULT WINAPI CBTProc(int Code, WPARAM W, LPARAM L) {
    if(Code<0) CallN....
    if (Code == HCBT_ACTIVATE) { // never get unless I activate my app
        HWND a = reinterpret_cast<HWND>(W);
        TRACE("this window was activated %d\n", a);
    }
    CallNext....
}
EXPORTED HHOOK WINAPI Setup(HWND MyWind) {
    ...
    // gDllUInstance set in dllmain
    return SetWindowsHookEx(WH_CBT, CBTProc, gDllUInstance, 0);
}
All pretty simple stuff. I've tried moving the setup out of the dll, but I still get the same effect.
It would appear that the dll is getting loaded into other processes: I'm counting the number of DLL_PROCESS_ATTACH notifications I get and can see the count going up (not very scientific, I know).
NOTE that this is 32 bit code running on 32bit OS - win2k3.
Are my expectations of the hooking mechanism wrong? should I only be getting the activation of my app or do I need a different type of hook?
EDIT: the trace function writes to a file telling me whats sending me activations
TIA.
Turns out it's working OK. As Hans points out, I'm just not seeing the debugger output from the other processes. If I put in some extra tracing code (one trace file per attached process), I can see that things are working after all.
Many thanks for the replies.
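For anyone wanting to do the same, per-process tracing can be sketched roughly as below. The names TraceFileNameForPid, AppendTrace and TraceActivation, and the file name pattern, are made up for illustration. A real hook DLL would probably write to a fixed absolute directory, since each hooked process has its own current directory.

```cpp
#include <cstdio>
#include <string>

// One trace file per process, keyed by process ID (file name pattern is an assumption).
std::string TraceFileNameForPid(unsigned long pid)
{
    return "cbt_trace_" + std::to_string(pid) + ".log";
}

// Appends one line to the given trace file; failures to open are silently ignored.
void AppendTrace(const std::string& path, const std::string& line)
{
    if (FILE* f = std::fopen(path.c_str(), "a")) {
        std::fputs(line.c_str(), f);
        std::fputc('\n', f);
        std::fclose(f);
    }
}

#if defined(_WIN32)
#include <windows.h>

// Call from the CBT hook procedure when Code == HCBT_ACTIVATE.
// It runs in whichever process the hook DLL was injected into.
void TraceActivation(HWND activated)
{
    char buf[64];
    std::snprintf(buf, sizeof(buf), "activated window %p",
                  static_cast<void*>(activated));
    AppendTrace(TraceFileNameForPid(GetCurrentProcessId()), buf);
}
#endif
```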

Windows Client graphics written off the window to upper-left of screen

I have a Windows WinMain() window in which I draw simple graphics -- merely LineTo() and FillRect(). The rectangles move around. After about an hour, the output that used to go to the main window suddenly goes to the upper-left corner of my screen, as if client coordinates were being interpreted as screen coordinates. My GetDC() and ReleaseDC() calls seem to be balanced, and I even checked the return value from ReleaseDC() to make sure it is not 0 (per MSDN). Sometimes the output moves back to my main window. When I go to the debugger (VS 2010), my coordinates do not seem amiss, but the output is going to the wrong place. I handle WM_PAINT, WM_CREATE, WM_TIMER, and a few others. I do not know how to debug this. Any help would be appreciated.
This has 'not checking return values' written all over it. Pretty crucial in raw Win32 programming, most every API function returns a boolean or a handle where FALSE or NULL indicates failure. GetLastError() provides the error code.
A cheap way to check for this without modifying code is to use the debugger to look at the EAX register value after the API call. A 0 indicates failure. In Visual Studio you can do so with the @eax and @err pseudo-variables in the Watch window, which show the function return value and the GetLastError() value respectively.
This goes bad once Windows starts failing API calls, probably because of a resource leak. You can see it with TaskMgr.exe, Processes tab. View + Select Columns and tick Handles, USER objects and GDI objects. It is usually the latter; restoring the device context and releasing drawing objects is very easy to fumble. You don't have to wait until it fails: a steadily climbing number in one of those columns is the giveaway. It goes belly-up when the value hits 10,000.
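The same counters can also be watched from inside the program. This is only a sketch: IsSteadilyClimbing and SampleGdiObjects are hypothetical names, and the minimum of five samples is an arbitrary choice. GetGuiResources with GR_GDIOBJECTS returns the count Task Manager shows in the GDI objects column.

```cpp
#include <cstddef>
#include <vector>

// True when the samples never decrease and end higher than they started,
// which is the pattern a GDI or USER object leak produces.
bool IsSteadilyClimbing(const std::vector<unsigned long>& samples,
                        std::size_t minSamples = 5)
{
    if (samples.size() < minSamples)
        return false;
    for (std::size_t i = 1; i < samples.size(); ++i) {
        if (samples[i] < samples[i - 1])
            return false;
    }
    return samples.back() > samples.front();
}

#if defined(_WIN32)
#include <windows.h>

// Current number of GDI objects owned by this process (link with User32.lib).
unsigned long SampleGdiObjects()
{
    return GetGuiResources(GetCurrentProcess(), GR_GDIOBJECTS);
}
#endif
```

Sampling from the existing WM_TIMER handler, for example once a minute, and checking the collected values with IsSteadilyClimbing would flag a leak well before the limit is reached.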
You must be calling GetDC(NULL) somewhere by mistake, which would get the DC for the entire desktop.
You could make all your GetDC calls call a wrapper function which asserts if the argument is NULL to help track this down:
#include <assert.h>
HDC GetDCAssert(HWND hWnd)
{
    assert(hWnd);
    return ::GetDC(hWnd);
}

Resources