LoadLibrary() fails with error 8 (ERROR_NOT_ENOUGH_MEMORY)

Later edit: After more investigation, the Windows Updates and the OpenGL DLL were red herrings. The cause of these symptoms was a LoadLibrary() call failing with GetLastError() == ERROR_NOT_ENOUGH_MEMORY. See my answer for how to solve such issues. Below is the original question for historical interest. /edit
A map viewer I wrote in Python/wxPython for Windows with a C++ backend suddenly
stopped working, without any code changes or even recompiling. The very same
executables had been working for weeks before (same Python, same DLLs, ...).
Now, when querying Windows for a pixel format to use with OpenGL (with
ChoosePixelFormat()), I get a MessageBox saying:
LoadLibrary failed with error 8:
Not enough storage is available to process this command
The error message is displayed when executing the following code fragment:
void DevContext::SetPixelFormat() {
    PIXELFORMATDESCRIPTOR pfd;
    memset(&pfd, 0, sizeof(pfd));
    pfd.nSize = sizeof(pfd);
    pfd.nVersion = 1;
    pfd.dwFlags = PFD_DRAW_TO_WINDOW | PFD_SUPPORT_OPENGL;
    pfd.iPixelType = PFD_TYPE_RGBA;
    pfd.cColorBits = 32;

    int pf = ChoosePixelFormat(m_hdc, &pfd);  // <-- ERROR OCCURS IN HERE
    if (pf == 0) {
        throw std::runtime_error("No suitable pixel format.");
    }
    if (::SetPixelFormat(m_hdc, pf, &pfd) == FALSE) {
        throw std::runtime_error("Cannot set pixel format.");
    }
}
It's actually an ATI GL driver DLL showing the message box. The relevant part of the call stack is this:
... More MessageBox stuff
0027e860 770cfcf1 USER32!MessageBoxTimeoutA+0x76
0027e880 770cfd36 USER32!MessageBoxExA+0x1b
*** ERROR: Symbol file not found. Defaulted to export symbols for C:\Windows\SysWOW64\atiglpxx.dll -
0027e89c 58471df1 USER32!MessageBoxA+0x18
0027e9d4 58472065 atiglpxx+0x1df1
0027e9dc 57acaf0b atiglpxx!DrvValidateVersion+0x13
0027ea00 57acb0f3 OPENGL32!wglSwapMultipleBuffers+0xc5e
0027edf0 57acb1a9 OPENGL32!wglSwapMultipleBuffers+0xe46
0027edf8 57acc6a4 OPENGL32!wglSwapMultipleBuffers+0xefc
0027ee0c 57ad5658 OPENGL32!wglGetProcAddress+0x45f
0027ee28 57ad5dd4 OPENGL32!wglGetPixelFormat+0x70
0027eec8 57ad6559 OPENGL32!wglDescribePixelFormat+0xa2
0027ef48 751c5ac7 OPENGL32!wglChoosePixelFormat+0x3e
0027ef60 57c78491 GDI32!ChoosePixelFormat+0x28
0027f0b0 57c7867a OutdoorMapper!DevContext::SetPixelFormat+0x71 [winwrap.cpp # 42]
0027f1a0 57ce3120 OutdoorMapper!OGLContext::OGLContext+0x6a [winwrap.cpp # 61]
0027f224 1e0acdf2 maplib_sip!func_CreateOGLDisplay+0xc0 [maps.sip # 96]
0027f240 1e0fac79 python33!PyCFunction_Call+0x52
... More Python stuff
I did a Windows Update two weeks ago and noticed some glitches (e.g. when
resizing the window), but my program still worked mostly OK. Just now I
rebooted, Windows installed 1 more update, and I don't get past
ChoosePixelFormat() any more. However, the last installed update was
KB2998527, a Russia timezone update?!
Things that I already checked:
Recompiling doesn't make it work.
Rebooting and running without other programs running doesn't work.
Memory consumption of my program is only 67 MB, so I'm not out of memory.
Plenty of disk space free (~50 GB).
The HDC m_hdc is obtained from the display panel's HWND and seems to be valid.
Changing my linker commandline doesn't work.
Should I update my graphics drivers or roll back the updates? Any other ideas?
System data dump: Windows 7 Ultimate SP1 x64, 4GB RAM; HP EliteBook 8470p; Python 3.3, wxPython 3.0.1.dev76673 msw (phoenix); access to C++ data structures via SIP 4.15.4; C++ code compiled with Visual Studio 2010 Express, Debug build with /MDd.

I was running out of virtual address space.
By default, LibTIFF reads TIF images by memory-mapping them (mmap() or CreateFileMapping()). This is fine for pictures of your wife, but it turns out it's a bad idea for gigabytes' worth of topographic raster maps of the Alps.
This was difficult to diagnose, because LibTIFF silently fell back to read() if the memory mapping failed, so there never was an explicit error before. Further, mapped memory is not accounted as working memory by Windows, so the Task Manager was showing 67 MB when in fact nearly all virtual address space was used up.
This blew up now because I recently added more TIF images to my database. LoadLibrary() started failing because it couldn't find any address space to put the new library. GetLastError() returned 8, which is ERROR_NOT_ENOUGH_MEMORY. That this happened inside ATI's OpenGL library was just coincidence.
The solution was to pass "m" as a flag to TIFFOpen() to disable memory-mapped I/O.
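For illustration, a minimal sketch of the fix (the file name is made up; "r" opens the file for reading and the appended "m" suppresses memory-mapped I/O):

#include <tiffio.h>

// 'm' tells libtiff to skip mmap()/CreateFileMapping() and fall back to
// plain read() calls, so no virtual address space is consumed by the image.
TIFF* tif = TIFFOpen("alps_tile.tif", "rm");
if (tif) {
    // ... read directories/scanlines as usual ...
    TIFFClose(tif);
}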
Diagnosing this is easy with the Windows Sysinternals tool VMMap, which shows you how much of the virtual address space of a process is taken up by code, heap, stack, mapped files, shareable data, etc.
This should be the first thing to check if LoadLibrary() or CreateFileMapping() fails with ERROR_NOT_ENOUGH_MEMORY.
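If VMMap isn't at hand, a rough self-check is possible from inside the process by walking the address space with VirtualQuery() and totalling free versus in-use regions; a minimal sketch (totals only, none of the per-type breakdown VMMap gives):

#include <windows.h>
#include <cstdio>

int main()
{
    MEMORY_BASIC_INFORMATION mbi;
    SIZE_T freeBytes = 0, usedBytes = 0;

    // Walk every region from address 0 until VirtualQuery() fails, i.e.
    // the end of user-mode address space is reached.
    for (char* p = nullptr;
         VirtualQuery(p, &mbi, sizeof(mbi)) == sizeof(mbi);
         p = static_cast<char*>(mbi.BaseAddress) + mbi.RegionSize) {
        (mbi.State == MEM_FREE ? freeBytes : usedBytes) += mbi.RegionSize;
    }

    std::printf("free: %zu MB, reserved/committed: %zu MB\n",
                freeBytes >> 20, usedBytes >> 20);
    return 0;
}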

Related

Resolving Error R6016 - Not enough space for thread data

My statically linked Visual C++ 2012 program sporadically generates a CRT error: "R6016 - Not enough space for thread data".
The minimal documentation from Microsoft says this error message is generated when a new thread is spawned but not enough memory can be allocated for it.
However, my code only explicitly spawns a new thread in a couple of well-defined cases, neither of which is occurring here (although the Microsoft libraries certainly spawn threads internally as well). One user reported this problem when the program was just sitting in the background.
Not sure if it's relevant, but I haven't overridden the default 1MB reserved stack size or heap size, and the total memory in use by my program is usually quite small (3MB-10MB on a system with 12GB actual RAM, over half of which is unallocated).
This happens very rarely (so I can't track it down), and it's been reported on more than one machine. I've only heard about this on Windows 8.1, but I wouldn't read too much into that.
Is there some compiler setting somewhere that might influence this error? Or programming mistake?
This turned out to be caused by calling CreateThread rather than _beginthread. Microsoft documentation in the Remarks section states that CreateThread causes conflicts when using the CRT library, and indeed, once we made the change, we never saw that error again.
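In code, the change amounts to something like the following sketch (Worker and StartWorker are hypothetical names):

#include <windows.h>
#include <process.h>   // _beginthreadex

unsigned __stdcall Worker(void* arg)
{
    // CRT-using work (printf, malloc, strtok, ...) is safe here because
    // _beginthreadex sets up the per-thread CRT data that a raw
    // CreateThread call skips.
    return 0;
}

HANDLE StartWorker(void* arg)
{
    return reinterpret_cast<HANDLE>(
        _beginthreadex(nullptr,    // default security
                       0,          // default stack size
                       Worker,     // thread routine
                       arg,        // argument
                       0,          // run immediately
                       nullptr));  // thread id not needed
}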
You have to call TlsAlloc in DllMain if the Windows version is older than Vista:
implicit TLS handling was rewritten in Windows Vista [...] threadprivate and __declspec(thread) should function correctly in run-time loaded DLLs since then.
BOOL APIENTRY DllMain(HINSTANCE hinstDll, DWORD fdwReason, LPVOID lpvReserved)
{
    static BOOL fFirstProcess = TRUE;
    static DWORD dwIndex;
    BOOL fWin32s = FALSE;
    DWORD dwVersion = GetVersion();

    if (!(dwVersion & 0x80000000) && LOBYTE(LOWORD(dwVersion)) < 4)
        fWin32s = TRUE;

    if (fdwReason == DLL_PROCESS_ATTACH) {
        if (fFirstProcess || !fWin32s) {
            dwIndex = TlsAlloc();
        }
        fFirstProcess = FALSE;
    }

    return TRUE;
}
KB 118816:
When a program is started the size of the TLS is determined by taking
into account the TLS size required by the executable as well as the
TLS requirements of all other implicitly loaded DLLs. When you load
another DLL dynamically with LoadLibrary or unload it with
FreeLibrary, the system has to examine all running threads and to
enlarge or compact their TLS storage accordingly.
Your DLL code should be modified to use such TLS functions as TlsAlloc, and to allocate TLS if the DLL is loaded with LoadLibrary. Or, the DLL that is using __declspec(thread) should only be implicitly loaded into the application.
Bottom line: LoadLibrary ain't thread-safe.
I discovered that the process was 32-bit. In that case you can enlarge the user-mode address space available to it with the command
bcdedit /set increaseuserva 3072
(note that this only helps if the executable is linked with /LARGEADDRESSAWARE).

Directx Texture interface to existing memory

I'm writing a rendering app that communicates with an image processor as a sort of virtual camera, and I'm trying to figure out the fastest way to write the texture data from one process to the awaiting image buffer in the other.
Theoretically I think it should be possible with 1 DirectX copy from VRAM directly to the area of memory I want it in, but I can't figure out how to specify a region of memory for a texture to occupy, and thus must perform an additional memcpy. DX9 or DX11 solutions would be welcome.
So far, the docs here: http://msdn.microsoft.com/en-us/library/windows/desktop/bb174363(v=vs.85).aspx have held the most promise.
"In Windows Vista CreateTexture can create a texture from a system memory pointer allowing the application more flexibility over the use, allocation and deletion of the system memory"
I'm running on Windows 7 with the June 2010 DirectX SDK. However, whenever I try to use the function in the way it specifies, the function fails with an invalid arguments error code. Here is the call I tried as a test:
static char s_TextureBuffer[640*480*4]; //larger than needed
void* p = (void*)s_TextureBuffer;
HRESULT res = g_D3D9Device->CreateTexture(640,480,1,0, D3DFORMAT::D3DFMT_L8, D3DPOOL::D3DPOOL_SYSTEMMEM, &g_ReadTexture, (void**)p);
I tried several different texture formats, but with no luck. I've begun looking into DX11 solutions; it's going slowly since I'm used to DX9. Thanks!
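For what it's worth, the MSDN note quoted above is usually read as requiring a D3D9Ex device (created via Direct3DCreate9Ex), Pool = D3DPOOL_SYSTEMMEM, Levels = 1, Usage = 0, and, crucially, pSharedHandle pointing at the buffer pointer rather than being the buffer pointer; the test call above passes (void**)p, i.e. the buffer address itself. A sketch of that reading, under those assumptions:

static char s_TextureBuffer[640 * 480];  // D3DFMT_L8: 1 byte/pixel, pitch = 640
void* pSysMem = s_TextureBuffer;
IDirect3DTexture9* readTexture = nullptr;

HRESULT hr = g_D3D9Device->CreateTexture(
    640, 480,
    1,                    // Levels: must be 1 for this path
    0,                    // Usage: must be 0
    D3DFMT_L8,
    D3DPOOL_SYSTEMMEM,    // required pool
    &readTexture,
    (HANDLE*)&pSysMem);   // address OF the pointer, not the pointer itself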

OpenCL-GL Interop memory not in sync

I'm having trouble with OpenCL-GL shared memory.
I have an application that works in both Linux and Windows. The CL-GL sharing works in Linux, but not in Windows.
The Windows driver says that it supports sharing, and the examples from AMD work, so it should work. My code for creating the context in Windows is:
cl_context_properties properties[] = {
    CL_CONTEXT_PLATFORM, (cl_context_properties)platform_(),
    CL_WGL_HDC_KHR,      (intptr_t)wglGetCurrentDC(),
    CL_GL_CONTEXT_KHR,   (intptr_t)wglGetCurrentContext(),
    0
};

platform_.getDevices(CL_DEVICE_TYPE_GPU, &devices_);
context_ = cl::Context(devices_, properties, &CL::cl_error_callback, nullptr, &err);

err = clGetGLContextInfoKHR(properties, CL_CURRENT_DEVICE_FOR_GL_CONTEXT_KHR,
                            sizeof(device_id), &device_id, NULL);
context_device_ = cl::Device(device_id);
queue_ = cl::CommandQueue(context_, context_device_, 0, &err);
My problem is that the CL and GL memory in a shared buffer is not the same. I print them out (by memory mapping) and notice that they differ. Changing the data in the memory works in both CL and GL, but only changes that copy, not both (that is, both buffers seem intact, but not shared).
Also, clGetGLObjectInfo on the CL buffer returns the correct GL buffer.
Update: I have found that if I create the OpenCL context on the CPU, it works. This seems weird, as I'm not using integrated graphics, and I don't believe the CPU is handling OpenGL. I'm using SDL to create the window; could that have something to do with this?
I have now confirmed that the OpenGL context is running on the GPU, so the problem lies elsewhere.
Update 2: Ok, so this is weird. I tried again today, and suddenly it works. As far as I know I didn't install any new drivers before I shut down the computer yesterday, so I don't know what could have brought this about.
Update 3: Right, I noticed that changing the number of particles caused this to work. When I allocate so many particles that the shared buffer goes slightly above one MB, it suddenly starts to work.
I solved the problem. The OpenGL buffer object must be created after the OpenCL context has been created; if it is created before, the OpenGL data cannot be shared.
I am using a Radeon HD 5670 with ATI Catalyst 12.10. This may be an ATI driver problem, since the NVIDIA Computing SDK samples don't depend on this order.
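A sketch of the ordering that worked (vbo, bufferSize and the kernels are placeholders; error handling elided):

// 1. The GL context is current, and the CL context has already been
//    created with the WGL properties shown in the question.
GLuint vbo = 0;
cl_int err = CL_SUCCESS;

// 2. Only now create the GL buffer object...
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, bufferSize, nullptr, GL_DYNAMIC_DRAW);

// 3. ...and register it with CL for sharing.
cl_mem clBuf = clCreateFromGLBuffer(context_(), CL_MEM_READ_WRITE, vbo, &err);

// Per frame: finish GL work, acquire for CL, run kernels, release again.
glFinish();
clEnqueueAcquireGLObjects(queue_(), 1, &clBuf, 0, nullptr, nullptr);
// ... enqueue kernels that use clBuf ...
clEnqueueReleaseGLObjects(queue_(), 1, &clBuf, 0, nullptr, nullptr);
clFinish(queue_());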

CreateThread() fails on 64 bit Windows, works on 32 bit Windows. Why?

Operating System: Windows XP 64 bit, SP2.
I have an unusual problem. I am porting some code from 32-bit to 64-bit. The 32-bit code works just fine, but when I call CreateThread() in the 64-bit version the call fails. This happens in three places: two call CreateThread() directly, and one calls _beginthreadex(), which calls CreateThread().
All three calls fail with error code 0x3E6, "Invalid access to memory location".
The problem is that all the input parameters are correct.
HANDLE h;
DWORD threadID;

h = CreateThread(0,             // default security
                 0,             // default stack size
                 myThreadFunc,  // valid function to call
                 myParam,       // my param
                 0,             // no flags, start thread immediately
                 &threadID);
All three calls to CreateThread() are made from a DLL I've injected into the target program at the start of the program execution (this is before the program has got to the start of main()/WinMain()). If I call CreateThread() from the target program (same params) via say a menu, it works. Same parameters etc. Bizarre.
If I pass NULL instead of &threadID, it still fails.
If I pass NULL as myParam, it still fails.
I'm not calling CreateThread from inside DllMain(), so that isn't the problem. I'm confused and searching on Google etc hasn't shown any relevant answers.
If anyone has seen this before or has any ideas, please let me know.
Thanks for reading.
ANSWER
Short answer: Stack Frames on x64 need to be 16 byte aligned.
Longer answer:
After much banging my head against the debugger wall and posting responses to the various suggestions (all of which helped in some way, prodding me to try new directions), I started exploring what-ifs about what was on the stack prior to calling CreateThread(). This proved to be a red herring, but it did lead to the solution.
Adding extra data to the stack changes the stack frame alignment. Sooner or later one of the tests gets you to 16-byte stack frame alignment. At that point the code worked. So I retraced my steps and started putting NULL data onto the stack rather than what I thought were the correct values (I had been pushing return addresses to fake up a call frame). It still worked, so the data isn't important; it must be the actual stack addresses.
I quickly realised it was 16-byte alignment for the stack. Previously I was only aware of 8-byte alignment for data. This Microsoft document explains all the alignment requirements.
If the stackframe is not 16 byte aligned on x64 the compiler may put large (8 byte or more) data on the wrong alignment boundaries when it pushes data onto the stack.
Hence the problem I faced - the hooking code was called with a stack that was not aligned on a 16 byte boundary.
Quick summary of alignment requirements, expressed as size : alignment
1 : 1
2 : 2
4 : 4
8 : 8
10 : 16
16 : 16
Anything larger than 8 bytes is aligned on the next power of 2 boundary.
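To see why this bites, note that with a correctly aligned frame the compiler is free to use aligned SSE stores (movaps) for 16-byte locals; if RSP is off by 8 when your hook transfers control, those stores fault. A small illustration of the rule (any x64 C++11 compiler):

#include <xmmintrin.h>

// 16-byte types are promised 16-byte alignment...
static_assert(alignof(__m128) == 16, "16-byte type, 16-byte boundary");

void Touch128()
{
    // ...so the compiler emits aligned stores for this local on the
    // assumption that the stack frame itself is 16-byte aligned. Code
    // entered with a misaligned RSP breaks that assumption.
    __m128 v = _mm_setzero_ps();
    (void)v;
}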
I think Microsoft's error code is a bit misleading. The initial STATUS_DATATYPE_MISALIGNMENT could be expressed as a STATUS_STACK_MISALIGNMENT which would be more helpful. But then turning STATUS_DATATYPE_MISALIGNMENT into ERROR_NOACCESS - that actually disguises and misleads as to what the problem is. Very unhelpful.
Thank you to everyone that posted suggestions. Even if I disagreed with the suggestions, they prompted me to test in a wide variety of directions (including the ones I disagreed with).
I've written a more detailed description of the problem of datatype misalignment here: 64 bit porting gotcha #1! x64 Datatype misalignment.
The only reason that 64-bit would make a difference is that threading on 64-bit requires 64-bit-aligned values. If threadID isn't 64-bit aligned, you could cause this problem.
Ok, that idea's not it. Are you sure it's valid to call CreateThread before main/WinMain? It would explain why it works in a menu- because that's after main/WinMain.
In addition, I'd triple-check the lifetime of myParam. CreateThread returns (this I know from experience) long before the function you pass in is called.
Post the thread routine's code (or just a few lines).
It suddenly occurs to me: are you sure that you're injecting your 64-bit code into a 64-bit process? Because if you had a 64-bit CreateThread call and tried to inject that into a 32-bit process running under WOW64, bad things could happen.
Starting to seriously run out of ideas. Does the compiler report any warnings?
Could the bug be due to a bug in the host program, rather than the DLL? There's some other code, such as loading a DLL if you used __declspec(import/export), that runs before main/WinMain. If that DllMain, for example, had a bug in it, it could cause this.
I ran into this issue today, and I checked every argument fed into _beginthread/CreateThread/NtCreateThread via rohitab's Windows API Monitor v2. Every argument is aligned properly (AFAIK).
So, where does STATUS_DATATYPE_MISALIGNMENT come from?
The first few lines of NtCreateThread validate parameters passed from user mode.
ProbeForReadSmallStructure (ThreadContext, sizeof (CONTEXT), CONTEXT_ALIGN);
for i386
#define CONTEXT_ALIGN (sizeof(ULONG))
for amd64
#define STACK_ALIGN (16UI64)
...
#define CONTEXT_ALIGN STACK_ALIGN
On amd64, if the ThreadContext pointer is not aligned to 16 bytes, NtCreateThread will return STATUS_DATATYPE_MISALIGNMENT.
CreateThread (actually CreateRemoteThread) allocates ThreadContext on the stack and does nothing special to guarantee the alignment requirement is satisfied. Things will work smoothly if every piece of your code follows the Microsoft x64 calling convention, which unfortunately was not true in my case.
PS: The same code may work on newer Windows (say Vista and newer). I didn't check though. I'm facing this issue on Windows Server 2003 R2 x64.
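A quick sanity check on a given machine is to look at the CONTEXT type itself; on amd64 the SDK declares it with DECLSPEC_ALIGN(16), matching the kernel-side probe. A sketch:

#include <windows.h>
#include <cstdint>
#include <cstdio>

int main()
{
    // A stack instance is correctly placed only if RSP itself obeys the
    // x64 calling convention at this point.
    CONTEXT ctx = {};
    std::printf("alignof(CONTEXT) = %zu, &ctx %% 16 = %zu\n",
                alignof(CONTEXT),
                static_cast<size_t>(
                    reinterpret_cast<std::uintptr_t>(&ctx) % 16));
    return 0;
}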
I'm in the business of using parallel threads under Windows for calculations. No funny business, no DLL calls, and certainly no callbacks. The following works in 32-bit Windows: I set up the stack for my calculation, well within the area reserved for my program. All relevant data about areas and start addresses is contained in a data structure that is passed to CreateThread as parameter 3. The address that is called contains a small assembler routine that uses this data structure. Indeed this routine finds the address to return to on the stack, then the address of the data structure.
There is no reason to go far into this. It just works, and it calculates the number of primes below 2,000,000,000 just fine, in one thread, in two threads or in 20 threads.
Now CreateThread in 64-bit Windows doesn't push the address of the data structure. That seems implausible, so I show you the smoking gun: a dump of a debug session. In the subwindow at the bottom right you see the stack, and there is merely the return address, amidst a sea of zeroes.
The mechanism I use to fill in parameters is portable between 32 and 64 bits. No other call exhibits a difference between word sizes. Moreover, why would the code address work but not the data address?
The bottom line: one would expect that CreateThread passes the data parameter on the stack in the same way in 64 bits as in 32 bits, then does a subroutine call. At the assembler level it doesn't work that way. If there are any hidden requirements on e.g. RSP that are automatically fulfilled in C++, that would be very nasty.
P.S. No, there are no 16-byte alignment problems. That lies ages behind me.
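For reference, one detail that would explain the empty stack: under the Microsoft x64 calling convention the first integer argument travels in RCX, not on the stack, so CreateThread's lpParameter is never pushed; compiled C/C++ picks it up transparently, while a hand-written assembler routine must read RCX itself. A minimal sketch:

#include <windows.h>
#include <cstdio>

// On x64, 'param' arrives in RCX per the calling convention; nothing is
// on the new thread's stack except the return address.
DWORD WINAPI ThreadProc(LPVOID param)
{
    std::printf("param = %p\n", param);
    return 0;
}

int main()
{
    int data = 42;
    HANDLE h = CreateThread(nullptr, 0, ThreadProc, &data, 0, nullptr);
    WaitForSingleObject(h, INFINITE);
    CloseHandle(h);
    return 0;
}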
Try using _beginthread() or _beginthreadex() instead; you shouldn't be using CreateThread directly. See this previous question.

OpenGL suppresses exceptions in MFC dialog-based application

I have an MFC-driven dialog-based application created with MSVS2005. Here is my problem step by step. I have button on my dialog and corresponding click-handler with code like this:
int* i = 0;
*i = 3;
I'm running the debug version of the program, and when I click the button, Visual Studio grabs focus and reports an "Access violation writing location" exception; the program cannot recover from the error, and all I can do is stop debugging. And this is the right behavior.
Now I add some OpenGL initialization code in the OnInitDialog() method:
HDC DC = GetDC(GetSafeHwnd());

static PIXELFORMATDESCRIPTOR pfd =
{
    sizeof(PIXELFORMATDESCRIPTOR),  // size of this pfd
    1,                              // version number
    PFD_DRAW_TO_WINDOW |            // support window
    PFD_SUPPORT_OPENGL |            // support OpenGL
    PFD_DOUBLEBUFFER,               // double buffered
    PFD_TYPE_RGBA,                  // RGBA type
    24,                             // 24-bit color depth
    0, 0, 0, 0, 0, 0,               // color bits ignored
    0,                              // no alpha buffer
    0,                              // shift bit ignored
    0,                              // no accumulation buffer
    0, 0, 0, 0,                     // accum bits ignored
    32,                             // 32-bit z-buffer
    0,                              // no stencil buffer
    0,                              // no auxiliary buffer
    PFD_MAIN_PLANE,                 // main layer
    0,                              // reserved
    0, 0, 0                         // layer masks ignored
};

int pixelformat = ChoosePixelFormat(DC, &pfd);
SetPixelFormat(DC, pixelformat, &pfd);

HGLRC hrc = wglCreateContext(DC);
ASSERT(hrc != NULL);
wglMakeCurrent(DC, hrc);
Of course this is not exactly what I do; it is a simplified version of my code. Well, now the strange things begin to happen: all initialization is fine, there are no errors in OnInitDialog(), but when I click the button... no exception is thrown. Nothing happens. At all. If I set a breakpoint at *i = 3; and press F11 on it, the handler function halts immediately and focus is returned to the application, which continues to work well. I can click the button again and the same thing will happen.
It seems as if someone handled the access violation exception and silently returned execution to the application's main message loop.
If I comment out the line wglMakeCurrent(DC, hrc);, everything works as before: the exception is thrown, Visual Studio catches it and shows a window with the error message, and the program must be terminated afterwards.
I experience this problem under Windows 7 64-bit with an NVIDIA GeForce 8800 and the latest drivers available from the vendor's website (of 11.01.2010) installed. My colleague has Windows Vista 32-bit and has no such problem: the exception is thrown and the application crashes in both cases.
Well, hope good guys will help me :)
P.S. The problem was originally posted under this topic.
Ok, I found out some more information about this. In my case it's windows 7 that installs KiUserCallbackExceptionHandler as exception handler, before calling my WndProc and giving me execution control. This is done by ntdll!KiUserCallbackDispatcher. I suspect that this is a security measure taken by Microsoft to prevent hacking into SEH.
The solution is to wrap your wndproc (or hookproc) with a try/except frame so you can catch the exception before Windows does.
Thanks to Skywing at http://www.nynaeve.net/
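A minimal sketch of that wrapper, assuming an existing RealWndProc (plain SEH, so the wrapping function itself must not need C++ unwinding):

#include <windows.h>

LRESULT CALLBACK RealWndProc(HWND, UINT, WPARAM, LPARAM);  // your real handler

LRESULT CALLBACK SafeWndProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam)
{
    __try {
        return RealWndProc(hWnd, msg, wParam, lParam);
    }
    __except (EXCEPTION_EXECUTE_HANDLER) {
        // The fault lands here before the user-mode callback dispatcher
        // can swallow it; report and die deliberately instead of limping on.
        FatalAppExitA(0, "Unhandled exception in WndProc");
        return 0;
    }
}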
We've contacted nVidia about this issue, but they say it's not their bug, but rather Microsoft's. Could you please tell how you located the exception handler? And do you have some additional information, e.g. some feedback from Microsoft?
I used the "!exchain" command in WinDbg to get this information.
Rather than wrapping the WndProc or hooking all WndProcs, you could use Vectored Exception Handling:
http://msdn.microsoft.com/en-us/library/ms679274.aspx
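A sketch of that alternative; the handler is registered once and sees access violations before frame-based SEH does, including ones raised inside user-mode callbacks:

#include <windows.h>

LONG CALLBACK BreakOnAccessViolation(PEXCEPTION_POINTERS info)
{
    if (info->ExceptionRecord->ExceptionCode == EXCEPTION_ACCESS_VIOLATION &&
        IsDebuggerPresent()) {
        DebugBreak();  // stop in the debugger before anyone swallows the fault
    }
    return EXCEPTION_CONTINUE_SEARCH;  // otherwise let normal handling run
}

// Register early, e.g. first thing in InitInstance()/WinMain():
//   AddVectoredExceptionHandler(1 /* call first */, BreakOnAccessViolation);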
First, both behaviors are correct. Dereferencing a null pointer is "undefined behavior", not a guaranteed access violation.
First, find out whether this is related to exception throwing or only to accessing memory location zero (try a different exception).
If you configure Visual Studio to stop on first-chance access violations, does it break?
Call VirtualQuery(NULL, ...) before and after wglMakeCurrent and compare the results. Maybe the nVidia OpenGL drivers VirtualAlloc page zero (a bad idea, but not impossible or illegal).
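A sketch of that check; call it before and after wglMakeCurrent and compare the output:

#include <windows.h>
#include <cstdio>

void DumpPageZero(const char* when)
{
    MEMORY_BASIC_INFORMATION mbi = {};
    if (VirtualQuery(nullptr, &mbi, sizeof(mbi))) {
        // If State flips from MEM_FREE to MEM_COMMIT, someone mapped page 0.
        std::printf("[%s] base=%p state=0x%lx protect=0x%lx\n",
                    when, mbi.BaseAddress, mbi.State, mbi.Protect);
    }
}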
I found this question when I was looking at a similar problem. Our problem turned out to be silent consumption of exceptions when running a 32-bit application on 64-bit Windows.
http://connect.microsoft.com/VisualStudio/feedback/details/550944/hardware-exceptions-on-x64-machines-are-silently-caught-in-wndproc-messages
There’s a fix available from Microsoft, though deploying it is somewhat challenging if you have multiple target platforms:
http://support.microsoft.com/kb/976038
Here's an article on the subject describing the behavior:
http://blog.paulbetts.org/index.php/2010/07/20/the-case-of-the-disappearing-onload-exception-user-mode-callback-exceptions-in-x64/
This thread on stack overflow also describes the problem I was experiencing:
Exceptions silently caught by Windows, how to handle manually?
