Troubleshooting ERROR_NOT_ENOUGH_MEMORY - debugging

Our application is failing on one specific user's computer with ERROR_NOT_ENOUGH_MEMORY ("Not enough storage is available to process this command").
The error is apparently being raised somewhere deep within the Delphi VCL framework that we're using, so I'm not sure which Windows API function is responsible.
Is memory a problem? A call to GlobalMemoryStatus gives the following information:
dwTotalPhys - 1063150000 (~1 GB)
dwAvailPhys - 26735000 (~27 MB)
dwAvailPageFile - 1489000000 (~1.4 GB)
It seems strange to me that Windows would let the available physical memory get so low when so much space is available in the paging file, but I don't know enough about Windows' virtual memory management to know if this is normal or not. Is it?
If not memory, then which resource limit is being hit? From what I read online, ERROR_NOT_ENOUGH_MEMORY could be the result of the application hitting any of several limits (GDI objects, USER objects, handles, etc.) and not necessarily memory. Is there a comprehensive list of what limits Windows enforces? Is there any way to find out which limit is being hit? I tried Google, but I couldn't find any systematic overview.

A more common cause of this error than any of those you've listed is fragmentation of the virtual address space. This is a situation where the total free memory is quite reasonable, but the free space is fragmented by allocations scattered throughout the virtual address space. Hence you can get an out-of-memory error when a request cannot be satisfied by a single contiguous block, despite there being enough free memory in total.
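To make the fragmentation point concrete, here is a minimal sketch using a toy address space. The function name, the 1000-unit size, and the allocation layout are all made up for illustration; real address spaces are laid out by the OS and would need a tool such as VMMap to inspect.

```python
def largest_free_block(address_space_size, allocations):
    """Return (total_free, largest_contiguous_free) for a toy address space.

    allocations is a list of (start, length) ranges that are in use.
    """
    total_free = 0
    largest = 0
    cursor = 0
    for start, length in sorted(allocations):
        gap = start - cursor
        if gap > 0:
            total_free += gap
            largest = max(largest, gap)
        cursor = max(cursor, start + length)
    # Account for free space after the last allocation.
    tail = address_space_size - cursor
    if tail > 0:
        total_free += tail
        largest = max(largest, tail)
    return total_free, largest


# Hypothetical layout: a small 10-unit allocation ending at every 100-unit mark.
in_use = [(offset, 10) for offset in range(90, 1000, 100)]
total_free, largest = largest_free_block(1000, in_use)
print(total_free, largest)
```

In this toy layout 90% of the space is free, yet no single gap is larger than 90 units, so a request for a 200-unit contiguous block would fail.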

Check all the possibilities.
GDI problems can be monitored using the free GDIView utility. It's a single file that users can start without an installer.
Also, install Process Explorer on the machine concerned.
If you have no access to the machine, ask the user to take screenshots of the status shown by these applications. Very likely, this will give you some hint.

The culprit in this case was CreateCompatibleBitmap. Apparently Windows may enforce fairly strict systemwide limits on the memory available for device-dependent bitmaps (see, e.g., this mailing list discussion), even if your system otherwise has plenty of memory and plenty of GDI resources. (These systemwide limits are apparently because Windows may allocate device-dependent bitmaps in the video card's memory.)
The solution is simply to use device-independent bitmaps (DIBs) instead (although these may not perform quite as well). This KB article describes how to pick the optimal DIB format for a device.
Other candidates for resource limits (from others' answers and my own research):
GDI resources (from this answer) - easily checked with GDIView
Virtual memory fragmentation (from this answer)
Desktop heap - see here or here

My answer may be a little late, but here is my recent experience with the same issue. I did all the tests, going step by step: creating the DC, releasing it, using a DIBSection instead of CreateCompatibleBitmap, using GDI/memory leak tools, and so on.
In the end (LOL) I found that the whole problem was fixed once I swapped the order of these two calls:
DeleteDC(hdc); //do it first (always before deleting objects)
DeleteObject(obj);


macOS equivalent of reserving memory without charging against commit limit

I often want large, contiguous regions of virtual address space that can grow on demand. On Windows, I do this by calling VirtualAlloc with MEM_RESERVE and a dwSize argument that I think is larger than the region is reasonably likely to ever need to grow, then committing a page at a time as needed. Not only does this defer mapping pages to physical memory until they're accessed, which committing the whole region to begin with would also do, it also defers charging the pages against the system commit limit. That way, the program doesn't limit how much memory other programs are allowed to commit for the sake of memory it isn't yet using and may never use. Basically, I want to handle my program's memory management such that it is in principle allowed to consume a large amount of memory if the user's demands require it, while at the same time allowing other programs to have that memory instead if they need it first.
Someone has already asked whether macOS has an equivalent to VirtualAlloc with MEM_RESERVE. The answers suggest that mmap with MAP_ANON | MAP_PRIVATE is roughly equivalent. What I'm wondering is, does macOS have an equivalent to the Windows commit limit, and does mmap, called with the right flags, behave like my VirtualAlloc usage in not charging against that limit?
Edit: someone else asked a similar question about Linux, for which mmap was also suggested. The answer suggests that on that platform, initially mapping the region with PROT_NONE and adding the desired privileges when the pages are needed is necessary to prevent the unused pages from counting against the commit limit. Lacking better documentation about why mapping memory in excess of the available physical memory is allowed in macOS (complete lack of commit limit? some kind of overcommit feature like Linux has?), I figure I might as well apply the same PROT_NONE trick just in case. At the very least it means I will be able to reuse the same code for Linux should I choose to support that platform as well.
I don't believe that macOS has a system commit limit. I don't find anything like that in the Mach APIs, which is the low-level VM API where I'd expect it to be. Likewise, I don't see anything like that in the output of sysctl -a, which reports many other VM details and statistics.
I can tell you for sure that I've reserved the entirety of the unused space in a 64-bit process (which has a 47-bit user address space). That is, just under 128TiB.
You can replicate that by repeating mmap() calls with a large size until they fail, then halving the size and repeating, until the size is down to 1 page. Then, with the process paused at that point, apply vmmap -w -interleaved <pid> to it to see its allocations.
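The probe-and-halve loop described above can be sketched as follows. This is only an illustration of the loop structure: try_reserve is a hypothetical callback standing in for an mmap() wrapper, and FakeAddressSpace is a made-up stand-in so the loop can be exercised without touching the real address space (it does not model fragmentation).

```python
def probe_reservable(try_reserve, start_size, page_size=4096):
    """Keep reserving at the current size until it fails, then halve the
    size, stopping once the size drops below one page.

    try_reserve(size) is a hypothetical callback returning True if a
    region of `size` bytes could be reserved (e.g. a wrapper around mmap).
    """
    total = 0
    size = start_size
    while size >= page_size:
        if try_reserve(size):
            total += size
        else:
            size //= 2
    return total


class FakeAddressSpace:
    """Stand-in for the OS: a simple counter of remaining reservable bytes."""

    def __init__(self, capacity):
        self.remaining = capacity

    def reserve(self, size):
        if size > self.remaining:
            return False
        self.remaining -= size
        return True


space = FakeAddressSpace(capacity=10 * 4096 + 100)
reserved = probe_reservable(space.reserve, start_size=8 * 4096)
print(reserved, space.remaining)
```

Starting with a power-of-two multiple of the page size means every whole page gets picked up by the time the size reaches a single page; only a sub-page remainder is left unreserved.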

How to see exactly how much memory each add-in is using?

Is there a way for me to see exactly how much memory each Outlook add-in is using? I have a few customers on 32-bit Office who are all having issues with screen flashing and crashing and I suspect that we as a company have deployed too many add-ins, and even with Large Address Awareness (LAA), they're running out of memory which is causing Outlook to freak out.
I didn't see a way to do this in Outlook, so I created a .dmp file and opened it in WinDbg, but I'm new to this application and have no clue how to see memory usage by specific add-ins (the .dmp file is only of outlook.exe).
The following assumes plugins created in .NET.
The allocation of memory with a new statement goes to the .NET memory manager. In order to find out which plugin allocated the memory, that information would need to be stored in the .NET heap as well.
A UST (User Mode Stack Trace) database like the one available for the Windows Heap Manager is not available in .NET. Also, the .NET memory manager works directly above VirtualAlloc(), so it does not use the Windows Heap Manager. Basically, the reason is garbage collection.
Is there a way for me to see exactly how much memory each Outlook add-in is using?
No, since this information is not stored in crash dumps and there's no setting to enable it.
What you need is a memory profiler which is specific for .NET.
If you work with .NET and Visual Studio already, perhaps you're using JetBrains Resharper. The Ultimate Edition comes with a tool called dotMemory, so you might already have a license and you just need to install it via the control panel ("modify" Resharper installation).
It has (and other tools probably have as well) a feature to group memory allocations by assembly:
The screenshot shows memory allocated by an application called "MemoryPerformance". It retains 202 MB in objects, and those objects are mostly objects of the .NET framework (mscorlib).
The following assumes plugins created in C++ or other "native" languages, at least not .NET.
The allocation of memory with a new statement goes to HeapAlloc(). In order to find out who allocated the memory, that information would need to be stored in the heap as well.
However, you cannot provide that information in the new statement, and even if it were possible, you would need to rewrite all the new statements in your code.
Another way would be that HeapAlloc() has a look at the call stack at the time someone wants memory. In normal operation, that's too much cost (time-wise) and too much overhead (memory-wise). However, it is possible to enable the so called User Mode Stack Trace Database, sometimes abbreviated as UST database. You can do that with the tool GFlags, which ships with WinDbg.
The tool to capture memory snapshots is UMDH, also available with WinDbg. It will store the results as plain text files. It should be possible to extract statistical data from those USTs, however, I'm not aware of a tool that would do that, which means you would need to write one yourself.
The third approach is using the concept of "heap tagging". However, it's quite complex and also needs modifications in your code. I never implemented it, but you can look at the question How to benefit from Heap tagging by DLL?
Let's say the UST approach looks most feasible. How large should the UST database be?
Until now, 50 MB was sufficient for me to identify and fix memory leaks. However, for that use case it's not important to get information about all memory. It just needs enough samples to support a hypothesis. Those 50 MB are IMHO allocated in your application's memory, so it may affect the application.
The UST database only stores the addresses, not the call stack text. So in a 32-bit application, each frame on the call stack only needs 32 bits of storage.
In your case, 50 MB will not be sufficient. Considering an average depth of 10 frames and an average allocation size of 256 bytes (4 bytes for an int, but also larger things like strings), you get
4 GB / 256 bytes = 16M allocations
16M allocations * 10 frames * 4 byte/frame = 640 MB UST
If the given assumptions are realistic (I can't guarantee that), you would need a 640 MB UST database. This will affect your application considerably, since it reduces the available memory from 4 GB to about 3.4 GB, so the OOM comes earlier.
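The estimate above is easy to redo with different assumptions. A minimal sketch of the same arithmetic follows; the function and parameter names are made up for illustration, and the inputs are the answer's own guesses (256-byte average allocation, 10 frames, 4 bytes per frame).

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def estimate_ust_bytes(address_space_bytes, avg_alloc_bytes,
                       avg_stack_depth, bytes_per_frame):
    """Rough UST database size: one stack trace per live allocation."""
    allocations = address_space_bytes // avg_alloc_bytes
    return allocations * avg_stack_depth * bytes_per_frame


# 4 GB address space, 256-byte allocations, 10 frames, 32-bit addresses.
ust = estimate_ust_bytes(4 * GIB, 256, 10, 4)
print(ust // MIB, "MB")
```

Halving the assumed average allocation size doubles the estimate, so the result is quite sensitive to that guess.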
The UST information should also be available in the DMP file, if it was configured at the time the crash dump was created. Certainly not in your DMP file, otherwise you would have told us. However, it's not available in a way that's good for statistics. Using the UMDH text files IMHO is a better approach.
Is there a way for me to see exactly how much memory each Outlook add-in is using?
Not with the DMP file you have at the moment. It will still be hard with the tools available with WinDbg.
There are a few other options left:
Disable all plugins and measure the memory of Outlook itself. Then, enable one plugin at a time and measure the memory with that plugin enabled. Calculate the difference to find out what additional memory that plugin needs.
Does it crash immediately at startup? Or later, say after 10 minutes of usage? Could it be a memory leak? Identifying a memory leak could be easier: just enable one plugin at a time and monitor memory usage over time. Use a memory profiler, not WinDbg. It will be much easier to use and it can draw the appropriate graphics you need.
Note that you need to define a clear process to measure memory. Some memory will only be allocated when you do something specific ("lazy initialization"). Perhaps you want to measure that memory, too.
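The "enable one plugin at a time" measurement boils down to subtracting a baseline. A small sketch follows; the plugin names and all numbers are hypothetical placeholders, not measurements, and it assumes every measurement was taken by following the same steps.

```python
def plugin_memory_deltas(baseline_mb, measurements_mb):
    """Return the extra memory attributed to each plugin.

    baseline_mb: Outlook's memory with all plugins disabled.
    measurements_mb: mapping of plugin name -> memory with only that plugin enabled.
    """
    return {name: measured - baseline_mb
            for name, measured in measurements_mb.items()}


# Hypothetical numbers, for illustration only.
deltas = plugin_memory_deltas(
    baseline_mb=300,
    measurements_mb={"PluginA": 340, "PluginB": 520},
)
for name, extra in sorted(deltas.items(), key=lambda item: -item[1]):
    print(name, extra, "MB")
```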

When to use cudaHostRegister() and cudaHostAlloc()? What is the meaning of "Pinned or page-locked" memory? Which are the equivalent in OpenCL?

I am new to these Nvidia APIs and some expressions are not so clear to me. I was wondering if somebody could help me understand when and how to use these CUDA commands in a simple way. To be more precise:
While studying how to speed up some applications through parallel execution of a kernel (with CUDA, for example), at some point I faced the problem of speeding up the host-device interaction.
I have some information, gathered from browsing the web, but I am a little bit confused.
It is clear that you can go faster when it is possible to use cudaHostRegister() and/or cudaHostAlloc(). Here it is explained that
"you can use the cudaHostRegister() command to take some data (already allocated) and pin it avoiding extra copy to take into the GPU".
What is the meaning of "pin the memory"? Why is it so fast? How can I do this previously in this field? Later, in the same video in the link, they continue explaining that
"if you are transferring PINNED memory, you can use the asynchronous memory transfer, cudaMemcpyAsync(), which lets the CPU keep working during the memory transfer".
Are the PCIe transactions managed entirely by the CPU? Is there a bus manager that takes care of this?
Partial answers are also appreciated, to help piece the puzzle together at the end.
I would also appreciate links about the equivalent APIs in OpenCL.
What is the meaning of "pin the memory"?
It means making the memory page-locked. That is, telling the operating system's virtual memory manager that the memory pages must stay in physical RAM so that they can be directly accessed by the GPU across the PCI Express bus.
Why is it so fast? 
In one word, DMA. When the memory is page locked, the GPU DMA engine can directly run the transfer without requiring the host CPU, which reduces overall latency and decreases net transfer times.
Are the PCIe transaction managed entirely from the CPU?
No. See above.
Is there a manager of a bus that takes care of this?
No. The GPU manages the transfers. In this context there is no such thing as a bus master
EDIT: It seems CUDA treats pinned and page-locked as the same, as per the Pinned Host Memory section in this blog post written by Mark Harris. This means my answer is moot and the best answer should be taken as is.
I bumped into this question while looking for something else. For all future users, I think @talonmies answers the question perfectly, but I'd like to point out a slight difference between locking and pinning pages: the former ensures that the memory is not pageable, but the kernel is free to move it around, while the latter ensures that it stays in memory (i.e. non-pageable) and also remains mapped to the same address.
Here's a reference to the same.

Memory mapped files causes low physical memory

I have 2GB of RAM and am running a memory-intensive application. The system goes into a low available physical memory state and stops responding to user actions, such as opening an application or invoking a menu.
How do I trigger or tell the system to swap the memory to pagefile and free physical memory?
I'm using Windows XP.
If I run the same application on a machine with 4GB of RAM, this does not happen and system response is good. Once available physical memory runs low, the system automatically swaps to the pagefile and frees physical memory, so it is not as bad as on the 2GB system.
To overcome this problem (on the 2GB machine), I attempted to use memory-mapped files for the large datasets allocated by the application. In this case the virtual memory of the application (process) is fine, but the system cache is high and the problem is the same as above: available physical memory is low.
Even though the memory-mapped file is not mapped into the process's virtual memory, the system cache is high. Why???!!! :(
Any help is appreciated.
Thanks.
If your data access pattern for using the memory mapped file is sequential, you might get slightly better page recycling by specifying the FILE_FLAG_SEQUENTIAL_SCAN flag when opening the underlying file. If your data pattern accesses the mapped file in random order, this won't help.
You should consider decreasing the size of your map view. That's where all the memory is actually consumed and cached. Since it appears that you need to handle files that are larger than available contiguous free physical memory, you can probably do a better job of memory management than the virtual memory page swapper since you know more about how you're using the memory than the virtual memory manager does. If at all possible, try to adjust your design so that you can operate on portions of the large file using a smaller view.
Even if you can't get rid of the need for full random access across the entire range of the underlying file, it might still be beneficial to tear down and recreate the view as needed to move the view to the section of the file that the next operation needs to access. If your data access patterns tend to cluster around parts of the file before moving on, then you won't need to move the view as often. You'll take a hit to tear down and recreate the view object, but since tearing down the view also releases all the cached pages associated with the view, it seems likely you'd see a net gain in performance because the smaller view significantly reduces memory pressure and page swapping system wide. Try setting the size of the view based on a portion of the installed system RAM and move the view around as needed by your file processing. The larger the view, the less you'll need to move it around, but the more RAM it will consume potentially impacting system responsiveness.
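The "small view that moves around" idea can be sketched with Python's standard mmap module, which wraps the same file-mapping facilities. This is a rough illustration under assumptions, not the original poster's code: read_through_window and its parameters are made-up names, the demo file is a throwaway temp file, and a real application would keep a view alive across many accesses instead of remapping for every read.

```python
import mmap
import os
import tempfile


def read_through_window(path, offset, length, window_size):
    """Read `length` bytes at `offset`, mapping only one small view at a time."""
    granularity = mmap.ALLOCATIONGRANULARITY
    if window_size < granularity:
        raise ValueError("window_size must be at least the allocation granularity")
    file_size = os.path.getsize(path)
    end = min(offset + length, file_size)
    chunks = []
    with open(path, "rb") as f:
        pos = offset
        while pos < end:
            # View offsets must be multiples of the allocation granularity.
            view_start = (pos // granularity) * granularity
            view_len = min(window_size, file_size - view_start)
            with mmap.mmap(f.fileno(), view_len,
                           access=mmap.ACCESS_READ, offset=view_start) as view:
                lo = pos - view_start
                hi = min(end - view_start, view_len)
                chunks.append(view[lo:hi])
                pos = view_start + hi
            # Leaving the `with` block unmaps the view before the next one is created.
    return b"".join(chunks)


# Demo on a throwaway file a little over three granules long.
data = bytes(range(256)) * ((3 * mmap.ALLOCATIONGRANULARITY) // 256 + 1)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
try:
    got = read_through_window(tmp.name, 100, len(data) - 200,
                              window_size=mmap.ALLOCATIONGRANULARITY)
finally:
    os.remove(tmp.name)
print(len(got))
```

The window size is the tuning knob the answer describes: a larger window means fewer remaps but more pages held in memory at once.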
As I think you are hinting in your post, the slow response time is probably at least partially due to delays in the system while the OS writes the contents of memory to the pagefile to make room for other processes in physical memory.
The obvious solution (and possibly not practical) is to use less memory in your application. I'll assume that is not an option or at least not a simple option. The alternative is to try to proactively flush data to disk to continually keep available physical memory for other applications to run. You can find the total memory on the machine with GlobalMemoryStatusEx. And GetProcessMemoryInfo will return current information about your own application's memory usage. Since you say you are using a memory mapped file, you may need to account for that in addition. For example, I believe the PageFileUsage information returned from that API will not include information about your own memory mapped file.
If your application is monitoring the usage, you may be able to use FlushViewOfFile to proactively force data to disk from memory. There is also an API (EmptyWorkingSet) that I think attempts to write as many dirty pages to disk as possible, but that seems like it would very likely hurt performance of your own application significantly. Although, it could be useful in a situation where you know your application is going into some kind of idle state.
And, finally, one other API that might be useful is SetProcessWorkingSetSizeEx. You might consider using this API to give a hint on an upper limit for your application's working set size. This might help preserve more memory for other applications.
Edit: This is another obvious statement, but I forgot to mention it earlier. It also may not be practical for you, but it sounds like one of the best things you might do considering that you are running into 32-bit limitations is to build your application as 64-bit and run it on a 64-bit OS (and throw a little bit more memory at the machine).
Well, it sounds like your program needs more than 2GB of working set.
Modern operating systems are designed to use most of the RAM for something at all times, only keeping a fairly small amount free so that it can be immediately handed out to processes that need more. The rest is used to hold memory pages and cached disk blocks that have been used recently; whatever hasn't been used recently is flushed back to disk to replenish the pool of free pages. In short, there isn't supposed to be much free physical memory.
The principal difference between using a normal memory allocation and memory-mapped files is where the data gets stored when it must be paged out of memory. It doesn't necessarily have any effect on when the memory will be paged out, and will have little effect on the time it takes to page it out.
The real problem you are seeing is probably not that you have too little free physical memory, but that the paging rate is too high.
My suggestion would be to attempt to reduce the amount of storage needed by your program, and see if you can increase the locality of reference to reduce the amount of paging needed.

Is Virtual Memory still relevant in today's world of inexpensive RAM? [closed]

Closed 10 years ago.
Virtual memory was introduced to help run more programs with limited memory. But in today's environment of inexpensive RAM, is it still relevant?
Since there will be no disk access if it is disabled and all the programs will be memory resident, will it not improve the performance and program response times?
Is there any essential requirement for virtual memory in windows apart from running more programs as stated above? Something windows internal not known to us.
Some pedantry: virtual memory is not just the pagefile. The term encompasses a whole range of techniques that give the program the illusion that it has one single contiguous address space, some of which is the program's code, some of which is data, and some of which are DLLs or memory-mapped files.
So to your lead-in question: yes, virtual memory is required. It's what makes modern OS's work.
Don't disable virtual memory. 2GB is not nearly enough to even consider this. Regardless, you should always keep virtual memory on even if you do have enough since it will only ever be used when you actually need it. Much better to be safe than sorry since NOT having it active means you simply hit a wall, while having it active means your computer starts swapping to the hard drive but continues to run.
Yes, because it's the basis of all on-demand paging that occurs in a modern operating system, not just Windows.
Windows will always use all of your memory, if not for applications then for caching whatever you read from your hard drive. Because if that memory is not used, then you're throwing your investment in memory away. Basically, Windows uses your RAM as a big fat cache to your hard drives. And this happens all the time, as the relevant pages are only brought into main memory when you address the content of that page.
The question is really what is the use of a pagefile considering how much memory modern computers have and what's going on under the hood in the OS.
It's common for the Windows task manager to show not much physical memory being used, yet you're having many page faults? Win32 will never allocate all its physical memory. It always saves some for new resource needs. With a big pagefile vs. a small pagefile, Win32 will be slower to allocate physical memory to a process.
For a few days now I've been using a very small pagefile (200 MB fixed) in Vista with 3GB of addressable physical memory. I have had no crashes or problems. Haven't tried things like large video editing or many different processes open at once. I wouldn't recommend no pagefile since the OS can never shuffle pages around in physical memory leading to the development of holes. A large pagefile is fail-safe for people who wouldn't know how to manually increase the pagefile if a low memory warning pops up or the OS crashes.
Some points:
The kernel will use some of the physical memory and this will be shared through VM mapping with all other processes. Other processes will be in the remaining physical memory. VM makes each process see a 4GB memory space, with the OS in the upper 2GB. Each process will need much less than 4GB of physical memory; this amount is its committed memory requirement. When programming, a malloc or new will reserve memory but not commit it. Things like the first write to the memory will commit it. Some memory is immediately committed by the OS for each process.
Your question is really about using a page file, and not virtual memory, as kdgregory said. Probably the most important use for virtual memory is so that the OS can protect one process's memory from another process's memory, while still giving each process the illusion of a contiguous, flat virtual address space. The actual physical addresses can and will become fragmented, but the virtual addresses will appear contiguous.
Yes, virtual memory is vital. The page file, maybe not.
Grrr. Disk space is probably always going to be cheaper than RAM. One of my lab computers has 512MB of RAM. That used to be enough when I got it, but now it has slowed to a crawl swapping and I need to put more RAM in it. I am not running more software programs now than I was then, but they have all gotten more bloated, and they often spawn more "daemon" programs that just sit there doing nothing but wait for some event and use up memory. I look at my process list and the "in-memory" column for the file explorer is 40MB. For Firefox it's 162MB. Java's "update scheduler" jusched.exe uses another 3.6MB. And that's just the physical-memory, these numbers don't include the swap space.
So it's really important to save the quicker, more expensive memory for what can't be swapped out. I can spare tens of GB on my hard drive for swap space.
Memory is seen as cheap enough that the OS and many programs don't try to optimize any more. On the one hand it's great because it makes programs more maintainable and debuggable and quicker to develop. But I hate having to keep putting in more RAM into my computer.
A good explanation at
http://blogs.technet.com/markrussinovich/archive/2008/11/17/3155406.aspx
To optimally size your paging file you should start all the applications you run at the same time, load typical data sets, and then note the commit charge peak (or look at this value after a period of time where you know maximum load was attained). Set the paging file minimum to be that value minus the amount of RAM in your system (if the value is negative, pick a minimum size to permit the kind of crash dump you are configured for). If you want to have some breathing room for potentially large commit demands, set the maximum to double that number.
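The sizing rule quoted above can be written down as a small helper. This is a sketch of one reading of the quote: it assumes "double that number" means double the computed minimum, it treats a result of zero the same as a negative one, and the function name and example figures are made up for illustration.

```python
def suggest_pagefile_mb(peak_commit_mb, ram_mb, crash_dump_min_mb):
    """Return (minimum, maximum) page file sizes in MB per the quoted rule.

    crash_dump_min_mb is whatever minimum your configured crash dump type needs.
    """
    minimum = peak_commit_mb - ram_mb
    if minimum <= 0:
        # Peak commit fits in RAM: fall back to the crash dump requirement.
        minimum = crash_dump_min_mb
    # Assumption: "double that number" refers to the minimum computed above.
    maximum = 2 * minimum
    return minimum, maximum


# Hypothetical machines with 4096 MB of RAM.
print(suggest_pagefile_mb(6000, 4096, 400))
print(suggest_pagefile_mb(2000, 4096, 400))
```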
Virtual memory is much more than simply an extension of RAM. In reality, virtual memory is a system that virtualizes access to physical memory. Applications are presented with a consistent environment that is completely independent of RAM size. This offers a number of important advantages quite apart from the increased memory availability. Virtual memory is an integral part of the OS and cannot possibly be disabled.
The pagefile is NOT virtual memory. Many sources have claimed this, including some Microsoft articles. But it is wrong. You can disable the pagefile (not recommended) but this will not disable virtual memory.
Virtual memory has been used in large systems for some 40 years now and it is not going away anytime soon. The advantages are just too great. If virtual memory were eliminated, all current 32-bit applications (and 64-bit as well) would become obsolete.
Larry Miller
Microsoft MCSA
Virtual memory is a safety net for situations when there is not enough RAM available for all running applications. This was very common some time ago, and today, when you can have large amounts of system RAM, it is less so.
Some say to leave page file alone and let it be managed by Windows. Some people say that even if you have large RAM keeping big pagefile cannot possibly hurt because it will not be used. That is not true since Windows does pre-emptive paging to prepare for spikes of memory demand. If that demand never comes this is just wasted HDD activity and we all know that HDD is the slowest component of any system. Pre-emptive paging with big enough RAM is just pointless and the only thing it does is to slow down any other disk activity that happens at the same time. Not to mention additional disk wear. Plus big page file means gigabytes of locked disk space.
A lot of people point to Mark Russinovich's article to back up their strong belief that the page file should not be disabled under any circumstances, and that so many clever people at Microsoft have thought it through so thoroughly that we, little developers, should never question the default Windows policy on page file size. But even Russinovich himself writes:
Set the paging file minimum to be that value (Peak Commit Charge) minus the amount of RAM in your system (if the value is negative, pick a minimum size to permit the kind of crash dump you are configured for).
So if you have a large amount of RAM and your peak commit charge is never more than 50% of your RAM even when you open all your apps at once and then some, there is no need to have a page file at all. In those situations, 99.99% of the time you will never need more memory than your RAM.
Now I am not advocating disabling the page file, but having it the size of your RAM or more is just a waste of space and unnecessary activity that can slow down something else. The page file gives you a safety net in those rare (with plenty of RAM) situations when the system does need more memory, and prevents it from running out of memory, which would most likely make your system unstable and unusable.
The only real need for page file is kernel dumps. If you need full kernel dumps you need at least 400 MB of paging file. But if you are happy with mini dumps, minimum is 16 MB. So to have best of both worlds which is
virtually no page file
safety net of virtual memory
I would suggest to configure Windows for mini kernel dumps, set minimum page file size to 16 MB and maximum to whatever you want. This way page file would be practically unused but would automatically expand after first out of memory error to prevent your system from being unusable. If you happen to have at least one out of memory issue you should of course reconsider your minimum size. If you really want to be safe make page file min. size 1 GB. For servers though you should be more careful.
Unfortunately, it is still needed because the windows operating system has a tendency to 'overcache'.
Plus, as stated above, 2GB isn't really enough to consider turning it off. Heck, I probably wouldn't turn it off until I had 8GB or more.
G-Man
Since there will be no disk access if it is disabled and all the programs will be memory resident, will it not improve the performance and program response times?
I'm not totally sure about other platforms, but I had a Linux machine where the swap space had been accidentally disabled. When a process used all available memory, the machine basically froze for 5 minutes, the load average went to absurd numbers, and the kernel OOM killer kicked in and terminated several processes. Re-enabling swap fixed this entirely.
I never experienced any unnecessary swapping to disc - it only happened when I used all the available memory. Modern OS's (even 5-10 year old Linux distros) deal with swap-space quite intelligently, and only use it when required.
You can probably get by without swap space, since it's quite rare to reach 4GB of memory usage with a single process. With a 64-bit OS and say 8GB of RAM it's even more rare.. but, there's really no point disabling swap-space, you don't gain much (if anything), and when you run out of physical memory without it, bad things happen..
Basically - any half-decent OS should only use disc-swap (or virtual-memory) when required. Disabling swap only stops the OS being able to fall back on it, which causes the OOM killer to strike (and thus data-loss when processes are terminated).
