MapViewOfFile() no longer works after process hits the 2GB limit - winapi

MapViewOfFile() works without any problem as long as our process has not hit the 2 GB limit. However, once the process hits the limit, MapViewOfFile() no longer works, even if some or all of the memory is deallocated. GetLastError() returns 8, which means ERROR_NOT_ENOUGH_MEMORY ("Not enough storage is available to process this command"). Here is a small program showing the problem:
#include <Windows.h>
#include <cstdio>
#include <vector>
const int sizeOfTheFileMappingObject = 20;
const int numberOfBytesToMap = sizeOfTheFileMappingObject;
const char* fileMappingObjectName = "Global\\QWERTY";
void Allocate2GBMemoryWithMalloc(std::vector<void*>* addresses)
{
const size_t sizeOfMemAllocatedAtOnce = 32 * 1024;
for (;;) {
void* address = malloc(sizeOfMemAllocatedAtOnce);
if (address != NULL) {
addresses->push_back(address);
}
else {
printf("The %luth malloc() returned NULL. Allocated memory: %lu MB\n",
(unsigned long)(addresses->size() + 1),
(unsigned long)((addresses->size() * sizeOfMemAllocatedAtOnce) / (1024 * 1024)));
break;
}
}
}
void DeallocateMemoryWithFree(std::vector<void*>* addresses)
{
std::vector<void*>::iterator current = addresses->begin();
std::vector<void*>::iterator end = addresses->end();
for (; current != end; ++current) {
free(*current);
}
addresses->clear();
printf("Memory is deallocated.\n");
}
void TryToMapViewOfFile()
{
HANDLE fileMapping = OpenFileMappingA(FILE_MAP_ALL_ACCESS, FALSE,
fileMappingObjectName);
if (fileMapping == NULL) {
printf("OpenFileMapping() failed. LastError: %d\n", GetLastError());
return;
}
LPVOID mappedView = MapViewOfFile(fileMapping, FILE_MAP_READ, 0, 0,
numberOfBytesToMap);
if (mappedView == NULL) {
printf("MapViewOfFile() failed. LastError: %d\n", GetLastError());
if (!CloseHandle(fileMapping)) {
printf("CloseHandle() failed. LastError: %d\n", GetLastError());
}
return;
}
if (!UnmapViewOfFile(mappedView)) {
printf("UnmapViewOfFile() failed. LastError: %d\n", GetLastError());
if (!CloseHandle(fileMapping)) {
printf("CloseHandle() failed. LastError: %d\n", GetLastError());
}
return;
}
if (!CloseHandle(fileMapping)) {
printf("CloseHandle() failed. LastError: %d\n", GetLastError());
return;
}
printf("MapViewOfFile() succeeded.\n");
}
int main(int argc, char* argv[])
{
HANDLE fileMapping = CreateFileMappingA(INVALID_HANDLE_VALUE, NULL,
PAGE_READWRITE, 0, sizeOfTheFileMappingObject, fileMappingObjectName);
if (fileMapping == NULL) {
printf("CreateFileMapping() failed. LastError: %d\n", GetLastError());
return -1;
}
TryToMapViewOfFile();
std::vector<void*> addresses;
Allocate2GBMemoryWithMalloc(&addresses);
TryToMapViewOfFile();
DeallocateMemoryWithFree(&addresses);
TryToMapViewOfFile();
Allocate2GBMemoryWithMalloc(&addresses);
DeallocateMemoryWithFree(&addresses);
if (!CloseHandle(fileMapping)) {
printf("CloseHandle() failed. LastError: %d\n", GetLastError());
}
return 0;
}
The output of the program:
MapViewOfFile() succeeded.
The 65126th malloc() returned NULL. Allocated memory: 2035 MB
MapViewOfFile() failed. LastError: 8
Memory is deallocated.
MapViewOfFile() failed. LastError: 8
The 64783th malloc() returned NULL. Allocated memory: 2024 MB
Memory is deallocated.
As you can see, MapViewOfFile() fails with error 8 even after all of the allocated memory has been released. Even though MapViewOfFile() reports ERROR_NOT_ENOUGH_MEMORY, we can still call malloc() successfully.
We ran this example program on Windows 7 (32-bit), Windows 8.1 (32-bit) and Windows Server 2008 R2 (64-bit), and the results were the same.
So the question is: Why does MapViewOfFile() fail with ERROR_NOT_ENOUGH_MEMORY after the process hits the 2GB limit?

Why MapViewOfFile fails
As IInspectable's comment explains, freeing memory allocated with malloc doesn't make it available for use with MapViewOfFile. A 32-bit process under Windows has a 4 GB virtual address space, and only the first 2 GB of it is available to the application. (An exception is a large-address-aware program, which increases this to 3 GB under suitably configured 32-bit kernels and to 4 GB under 64-bit kernels.) Everything in your program's memory must have a location somewhere in this first 2 GB of virtual address space. That includes the executable itself, any DLLs used by the program, any data you have in memory, whether allocated statically, on the stack or dynamically (e.g. with malloc), and of course any file you map into memory with MapViewOfFile.
When your program first starts out the Visual C/C++ runtime creates a small heap for dynamic allocations for functions like malloc and operator new. As necessary the runtime increases the size of the heap in memory, and as it does so it uses up more virtual address space. Unfortunately it never shrinks the size of the heap. When you free a large block of memory the runtime only decommits the memory used. This makes the RAM used by the freed memory available for use, but virtual address space taken up by the freed memory remains allocated as part of the heap.
As mentioned previously, a file mapped into memory by MapViewOfFile also takes up virtual address space. If the heap (plus your program, DLLs, and everything else) is using up all of the virtual address space, then there's no room to map files.
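The distinction between decommitting memory and releasing address space can be illustrated with a toy model. This is only a sketch: `ToyAddressSpace` and its methods are hypothetical names, sizes are counted in megabytes, and nothing here calls the real Windows API or reproduces the CRT heap's actual policy.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of the user-mode address space of a process, in megabytes.
// Hypothetical names; this only mimics the bookkeeping described above.
class ToyAddressSpace {
public:
    explicit ToyAddressSpace(std::size_t totalMb) : totalMb_(totalMb) {}

    // Heap allocation: reuse decommitted heap space first, then grow the
    // heap's reservation into unreserved address space.
    bool heapAlloc(std::size_t mb) {
        const std::size_t reusable = heapReservedMb_ - heapCommittedMb_;
        const std::size_t growth = mb > reusable ? mb - reusable : 0;
        if (growth > unreservedMb()) return false;
        heapReservedMb_ += growth;
        heapCommittedMb_ += mb;
        return true;
    }

    // Heap free: the memory is decommitted, but the heap keeps its reservation.
    void heapFree(std::size_t mb) {
        assert(mb <= heapCommittedMb_);
        heapCommittedMb_ -= mb;
    }

    // Mapping a view needs address space that nothing else has reserved.
    bool mapView(std::size_t mb) {
        if (mb > unreservedMb()) return false;
        mappedMb_ += mb;
        return true;
    }

    std::size_t unreservedMb() const { return totalMb_ - heapReservedMb_ - mappedMb_; }
    std::size_t committedMb() const { return heapCommittedMb_; }

private:
    std::size_t totalMb_;
    std::size_t heapReservedMb_ = 0;
    std::size_t heapCommittedMb_ = 0;
    std::size_t mappedMb_ = 0;
};
```

In this model, filling the space with `heapAlloc`, freeing everything with `heapFree`, and then calling `mapView` fails even though `committedMb()` is zero, while a second round of `heapAlloc` succeeds. That matches the pattern in the question's output.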
A possible solution: don't use malloc
An easy way to keep the heap from growing to fill all of the virtual address space is not to use the Visual C/C++ runtime to allocate large (at least 64k) blocks of memory. Instead, allocate and free memory from Windows directly using VirtualAlloc(..., MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE) and VirtualFree(..., MEM_RELEASE). The latter function releases both the RAM used by the region and the virtual address space taken up by it, making it available for use with MapViewOfFile.
But...
You can still run into another problem, where MapViewOfFile fails even though the size of the view is smaller, sometimes even much smaller, than the total amount of free virtual address space. This is because the view needs to be mapped into a contiguous region of virtual address space. If the virtual address space becomes fragmented, the largest contiguous region of unreserved virtual address space can end up being relatively small. Even when your program first starts up, before you have had a chance to do any dynamic allocations, the virtual address space can be somewhat fragmented because of DLLs loaded at various addresses. If you have a long-lived program that does a lot of allocations and deallocations with VirtualAlloc and VirtualFree, you can end up with a very fragmented virtual address space. If you encounter this problem you'll have to change your pattern of allocations, maybe even implement your own heap allocator.
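The gap between total free space and the largest contiguous free region can be shown with a small model. This is a sketch under assumptions: the address space is represented as a vector of equally sized units (think of one unit per 64k allocation granule), and `summarizeFreeSpace` and `canMapContiguous` are hypothetical helper names, not Windows functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct FreeSpaceSummary {
    std::size_t totalFree;   // number of free units overall
    std::size_t largestRun;  // longest run of adjacent free units
};

// inUse[i] is true when the i-th unit of address space is reserved.
FreeSpaceSummary summarizeFreeSpace(const std::vector<bool>& inUse) {
    FreeSpaceSummary summary{0, 0};
    std::size_t currentRun = 0;
    for (bool used : inUse) {
        if (used) {
            currentRun = 0;
        } else {
            ++currentRun;
            ++summary.totalFree;
            if (currentRun > summary.largestRun) summary.largestRun = currentRun;
        }
    }
    return summary;
}

// A view of `units` units fits only if one contiguous free run is long enough.
bool canMapContiguous(const std::vector<bool>& inUse, std::size_t units) {
    assert(units > 0);
    return units <= summarizeFreeSpace(inUse).largestRun;
}
```

With an alternating used/free layout, half of the space is free but no request larger than a single unit can be satisfied.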

Related

How to identify what parts of the allocated virtual memory a process is using

I want to be able to search through the allocated memory of a process (say you open notepad and type “HelloWorld” then ran the search looking for the string “HelloWorld”). For 32bit applications this is not a problem but for 64 bit applications the large quantity of allocated virtual memory takes hours to search through.
Obviously the vast majority of applications are not utilising the full amount of virtual memory allocated. I can identify the areas in memory allocated to each process with VirtualQueryEx and read them with ReadProcessMemory, but when it comes to 64-bit applications this still takes hours to complete.
Does anyone know of any resources or any methods that could be used to help narrow down the amount of memory to be searched?
It is important that you only scan proper memory. If you just scanned from 0x0 to 0xFFFFFFFFF it would take at least 5 seconds in most processes. You can skip bad regions of memory by checking the memory page settings by using VirtualQueryEx. This will retrieve a MEMORY_BASIC_INFORMATION which will define the state of that memory region.
If MEMORY_BASIC_INFORMATION.State is not MEM_COMMIT, then it is bad memory and should be skipped.
If MEMORY_BASIC_INFORMATION.Protect is PAGE_NOACCESS, you also want to skip this memory.
If VirtualQueryEx fails, then you skip to the next region.
In this manner it should only take 0-2 seconds to scan the memory on your average process because it is only scanning good memory.
char* ScanEx(char* pattern, char* mask, char* begin, intptr_t size, HANDLE hProc)
{
char* match{ nullptr };
SIZE_T bytesRead;
DWORD oldprotect;
char* buffer{ nullptr };
MEMORY_BASIC_INFORMATION mbi;
mbi.RegionSize = 0x1000;//
VirtualQueryEx(hProc, (LPCVOID)begin, &mbi, sizeof(mbi));
for (char* curr = begin; curr < begin + size; curr += mbi.RegionSize)
{
if (!VirtualQueryEx(hProc, curr, &mbi, sizeof(mbi))) continue;
if (mbi.State != MEM_COMMIT || mbi.Protect == PAGE_NOACCESS) continue;
delete[] buffer;
buffer = new char[mbi.RegionSize];
if (VirtualProtectEx(hProc, mbi.BaseAddress, mbi.RegionSize, PAGE_EXECUTE_READWRITE, &oldprotect))
{
ReadProcessMemory(hProc, mbi.BaseAddress, buffer, mbi.RegionSize, &bytesRead);
VirtualProtectEx(hProc, mbi.BaseAddress, mbi.RegionSize, oldprotect, &oldprotect);
char* internalAddr = ScanBasic(pattern, mask, buffer, (intptr_t)bytesRead);
if (internalAddr != nullptr)
{
//calculate from internal to external
match = (char*)mbi.BaseAddress + (internalAddr - buffer);
break;
}
}
}
delete[] buffer;
return match;
}
ScanBasic is just a standard comparison function which compares your pattern against the buffer.
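The answer does not show ScanBasic, so here is a minimal sketch of what such a comparison function might look like. The mask convention is an assumption (a common one, but not confirmed by the answer): `'x'` means the byte must match and `'?'` means any byte is accepted, with the pattern length taken from the length of the mask. The name `ScanBasicSketch` is used to make clear this is not the original function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Returns a pointer to the first match of `pattern` inside [begin, begin + size),
// or nullptr if there is none.
// Assumed mask convention: 'x' = byte must match, '?' = wildcard byte.
const char* ScanBasicSketch(const char* pattern, const char* mask,
                            const char* begin, std::size_t size) {
    assert(pattern != nullptr && mask != nullptr && begin != nullptr);
    const std::size_t patternLength = std::strlen(mask);
    if (patternLength == 0 || patternLength > size) return nullptr;

    for (std::size_t i = 0; i + patternLength <= size; ++i) {
        bool found = true;
        for (std::size_t j = 0; j < patternLength; ++j) {
            if (mask[j] != '?' && pattern[j] != begin[i + j]) {
                found = false;
                break;
            }
        }
        if (found) return begin + i;
    }
    return nullptr;
}
```

The returned pointer is relative to the local buffer, which is why the caller has to translate it back into an address in the target process.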
Second, if you know the address is relative to a module, only scan the address range of that module. You can get the size of the module via CreateToolhelp32Snapshot. If you know it's dynamic memory on the heap, then only scan the heap. You can also get all the heaps with CreateToolhelp32Snapshot and TH32CS_SNAPHEAPLIST.
You can also make a wrapper around this function for scanning the entire address space of the process. It might look something like this:
char* Pattern::Ex::ScanProc(char* pattern, char* mask, ProcEx& proc)
{
unsigned long long int kernelMemory = IsWow64Proc(proc.handle) ? 0x80000000 : 0x800000000000;
return Scan(pattern, mask, 0x0, (intptr_t)kernelMemory, proc.handle);
}

Freeing platform driver device struct [duplicate]

I have found devm_kzalloc() and kzalloc() in device driver programming, but I don't know when or where to use these functions. Can anyone please explain the importance of these functions and their usage?
kzalloc() allocates kernel memory like kmalloc(), but it also zero-initializes the allocated memory. devm_kzalloc() is managed kzalloc(). The memory allocated with managed functions is associated with the device. When the device is detached from the system or the driver for the device is unloaded, that memory is freed automatically. If multiple managed resources (memory or some other resource) were allocated for the device, the resource allocated last is freed first.
Managed resources are very helpful to ensure correct operation of the driver both for initialization failure at any point and for successful initialization followed by the device removal.
Please note that managed resources (whether it's memory or some other resource) are meant to be used in code responsible for probing the device. They are generally a wrong choice for the code used for opening the device, as the device can be closed without being disconnected from the system. Closing the device requires freeing the resources manually, which defeats the purpose of managed resources.
The memory allocated with kzalloc() should be freed with kfree(). The memory allocated with devm_kzalloc() is freed automatically. It can be freed with devm_kfree(), but it's usually a sign that the managed memory allocation is not a good fit for the task.
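The "allocated last, freed first" ordering of managed resources can be pictured with a small user-space model. This is a toy sketch in C++ rather than kernel code: `ToyDevice`, `addManaged` and `releaseAll` are hypothetical names that only imitate the bookkeeping, and they do not correspond to any kernel API.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a device that owns a list of managed resources.
class ToyDevice {
public:
    // Models a managed allocation made during probe.
    void addManaged(std::string name) {
        assert(!name.empty());
        managed_.push_back(std::move(name));
    }

    // Models detach or driver unload: resources are released in reverse
    // order of registration. Returns the order in which they were released.
    std::vector<std::string> releaseAll() {
        std::vector<std::string> releaseOrder;
        while (!managed_.empty()) {
            releaseOrder.push_back(std::move(managed_.back()));
            managed_.pop_back();
        }
        return releaseOrder;
    }

private:
    std::vector<std::string> managed_;
};
```

Registering "state", "buffer" and "irq" in that order and then calling `releaseAll` releases "irq" first and "state" last, so a resource that depends on an earlier one is torn down before the thing it depends on.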
In simple words, devm_kzalloc() and kzalloc() are both used for memory allocation in device drivers. The difference is that if you allocate memory with kzalloc(), you have to free that memory yourself when the life cycle of the device driver ends or when it is unloaded from the kernel, whereas with devm_kzalloc() you need not worry about freeing memory: it is freed automatically by the device library itself.
Both of them do exactly the same thing, but devm_kzalloc() takes the small burden of freeing the memory off the programmer.
Let me explain with an example, first using kzalloc():
static int pxa3xx_u2d_probe(struct platform_device *pdev)
{
int err;
u2d = kzalloc(sizeof(struct pxa3xx_u2d_ulpi), GFP_KERNEL);
if (!u2d)
return -ENOMEM;
u2d->clk = clk_get(&pdev->dev, NULL);
if (IS_ERR(u2d->clk)) {
err = PTR_ERR(u2d->clk);
goto err_free_mem;
}
...
return 0;
err_free_mem:
kfree(u2d);
return err;
}
static int pxa3xx_u2d_remove(struct platform_device *pdev)
{
clk_put(u2d->clk);
kfree(u2d); /* (3) */
return 0;
}
In this example you can see that in the function pxa3xx_u2d_remove(),
kfree(u2d) (the line marked 3) is there to free the memory allocated for u2d.
Now see the same code using devm_kzalloc():
static int pxa3xx_u2d_probe(struct platform_device *pdev)
{
int err;
u2d = devm_kzalloc(&pdev->dev, sizeof(struct pxa3xx_u2d_ulpi), GFP_KERNEL);
if (!u2d)
return -ENOMEM;
u2d->clk = clk_get(&pdev->dev, NULL);
if (IS_ERR(u2d->clk)) {
err = PTR_ERR(u2d->clk);
goto err_free_mem;
}
...
return 0;
err_free_mem:
return err;
}
static int pxa3xx_u2d_remove(struct platform_device *pdev)
{
clk_put(u2d->clk);
return 0;
}
There is no kfree() here because the memory allocated with devm_kzalloc() is freed automatically.

Process memory, GPU shared memory and x86 process on x64 windows address space

Out of curiosity, and after observing some strange behavior: how is the address space of an x86 process laid out when memory is allocated both for the process itself, using Win32 memory management functions (malloc/new eventually go down there), and for textures on an integrated Intel GPU, which uses the machine's shared memory? Are the GPU allocations part of the process address space? I ask because I saw strange things happen to my process today. I'm running an x86 process on an x64 machine; the process's committed memory size is about 1.3 GB, the GPU shared memory consumption is about 600 MB, and I start to get ENOMEM from HeapAlloc when trying to allocate a 32 MB buffer. I don't believe fragmentation is the issue here, since the process has only been running for up to a minute. So I got the impression that the GPU memory is counted in the process address space; otherwise I can't explain why HeapAlloc returns null for the CRT heap. Side note: the DLL is linked without /LARGEADDRESSAWARE, so 2 GB looks like the sum of the above numbers (1.3 + 0.6).
Am I right? Wrong? Can anyone explain how it works?
EDIT001: A little clarification: the GPU consumes ~600 MB not out of the blue, but because I allocate textures using DirectX.
EDIT002: Test added
I skipped device initialization here
constexpr size_t dim = 5000;
CD3D11_TEXTURE2D_DESC texDescriptor(DXGI_FORMAT_D24_UNORM_S8_UINT, dim, dim, 1, 1, D3D11_BIND_DEPTH_STENCIL);
std::vector<std::vector<uint8_t>> procData;
std::vector<CComPtr<ID3D11Texture2D>> gpuData;
// Some device/context init here
for(;;)
{
{
CComPtr<ID3D11Texture2D> tex;
hr = device->CreateTexture2D(&texDescriptor, nullptr, &tex);
if(SUCCEEDED(hr))
{
gpuData.emplace_back(tex);
}
else
{
std::cout << "Failed to create " << gpuData.size() << "th texture." << std::endl;
}
}
{
try
{
std::vector<uint8_t> buff(dim * dim, 0);
procData.emplace_back(buff);
}
catch(std::exception& ex)
{
std::cout << "Failed to create " << procData.size() << "th buffer." << std::endl;
}
}
}
Just a reminder: this is an x86 process without the LARGEADDRESSAWARE setting, so 2 GB is available to it.
The above code produces 35 buffers and 34 textures. If you comment out the texture-creating block, 70 buffers are created. Well...
No. In Windows, "process address space" means the memory pages allocated for the task. To deal with video memory you need DDK facilities. An ordinary application can't do things of this kind and doesn't own any video memory.

Memory limit for mmap

I am trying to mmap a char device. It works for 65536 bytes. But I get the following error if I try for more memory.
mmap: Resource temporarily unavailable
I want to mmap 1 MB of memory for a device. I use alloc_chrdev_region, cdev_init and cdev_add for the char device. How can I mmap more than 65K of memory? Should I use a block device?
Using the MAP_LOCKED flag in the mmap call can cause this error. The underlying mlock can fail with EAGAIN if the requested amount of memory cannot be locked.
From man mmap:
MAP_LOCKED (since Linux 2.5.37) Lock the pages of the mapped region
into memory in the manner of mlock(2). This flag is ignored in older
kernels.
From man mlock:
EAGAIN:
Some or all of the specified address range could not be
locked.
Did you implement the somedev_mmap() file operation?
static int somedev_mmap(struct file *filp, struct vm_area_struct *vma)
{
/* Do something. You probably need to use ioremap(). */
return 0;
}
static const struct file_operations somedev_fops = {
.owner = THIS_MODULE,
/* Initialize other file operations. */
.mmap = somedev_mmap,
};

What useful things can I do with Visual C++ Debug CRT allocation hooks except finding reproducible memory leaks?

The Visual C++ debug runtime library features so-called allocation hooks. It works this way: you define a callback and call _CrtSetAllocHook() to set that callback. Now every time a memory allocation, deallocation or reallocation is done, the CRT calls that callback and passes a handful of parameters.
I successfully used an allocation hook to find a reproducible memory leak. Basically, the CRT reported at program termination that there was an unfreed block with allocation number N (N was the same on every program run), so I wrote the following in my hook:
int MyAllocHook( int allocType, void* userData, size_t size, int blockType,
long requestNumber, const unsigned char* filename, int lineNumber)
{
if( requestNumber == TheNumberReported ) {
Sleep( 0 );// a line to put breakpoint on
}
return TRUE;
}
Since the leak was reported with the very same allocation number every time, I could just put a breakpoint inside the if-statement, wait until it was hit, and then inspect the call stack.
What other useful things can I do using allocation hooks?
You could also use it to find unreproducible memory leaks:
Make a data structure where you map the allocated pointer to additional information
In the allocation hook you could query the current call stack (StackWalk function) and store the call stack in the data structure
In the de-allocation hook, remove the call stack information for that allocation
At the end of your application, loop over the data structure and report all call stacks. These are the places where memory was allocated but not freed.
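The bookkeeping behind these steps can be sketched in portable C++. Several things here are assumptions made for illustration: allocations are keyed by a request number, a plain string (`origin`) stands in for the captured call stack, and `LeakTracker`, `onAllocate`, `onFree` and `leaks` are hypothetical names. A real hook would also have to make sure the tracker's own allocations are not tracked recursively.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct AllocationInfo {
    std::size_t size;
    std::string origin;  // stand-in for a captured call stack
};

class LeakTracker {
public:
    // Called from the allocation path: remember where the block came from.
    void onAllocate(long requestNumber, std::size_t size, std::string origin) {
        assert(live_.count(requestNumber) == 0);
        live_[requestNumber] = AllocationInfo{size, std::move(origin)};
    }

    // Called from the deallocation path: the block is no longer a leak candidate.
    void onFree(long requestNumber) { live_.erase(requestNumber); }

    // At program end: everything still recorded was allocated but never freed.
    std::vector<std::pair<long, AllocationInfo>> leaks() const {
        return std::vector<std::pair<long, AllocationInfo>>(live_.begin(), live_.end());
    }

private:
    std::map<long, AllocationInfo> live_;  // ordered, so reports are stable
};
```

Because the report carries the origin of each surviving allocation, it does not matter whether the allocation number is the same from run to run.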
The value "requestNumber" is not passed on to the function when deallocating (MS VS 2008). Without this number you cannot keep track of your allocation. However, you can peek into the heap header and extract that value from there:
Note: This is compiler dependent and may change without notice/ warning by the compiler.
// This struct is a copy of the heap header used by MS VS 2008.
// This information is prepending each allocated memory object in debug mode.
struct MsVS_CrtMemBlockHeader {
MsVS_CrtMemBlockHeader * _next;
MsVS_CrtMemBlockHeader * _prev;
char * _szFilename;
int _nLine;
int _nDataSize;
int _nBlockUse;
long _lRequest;
char _gap[4];
};
int MyAllocHook(int allocType, void* userData, size_t size, int blockType,
long requestNumber, const unsigned char* filename, int lineNumber)
{ // same signature as in the question
if (allocType == _HOOK_FREE) {
// requestNumber isn't passed on to the hook on free.
// However, this value is stored in the heap header that precedes userData.
size_t headerSize = sizeof(MsVS_CrtMemBlockHeader);
size_t ptr = (size_t) userData - headerSize;
MsVS_CrtMemBlockHeader* pHead = (MsVS_CrtMemBlockHeader*) (ptr);
long freedRequestNumber = pHead->_lRequest;
// Do what you like to keep track of this allocation.
}
return TRUE;
}
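The pointer arithmetic used to reach the header can be tried out safely with a toy allocator that places its own header in front of the user data. This is only an illustration of the technique: `ToyBlockHeader`, `toyAllocate`, `headerOf` and `toyFree` are hypothetical, and the layout has nothing to do with the real debug CRT header, which is compiler dependent as noted above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical header stored immediately before each user block.
// alignas keeps the user data that follows suitably aligned.
struct alignas(std::max_align_t) ToyBlockHeader {
    long requestNumber;
    std::size_t dataSize;
};

// Allocates header + payload in one block and returns a pointer to the payload.
void* toyAllocate(std::size_t size, long requestNumber) {
    void* raw = std::malloc(sizeof(ToyBlockHeader) + size);
    if (raw == nullptr) return nullptr;
    ToyBlockHeader* header = new (raw) ToyBlockHeader{requestNumber, size};
    return header + 1;  // user data starts right after the header
}

// Steps back from the user pointer to the header that precedes it.
const ToyBlockHeader* headerOf(const void* userData) {
    assert(userData != nullptr);
    return static_cast<const ToyBlockHeader*>(userData) - 1;
}

// Frees the whole block, starting from the header.
void toyFree(void* userData) {
    if (userData != nullptr) {
        std::free(static_cast<ToyBlockHeader*>(userData) - 1);
    }
}
```

Here the layout is known because the same code writes and reads the header. With the debug CRT, the layout is an implementation detail, so the struct has to be kept in sync with the runtime version in use.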
You could keep a record of every allocation request and remove it once the deallocation is invoked. This could help you track down memory leaks that are much harder to find than this one.
Just the first idea that comes to my mind...
