How to identify what parts of the allocated virtual memory a process is using - windows

I want to be able to search through the allocated memory of a process (say you open Notepad, type “HelloWorld”, and then run a search for the string “HelloWorld”). For 32-bit applications this is not a problem, but for 64-bit applications the large quantity of allocated virtual memory takes hours to search through.
Obviously the vast majority of applications are not using the full amount of virtual memory allocated to them. I can identify the areas of memory allocated to each process with VirtualQueryEx and read them with ReadProcessMemory, but for 64-bit applications this still takes hours to complete.
Does anyone know of any resources or any methods that could be used to help narrow down the amount of memory to be searched?

It is important that you only scan valid memory. If you just scanned from 0x0 to 0xFFFFFFFFF it would take at least 5 seconds in most processes. You can skip bad regions of memory by checking the page settings with VirtualQueryEx. It fills in a MEMORY_BASIC_INFORMATION structure that describes the state of that memory region.
If the State member is not MEM_COMMIT, nothing is backing the region and there is nothing to read, so skip it.
If the Protect member is PAGE_NOACCESS, skip the region as well.
If VirtualQueryEx fails, skip to the next region.
In this manner it should only take 0-2 seconds to scan the memory of your average process, because only committed, accessible memory is scanned.
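#include <windows.h>
#include <cstdint>
#include <vector>

// Scans [begin, begin + size) in the target process for pattern/mask.
// Returns the matching address in the target process, or nullptr.
char* ScanEx(char* pattern, char* mask, char* begin, intptr_t size, HANDLE hProc)
{
    char* match{ nullptr };
    char* end = begin + size;
    MEMORY_BASIC_INFORMATION mbi{};
    mbi.RegionSize = 0x1000; // fallback step if the very first query fails

    for (char* curr = begin; curr < end; )
    {
        if (!VirtualQueryEx(hProc, curr, &mbi, sizeof(mbi)))
        {
            curr += mbi.RegionSize; // query failed: step forward and retry
            continue;
        }

        // Work from the region's real base so an unaligned begin cannot skew
        // the result, and always advance to the end of the queried region.
        char* regionBase = (char*)mbi.BaseAddress;
        curr = regionBase + mbi.RegionSize;

        // Skip memory that is not committed or cannot be accessed.
        if (mbi.State != MEM_COMMIT || mbi.Protect == PAGE_NOACCESS) continue;

        std::vector<char> buffer(mbi.RegionSize);
        SIZE_T bytesRead = 0;
        DWORD oldprotect;
        if (!VirtualProtectEx(hProc, regionBase, mbi.RegionSize, PAGE_EXECUTE_READWRITE, &oldprotect)) continue;

        BOOL ok = ReadProcessMemory(hProc, regionBase, buffer.data(), mbi.RegionSize, &bytesRead);
        VirtualProtectEx(hProc, regionBase, mbi.RegionSize, oldprotect, &oldprotect);
        if (!ok || bytesRead == 0) continue;

        char* internalAddr = ScanBasic(pattern, mask, buffer.data(), (intptr_t)bytesRead);
        if (internalAddr != nullptr)
        {
            // Translate the offset in the local buffer into an address in the target.
            match = regionBase + (internalAddr - buffer.data());
            break;
        }
    }
    return match;
}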
ScanBasic is just a standard comparison function which compares your pattern against the buffer.
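ScanBasic itself is not shown in the answer. The following is only a minimal sketch of what such a comparison function could look like. It assumes a mask convention in which '?' marks a wildcard byte and any other character means the pattern byte must match exactly; that convention and the const-qualified signature are assumptions, not something the answer specifies.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical implementation of the ScanBasic helper called by ScanEx.
// Assumption: mask[i] == '?' is a wildcard, anything else requires an exact
// match of pattern[i]. The length of mask defines the pattern length.
char* ScanBasic(const char* pattern, const char* mask, char* begin, std::intptr_t size)
{
    assert(pattern != nullptr && mask != nullptr && begin != nullptr);
    const std::intptr_t patternLen = static_cast<std::intptr_t>(std::strlen(mask));
    if (patternLen == 0 || patternLen > size) return nullptr;

    // Try every start offset at which the whole pattern still fits.
    for (std::intptr_t i = 0; i + patternLen <= size; ++i)
    {
        bool found = true;
        for (std::intptr_t j = 0; j < patternLen; ++j)
        {
            if (mask[j] != '?' && pattern[j] != begin[i + j])
            {
                found = false;
                break;
            }
        }
        if (found) return begin + i;
    }
    return nullptr;
}
```

Note that a scan built this way cannot find a pattern that straddles two regions, because each region is read and searched separately.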
Second, if you know the address is relative to a module, only scan the address range of that module. You can get the base address and size of the module with CreateToolhelp32Snapshot and TH32CS_SNAPMODULE. If you know it's dynamic memory on the heap, then only scan the heap. You can also enumerate all the heaps with CreateToolhelp32Snapshot and TH32CS_SNAPHEAPLIST.
A wrapper for this function that scans the entire user-mode address space of the process might look something like this:
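// ProcEx and IsWow64Proc are helpers from the author's own code base and are not shown here.
char* Pattern::Ex::ScanProc(char* pattern, char* mask, ProcEx& proc)
{
    // Upper bound of user-mode addresses: 2 GB for a 32-bit (WOW64) target, 128 TB for a 64-bit one.
    unsigned long long int kernelMemory = IsWow64Proc(proc.handle) ? 0x80000000 : 0x800000000000;
    return ScanEx(pattern, mask, nullptr, (intptr_t)kernelMemory, proc.handle);
}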


vmalloc() allocates from vm_struct list

The kernel document https://www.kernel.org/doc/gorman/html/understand/understand010.html says this about vmalloc:
It searches through a linear linked list of vm_structs and returns a new struct describing the allocated region.
Does that mean the vm_struct list is already created at boot, just like kmem_cache_create, and vmalloc() just adjusts the page entries? If that is the case, say I have 16GB of RAM in an x86_64 machine: is the whole of ZONE_NORMAL, i.e.
16GB - ZONE_DMA - ZONE_DMA32 - slab-memory(cache/kmalloc)
used to create the vm_struct list?
That document is fairly old. It's talking about Linux 2.5-2.6, and things have changed quite a bit in those functions from what I can tell. I'll start by talking about code from kernel 2.6.12, since that matches Gorman's explanation and is the oldest non-rc tag in the Linux kernel GitHub repo.
The vm_struct list that the document is referring to is called vmlist. It is created here as a struct pointer:
struct vm_struct *vmlist;
Trying to figure out if it is initialized with any structs during bootup took some deduction. The easiest way to figure it out was by looking at the function get_vmalloc_info() (edited for brevity):
if (!vmlist) {
    vmi->largest_chunk = VMALLOC_TOTAL;
}
else {
    vmi->largest_chunk = 0;
    prev_end = VMALLOC_START;
    for (vma = vmlist; vma; vma = vma->next) {
        unsigned long addr = (unsigned long) vma->addr;
        if (addr >= VMALLOC_END)
            break;
        vmi->used += vma->size;
        free_area_size = addr - prev_end;
        if (vmi->largest_chunk < free_area_size)
            vmi->largest_chunk = free_area_size;
        prev_end = vma->size + addr;
    }
    if (VMALLOC_END - prev_end > vmi->largest_chunk)
        vmi->largest_chunk = VMALLOC_END - prev_end;
}
The logic says that if the vmlist pointer is NULL (so !vmlist is true), then there are no vm_structs on the list and the largest_chunk of free memory in the vmalloc area is the entire space, hence VMALLOC_TOTAL. However, if there is something on the vmlist, the largest chunk is worked out from the gap between the address of the current vm_struct and the end of the previous vm_struct (i.e. free_area_size = addr - prev_end).
What this tells us is that when we vmalloc, we look through the vmlist for a gap between vm_structs that is big enough to accommodate our request. Only then can the new vm_struct be linked in, at which point it becomes part of the vmlist.
vmalloc will eventually call __get_vm_area(), which is where the action happens:
for (p = &vmlist; (tmp = *p) != NULL ;p = &tmp->next) {
    if ((unsigned long)tmp->addr < addr) {
        if((unsigned long)tmp->addr + tmp->size >= addr)
            addr = ALIGN(tmp->size +
                         (unsigned long)tmp->addr, align);
        continue;
    }
    if ((size + addr) < addr)
        goto out;
    if (size + addr <= (unsigned long)tmp->addr)
        goto found;
    addr = ALIGN(tmp->size + (unsigned long)tmp->addr, align);
    if (addr > end - size)
        goto out;
}

found:
    area->next = *p;
    *p = area;
By this point in the function we have already created a new vm_struct named area. This for loop just needs to find where to put the struct in the list. If the vmlist is empty, we skip the loop and immediately execute the "found" lines, making *p (the vmlist) point to our struct. Otherwise, we need to find the struct that will go after ours.
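The search described above can be modelled in user space. The sketch below is not kernel code: Area, AlignUp and InsertFirstFit are hypothetical names, a std::list stands in for the vmlist, and details such as the guard page the kernel adds to each area are left out. It only illustrates the first-fit idea of walking an address-sorted list until a large enough gap is found.

```cpp
#include <cassert>
#include <cstdint>
#include <list>

// Simplified stand-in for the kernel's vm_struct: one allocated range.
struct Area {
    std::uint64_t addr;
    std::uint64_t size;
};

// Rounds addr up to a multiple of align (align must be a power of two).
std::uint64_t AlignUp(std::uint64_t addr, std::uint64_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

// First-fit search over a list kept sorted by address. On success the new
// area is linked into the list and its start address is stored in *out.
bool InsertFirstFit(std::list<Area>& areas, std::uint64_t start, std::uint64_t end,
                    std::uint64_t size, std::uint64_t align, std::uint64_t* out)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    std::uint64_t addr = AlignUp(start, align);
    auto it = areas.begin();
    for (; it != areas.end(); ++it) {
        if (it->addr + it->size <= addr) continue;   // area lies before the candidate
        if (addr + size <= it->addr) break;          // gap in front of this area fits
        addr = AlignUp(it->addr + it->size, align);  // otherwise retry after this area
    }
    if (addr > end || size > end - addr) return false;  // ran out of address range
    areas.insert(it, Area{addr, size});                 // link in before 'it'
    *out = addr;
    return true;
}
```

Starting from an empty list, the first request lands at the aligned start of the range, which mirrors the case where the kernel loop is skipped and the code falls straight through to the "found" label.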
So in summary, this means that even though the vmlist pointer might be created at boot time, the list isn't necessarily populated at boot time. That is, unless there are vmalloc calls during boot or functions that explicitly add vm_structs to the list during boot as in future kernel versions (see below for kernel 6.0.9).
One further clarification for you. You asked if ZONE_NORMAL is used for the vmlist, but those are two separate address spaces. ZONE_NORMAL describes physical memory, whereas the vmalloc area is a range of virtual addresses. There are lots of resources explaining the difference between the two (e.g. this Stack Overflow question). The virtual address range used for vmlist entries goes from VMALLOC_START to VMALLOC_END. On x86_64 in that kernel version, those were defined as:
#define VMALLOC_START 0xffffc20000000000UL
#define VMALLOC_END 0xffffe1ffffffffffUL
For kernel version 6.0.9:
The creation of the vm_struct list is here:
static struct vm_struct *vmlist __initdata;
At this point, there is nothing on the list. But in this kernel version there are a few boot functions that may add structs to the list:
void __init vm_area_add_early(struct vm_struct *vm)
void __init vm_area_register_early(struct vm_struct *vm, size_t align)
As for vmalloc in this version, the vmlist is now only used during initialization. get_vm_area() now calls __get_vm_area_node(), which is NUMA-aware. From there, the logic goes deeper and is much more complicated than the linear search described above.

Windows memory metric to detect memory leak

We have large old legacy server code running as a 64bit windows service.
The service has a memory leak which, at the moment, we do not have the resources to fix.
As the service is resilient to restart, a temporary terrible 'solution' we want is to detect when the service's memory exceeded, e.g., 5GB, and exit the service (which has auto restart for such cases).
My question is: which metric should I go for? Is using GlobalMemoryStatusEx to compute MEMORYSTATUSEX.ullTotalVirtual - MEMORYSTATUSEX.ullAvailVirtual right?
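GlobalMemoryStatusEx is the wrong tool. Most of its fields describe the whole machine, and the ullTotalVirtual/ullAvailVirtual pair counts address space that is reserved as well as committed, not the memory your service has actually allocated.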
You need GetProcessMemoryInfo.
BOOL WINAPI GetProcessMemoryInfo(
__in HANDLE Process,
__out PPROCESS_MEMORY_COUNTERS ppsmemCounters,
__in DWORD cb
);
From an example using GetProcessMemoryInfo:
#include <windows.h>
#include <stdio.h>
#include <psapi.h>

// To ensure correct resolution of symbols, add Psapi.lib to TARGETLIBS
// and compile with -DPSAPI_VERSION=1

void PrintMemoryInfo( DWORD processID )
{
    HANDLE hProcess;
    PROCESS_MEMORY_COUNTERS pmc;

    // Print the process identifier.
    printf( "\nProcess ID: %lu\n", processID );

    // Print information about the memory usage of the process.
    hProcess = OpenProcess( PROCESS_QUERY_INFORMATION |
                            PROCESS_VM_READ,
                            FALSE, processID );
    if (NULL == hProcess)
        return;

    if ( GetProcessMemoryInfo( hProcess, &pmc, sizeof(pmc)) )
    {
        // SIZE_T is 64 bits wide in a 64-bit build, so cast and print as long long.
        printf( "\tWorkingSetSize: 0x%016llX\n", (unsigned long long)pmc.WorkingSetSize );
        printf( "\tPagefileUsage: 0x%016llX\n", (unsigned long long)pmc.PagefileUsage );
    }

    CloseHandle( hProcess );
}

int main( void )
{
    // Get the list of process identifiers.
    DWORD aProcesses[1024], cbNeeded, cProcesses;
    unsigned int i;

    if ( !EnumProcesses( aProcesses, sizeof(aProcesses), &cbNeeded ) )
    {
        return 1;
    }

    // Calculate how many process identifiers were returned.
    cProcesses = cbNeeded / sizeof(DWORD);

    // Print the memory usage for each process
    for ( i = 0; i < cProcesses; i++ )
    {
        PrintMemoryInfo( aProcesses[i] );
    }

    return 0;
}
Although it is unintuitive, you need to read PagefileUsage, which gives you the committed memory allocated by your process. WorkingSetSize is unreliable because when the machine gets tight on memory the OS trims the working set and pages the data out to the page file. WorkingSetSize can then be small (e.g. 100 MB) even though you have in reality already leaked 20 GB of memory. This would result in a sawtooth pattern in memory consumption until the page file is full. The working set is only the actively used memory, which can hide a multi-GB memory leak if the machine is under memory pressure.
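For the watchdog described in the question, the check could be sketched roughly as follows. CommittedBytes and ExceedsLimit are hypothetical names, the 5 GB figure comes from the question, and the non-Windows branch exists only so the sketch compiles elsewhere (it always reports 0 there).

```cpp
#include <cassert>
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#include <psapi.h>   // link with Psapi.lib
#endif

// Hypothetical helper: committed bytes (PagefileUsage) of the current process.
// Returns 0 if the value cannot be read or the platform is not Windows.
std::uint64_t CommittedBytes()
{
#ifdef _WIN32
    PROCESS_MEMORY_COUNTERS pmc{};
    pmc.cb = sizeof(pmc);
    if (GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc)))
        return static_cast<std::uint64_t>(pmc.PagefileUsage);
#endif
    return 0;
}

// Hypothetical policy check: true once the committed size passes the limit.
bool ExceedsLimit(std::uint64_t committedBytes, std::uint64_t limitBytes)
{
    assert(limitBytes > 0);
    return committedBytes > limitBytes;
}
```

A service could call ExceedsLimit(CommittedBytes(), 5ull << 30) from a periodic timer and start a clean shutdown when it returns true, leaving the restart to the service's existing recovery settings.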

Run a process at the same physical memory location

For a research project, I have a long-running process that uses various buffers and stack variables. I'd like to be able to launch this process multiple times such that the physical addresses backing its heap, stack, code, and static variables are equal each time. I know the exact size of all of these variables, and the size of the heap and stack stay constant during execution. To help with this, I use some helper code to translate arbitrary virtual addresses in my program to their corresponding physical addresses (sourced from here):
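#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* One 64-bit /proc/<pid>/pagemap entry. All fields share one base type so
   that the bit-fields pack into a single 64-bit word. */
struct pagemap
{
    union status
    {
        struct present
        {
            unsigned long long pfn : 55;        /* bits 0-54 */
            unsigned long long soft_dirty : 1;  /* bit 55 */
            unsigned long long exclusive : 1;   /* bit 56 */
            unsigned long long zeroes : 4;      /* bits 57-60 */
            unsigned long long type : 1;        /* bit 61 */
            unsigned long long swapped : 1;     /* bit 62 */
            unsigned long long present : 1;     /* bit 63 */
        } present;
        struct swapped
        {
            unsigned long long swaptype : 5;    /* bits 0-4 */
            unsigned long long offset : 50;     /* bits 5-54 */
            unsigned long long soft_dirty : 1;
            unsigned long long exclusive : 1;
            unsigned long long zeroes : 4;
            unsigned long long type : 1;
            unsigned long long swapped : 1;
            unsigned long long present : 1;
        } swapped;
    } status;
} __attribute__ ((packed));

/* Note: since Linux 4.2 the PFN reads as zero unless the caller has
   CAP_SYS_ADMIN, so this has to run as root to be useful. */
unsigned long get_pfn_for_addr(void *addr)
{
    unsigned long offset;
    struct pagemap pagemap;
    FILE *pagemap_file = fopen("/proc/self/pagemap", "rb");
    if(pagemap_file == NULL)
    {
        fprintf(stderr, "failed to open pagemap\n");
        exit(1);
    }
    /* One 8-byte entry per virtual page. */
    offset = (unsigned long) addr / getpagesize() * 8;
    if(fseek(pagemap_file, offset, SEEK_SET) != 0)
    {
        fprintf(stderr, "failed to seek pagemap to offset\n");
        exit(1);
    }
    if(fread(&pagemap, 1, sizeof(struct pagemap), pagemap_file) != sizeof(struct pagemap))
    {
        fprintf(stderr, "failed to read pagemap entry\n");
        exit(1);
    }
    fclose(pagemap_file);
    return pagemap.status.present.pfn;
}

unsigned long virt_to_phys(void *addr)
{
    unsigned long pfn, page_offset, phys_addr;
    pfn = get_pfn_for_addr(addr);
    page_offset = (unsigned long) addr % getpagesize();
    /* PAGE_SHIFT is a kernel macro; in user space multiply by the page size. */
    phys_addr = pfn * getpagesize() + page_offset;
    return phys_addr;
}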
So far, my methodology has only required that a specific buffer in my program is located at the same physical address for each run. For this, I was just able to exit and relaunch the process whenever the physical address for that buffer was wrong, and I would end up with the correct location relatively quickly each time. However, I'd like to extend my experiment to ensure that my process is loaded identically in physical memory between runs, and this try-and-restart method does not seem to work well for this. Ideally, I would like to be able to set apart some small number of physical page frames that can't be allocated to another process, or to the kernel itself. Then, I would pass a flag down to do_fork that notifies the kernel that this is my special process and to allocate specific page frames to it.
My questions are:
Is there any sort of isolation mechanism already built into the kernel that would let me set aside an exclusive physical memory space that I could launch my process in?
If not, what would be a starting point for modifying the kernel to support behavior like this?
Is there any other solution (not involving either of the two above) that I could use for my desired behavior?
This is something that the kernel, using virtual memory, is tasked to abstract from you, so I'm not sure it is even possible to do (without insane amounts of work).
May I ask what experiment requires this? Perhaps if you describe what you want to achieve, it is easier to offer advice.
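As a side note on the helper code in the question: a pagemap entry can also be decoded with plain shifts and masks, which avoids depending on how a compiler lays out bit-fields. The sketch below assumes the layout from the kernel's pagemap documentation (PFN in bits 0-54, soft-dirty in bit 55, swapped in bit 62, present in bit 63). PagemapEntry, DecodePagemapEntry and PhysicalAddress are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Decoded view of one 64-bit /proc/<pid>/pagemap entry (subset of the fields).
struct PagemapEntry {
    bool present;
    bool swapped;
    bool softDirty;
    std::uint64_t pfn;  // only meaningful when present is true
};

PagemapEntry DecodePagemapEntry(std::uint64_t raw)
{
    PagemapEntry e{};
    e.present   = (raw >> 63) & 1;
    e.swapped   = (raw >> 62) & 1;
    e.softDirty = (raw >> 55) & 1;
    // Bits 0-54 hold the page frame number when the page is present.
    e.pfn = e.present ? (raw & ((std::uint64_t{1} << 55) - 1)) : 0;
    return e;
}

// Combines the frame number with the offset of virtAddr inside its page.
std::uint64_t PhysicalAddress(const PagemapEntry& e, std::uint64_t virtAddr,
                              std::uint64_t pageSize)
{
    assert(e.present && pageSize != 0);
    return e.pfn * pageSize + (virtAddr % pageSize);
}
```

The raw value would come from reading 8 bytes at offset (virtual address / page size) * 8 in /proc/self/pagemap, exactly as get_pfn_for_addr does.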

MapViewOfFile() no longer works after process hits the 2GB limit

MapViewOfFile() works without any problem if our process has not hit the 2GB limit yet. However if the process hits the limit then MapViewOfFile() no longer works even if some or all of the memory is deallocated. GetLastError() returns 8, which means ERROR_NOT_ENOUGH_MEMORY, Not enough storage is available to process this command. Here is a small program showing the problem:
#include <Windows.h>
#include <cstdio>
#include <cstdlib>
#include <vector>

const int sizeOfTheFileMappingObject = 20;
const int numberOfBytesToMap = sizeOfTheFileMappingObject;
const char* fileMappingObjectName = "Global\\QWERTY";

void Allocate2GBMemoryWithMalloc(std::vector<void*>* addresses)
{
    const size_t sizeOfMemAllocatedAtOnce = 32 * 1024;
    for (;;) {
        void* address = malloc(sizeOfMemAllocatedAtOnce);
        if (address != NULL) {
            addresses->push_back(address);
        }
        else {
            // size_t values are cast so the format specifier matches on every build.
            printf("The %lluth malloc() returned NULL. Allocated memory: %llu MB\n",
                (unsigned long long)(addresses->size() + 1),
                (unsigned long long)((addresses->size() * sizeOfMemAllocatedAtOnce) / (1024 * 1024)));
            break;
        }
    }
}

void DeallocateMemoryWithFree(std::vector<void*>* addresses)
{
    std::vector<void*>::iterator current = addresses->begin();
    std::vector<void*>::iterator end = addresses->end();
    for (; current != end; ++current) {
        free(*current);
    }
    addresses->clear();
    printf("Memory is deallocated.\n");
}

void TryToMapViewOfFile()
{
    HANDLE fileMapping = OpenFileMappingA(FILE_MAP_ALL_ACCESS, FALSE,
        fileMappingObjectName);
    if (fileMapping == NULL) {
        printf("OpenFileMapping() failed. LastError: %lu\n", GetLastError());
        return;
    }

    LPVOID mappedView = MapViewOfFile(fileMapping, FILE_MAP_READ, 0, 0,
        numberOfBytesToMap);
    if (mappedView == NULL) {
        printf("MapViewOfFile() failed. LastError: %lu\n", GetLastError());
        if (!CloseHandle(fileMapping)) {
            printf("CloseHandle() failed. LastError: %lu\n", GetLastError());
        }
        return;
    }

    if (!UnmapViewOfFile(mappedView)) {
        printf("UnmapViewOfFile() failed. LastError: %lu\n", GetLastError());
        if (!CloseHandle(fileMapping)) {
            printf("CloseHandle() failed. LastError: %lu\n", GetLastError());
        }
        return;
    }

    if (!CloseHandle(fileMapping)) {
        printf("CloseHandle() failed. LastError: %lu\n", GetLastError());
        return;
    }

    printf("MapViewOfFile() succeeded.\n");
}

int main(int argc, char* argv[])
{
    HANDLE fileMapping = CreateFileMappingA(INVALID_HANDLE_VALUE, NULL,
        PAGE_READWRITE, 0, sizeOfTheFileMappingObject, fileMappingObjectName);
    if (fileMapping == NULL) {
        printf("CreateFileMapping() failed. LastError: %lu\n", GetLastError());
        return -1;
    }

    TryToMapViewOfFile();

    std::vector<void*> addresses;

    Allocate2GBMemoryWithMalloc(&addresses);
    TryToMapViewOfFile();

    DeallocateMemoryWithFree(&addresses);
    TryToMapViewOfFile();

    Allocate2GBMemoryWithMalloc(&addresses);
    DeallocateMemoryWithFree(&addresses);

    if (!CloseHandle(fileMapping)) {
        printf("CloseHandle() failed. LastError: %lu\n", GetLastError());
    }

    return 0;
}
The output of the program:
MapViewOfFile() succeeded.
The 65126th malloc() returned NULL. Allocated memory: 2035 MB
MapViewOfFile() failed. LastError: 8
Memory is deallocated.
MapViewOfFile() failed. LastError: 8
The 64783th malloc() returned NULL. Allocated memory: 2024 MB
Memory is deallocated.
As you can see MapViewOfFile() fails with 8 even after releasing all memory that was allocated. Even though MapViewOfFile() reports ERROR_NOT_ENOUGH_MEMORY we can call malloc() successfully.
We ran this example program on Windows 7 (32-bit), Windows 8.1 (32-bit) and Windows Server 2008 R2 (64-bit), and the results were the same.
So the question is: Why does MapViewOfFile() fail with ERROR_NOT_ENOUGH_MEMORY after the process hits the 2GB limit?
Why MapViewOfFile fails
As IInspectable's comment explains, freeing memory allocated with malloc doesn't make it available for use with MapViewOfFile. A 32-bit process under Windows has a 4 GB virtual address space, and only the first 2 GB of it is available to the application. (An exception is a large-address-aware program, which increases this to 3 GB under suitably configured 32-bit kernels and to 4 GB under 64-bit kernels.) Everything in your program's memory must have a location somewhere in this first 2 GB of virtual address space. That includes the executable itself, any DLLs used by the program, and any data you have in memory, whether allocated statically, on the stack or dynamically (e.g. with malloc), and of course any file you map into memory with MapViewOfFile.
When your program first starts, the Visual C/C++ runtime creates a small heap for dynamic allocation functions like malloc and operator new. As necessary, the runtime increases the size of the heap, and as it does so it uses up more virtual address space. Unfortunately it never shrinks the heap. When you free a large block of memory, the runtime only decommits the memory used. This makes the RAM used by the freed memory available again, but the virtual address space taken up by the freed memory remains reserved as part of the heap.
As mentioned previously, a file mapped into memory by MapViewOfFile also takes up virtual address space. If the heap (plus your program, DLLs, and everything else) is using up all of the virtual address space, then there's no room to map files.
A possible solution: don't use malloc
An easy way to keep the heap from growing to fill all of the virtual address space is not to use the Visual C/C++ runtime to allocate large (at least 64k) blocks of memory. Instead, allocate and free memory from Windows directly using VirtualAlloc(..., MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE) and VirtualFree(..., MEM_RELEASE). The latter function releases both the RAM used by the region and the virtual address space taken up by it, making it available for use with MapViewOfFile.
But...
You can still run into another problem, where MapViewOfFile fails even though the size of the view is smaller, sometimes even much smaller, than the total amount of free virtual address space. This is because the view needs to be mapped into a contiguous region of virtual address space. If the virtual address space becomes fragmented, the largest contiguous region of unreserved virtual address space can end up being relatively small. Even when your program first starts up, before you have had a chance to do any dynamic allocations, the virtual address space can be somewhat fragmented because of DLLs loaded at various addresses. If you have a long-lived program that does a lot of allocations and deallocations with VirtualAlloc and VirtualFree, you can end up with a very fragmented virtual address space. If you encounter this problem you'll have to change your pattern of allocations, or maybe even implement your own heap allocator.
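The difference between total free address space and the largest contiguous free region can be illustrated with a small model. This is only a sketch over a hand-built list of in-use regions: Region, FreeSpace and MeasureFreeSpace are hypothetical names, and in a real Windows process the list would have to be gathered by walking the address space with VirtualQuery.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One reserved or committed range of address space.
struct Region {
    std::uint64_t base;
    std::uint64_t size;
};

struct FreeSpace {
    std::uint64_t total;    // sum of all free gaps
    std::uint64_t largest;  // biggest single contiguous gap
};

// 'used' must be sorted by base, non-overlapping and inside [spaceStart, spaceEnd).
FreeSpace MeasureFreeSpace(const std::vector<Region>& used,
                           std::uint64_t spaceStart, std::uint64_t spaceEnd)
{
    assert(spaceStart <= spaceEnd);
    FreeSpace result{0, 0};
    std::uint64_t cursor = spaceStart;
    for (const Region& r : used) {
        assert(r.base >= cursor && r.base + r.size <= spaceEnd);
        const std::uint64_t gap = r.base - cursor;  // free space before this region
        result.total += gap;
        result.largest = std::max(result.largest, gap);
        cursor = r.base + r.size;
    }
    const std::uint64_t tail = spaceEnd - cursor;   // free space after the last region
    result.total += tail;
    result.largest = std::max(result.largest, tail);
    return result;
}
```

A view can only be mapped if it fits into the largest gap, so a request may fail even when the total is several times bigger than the view.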

Is there a way to force windows to cache a file?

Is there a batch command or something similar that will force Windows to cache a file? I am trying to create a game preloader that loads certain game files into the cache before starting the game. Is there any way I can do this?
updated int main code:
int main(int argc, const char** argv)
{
    // Backslashes in a string literal must be escaped, and pf() expects the whole path.
    pf("C:\\Games\\World_of_Tanks\\res\\packages\\gui.pkg");
    return 0;
}
All you need to do is load the files, either using ReadFile or by memory mapping the files and touching every page (in fact, due to allocation granularity every 16th page suffices, but in theory you should be touching every page).
Memory mapping is faster and more cache-friendly, since you do not need to allocate extra memory to hold the data (which you aren't going to use for anything useful!). The OS will reuse the same physical memory for the cache and for the virtual memory that your process can see.
Several mainstream applications, including Microsoft Office and Adobe Reader do exactly that to launch faster. It's those "delayed start" services that keep your harddisk light flashing for a dozen seconds after you log in.
Do note, however, that while you can force Windows[1] to cache files that way, you cannot force it to keep the files in the cache indefinitely. If there is not enough physical RAM available, the system will throw away cache contents in order to satisfy application demands.
EDIT: Minimum working example implementation using filemapping:
#include <windows.h>
#include <cstdio>

void pf(const char* name)
{
    // The A variants are called explicitly because the filenames are ANSI strings.
    HANDLE file = CreateFileA(name, GENERIC_READ, FILE_SHARE_READ | FILE_SHARE_WRITE, 0, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
    if(file == INVALID_HANDLE_VALUE) { printf("couldn't open %s\n", name); return; }

    unsigned int len = GetFileSize(file, 0);

    HANDLE mapping = CreateFileMappingA(file, 0, PAGE_READONLY, 0, 0, 0);
    if(mapping == 0) { printf("couldn't map %s\n", name); return; }

    const char* data = (const char*) MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0);

    if(data)
    {
        printf("prefetching %s... ", name);

        // need volatile or need to use result - compiler will otherwise optimize out whole loop
        volatile unsigned int touch = 0;

        // Touch one byte in every 4096-byte page.
        for(unsigned int i = 0; i < len; i += 4096)
            touch += data[i];
    }
    else
        printf("couldn't create view of %s\n", name);

    UnmapViewOfFile(data);
    CloseHandle(mapping);
    CloseHandle(file);
}

int main(int argc, const char** argv)
{
    if(argc >= 2) for(int i = 1; argv[i]; ++i) pf(argv[i]);
    return 0;
}
The program will try to prefetch any filename given on the commandline.
The code isn't overly pretty but it works. It uses ANSI filenames, and leaks a file handle in case opening succeeds but mapping fails (but bleh... it's not really a problem, the OS will clean up after the program exits -- if that annoys you, wrap the handles in RAII). It's also limited to ca. 1.8GiB file size due to address space in a 32-bit build, otherwise limited to 4GiB due to GetFileSize, but that's also trivial to fix if you really need that big a file.
Instead of volatile one might want to return or otherwise consume the "result", but either way works (volatile does not truly have a measurable impact on performance, compared to a disk access!).
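For comparison, the read-based alternative mentioned at the top of this answer can be sketched with the C++ standard library instead of ReadFile. This swaps in std::ifstream for the Win32 call, so it is an illustration of the idea rather than the method used above; PrefetchByReading is a hypothetical name and the 1 MiB buffer size is arbitrary. It relies on ordinary buffered I/O passing through the OS file cache.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads the whole file once and discards the data.
// Returns the number of bytes read, or -1 if the file cannot be opened.
std::int64_t PrefetchByReading(const std::string& path)
{
    assert(!path.empty());
    std::ifstream in(path, std::ios::binary);
    if (!in) return -1;

    std::vector<char> chunk(1 << 20);  // 1 MiB scratch buffer, reused for every read
    std::int64_t total = 0;
    while (in) {
        in.read(chunk.data(), static_cast<std::streamsize>(chunk.size()));
        total += in.gcount();          // counts the final, partial chunk as well
    }
    return total;
}
```

Unlike the memory-mapped version, this copies every byte into the scratch buffer, which is the extra cost the answer refers to when it calls memory mapping the more cache-friendly option.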
[1] Truth be told, you actually can't force Windows, but it incidentally always works that way unless you explicitly request unbuffered I/O.
In theory, you could force the OS to read pages into memory and even force it to keep them in RAM by locking the memory, but your working set quota (which is very small, and you need administrative rights to modify it) will not normally let you do this. That's a good thing though, since locking large amounts of memory is a very bad idea.
