Windows memory metric to detect memory leak - winapi

We have large old legacy server code running as a 64bit windows service.
The service has a memory leak which, at the moment, we do not have the resources to fix.
As the service is resilient to restart, a temporary (and admittedly terrible) 'solution' is to detect when the service's memory usage exceeds some threshold, e.g. 5 GB, and exit the service (which has auto-restart for such cases).
My question is: which metric should I use? Is calling GlobalMemoryStatusEx and computing MEMORYSTATUSEX.ullTotalVirtual - MEMORYSTATUSEX.ullAvailVirtual the right approach?

GlobalMemoryStatusEx is wrong. You do not want to fill up the machine memory until 5 GB are left in total.
You need GetProcessMemoryInfo.
BOOL WINAPI GetProcessMemoryInfo(
__in HANDLE Process,
__out PPROCESS_MEMORY_COUNTERS ppsmemCounters,
__in DWORD cb
);
From an example using GetProcessMemoryInfo:
#include <windows.h>
#include <stdio.h>
#include <psapi.h>
// To ensure correct resolution of symbols, add Psapi.lib to TARGETLIBS
// and compile with -DPSAPI_VERSION=1
void PrintMemoryInfo( DWORD processID )
{
HANDLE hProcess;
PROCESS_MEMORY_COUNTERS pmc;
// Print the process identifier.
printf( "\nProcess ID: %u\n", processID );
// Print information about the memory usage of the process.
hProcess = OpenProcess( PROCESS_QUERY_INFORMATION |
PROCESS_VM_READ,
FALSE, processID );
if (NULL == hProcess)
return;
if ( GetProcessMemoryInfo( hProcess, &pmc, sizeof(pmc)) )
{
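// SIZE_T is 64 bits wide in a 64-bit build; %08X would truncate it, so cast and use %llu.
printf( "\tWorkingSetSize: %llu bytes\n", (unsigned long long)pmc.WorkingSetSize );
printf( "\tPagefileUsage: %llu bytes\n", (unsigned long long)pmc.PagefileUsage );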
}
CloseHandle( hProcess );
}
int main( void )
{
// Get the list of process identifiers.
DWORD aProcesses[1024], cbNeeded, cProcesses;
unsigned int i;
if ( !EnumProcesses( aProcesses, sizeof(aProcesses), &cbNeeded ) )
{
return 1;
}
// Calculate how many process identifiers were returned.
cProcesses = cbNeeded / sizeof(DWORD);
// Print the memory usage for each process
for ( i = 0; i < cProcesses; i++ )
{
PrintMemoryInfo( aProcesses[i] );
}
return 0;
}
Although it is unintuitive, you need to read PagefileUsage, which gives you the committed memory allocated by your process. WorkingSetSize is unreliable because, when the machine gets tight on memory, the OS pages the process's data out to the page file. WorkingSetSize can then be small (e.g. 100 MB) even though you have already leaked 20 GB. The result is a sawtooth pattern in the reported memory consumption until the page file is full. The working set is only the actively used memory, so it can hide a multi-GB leak when the machine is under memory pressure.
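As a rough illustration of the check described above, here is a minimal sketch in C. The 5 GB limit comes from the question; the helper names (commit_exceeds_limit, current_process_over_limit) and the COMMIT_LIMIT_BYTES macro are made up for this example. The Windows-specific part is guarded so that the comparison logic can be compiled and tested on its own.

```c
#include <assert.h>
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#include <psapi.h>   /* link with Psapi.lib, as in the example above */
#endif

/* Hypothetical limit taken from the question: 5 GB of committed memory. */
#define COMMIT_LIMIT_BYTES (5ULL * 1024 * 1024 * 1024)

/* Pure decision helper (illustrative name), kept separate from the API call. */
static int commit_exceeds_limit(uint64_t committed_bytes, uint64_t limit_bytes)
{
    assert(limit_bytes > 0);
    return committed_bytes > limit_bytes;
}

#ifdef _WIN32
/* Returns 1 if the calling process has more committed memory than the
   limit, 0 if not, and -1 if the query itself failed. */
static int current_process_over_limit(void)
{
    PROCESS_MEMORY_COUNTERS pmc;
    if (!GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc)))
        return -1;
    return commit_exceeds_limit((uint64_t)pmc.PagefileUsage, COMMIT_LIMIT_BYTES);
}
#endif
```

The service could call current_process_over_limit() from some periodic housekeeping task and exit when it returns 1, leaving the restart to the service's recovery settings. How often to poll is a judgment call that the original answer does not cover.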

Related

How to identify what parts of the allocated virtual memory a process is using

I want to be able to search through the allocated memory of a process (say you open Notepad, type “HelloWorld”, then run a search for the string “HelloWorld”). For 32-bit applications this is not a problem, but for 64-bit applications the large quantity of allocated virtual memory takes hours to search through.
Obviously the vast majority of applications are not utilising the full amount of virtual memory allocated. I can identify the areas in memory allocated to each process with VirtualQueryEx and read them with ReadProcessMemory, but for 64-bit applications this still takes hours to complete.
Does anyone know of any resources or any methods that could be used to help narrow down the amount of memory to be searched?
It is important that you only scan proper memory. If you just scanned from 0x0 to 0xFFFFFFFFF it would take at least 5 seconds in most processes. You can skip bad regions by checking the page settings with VirtualQueryEx, which fills in a MEMORY_BASIC_INFORMATION structure describing the state of that memory region.
If MEMORY_BASIC_INFORMATION.State is not MEM_COMMIT, it is bad memory and should be skipped.
If MEMORY_BASIC_INFORMATION.Protect is PAGE_NOACCESS, you also want to skip this memory.
If VirtualQueryEx fails, skip to the next region.
In this manner it should take only 0-2 seconds to scan the memory of an average process, because only good memory is scanned.
char* ScanEx(char* pattern, char* mask, char* begin, intptr_t size, HANDLE hProc)
{
char* match{ nullptr };
SIZE_T bytesRead;
DWORD oldprotect;
char* buffer{ nullptr };
MEMORY_BASIC_INFORMATION mbi;
mbi.RegionSize = 0x1000;//
VirtualQueryEx(hProc, (LPCVOID)begin, &mbi, sizeof(mbi));
for (char* curr = begin; curr < begin + size; curr += mbi.RegionSize)
{
if (!VirtualQueryEx(hProc, curr, &mbi, sizeof(mbi))) continue;
if (mbi.State != MEM_COMMIT || mbi.Protect == PAGE_NOACCESS) continue;
delete[] buffer;
buffer = new char[mbi.RegionSize];
if (VirtualProtectEx(hProc, mbi.BaseAddress, mbi.RegionSize, PAGE_EXECUTE_READWRITE, &oldprotect))
{
ReadProcessMemory(hProc, mbi.BaseAddress, buffer, mbi.RegionSize, &bytesRead);
VirtualProtectEx(hProc, mbi.BaseAddress, mbi.RegionSize, oldprotect, &oldprotect);
char* internalAddr = ScanBasic(pattern, mask, buffer, (intptr_t)bytesRead);
if (internalAddr != nullptr)
{
//calculate from internal to external; the buffer was read starting at mbi.BaseAddress
match = (char*)mbi.BaseAddress + (internalAddr - buffer);
break;
}
}
}
delete[] buffer;
return match;
}
ScanBasic is just a standard comparison function which compares your pattern against the buffer.
Second, if you know the address is relative to a module, scan only the address range of that module; you can get the size of the module via CreateToolhelp32Snapshot. If you know it's dynamic memory on the heap, scan only the heap. You can also get all the heaps with CreateToolhelp32Snapshot and TH32CS_SNAPHEAPLIST.
You can also write a wrapper for this function that scans the entire address space of the process. It might look something like this:
char* Pattern::Ex::ScanProc(char* pattern, char* mask, ProcEx& proc)
{
unsigned long long int kernelMemory = IsWow64Proc(proc.handle) ? 0x80000000 : 0x800000000000;
return Scan(pattern, mask, 0x0, (intptr_t)kernelMemory, proc.handle);
}
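The answer leaves ScanBasic undefined, so here is one possible sketch of such a comparison function in C. The mask convention is an assumption that the answer does not confirm: 'x' means the byte must match and any other character (commonly '?') is a wildcard. The mask length defines the pattern length, because the pattern itself may contain zero bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed convention: mask[j] == 'x' means pattern[j] must match exactly;
   any other mask character is treated as a wildcard. */
char *ScanBasic(const char *pattern, const char *mask, char *begin, intptr_t size)
{
    size_t patternLen = strlen(mask);
    assert(patternLen > 0);

    for (intptr_t i = 0; i + (intptr_t)patternLen <= size; ++i)
    {
        size_t j = 0;
        while (j < patternLen && (mask[j] != 'x' || pattern[j] == begin[i + j]))
            ++j;
        if (j == patternLen)
            return begin + i;   /* first match inside the local buffer */
    }
    return NULL;                /* no match in this buffer */
}
```

The returned pointer refers to the local copy of the region, which is why the caller has to translate it back into an address in the target process. A match that straddles two regions is not found by this region-by-region approach.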

why does VirtualAlloc fail for lpAddress greater than 0x6ffffffffff

I wanted to see how "far" I can point in a 64-bit C program by using VirtualAlloc to map addresses as close to the top of the 64-bit range as possible into valid memory.
I managed to get to 0x6ffffffffff, which is a 43-bit address, but any number above that results in a failure to allocate, with error code 0x57 (The parameter is incorrect).
This is my code:
#include <Windows.h>
#include <stdio.h>
int main(int argc, char **argv)
{
LPVOID mem;
WCHAR message[] = L"Hello, World!";
mem = VirtualAlloc (
(LPVOID)0x00006ffffffffff,
4096,
MEM_RESERVE | MEM_COMMIT,
PAGE_READWRITE
);
if (mem == 0) {
wprintf(L"%x.\n", GetLastError());
system("pause");
exit(-1);
}
memcpy(mem, message, sizeof(message));
wprintf(L"[%llx] -> %ls\n", mem, mem);
system("pause");
return 0;
}
Why can't I VirtualAlloc above 0x6ffffffffff?
The explanation is that the address you request is outside of the available range. Although there is a theoretical 64 bit range of available addresses, in practise not all of that range is available to a user mode application. You can find the allowable range by calling GetSystemInfo and inspecting the values of lpMinimumApplicationAddress and lpMaximumApplicationAddress. If you attempt to reserve an address outside of this range, the call to VirtualAlloc will fail and the error code is set to ERROR_INVALID_PARAMETER.
Note that these minimum and maximum values are not precise. You will start observing ERROR_INVALID_PARAMETER when you get close to the limits.
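To see the limits on a given machine, something along these lines could be used. It is only a sketch: range_is_usable and candidate_is_usable are illustrative names, and the range check is deliberately simple. Since the reported limits are not precise, a passing check does not guarantee that VirtualAlloc will succeed. VirtualAlloc also rounds a reserved base address down to a multiple of the allocation granularity, which the sketch prints as well.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Returns 1 if [addr, addr + size) lies entirely inside [min_addr, max_addr]. */
static int range_is_usable(uint64_t addr, uint64_t size,
                           uint64_t min_addr, uint64_t max_addr)
{
    assert(min_addr <= max_addr);
    if (size == 0 || addr < min_addr || addr > max_addr)
        return 0;
    return size - 1 <= max_addr - addr;   /* overflow-safe end check */
}

#ifdef _WIN32
/* Prints the user-mode address range reported by the system and checks a
   candidate address against it before VirtualAlloc is attempted. */
static int candidate_is_usable(uint64_t addr, uint64_t size)
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    printf("min %p, max %p, granularity %lu\n",
           si.lpMinimumApplicationAddress,
           si.lpMaximumApplicationAddress,
           (unsigned long)si.dwAllocationGranularity);
    return range_is_usable(addr, size,
                           (uint64_t)(uintptr_t)si.lpMinimumApplicationAddress,
                           (uint64_t)(uintptr_t)si.lpMaximumApplicationAddress);
}
#endif
```

Calling candidate_is_usable(0x6ffffffffffULL, 4096) before the VirtualAlloc call in the question would show how close the requested address is to the reported maximum.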

Is there a way to force windows to cache a file?

Is there a batch command or something similar that will force Windows to cache a file? I am trying to create a game preloader that loads certain game files into the cache before starting the game. Is there any way I can do this?
updated int main code:
int main(int argc, const char** argv)
{
// Backslashes in a C string literal must be escaped, and pf expects the whole path.
pf("C:\\Games\\World_of_Tanks\\res\\packages\\gui.pkg");
return 0;
}
All you need to do is load the files, either using ReadFile or by memory mapping the files and touching every page (in fact, due to allocation granularity every 16th page suffices, but in theory you should be touching every page).
Memory mapping is faster and more cache-friendly, since you do not need to allocate extra memory to hold the data (which you aren't going to use for anything useful!). The OS will reuse the same physical memory for the cache and for the virtual memory that your process can see.
Several mainstream applications, including Microsoft Office and Adobe Reader, do exactly that to launch faster. It's those "delayed start" services that keep your hard disk light flashing for a dozen seconds after you log in.
Do note, however, that while you can force Windows (see note 1) to cache files that way, you cannot force it to keep the files in the cache indefinitely. If there is not enough physical RAM available, the system will throw away cache contents in order to satisfy application demands.
EDIT: Minimum working example implementation using filemapping:
#include <windows.h>
#include <cstdio>
void pf(const char* name)
{
HANDLE file = CreateFile(name, GENERIC_READ, FILE_SHARE_READ | FILE_SHARE_WRITE, 0, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
if(file == INVALID_HANDLE_VALUE) { printf("couldn't open %s\n", name); return; };
unsigned int len = GetFileSize(file, 0);
HANDLE mapping = CreateFileMapping(file, 0, PAGE_READONLY, 0, 0, 0);
if(mapping == 0) { printf("couldn't map %s\n", name); return; }
const char* data = (const char*) MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0);
if(data)
{
printf("prefetching %s... ", name);
// need volatile or need to use result - compiler will otherwise optimize out whole loop
volatile unsigned int touch = 0;
for(unsigned int i = 0; i < len; i += 4096)
touch += data[i];
}
else
printf("couldn't create view of %s\n", name);
UnmapViewOfFile(data);
CloseHandle(mapping);
CloseHandle(file);
}
int main(int argc, const char** argv)
{
if(argc >= 2) for(int i = 1; argv[i]; ++i) pf(argv[i]);
return 0;
}
The program will try to prefetch any filename given on the commandline.
The code isn't overly pretty but it works. It uses ANSI filenames, and leaks a file handle in case opening succeeds but mapping fails (but bleh... it's not really a problem, the OS will clean up after the program exits -- if that annoys you, wrap the handles in RAII). It's also limited to ca. 1.8GiB file size due to address space in a 32-bit build, otherwise limited to 4GiB due to GetFileSize, but that's also trivial to fix if you really need that big a file.
Instead of volatile one might want to return or otherwise consume the "result", but either way works (volatile does not truly have a measurable impact on performance, compared to a disk access!).
Note 1: Truth be told, you actually can't force Windows, but it incidentally always works that way unless you explicitly request unbuffered I/O.
In theory, you could force the OS to read pages into memory and even force it to keep them in RAM by locking the memory, but your working set quota (which is very small, and you need administrative rights to modify it) will not normally let you do this. That's a good thing though, since locking large amounts of memory is a very bad idea.
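For comparison, the "just read the file" variant mentioned at the start of the answer can be sketched with standard C stdio instead of ReadFile. This is a substitution made here for portability, not what the answer's code does. The function name prefetch_by_reading is illustrative. Because the file is read in chunks, the size limits discussed above for the mapping example do not apply, at the cost of copying the data through a scratch buffer.

```c
#include <assert.h>
#include <stdio.h>

/* Reads the whole file in fixed-size chunks and discards the data.
   Returns the number of bytes read, or -1 if the file could not be opened.
   The static scratch buffer makes this sketch non-reentrant. */
long long prefetch_by_reading(const char *name)
{
    static char chunk[1 << 16];          /* 64 KiB scratch buffer */
    long long total = 0;
    size_t n;
    FILE *f;

    assert(name != NULL);
    f = fopen(name, "rb");
    if (f == NULL)
        return -1;
    while ((n = fread(chunk, 1, sizeof(chunk), f)) > 0)
        total += (long long)n;
    fclose(f);
    return total;
}
```

As with the mapping version, this only asks the OS to pull the data through its file cache. Whether the data is still cached when the game starts depends on memory pressure.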

Need to produce a stable 10mSec interrupt

I have an application that I need to run at a 10 ms rate (100 Hz) on a Windows 7 32-bit computer (which will also be running other applications at the same time). The interrupt can occasionally respond slightly late (by about 100 µs), but must not drift over a prolonged time. My program loads NtSetTimerResolution and uses it to set the timer resolution to 10 ms, then creates a timer using the CreateTimerQueue/CreateTimerQueueTimer functions with a callback routine that toggles a GPIO pin (for the time being). This produces the expected square wave as long as I am not doing anything else with the system. When I start a couple of other processes, the accuracy of my square wave goes out the window. Is there any way to get a higher priority level on the timer interrupt, or is there another timer I can use, that will produce a more stable output (perhaps the SMI)? My code is below; it is built using the x86 checked build environment of the Windows DDK and run from a command shell with administrator rights:
/*
Abstract:
Simple console test app for a 10mSec timer interrupt service
Environment:
Administrator Mode
*/
/* INCLUDES */
#include <windows.h>
#include <winioctl.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <conio.h>
#include <strsafe.h>
#include <stdlib.h>
#include <stdio.h>
#include <winsock2.h>
#include <mswsock.h>
#pragma warning(disable:4127) // condition expression is constant
FARPROC pNtQueryTimerResolution;
FARPROC pNtSetTimerResolution;
static HANDLE NTDLLModuleHandle;
static HINSTANCE hInpOutDll;
typedef void ( __stdcall *lpOut32 )( short , short );
typedef short ( __stdcall *lpInp32 )( short );
typedef BOOL ( __stdcall *lpIsInpOutDriverOpen )( void );
//Some global function pointers (messy but fine for an example)
lpOut32 gfpOut32;
lpInp32 gfpInp32;
lpIsInpOutDriverOpen gfpIsInpOutDriverOpen;
void CALLBACK TimerProc(void* lpParameter,
BOOLEAN TimerOrWaitFired);
// MAIN
VOID __cdecl main( void )
{
ULONG ulMinRes = 0;
ULONG ulMaxRes = 0;
ULONG ulCurRes = 0;
HANDLE phNewQueue;
HANDLE phNewTimer;
phNewQueue = CreateTimerQueue( );
NTDLLModuleHandle = LoadLibrary( "NTDLL.DLL" );
if( NULL == NTDLLModuleHandle )
{
return;
}
// Get the function pointers,
pNtQueryTimerResolution = GetProcAddress( NTDLLModuleHandle, "NtQueryTimerResolution" );
pNtSetTimerResolution = GetProcAddress( NTDLLModuleHandle, "NtSetTimerResolution" );
if( ( pNtQueryTimerResolution == NULL ) || ( pNtSetTimerResolution == NULL ) )
{
printf( "unable to link to dll\n\n\n\n\n\n" );
return;
}
pNtQueryTimerResolution( &ulMinRes, &ulMaxRes, &ulCurRes );
printf( "MMR: %d %d %d\n", ulMinRes, ulMaxRes, ulCurRes );
ulMaxRes = 100000;
pNtSetTimerResolution( ulMaxRes, TRUE, &ulCurRes );
pNtQueryTimerResolution( &ulMinRes, &ulMaxRes, &ulCurRes );
printf( "MMR: %d %d %d\n", ulMinRes, ulMaxRes, ulCurRes );
//Dynamically load the DLL at runtime (not linked at compile time)
hInpOutDll = LoadLibrary( "InpOut32.DLL" );
if( hInpOutDll != NULL )
{
gfpOut32 = ( lpOut32 )GetProcAddress( hInpOutDll, "Out32" );
gfpInp32 = ( lpInp32 )GetProcAddress( hInpOutDll, "Inp32" );
gfpIsInpOutDriverOpen
= ( lpIsInpOutDriverOpen )GetProcAddress( hInpOutDll, "IsInpOutDriverOpen" );
if( gfpIsInpOutDriverOpen( ) )
{
gfpOut32( 0xA01, 0x00 );
}
else
{
printf( "unable to create timer system\n\n\n\n\n\n" );
return;
}
}
CreateTimerQueueTimer( &phNewTimer, phNewQueue, TimerProc, NULL, 0, 10,
WT_EXECUTEINTIMERTHREAD );
do
{
Sleep( 1 );
} while( TRUE );
}
void CALLBACK TimerProc(void* lpParameter,
BOOLEAN TimerOrWaitFired)
{
WORD wData;
UNREFERENCED_PARAMETER ( lpParameter );
UNREFERENCED_PARAMETER ( TimerOrWaitFired );
wData = gfpInp32( 0xA00 );
wData++;
gfpOut32( 0xA00, wData );
}
You can use SetThreadPriority to give priority to the critical thread. In this case, you'll probably need to create a thread explicitly and use CreateWaitableTimerEx, SetWaitableTimerEx, and WaitForSingleObjectEx instead of CreateTimerQueueTimer. Make sure the critical thread never spends too long executing between waits, or Windows may stop working properly.
This may not be enough, if the maximum lag is 100 microseconds. You might need to set your process priority class to REALTIME_PRIORITY_CLASS using the SetPriorityClass function, but make sure your program never holds the CPU for long or Windows will stop working properly. In particular, if your program hangs, the entire OS will hang; in this situation, there is no way to stop the program short of turning the power off.
Even this may not be enough. Windows is not a real-time operating system, and it may not be possible to get it to do what you're asking for.
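A minimal sketch of the dedicated-thread approach might look like the following. It uses the plain CreateWaitableTimer/SetWaitableTimer/WaitForSingleObject calls rather than the Ex variants named above, and it re-arms a one-shot timer against a fixed grid of deadlines (start + k * period) so that a late wakeup does not push later ticks back. The names next_deadline and run_10ms_loop are made up for this example. None of this gives a hard guarantee of 100 microsecond accuracy, for the reasons already given.

```c
#include <assert.h>
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Returns the first deadline strictly after `now` on the grid
   start + k * period, so lateness never accumulates into drift. */
static int64_t next_deadline(int64_t start, int64_t period, int64_t now)
{
    assert(period > 0 && now >= start);
    int64_t k = (now - start) / period + 1;
    return start + k * period;
}

#ifdef _WIN32
/* Runs `tick` roughly every 10 ms on the calling thread; never returns
   unless the timer cannot be created. */
static void run_10ms_loop(void (*tick)(void))
{
    LARGE_INTEGER freq, t, due;
    HANDLE timer = CreateWaitableTimer(NULL, FALSE, NULL);
    if (timer == NULL)
        return;

    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL);
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&t);
    int64_t start = t.QuadPart;
    int64_t period = freq.QuadPart / 100;        /* 10 ms in counter ticks */

    for (;;)
    {
        QueryPerformanceCounter(&t);
        int64_t deadline = next_deadline(start, period, t.QuadPart);
        /* A negative due time is relative, in 100 ns units. */
        due.QuadPart = -((deadline - t.QuadPart) * 10000000 / freq.QuadPart);
        SetWaitableTimer(timer, &due, 0, NULL, NULL, FALSE);
        WaitForSingleObject(timer, INFINITE);
        tick();                                  /* must return quickly */
    }
}
#endif
```

The loop would be started on its own thread, with the GPIO toggle from the question as the tick callback. The timer resolution still has to be raised separately, as the question's code already does.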
My experience with Windows is that millisecond timing is not reliable.
I measured the Sleep API with an oscilloscope via the Nusbio device.
Sleep(0) is different from not calling the function at all.
Sleep(5) and Sleep(15) give inconsistent results; sometimes the wait is the same.
If you want this accuracy, you need a microcontroller that can talk to your Windows application.

How does one use VirtualAllocEx to make room for a code cave?

How does one use VirtualAllocEx to make room for a code cave? I am currently in possession of a piece of software with very little "free space", and I read that VirtualAllocEx is used for making this space.
Once the question about the "code cave" is cleared up, you may find the following code interesting: it enumerates the blocks allocated by VirtualAllocEx in the current process and finds all PE images (the DLLs and the EXE itself).
SYSTEM_INFO si;
MEMORY_BASIC_INFORMATION mbi;
DWORD nOffset = 0, cbReturned, dwMem;
GetSystemInfo(&si);
for (dwMem = 0; dwMem<(DWORD)si.lpMaximumApplicationAddress;
dwMem+=mbi.RegionSize) {
cbReturned = VirtualQueryEx (GetCurrentProcess(), (LPCVOID)dwMem, &mbi,
sizeof(mbi));
if (cbReturned) {
if ((mbi.AllocationProtect & PAGE_EXECUTE_WRITECOPY) &&
(mbi.Protect & (PAGE_EXECUTE | PAGE_EXECUTE_READ |
PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY))) {
if (*(LPWORD)mbi.AllocationBase == IMAGE_DOS_SIGNATURE) {
IMAGE_DOS_HEADER *pDosHeader =
(IMAGE_DOS_HEADER *)mbi.AllocationBase;
if (pDosHeader->e_lfanew) {
IMAGE_NT_HEADERS32 *pNtHeader = (IMAGE_NT_HEADERS32 *)
((PBYTE)pDosHeader + pDosHeader->e_lfanew);
if (pNtHeader->Signature != IMAGE_NT_SIGNATURE)
continue;
// now you can examine the module loaded in the current process
}
}
}
}
}
The code may look like a large loop. In reality, for a typical application it makes about 200 iterations, so it goes very quickly through all blocks allocated with VirtualAllocEx during loading of the EXE and all dependent DLLs.
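#include <stdio.h>
#include <windows.h>
#include <commctrl.h>
// Allocates room for one int inside the process that owns the given window.
// The function name is illustrative; listview is a window handle obtained elsewhere.
// Returns the remote address (valid only in that process), or NULL on failure.
int *AllocInOwnerProcess(HWND listview, HANDLE *process)
{
    DWORD pid = 0;
    GetWindowThreadProcessId(listview, &pid);
    *process = OpenProcess(PROCESS_VM_OPERATION | PROCESS_VM_READ | PROCESS_VM_WRITE | PROCESS_QUERY_INFORMATION, FALSE, pid);
    if (*process == NULL)
        return NULL;
    return (int *)VirtualAllocEx(*process, NULL, sizeof(int), MEM_COMMIT, PAGE_READWRITE);
}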
References
- MSDN VirtualAllocEx Function
- CodeProject Stealing Program's Memory
- Stack Overflow What is a code cave... ?
HTH,

Resources