I wanted to see how "far" I can point in a 64-bit C program by mapping very high addresses, as close to the 64-bit limit as possible, into valid memory with VirtualAlloc.
I managed to get to 0x6ffffffffff, which is a 43-bit address, but any number above that fails to allocate with error code 0x57 (The parameter is incorrect).
This is my code:
#include <Windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    LPVOID mem;
    WCHAR message[] = L"Hello, World!";

    /* Ask for one page at a fixed, very high address. */
    mem = VirtualAlloc(
        (LPVOID)0x00006ffffffffff,
        4096,
        MEM_RESERVE | MEM_COMMIT,
        PAGE_READWRITE
    );
    if (mem == NULL) {
        /* GetLastError() returns a DWORD (unsigned long). */
        wprintf(L"%lx.\n", GetLastError());
        system("pause");
        exit(-1);
    }
    memcpy(mem, message, sizeof(message));
    wprintf(L"[%p] -> %ls\n", mem, (WCHAR *)mem);
    system("pause");
    return 0;
}
Why can't I VirtualAlloc above 0x6ffffffffff?
The explanation is that the address you request is outside of the available range. Although there is a theoretical 64-bit range of addresses, in practice not all of that range is available to a user-mode application. You can find the allowable range by calling GetSystemInfo and inspecting the values of lpMinimumApplicationAddress and lpMaximumApplicationAddress. If you attempt to reserve an address outside of this range, the call to VirtualAlloc fails and the error code is set to ERROR_INVALID_PARAMETER.
Note that these minimum and maximum values are not precise. You will start observing ERROR_INVALID_PARAMETER when you get close to the limits.
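As a rough sketch of that check, the snippet below reads the range with GetSystemInfo and tests a candidate address against it. The helper names address_in_app_range and print_app_range are made up for illustration and are not part of any Windows API; the Windows-only part is guarded by _WIN32 so that the plain range check compiles anywhere.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: returns 1 if [addr, addr + size) lies entirely
   inside the inclusive range [min_addr, max_addr], otherwise 0. */
int address_in_app_range(uintptr_t addr, size_t size,
                         uintptr_t min_addr, uintptr_t max_addr)
{
    if (size == 0 || addr < min_addr || addr > max_addr)
        return 0;
    /* Avoid overflow: compare the size against the room that is left. */
    return (size - 1) <= (max_addr - addr);
}

#ifdef _WIN32
/* Prints the range reported by the system and whether addr falls in it. */
void print_app_range(uintptr_t addr, size_t size)
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    uintptr_t lo = (uintptr_t)si.lpMinimumApplicationAddress;
    uintptr_t hi = (uintptr_t)si.lpMaximumApplicationAddress;

    printf("min %p, max %p, granularity %lu\n",
           si.lpMinimumApplicationAddress,
           si.lpMaximumApplicationAddress,
           (unsigned long)si.dwAllocationGranularity);
    printf("%p is %s the range\n", (void *)addr,
           address_in_app_range(addr, size, lo, hi) ? "inside" : "outside");
}
#endif
```

Calling print_app_range with the address and size you intend to pass to VirtualAlloc shows up front whether the request can possibly succeed on that machine.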
If I'm not wrong, a handle is an index into a table maintained on a per-process basis.
On 64-bit Windows, each entry in this table is made up of an 8-byte address of the kernel object plus a 4-byte access mask, making the entry 12 bytes long. However, as I understand it, each entry is padded to 16 bytes for alignment purposes.
But when you look at the handles opened by a process using Process Explorer, the handle values are multiples of 4. Shouldn't they be multiples of 16 instead?
A Windows handle is just an index per se, so in principle it could be a multiple of 1. It was probably more efficient to implement a word (16 bit value) alignment than the byte alignment you're implying.
The lowest two bits of a kernel handle are called "tag bits" and are available for application use. This has nothing to do with the size of an entry in the handle table.
The comment in ntdef.h (in Include\10.0.x.x\shared) says:
//
// Low order two bits of a handle are ignored by the system and available
// for use by application code as tag bits. The remaining bits are opaque
// and used to store a serial number and table index.
//
#define OBJ_HANDLE_TAGBITS 0x00000003L
My guess is that it's a misuse similar to using the most significant bit of 32-bit pointers as a boolean flag, which is why we have LAA (large address aware) and non-LAA applications.
You could (but should not) add 1, 2 or 3 to a HANDLE and it should not affect other Windows API methods. E.g. WaitForSingleObject():
#include <iostream>
#include <windows.h>

int main()
{
    STARTUPINFO si;
    PROCESS_INFORMATION pi;
    ZeroMemory(&si, sizeof(si));
    si.cb = sizeof(si);
    ZeroMemory(&pi, sizeof(pi));
    auto created = CreateProcess(L"C:\\Windows\\System32\\cmd.exe",
        nullptr, nullptr, nullptr, FALSE, 0, nullptr, nullptr, &si, &pi
    );
    if (created)
    {
        // Set both tag bits; the system should ignore them.
        pi.hProcess = static_cast<byte*>(pi.hProcess) + 3;
        const auto result = WaitForSingleObject(pi.hProcess, INFINITE);
        if (result == 0)
            std::cout << "Completed!\n";
        else
            std::cout << "Failed!\n" << result << "\n";
        CloseHandle(pi.hProcess);
        CloseHandle(pi.hThread);
    }
    else
        std::cout << "Not created";
}
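The tag-bit arithmetic itself can be sketched portably on the numeric value of a handle. The helper names below are made up for illustration; only the mask value 0x3 comes from the ntdef.h comment quoted above.

```c
#include <stdint.h>

/* Same value as OBJ_HANDLE_TAGBITS in ntdef.h. */
#define HANDLE_TAG_MASK ((uintptr_t)0x3)

/* Hypothetical helper: clear the two tag bits. */
uintptr_t handle_strip_tag(uintptr_t h)
{
    return h & ~HANDLE_TAG_MASK;
}

/* Hypothetical helper: read the two tag bits (0..3). */
unsigned handle_get_tag(uintptr_t h)
{
    return (unsigned)(h & HANDLE_TAG_MASK);
}

/* Hypothetical helper: replace the tag bits, keeping the rest of the value. */
uintptr_t handle_set_tag(uintptr_t h, unsigned tag)
{
    return handle_strip_tag(h) | ((uintptr_t)tag & HANDLE_TAG_MASK);
}
```

With the two low bits reserved this way, an untagged handle value always has them clear, which is consistent with the multiples of 4 shown by Process Explorer.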
We have large, old legacy server code running as a 64-bit Windows service.
The service has a memory leak which, at the moment, we do not have the resources to fix.
As the service is resilient to restarts, a temporary, terrible 'solution' we want is to detect when the service's memory exceeds, e.g., 5 GB, and exit the service (which has auto restart for such cases).
My question is which metric I should go for. Is using GlobalMemoryStatusEx to get
MEMORYSTATUSEX.ullTotalVirtual - MEMORYSTATUSEX.ullAvailVirtual right?
GlobalMemoryStatusEx is wrong. You do not want to fill up the machine memory until 5 GB are left in total.
You need GetProcessMemoryInfo.
BOOL WINAPI GetProcessMemoryInfo(
__in HANDLE Process,
__out PPROCESS_MEMORY_COUNTERS ppsmemCounters,
__in DWORD cb
);
From an example using GetProcessMemoryInfo:
#include <windows.h>
#include <stdio.h>
#include <psapi.h>
// To ensure correct resolution of symbols, add Psapi.lib to TARGETLIBS
// and compile with -DPSAPI_VERSION=1
void PrintMemoryInfo( DWORD processID )
{
HANDLE hProcess;
PROCESS_MEMORY_COUNTERS pmc;
// Print the process identifier.
printf( "\nProcess ID: %u\n", processID );
// Print information about the memory usage of the process.
hProcess = OpenProcess( PROCESS_QUERY_INFORMATION |
PROCESS_VM_READ,
FALSE, processID );
if (NULL == hProcess)
return;
if ( GetProcessMemoryInfo( hProcess, &pmc, sizeof(pmc)) )
{
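// SIZE_T is 64 bits wide in a 64-bit build, so print it as unsigned long long.
printf( "\tWorkingSetSize: 0x%016llX\n", (unsigned long long)pmc.WorkingSetSize );
printf( "\tPagefileUsage: 0x%016llX\n", (unsigned long long)pmc.PagefileUsage );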
}
CloseHandle( hProcess );
}
int main( void )
{
// Get the list of process identifiers.
DWORD aProcesses[1024], cbNeeded, cProcesses;
unsigned int i;
if ( !EnumProcesses( aProcesses, sizeof(aProcesses), &cbNeeded ) )
{
return 1;
}
// Calculate how many process identifiers were returned.
cProcesses = cbNeeded / sizeof(DWORD);
// Print the memory usage for each process
for ( i = 0; i < cProcesses; i++ )
{
PrintMemoryInfo( aProcesses[i] );
}
return 0;
}
Although it is unintuitive, you need to read PagefileUsage, which gives you the committed memory allocated by your process. WorkingSetSize is unreliable because, if the machine gets tight on memory, the OS will page data out to the page file. That can leave WorkingSetSize small (e.g. 100 MB) even though you have already leaked 20 GB of memory. The result would be a sawtooth pattern in memory consumption until the page file is full. The working set is only the actively used memory, which can hide a multi-GB memory leak when the machine is under memory pressure.
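For the self-restart case in the question, the service can query itself through GetCurrentProcess() instead of opening a process by ID. The following is only a sketch: commit_exceeds_limit, should_restart and COMMIT_LIMIT_BYTES are hypothetical names, and the 5 GB figure is the example threshold from the question.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#include <psapi.h>   /* link with Psapi.lib */
#endif

/* Example threshold from the question: 5 GB. */
#define COMMIT_LIMIT_BYTES (5ULL * 1024 * 1024 * 1024)

/* Hypothetical helper: 1 if the committed byte count is above the limit. */
int commit_exceeds_limit(uint64_t committed_bytes, uint64_t limit_bytes)
{
    return committed_bytes > limit_bytes;
}

#ifdef _WIN32
/* Returns 1 if this process is over the limit, 0 if not,
   and -1 if the memory counters could not be read. */
int should_restart(void)
{
    PROCESS_MEMORY_COUNTERS pmc;
    if (!GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc)))
        return -1;
    return commit_exceeds_limit((uint64_t)pmc.PagefileUsage,
                                COMMIT_LIMIT_BYTES);
}
#endif
```

A timer or background thread in the service could call should_restart() every few minutes and trigger the normal shutdown path when it returns 1.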
On Linux, when a process is running, I can check its current memory usage and its historical peak memory usage by looking at /proc/self/status. Are there similar files on the Mac?
On the Mac, I found that vmmap pid gives a lot of information about memory usage, but it seems the peak memory usage of the pid is not monitored. Could anyone help me with a command?
A program can use the Mach API to get its own memory statistics. For example:
#include <stdio.h>
#include <mach/mach.h>
#include <stdlib.h>
int main(void)
{
kern_return_t ret;
mach_task_basic_info_data_t info;
mach_msg_type_number_t count = MACH_TASK_BASIC_INFO_COUNT;
ret = task_info(mach_task_self(), MACH_TASK_BASIC_INFO, (task_info_t)&info, &count);
if (ret != KERN_SUCCESS || count != MACH_TASK_BASIC_INFO_COUNT)
{
fprintf(stderr, "task_info failed: %d\n", ret);
exit(EXIT_FAILURE);
}
printf("resident size max: %llu (0x%08llx) bytes\n",
(unsigned long long)info.resident_size_max,
(unsigned long long)info.resident_size_max);
return 0;
}
Alternatively, you can run your program under Instruments, with the Allocations template, to observe its memory usage. (Xcode itself also has memory gauges, but I don't recall off-hand if it shows peak usage.)
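Another option that does not depend on the Mach API is the POSIX getrusage call, whose ru_maxrss field reports the peak resident set size. A minimal sketch follows; the function names are made up for illustration, and note that the unit differs by platform (bytes on macOS, kilobytes on Linux).

```c
#include <stdio.h>
#include <sys/resource.h>

/* Returns ru_maxrss for the calling process, or -1 on failure.
   Units differ by platform: bytes on macOS, kilobytes on Linux. */
long peak_rss_raw(void)
{
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) != 0)
        return -1;
    return usage.ru_maxrss;
}

/* Prints the peak with the unit that applies on the current platform. */
void print_peak_rss(void)
{
    long peak = peak_rss_raw();
#ifdef __APPLE__
    printf("peak resident set size: %ld bytes\n", peak);
#else
    printf("peak resident set size: %ld KB\n", peak);
#endif
}
```

If you only need a command rather than code, running the program as /usr/bin/time -l ./program on macOS prints the maximum resident set size when it exits.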
I want to be able to search through the allocated memory of a process (say you open Notepad, type “HelloWorld”, then run the search looking for the string “HelloWorld”). For 32-bit applications this is not a problem, but for 64-bit applications the large quantity of allocated virtual memory takes hours to search through.
Obviously the vast majority of applications do not use the full amount of virtual memory allocated. I can identify the areas of memory allocated to each process with VirtualQueryEx and read them with ReadProcessMemory, but for 64-bit applications this still takes hours to complete.
Does anyone know of any resources or any methods that could be used to help narrow down the amount of memory to be searched?
It is important that you only scan proper memory. If you just scanned from 0x0 to 0xFFFFFFFFF it would take at least 5 seconds in most processes. You can skip bad regions of memory by checking the memory page settings by using VirtualQueryEx. This will retrieve a MEMORY_BASIC_INFORMATION which will define the state of that memory region.
If MEMORY_BASIC_INFORMATION.State is not MEM_COMMIT, it is bad memory.
If MEMORY_BASIC_INFORMATION.Protect is PAGE_NOACCESS, you also want to skip this memory.
If VirtualQueryEx fails, skip to the next region.
In this manner it should only take 0-2 seconds to scan the memory on your average process because it is only scanning good memory.
char* ScanEx(char* pattern, char* mask, char* begin, intptr_t size, HANDLE hProc)
{
    char* match{ nullptr };
    SIZE_T bytesRead = 0;
    DWORD oldprotect;
    char* buffer{ nullptr };
    MEMORY_BASIC_INFORMATION mbi;
    mbi.RegionSize = 0x1000; // fallback step (one page) if a query fails

    for (char* curr = begin; curr < begin + size; curr += mbi.RegionSize)
    {
        // Skip regions that cannot be queried, are not committed, or are inaccessible.
        if (!VirtualQueryEx(hProc, curr, &mbi, sizeof(mbi))) continue;
        if (mbi.State != MEM_COMMIT || mbi.Protect == PAGE_NOACCESS) continue;

        delete[] buffer;
        buffer = new char[mbi.RegionSize];

        if (VirtualProtectEx(hProc, mbi.BaseAddress, mbi.RegionSize, PAGE_EXECUTE_READWRITE, &oldprotect))
        {
            bytesRead = 0;
            ReadProcessMemory(hProc, mbi.BaseAddress, buffer, mbi.RegionSize, &bytesRead);
            VirtualProtectEx(hProc, mbi.BaseAddress, mbi.RegionSize, oldprotect, &oldprotect);

            char* internalAddr = ScanBasic(pattern, mask, buffer, (intptr_t)bytesRead);
            if (internalAddr != nullptr)
            {
                // The buffer was read from mbi.BaseAddress, so translate the
                // local offset back into the target's address space from there.
                match = (char*)mbi.BaseAddress + (internalAddr - buffer);
                break;
            }
        }
    }
    delete[] buffer;
    return match;
}
ScanBasic is just a standard comparison function which compares your pattern against the buffer.
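The answer does not show ScanBasic, so the following is only a guess at what such a function might look like, written in plain C. The name scan_basic and the mask convention are assumptions: a '?' in the mask means "any byte", any other character means the byte must equal the one in the pattern, and the mask length gives the pattern length.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ScanBasic-style helper. Assumed mask convention:
   mask[i] == '?' matches any byte; any other character requires
   begin[...] to equal pattern[i]. The mask length is the pattern length. */
char *scan_basic(const char *pattern, const char *mask,
                 char *begin, intptr_t size)
{
    intptr_t len = 0;
    while (mask[len] != '\0')
        len++;
    if (len == 0 || size < len)
        return NULL;

    for (intptr_t i = 0; i + len <= size; i++) {
        intptr_t j = 0;
        while (j < len && (mask[j] == '?' || begin[i + j] == pattern[j]))
            j++;
        if (j == len)
            return begin + i;   /* first match inside the local buffer */
    }
    return NULL;
}
```

The returned pointer is an address inside the local buffer, which is why the caller has to translate it back into the target process's address space.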
Second, if you know the address is relative to a module, only scan the address range of that module; you can get the size of the module via CreateToolhelp32Snapshot. If you know it's dynamic memory on the heap, then only scan the heap. You can also get all the heaps with CreateToolhelp32Snapshot and TH32CS_SNAPHEAPLIST.
You can also make a wrapper around this function for scanning the entire address space of the process. It might look something like this:
char* Pattern::Ex::ScanProc(char* pattern, char* mask, ProcEx& proc)
{
unsigned long long int kernelMemory = IsWow64Proc(proc.handle) ? 0x80000000 : 0x800000000000;
return Scan(pattern, mask, 0x0, (intptr_t)kernelMemory, proc.handle);
}
I am trying to modify some JNI code to limit the amount of memory that a process can consume. Here is the code that I am using to test setrlimit on Linux and OS X. On Linux it works as expected and buf is NULL.
This code sets the limit to 32 MB and then tries to malloc a 64 MB buffer; if the buffer is NULL, then setrlimit works.
#include <sys/time.h>
#include <sys/resource.h>
#include <stdio.h>
#include <stdlib.h>

#define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); \
                        } while (0)

int main(void) {
    struct rlimit current;
    int memLimit = 32 * 1024 * 1024;   /* 32 MB */

    int result = getrlimit(RLIMIT_AS, &current);
    if (result != 0)
        errExit("Unable to get rlimit");

    current.rlim_cur = memLimit;
    current.rlim_max = memLimit;
    result = setrlimit(RLIMIT_AS, &current);
    if (result != 0)
        errExit("Unable to setrlimit");

    printf("Doing malloc \n");
    int memSize = 64 * 1024 * 1024;    /* 64 MB */
    char *buf = malloc(memSize);
    if (buf == NULL) {
        printf("Your out of memory\n");
    } else {
        printf("Malloc successsful\n");
    }
    free(buf);
    return 0;
}
On a Linux machine this is my result:
memtest]$ ./m200k
Doing malloc
Your out of memory
On osx 10.8
./m200k
Doing malloc
Malloc successsful
My question is: if this does not work on OS X, is there a way to accomplish this task in the Darwin kernel? The man pages all seem to say it will work, but it does not appear to do so. I have seen that launchctl has some support for limiting memory, but my goal is to add this ability in code. I also tried using ulimit, but this did not work either, and I am pretty sure ulimit uses setrlimit to set limits. Also, is there a signal I can catch when the setrlimit soft or hard limit is exceeded? I haven't been able to find one.
Bonus points if it can be accomplished in windows also.
Thanks for any advice
Update
As pointed out, RLIMIT_AS is explicitly defined in the man page, but the header defines RLIMIT_RSS as an alias for it, so when referring to the documentation, RLIMIT_RSS and RLIMIT_AS are interchangeable on OS X.
/usr/include/sys/resource.h on osx 10.8
#define RLIMIT_RSS RLIMIT_AS /* source compatibility alias */
Tested trojanfoe's excellent suggestion to use RLIMIT_DATA which is described here
The RLIMIT_DATA limit specifies the maximum amount of bytes the process
data segment can occupy. The data segment for a process is the area in which
dynamic memory is located (that is, memory allocated by malloc() in C, or in C++,
with new()). If this limit is exceeded, calls to allocate new memory will fail.
The result was the same on Linux and OS X: the malloc was successful on both.
chinshaw#osx$ ./m200k
Doing malloc
Malloc successsful
chinshaw#redhat ./m200k
Doing malloc
Malloc successsful