Limiting memory usage for a single process in OS X / Darwin (macos)

I am trying to modify some JNI code to limit the amount of memory that a process can consume. Here is the code I am using to test setrlimit() on Linux and OS X. On Linux it works as expected and buf is NULL.
This code sets the limit to 32 MB and then tries to malloc a 64 MB buffer; if the buffer is NULL, then setrlimit() works.
#include <sys/time.h>
#include <sys/resource.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); \
                        } while (0)

int main(void)
{
    struct rlimit current;
    rlim_t memLimit = 32 * 1024 * 1024;        /* 32 MB */

    if (getrlimit(RLIMIT_AS, &current) != 0)
        errExit("Unable to getrlimit");

    current.rlim_cur = memLimit;
    current.rlim_max = memLimit;
    if (setrlimit(RLIMIT_AS, &current) != 0)
        errExit("Unable to setrlimit");

    printf("Doing malloc\n");
    size_t memSize = 64 * 1024 * 1024;         /* 64 MB */
    char *buf = malloc(memSize);
    if (buf == NULL) {
        printf("You're out of memory\n");
    } else {
        printf("Malloc successful\n");
    }
    free(buf);
    return 0;
}
On a Linux machine this is my result:
memtest]$ ./m200k
Doing malloc
You're out of memory
On OS X 10.8:
./m200k
Doing malloc
Malloc successful
My question is: if this does not work on OS X, is there a way to accomplish this task in the Darwin kernel? The man pages all seem to say it will work, but it does not appear to do so. I have seen that launchctl has some support for limiting memory, but my goal is to add this ability in code. I tried using ulimit as well, but that did not work either, and I am pretty sure ulimit uses setrlimit to set its limits. Also, is there a signal I can catch when the setrlimit soft or hard limit is exceeded? I haven't been able to find one.
Bonus points if it can be accomplished on Windows as well.
Thanks for any advice.
Update
As pointed out, RLIMIT_AS is explicitly defined in the man page, but RLIMIT_RSS is defined as an alias for it, so as far as the headers are concerned, RLIMIT_RSS and RLIMIT_AS are interchangeable on OS X.
/usr/include/sys/resource.h on OS X 10.8:
#define RLIMIT_RSS RLIMIT_AS /* source compatibility alias */
Tested trojanfoe's excellent suggestion to use RLIMIT_DATA, which is described here:
The RLIMIT_DATA limit specifies the maximum amount of bytes the process
data segment can occupy. The data segment for a process is the area in which
dynamic memory is located (that is, memory allocated by malloc() in C, or in C++,
with new()). If this limit is exceeded, calls to allocate new memory will fail.
The result was the same on Linux and OS X: the malloc was successful on both.
chinshaw#osx$ ./m200k
Doing malloc
Malloc successful
chinshaw#redhat ./m200k
Doing malloc
Malloc successful
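Since the Darwin kernel does not appear to enforce RLIMIT_AS or RLIMIT_DATA against malloc(), one workaround is to have the process police itself. Below is a minimal sketch of that idea, assuming a self-imposed 32 MB budget checked with getrusage(); limited_malloc() and SELF_LIMIT are hypothetical names, and ru_maxrss tracks the peak (not current) resident size, so this is only an approximation, not a kernel-enforced limit:
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>

#define SELF_LIMIT (32 * 1024 * 1024)   /* hypothetical self-imposed budget */

/* Peak resident set size in bytes; ru_maxrss is reported in bytes on
   OS X but in kilobytes on Linux. */
static long peak_rss_bytes(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
#ifdef __APPLE__
    return ru.ru_maxrss;
#else
    return ru.ru_maxrss * 1024L;
#endif
}

/* Refuse allocations that would push the process past the budget,
   imitating the NULL-on-failure behavior setrlimit gives on Linux. */
static void *limited_malloc(size_t size)
{
    long used = peak_rss_bytes();
    if (used < 0 || (size_t)used + size > SELF_LIMIT)
        return NULL;
    return malloc(size);
}
This does not cap the process the way the kernel would, but it gives the code a single choke point where allocations fail deterministically.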

Related

How can I get the peak memory of a process on Mac OS?

On Linux, while a process is running, I can check its current and historical peak memory usage by looking at /proc/self/status. Are there similar files on a Mac?
On a Mac, I found that vmmap <pid> gives a lot of info about memory usage, but it seems the peak memory usage of the pid is not monitored. Could anyone help me with a command for this?
A program can use the Mach API to get its own memory statistics. For example:
#include <stdio.h>
#include <stdlib.h>
#include <mach/mach.h>

int main(void)
{
    kern_return_t ret;
    mach_task_basic_info_data_t info;
    mach_msg_type_number_t count = MACH_TASK_BASIC_INFO_COUNT;

    ret = task_info(mach_task_self(), MACH_TASK_BASIC_INFO,
                    (task_info_t)&info, &count);
    if (ret != KERN_SUCCESS || count != MACH_TASK_BASIC_INFO_COUNT)
    {
        fprintf(stderr, "task_info failed: %d\n", ret);
        exit(EXIT_FAILURE);
    }

    printf("resident size max: %llu (0x%08llx) bytes\n",
           (unsigned long long)info.resident_size_max,
           (unsigned long long)info.resident_size_max);
    return 0;
}
Alternatively, you can run your program under Instruments, with the Allocations template, to observe its memory usage. (Xcode itself also has memory gauges, but I don't recall off-hand if it shows peak usage.)

How to apply Unified Memory to existing aligned host memory

I'm involved in an effort to integrate CUDA into some existing software. The software I'm integrating into is pseudo real-time, so it has a memory manager library that manually hands out pointers from a single large allocation made up front. CUDA's Unified Memory is attractive to us, since in theory we'd be able to change this large memory chunk to Unified Memory, keep the existing CPU code working, and add GPU kernels with very few changes to the existing data I/O stream.
Parts of our existing CPU processing code require memory to have a certain alignment. cudaMallocManaged() does not allow me to specify the alignment, and I feel like having to copy between "managed" and strict CPU buffers for those CPU sections almost defeats the purpose of UM. Is there a known way to address this issue that I'm missing?
I found this link on Stack Overflow that seems to solve it in theory, but I've been unable to produce good results with this method. Using CUDA 9.1, Tesla M40 (24GB):
#include <stdio.h>
#include <malloc.h>
#include <cuda_runtime.h>

#define USE_HOST_REGISTER 1

int main(int argc, char **argv)
{
    int num_float = 10;
    int num_bytes = num_float * sizeof(float);
    float *f_data = NULL;

#if (USE_HOST_REGISTER > 0)
    printf("%s: Using memalign + cudaHostRegister..\n", argv[0]);
    f_data = (float *) memalign(32, num_bytes);
    cudaHostRegister((void *) f_data, num_bytes, cudaHostRegisterDefault);
#else
    printf("%s: Using cudaMallocManaged..\n", argv[0]);
    cudaMallocManaged((void **) &f_data, num_bytes);
#endif

    struct cudaPointerAttributes att;
    cudaPointerGetAttributes(&att, f_data);
    printf("%s: ptr is managed: %i\n", argv[0], att.isManaged);
    fflush(stdout);
    return 0;
}
When using memalign() + cudaHostRegister() (USE_HOST_REGISTER == 1), the last print statement prints 0, and device accesses via kernel launches in my larger program unsurprisingly report illegal accesses.
When using cudaMallocManaged() (USE_HOST_REGISTER == 0), the last print statement prints 1 as expected.
edit: cudaHostRegister() and cudaMallocManaged() do return successful error codes for me. I left this error checking out of the sample I shared, but I did check them during my initial integration work. I have just added the checking code back, and both calls still return cudaSuccess.
Thanks for your insights and suggestions.
There is no method currently available in CUDA to take an existing host memory allocation and convert it into a managed memory allocation.
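If the underlying need is just an aligned pointer inside managed memory, one possible workaround (a sketch under assumptions, not an official API; managed_aligned_alloc() is a hypothetical helper) is to over-allocate with cudaMallocManaged() and round the pointer up, the same trick aligned allocators use on top of malloc():
#include <stdio.h>
#include <stdint.h>
#include <cuda_runtime.h>

/* Sketch: managed allocation with a caller-chosen power-of-two alignment.
   The raw pointer must be kept so it can be handed back to cudaFree(). */
static void *managed_aligned_alloc(size_t alignment, size_t size, void **raw)
{
    void *p = NULL;
    if (cudaMallocManaged(&p, size + alignment, cudaMemAttachGlobal) != cudaSuccess)
        return NULL;
    *raw = p;                                   /* pass this to cudaFree() */
    uintptr_t addr = (uintptr_t)p;
    addr = (addr + alignment - 1) & ~(uintptr_t)(alignment - 1);
    return (void *)addr;
}

int main(void)
{
    void *raw = NULL;
    float *f_data = (float *)managed_aligned_alloc(32, 10 * sizeof(float), &raw);
    printf("aligned managed ptr: %p\n", (void *)f_data);
    cudaFree(raw);
    return 0;
}
In practice cudaMallocManaged() already returns generously aligned pointers, so the rounding only matters for alignments larger than what the runtime guarantees.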

why does VirtualAlloc fail for lpAddress greater than 0x6ffffffffff

I wanted to see the limits on how "far" I can point in a 64-bit C program by trying to map addresses as close to the 64-bit limit as possible into valid memory using VirtualAlloc.
I managed to get to 0x6ffffffffff, which is a 43-bit address, but anything above that results in a failure to allocate, with error code 0x57 (The parameter is incorrect).
This is my code:
#include <Windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    LPVOID mem;
    WCHAR message[] = L"Hello, World!";

    mem = VirtualAlloc(
        (LPVOID)0x00006ffffffffff,
        4096,
        MEM_RESERVE | MEM_COMMIT,
        PAGE_READWRITE
    );
    if (mem == 0) {
        wprintf(L"%x.\n", GetLastError());
        system("pause");
        exit(-1);
    }

    memcpy(mem, message, sizeof(message));
    wprintf(L"[%llx] -> %ls\n", (unsigned long long)(UINT_PTR)mem, (WCHAR *)mem);
    system("pause");
    return 0;
}
Why can't I VirtualAlloc above 0x6ffffffffff?
The explanation is that the address you request is outside of the available range. Although there is a theoretical 64-bit range of available addresses, in practice not all of that range is available to a user-mode application. You can find the allowable range by calling GetSystemInfo and inspecting the values of lpMinimumApplicationAddress and lpMaximumApplicationAddress. If you attempt to reserve an address outside of this range, the call to VirtualAlloc will fail and the error code is set to ERROR_INVALID_PARAMETER.
Note that these minimum and maximum values are not precise. You will start observing ERROR_INVALID_PARAMETER when you get close to the limits.
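A minimal sketch of that check, using only documented Win32 calls:
#include <Windows.h>
#include <stdio.h>

int main(void)
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    /* The range of addresses available to user-mode applications. */
    printf("min application address: %p\n", si.lpMinimumApplicationAddress);
    printf("max application address: %p\n", si.lpMaximumApplicationAddress);
    return 0;
}
On older x64 Windows versions with an 8 TB user address space, the reported maximum is close to 0x7FFFFFFFFFF, which is consistent with allocations near 0x6ffffffffff succeeding and higher ones failing.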

How do I tell the MS CRT to use a Low Fragmentation Heap on Windows XP?

It appears that the MS CRT (msvcr80.dll from VS2005 in my case) uses a heap different from the standard process heap returned by GetProcessHeap().
On Windows XP, the default for a heap created with HeapCreate is non-low-fragmentation. How do I tell the CRT to use a Low Fragmentation Heap instead?
See the example here: _get_heap_handle:
intptr_t _get_heap_handle( void );
Returns the handle to the Win32 heap used by the C run-time system.
Use this function if you want to call HeapSetInformation and enable
the Low Fragmentation Heap on the CRT heap.
// crt_get_heap_handle.cpp
// compile with: /MT
#include <windows.h>
#include <malloc.h>
#include <stdio.h>

int main(void)
{
    intptr_t hCrtHeap = _get_heap_handle();
    ULONG ulEnableLFH = 2;
    if (HeapSetInformation((PVOID)hCrtHeap,
                           HeapCompatibilityInformation,
                           &ulEnableLFH, sizeof(ulEnableLFH)))
        puts("Enabling Low Fragmentation Heap succeeded");
    else
        puts("Enabling Low Fragmentation Heap failed");
    return 0;
}
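For comparison, the same HeapCompatibilityInformation call can be made against the default process heap returned by GetProcessHeap(), which, as noted above, is a different heap from the CRT's. A sketch of the analogous call:
#include <windows.h>
#include <stdio.h>

int main(void)
{
    ULONG ulEnableLFH = 2;  /* 2 selects the Low Fragmentation Heap */
    if (HeapSetInformation(GetProcessHeap(),
                           HeapCompatibilityInformation,
                           &ulEnableLFH, sizeof(ulEnableLFH)))
        puts("Enabling LFH on the process heap succeeded");
    else
        puts("Enabling LFH on the process heap failed");
    return 0;
}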

Problems With 64bit Posix Write In Mac OS X? (2gb+ Dataset in HDF5)

I'm having some issues with HDF5 on Mac OS X (10.7). After some testing, I've confirmed that POSIX write() seems to have issues with buffer sizes exceeding 2 GB. I've written a test program to demonstrate the issue:
#define _FILE_OFFSET_BITS 64
#include <cstdio>
#include <cstdint>
#include <unistd.h>
#include <fcntl.h>

void writePosix(const int64_t arraySize, const char* name) {
    int fd = open(name, O_WRONLY | O_CREAT, 0644);
    if (fd != -1) {
        double *array = new double[arraySize];
        double start = 0.0;
        for (int64_t i = 0; i < arraySize; ++i) {
            array[i] = start;
            start += 0.001;
        }
        ssize_t result = write(fd, array, (int64_t)sizeof(double) * arraySize);
        printf("results for array size %lld = %ld\n", arraySize, result);
        close(fd);
        delete[] array;
    } else {
        printf("file error\n");
    }
}

int main(int argc, char *argv[]) {
    writePosix(268435455, "/Users/tpav/testfolder/lessthan2gb");
    writePosix(268435456, "/Users/tpav/testfolder/equal2gb");
}
Output:
results for array size 268435455 = 2147483640
results for array size 268435456 = -1
As you can see, I've even tried defining 64-bit file offsets. Is there anything I can do about this, or should I start looking for a workaround for how I write 2 GB+ chunks?
In the HDF5 virtual file drivers, we break I/O operations that are too large for the call into multiple smaller I/O calls. The Mac implementation of POSIX I/O takes a size_t argument so our code assumed that the max I/O size would be the max value that can fit in a variable of type ssize_t (the return type of read/write). Sadly, this is not the case.
Note that this only applies to single I/O operations. You can create files that go above the 2GB/4GB barrier, you just can't write >2GB in a single call.
This should be fixed in HDF5 1.8.10 patch 1, due out in late January 2013.
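A minimal sketch of the chunked-write workaround described above; the 1 GB chunk size is an arbitrary assumption, anything safely below 2 GB will do:
#include <stdint.h>
#include <unistd.h>

/* Write an arbitrarily large buffer by splitting it into chunks small
   enough for a single write() call on every platform. Also handles the
   short counts write() is allowed to return. */
ssize_t writeAll(int fd, const char *buf, int64_t count) {
    const int64_t kMaxChunk = 1LL << 30;   /* 1 GB */
    int64_t total = 0;
    while (total < count) {
        int64_t remaining = count - total;
        size_t chunk = (size_t)(remaining < kMaxChunk ? remaining : kMaxChunk);
        ssize_t n = write(fd, buf + total, chunk);
        if (n < 0)
            return -1;                     /* propagate the error */
        total += n;
    }
    return (ssize_t)total;
}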
