SIGKILL while allocating memory in C++ using calloc - memory-management

This question is a follow-up to Why does malloc() or new never return NULL? and SIGKILL while allocating memory in C++:
From the answers there I can understand why a program would be killed when trying to write to memory which was "successfully" allocated by malloc. However, I see the same problem when using calloc (on SLC and Ubuntu):
Instead of returning a null pointer, the program is SIGKILLed, so checking the return value of calloc is futile. But calloc should not be affected by the "overcommit feature"? (Unless it is relying on malloc behind the scenes...)

From proc /proc/sys/vm/overcommit_memory section
The amount of memory presently allocated on the system. The committed memory is a sum of all of the memory which has been allocated by processes, even if it has not been "used" by them as of yet. A process which allocates 1GB of memory (using malloc(3) or similar), but only touches 300MB of that memory will only show up as using 300MB of memory even if it has the address space allocated for the entire 1GB. This 1GB is memory which has been "committed" to by the VM and can be used at any time by the allocating application. With strict overcommit enabled on the system (mode 2 /proc/sys/vm/overcommit_memory), allocations which would exceed the CommitLimit (detailed above) will not be permitted. This is useful if one needs to guarantee that processes will not fail due to lack of memory once that memory has been successfully allocated.
Although only malloc is explicitly listed, it does say similar, calloc (and realloc) are the similar. It has the same issue as malloc on this matter.

Related

windows memory management: Difference between VirtualAlloc() and heap functions [duplicate]

There are lots of method to allocate memory in Windows environment, such as VirtualAlloc, HeapAlloc, malloc, new.
Thus, what's the difference among them?
Each API is for different uses. Each one also requires that you use the correct deallocation/freeing function when you're done with the memory.
VirtualAlloc
A low-level, Windows API that provides lots of options, but is mainly useful for people in fairly specific situations. Can only allocate memory in (edit: not 4KB) larger chunks. There are situations where you need it, but you'll know when you're in one of these situations. One of the most common is if you have to share memory directly with another process. Don't use it for general-purpose memory allocation. Use VirtualFree to deallocate.
HeapAlloc
Allocates whatever size of memory you ask for, not in big chunks than VirtualAlloc. HeapAlloc knows when it needs to call VirtualAlloc and does so for you automatically. Like malloc, but is Windows-only, and provides a couple more options. Suitable for allocating general chunks of memory. Some Windows APIs may require that you use this to allocate memory that you pass to them, or use its companion HeapFree to free memory that they return to you.
malloc
The C way of allocating memory. Prefer this if you are writing in C rather than C++, and you want your code to work on e.g. Unix computers too, or someone specifically says that you need to use it. Doesn't initialise the memory. Suitable for allocating general chunks of memory, like HeapAlloc. A simple API. Use free to deallocate. Visual C++'s malloc calls HeapAlloc.
new
The C++ way of allocating memory. Prefer this if you are writing in C++. It puts an object or objects into the allocated memory, too. Use delete to deallocate (or delete[] for arrays). Visual studio's new calls HeapAlloc, and then maybe initialises the objects, depending on how you call it.
In recent C++ standards (C++11 and above), if you have to manually use delete, you're doing it wrong and should use a smart pointer like unique_ptr instead. From C++14 onwards, the same can be said of new (replaced with functions such as make_unique()).
There are also a couple of other similar functions like SysAllocString that you may be told you have to use in specific circumstances.
It is very important to understand the distinction between memory allocation APIs (in Windows) if you plan on using a language that requires memory management (like C or C++.) And the best way to illustrate it IMHO is with a diagram:
Note that this is a very simplified, Windows-specific view.
The way to understand this diagram is that the higher on the diagram a memory allocation method is, the higher level implementation it uses. But let's start from the bottom.
Kernel-Mode Memory Manager
It provides all memory reservations & allocations for the operating system, as well as support for memory-mapped files, shared memory, copy-on-write operations, etc. It's not directly accessible from the user-mode code, so I'll skip it here.
VirtualAlloc / VirtualFree
These are the lowest level APIs available from the user mode. The VirtualAlloc function basically invokes ZwAllocateVirtualMemory that in turn does a quick syscall to ring0 to relegate further processing to the kernel memory manager. It is also the fastest method to reserve/allocate block of new memory from all available in the user mode.
But it comes with two main conditions:
It only allocates memory blocks aligned on the system granularity boundary.
It only allocates memory blocks of the size that is the multiple of the system granularity.
So what is this system granularity? You can get it by calling GetSystemInfo. It is returned as the dwAllocationGranularity parameter. Its value is implementation (and possibly hardware) specific, but on many 64-bit Windows systems it is set at 0x10000 bytes, or 64K.
So what all this means, is that if you try to allocate, say just an 8 byte memory block with VirtualAlloc:
void* pAddress = VirtualAlloc(NULL, 8, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
If successful, pAddress will be aligned on the 0x10000 byte boundary. And even though you requested only 8 bytes, the actual memory block that you will get will be the entire page (or, something like 4K bytes. The exact page size is returned in the dwPageSize parameter.) But, on top of that, the entire memory block spanning 0x10000 bytes (or 64K in most cases) from pAddress will not be available for any further allocations. So in a sense, by allocating 8 bytes you could as well be asking for 65536.
So the moral of the story here is not to substitute VirtualAlloc for generic memory allocations in your application. It must be used for very specific cases, as is done with the heap below. (Usually for reserving/allocating large blocks of memory.)
Using VirtualAlloc incorrectly can lead to severe memory fragmentation.
HeapCreate / HeapAlloc / HeapFree / HeapDestroy
In a nutshell, the heap functions are basically a wrapper for VirtualAlloc function. Other answers here provide a pretty good concept of it. I'll add that, in a very simplistic view, the way heap works is this:
HeapCreate reserves a large block of virtual memory by calling VirtualAlloc internally (or ZwAllocateVirtualMemory to be specific). It also sets up an internal data structure that can track further smaller size allocations within the reserved block of virtual memory.
Any calls to HeapAlloc and HeapFree do not actually allocate/free any new memory (unless, of course the request exceeds what has been already reserved in HeapCreate) but instead they meter out (or commit) a previously reserved large chunk, by dissecting it into smaller memory blocks that a user requests.
HeapDestroy in turn calls VirtualFree that actually frees the virtual memory.
So all this makes heap functions perfect candidates for generic memory allocations in your application. It is great for arbitrary size memory allocations. But a small price to pay for the convenience of the heap functions is that they introduce a slight overhead over VirtualAlloc when reserving larger blocks of memory.
Another good thing about heap is that you don't really need to create one. It is generally created for you when your process starts. So one can access it by calling GetProcessHeap function.
malloc / free
Is a language-specific wrapper for the heap functions. Unlike HeapAlloc, HeapFree, etc. these functions will work not only if your code is compiled for Windows, but also for other operating systems (such as Linux, etc.)
This is a recommended way to allocate/free memory if you program in C. (Unless, you're coding a specific kernel mode device driver.)
new / delete
Come as a high level (well, for C++) memory management operators. They are specific for the C++ language, and like malloc for C, are also the wrappers for the heap functions. They also have a whole bunch of their own code that deals C++-specific initialization of constructors, deallocation in destructors, raising an exception, etc.
These functions are a recommended way to allocate/free memory and objects if you program in C++.
Lastly, one comment I want to make about what has been said in other responses about using VirtualAlloc to share memory between processes. VirtualAlloc by itself does not allow sharing of its reserved/allocated memory with other processes. For that one needs to use CreateFileMapping API that can create a named virtual memory block that can be shared with other processes. It can also map a file on disk into virtual memory for read/write access. But that is another topic.
VirtualAlloc is a specialized allocation of the OS virtual memory (VM) system. Allocations in the VM system must be made at an allocation granularity which (the allocation granularity) is architecture dependent. Allocation in the VM system is one of the most basic forms of memory allocation. VM allocations can take several forms, memory is not necessarily dedicated or physically backed in RAM (though it can be). VM allocation is typically a special purpose type of allocation, either because of the allocation has to
be very large,
needs to be shared,
must be aligned on a particular value (performance reasons) or
the caller need not use all of this memory at once...
etc...
HeapAlloc is essentially what malloc and new both eventually call. It is designed to be very fast and usable under many different types of scenarios of a general purpose allocation. It is the "Heap" in a classic sense. Heaps are actually setup by a VirtualAlloc, which is what is used to initially reserve allocation space from the OS. After the space is initialized by VirtualAlloc, various tables, lists and other data structures are configured to maintain and control the operation of the HEAP. Some of that operation is in the form of dynamically sizing (growing and shrinking) the heap, adapting the heap to particular usages (frequent allocations of some size), etc..
new and malloc are somewhat the same, malloc is essentially an exact call into HeapAlloc( heap-id-default ); new however, can [additionally] configure the allocated memory for C++ objects. For a given object, C++ will store vtables on the heap for each caller. These vtables are redirects for execution and form part of what gives C++ it's OO characteristics like inheritance, function overloading, etc...
Some other common allocation methods like _alloca() and _malloca() are stack based; FileMappings are really allocated with VirtualAlloc and set with particular bit flags which designate those mappings to be of type FILE.
Most of the time, you should allocate memory in a way which is consistent with the use of that memory ;). new in C++, malloc for C, VirtualAlloc for massive or IPC cases.
*** Note, large memory allocations done by HeapAlloc are actually shipped off to VirtualAlloc after some size (couple hundred k or 16 MB or something I forget, but fairly big :) ).
*** EDIT
I briefly remarked about IPC and VirtualAlloc, there is also something very neat about a related VirtualAlloc which none of the responders to this question have discussed.
VirtualAllocEx is what one process can use to allocate memory in an address space of a different process. Most typically, this is used in combination to get remote execution in the context of another process via CreateRemoteThread (similar to CreateThread, the thread is just run in the other process).
In outline:
VirtualAlloc, HeapAlloc etc. are Windows APIs that allocate memory of various types from the OS directly. VirtualAlloc manages pages in the Windows virtual memory system, while HeapAlloc allocates from a specific OS heap. Frankly, you are unlikely to ever need to use eiither of them.
malloc is a Standard C (and C++) library function that allocates memory to your process. Implementations of malloc will typically use one of the OS APIs to create a pool of memory when your app starts and then allocate from it as you make malloc requests
new is a Standard C++ operator which allocates memory and then calls constructors appropriately on that memory. It may be implemented in terms of malloc or in terms of the OS APIs, in which case it too will typically create a memory pool on application startup.
VirtualAlloc ===> sbrk() under UNIX
HeapAlloc ====> malloc() under UNIX
VirtualAlloc => Allocates straight into virtual memory, you reserve/commit in blocks. This is great for large allocations, for example large arrays.
HeapAlloc / new => allocates the memory on the default heap (or any other heap that you may create). This allocates per object and is great for smaller objects. The default heap is serializable therefore it has guarantee thread allocation (this can cause some issues on high performance scenarios and that's why you can create your own heaps).
malloc => uses the C runtime heap, similar to HeapAlloc but it is common for compatibility scenarios.
In a nutshell, the heap is just a chunk of virtual memory that is governed by a heap manager (rather than raw virtual memory)
The last model on the memory world is memory mapped files, this scenario is great for large chunk of data (like large files). This is used internally when you open an EXE (it does not load the EXE in memory, just creates a memory mapped file).

Linux page allocation

In linux if malloc can't allocate data chunk from preallocated page. it uses mmap() or brk() to assign new pages. I wanted to clarify a few things :
I don't understand the following statment , I thought that when I use mmap() or brk() the kernel maps me a whole page. but the allocator alocates me only what I asked for from that new page? In this case trying to access unallocated space (In the newly mapped page) will not cause a page fault? (I understand that its not recommended )
The libc allocator manages each page: slicing them into smaller blocks, assigning them to processes, freeing them, and so on. For example, if your program uses 4097 bytes total, you need to use two pages, even though in reality the allocator gives you somewhere between 4105 to 4109 bytes
How does the allocator knows the VMA bounders ?(I assume no system call used) because the VMA struct that hold that information can only get accessed from kernel mode?
The system-level memory allocation (via mmap and brk) is all page-aligned and page-sized. This means if you use malloc (or other libc API that allocates memory) some small amount of memory (say 10 bytes), you are guaranteed that all the other bytes on that page of memory are readable without triggering a page fault.
Malloc and family do their own bookkeeping within the pages returned from the OS, so the mmap'd pages used by libc also contain a bunch of malloc metadata, in addition to whatever space you've allocated.
The libc allocator knows where everything is because it invokes the brk() and mmap() calls. If it invokes mmap() it passes in a size, and the kernel returns a start address. Then the libc allocator just stores those values in its metadata.
Doug Lea's malloc implementation is a very, very well documented memory allocator implementation and its comments will shed a lot of light on how allocators work in general:
http://g.oswego.edu/dl/html/malloc.html

Impact of memory leak on other process

I have a query related to memory leak.
A 32 bit Linux based system is running multiple active processes A,B,C,D. All the processes are allocating/deallocating memory from the heap. Now if process A is contionusly leaking a significant amount of memory, could it happen that after a certain amount of time process B cant find any memory to allocate from the heap?
As per my understanding, each process is provided with a unque VM of 2GB from the OS. But there is a mappig between the VM and the physical memory.
Yes, if the total amount of VM (RAM + swap space) is exhausted by process A, then malloc in any of the other processes might fail because of that. Linux hides processes' memory spaces from other processes, but it doesn't magically create extra memory in your machine. (Although it may seem to do so due to its overcommit behavior.)
In addition, Linux may employ its OOM killer when memory is running low.
Linux kernel does memory overcommit by default.
When a process request a memory segment with malloc() the memory is not automatically allocated.
You may have 4 processes malloc()ing 2gb each and not having any problem.
The problem arise when the process make use (initialize, bzero, copy) of the malloc()ed memory.
You may even malloc more memory than the system may reserve for you, without any problem, and malloc() doesn't even return NULL!!

memory allocation vs. swapping (under Windows)

sorry for my rather general question, but I could not find a definite answer to it:
Given that I have free swap memory left and I allocate memory in reasonable chunks (~1MB) -> can memory allocation still fail for any reason?
The smartass answer would be "yes, memory allocation can fail for any reason". That may not be what you are looking for.
Generally, whether your system has free memory left is not related to whether allocations succeed. Rather, the question is whether your process address space has free virtual address space.
The allocator (malloc, operator new, ...) first looks if there is free address space in the current process that is already mapped, that is, the kernel is aware that the addresses should be usable. If there is, that address space is reserved in the allocator and returned.
Otherwise, the kernel is asked to map new address space to the process. This may fail, but generally doesn't, as mapping does not imply using physical memory yet -- it is just a promise that, should someone try to access this address, the kernel will try to find physical memory and set up the MMU tables so the virtual->physical translation finds it.
When the system is out of memory, there is no physical memory left, the process is suspended and the kernel attempts to free physical memory by moving other processes' memory to disk. The application does not notice this, except that executing a single assembler instruction apparently took a long time.
Memory allocations in the process fail if there is no mapped free region large enough and the kernel refuses to establish a mapping. For example, not all virtual addresses are useable, as most operating systems map the kernel at some address (typically, 0x80000000, 0xc0000000, 0xe0000000 or something such on 32 bit architectures), so there is a per-process limit that may be lower than the system limit (for example, a 32 bit process on Windows can only allocate 2 GB, even if the system is 64 bit). File mappings (such as the program itself and DLLs) further reduce the available space.
A very general and theoretical answer would be no, it can not. One of the reasons it could possibly under very peculiar circumstances fail is that there would be some weird fragmentation of your available / allocatable memory. I wonder whether you're trying get (probably very minor) performance boost (skipping if pointer == NULL - kind of thing) or you're just wondering and want to discuss it, in which case you should probably use chat.
Yes, memory allocation often fails when you run out of memory space in a 32-bit application (can be 2, 3 or 4 GB depending on OS version and settings). This would be due to a memory leak. It can also fail if your OS runs out of space in your swap file.

What's all this uncommitted, reserved memory in my process?

I'm using VMMap from SysInternals to look at memory allocated by my Win32 C++ process on WinXP, and I see a bunch of allocations where portions of the allocated memory are reserved but not committed. As far as I can tell, from my reading and testing, all of the common memory allocators (e.g., malloc, new, LocalAlloc, GlobalAlloc) used in a C++ program always allocate fully committed blocks of memory.
Heaps are a common example of code that reserves memory but doesn't commit it until needed. I suspect that some of these blocks are Windows/CRT heaps, but there appears to be more of these types of blocks than I would expect for heaps. I see on the order of 30 of these blocks in my process, between 64k and 8MB in size, and I know that my code never intentionally calls VirtualAlloc to allocate reserved, uncommitted memory.
Here are a couple of examples from VMMap: http://www.flickr.com/photos/95123032#N00/5280550393/
What else would allocate such blocks of memory, where much of it is reserved but not committed? Would it make sense that my process has 30 heaps? Thanks.
I figured it out - it's the CRT heap that gets allocated by calls to malloc. If you allocate a large chunk of memory (e.g., 2 MB) using malloc, it allocates a single committed block of memory. But if you allocate smaller chunks (say 177kb), then it will reserve a 1 MB chunk of memory, but only commit approximately what you asked for (e.g., 184kb for my 177kb request).
When you free that small chunk, that larger 1 MB chunk is not returned to the OS. Everything but 4k is uncommitted, but the full 1 MB is still reserved. If you then call malloc again, it will attempt to use that 1 MB chunk to satisfy your request. If it can't satisfy your request with the memory that it's already reserved, it will allocate a new chunk of memory that's twice the previous allocation (in my case it went from 1 MB to 2 MB). I'm not sure if this pattern of doubling continues or not.
To actually return your freed memory to the OS, you can call _heapmin. I would think that this would make a future large allocation more likely to succeed, but it would all depend on memory fragmentation, and perhaps heapmin already gets called if an allocation fails (?), I'm not sure. There would also be a performance hit since heapmin would release the memory (taking time) and malloc would then need to re-allocate it from the OS when needed again. This information is for Windows/32 XP, your mileage may vary.
UPDATE: In my testing, heapmin did absolutely nothing. And the malloc heap is only used for blocks that are less than 512kb. Even if there are MBs of contiguous free space in the malloc heap, it will not use it for requests over 512kb. In my case, this freed, unused, yet reserved malloc memory chewed up huge parts of my process' 2GB address space, eventually leading to memory allocation failures. And since heapmin doesn't return the memory to the OS, I haven't found any solution to this problem, other than restarting my process or writing my own memory manager.
Whenever a thread is created in your application a certain (configurable) amount of memory will be reserved in the address space for the call stack of the thread. There's no need to commit all the reserved memory unless your thread is actually going to need all of that memory. So only a portion needs to be committed.
If more than the committed amount of memory is required, it will be possible to obtain more system memory.
The practical consideration is that the reserved memory is a hard limit on the stack size that reduces address space available to the application. However, by only committing a portion of the reserve, we don't have to consume the same amount of memory from the system until needed.
Therefore it is possible for each thread to have a portion of reserved uncommitted memory. I'm unsure what the page type will be in those cases.
Could they be the DLLs loaded into your process? DLLs (and the executable) are memory mapped into the process address space. I believe this initially just reserves space. The space is backed by the files themselves (at least initially) rather than the pagefile.
Only the code that's actually touched gets paged in. If I understand the terminology correctly, that's when it's committed.
You could confirm this by running your application in a debugger and looking at the modules that are loaded and comparing their locations and sizes to what you see in VMMap.

Resources