Default heap for a process - windows

I read this article which is about Managing heap memory written by Randy Kath.
I would ask about this segment:
Every process in Windows has one heap called the default heap.
Processes can also have as many other dynamic heaps as they wish,
simply by creating and destroying them on the fly. The system uses the
default heap for all global and local memory management functions, and
the C run-time library uses the default heap for supporting malloc
functions.
I did not underestand what is the function or benefit of the default heap?
Also, I have another question, the author referred always to reserved and committed space, what is meant by committed space?

Applications need heaps to allocate dynamic memory. Windows automatically creates one heap for each process. This is the default heap. Most apps just use this single default heap.
Committing is the act of assigning reserved virtual addresses to specific memory so that it is available for use by the process. I suggest you read this article on MSDN: Managing Virtual Memory.

Related

Why does windows allow to create private heap?

I am learning memory managment in Windows. I know that process in windows has by default its heap, that can be extended in future. Also process can create additional (private) heaps. Why does windows allow to create private heap? What is benefit of such approach? As I understand usage of default heap (with possible reallocations) is enough. Or maybe is it another way to optimize reallocations?
If you look at HeapCreate you will see that it has multiple options that changes how the heap works. HEAP_NO_SERIALIZE will make it faster but you have to handle thread synchronization on your own etc.
Having multiple heaps can also be beneficial if you allocate objects of different sizes with different lifetimes. You might want to put large long-living objects on their own heap if you also have a high churn of small objects that are allocated and de-allocated as part of your work to reduce fragmentation (and lock contention if you are multithreaded).
As noted in a comment, you can call HeapDestroy to free every allocation and the heap itself in one call but this only makes sense if you have full control over everything allocated there. You are not allowed to destroy the default heap so you must create your own private heap to use this trick.

windows memory management: Difference between VirtualAlloc() and heap functions [duplicate]

There are lots of method to allocate memory in Windows environment, such as VirtualAlloc, HeapAlloc, malloc, new.
Thus, what's the difference among them?
Each API is for different uses. Each one also requires that you use the correct deallocation/freeing function when you're done with the memory.
VirtualAlloc
A low-level, Windows API that provides lots of options, but is mainly useful for people in fairly specific situations. Can only allocate memory in (edit: not 4KB) larger chunks. There are situations where you need it, but you'll know when you're in one of these situations. One of the most common is if you have to share memory directly with another process. Don't use it for general-purpose memory allocation. Use VirtualFree to deallocate.
HeapAlloc
Allocates whatever size of memory you ask for, not in big chunks than VirtualAlloc. HeapAlloc knows when it needs to call VirtualAlloc and does so for you automatically. Like malloc, but is Windows-only, and provides a couple more options. Suitable for allocating general chunks of memory. Some Windows APIs may require that you use this to allocate memory that you pass to them, or use its companion HeapFree to free memory that they return to you.
malloc
The C way of allocating memory. Prefer this if you are writing in C rather than C++, and you want your code to work on e.g. Unix computers too, or someone specifically says that you need to use it. Doesn't initialise the memory. Suitable for allocating general chunks of memory, like HeapAlloc. A simple API. Use free to deallocate. Visual C++'s malloc calls HeapAlloc.
new
The C++ way of allocating memory. Prefer this if you are writing in C++. It puts an object or objects into the allocated memory, too. Use delete to deallocate (or delete[] for arrays). Visual studio's new calls HeapAlloc, and then maybe initialises the objects, depending on how you call it.
In recent C++ standards (C++11 and above), if you have to manually use delete, you're doing it wrong and should use a smart pointer like unique_ptr instead. From C++14 onwards, the same can be said of new (replaced with functions such as make_unique()).
There are also a couple of other similar functions like SysAllocString that you may be told you have to use in specific circumstances.
It is very important to understand the distinction between memory allocation APIs (in Windows) if you plan on using a language that requires memory management (like C or C++.) And the best way to illustrate it IMHO is with a diagram:
Note that this is a very simplified, Windows-specific view.
The way to understand this diagram is that the higher on the diagram a memory allocation method is, the higher level implementation it uses. But let's start from the bottom.
Kernel-Mode Memory Manager
It provides all memory reservations & allocations for the operating system, as well as support for memory-mapped files, shared memory, copy-on-write operations, etc. It's not directly accessible from the user-mode code, so I'll skip it here.
VirtualAlloc / VirtualFree
These are the lowest level APIs available from the user mode. The VirtualAlloc function basically invokes ZwAllocateVirtualMemory that in turn does a quick syscall to ring0 to relegate further processing to the kernel memory manager. It is also the fastest method to reserve/allocate block of new memory from all available in the user mode.
But it comes with two main conditions:
It only allocates memory blocks aligned on the system granularity boundary.
It only allocates memory blocks of the size that is the multiple of the system granularity.
So what is this system granularity? You can get it by calling GetSystemInfo. It is returned as the dwAllocationGranularity parameter. Its value is implementation (and possibly hardware) specific, but on many 64-bit Windows systems it is set at 0x10000 bytes, or 64K.
So what all this means, is that if you try to allocate, say just an 8 byte memory block with VirtualAlloc:
void* pAddress = VirtualAlloc(NULL, 8, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
If successful, pAddress will be aligned on the 0x10000 byte boundary. And even though you requested only 8 bytes, the actual memory block that you will get will be the entire page (or, something like 4K bytes. The exact page size is returned in the dwPageSize parameter.) But, on top of that, the entire memory block spanning 0x10000 bytes (or 64K in most cases) from pAddress will not be available for any further allocations. So in a sense, by allocating 8 bytes you could as well be asking for 65536.
So the moral of the story here is not to substitute VirtualAlloc for generic memory allocations in your application. It must be used for very specific cases, as is done with the heap below. (Usually for reserving/allocating large blocks of memory.)
Using VirtualAlloc incorrectly can lead to severe memory fragmentation.
HeapCreate / HeapAlloc / HeapFree / HeapDestroy
In a nutshell, the heap functions are basically a wrapper for VirtualAlloc function. Other answers here provide a pretty good concept of it. I'll add that, in a very simplistic view, the way heap works is this:
HeapCreate reserves a large block of virtual memory by calling VirtualAlloc internally (or ZwAllocateVirtualMemory to be specific). It also sets up an internal data structure that can track further smaller size allocations within the reserved block of virtual memory.
Any calls to HeapAlloc and HeapFree do not actually allocate/free any new memory (unless, of course the request exceeds what has been already reserved in HeapCreate) but instead they meter out (or commit) a previously reserved large chunk, by dissecting it into smaller memory blocks that a user requests.
HeapDestroy in turn calls VirtualFree that actually frees the virtual memory.
So all this makes heap functions perfect candidates for generic memory allocations in your application. It is great for arbitrary size memory allocations. But a small price to pay for the convenience of the heap functions is that they introduce a slight overhead over VirtualAlloc when reserving larger blocks of memory.
Another good thing about heap is that you don't really need to create one. It is generally created for you when your process starts. So one can access it by calling GetProcessHeap function.
malloc / free
Is a language-specific wrapper for the heap functions. Unlike HeapAlloc, HeapFree, etc. these functions will work not only if your code is compiled for Windows, but also for other operating systems (such as Linux, etc.)
This is a recommended way to allocate/free memory if you program in C. (Unless, you're coding a specific kernel mode device driver.)
new / delete
Come as a high level (well, for C++) memory management operators. They are specific for the C++ language, and like malloc for C, are also the wrappers for the heap functions. They also have a whole bunch of their own code that deals C++-specific initialization of constructors, deallocation in destructors, raising an exception, etc.
These functions are a recommended way to allocate/free memory and objects if you program in C++.
Lastly, one comment I want to make about what has been said in other responses about using VirtualAlloc to share memory between processes. VirtualAlloc by itself does not allow sharing of its reserved/allocated memory with other processes. For that one needs to use CreateFileMapping API that can create a named virtual memory block that can be shared with other processes. It can also map a file on disk into virtual memory for read/write access. But that is another topic.
VirtualAlloc is a specialized allocation of the OS virtual memory (VM) system. Allocations in the VM system must be made at an allocation granularity which (the allocation granularity) is architecture dependent. Allocation in the VM system is one of the most basic forms of memory allocation. VM allocations can take several forms, memory is not necessarily dedicated or physically backed in RAM (though it can be). VM allocation is typically a special purpose type of allocation, either because of the allocation has to
be very large,
needs to be shared,
must be aligned on a particular value (performance reasons) or
the caller need not use all of this memory at once...
etc...
HeapAlloc is essentially what malloc and new both eventually call. It is designed to be very fast and usable under many different types of scenarios of a general purpose allocation. It is the "Heap" in a classic sense. Heaps are actually setup by a VirtualAlloc, which is what is used to initially reserve allocation space from the OS. After the space is initialized by VirtualAlloc, various tables, lists and other data structures are configured to maintain and control the operation of the HEAP. Some of that operation is in the form of dynamically sizing (growing and shrinking) the heap, adapting the heap to particular usages (frequent allocations of some size), etc..
new and malloc are somewhat the same, malloc is essentially an exact call into HeapAlloc( heap-id-default ); new however, can [additionally] configure the allocated memory for C++ objects. For a given object, C++ will store vtables on the heap for each caller. These vtables are redirects for execution and form part of what gives C++ it's OO characteristics like inheritance, function overloading, etc...
Some other common allocation methods like _alloca() and _malloca() are stack based; FileMappings are really allocated with VirtualAlloc and set with particular bit flags which designate those mappings to be of type FILE.
Most of the time, you should allocate memory in a way which is consistent with the use of that memory ;). new in C++, malloc for C, VirtualAlloc for massive or IPC cases.
*** Note, large memory allocations done by HeapAlloc are actually shipped off to VirtualAlloc after some size (couple hundred k or 16 MB or something I forget, but fairly big :) ).
*** EDIT
I briefly remarked about IPC and VirtualAlloc, there is also something very neat about a related VirtualAlloc which none of the responders to this question have discussed.
VirtualAllocEx is what one process can use to allocate memory in an address space of a different process. Most typically, this is used in combination to get remote execution in the context of another process via CreateRemoteThread (similar to CreateThread, the thread is just run in the other process).
In outline:
VirtualAlloc, HeapAlloc etc. are Windows APIs that allocate memory of various types from the OS directly. VirtualAlloc manages pages in the Windows virtual memory system, while HeapAlloc allocates from a specific OS heap. Frankly, you are unlikely to ever need to use eiither of them.
malloc is a Standard C (and C++) library function that allocates memory to your process. Implementations of malloc will typically use one of the OS APIs to create a pool of memory when your app starts and then allocate from it as you make malloc requests
new is a Standard C++ operator which allocates memory and then calls constructors appropriately on that memory. It may be implemented in terms of malloc or in terms of the OS APIs, in which case it too will typically create a memory pool on application startup.
VirtualAlloc ===> sbrk() under UNIX
HeapAlloc ====> malloc() under UNIX
VirtualAlloc => Allocates straight into virtual memory, you reserve/commit in blocks. This is great for large allocations, for example large arrays.
HeapAlloc / new => allocates the memory on the default heap (or any other heap that you may create). This allocates per object and is great for smaller objects. The default heap is serializable therefore it has guarantee thread allocation (this can cause some issues on high performance scenarios and that's why you can create your own heaps).
malloc => uses the C runtime heap, similar to HeapAlloc but it is common for compatibility scenarios.
In a nutshell, the heap is just a chunk of virtual memory that is governed by a heap manager (rather than raw virtual memory)
The last model on the memory world is memory mapped files, this scenario is great for large chunk of data (like large files). This is used internally when you open an EXE (it does not load the EXE in memory, just creates a memory mapped file).

Java Card memory leak in for loop?

I know that Java Card VM's doesn't have have a garbage collector, but what happens with a for loop:
for(short x=0;x<10;x++)
{}
Does the x variable get utilized after the for loop, or it turns into garbage?
Just in case I have a transient byte array called index from size of 2 (instead of i in for loop) and I use the array in for loops:
for(index[0]=0;index[0]<10;index[0]++)
{}
But it is a little slower than the first version. If I use a normal variable for the index in a for loop then it gets really slow.
So, what happens with the x variable in the first for loop? Is it safe to use for loops like that, or not?
The x variable does not really exist in byte code. There are operations on a location in the Java stack that represents x (be it Java byte code or the code after conversion by the Java Card converter). Now the Java stack is a virtual stack. This stack is implemented on a CPU that has registers and a non-virtual stack. In general, if there are enough registers available, then the variable x is simply put in a register until it is out of scope. The register may of course be reused. The CPU stack itself is a LIFO (last in first out) queue in transient memory (RAM). The stack continuously grows and shrinks during the execution of the byte code that makes up your Applet. Like registers, the stack memory is reused over and over again. All the local variables (those defined inside code blocks as well as method arguments) are treated this way.
If you put your variable in a transient array then you are putting the variable on a RAM based heap. The Java Card RAM heap will never go out of scope. That means that if you update the value that the change needs to be written to transient memory. That is of course slower than a localized update of a CPU register, as you found by experimentation. Usually the memory in the transient memory is never freed. That said, you can of course reuse the memory for other purposes, as long as you have a reference to the array. Note that the references themselves (the index in index[0]) may be either in persistent memory (EEPROM or flash) or transient memory.
It's unclear what you mean with "normal variable". If this is something that has been created with new or if it is a field in an object instance then it persists in the heap in persistent memory (EEPROM or flash). EEPROM and flash have limited amount of write cycles and writing to EEPROM or flash is much much slower than writing to RAM.
Java Card contains two kinds of transient memory: CLEAR_ON_RESET and CLEAR_ON_DESELECT. The difference between the two is that the CLEAR_ON_RESET allows memory to be shared between Applet instances while the CLEAR_ON_DESELECT allows memory to be reused by different Applets.
Java Card classic doesn't contain a garbage collector that runs during Applet execution, you can usually only request garbage collection during startup using JCSystem.requestObjectDeletion() which will clean up the memory that is not referenced anymore, both on the heap in transient memory as well as in persistent memory. Cleaning up the memory means scanning all the memory, marking all the unreferenced blocks and then compacting the memory. This is similar to defragmenting a hard disk; it can take a uncomfortably long time.
ROM is filled during the manufacturing phase. It may contain the operating system, the Java Card API implementation, byte code (including constants) of pre-loaded applets etc. It only be read in the field, so it isn't of any consequence to the question asked.
Let us have a short introduction about the Memory. In brief, there is 3 types of memories in the Smart cards as below:
ROM (And sometimes FLASH)
EEPROM
RAM
ROM:
The card's OS and Java Card APIs and some factory proprietary packages stored here. The contents of this memory is fixed and you can't modify it. Writing in this memory happens only once in the chip production and the process is named Masking.
EEPROM:
This is modifiable memory that your applets load into and it is consist of 4 sections named as below:
Text : also known as code segment contains the machine instructions of the program. The code can be thought of like the text of a novel: It tells the story of what the program does
Data : contains the static data of the program, i.e. the variables that exist throughout program execution. Global variables in a C or C++ program are static, as are variables declared as static in C, C++, or Java.
Heap : is a pool of memory used for dynamically allocated memory, such as with malloc() in C or new in C++ and Java.
Stack : contains the system stack, which is used as temporary storage.
A power-less (Card tearing for example) doesn't have any effect on the contents of this memory.
RAM:
This is a modifiable type of memory also. There is three main difference between RAM and EEPROM:
RAM is really faster than EEPROM. (1000 times faster)
Contents of RAM will destroyed in the power-loss.
The number of writing in EEPROM is limited (Typically 100.000 times.) while RAM has a really higher number.
And What now?
When you write for(short x=0; x<10; x++), you define x as a local variable. Local variables stores in Stack. The stack pointer will reset on the power loss and the used stack part will reclaim. So the main problem of the absence of Garbage Collector is about Heap.
i.e when you define a local variable using new keyword, you specify that part of Heap to a local variable for ever. When the Runtime Environment finished that method, the object will destroy and become unavailable, while that section of Heap doesn't reclaimed. So you will lose that part of Heap. The case that you used for yourfor loop, seems OK and doesn't make any problem because you didn't use new keyword.
Note that in newer versions of Java Card (2.2.2 and higher) there is a manual Garbage Collector (look JCSystem.requestObjectDeletion documentation). But consider that it is really slow and even dangerous in some situations(Look Java Card power less during garbage collection question).

How does the operating system know how much memory my app is using? (And why doesn't it do garbage collection?)

When my task manager (top, ps, taskmgr.exe, or Finder) says that a process is using XXX KB of memory, what exactly is it counting, and how does it get updated?
In terms of memory allocation, does an application written in C++ "appear" different to an operating system from an application that runs as a virtual machine (managed code like .NET or Java)?
And finally, if memory is so transparent - why is garbage collection not a function-of or service-provided-by the operating system?
As it turns out, what I was really interested in asking is WHY the operating system could not do garbage collection and defrag memory space - which I see as a step above "simply" allocating address space to processes.
These answers help a lot! Thanks!
This is a big topic that I can't hope to adequately answer in a single answer here. I recommend picking up a copy of Windows Internals, it's an invaluable resource. Eric Lippert had a recent blog post that is a good description of how you can view memory allocated by the OS.
Memory that a process is using is basically just address space that is reserved by the operating system that may be backed by physical memory, the page file, or a file. This is the same whether it is a managed application or a native application. When the process exits, the operating system deletes the memory that it had allocated for it - the virtual address space is simply deleted and the page file or physical memory backings are free for other processes to use. This is all the OS really maintains - mappings of address space to some physical resource. The mappings can shift as processes demand more memory or are idle - physical memory contents can be shifted to disk and vice versa by the OS to meet demand.
What a process is using according to those tools can actually mean one of several things - it can be total address space allocated, total memory allocated (page file + physical memory) or memory a process is actually using that is resident in memory. Task Manager has a separate column for each of these possibilities.
The OS can't do garbage collection since it has no insight into what that memory actually contains - it just sees allocated pages of memory, it doesn't see objects which may or may not be referenced.
Whereas the OS handles allocates at the virtual address level, in the process itself there are other memory managers which take these large, page-sized chunks and break them up into something useful for the application to use. Windows will return memory allocated in 64k boundaries, but then the heap manager breaks it up into smaller chunks for use by each individual allocation done by the program via new. In .Net applications, the CLR will hand off new objects off of the garbage collected heap and when that heap reaches its limits, will perform garbage collection.
I can't speak to your question about the differences in how the memory appears in C++ vs. virtual machines, etc, but I can say that applications are typically given a certain memory range to use upon initialization. Then, if the application ever requires more, it will request it from the operating system, and the operating system will (generally) grant it. There are many implementations of this - in some operating systems, other applications' memory is moved away so as to give yours a larger contiguous block, and in others, your application gets various chunks of memory. There may even be some virtual memory handling involved. Its all up to an abstracted implementation. In any case, the memory is still treated as contiguous from within your program - the operating system will handle that much at least.
With regard to garbage collection, the operating system knows the bounds of your memory, but not what is inside. Furthermore, even if it did look at the memory used by your application, it would have no idea what blocks are used by out-of-scope variables and such that the GC is looking for.
The primary difference is application management. Microsoft distinguishes this as Managed and Unmanaged. When objects are allocated in memory they are stored at a specific address. This is true for both managed and unmanaged applications. In the managed world the "address" is wrapped in an object handle or "reference".
When memory is garbage collected by a VM, it can safely suspend the application, move objects around in memory and update all the "references" to point to their new locations in memory.
In a Win32 style app, pointers are pointers. There's no way for the OS to know if it's an object, an arbitrary block of data, a plain-old 32-bit value, etc. So it can't make any inferences about pointers between objects, so it can't move them around in memory or update pointers between objects.
Because the way references are handled, the OS can't take over the GC process and instead it's left up to the VM to manage the memory used by the application. For that reason, VM applications appear exactly the same to the OS. They simply request blocks of memory for use and the OS gives it to them. When the VM performs GC and compacts it's memory it's able to free memory back to the OS for use by another app.

What is the purpose of allocating pages in the pagefile with CreateFileMapping?

The function CreateFileMapping can be used to allocate space in the pagefile (if the first argument is INVALID_HANDLE_VALUE). The allocated space can later be memory mapped into the process virtual address space.
Why would I want to do this instead of using just VirtualAlloc?
It seems that both functions do almost the same thing. Memory allocated by VirtualAlloc may at some point be pushed out to the pagefile. Why should I need an API that specifically requests that my pages be allocated there in the first instance? Why should I care where my private pages live?
Is it just a hint to the OS about my expected memory usage patterns? (Ie, the former is a hint to swap out those pages more aggressively.)
Or is it simply a convenience method when working with very large datasets on 32-bit processes? (Ie, I can use CreateFileMapping to make >4Gb allocations, then memory map smaller chunks of the space as needed. Using the pagefile saves me the work of manually managing my own set of files to "swap" to.)
PS. This question is sparked by an article I read recently: http://blogs.technet.com/markrussinovich/archive/2008/11/17/3155406.aspx
From the CreateFileMappingFunction:
A single file mapping object can be shared by multiple processes.
Can the Virtual memory be shared across multiple processes?
One reason is to share memory among different processes. Different processes by only knowing the name of the mapping object can communicate over page file. This is preferable over creating a real file and doing the communications. Of course there may be other use cases. You can refer to Using a File Mapping for IPC at MSDN for more information.

Resources