How to release memory allocated by gcnew?

How to release memory allocated by gcnew? - memory-management

After some tests with help of Task Manager, I understood one thing about gcnew — memory allocated for local variables remaines allocated even if control leaves function, and is re-allocated only when control re-enters this function — so I'm in perplexity, how to deallocate memory myself. Here is some example of the problem:
void Foo(void)
{
System::Text::StringBuilder ^ t = gcnew System::Text::StringBuilder("");
int i = 0;
while(++i < 20000000) t->Append(i);
return;
}
As I mentioned, memory for variable t remains after leaving Foo(), delete not work as it works for new, and calling Foo() once, only gives me pointless allocated memory.

This is gcnew, which means garbage collected allocation. It will be disposed and deallocated by GC thread

Your function uses memory for code and data. The code is a fixed amount and will be used the entire time the library or program is loaded. The data is only used when the function is executing.
Data used by a program is either static or dynamic. Static data is laid out by the compiler and is basically equivalent to code (except that it might be marked as non-executable and/or read-only to prevent accidents). Dynamic data is temporary and allocated from a stack or heap (or CPU registers).
In a classic program, the stack and heap share the same memory address range with the stack at one end, growing toward the heap and the heap at the other end, trying not to grow into the stack. However, with modern address ranges on the order of 1TB, a heap generally has a lot of room.
Keep in mind that when a program requests an address range, it's just signaling to the operating system that it's okay to use that address for data reading, data writing and/or code execution. Until it actually puts something there, there is no load on the system. Also keep in mind with a virtual memory system, process memory is effectively allocated on the swap file/device (hard drive) with optimizations especially using RAM for caching, copy on write and many other techniques. (Data written to a memory address might never make it to the swap file, but that's up to the operating system.)
The data your function needs is for the two variables: t and i. t is a reference to a garbage collected object. i is an integer. Both are quite small and short-lived. You could think of them as being on the stack. When the function returns, the stack frame is popped and their memory is reused by the next stack operation. If you are looking at memory allocation, there won't be a change because the amount of memory allocated to the stack would not be changed.
Now in the execution of your function, a new object is created and, the way it's filled with data, it takes up quite a bit of memory. You could consider that object to be created in the heap. You don't need to delete it since it is a garbage collection object. When the garbage collector runs by walking all objects reachable from a set of root objects, it will find that the object is not reachable and add its space to a free list. When space for a new object is needed that doesn't fit into any blocks on the free list, more of the heap's address range will be used.
The CLR heap is compactable, which means it can move objects around in order to coalesce free blocks. Using this ability, it can move objects out of areas of allocated memory and give it back to the operating system, thereby freeing up space in the swap file.
So, there are three things that have to happen for you to see a reduction in the amount of memory allocated to the process:
The garbage collection has run to find unreachable objects.
The heap has been compacted.
The heap allocation has been reduced.
None of these things are really necessary until the swap file can't grow anymore. Obviously, the system has been designed for performance and to be a good citizen so it wouldn't take it that far. You can influence when garbage collection runs but this is only very rarely helpful and is generally not done.

Related

Overwriting data in memory

I've written a password manager in Ocaml. In order to make it as secure as possible, I'd like to store a string (an encryption key) in memory in such a way that it can be overwritten. Since Ocaml is pass by value , and there's a garbage collector, this has proven difficult. I encrypt all buffers and variables I can, but I still need a "session key" stored to do this. To prevent this from being detected by automated key searching programs or put into swap, it's assembled from a bunch of random data in a buffer using a random increment. So really, all I need is a single variable that can be overwritten for the assembled key for a few seconds before it's passed into the Nocrypto library... Would a reference work for this?
According to this cornell "Refs and Arrays" page, refs are mutable and work similarly to pointers in C. That being said, I also found a stack overflow answer discussing Ocaml refs, in which the answer mentions "they act like pointers to new allocated memory". Does this mean every time, it just allocates a new thing in memory instead of actually mutating the stuff in memory? If so, you couldn't really "overwrite" a ref.
Other possible solutions I've come across are Bigarrays, and "custom blocks". I'm not entirely sure if "custom blocks" are actually allocated outside of the scope of garbage collection or not. They seem like they're used to access external C code. Are they copied around by the garbage collector? Could they be "overwritten?" There's also this idea of "opaque bytes" and opaque objects in memory. I'm having a pretty hard time wrapping my head around how this all fits together. A useful but confusing (to me) discussion of custom blocks in memory on stack overflow is here: Are custom blocks ever copied in memory? Answer says they can be moved around. Even so, could they be overwritten?
The last possible solution is to store it using a Cstruct like the Nocrypto library seems to do. They discuss it in this github issue: Secret material erasure The asker states:
"Granted, most key material is Cstruct.t, which is a Bigarray.Array1.t, which is allocated outside of the GC heap"
Is this even correct? If so, I can't seem to find a source file that actually does this. I'm pretty new to Ocaml and functional programming in general. If you're curious, my program is located on github here: ocaml-pass

TL;DR;
You shall not store any secret information in OCaml heap. Thus you must never copy your secret into any OCaml heap-allocated value, consequently, neither Bytes, nor Strings or Arrays could be used, even temporary.
Introduction to the OCaml Memory Model
OCaml values are uniformly represented as tagged machine words. The least significant bit of a word is used as a tag, that distinguishes between pointers (tag=0) and immediate values (tag=1). Thus a value has always a fixed size, and is a pointer or an immediate.
Immediate values store their data in the most significant part of the word, that is 31-bits in 32 bit systems, and 63 bits in 64-bit systems. Pointers store their data in blocks, that are located in a so-called OCaml Heap. The OCaml Heap is a set of blocks managed by the Garbage Collector (GC). A block is a chunk of data prefixed with a header. The header specifies the size of data, and some other meta information, used by the GC. Block can contain OCaml values (pointers or immediate values) or opaque data.
To summarize. All OCaml values are represented as machine words, that either store data directly in the word or are pointers to heap-allocated blocks. Each pointer points to one and only one block. Several pointers may point to the same block. Such values are considered physically equal. Some blocks are not pointed by any pointers. Such blocks are called dead and are reclaimed by the GC.
Introduction to the OCaml Garbage Collector
The GC manages blocks, by allocating, moving, and deallocating them. The GC itself uses an arena, that is either obtained from the C memory allocator (malloc) or directly from a kernel via the memmap syscall (depends on a particular system and runtime).
The GC is generational, that means that values are first allocated in a special region of a heap called minor heap. The minor heap is a contiguous region of memory of fixed size, represented in the runtime with three pointers: the pointer beg to the beginning of the minor heap, a pointer end to the end of the minor heap, and the pointer cur to the beginning of the free area of the minor heap. When a block is allocated, cur is increased by the size of the block. Then the block is initialized with data. When there is no more free space in the minor heap (i.e., then end - cur is less than the required block size), then a minor GC cycle is triggered. The GC analyzes all blocks stored in the Minor Heap and copies all blocks that are referenced by at least one pointer to the Major Heap. After that, the cur pointer is set to beg.
In the Major Heap, a block may also be copied several times during a process called compaction. The compactor may try to rearrange blocks in its arena in order to achieve more compact representation of the heap.
Security Consequences
As the OCaml GC is a moving GC, it may copy the heap-allocated data arbitrarily. Although it is called moving, it is still in fact just copying. I.e., when a block is moved from the minor heap to the major heap, it is in fact just bit-copied, and thus is duplicated. The block phantom in the minor heap may live for an arbitrary amount of time until it is overwritten by some newly allocated value. When an object is moved during the compaction, it is also copied, and may or may not be overwritten during the process. And, of course, it goes without saying, that once a block becomes dead, it still may survive in a heap for an arbitrary amount of time until reused by the GC.
That all means, that if a secret ends up in the OCaml heap, it will go wild, as the GC can replicate it multiple times in an arbitrary and rather unpredictable ways. Thus, we can only store secrets either in immediate values or in regions that are not controlled by the GC. As it was said before, all OCaml values that are pointers, always point to a block in the OCaml heap. A block may contain data directly, or it could contain a pointer itself, that will point outside the memory heap. The so-called custom blocks, may or may not store their information in the OCaml heap, it depeds on a particular representation of each custom block. For example, the Bigarray library provides custom blocks that store their payload outside of the OCaml heap. Thus a Bigarray is a custom block, that has two fields: a pointer and size. It is an opaque block, i.e., the GC will never treat these two values as OCaml values, and will never follow neither the size nor the pointer. The data pointed by a pointer is located outside of the OCaml heap, and is either allocated by malloc or by memmap (in fact, it could be arbitrary integer, and even point to stack, or static data, it doesn't really matter, as long as we treat bigarrays just as a ptr,len pair).
This all makes Bigarrays ideal for storing secrets. We can be sure, that they are not moved by the GC, we can overwrite them to prevent the information leakage once they are freed.
Further considerations
We should be careful, and never allow a secret to be copied into the OCaml heap from our safe place. That means, that even if our main storage is a safe bigarray the information will still leak if we will copy its contents to an OCaml string. Consequently, if we first read the information into OCaml string, and then copy it into bigarray, the information will still leak. Thus, any interface that uses OCaml heap-allocated values is unsafe and shall not be used. For example, we can't use OCaml channels to read or write secrets (we should rely on memory mapping or unbuffered IO provided by the Unix module). And again, whenever you get a string data type from a Bigarray, you get your data copied, with all the ramifications.

I would use a value of type bytes, essentially a mutable array of bytes:
# let buffer = Bytes.make 16 'x';;
val buffer : bytes = "xxxxxxxxxxxxxxxx"
# Bytes.set buffer 0 'T';;
- : unit = ()
# buffer;;
- : bytes = "Txxxxxxxxxxxxxxx"
# Bytes.fill buffer 0 16 ' ';;
- : unit = ()
# buffer;;
- : bytes = " "
You can overwrite with Bytes.fill after you're done.

What does freeing memory mean? Does it mean setting all bits to zeros?

I directly started with managed languages and have barely any experience with C++, hence this question might be too basic.
In a managed language like .net, GC frees the memory. From what I read, in C++ this is done by calling delete. But what does it do to free memory? Does it it set all the bits at a memory location to zero? Or does it in some other way tell the operating system that the memory is available for reuse?
Update:
I have been thru this before and I know what GC does. But thats not my question. I am not trying to ask how GC works. What I am trying to understand is, how do you tell some memory is free?

delete does three different things:
Runs the destructor of the object (or of all objects in the array in the case of delete[]).
Marks the chunk of memory previously used by the object as free.
If possible, informs the operating system that a chunk of memory is free for other programs to use.
Your question is about #2 and #3 together, but they are very different things. To understand how #2 works, remember that the (typically) single "heap" provided by the operating system is segmented into smaller chunks of different sizes. When you allocate a chunk of memory with new, you get a pointer to a previously free part of the heap, and the runtime performs the necessary bookkeeping that marks that region as unavailable for further allocations. delete does the reverse: it performs the bookkeeping that marks the region as available again, optionally coalescing it with adjacent free regions to reduce fragmentation. Subsequent calls to new will consider that region when looking for free memory to return.
In other words, it is wrong to ask what happens with the memory when it is deallocated. The real magic happens in the bookkeeping region! To learn about implementation of generic allocators, google for implementation of malloc.
As for #3, it is an optional step and in many cases impossible to perform. It is only possible to "give back" freed memory that happens to reside at the very end of the allocated heap. A single allocation situated after a large region will remove the possibility of giving back.

In C++, if you allocate memory using "new", that portion of memory will be allocated by the OS to that particular process, until you release that memory or until that process exits.
If some portion of memory allocated for a process means that OS does not allow other process to use that portion of memory until that process release that memory. In C++, you have to use "delete" to release memory.
By releasing memory portion, process just inform the OS that it does not use this portion any more so that OS can allocate that memory portion whenever other processes request memory. In that case content of the memory portion will not be changed.

Garbage collection is just automatic memory management (so you never need to delete anything, the system will take care of it for you). I'm not 100% on whether it sets memory locations to 0, but I would assume it doesnt, since when you delete in c++, thats not what happens, it just allows the space to be used for storage. Writing zeros over everything is much more inefficient and not necessary. Here's some links that might be able to help explain this more thoroughly:
How does garbage collection and scoping work in C#?
http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)

Inside each application, dynamic memory are managed by "heaps". When your code asks for a block of memory, it asks the heap manager to allocate a block of memory, when your application frees that block of memory, it returns it back to the heap manager. In a traditional application, you must explicitly return each memory you allocated. Otherwise you will eventually run out of memory.
In languages like C# or Java, the runtime offers a garbage collector. A garbage collector automatically identify "unreachable" memory block and free them. An unreable memory block is a block that is no longer referenced by any variables. For example, if you have a global variable p1 that points to a block of memory, because p1 is global, so it is visible to anywhere in your code, then it is always reachable. Thus it will never be released by garbage collector. On the other hand, if you have local variable p2 in one of your function Foo, vairable p2 is no longer reachable after Foo has returned. The garbage collector is able to identify such variables and free any memory block pointed by them.
As application/garbage collector interact with the heap, the heap may decide to ask for more memory from the OS or return it to the OS. The OS manages all these memory request from different process and it then decide how to allocate the actual physical memory to different process.

No, it does not set the bits to zero. In a very simplified explanation,
First the garbage collector must determine, not what objects are no longer accessible ("not reachable"), but which ones are still accessible or reachable. It does this by simply listing all object roots. A root is a memory location containing a pointer to a reference object (an object on the heap). Then, recursively, it flags as "reachable" every object referenced by a root, or referenced by a field or property of a object already flagged as reachable.
There are four types of roots.
static variables containing reference objects
reference objects on the stack for any currently active thread.
reference types in method parameters
reference objects pointed to by CPU registers.
After determining what reference objects are still accessible (reachable) by any code in the App Domain, it takes all those objects that are still reachable, and if there are any gaps in physical memory between them, it "defragments" them by moving some of them so they are all contiguous, then it sets the pointer which represents the "end" od "used" memory to the end of this new compressed defragmented list. All new memory allocations, for newly instantiated objects, are then allocated from the memory immediately after this pointer location.
If there are no gaps in the memory used by the reachable objects, it just resets the pointer to the end of the last reachable object in the list.

No, deleting a pointer does not set the bytes to zero.
It's not in the standard of course, but it would be a performance overhead and serious implementations don't bother doing it, and it does not even make sense, when the memory is used for complex objects (floats, objects, strings, etc)
You can always try it out.
Declare a pointer to an int, write an integer, delete the pointer.
Then read the content of the deleted pointer again.
Does it have the same content?
int *ptr = new int;
*ptr = 13;
cout << "Before delete: " << *ptr << endl;
delete ptr;
cout << "After delete: " << *ptr << endl;
Yes probably it will, BUT ptr is just a dangling pointer you have there, the memory has been returned to the system and it will be available when you allocate memory again, it's likely that when you allocate another int* it will be pointing where ptr was pointing.

Garbage collection and memory management in Erlang

I want to know technical details about garbage collection (GC) and memory management in Erlang/OTP.
But, I cannot find on erlang.org and its documents.
I have found some articles online which talk about GC in a very general manner, such as what garbage collection algorithm is used.

To classify things, lets define the memory layout and then talk about how GC works.
Memory Layout
In Erlang, each thread of execution is called a process. Each process has its own memory and that memory layout consists of three parts: Process Control Block, Stack and Heap.
PCB: Process Control Block holds information like process identifier (PID), current status (running, waiting), its registered name, and other such info.
Stack: It is a downward growing memory area which holds incoming and outgoing parameters, return addresses, local variables and temporary spaces for evaluating expressions.
Heap: It is an upward growing memory area which holds process mailbox messages and compound terms. Binary terms which are larger than 64 bytes are NOT stored in process private heap. They are stored in a large Shared Heap which is accessible by all processes.
Garbage Collection
Currently Erlang uses a Generational garbage collection that runs inside each Erlang process private heap independently, and also a Reference Counting garbage collection occurs for global shared heap.
Private Heap GC: It is generational, so divides the heap into two segments: young and old generations. Also there are two strategies for collecting; Generational (Minor) and Fullsweep (Major). The generational GC just collects the young heap, but fullsweep collect both young and old heap.
Shared Heap GC: It is reference counting. Each object in shared heap (Refc) has a counter of references to it held by other objects (ProcBin) which are stored inside private heap of Erlang processes. If an object's reference counter reaches zero, the object has become inaccessible and will be destroyed.
To get more details and performance hints, just look at my article which is the source of the answer: Erlang Garbage Collection Details and Why It Matters

A reference paper for the algorithm: One Pass Real-Time Generational Mark-Sweep Garbage Collection (1995) by Joe Armstrong and Robert Virding in
1995 (at CiteSeerX)
Abstract:
Traditional mark-sweep garbage collection algorithms do not allow reclamation of data until the mark phase of the algorithm has terminated. For the class of languages in which destructive operations are not allowed we can arrange that all pointers in the heap always point backwards towards "older" data. In this paper we present a simple scheme for reclaiming data for such language classes with a single pass mark-sweep collector. We also show how the simple scheme can be modified so that the collection can be done in an incremental manner (making it suitable for real-time collection). Following this we show how the collector can be modified for generational garbage collection, and finally how the scheme can be used for a language with concurrent processes.1

Erlang has a few properties that make GC actually pretty easy.
1 - Every variable is immutable, so a variable can never point to a value that was created after it.
2 - Values are copied between Erlang processes, so the memory referenced in a process is almost always completely isolated.
Both of these (especially the latter) significantly limit the amount of the heap that the GC has to scan during a collection.
Erlang uses a copying GC. During a GC, the process is stopped then the live pointers are copied from the from-space to the to-space. I forget the exact percentages, but the heap will be increased if something like only 25% of the heap can be collected during a collection, and it will be decreased if 75% of the process heap can be collected. A collection is triggered when a process's heap becomes full.
The only exception is when it comes to large values that are sent to another process. These will be copied into a shared space and are reference counted. When a reference to a shared object is collected the count is decreased, when that count is 0 the object is freed. No attempts are made to handle fragmentation in the shared heap.
One interesting consequence of this is, for a shared object, the size of the shared object does not contribute to the calculated size of a process's heap, only the size of the reference does. That means, if you have a lot of large shared objects, your VM could run out of memory before a GC is triggered.
Most if this is taken from the talk Jesper Wilhelmsson gave at EUC2012.

I don't know your background, but apart from the paper already pointed out by jj1bdx you can also give a chance to Jesper Wilhelmsson thesis.
BTW, if you want to monitor memory usage in Erlang to compare it to e.g. C++ you can check out:
Erlang Instrument Module
Erlang OS_MON Application
Hope this helps!

What's the difference between memory allocation and garbage collection, please?

I understand that 'Garbage Collection' is a form of memory management and that it's a way to automatically reclaim unused memory.
But what is 'memory allocation' and the conceptual difference from 'Garbage Collection'?

They are Polar opposites. So yeah, pretty big difference.
Allocating memory is the process of claiming a memory space to store things.
Garbage Collection (or freeing of memory) is the process of releasing that memory back to the pool of available memory.
Many newer languages perform both of these steps in the background for you when variables are declared/initialized, and fall out of scope.

Memory allocation is the act of asking for some memory to the system to use it for something.
Garbage collection is a process to check if some memory that was previously allocated is no longer really in use (i.e. is no longer accessible from the program) to free it automatically.
A subtle point is that the objective of garbage collection is not actually "freeing objects that are no longer used", but to emulate a machine with infinite memory, allowing you to continue to allocate memory and not caring about deallocating it; for this reason, it's not a substitute for the management of other kind resources (e.g. file handles, database connections, ...).

A simple pseudo-code example:
void myFoo()
{
LinkedList<int> myList = new LinkedList<int>();
return;
}
This will request enough new space on the heap to store the LinkedList object.
However, when the function body is over, myList dissapears and you do not have anymore anyway of knowing where this LinkedList is stored (the memory address). Hence, there is absolutely no way to tell to the system to free that memory, and make it available to you again later.
The Java Garbage Collector will do that for you automatically, in the cost of some performance, and with also introducing a little non-determinism (you cannot really tell when the GC will be called).
In C++ there is no native garbage collector (yet?). But the correct way of managing memory is by the use of smart_pointers (eg. std::auto_ptr (deprecated in C++11), std::shared_ptr) etc etc.

You want a book. You go to the library and request the book you want. The library checks to see if they have the book (in which case they do) and you gladly take it and know you must return it later.
You go home, sit down, read the book and finish it. You return the book back to the library the next day because you are finished with it.
That is a simple analogy for memory allocation and garbage collection. Computers have limited memory, just like libraries have limited copies of books. When you want to allocate memory you need to make a request and if the computer has sufficient memory (the library has enough copies for you) then what you receive is a chunk of memory. Computers need memory for storing data.
Since computers have limited memory, you need to return the memory otherwise you will run out (just like if no one returned the books to the library then the library would have nothing, the computer will explode and burn furiously before your very eyes if it runs out of memory... not really). Garbage collection is the term for checking whether memory that has been previously allocated is no longer in use so it can be returned and reused for other purposes.

Memory allocation asks the computer for some memory, in order to store data. For example, in C++:
int* myInts = new int[howManyIntsIWant];
tells the computer to allocate me enough memory to store some number of integers.
Another way of doing the same thing would be:
int myInts[6];
The difference here is that in the second example, we know when the code is written and compiled exactly how much space we need - it's 6 * the size of one int. This lets us do static memory allocation (which uses memory on what's called the "stack").
In the first example we don't know how much space is needed when the code is compiled, we only know it when the program is running and we have the value of howManyIntsIWant. This is dynamic memory allocation, which gets memory on the "heap".
Now, with static allocation we don't need to tell the computer when we're finished with the memory. This relates to how the stack works; the short version is that once we've left the function where we created that static array, the memory is swallowed up straight away.
With dynamic allocation, this doesn't happen so the memory has to be cleaned up some other way. In some languages, you have to write the code to deallocate this memory, in other it's done automatically. This is garbage collection - some automatic process built into the language that will sweep through all of the dynamically allocated memory on the heap, work out which bits aren't being used and deallocate them (i.e. free them up for other processes and programs).
So: memory allocation = asking for memory for your program. Garbage collection = where the programming language itself works out what memory isn't being used any more and deallocates it for you.

Does calling free or delete ever release memory back to the "system"

Here's my question: Does calling free or delete ever release memory back to the "system". By system I mean, does it ever reduce the data segment of the process?
Let's consider the memory allocator on Linux, i.e ptmalloc.
From what I know (please correct me if I am wrong), ptmalloc maintains a free list of memory blocks and when a request for memory allocation comes, it tries to allocate a memory block from this free list (I know, the allocator is much more complex than that but I am just putting it in simple words). If, however, it fails, it gets the memory from the system using say sbrk or brk system calls. When a memory is free'd, that block is placed in the free list.
Now consider this scenario, on peak load, a lot of objects have been allocated on heap. Now when the load decreases, the objects are free'd. So my question is: Once the object is free'd will the allocator do some calculations to find whether it should just keep this object in the free list or depending upon the current size of the free list it may decide to give that memory back to the system i.e decrease the data segment of the process using sbrk or brk?
Documentation of glibc tells me that if the allocation request is much larger than page size, it will be allocated using mmap and will be directly released back to the system once free'd. Cool. But let's say I never ask for allocation of size greater than say 50 bytes and I ask a lot of such 50 byte objects on peak load on the system. Then what?
From what I know (correct me please), a memory allocated with malloc will never be released back to the system ever until the process ends i.e. the allocator will simply keep it in the free list if I free it. But the question that is troubling me is then, if I use a tool to see the memory usage of my process (I am using pmap on Linux, what do you guys use?), it should always show the memory used at peak load (as the memory is never given back to the system, except when allocated using mmap)? That is memory used by the process should never ever decrease(except the stack memory)? Is it?
I know I am missing something, so please shed some light on all this.
Experts, please clear my concepts regarding this. I will be grateful. I hope I was able to explain my question.

There isn't much overhead for malloc, so you are unlikely to achieve any run-time savings. There is, however, a good reason to implement an allocator on top of malloc, and that is to be able to trace memory leaks. For example, you can free all memory allocated by the program when it exits, and then check to see if your memory allocator calls balance (i.e. same number of calls to allocate/deallocate).
For your specific implementation, there is no reason to free() since the malloc won't release to system memory and so it will only release memory back to your own allocator.
Another reason for using a custom allocator is that you may be allocating many objects of the same size (i.e you have some data structure that you are allocating a lot). You may want to maintain a separate free list for this type of object, and free/allocate only from this special list. The advantage of this is that it will avoid memory fragmentation.

No.
It's actually a bad strategy for a number of reasons, so it doesn't happen --except-- as you note, there can be an exception for large allocations that can be directly made in pages.
It increases internal fragmentation and therefore can actually waste memory. (You can only return aligned pages to the OS, so pulling aligned pages out of a block will usually create two guaranteed-to-be-small blocks --smaller than a page, anyway-- to either side of the block. If this happens a lot you end up with the same total amount of usefully-allocated memory plus lots of useless small blocks.)
A kernel call is required, and kernel calls are slow, so it would slow down the program. It's much faster to just throw the block back into the heap.
Almost every program will either converge on a steady-state memory footprint or it will have an increasing footprint until exit. (Or, until near-exit.) Therefore, all the extra processing needed by a page-return mechanism would be completely wasted.

It is entirely implementation dependent. On Windows VC++ programs can return memory back to the system if the corresponding memory pages contain only free'd blocks.

I think that you have all the information you need to answer your own question. pmap shows the memory that is currenly being used by the process. So, if you call pmap before the process achieves peak memory, then no it will not show peak memory. if you call pmap just before the process exits, then it will show peak memory for a process that does not use mmap. If the process uses mmap, then if you call pmap at the point where maximum memory is being used, it will show peak memory usage, but this point may not be at the end of the process (it could occur anywhere).
This applies only to your current system (i.e. based on the documentation you have provided for free and mmap and malloc) but as the previous poster has stated, behavior of these is implmentation dependent.

This varies a bit from implementation to implementation.
Think of your memory as a massive long block, when you allocate to it you take a bit out of your memory (labeled '1' below):
111
If I allocate more more memory with malloc it gets some from the system:
1112222
If I now free '1':
___2222
It won't be returned to the system, because two is in front of it (and memory is given as a continous block). However if the end of the memory is freed, then that memory is returned to the system. If I freed '2' instead of '1'. I would get:
111
the bit where '2' was would be returned to the system.
The main benefit of freeing memory is that that bit can then be reallocated, as opposed to getting more memory from the system. e.g:
33_2222

I believe that the memory allocator in glibc can return memory back to the system, but whether it will or not depends on your memory allocation patterns.
Let's say you do something like this:
void *pointers[10000];
for(i = 0; i < 10000; i++)
pointers[i] = malloc(1024);
for(i = 0; i < 9999; i++)
free(pointers[i]);
The only part of the heap that can be safely returned to the system is the "wilderness chunk", which is at the end of the heap. This can be returned to the system using another sbrk system call, and the glibc memory allocator will do that when the size of this last chunk exceeds some threshold.
The above program would make 10000 small allocations, but only free the first 9999 of them. The last one should (assuming nothing else has called malloc, which is unlikely) be sitting right at the end of the heap. This would prevent the allocator from returning any memory to the system at all.
If you were to free the remaining allocation, glibc's malloc implementation should be able to return most of the pages allocated back to the system.
If you're allocating and freeing small chunks of memory, a few of which are long-lived, you could end up in a situation where you have a large chunk of memory allocated from the system, but you're only using a tiny fraction of it.

Here are some "advantages" to never releasing memory back to the system:
Having already used a lot of memory makes it very likely you will do so again, and
when you release memory the OS has to do quite a bit of paperwork
when you need it again, your memory allocator has to re-initialise all its data structures in the region it just received
Freed memory that isn't needed gets paged out to disk where it doesn't actually make that much difference
Often, even if you free 90% of your memory, fragmentation means that very few pages can actually be released, so the effort required to look for empty pages isn't terribly well spent

Many memory managers can perform TRIM operations where they return entirely unused blocks of memory to the OS. However, as several posts here have mentioned, it's entirely implementation dependent.
But lets say I never ask for allocation of size greater than say 50 bytes and I ask a lot of such 50 byte objects on peak load on the system. Then what ?
This depends on your allocation pattern. Do you free ALL of the small allocations? If so and if the memory manager has handling for a small block allocations, then this may be possible. However, if you allocate many small items and then only free all but a few scattered items, you may fragment memory and make it impossible to TRIM blocks since each block will have only a few straggling allocations. In this case, you may want to use a different allocation scheme for the temporary allocations and the persistant ones so you can return the temporary allocations back to the OS.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio