Memory Management in GLib

How is storage/memory reclaimed in GLib? I've called g_object_unref() and the ref-counts are zero, but I'm not sure any storage is ever reclaimed.
Do I need to call a routine? If so, which routine? If not, what should I do?

Much of the memory allocation in GLib is done using the slice allocator, which has better performance when allocating lots of identical-sized blocks of memory, as happens a lot in GLib-using code.
You won't see memory usage jump up and down with the slice allocator in the same way that you would when using traditional malloc. The slice allocator often keeps memory in use for a while in order to reallocate it to other blocks.
If you want to force the slice allocator to behave like malloc, use the environment variable G_SLICE=always-malloc. That's not recommended for production, but it is the recommended way to use valgrind on GLib programs.
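To make the "keeps memory in use for a while" behaviour concrete, here is a minimal sketch of a fixed-size block pool with a free list. It is not GLib's slice allocator and makes no claim about how GLib implements it; the class name BlockPool and all of its methods are hypothetical, chosen only to show why memory use does not drop when blocks are released.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical fixed-size block pool: released blocks go onto a free list
// and are handed out again instead of being returned to malloc/the OS.
class BlockPool {
public:
    explicit BlockPool(std::size_t block_size) : block_size_(block_size) {
        assert(block_size > 0);
    }

    ~BlockPool() {
        // Only here is memory actually given back to the C allocator.
        for (void* p : owned_) std::free(p);
    }

    BlockPool(const BlockPool&) = delete;
    BlockPool& operator=(const BlockPool&) = delete;

    void* allocate() {
        if (!free_list_.empty()) {
            void* p = free_list_.back();  // reuse a cached block
            free_list_.pop_back();
            return p;
        }
        void* p = std::malloc(block_size_);
        if (p != nullptr) owned_.push_back(p);
        return p;
    }

    void release(void* p) {
        free_list_.push_back(p);  // cache it; free() is not called
    }

    std::size_t blocks_owned() const { return owned_.size(); }
    std::size_t blocks_cached() const { return free_list_.size(); }

private:
    std::size_t block_size_;
    std::vector<void*> owned_;      // every block ever obtained from malloc
    std::vector<void*> free_list_;  // blocks currently available for reuse
};
```

After a release(), blocks_owned() stays the same and the next allocate() returns the cached block, which is the pattern that makes a process's memory usage look flat from the outside.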

Related

How does gc Go handle heap allocation?

Does gc Go (specifically go1.11) pre-allocate a chunk of memory and take from it for each allocation (like the JVM), or does it allocate every time a variable is created, and is that a kernel call (malloc)?
If it is one kernel call per allocation, that would make variable creation expensive. How can I force allocation on the stack/heap?
This is covered in various places, like the FAQ:
How do I know whether a variable is allocated on the heap or the stack?
From a correctness standpoint, you don't need to know. Each variable
in Go exists as long as there are references to it. The storage
location chosen by the implementation is irrelevant to the semantics
of the language.
The storage location does have an effect on writing efficient
programs. When possible, the Go compilers will allocate variables that
are local to a function in that function's stack frame. However, if
the compiler cannot prove that the variable is not referenced after
the function returns, then the compiler must allocate the variable on
the garbage-collected heap to avoid dangling pointer errors. Also, if
a local variable is very large, it might make more sense to store it
on the heap rather than the stack.
In the current compilers, if a variable has its address taken, that
variable is a candidate for allocation on the heap. However, a basic
escape analysis recognizes some cases when such variables will not
live past the return from the function and can reside on the stack.
Go's memory allocation is carefully optimized for its needs, for example with a custom malloc. I suspect you have a slightly different underlying question/problem that you're struggling with - it would be better to ask that instead. If this is just exploration/curiosity, you'll have to make your question much more specific.

Windows memory management: Difference between VirtualAlloc() and heap functions

There are lots of methods to allocate memory in a Windows environment, such as VirtualAlloc, HeapAlloc, malloc, and new.
So what's the difference between them?
Each API is for different uses. Each one also requires that you use the correct deallocation/freeing function when you're done with the memory.
VirtualAlloc
A low-level Windows API that provides lots of options, but is mainly useful for people in fairly specific situations. It can only allocate memory in larger chunks (whole pages, at addresses aligned to the system allocation granularity). There are situations where you need it, but you'll know when you're in one of these situations. One of the most common is if you have to share memory directly with another process. Don't use it for general-purpose memory allocation. Use VirtualFree to deallocate.
HeapAlloc
Allocates whatever size of memory you ask for, rather than in big chunks like VirtualAlloc. HeapAlloc knows when it needs to call VirtualAlloc and does so for you automatically. Like malloc, but it is Windows-only and provides a couple more options. Suitable for allocating general chunks of memory. Some Windows APIs may require that you use this to allocate memory that you pass to them, or use its companion HeapFree to free memory that they return to you.
malloc
The C way of allocating memory. Prefer this if you are writing in C rather than C++, and you want your code to work on e.g. Unix computers too, or someone specifically says that you need to use it. Doesn't initialise the memory. Suitable for allocating general chunks of memory, like HeapAlloc. A simple API. Use free to deallocate. Visual C++'s malloc calls HeapAlloc.
new
The C++ way of allocating memory. Prefer this if you are writing in C++. It puts an object or objects into the allocated memory, too. Use delete to deallocate (or delete[] for arrays). Visual Studio's new calls HeapAlloc, and then maybe initialises the objects, depending on how you call it.
In recent C++ standards (C++11 and above), if you have to manually use delete, you're doing it wrong and should use a smart pointer like unique_ptr instead. From C++14 onwards, the same can be said of new (replaced with functions such as make_unique()).
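As a small illustration of that advice, the sketch below contrasts a manual new/delete pair with std::make_unique (C++14). The Widget type and both function names are hypothetical; the live counter exists only so the effect of scope exit can be observed.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type used only for illustration; counts live instances.
struct Widget {
    static int live;
    Widget() { ++live; }
    ~Widget() { --live; }
};
int Widget::live = 0;

// Manual style: the caller must remember the matching delete.
int manual_style() {
    Widget* w = new Widget;
    int seen = Widget::live;  // number of live Widgets while w exists
    delete w;
    return seen;
}

// C++14 style: make_unique allocates, unique_ptr deletes at scope exit.
int smart_style() {
    auto one = std::make_unique<Widget>();
    auto many = std::make_unique<Widget[]>(3);  // array form uses delete[]
    return Widget::live;  // read while both owners are still in scope
}
```

In smart_style there is no delete to forget: both allocations are released when the function returns, including on an early return or an exception.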
There are also a couple of other similar functions like SysAllocString that you may be told you have to use in specific circumstances.
It is very important to understand the distinction between memory allocation APIs (in Windows) if you plan on using a language that requires memory management (like C or C++.) And the best way to illustrate it IMHO is with a diagram:
Note that this is a very simplified, Windows-specific view.
The way to understand this diagram is that the higher on the diagram a memory allocation method is, the higher level implementation it uses. But let's start from the bottom.
Kernel-Mode Memory Manager
It provides all memory reservations & allocations for the operating system, as well as support for memory-mapped files, shared memory, copy-on-write operations, etc. It's not directly accessible from the user-mode code, so I'll skip it here.
VirtualAlloc / VirtualFree
These are the lowest-level APIs available from user mode. The VirtualAlloc function basically invokes ZwAllocateVirtualMemory, which in turn does a quick syscall to ring0 to delegate further processing to the kernel memory manager. It is also the fastest of the user-mode methods for reserving/allocating a block of new memory.
But it comes with two main conditions:
It only allocates memory blocks aligned on the system granularity boundary.
It only allocates memory blocks of the size that is the multiple of the system granularity.
So what is this system granularity? You can get it by calling GetSystemInfo. It is returned in the dwAllocationGranularity member of the SYSTEM_INFO structure. Its value is implementation (and possibly hardware) specific, but on many 64-bit Windows systems it is set at 0x10000 bytes, or 64K.
So what all this means, is that if you try to allocate, say just an 8 byte memory block with VirtualAlloc:
void* pAddress = VirtualAlloc(NULL, 8, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
If successful, pAddress will be aligned on the 0x10000 byte boundary. And even though you requested only 8 bytes, the actual memory block that you get will be an entire page (something like 4K bytes; the exact page size is returned in the dwPageSize member). On top of that, the entire memory block spanning 0x10000 bytes (or 64K in most cases) from pAddress will not be available for any further allocations. So in a sense, by allocating 8 bytes you might as well be asking for 65536.
So the moral of the story here is not to substitute VirtualAlloc for generic memory allocations in your application. It must be used for very specific cases, as is done with the heap below. (Usually for reserving/allocating large blocks of memory.)
Using VirtualAlloc incorrectly can lead to severe memory fragmentation.
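The rounding described above can be worked through with plain arithmetic. This is a portable sketch, not Windows code: the two constants are assumed typical values (on a real system they come from GetSystemInfo), and the helper names round_up, committed_bytes and reserved_span are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Assumed typical values; on Windows, query them with GetSystemInfo
// (dwPageSize and dwAllocationGranularity) instead of hard-coding.
constexpr std::size_t kPageSize = 0x1000;      // 4 KB (assumption)
constexpr std::size_t kGranularity = 0x10000;  // 64 KB (assumption)

// Round n up to the next multiple of unit (unit must be a power of two).
constexpr std::size_t round_up(std::size_t n, std::size_t unit) {
    return (n + unit - 1) & ~(unit - 1);
}

// Bytes actually committed for a request: whole pages.
constexpr std::size_t committed_bytes(std::size_t request) {
    return round_up(request, kPageSize);
}

// Address space taken out of circulation: whole granularity units.
constexpr std::size_t reserved_span(std::size_t request) {
    return round_up(request, kGranularity);
}
```

With these assumed values, an 8-byte request commits 4096 bytes and ties up 65536 bytes of address space, which is the waste the answer warns about.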
HeapCreate / HeapAlloc / HeapFree / HeapDestroy
In a nutshell, the heap functions are basically a wrapper for VirtualAlloc function. Other answers here provide a pretty good concept of it. I'll add that, in a very simplistic view, the way heap works is this:
HeapCreate reserves a large block of virtual memory by calling VirtualAlloc internally (or ZwAllocateVirtualMemory to be specific). It also sets up an internal data structure that can track further smaller size allocations within the reserved block of virtual memory.
Any calls to HeapAlloc and HeapFree do not actually allocate/free any new memory (unless, of course the request exceeds what has been already reserved in HeapCreate) but instead they meter out (or commit) a previously reserved large chunk, by dissecting it into smaller memory blocks that a user requests.
HeapDestroy in turn calls VirtualFree that actually frees the virtual memory.
So all this makes heap functions perfect candidates for generic memory allocations in your application. It is great for arbitrary size memory allocations. But a small price to pay for the convenience of the heap functions is that they introduce a slight overhead over VirtualAlloc when reserving larger blocks of memory.
Another good thing about heap is that you don't really need to create one. It is generally created for you when your process starts. So one can access it by calling GetProcessHeap function.
malloc / free
These are a language-specific wrapper for the heap functions. Unlike HeapAlloc, HeapFree, etc., these functions will work not only if your code is compiled for Windows, but also on other operating systems (such as Linux).
This is a recommended way to allocate/free memory if you program in C. (Unless, you're coding a specific kernel mode device driver.)
new / delete
These come as high-level (well, for C++) memory management operators. They are specific to the C++ language and, like malloc for C, are also wrappers for the heap functions. They also have a whole bunch of their own code that deals with C++-specific initialization through constructors, cleanup through destructors, raising exceptions, etc.
These operators are the recommended way to allocate/free memory and objects if you program in C++.
Lastly, one comment I want to make about what has been said in other responses about using VirtualAlloc to share memory between processes. VirtualAlloc by itself does not allow sharing of its reserved/allocated memory with other processes. For that one needs to use CreateFileMapping API that can create a named virtual memory block that can be shared with other processes. It can also map a file on disk into virtual memory for read/write access. But that is another topic.
VirtualAlloc is a specialized allocation of the OS virtual memory (VM) system. Allocations in the VM system must be made at an allocation granularity, which is architecture dependent. Allocation in the VM system is one of the most basic forms of memory allocation. VM allocations can take several forms; the memory is not necessarily dedicated or physically backed in RAM (though it can be). VM allocation is typically a special-purpose type of allocation, used because the allocation
has to be very large,
needs to be shared,
must be aligned on a particular value (for performance reasons), or
need not be used all at once by the caller,
etc.
HeapAlloc is essentially what malloc and new both eventually call. It is designed to be very fast and usable under many different types of scenarios of a general purpose allocation. It is the "Heap" in a classic sense. Heaps are actually set up by VirtualAlloc, which is what is used to initially reserve allocation space from the OS. After the space is initialized by VirtualAlloc, various tables, lists and other data structures are configured to maintain and control the operation of the heap. Some of that operation is in the form of dynamically sizing (growing and shrinking) the heap, adapting the heap to particular usages (frequent allocations of some size), etc.
new and malloc are somewhat the same: malloc is essentially an exact call into HeapAlloc( heap-id-default ); new, however, can [additionally] configure the allocated memory for C++ objects by running their constructors. For an object of a class with virtual functions, construction also sets the object's hidden pointer to its class's vtable. These vtables are redirects for execution and form part of what gives C++ its OO characteristics like inheritance and virtual function overriding.
Some other common allocation methods like _alloca() and _malloca() are stack based; FileMappings are really allocated with VirtualAlloc and set with particular bit flags which designate those mappings to be of type FILE.
Most of the time, you should allocate memory in a way which is consistent with the use of that memory ;). new in C++, malloc for C, VirtualAlloc for massive or IPC cases.
*** Note, large memory allocations done by HeapAlloc are actually shipped off to VirtualAlloc after some size (couple hundred k or 16 MB or something I forget, but fairly big :) ).
*** EDIT
I briefly remarked about IPC and VirtualAlloc, there is also something very neat about a related VirtualAlloc which none of the responders to this question have discussed.
VirtualAllocEx is what one process can use to allocate memory in an address space of a different process. Most typically, this is used in combination to get remote execution in the context of another process via CreateRemoteThread (similar to CreateThread, the thread is just run in the other process).
In outline:
VirtualAlloc, HeapAlloc etc. are Windows APIs that allocate memory of various types from the OS directly. VirtualAlloc manages pages in the Windows virtual memory system, while HeapAlloc allocates from a specific OS heap. Frankly, you are unlikely to ever need to use either of them.
malloc is a Standard C (and C++) library function that allocates memory to your process. Implementations of malloc will typically use one of the OS APIs to create a pool of memory when your app starts and then allocate from it as you make malloc requests.
new is a Standard C++ operator which allocates memory and then calls constructors appropriately on that memory. It may be implemented in terms of malloc or in terms of the OS APIs, in which case it too will typically create a memory pool on application startup.
VirtualAlloc ===> sbrk() under UNIX
HeapAlloc ====> malloc() under UNIX
VirtualAlloc => Allocates straight into virtual memory, you reserve/commit in blocks. This is great for large allocations, for example large arrays.
HeapAlloc / new => allocates the memory on the default heap (or any other heap that you may create). This allocates per object and is great for smaller objects. The default heap is serialized (access to it is synchronized), so allocation is thread-safe; the locking can cause issues in high-performance scenarios, which is why you can create your own heaps.
malloc => uses the C runtime heap, similar to HeapAlloc but it is common for compatibility scenarios.
In a nutshell, the heap is just a chunk of virtual memory that is governed by a heap manager (rather than raw virtual memory)
The last model in the memory world is memory-mapped files. This model is great for large chunks of data (like large files). It is used internally when you open an EXE (the EXE is not loaded into memory up front; a memory-mapped file is created instead).

Does free() free the memory immediately

In my program, I am using malloc to allocate large amounts of memory (several hundred MB, in chunks of, say, 25 MB to 75 MB at a time). I subsequently free some of the chunks, then allocate some more. My question is: when I use free() to free memory, does it immediately free the block concerned, or does it merely mark it for freeing? If it merely marks it for freeing later, is there some standard C library function to force it to be freed immediately?
I am actually required to make my program portable between Linux and VxWorks. On VxWorks, in one library I am using (vsipl), I find that 'free' does not free the memory immediately on the call.
The answer depends upon the malloc and free implementation. However, I can safely say that in nearly any implementation the memory is not necessarily returned to the operating system when you call free(). Instead it usually goes back into a pool where it can be reused by the process.
If you are using large blocks of memory, it is usually better to use operating system memory functions and do your own memory management. However, it is not a good idea to map pages in and out of memory.

What's the difference between memory allocation and garbage collection, please?

I understand that 'Garbage Collection' is a form of memory management and that it's a way to automatically reclaim unused memory.
But what is 'memory allocation' and the conceptual difference from 'Garbage Collection'?
They are polar opposites. So yeah, pretty big difference.
Allocating memory is the process of claiming a memory space to store things.
Garbage Collection (or freeing of memory) is the process of releasing that memory back to the pool of available memory.
Many newer languages perform both of these steps in the background for you when variables are declared/initialized, and fall out of scope.
Memory allocation is the act of asking the system for some memory to use for something.
Garbage collection is a process to check if some memory that was previously allocated is no longer really in use (i.e. is no longer accessible from the program) to free it automatically.
A subtle point is that the objective of garbage collection is not actually "freeing objects that are no longer used", but to emulate a machine with infinite memory, allowing you to continue to allocate memory without caring about deallocating it; for this reason, it's not a substitute for the management of other kinds of resources (e.g. file handles, database connections, ...).
A simple pseudo-code example:
void myFoo()
{
LinkedList<int> myList = new LinkedList<int>();
return;
}
This will request enough new space on the heap to store the LinkedList object.
However, when the function body is over, myList disappears and you no longer have any way of knowing where this LinkedList is stored (the memory address). Hence, there is absolutely no way to tell the system to free that memory and make it available to you again later.
The Java Garbage Collector will do that for you automatically, at the cost of some performance, and also with a little non-determinism (you cannot really tell when the GC will be called).
In C++ there is no native garbage collector (yet?). But the correct way of managing memory is by the use of smart pointers (e.g. std::unique_ptr or std::shared_ptr; std::auto_ptr was deprecated in C++11 and removed in C++17).
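For comparison with the pseudo-code above, here is a hedged C++ sketch of the same situation using std::shared_ptr. The function names my_foo and owners_after_copy are hypothetical; the std::weak_ptr is used only as an observer so that the caller can check whether the list still exists after the function returns.

```cpp
#include <cassert>
#include <list>
#include <memory>

// Returns a weak observer so the caller can check whether the list
// allocated inside the function is still alive after it returns.
std::weak_ptr<std::list<int>> my_foo() {
    auto my_list = std::make_shared<std::list<int>>();
    my_list->push_back(42);
    assert(my_list.use_count() == 1);  // exactly one owner
    std::weak_ptr<std::list<int>> observer = my_list;
    return observer;  // my_list, the only owner, is destroyed here
}

// Shared ownership: the list lives until the last owner goes away.
long owners_after_copy() {
    auto a = std::make_shared<std::list<int>>();
    auto b = a;            // second owner of the same list
    return a.use_count();  // counted while both a and b are in scope
}
```

The list created in my_foo is freed deterministically at the closing brace, with no collector involved, so the returned observer reports it as expired.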
You want a book. You go to the library and request the book you want. The library checks to see if they have the book (in this case they do) and you gladly take it, knowing you must return it later.
You go home, sit down, read the book and finish it. You return the book back to the library the next day because you are finished with it.
That is a simple analogy for memory allocation and garbage collection. Computers have limited memory, just like libraries have limited copies of books. When you want to allocate memory you need to make a request and if the computer has sufficient memory (the library has enough copies for you) then what you receive is a chunk of memory. Computers need memory for storing data.
Since computers have limited memory, you need to return the memory otherwise you will run out (just like if no one returned the books to the library then the library would have nothing, the computer will explode and burn furiously before your very eyes if it runs out of memory... not really). Garbage collection is the term for checking whether memory that has been previously allocated is no longer in use so it can be returned and reused for other purposes.
Memory allocation asks the computer for some memory, in order to store data. For example, in C++:
int* myInts = new int[howManyIntsIWant];
tells the computer to allocate me enough memory to store some number of integers.
Another way of doing the same thing would be:
int myInts[6];
The difference here is that in the second example, we know when the code is written and compiled exactly how much space we need: it's 6 * the size of one int. This lets us do static memory allocation (which, for a local variable like this, uses memory on what's called the "stack"; C++ formally calls this automatic storage).
In the first example we don't know how much space is needed when the code is compiled, we only know it when the program is running and we have the value of howManyIntsIWant. This is dynamic memory allocation, which gets memory on the "heap".
Now, with static allocation we don't need to tell the computer when we're finished with the memory. This relates to how the stack works; the short version is that once we've left the function where we created that static array, the memory is reclaimed straight away.
With dynamic allocation, this doesn't happen, so the memory has to be cleaned up some other way. In some languages you have to write the code to deallocate this memory; in others it's done automatically. This is garbage collection: some automatic process built into the language that will sweep through all of the dynamically allocated memory on the heap, work out which bits aren't being used and deallocate them (i.e. free them up for other processes and programs).
So: memory allocation = asking for memory for your program. Garbage collection = where the programming language itself works out what memory isn't being used any more and deallocates it for you.

Why is free() not allowed in garbage-collected languages?

I was reading the C# entry on Wikipedia, and came across:
Managed memory cannot be explicitly freed; instead, it is automatically garbage collected.
Why is it that in languages with automatic memory management, manual management isn't even allowed? I can see that in most cases it wouldn't be necessary, but wouldn't it come in handy where you are tight on memory and don't want to rely on the GC being smart?
Languages with automatic memory management are designed to provide substantial memory safety guarantees that can't be offered in the presence of any manual memory management.
Among the problems prevented are
Double free()s
Calling free() on a pointer to memory that you do not own, leading to illegal access in other places
Calling free() on a pointer that was not the return value of an allocation function, such as taking the address of some object on the stack or in the middle of an array or other allocation.
Dereferencing a pointer to memory that has already been free()d
Additionally, automatic management can result in better performance when the GC moves live objects to a consolidated area. This improves locality of reference and hence cache performance.
Garbage collection enforces the type safety of a memory allocator by guaranteeing that memory allocations never alias. That is, if a piece of memory is currently being viewed as a type T, the memory allocator can guarantee (with garbage collection) that while that reference is alive, it will always refer to a T. More specifically, it means that the memory allocator will never return that memory as a different type.
Now, if a memory allocator allows for manual free() and uses garbage collection, it must ensure that the memory you free()'d is not referenced by anyone else; in other words, that the reference you pass in to free() is the only reference to that memory. Most of the time this is prohibitively expensive to do given an arbitrary call to free(), so most memory allocators that use garbage collection do not allow for it.
That isn't to say it is not possible; if you could express a single-referent type, you could manage it manually. But at that point it would be easier to either stop using a GC language or simply not worry about it.
Calling GC.Collect is almost always better than having an explicit free method. Calling free would make sense only for pointers/object refs that are referenced from nowhere else. That is error prone, since there is a chance that you call free on the wrong kind of pointer.
When the runtime environment tracks references for you, it knows which pointers can be freed safely and which cannot, so letting the GC decide which memory can be freed avoids a whole class of ugly bugs. One could think of a runtime implementation with both GC and free where explicitly calling free for a single memory block might be much faster than running a complete GC.Collect (but don't expect freeing every possible memory block "by hand" to be faster than the GC). But I think the designers of C#, CLI (and other languages with garbage collectors like Java) have decided to favor robustness and safety over speed here.
In systems that allow objects to be manually freed, the allocation routines have to search through a list of freed memory areas to find some free memory. In a garbage-collection-based system, any immediately-available free memory is going to be at the end of the heap. It's generally faster and easier for the system to ignore unused areas of memory in the middle of the heap than it would be to try to allocate them.
Interestingly enough, you do have access to the garbage collector through System.GC, though from everything I've read, it's highly recommended that you allow the GC to manage itself.
I was advised once to use the following 2 lines by a 3rd party vendor to deal with a garbage collection issue with a DLL or COM object or some-such:
// Force garbage collection (cleanup event objects from previous run.)
GC.Collect(); // Force an immediate garbage collection of all generations
GC.GetTotalMemory(true);
That said, I wouldn't bother with System.GC unless I knew exactly what was going on under the hood. In this case, the 3rd party vendor's advice "fixed" the problem that I was dealing with regarding their code. But I can't help but wonder if this was actually a workaround for their broken code...
If you are in a situation where you "don't want to rely on the GC being smart", then most probably you picked the wrong framework for your task. In .NET you can manipulate the GC a bit (http://msdn.microsoft.com/library/system.gc.aspx); in Java, I'm not sure.
I think you can't call free because you would be taking over part of the GC's job. The GC's overall efficiency can be guaranteed to some degree only when it does things the way it finds best, when it decides to. If developers interfere with the GC, they might decrease its overall efficiency.
I can't say that it is the answer, but one that comes to mind is that if you can free, you can accidentally double free a pointer/reference or even worse, use one after free. Which defeats the main point of using languages like c#/java/etc.
Of course, one possible solution to that would be to have your free take its argument by reference and set it to null after freeing. But then, what if they pass an r-value like this: free(whatever())? I suppose you could have an overload for r-value versions, but I don't even know if C# supports such a thing :-P.
In the end, even that would be insufficient because as has been pointed out, you can have multiple references to the same object. Setting one to null would do nothing to prevent the others from accessing the now deallocated object.
Many of the other answers provide good explanations of how the GC works and how you should think when programming against a runtime system which provides a GC.
I would like to add a trick that I try to keep in mind when programming in GC'd languages. The rule is this "It is important to drop pointers as soon as possible." By dropping pointers I mean that I no longer point to objects that I no longer will use. For instance, this can be done in some languages by setting a variable to Null. This can be seen as a hint to the garbage collector that it is fine to collect this object, provided there are no other pointers to it.
Why would you want to use free()? Suppose you have a large chunk of memory you want to deallocate.
One way to do it is to call the garbage collector, or let it run when the system wants. In that case, if the large chunk of memory can't be accessed, it will be deallocated. (Modern garbage collectors are pretty smart.) That means that, if it wasn't deallocated, it could still be accessed.
Therefore, if you can get rid of it with free() but not the garbage collector, something still can access that chunk (and not through a weak pointer if the language has the concept), which means that you're left with the language's equivalent of a dangling pointer.
The language can defend itself against double-frees or trying to free unallocated memory, but the only way it can avoid dangling pointers is by abolishing free(), or modifying its meaning so it no longer has a use.
Why is it that in languages with automatic memory management, manual management isn't even allowed? I can see that in most cases it wouldn't be necessary, but wouldn't it come in handy where you are tight on memory and don't want to rely on the GC being smart?
In the vast majority of garbage collected languages and VMs it does not make sense to offer a free function although you can almost always use the FFI to allocate and free unmanaged memory outside the managed VM if you want to.
There are two main reasons why free is absent from garbage collected languages:
Memory safety.
No pointers.
Regarding memory safety, one of the main motivations behind automatic memory management is eliminating the class of bugs caused by incorrect manual memory management. For example, with manual memory management calling free with the same pointer twice or with an incorrect pointer can corrupt the memory manager's own data structures and cause non-deterministic crashes later in the program (when the memory manager next reaches its corrupted data). This cannot happen with automatic memory management but exposing free would open up this can of worms again.
Regarding pointers, the free function releases a block of allocated memory at a location specified by a pointer back to the memory manager. Garbage collected languages and VMs replace pointers with a more abstract concept called references. Most production GCs are moving which means the high-level code holds a reference to a value or object but the underlying location in memory can change as the VM is capable of moving allocated blocks of memory around without the high-level language knowing. This is used to compact the heap, preventing fragmentation and improving locality.
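The idea of references that survive relocation can be sketched with a toy handle table. This is an illustration under loud assumptions, not how any particular VM works: ToyHeap, Handle, mark_dead and compact are hypothetical names, the "heap" only stores ints, and dead values are marked by hand where a real collector would find them itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sentinel for "this handle no longer refers to anything".
constexpr std::size_t kNone = static_cast<std::size_t>(-1);

// Hypothetical toy "managed heap" of ints. Code holds handles (indices
// into a table), never raw addresses, so the heap may move values.
class ToyHeap {
public:
    using Handle = std::size_t;

    Handle allocate(int value) {
        slots_.push_back(Slot{value, true});
        table_.push_back(slots_.size() - 1);
        return table_.size() - 1;
    }

    // Mark a value as garbage (a real GC would discover this itself).
    void mark_dead(Handle h) { slots_[table_[h]].live = false; }

    int get(Handle h) const {
        assert(table_[h] != kNone && slots_[table_[h]].live);
        return slots_[table_[h]].value;
    }

    // Physical position of the value; changes when the heap is compacted.
    std::size_t location(Handle h) const { return table_[h]; }

    // Slide live values to the front and patch the handle table.
    void compact() {
        std::vector<std::size_t> new_pos(slots_.size(), kNone);
        std::size_t next = 0;
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            if (slots_[i].live) {
                slots_[next] = slots_[i];
                new_pos[i] = next++;
            }
        }
        slots_.resize(next);
        for (std::size_t& pos : table_) {
            if (pos != kNone) pos = new_pos[pos];
        }
    }

    std::size_t size() const { return slots_.size(); }

private:
    struct Slot { int value; bool live; };
    std::vector<Slot> slots_;         // the "heap"
    std::vector<std::size_t> table_;  // handle -> current slot index
};
```

A handle keeps returning the same value from get() even though location() changes after compact(). A free(pointer) API has nothing stable to point at in such a design, which is the point the answer makes about references versus pointers.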
So there are good reasons not to have free when you have a GC.
Manual management is allowed. For example, in Ruby calling GC.start will free everything that can be freed, though you can't free things individually.

Resources