What deterministic garbage collection algorithms are out there?

By "deterministic" I loosely mean usable in critical real-time software such as aerospace flight software. Garbage collectors (and dynamic memory allocation, for that matter) are big no-nos in flight software because they are considered non-deterministic. However, I know there is ongoing research on this, so I wonder whether the problem has been solved yet.
I'm also including in the question any garbage collection algorithms that put restrictions on how they're used.

I know I might get a lot of down-votes for this reply, but if you are already trying to avoid dynamic memory in the first place, because you said it's a no-no, why use GC at all? I'd never use GC in a real-time system where predictable runtime speed is the major concern. I'd avoid dynamic memory wherever possible, so there are very few dynamic objects to start with, and I'd handle the few dynamic allocations I have manually, so I have 100% control over when and where something is released. After all, GC is not the only non-deterministic part: free() is no more deterministic than malloc(). Nothing says that a free() call merely has to mark the memory as free. It might just as well try to merge the smaller free blocks surrounding the freed one into a big one, and neither this behavior nor its runtime is deterministic (sometimes free() won't do that and malloc() will do it on the next allocation instead, but nowhere is it written that free() mustn't do it).
In a critical real-time system, you might even replace the system's standard malloc()/free() with a different implementation, maybe even writing your own (it's not as hard as it sounds! I've done it before just for fun) that behaves as deterministically as possible. For me GC is a plain convenience: it frees programmers from sophisticated malloc()/free() planning and has the system deal with it automatically. It helps rapid software development and saves hours of debugging spent finding and fixing memory leaks. But just as I'd never use GC within an operating system kernel, I'd never use it within a critical real-time application either.
If I needed more sophisticated memory handling, I'd maybe write my own malloc()/free() that works as desired (and as deterministically as possible) and build my own reference counting model on top of it. Reference counting is still manual memory management, but much more comfortable than using malloc()/free() directly. It is not ultra fast, but it is deterministic (at least incrementing/decrementing the ref counter takes constant time), and unless you have circular references, it will catch all dead memory if you follow a retain/release strategy throughout your application. The only non-deterministic part is that you won't know whether calling release will just decrement the ref counter or really free the object (depending on whether the ref count reaches zero). You could delay the actual free by offering a function such as "releaseWithoutFreeing", which decrements the ref counter but does not free() the object even if the counter reaches zero. Your malloc()/free() implementation can then have a function "findDeadObjects" that searches for all objects with a retain count of zero that have not yet been freed and frees them (at a later point, when you are in a less critical part of your code that has more time for such tasks). Since this is not deterministic either, you could limit the time it may take with something like "findDeadObjectsForUpTo(ms)", where ms is the number of milliseconds it may spend finding and freeing objects; it returns as soon as this time quantum has been used up, so you won't spend too much time on the task.
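The deferred-release idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not a production allocator: all names (RcHeader, rc_alloc, rc_release_without_freeing, rc_reclaim_up_to) are hypothetical, dead objects are queued on a list at release time instead of being searched for afterwards, and the budget is a maximum number of objects rather than milliseconds so that the sketch stays portable. It also sits on top of the standard malloc()/free(), so the reclamation step is only as predictable as those are.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical header placed in front of every reference-counted object.
struct RcHeader {
    std::size_t refs;      // current retain count
    RcHeader*   next_dead; // link in the list of objects awaiting reclamation
};

static RcHeader*   g_dead_list  = nullptr; // objects whose count reached zero
static std::size_t g_dead_count = 0;       // length of g_dead_list

// Allocate a payload of `size` bytes with a retain count of 1.
void* rc_alloc(std::size_t size) {
    RcHeader* h = static_cast<RcHeader*>(std::malloc(sizeof(RcHeader) + size));
    if (h == nullptr) return nullptr;
    h->refs = 1;
    h->next_dead = nullptr;
    return h + 1; // the payload starts right after the header
}

static RcHeader* header_of(void* p) { return static_cast<RcHeader*>(p) - 1; }

// Constant-time retain.
void rc_retain(void* p) { ++header_of(p)->refs; }

// Constant-time release: never calls free(), only queues a dead object.
void rc_release_without_freeing(void* p) {
    RcHeader* h = header_of(p);
    assert(h->refs > 0);
    if (--h->refs == 0) {
        h->next_dead = g_dead_list;
        g_dead_list = h;
        ++g_dead_count;
    }
}

// Free at most `max_objects` queued objects; returns how many were freed.
std::size_t rc_reclaim_up_to(std::size_t max_objects) {
    std::size_t freed = 0;
    while (g_dead_list != nullptr && freed < max_objects) {
        RcHeader* h = g_dead_list;
        g_dead_list = h->next_dead;
        std::free(h);
        ++freed;
        --g_dead_count;
    }
    return freed;
}

// Number of objects that are dead but not yet freed.
std::size_t rc_pending() { return g_dead_count; }
```

In a time-critical section you would call only rc_retain and rc_release_without_freeing, and call rc_reclaim_up_to with a small budget from a less critical part of the program.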

Metronome GC and BEA JRockit are two deterministic GC implementations that I'm aware of (both for Java).

I happened to be searching through Stack Overflow and noticed this rather old post.
Jon Anderson mentioned JamaicaVM. Since these posts have been up for over 4 years now, I think it's important to respond to some of the information posted here.
I work for aicas, the developers and marketers of JamaicaVM, JamaicaCAR, and Veriflux.
JamaicaVM does have a hard realtime garbage collector. It is fully preemptive, which is exactly the behavior required in a realtime operating system. Although the preemption latency depends on CPU speed, assume that on a GHz-class processor preemption of the garbage collector takes less than 1 microsecond. There is a 32-bit single-core version that supports up to 3 GB of memory per process address space. There is a 32-bit multicore version that supports 3 GB of memory per process address space and multiple cores. There are also 64-bit single-core and multicore versions that support up to 128 GB of memory per process address space. The performance of the garbage collector is independent of the size of memory. In response to one of the responses regarding running the GC completely out of memory: for a hard realtime system you would not design your program to ever do that. Although you can, in fact, use a hard realtime GC in this scenario, you would have to account for a worst-case execution time that probably would not be acceptable to your application.
Instead, the correct approach would be to analyze your program for maximum memory allocation, and then configure the hard realtime garbage collector to incrementally free blocks during all previous allocations so that the specific scenario described never occurs. This is known as thread-distributed, work-paced garbage collection.
Dr. Siebert's book on Hard Realtime Garbage Collectors describes how to accomplish this and presents a formal proof that the garbage collector will keep up with the application, while not becoming an O(N) operation.
It is very important to understand that realtime garbage collection means several things:
The garbage collector is preemptible, just like any other operating system service
It can be proven mathematically that the garbage collector will keep up, so that memory will not be exhausted because some memory has not been reclaimed yet.
The garbage collector does not fragment memory, such that as long as there is memory available, a memory request will succeed.
Additionally, you will need this to be part of a system with priority inversion protection, a fixed priority thread scheduler and other features. Refer to the RTSJ for some information on this.
Although hard realtime garbage collection is needed for safety-critical applications, it can be used for mission-critical and general-purpose Java applications as well. There are no inherent limitations in using a hard realtime garbage collector. For general use, you can expect smoother program execution since there are no long garbage collector pauses.

To me, 100% real-time Java is still very much a hit-and-miss technology, but I don't claim to be an expert.
I'd recommend reading up on these articles - Cliff Click blog. He's the architect of Azul, has pretty much coded all of the standard 1.5 Java concurrent classes etc... FYI, Azul is designed for systems which require very large heap sizes, rather than just standard RT requirements.

It's not GC, but there are simple O(1) fixed sized block allocation/free schemes you can use for simple usage. For example, you can use a free list of fixed sized blocks.
#include <stddef.h> /* NULL */

struct Block {
    struct Block *next;
};

/* head of the free list; init() populates it by calling release()
 * on each block you want to add */
static struct Block *free_list = NULL;

/* O(1): push the block back onto the free list */
void release(void *p) {
    if (p != NULL) {
        struct Block *b_ptr = (struct Block *)p;
        b_ptr->next = free_list;
        free_list = b_ptr;
    }
}

/* O(1): pop a block off the free list; returns NULL when none are left */
void *acquire(void) {
    void *ret = (void *)free_list;
    if (free_list != NULL) {
        free_list = free_list->next;
    }
    return ret;
}

/* call this before you use acquire/release */
void init(void) {
    /* example of an allocator supporting 100 blocks, each 32 bytes big */
    enum { BLOCKS = 100, SIZE = 32 };
    /* the union keeps the backing storage aligned for struct Block */
    static union { unsigned char bytes[BLOCKS * SIZE]; struct Block align; } mem;
    int i;
    for (i = 0; i < BLOCKS; ++i) {
        release(&mem.bytes[i * SIZE]);
    }
}
If you plan accordingly, you could limit your design to only a few specific sizes for dynamic allocation and keep a free_list for each size. If you are using C++, you can implement something simple like scoped_ptr (with a template parameter for each size) to get simpler yet still O(1) memory management.
The only real caveat is that you have no protection against double frees, or against accidentally passing release a pointer that didn't come from acquire.
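For the C++ variant with one pool per size, a minimal sketch might look like the following. The class name FixedPool and its members are hypothetical, and the owns() check is an addition of this sketch: it rejects pointers that did not come from this pool, but it still does not detect a block that is released twice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical fixed-size pool: BlockCount blocks of BlockSize bytes each.
template <std::size_t BlockSize, std::size_t BlockCount>
class FixedPool {
    static_assert(BlockSize >= sizeof(void*), "a block must hold a free-list link");
    static_assert(BlockSize % alignof(void*) == 0, "blocks must stay pointer-aligned");

    struct Node { Node* next; };

    alignas(std::max_align_t) unsigned char storage_[BlockSize * BlockCount];
    Node* free_list_;
    std::size_t free_count_;

    // O(1): thread the block onto the front of the free list.
    void push(void* p) {
        assert(free_count_ < BlockCount); // catches releasing into a full pool
        free_list_ = ::new (p) Node{free_list_};
        ++free_count_;
    }

public:
    FixedPool() : free_list_(nullptr), free_count_(0) {
        for (std::size_t i = 0; i < BlockCount; ++i) {
            push(storage_ + i * BlockSize);
        }
    }

    // O(1): pop the head of the free list, or return nullptr when exhausted.
    void* acquire() {
        if (free_list_ == nullptr) return nullptr;
        Node* n = free_list_;
        free_list_ = n->next;
        --free_count_;
        return n;
    }

    // O(1): true if p points at the start of one of this pool's blocks.
    bool owns(const void* p) const {
        const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(storage_);
        const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
        return addr >= base && addr < base + sizeof(storage_) &&
               (addr - base) % BlockSize == 0;
    }

    // O(1): returns false (and does nothing) for null or foreign pointers.
    bool release(void* p) {
        if (p == nullptr || !owns(p)) return false;
        push(p);
        return true;
    }

    std::size_t available() const { return free_count_; }
};
```

A design would then declare one pool per size it needs, for example FixedPool<32, 100> and FixedPool<128, 20>, and route each allocation to the pool that matches.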

Sun has extensively documented their real-time garbage collector, and provided benchmarks you can run for yourself here. Others mentioned Metronome, which is the other major production-grade RTGC algorithm. Many other vendors of RT JVMs have their own implementations -- see my list of vendors over here and most of them provide extensive documentation.
If your interest is particularly in avionics/flight software, I suggest you take a look at aicas, an RTSJ vendor who specifically markets to the avionics industry. Dr. Siebert's (aicas CEO) home page lists some academic publications that go into great detail about JamaicaVM's GC implementation.

You may have some luck with the following PhD thesis
CMU-CS-01-174 - Scalable Real-time Parallel Garbage Collection for Symmetric Multiprocessors.

Real-time means a guaranteed upper bound on response time. This means an upper bound on the number of instructions you can execute before you deliver the result. This also puts an upper limit on the amount of data you can touch. If you don't know how much memory you're going to need, it is extremely likely that you'll have a computation for which you cannot give an upper limit on its execution time. On the other hand, if you know the upper bound of your computation, you also know how much memory it touches (unless you don't really know what your software does). So the amount of knowledge you have about your code obviates the need for a GC.
There are features, like regions in RT-Java, that allow for expressiveness beyond local and global (static) variables. But they will not relieve you from your obligation to manage the memory you allocate, because otherwise you cannot guarantee that the next upcoming allocation will not fail because of insufficient memory resources.
Admittedly: I've gotten somewhat suspicious about things that call themselves "realtime garbage collectors". Of course, any GC is real time if you assume that every allocation runs a full collection (which still doesn't guarantee that the allocation will succeed afterwards, because all memory blocks might be found to be reachable). But for any GC that promises a tighter time bound on allocation, consider its performance on the following example code:
// assume that one `Link` object needs k bytes:
class Link {
    Link next = null;
    /* further fields */
    static Link head = null;

    public static void main(String[] args) {
        // assume we have N bytes free now
        // set n := floor(N/k), assume that n > 1
        // (read from the command line here only so that the example compiles)
        int n = Integer.parseInt(args[0]);
        for (int i = 0; i < n; i++) {
            Link tmp = new Link();
            tmp.next = Link.head;
            Link.head = tmp;
        }
        // (1)
        Link.head = Link.head.next; // (2)
        Link tmp = new Link();      // (3)
    }
}
At point (1), we have less than k bytes free (allocation of another Link object would fail), and all Link objects allocated so far are reachable starting from the static Link.head field.
At point (2),
(a) what used to be the first entry in the list is now not reachable, but
(b) it is still allocated, as far as the memory management part is concerned.
At point (3), the allocation should succeed because of (2a) - we can use what used to be the first link - but, because of (2b), we must start the GC, which will end up traversing n-1 objects, hence have a running time of O(N).
So, yes, it's a contrived example. But a GC that claims to have a bound on allocation should be able to master this example as well.

I know this post is a bit dated, but I have done some interesting research and want to make sure this is updated.
Deterministic GC is offered by Azul Systems' "Zing JVM" and by JRockit. Zing comes with some very interesting added features and is now "100% software based" (it can run on x86 machines). It is only for Linux at this time though ...
Price:
If you are on Java 6 or earlier, Oracle is now charging a 300% uplift and forcing support for this capability ($15,000 per processor & $3,300 support). Azul, from what I have heard, is around $10,000 - $12,000, but charges by physical machine, not by core / processor. Also, the prices are graduated by volume, so the more servers you leverage the deeper the discounting. My conversations with them showed them to be quite flexible. Oracle is a perpetual license and Zing is subscription based ... but do the math and add in the other features that Zing has (see differences below).
You can cut cost by moving to Java 7, but then you incur development costs. Given Oracle's roadmap (a new release every 18 months or so), and the fact that they historically only offer the latest plus one older version of Java SE updates for free, the "free" horizon is expected to be 3 years from the initial GA release of any major version. Since initial GA releases are typically not adopted in production for 12-18 months, and since moving production systems to new major Java releases typically carries major costs, this means that Java SE support bills will start hitting somewhere between 6 and 24 months after initial deployment.
Notable differences:
JRockit does still have some scalability limitations in terms of RAM (though improved from days of old). You can improve your results with a bit of tuning. Zing has engineered their algorithm to allow continuous, concurrent compaction (no stop-the-world pauses and no "tuning" required). This allows Zing to scale without a theoretical memory ceiling (they are doing 300+ GB heaps without suffering stop-the-world pauses or crashing). Talk about a paradigm changer (think of the implications for big data). Zing has some really cool improvements to locking, giving it amazing performance with a bit of work (if tuned, it can go sub-millisecond on average). Finally, they have visibility into classes, methods, and thread behavior in production (no overhead). We see this as a huge time saver when considering updates, patches, and bug fixes (e.g. leaks & locks). This can practically eliminate the need to recreate many of the issues in Dev / Test.
Links to JVM Data I found:
JRockit Deterministic GC
Azul Presentation - Java without Jitter
Azul / MyChannels Test

I know Azul Systems has a JVM whose GC is hardware assisted. It can also run concurrently and collect massive amounts of data pretty fast.
Not sure how deterministic it is though.

Related

What is Go's memory footprint

This article on Wired about Dropbox's switch from Go to Rust for its MagicPocket product says
“memory footprint”—the amount of computer memory it demands while running Magic Pocket—was too high for the massive storage systems the company was trying to build.
Question(s): What exactly is Go's "memory footprint" (where does it come from, how is it measured, is it related to garbage collection or binary size, is it something that will always be high) and why is it higher than Rust's?

"It had a high memory footprint" is just another way to say their program used a lot of RAM. It is related to garbage collection in that GC'd programs only free memory periodically (because each GC cycle takes CPU time), whereas manual memory management tends to free memory more or less as soon as it's unused.
The downside of manual memory management is either that mistakes can cause crashes and security bugs (as in C++, where you can accidentally use a freed variable after the memory has been reused for something else) or you have to put effort into expressing the exact lifetimes of each variable, reference, etc. in your code so that the compiler can check that they're being used in a valid way (as in Rust, where you interact with the borrow checker to root out potentially incorrect uses of memory in your code).
The sentence in the Wired story makes it sound like "memory footprint" is a simple measurable quantity you could assign to any language (and your question takes that idea to its logical conclusion). It's not quite that simple. In different languages, doing different things has different costs in memory, performance, and so on, and you kind of have to understand languages'/runtimes' details to know how the language will work with a given sort of program.
For example, CPython has reference counting, and that frees unused memory sooner but at the cost of having to store and update reference counts. Java has, on the one hand, things like object headers that add a certain amount of memory overhead per object, but uses some tricks to speed garbage collection (like generational collection) that Go doesn't (yet). Or in Go, you might try to reduce the memory footprint of a program by recycling memory with free pools and adjusting GOGC to free unused memory more often as kostya said.
The bigger point there is not that those specific details I listed are super important, but that there can be a lot of details to consider other than "higher memory footprint" or "lower memory footprint."
So: "memory footprint" refers to the amount of RAM a particular program with a particular workload takes up. Bigger picture, it's one factor in a large set of tradeoffs that folks like you or I or Dropbox's team have to navigate.

The garbage collector requires free memory to be available to work efficiently. By default, a Go application needs roughly twice as much memory as the size of its live data set (the memory occupied by application objects).
This can be tuned using the GOGC environment variable. Setting it to a lower value makes the application request less memory from the OS, but the GC will run more frequently and therefore use more CPU resources. Setting it to a higher value makes the GC run less frequently and use fewer resources, but the application will have a higher "memory footprint".
This is the general idea, but the exact memory and performance requirements and the effect of GOGC are highly application specific.
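The "roughly twice" figure follows from simple arithmetic. As a rough model (an assumption of this sketch, which ignores GC roots, memory limits, and runtime overhead), the next collection is triggered once the heap has grown by GOGC percent over the live data, and the default value of GOGC is 100. The function name gogc_heap_target is hypothetical and only illustrates the calculation:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Simplified model: heap size at which the next collection starts,
// given the live data size and the GOGC percentage.
std::uint64_t gogc_heap_target(std::uint64_t live_bytes, unsigned gogc_percent) {
    // guard against overflow in the multiplication below
    assert(gogc_percent == 0 ||
           live_bytes <= std::numeric_limits<std::uint64_t>::max() / gogc_percent);
    return live_bytes + live_bytes * gogc_percent / 100;
}
```

Under this model, 100 MiB of live data gives a target of 200 MiB with GOGC=100, 150 MiB with GOGC=50, and 300 MiB with GOGC=200.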

Alternative for Garbage Collector

I'd like to know the best alternative for a garbage collector, with its pros and cons. My priority is speed, memory is less important. If there is garbage collector which doesn't make any pause, let me know.
I'm working on a safe language (i.e. a language with no dangling pointers, checking bounds, etc), and garbage collection or its alternative has to be used.

I suspect you will be best off sticking with garbage collection (as per the JVM) unless you have a very good reason to do otherwise. Modern GCs are extremely fast, general purpose and safe. Unless you can design your language to take advantage of a very specific special case (as with one of the allocators listed below), you are unlikely to beat the JVM.
The only really compelling reason I see nowadays as an argument against modern GC is latency issues caused by GC pauses. These are small, rare and not really an issue for most purposes (e.g. I've successfully written 3D engines in Java), but they still can cause problems in very tight realtime situations.
Having said that, there may still be some special cases where a different memory allocation scheme may make sense so I've listed a few interesting options below:
An example of a very fast, specialised memory management approach is the "per frame" allocator used in many games. This works by incrementing a single pointer to allocate memory, and at the end of a time period (typically a visual "frame") all objects are discarded at once by simply setting the pointer back to the base address and overwriting them in the next allocation. This can be "safe", however the constraints of object lifetime would be very strict. Might be a winner if you can guarantee that all memory allocation is bounded in size and only valid for the scope of handling e.g. a single server request.
Another very fast approach is to have dedicated object pools for different classes of object. Released objects can just be recycled in the pool, using something like a linked list of free object slots. Operating systems often use this kind of approach for common data structures. Again, however, you need to watch object lifetime and explicitly handle disposals by returning objects to the pool.
Reference counting looks superficially good but usually doesn't make sense because you frequently have to dereference and update the count on two objects whenever you change a pointer value. This cost is usually worse than the advantage of having simple and fast memory management, and it also doesn't work in the presence of cyclic references.
Stack allocation is extremely fast and can run safely. Depending on your language, it is possible to make do without a heap and run entirely on a stack based system. However I suspect this will somewhat constrain your language design so that might be a non-starter. Still might be worth considering for certain DSLs.
Classic malloc/free is pretty fast and can be made safe if you have sufficient constraints on object creation and lifetime which you may be able to enforce in your language. An example would be if e.g. you placed significant constraints on the use of pointers.
Anyway - hope this is useful food for thought!

If speed matters but memory does not, then the fastest and simplest allocation strategy is to never free. Allocation is simply a matter of bumping a pointer up. You cannot get faster than that.
Of course, never releasing anything has a huge potential for overflowing available memory. It is very rare that memory is truly "unimportant". Usually there is a large but finite amount of available memory. One strategy is called "region based allocation". Namely you allocate memory in a few big blocks called "regions", with the pointer-bumping strategy. Release occurs only by whole regions. This strategy can be applied with some success if the problem at hand can be structured into successive "tasks", each having its own region.
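A minimal sketch of such a region, assuming a single fixed-capacity block obtained from malloc (the class name Region and its members are hypothetical): allocation bumps an offset, nothing is freed individually, and reset() releases the whole region at once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical pointer-bumping region with whole-region release.
class Region {
    unsigned char* base_;
    std::size_t capacity_;
    std::size_t used_;

public:
    explicit Region(std::size_t capacity)
        : base_(static_cast<unsigned char*>(std::malloc(capacity))),
          capacity_(capacity),
          used_(0) {}
    ~Region() { std::free(base_); }
    Region(const Region&) = delete;
    Region& operator=(const Region&) = delete;

    // O(1): round the offset up to `align`, then bump it by `size`.
    // Returns nullptr when the region is exhausted.
    void* allocate(std::size_t size, std::size_t align = alignof(std::max_align_t)) {
        // align must be a power of two no larger than malloc's own guarantee
        assert(align != 0 && (align & (align - 1)) == 0);
        assert(align <= alignof(std::max_align_t));
        const std::size_t start = (used_ + align - 1) & ~(align - 1);
        if (base_ == nullptr || start > capacity_ || size > capacity_ - start) {
            return nullptr;
        }
        used_ = start + size;
        return base_ + start;
    }

    // O(1): discard every object in the region at once.
    void reset() { used_ = 0; }

    std::size_t used() const { return used_; }
};
```

One region per "task" then means calling reset() when the task ends. Note that no destructors run, so this only suits objects that need no cleanup.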
For more generic solutions, if you want real-time allocation (i.e. guaranteed limits on the response time of allocation requests) then garbage collection is the way to go. A real-time GC may look like this: objects are allocated with a pointer-bumping strategy. Also, on every allocation, the allocator performs a little bit of garbage collection, in which "live" objects are copied somewhere else. In a way the GC runs "at the same time" as the application. This implies a bit of extra work for accessing objects, because you cannot move an object and update all pointers to point to the new object location while keeping the "real-time" promise. Solutions may involve barriers, e.g. an extra indirection. Generational GCs allow for barrier-free access to most objects while keeping pause times under strict bounds.
This article is a must-read for whoever wants to study memory allocation, in particular garbage collection.

With C++ it's possible to make a heap allocation ONCE for your objects, then reuse that memory for subsequent objects, I've seen it work and it was blindingly fast.
It's only applicable to a certain set of problems, and it's difficult to do it right, but it is possible.
One of the joys of C++ is you have complete control over memory management, you can decide to use classic new/delete, or implement your own reference counting or Garbage Collection.
However - here be dragons - you really, really need to know what you're doing.

If memory doesn't matter, then what @Thomas says applies. Considering the gargantuan memory spaces of modern hardware, this may very well be a viable option -- it really depends on the process.
Manual memory management doesn't necessarily solve your problems directly, but it does give you complete control over WHEN memory events happen. Generic malloc, for example, is not an O(1) operation. It does all sorts of potentially horrible things in there, both within the heap managed by malloc itself as well as the operating system. For example, ya never know when "malloc(10)" may cause the VM to page something out, now your 10 bytes of RAM have an unknown disk I/O component -- oops! Even worse, that page out could be YOUR memory, which you'll need to immediately page back in! Now c = *p is a disk hit. YAY!
But if you are aware of these, then you can safely set up your code so that all of the time critical parts effectively do NO memory management, instead they work off of pre-allocated structures for the task.
With a GC system, you may have a similar option -- it depends on the collector. I don't think the Sun JVM, for example, has the ability to be "turned off" for short periods of time. But if you work with pre-allocated structures, and call all of your own code (or know exactly what's going on in the library routine you call), you probably have a good chance of not hitting the memory manager.
Because the crux of the matter is that memory management is a lot of work. If you want to get rid of memory management, then write old school FORTRAN with ARRAYs and COMMON blocks (one of the reasons FORTRAN can be so fast). Of course, you can write "FORTRAN" in most any language.
With modern languages, modern GCs, etc., memory management has been pushed aside and become a "10%" problem. We are now pretty sloppy with creating garbage, copying memory, etc. etc., because the GCs et al make it easy for us to be sloppy. And for 90% of programs this is not an issue, so we don't worry about it. Nowadays, it's a tuning issue, late in the process.
So, your best bet is set it all up at once, use it, then toss it all away. The "use it" part is where you will get consistent, reliable results (assuming enough memory on the system of course).
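As one illustration of working off pre-allocated structures in the time-critical part, here is a sketch of a fixed-capacity queue whose storage is reserved up front, so push and pop never touch the memory manager. The name FixedQueue is hypothetical, and the sketch assumes the element type is default-constructible and cheap to copy.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical ring buffer: all storage lives inside the object itself.
template <typename T, std::size_t Capacity>
class FixedQueue {
    static_assert(Capacity > 0, "capacity must be non-zero");

    std::array<T, Capacity> slots_; // reserved once, never resized
    std::size_t head_ = 0;          // index of the oldest element
    std::size_t count_ = 0;         // number of stored elements

public:
    // O(1), no allocation: returns false instead of growing when full.
    bool push(const T& value) {
        if (count_ == Capacity) return false;
        slots_[(head_ + count_) % Capacity] = value;
        ++count_;
        assert(count_ <= Capacity);
        return true;
    }

    // O(1), no allocation: returns false when the queue is empty.
    bool pop(T& out) {
        if (count_ == 0) return false;
        out = slots_[head_];
        head_ = (head_ + 1) % Capacity;
        --count_;
        return true;
    }

    std::size_t size() const { return count_; }
};
```

The capacity has to be chosen from the worst case you analysed up front; a full queue is reported to the caller rather than hidden behind a reallocation.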

As an "alternative" to garbage collection, C++ specifically has smart pointers. boost::shared_ptr<> (or std::tr1::shared_ptr<>) works much like Python's reference-counted garbage collection. In my eyes, shared_ptr IS garbage collection (although you may need to use weak_ptr<> in a few places to make sure that circular references don't keep objects alive).
I would argue that auto_ptr<> (or in C++0x, unique_ptr<>...) is a viable alternative, with its own set of benefits and tradeoffs. auto_ptr has a clunky syntax and can't be used in STL containers... but it gets the job done. You "move" the ownership of the pointer from variable to variable. If a variable owns the pointer when it goes out of scope, it will call the object's destructor and free the memory. Only one auto_ptr<> (or unique_ptr<>) is allowed to own the real pointer (at least, if you use it correctly).
As another alternative, you can store everything on the stack and just pass references around to all the functions that need them.
These alternatives don't really solve the general memory management problem that garbage collection solves. Nonetheless, they are efficient and well tested. An auto_ptr doesn't use any more space than the pointer did originally... and there is no overhead on dereferencing an auto_ptr. "Movement" (or assignment, for auto_ptr) has a tiny amount of overhead to keep track of the owner. I haven't done any benchmarks, but I'm pretty sure they're faster than garbage collection / shared_ptr.
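The single-owner behaviour can be sketched with std::unique_ptr, which replaced auto_ptr (auto_ptr was deprecated in C++11 and removed in C++17). The Buffer type, its live counter, and the function names below are hypothetical and exist only to make the destruction points observable.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical resource that counts how many instances are alive.
struct Buffer {
    static int live;
    Buffer() { ++live; }
    ~Buffer() { --live; }
};
int Buffer::live = 0;

// Takes ownership: the Buffer is destroyed when this function returns.
void consume(std::unique_ptr<Buffer> b) {
    assert(b != nullptr);
}

// Returns the number of live Buffers after ownership has been handed off.
int demo_unique_ownership() {
    std::unique_ptr<Buffer> first(new Buffer);
    std::unique_ptr<Buffer> second = std::move(first); // explicit transfer
    assert(first == nullptr);   // the old owner no longer holds the pointer
    consume(std::move(second)); // freed inside consume()
    return Buffer::live;
}
```

No reference count is kept anywhere: the destruction point is fixed by scope and by the explicit std::move calls, which is what makes the cost predictable.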

If you truly want no pauses at all, disallow all memory allocation except for stack allocation, region-based buffers, and static allocation. Despite what you may have been told, malloc() can actually cause severe pauses if the free list becomes fragmented, and if you often find yourself building massive object graphs, naive manual free can and will lose to stop-and-copy; the only way to really avoid this is to amortize over preallocated pages, such as the stack or a bump-allocated pool that's freed all at once. I don't know how useful this is, but I know that the proprietary graphical programming language LabVIEW by default allocates a static region of memory for each subroutine-equivalent, requiring programmers to manually enable stack allocation; this is the kind of thing that's useful in a hard-real-time environment where you need absolute guarantees on memory usage.
If what you want is to make it easy to reason about pauses and give your developers control over allocation and placement, then there is already a language called Rust that has the same stated goals as your language; while not a completely safe language, it does have a safe subset, allowing you to create safe abstractions for raw bit-twiddling. It uses pointer type annotations to eliminate use-after-free bugs. It also doesn't have null pointers in safe code, because null pointers cost a billion dollars at least.
If bounded pauses are enough, though, there are a wide variety of algorithms that will work. If you really have a small working set compared to available memory, then I would recommend the MOS collector (aka the Train Algorithm), which collects incrementally and provably always makes progress toward freeing unreferenced objects.

It's a common fallacy that managed languages are not suitable for high performance low latency scenarios. Yes, with limited resources (such as an embedded platform) and sloppy programming you can shoot yourself in the foot just as spectacularly as with C++ (and that can be VERY VERY spectacular).
This problem has come up whilst developing games in Java/C#, and the solution was to utilise a memory pool and not let objects die, hence not needing the garbage collector to run when you don't expect it. This is really the same approach as with low latency unmanaged systems - TO TRY REALLY REALLY HARD NOT TO ALLOCATE MEMORY.
So, considering that implementing such a system in Java/C# is very similar to doing it in C++, the advantage of doing it the managed way is that you get the "niceness" of other language features, which frees up your mental clock cycles to concentrate on important things.

When might you not want to use garbage collection? [closed]

Garbage collection has been around since the early days of LISP, and now - several decades on - most modern programming languages utilize it.
Assuming that you're using one of these languages, what reasons would you have to not use garbage collection, and instead manually manage the memory allocations in some way?
Have you ever had to do this?
Please give solid examples if possible.

I can think of a few:
Deterministic deallocation/cleanup
Real time systems
Not giving up half the memory or processor time - depending on the algorithm
Faster memory alloc/dealloc and application-specific allocation, deallocation and management of memory. Basically writing your own memory stuff - typically for performance sensitive apps. This can be done where the behavior of the application is fairly well understood. For general purpose GC (like for Java and C#) this is not possible.
EDIT
That said, GC has certainly been good for much of the community. It allows us to focus more on the problem domain rather than nifty programming tricks or patterns. I'm still an "unmanaged" C++ developer though. Good practices and tools help in that case.

Memory allocations? No, I think the GC is better at it than I am.
But scarce resource allocations, like file handles, database connections, etc.? I write the code to close those when I'm done. GC won't do that for you.

I do a lot of embedded development, where the question is more likely to be whether to use malloc or static allocation and garbage collection is not an option.
I also write a lot of PC-based support tools and will happily use GC where it is available & fast enough and it means that I don't have to use pedant::std::string.
I write a lot of compression & encryption code and GC performance is usually not good enough unless I really bend the implementation. GC also requires you to be very careful with address aliasing tricks. I normally write performance sensitive code in C and call it from Python / C# front ends.
So my answer is that there are reasons to avoid GC, but the reason is almost always performance and it's then best to code the stuff that needs it in another language rather than trying to trick the GC.
If I develop something in MSVC++, I never use garbage collection. Partly because it is non-standard, but also because I've grown up without GC in C++ and automatically design in safe memory reclamation. Having said this, I think that C++ is an abomination which fails to offer the translation transparency and predictability of C or the scoped memory safety (amongst other things) of later OO languages.

Real time applications are probably difficult to write with a garbage collector. Maybe with an incremental GC that works in another thread, but this is an additional overhead.

One case I can think of is when you are dealing with large data sets amounting to hundreds of megabytes or more. Depending on the situation you might want to free this memory as soon as you are done with it, so that other applications can use it.
Also, when dealing with some unmanaged code there might be a situation where you might want to prevent the GC from collecting some data because it's still being used by the unmanaged part. Though I still have to think of a good reason why simply keeping a reference to it might not be good enough. :P

One situation I've dealt with is image processing. While working on an algorithm for cropping images, I've found that managed libraries just aren't fast enough to cut it on large images or on multiple images at a time.
The only way to do processing on an image at a reasonable speed was to use non-managed code in my situation. This was while working on a small personal side-project in C# .NET where I didn't want to learn a third-party library because of the size of the project and because I wanted to learn it to better myself. There may have been an existing third-party library (perhaps Paint.NET) that could do it, but it still would require unmanaged code.

Two words: Space Hardening
I know it's an extreme case, but still applicable. One of the coding standards that applied to the core of the Mars rovers actually forbade dynamic memory allocation. While this is indeed extreme, it illustrates a "deploy and forget about it with no worries" ideal.
In short, have some sense as to what your code is actually doing to someone's computer. If you do, and you are conservative, then let the memory fairy take care of the rest. While you develop on a quad core, your user might be on something much older, with much less memory to spare.
Use garbage collection as a safety net, be aware of what you allocate.

There are two major types of real-time systems, hard and soft. The main distinction is that a hard real-time system requires that an algorithm always finish within a particular time budget, whereas a soft real-time system only requires that it normally does. Soft systems can potentially use well-designed garbage collectors, although an ordinary one would not be acceptable. However, if a hard real-time algorithm did not complete in time, lives could be in danger. You will find such systems in nuclear reactors, aeroplanes and space shuttles, and even then only in the specialist software that the operating systems and drivers are made of. Suffice to say this is not your common programming job.
People who write these systems don't tend to use general-purpose programming languages. Ada was designed for writing these sorts of real-time systems. Despite Ada being a special language for such systems, in some systems the language is cut down further to a subset known as SPARK. SPARK is a safety-critical subset of the Ada language, and one of the features it does not allow is the creation of a new object. The new keyword for objects is totally banned for its potential to run out of memory and its variable execution time. Indeed, all memory access in SPARK is done with absolute memory locations or stack variables, and no new allocations on the heap are made. A garbage collector is not only totally useless but harmful to the guaranteed execution time.
These sorts of systems are not exactly common, but where they exist some very special programming techniques are required and guaranteed execution times are critical.

Just about all of these answers come down to performance and control. One angle I haven't seen in earlier posts is that skipping GC gives your application more predictable cache behavior in two ways.
In certain cache sensitive applications, having the language automatically trash your cache every once in a while (although this depends on the implementation) can be a problem.
Although GC is orthogonal to allocation, most implementations give you less control over the specifics. A lot of high performance code has data structures tuned for caches, and implementing stuff like cache-oblivious algorithms requires more fine grained control over memory layout. Although conceptually there's no reason GC would be incompatible with manually specifying memory layout, I can't think of a popular implementation that lets you do so.

Assuming that you're using one of these languages, what reasons would you have to not use garbage collection, and instead manually manage the memory allocations in some way?
Potentially, several possible reasons:
Program latency due to the garbage collector is unacceptably high.
Delay before recycling is unacceptably long, e.g. allocating a big array on .NET puts it in the Large Object Heap (LOH) which is infrequently collected so it will hang around for a while after it has become unreachable.
Other overheads related to garbage collection are unacceptably high, e.g. the write barrier.
The characteristics of the garbage collector are unacceptable, e.g. redoubling arrays on .NET fragments the Large Object Heap (LOH) causing out of memory when 32-bit address space is exhausted even though there is theoretically plenty of free space. In OCaml (and probably most GC'd languages), functions with deep thread stacks run asymptotically slower. Also in OCaml, threads are prevented from running in parallel by a global lock on the GC so (in theory) parallelism can be achieved by dropping to C and using manual memory management.
Have you ever had to do this?
No, I have never had to do that. I have done it for fun. For example, I wrote a garbage collector in F# (a .NET language) and, in order to make my timings representative, I adopted an allocationless style in order to avoid GC latency. In production code, I have had to optimize my programs using knowledge of how the garbage collector works but I have never even had to circumvent it from within .NET, much less drop .NET entirely because it imposes a GC.
The nearest I have come to dropping garbage collection was dropping the OCaml language itself because its GC impedes parallelism. However, I ended up migrating to F# which is a .NET language and, consequently, inherits the CLR's excellent multicore-capable GC.

I don't quite understand the question. Since you ask about a language that uses GC, I assume you are asking for examples like
Deliberately hang on to a reference even when I know it's dead, maybe to reuse the object to satisfy a future allocation request.
Keep track of some objects and close them explicitly, because they hold resources that can't easily be managed with the garbage collector (open file descriptors, windows on the screen, that sort of thing).
I've never found a reason to do #1, but #2 is one that comes along occasionally. Many garbage collectors offer mechanisms for finalization, which is an action that you bind to an object and the system runs that action before the object is reclaimed. But oftentimes the system provides no guarantees about when or whether finalizers actually run, so finalization can be of limited utility.
The main thing I do in a garbage-collected language is to keep a tight watch on the number of allocations per unit of other work I do. Allocation is usually the performance bottleneck, especially in Java or .NET systems. It is less of an issue in languages like ML, Haskell, or LISP, which are typically designed with the idea that the program is going to allocate like crazy.
EDIT: longer response to comment.
Not everyone understands that when it comes to performance, the allocator and the GC must be considered as a team. In a state-of-the-art system, allocation is done from contiguous free space (the 'nursery') and is as quick as test and increment. But unless the object allocated is incredibly short-lived, the object incurs a debt down the line: it has to be copied out of the nursery, and if it lives a while, it may be copied through several generations. The best systems use contiguous free space for allocation and at some point switch from copying to mark/sweep or mark/scan/compact for older objects. So if you're very picky, you can get away with ignoring allocations if
You know you are dealing with a state-of-the-art system that allocates from contiguous free space (a nursery).
The objects you allocate are very short-lived (less than one allocation cycle in the nursery).
Otherwise, allocated objects may be cheap initially, but they represent work that has to be done later. Even if the cost of the allocation itself is a test and increment, reducing allocations is still the best way to improve performance. I have tuned dozens of ML programs using state-of-the-art allocators and collectors and this is still true; even with the very best technology, memory management is a common performance bottleneck.
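To make "as quick as test and increment" concrete, here is a minimal C++ sketch of bump allocation from a contiguous buffer. The Nursery class and its method names are invented for this illustration and it is not any particular collector's implementation; in particular it only models the allocation side, and reset() merely stands in for "a minor collection emptied the nursery".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative nursery: a contiguous buffer plus a bump offset.
class Nursery {
public:
    explicit Nursery(std::size_t bytes) : buffer_(bytes), top_(0) {
        assert(bytes > 0);
    }

    // Returns nullptr when the request does not fit; a real collector
    // would run a minor collection here and copy out the survivors.
    void* allocate(std::size_t size) {
        const std::size_t align = alignof(std::max_align_t);
        if (size > buffer_.size()) return nullptr;
        const std::size_t rounded = (size + align - 1) / align * align;
        if (rounded > buffer_.size() - top_) return nullptr;  // the "test"
        void* p = buffer_.data() + top_;
        top_ += rounded;                                       // the "increment"
        return p;
    }

    // Discards every allocation at once.
    void reset() { top_ = 0; }

    std::size_t used() const { return top_; }

private:
    std::vector<unsigned char> buffer_;
    std::size_t top_;
};
```

The cheap part is visible here; the deferred cost described above (copying survivors out of the nursery) is exactly what this sketch leaves out.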
And you'd be surprised how many allocators don't deal well even with very short-lived objects. I just got a big speedup from Lua 5.1.4 (probably the fastest of the scripting languages, with an incremental GC) by replacing a sequence of 30 substitutions, each of which allocated a fresh copy of a large expression, with a simultaneous substitution of 30 names, which allocated one copy of the large expression instead of 30. Performance problem disappeared.

In video games, you don't want the garbage collector to run in the middle of a game frame.
For example, the Big Bad is in front of you and you are down to 10 life. You decide to run towards the Quad Damage powerup. As soon as you pick up the powerup, you prepare yourself to turn towards your enemy to fire with your strongest weapon.
When the powerup disappears, it would be a bad idea to run the garbage collector just because the game world has to delete the data for the powerup.
Video games usually manage their objects by figuring out what is needed in a certain map (this is why it takes a while to load maps with a lot of objects). Some game engines call the garbage collector after certain events (after saving, when the engine detects there's no threat in the vicinity, etc).
Other than video games, I don't find any good reasons to turn off garbage collecting.
Edit: After reading the other comments, I realized that embedded systems and Space Hardening (Bill's and tinkertim's comments, respectively) are also good reasons to turn off the garbage collector
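The "clean up only after certain events" idea described above can be sketched in C++ without any garbage collector at all: objects that die mid-frame are parked in a queue and actually freed at a quiet moment. GameObject and DeferredReaper are hypothetical names for this sketch, not part of any engine.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct GameObject { int id; };  // hypothetical stand-in for a game entity

// Parks dead objects during the frame and frees them at a safe point.
class DeferredReaper {
public:
    // Reserving up front avoids growing the queue in the middle of a frame.
    explicit DeferredReaper(std::size_t capacity) { pending_.reserve(capacity); }

    // Called mid-frame (e.g. when the powerup is picked up): nothing is freed yet.
    void retire(std::unique_ptr<GameObject> obj) {
        assert(obj);
        pending_.push_back(std::move(obj));
    }

    // Called at a quiet moment (after saving, between levels, ...).
    // Frees everything that was retired and returns how many objects that was.
    std::size_t flush() {
        const std::size_t n = pending_.size();
        pending_.clear();
        return n;
    }

    std::size_t pending() const { return pending_.size(); }

private:
    std::vector<std::unique_ptr<GameObject>> pending_;
};
```

The trade-off is the one mentioned earlier in this thread: the longer the cleanup is postponed, the more work is waiting when it finally runs.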

The more critical the execution, the more you want to postpone garbage collection, but the longer you postpone garbage collection, the more of a problem it will eventually be.
Use the context to determine the need:
1. Garbage collection is supposed to protect against memory leaks
   Do you need more state than you can manage in your head?
2. Returning memory by destroying objects with no references can be unpredictable
   Do you need more pointers than you can manage in your head?
3. Resource starvation can be caused by garbage collection
   Do you have more CPU and memory than you can manage in your head?
4. Garbage collection cannot address files and sockets
   Do you have I/O as your primary concern?
In systems that use garbage collection, weak pointers are sometimes used to implement a simple caching mechanism because objects with no strong references are deallocated only when memory pressure triggers garbage collection. However, with ARC, values are deallocated as soon as their last strong reference is removed, making weak references unsuitable for such a purpose.
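The same effect can be observed in C++, used here only as an analogy for ARC: std::shared_ptr is reference counted, so a std::weak_ptr stops being usable the moment the last strong owner goes away. The function name weak_cache_survives is made up for this sketch.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Returns true if a weak "cache entry" can still be promoted to a strong
// reference after the last strong owner has gone out of scope.
inline bool weak_cache_survives() {
    std::weak_ptr<std::string> cache_entry;
    {
        auto value = std::make_shared<std::string>("expensive result");
        cache_entry = value;               // the cache holds only a weak reference
        assert(!cache_entry.expired());    // still alive while `value` exists
    }                                      // last strong reference dropped here
    return cache_entry.lock() != nullptr;  // try to get the value back
}
```

With reference counting the function returns false, so a cache built only on weak references never gets a hit once the caller lets go of the value; such a cache needs to hold strong references and apply its own eviction policy.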

Memory Allocation/Deallocation Bottleneck?

How much of a bottleneck is memory allocation/deallocation in typical real-world programs? Answers from any type of program where performance typically matters are welcome. Are decent implementations of malloc/free/garbage collection fast enough that it's only a bottleneck in a few corner cases, or would most performance-critical software benefit significantly from trying to keep the amount of memory allocations down or having a faster malloc/free/garbage collection implementation?
Note: I'm not talking about real-time stuff here. By performance-critical, I mean stuff where throughput matters, but latency doesn't necessarily.
Edit: Although I mention malloc, this question is not intended to be C/C++ specific.

It's significant, especially as fragmentation grows and the allocator has to hunt harder across larger heaps for the contiguous regions you request. Most performance-sensitive applications typically write their own fixed-size block allocators (eg, they ask the OS for memory 16MB at a time and then parcel it out in fixed blocks of 4kb, 16kb, etc) to avoid this issue.
In games I've seen calls to malloc()/free() consume as much as 15% of the CPU (in poorly written products), or with carefully written and optimized block allocators, as little as 5%. Given that a game has to have a consistent throughput of sixty hertz, having it stall for 500ms while a garbage collector runs occasionally isn't practical.
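A fixed-size block allocator of the kind described above might look roughly like this in C++. It is a minimal single-threaded sketch under assumptions of my own (the BlockPool name, an intrusive free list, one slab taken up front), not the allocator of any particular game.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// One slab is obtained up front and carved into equal blocks; free blocks
// are chained through their own storage, so acquire/release are O(1).
class BlockPool {
public:
    BlockPool(std::size_t block_size, std::size_t block_count)
        : block_size_(round_up(block_size)),
          slab_(block_size_ * block_count),
          free_list_(nullptr),
          free_count_(0) {
        assert(block_count > 0);
        for (std::size_t i = 0; i < block_count; ++i) {
            release(slab_.data() + i * block_size_);
        }
    }
    BlockPool(const BlockPool&) = delete;
    BlockPool& operator=(const BlockPool&) = delete;

    // Pops the head of the free list; returns nullptr when the pool is empty.
    void* acquire() {
        if (!free_list_) return nullptr;
        Node* n = free_list_;
        free_list_ = n->next;
        --free_count_;
        return n;
    }

    // Pushes a block back. `p` must have been returned by acquire().
    void release(void* p) {
        free_list_ = new (p) Node{free_list_};
        ++free_count_;
    }

    std::size_t free_blocks() const { return free_count_; }

private:
    struct Node { Node* next; };

    // Blocks must be able to hold a Node and stay suitably aligned.
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        if (n < sizeof(Node)) n = sizeof(Node);
        return (n + a - 1) / a * a;
    }

    std::size_t block_size_;
    std::vector<unsigned char> slab_;
    Node* free_list_;
    std::size_t free_count_;
};
```

Because every block has the same size, there is no searching and no fragmentation inside the pool; the cost is that requests of other sizes need their own pool.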

Nearly every high performance application now has to use threads to exploit parallel computation. This is where the real memory allocation speed killer comes in when writing C/C++ applications.
In a C or C++ application, malloc/new must take a lock on the global heap for every operation. Even without contention locks are far from free and should be avoided as much as possible.
Java and C# are better at this because threading was designed in from the start and the memory allocators work from per-thread pools. This can be done in C/C++ as well, but it isn't automatic.

First off, since you said malloc, I assume you're talking about C or C++.
Memory allocation and deallocation tend to be a significant bottleneck for real-world programs. A lot goes on "under the hood" when you allocate or deallocate memory, and all of it is system-specific; memory may actually be moved or defragmented, pages may be reorganized--there's no platform-independent way to know what the impact will be. Some systems (like a lot of game consoles) also don't do memory defragmentation, so on those systems, you'll start to get out-of-memory errors as memory becomes fragmented.
A typical workaround is to allocate as much memory up front as possible, and hang on to it until your program exits. You can either use that memory to store big monolithic sets of data, or use a memory pool implementation to dole it out in chunks. Many C/C++ standard library implementations do a certain amount of memory pooling themselves for just this reason.
No two ways about it, though--if you have a time-sensitive C/C++ program, doing a lot of memory allocation/deallocation will kill performance.
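As one possible illustration of the "allocate up front and dole it out" workaround, C++17 ships std::pmr::monotonic_buffer_resource, which hands out memory from a buffer you provide. The sketch below assumes a C++17 standard library; the function name and the 16 KB buffer size are arbitrary choices for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Fills a vector whose storage comes entirely from a buffer reserved up
// front. Using null_memory_resource() as the upstream means that running
// out of buffer throws std::bad_alloc instead of quietly using the heap.
inline int sum_from_preallocated(int count) {
    assert(count >= 0);
    alignas(std::max_align_t) static std::byte storage[16 * 1024];
    std::pmr::monotonic_buffer_resource arena(
        storage, sizeof storage, std::pmr::null_memory_resource());

    std::pmr::vector<int> values(&arena);
    values.reserve(static_cast<std::size_t>(count));
    int sum = 0;
    for (int i = 1; i <= count; ++i) {
        values.push_back(i);
        sum += values.back();
    }
    return sum;  // the arena gives everything back at once on scope exit
}
```

Failing loudly when the buffer is exhausted is a deliberate choice here: on a time-sensitive path you usually want to know that the up-front budget was too small rather than fall back to the general-purpose allocator.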

In general the cost of memory allocation is probably dwarfed by lock contention, algorithmic complexity, or other performance issues in most applications. In general, I'd say this is probably not in the top-10 of performance issues I'd worry about.
Now, grabbing very large chunks of memory might be an issue. And grabbing but not properly getting rid of memory is something I'd worry about.
In Java and JVM-based languages, new'ing objects is now very, very, very fast.
Here's one decent article by a guy who knows his stuff with some references at the bottom to more related links:
http://www.ibm.com/developerworks/java/library/j-jtp09275.html

A Java VM will claim and release memory from the operating system pretty much independently of what the application code is doing. This allows it to grab and release memory in large chunks, which is hugely more efficient than doing it in tiny individual operations, as you get with manual memory management.
This article was written in 2005, and JVM-style memory management was already streets ahead. The situation has only improved since then.
Which language boasts faster raw allocation performance, the Java language, or C/C++? The answer may surprise you -- allocation in modern JVMs is far faster than the best performing malloc implementations. The common code path for new Object() in HotSpot 1.4.2 and later is approximately 10 machine instructions (data provided by Sun; see Resources), whereas the best performing malloc implementations in C require on average between 60 and 100 instructions per call (Detlefs, et. al.; see Resources). And allocation performance is not a trivial component of overall performance -- benchmarks show that many real-world C and C++ programs, such as Perl and Ghostscript, spend 20 to 30 percent of their total execution time in malloc and free -- far more than the allocation and garbage collection overhead of a healthy Java application.

In Java (and potentially other languages with a decent GC implementation) allocating an object is very cheap. In the Sun JVM it takes only about 10 machine instructions. A malloc in C/C++ is much more expensive, just because it has to do more work.
Even though allocating objects in Java is very cheap, doing so for a lot of users of a web application in parallel can still lead to performance problems, because more garbage collector runs will be triggered.
Therefore there are those indirect costs of an allocation in Java caused by the deallocation done by the GC. These costs are difficult to quantify because they depend very much on your setup (how much memory do you have) and your application.

Allocating and releasing memory in terms of performance are relatively costly operations. The calls in modern operating systems have to go all the way down to the kernel so that the operating system is able to deal with virtual memory, paging/mapping, execution protection etc.
On the other side, almost all modern programming languages hide these operations behind "allocators" which work with pre-allocated buffers.
This concept is also used by most applications which have a focus on throughput.

I know I answered earlier; however, that was an answer to the other answers, not to your question.
To speak to you directly: if I understand correctly, your performance criterion is throughput.
To me, this means that you should be looking almost exclusively at NUMA-aware allocators.
None of the earlier references (the IBM JVM paper, MicroQuill, the Sun JVM) cover this point, so I am highly suspicious of their applicability today, where, at least on the AMD ABI, NUMA is the pre-eminent memory-CPU governor.
Hands down; real world, fake world, whatever world... NUMA-aware memory request/use technologies are faster. Unfortunately, I'm running Windows currently, and I have not found an equivalent of the "numastat" tool that is available on Linux.
A friend of mine has written about this in depth in his implementation for the FreeBSD kernel.
Although I could show ad hoc the typically VERY large number of local-node memory requests compared with remote-node ones (underscoring the obvious throughput advantage), you can surely benchmark this yourself, and that is likely what you need to do, as your performance characteristics are going to be highly specific.
I do know that in a lot of ways, at least earlier 5.x VMware fared rather poorly, at that time at least, for not taking advantage of NUMA, frequently demanding pages from the remote node. However, VMs are a very unique beast when it comes to memory compartmentalization or containerization.
One of the references I cited is to Microsoft's API implementation for the AMD ABI, which has NUMA-specialized allocation interfaces for user-land application developers to exploit ;)
Here's a fairly recent analysis, visuals and all, from some browser add-on developers who compare 4 different heap implementations. Naturally the one they developed comes out on top (odd how the people who do the testing often achieve the highest scores).
They do quantify, at least for their use case, what the exact trade-off is between space and time. Generally, they found that the LFH (oh, and by the way, the LFH is apparently just a mode of the standard heap) or a similarly designed approach consumes significantly more memory off the bat but, over time, may wind up using less memory... the graphics are neat too...
I would think, however, that selecting a heap implementation based on your typical workload, after you understand it well ;), is a good idea; but to understand your needs well, first make sure your basic operations are correct before you optimize these odds and ends ;)

This is where C/C++'s memory allocation system works best. The default allocation strategy is OK for most cases but it can be changed to suit whatever is needed. In GC systems there's not a lot you can do to change allocation strategies. Of course, there is a price to pay, and that's the need to track allocations and free them correctly. C++ takes this further and the allocation strategy can be specified per class using the new operator:
#include <cstddef>

class AClass
{
public:
    void *operator new (std::size_t size);     // called whenever there's a new AClass
    void *operator new [] (std::size_t size);  // called whenever there's a new AClass []
    void operator delete (void *memory);       // if you define new, you really need to define delete as well
    void operator delete [] (void *memory);    // likewise for the array form
};
Many of the STL templates allow you to define custom allocators as well.
As with all things to do with optimisation, you must first determine, through run time analysis, if memory allocation really is the bottleneck before writing your own allocators.
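A custom allocator is also a cheap way to do the measuring first. The following is a minimal C++17 sketch (CountingAllocator, g_allocation_calls and count_vector_allocations are names invented here) that plugs into std::vector and simply counts how often the container asks for memory; it is not thread-safe and is meant for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Global counter, for illustration only (not thread-safe).
inline std::size_t g_allocation_calls = 0;

// Minimal allocator: forwards to operator new/delete and counts the calls.
template <class T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        ++g_allocation_calls;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

// Returns how many times a vector asked its allocator for memory while
// `elements` values were appended, with or without reserving first.
inline std::size_t count_vector_allocations(std::size_t elements, bool reserve_first) {
    g_allocation_calls = 0;
    std::vector<int, CountingAllocator<int>> v;
    if (reserve_first) v.reserve(elements);
    for (std::size_t i = 0; i < elements; ++i) v.push_back(static_cast<int>(i));
    assert(v.size() == elements);
    return g_allocation_calls;
}
```

Comparing count_vector_allocations(1000, false) with count_vector_allocations(1000, true) shows how much allocator traffic a single reserve call removes; the exact numbers depend on the standard library's growth policy.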

According to the MicroQuill SmartHeap Technical Specification, "a typical application [...] spends 40% of its total execution time on managing memory". You can take this figure as an upper bound; I personally feel that a typical application spends more like 10-15% of execution time allocating/deallocating memory. It is rarely a bottleneck in a single-threaded application.
In multithreaded C/C++ applications standard allocators become an issue due to lock contention. This is where you start to look for more scalable solutions. But keep in mind Amdahl's Law.

Pretty much all of you are off base if you are talking about the Microsoft heap. Synchronization is handled effortlessly, as is fragmentation.
The currently preferred heap is the LFH (Low Fragmentation Heap); it is the default on Vista and later, and can be configured on XP via gflags without much trouble.
It is easy to avoid any locking/blocking/contention/bus-bandwidth issues and the lot with the
HEAP_NO_SERIALIZE
option during HeapAlloc or HeapCreate. This will allow you to create/use a heap without entering into an interlocked wait, provided only one thread at a time uses that heap.
I would recommend creating several heaps with HeapCreate and defining a macro, perhaps mallocx(enum my_heaps_set, size_t);
that would be fine; of course, you need realloc and free set up as appropriate as well. If you want to get fancy, make free/realloc auto-detect the heap handle on their own by evaluating the address of the pointer, or even add some logic to allow malloc to identify which heap to use based on its thread id, building a hierarchy of per-thread heaps and shared global heaps/pools.
The Heap* APIs are called internally by malloc/new.
Here's a nice article on some dynamic memory management issues, with some even nicer references, to instrument and analyze heap activity.

Others have covered C/C++ so I'll just add a little information on .NET.
In .NET, heap allocation is generally really fast, as it is just a matter of grabbing the memory in the generation zero part of the heap. Obviously this cannot go on forever, which is where garbage collection comes in. Garbage collection may affect the performance of your application significantly since user threads must be suspended during compaction of memory. The fewer full collects, the better.
There are various things you can do to affect the workload of the garbage collector in .NET. Generally, if you have a lot of memory references, the garbage collector will have to do more work. E.g. by implementing a graph using an adjacency matrix instead of references between nodes, the garbage collector will have to analyze fewer references.
Whether that is actually significant in your application or not depends on several factors and you should profile the application with actual data before turning to such optimizations.

Region based memory management

I'm designing a high level language, and I want it to have the speed of C++ (it will use LLVM), but be safe and high level like C#. Garbage collection is slow, and new/delete is unsafe. I decided to try to use "region based memory management" (there are a few papers about it on the web, mostly for functional languages). The only "useful" language using it is Cyclone, but that also has GC. Basically, objects are allocated on a lexical stack, and are freed when the block closes. Objects can only refer to other objects in the same region or higher, to prevent dangling references. To make this more flexible, I added parallel regions that can be moved up and down the stack, and retained through loops. The type system would be able to verify assignments in most cases, but low overhead runtime checks would be necessary in some places.
Ex:
region(A) {
    Foo#A x=new Foo(); //x is deleted when this region closes.
    region(B,C) while(x.Y) {
        Bar#B n=new Bar();
        n.D=x; //OK, n is in lower region than x.
        //x.D=n; would cause error: x is in higher region than n.
        n.DoSomething();
        Bar#C m=new Bar();
        //m.D=n; would cause error: m and n are parallel.
        if(m.Y)
            retain(C); //On the next iteration, m is retained.
    }
}
Does this seem practical? Would I need to add non-lexically scoped, reference counted regions? Would I need to add weak variables that can refer to any object, but with a check on region deletion? Can you think of any algorithms that would be hard to use with this system or that would leak?
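For readers who want something runnable to experiment with, the "freed when the block closes" part of the proposal can be approximated in C++ with a scope-bound owner. This is only a rough sketch under my own assumptions: Region and Tracked are invented names, and none of the static checking implied by the Foo#A annotations (or the parallel regions and retain) is modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Illustrative lexical region: objects created through it are destroyed,
// newest first, when the Region object itself goes out of scope.
class Region {
public:
    Region() = default;
    Region(const Region&) = delete;
    Region& operator=(const Region&) = delete;

    ~Region() {
        while (!objects_.empty()) objects_.pop_back();  // reverse creation order
    }

    template <class T, class... Args>
    T& create(Args&&... args) {
        void (*destroy)(void*) = [](void* p) { delete static_cast<T*>(p); };
        objects_.push_back(Erased(nullptr, destroy));  // reserve the slot first
        T* obj = new T(std::forward<Args>(args)...);
        objects_.back().reset(obj);
        return *obj;
    }

private:
    using Erased = std::unique_ptr<void, void (*)(void*)>;
    std::vector<Erased> objects_;
};

// Small helper type that keeps a live-object count, used to observe
// when a region actually destroys its contents.
struct Tracked {
    explicit Tracked(int* counter) : live(counter) { assert(live); ++*live; }
    ~Tracked() { --*live; }
    int* live;
};
```

Nesting two Region objects in nested C++ blocks gives the "inner may point to outer" direction for free, because the inner region is always destroyed first; the reverse direction is exactly what the proposed type system would have to reject.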

I would discourage you from trying regions. The problem is that in order to make regions guaranteed to be safe, you need a very sophisticated type system---I'm sure you've looked at the papers by Tofte and Talpin and you have an idea of the complexities involved. Even if you do get regions working successfully, the chances are very high that your program will require a region whose lifetime is the lifetime of the program---and that region at least has to be garbage collected. (This is why Cyclone has regions and GC.)
Since you're just getting started, I'd encourage you to go with garbage collection. Modern garbage collectors can be made pretty fast without a lot of effort. The main issue is to allocate from contiguous free space so that allocation is fast. It helps to be targeting AMD64 or other machine with spare registers so you can use a hardware register as the allocation pointer.
There are lots of good ideas to adapt; one of the easiest to implement is a page-based collector like Joel Bartlett's mostly-copying collector, where the idea is you allocate only from completely empty pages.
If you want to study existing garbage collectors, Lua has a fairly sophisticated incremental garbage collector (so there are no visible pause times) and the implementation is only 700 lines. It is fast enough to be used in a lot of games, where performance matters.

If I were implementing a language with region based memory management, I would probably read A language-independent framework for region inference. That said, it's been a while since I looked into this stuff, and I'm sure the state of the art has moved on, if I ever even knew what the state of the art was.

Well, you should go study Apple's memory management. It has release pools and zones, which sure sound a lot like what you're doing here.
I won't comment on the "GC is slow" remark.

You can start with Tofte and Talpin's papers about region-based memory management.

How would it return a dynamically created object? Who would "own" it and be responsible for freeing the memory?
Refcounting or GC are so common because they are almost always the best choices. Generational garbage collectors can be very efficient.
