What is the Go language garbage collection approach compared to others?

I do not know much about the Go programming language, but I have seen several claims that Go has latency-free garbage collection and that it is much better than other garbage collectors (like the JVM garbage collector). I have developed applications for the JVM and I know that the JVM garbage collector is not latency-free (especially with large memory usage).
I was wondering, what is the difference between the garbage collection approach in Go and the others that makes it latency-free?
Go does not have latency-free garbage collection. If you can point out where those claims are, I'd like to try to correct them.
One advantage that we believe Go has over Java is that it gives you more control over memory layout. For example, a simple 2D graphics package might define:
type Rect struct {
    Min Point
    Max Point
}

type Point struct {
    X int
    Y int
}
In Go, a Rect is just four integers contiguous in memory. You can still pass &r.Max to a function expecting a *Point; that's just a pointer into the middle of the Rect variable r.
In Java, the equivalent expression would be to make Rect and Point classes, in which case the Min and Max fields in Rect would be pointers to separately allocated objects. This requires more allocated objects, taking up more memory, and giving the garbage collector more to track and more to do. On the other hand, it does avoid ever needing to create a pointer to the middle of an object.
Compared to Java, then, Go gives you, the programmer, more control over memory layout, and you can use that control to reduce the load on the garbage collector. That can be very important in programs with large amounts of data. Control over memory layout may also be important for extracting performance from the hardware due to cache effects and such, but that's tangential to the original question.
The collector in the current Go distributions is reasonable but by no means state of the art. We have plans to spend more effort improving it over the next year or two. To be clear, Go's garbage collector is certainly not as good as modern Java garbage collectors, but we believe it is easier in Go to write programs that don't need as much garbage collection to begin with, so the net effect can still be that garbage collection is less of an issue in a Go program than in an equivalent Java program.


Why don't Haskell compilers facilitate deterministic memory management?

With the wealth of type information available why can't Haskell runtimes avoid running GC to clean up? It should be possible to figure out all usages and insert appropriate calls to alloc/release in the compiled code, right? This would avoid the overhead of a runtime GC.
It is sensible to ask whether functional programming languages can do less GC by tracking usage. Although the general problem of whether some data can safely be discarded is undecidable (because of conditional branching), it's surely plausible to work harder statically and find more opportunities for direct deallocation.
It's worth paying attention to the work of Martin Hofmann and the team on the Mobile Resource Guarantees project, who made type-directed memory (de/re)allocation a major theme. The thing that makes their stuff work, though, is something Haskell doesn't have in its type system --- linearity. If you know that a function's input data are secret from the rest of the computation, you can reallocate the memory they occupy. The MRG stuff is particularly nice because it manages a realistic exchange rate between deallocation for one type and allocation for another which turns into good old-fashioned pointer-mangling underneath a purely functional exterior. In fact, lots of lovely parsimonious mutation algorithms (e.g. pointer-reversing traversal, overwrite-the-tail-pointer construction, etc) can be made to look purely functional (and checked for nasty bugs) using these techniques.
In effect, the linear typing of resources gives a conservative but mechanically checkable approximation to the kind of usage analysis that might well help to reduce GC. Interesting questions then include how to mix this treatment cleanly (deliberate adverb choice) with the usual persistent deal. It seems to me that quite a lot of intermediate data structures have an initial single-threaded phase in recursive computation, before being either shared or dropped when the computation finishes. It may be possible to reduce the garbage generated by such processes.
TL;DR There are good typed approaches to usage analysis which cut GC, but Haskell has the wrong sort of type information just now to be particularly useful for this purpose.
Region-based memory management is what programmers in C and C++ often end up programming by hand: Allocate a chunk of memory ("region", "arena", etc.), allocate the individual data in it, use them and eventually deallocate the whole chunk when you know none of the individual data are needed any more. Work in the 90s by Tofte, Aiken, and others (incl. yours truly, with our respective colleagues), has shown that it is possible to statically infer region allocation and deallocation points automatically ("region inference") in such a way as to guarantee that chunks are never freed too early and, in practice, early enough to avoid having too much memory being held for long after the last data in it was needed. The ML Kit for Regions, for example, is a full Standard ML compiler based on region inference. In its final version it is combined with intra-region garbage collection: If the static inference shows there is a long-living region, use garbage collection inside it. You get to have your cake and eat it, too: You have garbage collection for long living data, but a lot of data is managed like a stack, even though it would end in a heap ordinarily.
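As a rough illustration of the hand-written pattern described above (not of region inference itself), here is a minimal Ruby sketch. The Region class and its method names are hypothetical, and since Ruby is garbage collected anyway this only models the bookkeeping: individual allocations are offsets handed out from one chunk, and everything is released in a single step.

```ruby
# Hypothetical sketch of a region ("arena"): one chunk of memory, a bump
# pointer for individual allocations, and a single release for everything.
class Region
  attr_reader :used

  def initialize(capacity)
    @buffer = "\0".b * capacity # the one chunk backing every allocation
    @used = 0                   # bump pointer: offset of the next free byte
  end

  # Reserve `size` bytes and return their offset within the region.
  def allocate(size)
    raise ArgumentError, 'region exhausted' if @used + size > @buffer.bytesize

    offset = @used
    @used += size
    offset
  end

  # Store raw bytes at an offset previously returned by #allocate.
  def write(offset, bytes)
    @buffer[offset, bytes.bytesize] = bytes.b
  end

  def read(offset, size)
    @buffer[offset, size]
  end

  # Free every allocation at once; no per-object bookkeeping is needed.
  def release_all
    @used = 0
  end
end

region = Region.new(64)
a = region.allocate(5)
b = region.allocate(3)
region.write(a, 'hello')
region.write(b, 'abc')
```

Calling region.release_all when none of the individual data are needed any more corresponds to deallocating the whole chunk; region inference is about having the compiler place that call for you.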
Consider the following pseudo-code:
func a = if some_computation a
         then a
         else 0
To know whether a is garbage after calling func a, the compiler has to be able to know the result of some_computation a. If it could do that in the general case (which requires solving the halting problem), then there'd be no need to emit code for this function at all, let alone garbage collect it. Type information is not sufficient.
It's not easy to determine object lifetime with lazy evaluation. The JHC compiler does have (had?) region memory management which tries to release memory by deallocation when the lifetime is over.
I'm also curious exactly what you mean by deterministic memory management.
Type information has mostly to do with compile time, whereas memory management is a runtime thing, so I don't think they are related to each other.

Overhead of memory allocator

I've been creating a temporary object stack, mainly for use by heap-based STL structures which only actually have temporary lifetimes, but for any other temporary dynamically sized allocation too. The one stack handles all types, storing them in an unrolled linked list.
I've come a cropper with alignment. I can get the alignment with std::alignment_of<T>, but this isn't really great, because I need the alignment of the next type I want to allocate. Right now, I've just arbitrarily sized each object at a multiple of 16, which as far as I know, is the maximal alignment for any x86 or x64 type. But now, I'm having two pointers of memory overhead per object, as well as the cost of allocating them in my vector, plus the cost of making every size round up to a multiple of 16.
On the plus side, construction and destruction is fast and reliable.
How does this compare to regular operator new/delete? And, what kind of test suites can I run? I'm pretty pleased with my current progress and don't want to find out later that it's bugged in some nasty subtle fashion, so any advice on testing the operations would be nice.
This doesn't really answer your question, but Boost has just recently added a memory pool library in the most recent version.
It may not be exactly what you want, but there is a thorough treatment of alignment which might spark an idea? If the docs are not enough, there is always the source code.
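For what it's worth, the rounding described in the question can be sketched in a few lines. This is only an arithmetic illustration in Ruby (the function names align_up and padding are made up here), assuming alignments are powers of two. It shows how much padding a blanket 16-byte rule costs compared with aligning to what the next type actually needs.

```ruby
# Round `size` up to the next multiple of `alignment` (a power of two).
def align_up(size, alignment)
  unless alignment.positive? && (alignment & (alignment - 1)).zero?
    raise ArgumentError, 'alignment must be a power of two'
  end

  (size + alignment - 1) & ~(alignment - 1)
end

# Bytes wasted when an object of `size` bytes is rounded up to `alignment`.
def padding(size, alignment = 16)
  align_up(size, alignment) - size
end

[1, 16, 17, 40].each do |size|
  puts "#{size} bytes: 16-byte rule wastes #{padding(size)}, " \
       "8-byte alignment wastes #{padding(size, 8)}"
end
```

A 40-byte object, for example, needs no padding at 8-byte alignment but loses 8 bytes under the multiple-of-16 rule.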

GC using type information

Does anyone know of a GC algorithm which utilises type information to allow incremental collection, optimised collection, parallel collection, or some other nice feature?
By type information, I mean real semantics. Let me give an example: suppose we have an OO style class with methods to maintain a list which hide the representation. When the object becomes unreachable, the collector can just run down the list deleting all the nodes. It knows they're all unreachable now, because of encapsulation. It also knows there's no need to do a general scan of the nodes for pointers, because it knows all the nodes are the same type.
Obviously, this is a special case and easily handled with destructors in C++. The real question is whether there is a way to analyse the types used in a program, and direct the collector to use the resulting information to advantage. I guess you'd call this a type-directed garbage collector.
The idea of at least exploiting containers for garbage collection in some way is not new, though in Java, you cannot generally assume that a container holds the only reference to objects within it, so your approach will not work in that context.
Here are a couple of references. One is for leak detection, and the other (from my research group) is about improving cache locality.
You might want to visit Richard Jones's extensive garbage collection bibliography for more references, or ask the folks on gc-list.
I don't think it has anything to do with a specific algorithm.
When the GC computes the graph of object relationships, the information that a Collection object is solely responsible for the elements of the list is implicitly present in the graph, if the compiler was good enough to extract it.
Whatever the GC algorithm chosen: the information depends more on how the compiler/runtime will extract this information.
Also, I would avoid C and C++ with GC. Because of pointer arithmetic, aliasing and the possibility to point within an object (reference on a data member or in an array), it's incredibly hard to perform accurate garbage collection in these languages. They have not been crafted for it.

In layman's terms, how does the garbage collection mechanism work?

How is an object identified as being available for garbage collection?
Also, what do Reference Counting, Mark and Sweep, Copying, Train mean in GC algorithms?
When you use a language with garbage collection you won't get access to the memory directly. Rather, you are given access to some abstraction on top of that data. One of the things that is properly abstracted away is the actual location in memory of the data block, as well as pointers to other data blocks. When the garbage collector runs (this happens occasionally) it will check whether you still hold a reference to each of the memory blocks it has allocated for you. If you don't, it will free that memory.
The main difference between the different types of garbage collectors is their efficiency as well as any limitations on what kind of allocation schemes they can handle.
The simplest is probably reference counting. Whenever you create a reference to an object, an internal counter on that object is incremented; when you change the reference or it is no longer in scope, the counter on the (former) target object is decremented. When this counter reaches zero, the object is no longer referred to at all and can be freed.
The problem with reference counting garbage collectors is that they cannot deal with circular data. If object A has a reference to object B and that in turn has some (direct or indirect) reference to object A, they can never be freed, even if none of the objects in the chain are referred to from outside the chain (and therefore aren't accessible to the program at all).
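To make the cycle problem concrete, here is a toy model in Ruby. It is only a sketch: the Counted class with its retain/release methods is invented for illustration, and Ruby's own collector does not work this way.

```ruby
# Toy model of reference counting; names are hypothetical.
class Counted
  attr_reader :name, :count, :freed
  attr_accessor :ref

  def initialize(name)
    @name = name
    @count = 0
    @freed = false
    @ref = nil
  end

  # Someone took a reference to this object.
  def retain
    @count += 1
    self
  end

  # Someone dropped a reference; at zero the object is freed.
  def release
    @count -= 1
    return unless @count.zero?

    @freed = true
    @ref&.release # freeing an object drops the reference it holds
  end
end

# A chain: root -> a -> b. Dropping the root reference frees both.
a = Counted.new('a').retain # held by a root variable
b = Counted.new('b')
a.ref = b.retain
a.release
chain_freed = a.freed && b.freed

# A cycle: root -> x -> y -> x. Dropping the root reference frees neither.
x = Counted.new('x').retain # held by a root variable
y = Counted.new('y')
x.ref = y.retain
y.ref = x.retain
x.release
cycle_freed = x.freed || y.freed

puts "chain freed: #{chain_freed}, cycle freed: #{cycle_freed}"
```

After the last line x and y each still have a count of 1, held only by each other, so neither ever reaches zero.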
The mark and sweep algorithm, on the other hand, can handle this. It works by periodically stopping the execution of the program and marking each item the program has allocated as unreachable. The collector then runs through all the variables the program has and marks what they point to as reachable. If any of these allocations contain references to other data in the program, that data is likewise marked as reachable, and so on.
This is the mark part of the algorithm. At this point everything the program can access, no matter how indirectly, is marked as reachable and everything the program can't reach is marked as unreachable. The garbage collector can now safely reclaim the memory associated with the objects marked as unreachable.
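The two phases can be sketched on a toy heap. This is a simplified model in Ruby (the ToyHeap class and its object names are invented for the example), where each "object" is just a name plus the names it references.

```ruby
require 'set'

# Toy heap for illustration only: object name => names it references.
class ToyHeap
  def initialize
    @refs = {}
  end

  def allocate(name, refs = [])
    @refs[name] = refs
  end

  def objects
    @refs.keys
  end

  # Mark: find everything reachable from the roots, however indirectly.
  def mark(roots)
    reachable = Set.new
    pending = roots.dup
    until pending.empty?
      name = pending.pop
      # Skip unknown names and objects that were already marked.
      next unless @refs.key?(name) && reachable.add?(name)

      pending.concat(@refs[name])
    end
    reachable
  end

  # Sweep: reclaim whatever the mark phase did not reach.
  def collect(roots)
    reachable = mark(roots)
    garbage = @refs.keys.reject { |name| reachable.include?(name) }
    garbage.each { |name| @refs.delete(name) }
    garbage
  end
end

heap = ToyHeap.new
heap.allocate(:window, [:button])
heap.allocate(:button, [:window]) # a cycle, but reachable from a root
heap.allocate(:a, [:b])
heap.allocate(:b, [:a])           # a cycle that no root points at
heap.allocate(:temp)

freed = heap.collect([:window])
```

Here freed contains :a, :b and :temp. The unreachable cycle is reclaimed, which is exactly the case reference counting cannot handle.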
The problem with the mark and sweep algorithm is that it isn't that efficient -- the entire program has to be stopped to run it, and a lot of the object references aren't going to change.
To improve on this, the mark and sweep algorithm can be extended with so called "generational garbage collection". In this mode objects that have been in the system for some number of garbage collections are promoted to the old generation, which is not checked that often.
This improves efficiency because objects tend to die young (think of a string being changed inside a loop, resulting in perhaps a lifetime of a few hundred cycles) or live very long (the objects used to represent the main window of an application, or the database connection of a servlet).
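A very rough sketch of the promotion idea, again as a toy Ruby model with made-up names (ToyGenerations, PROMOTE_AFTER). It assumes the set of live objects is simply handed in, which glosses over how a real collector finds it, and it only models the young collection.

```ruby
# Toy model: objects that survive PROMOTE_AFTER young collections move to
# the old generation, which a young collection no longer examines.
class ToyGenerations
  PROMOTE_AFTER = 2

  attr_reader :young, :old

  def initialize
    @young = {} # object => number of young collections survived
    @old = []
  end

  def allocate(obj)
    @young[obj] = 0
  end

  # `live` lists the objects the program can still reach.
  # Returns how many objects this collection had to examine.
  def collect_young(live)
    examined = @young.size
    @young.select! { |obj, _| live.include?(obj) } # dead young objects go
    @young.keys.each do |obj|
      @young[obj] += 1
      next if @young[obj] < PROMOTE_AFTER

      @young.delete(obj)
      @old << obj
    end
    examined
  end
end

gc = ToyGenerations.new
gc.allocate(:main_window) # long-lived
examined = []
3.times do |i|
  gc.allocate(:"temp#{i}") # short-lived, dead by the time the GC runs
  examined << gc.collect_young([:main_window])
end
```

Once :main_window has been promoted, later young collections only look at the fresh temporaries, which is where the saving comes from.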
Much more detailed information can be found on wikipedia.
Added based on comments:
With the mark and sweep algorithm (as well as any other garbage collection algorithm except reference counting), the garbage collector does not run in the context of your program, since it has to be able to access stuff that your program is not capable of accessing directly. Therefore it is not correct to say that the garbage collector runs on the stack.
Reference counting - Each object has a count which is incremented when someone takes a reference to the object, and decremented when someone releases the reference. When the reference count goes to zero, the object is deleted. COM uses this approach.
Mark and sweep - Each object has a flag if it is in use. Starting at the root of the object graph (global variables, locals on stacks, etc.) each referenced object gets its flag set, and so on down the chain. At the end, all objects that are not referenced in the graph are deleted.
The garbage collector for the CLR is described in this slidedeck. "Roots" on slide 15 are the sources for the objects that first go into the graph. Their member fields and so on are used to find the other objects in the graph.
Wikipedia describes several of these approaches in much more and better detail.
Garbage collection is simply knowing if there is any future need for variables in your program, and if not, collect and delete them.
Emphasis is on the word Garbage, something that is completely used out in your house is thrown in the trash and the garbage man handles it for you by coming to pick it up and take it away to give you more room in your house trash can.
Reference Counting, Mark and Sweep, Copying, Train etc. are discussed in good detail at GC FAQ
The general way it is done is that the number of references to an object is kept track of in the background, and when that number goes to zero, the object is SUBJECT TO garbage collection; however, the GC will not fire up until it is explicitly needed because it is an expensive operation. What happens when it starts is that the GC goes through the managed area of memory and finds every object that has no references left. The GC deletes those objects by first calling their destructors, allowing them to clean up after themselves, then frees the memory. Commonly the GC will then compact the managed memory area by moving every surviving object to one area of memory, allowing more allocations to take place.
Like I said, this is one method that I know of, and there is a lot of research being done in this area.
Garbage collection is a big topic, and there are a lot of ways to implement it.
But for the most common in a nutshell, the garbage collector keeps a record of all references to anything created via the new operator, even if that operator's use was hidden from you (for example, in a Type.Create() method). Each time you add a new reference to the object, the root of that reference is determined and added to the list, if needed. A reference is removed whenever it goes out of scope.
When there are no more references to an object, it can (not "will") be collected. To improve performance and make sure necessary cleanup is done correctly, collections are batched for several objects at once and happen over multiple generations.

What are your strategies to keep the memory usage low?

Ruby is truly memory-hungry - but also worth every single bit.
What do you do to keep the memory usage low? Do you avoid big strings and use smaller arrays/hashes instead, or is it not a concern for you and you let the garbage collector do the job?
Edit: I found a nice article about this topic here - old but still interesting.
I've found Phusion's Ruby Enterprise Edition (a fork of mainline Ruby with much-improved garbage collection) to make a dramatic difference in memory usage... Plus, they've made it extraordinarily easy to install (and to remove, if you find the need).
You can find out more and download it on their website.
I really don't think it matters all that much.
Making your code less readable in order to improve memory consumption is something you should only ever do if you need it. And by need, I mean have a specific case for the performance profile and specific metrics that indicate that any change will address the issue.
If you have an application where memory is going to be the limiting factor, then Ruby may not be the best choice. That said, I have found that my Rails apps generally consume about 40-60 MB of RAM per Mongrel instance. In the scheme of things, this isn't very much.
You might be able to run your application on the JVM with JRuby - the Ruby VM is currently not as advanced as the JVM for memory management and garbage collection. The 1.9 release is adding many improvements and there are alternative VMs under development as well.
Choose data structures that are efficient representations, scale well, and do what you need.
Use algorithms that work using efficient data structures rather than bloated, but easier ones.
Look elsewhere. Ruby has a C bridge and it's much easier to be memory conscious in C than in Ruby.
Ruby developers are quite lucky since they don’t have to manage the memory themselves.
Be aware that ruby allocates objects, for instance something as simple as
100.times{ 'foo' }
allocates 100 string objects (strings are mutable and each version requires its own memory allocation).
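A quick way to see this for yourself is to compare object identities. This is a small sketch, assuming the file does not enable frozen string literals (with that setting the literal would be shared).

```ruby
# Each evaluation of a string literal builds a new String object.
strings = Array.new(100) { 'foo' }
distinct_strings = strings.map(&:object_id).uniq.size

# A symbol literal always refers to the same object.
symbols = Array.new(100) { :foo }
distinct_symbols = symbols.map(&:object_id).uniq.size

# Reusing one frozen string avoids the repeated allocation as well.
shared_foo = 'foo'.freeze
shared = Array.new(100) { shared_foo }
distinct_shared = shared.map(&:object_id).uniq.size

puts "strings: #{distinct_strings}, symbols: #{distinct_symbols}, " \
     "shared: #{distinct_shared}"
```

The three counts come out as 100, 1 and 1, so hoisting a constant string out of a hot loop is a cheap way to cut allocations.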
Make sure that if you are using a library allocating a lot of objects, that other alternatives are not available and your choice is worth paying the garbage collector cost. (you might not have a lot of requests/s or might not care for a few dozen ms per requests).
Creating a hash object really allocates more than one object, for instance
{'joe' => 'male', 'jane' => 'female'}
doesn’t allocate 1 object but 7 (one hash, 4 strings, plus 2 copies of the key strings).
If you can, use symbol keys, as they won’t be garbage collected. However, because they won’t be garbage collected, make sure not to use totally dynamic keys, such as converting the username to a symbol; otherwise you will ‘leak’ memory.
Example: Somewhere in your app, you apply to_sym to a user’s name like this:
hash[current_user.name.to_sym] = something
When you have hundreds of users, that could be OK, but what happens if you have one million users? Here are the numbers:
ruby-1.9.2-head >
# Current memory usage : 6608K
# Now, add one million randomly generated short symbols
ruby-1.9.2-head > 1000000.times { (Time.now.to_f.to_s).to_sym }
# Current memory usage : 153M, even after a Garbage collector run.
# Now, imagine if symbols are just 20x longer than that ?
ruby-1.9.2-head > 1000000.times { (Time.now.to_f.to_s * 20).to_sym }
# Current memory usage : 501M
Be careful never to convert uncontrolled arguments into symbols without checking them first; this can easily lead to a denial of service.
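One way to do that check is a whitelist, sketched below. The constant ALLOWED_KEYS and the helper names are hypothetical, chosen only for this example. Note that Ruby 2.2 and later can garbage collect dynamically created symbols, so the memory figures above describe older interpreters, but validating input before calling to_sym is still a reasonable habit.

```ruby
# Hypothetical whitelist of keys the application expects.
ALLOWED_KEYS = %w[name email role].freeze

# Convert only known keys to symbols; anything else stays a string, so
# arbitrary user input never reaches to_sym.
def safe_key(param)
  ALLOWED_KEYS.include?(param) ? param.to_sym : param.to_s
end

def symbolize_known_keys(params)
  params.each_with_object({}) do |(key, value), result|
    result[safe_key(key)] = value
  end
end

params = { 'name' => 'joe', 'x9f3k' => 'attacker-chosen key' }
clean = symbolize_known_keys(params)
```

Here clean has the key :name as a symbol, while the unknown key 'x9f3k' remains an ordinary, collectable string.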
Also remember to avoid nesting loops more than three levels deep, because it makes maintenance difficult. Limiting the nesting of loops and functions to three levels or fewer is a good rule of thumb to keep the code maintainable.
Here are some links in regards:
When deploying a Rails/Rack webapp, use REE or some other copy-on-write friendly interpreter.
Tweak the garbage collector (see https://www.engineyard.com/blog/tuning-the-garbage-collector-with-ruby-1-9-2 for example)
Try to cut down the number of external libraries/gems you use since additional code uses memory.
If you have a part of your app that is really memory-intensive then it's maybe worth rewriting it in a C extension or completing it by invoking other/faster/better optimized programs (if you have to process vast amounts of text data, maybe you can replace that code with calls to grep, awk, sed etc.)
I am not a ruby developer but I think some techniques and methods are true of any language:
Use the minimum size variable suitable for the job
Destroy and close variables and connections when not in use
However if you have an object you will need to use many times consider keeping it in scope
In any loop that manipulates a big string, do the work on a smaller string and then append it to the bigger string
Use decent (try catch finally) error handling to make sure objects and connections are closed
When dealing with data sets only return the minimum necessary
Other than in extreme cases memory usage isn't something to worry about. The time you spend trying to reduce memory usage will buy a LOT of gigabytes.
Take a look at Small Memory Software - Patterns for Systems with Limited Memory. You don't specify what sort of memory constraint, but I assume RAM. While not Ruby-specific, I think you'll find some useful ideas in this book - the patterns cover RAM, ROM and secondary storage, and are divided into major techniques of small data structures, memory allocation, compression, secondary storage, and small architecture.
The only thing we've ever had which has actually been worth worrying about is RMagick.
The solution is to make sure you're using RMagick version 2, and call Image#destroy! when you're done using your image.
Avoid code like this:
str = ''
veryLargeArray.each do |foo|
  str += foo
  # but str << foo is fine (read update below)
end
which will create each intermediate string value as a String object and then remove its only reference on the next iteration. This junks up the memory with tons of increasingly long strings that have to be garbage collected.
Instead, use Array#join:
str = veryLargeArray.join('')
This is implemented in C very efficiently and doesn't incur the String creation overhead.
UPDATE: Jonas is right in the comment below. My warning holds for += but not <<.
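The difference between the three approaches can be checked by looking at object identities. This is a small sketch with made-up variable names; it keeps every intermediate string in an array purely so they can be counted.

```ruby
parts = %w[alpha beta gamma]

# += builds a brand-new String on every iteration.
plus = ''.dup
plus_versions = [plus]
parts.each do |part|
  plus += part
  plus_versions << plus
end

# << appends to the same String in place.
shovel = ''.dup
shovel_versions = [shovel]
parts.each do |part|
  shovel << part
  shovel_versions << shovel
end

# Array#join builds the result in one step.
joined = parts.join

plus_objects = plus_versions.map(&:object_id).uniq.size
shovel_objects = shovel_versions.map(&:object_id).uniq.size
puts "+= used #{plus_objects} objects, << used #{shovel_objects}"
```

With three parts, += goes through 4 distinct String objects while << keeps reusing 1. All three approaches produce the same final string.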
I'm pretty new at Ruby, but so far I haven't found it necessary to do anything special in this regard (that is, beyond what I just tend to do as a programmer generally). Maybe this is because memory is cheaper than the time it would take to seriously optimize for it (my Ruby code runs on machines with 4-12 GB of RAM). It might also be because the jobs I'm using it for are not long-running (i.e. it's going to depend on your application).
I'm using Python, but I guess the strategies are similar.
I try to use small functions/methods, so that local variables get automatically garbage collected when you return to the caller.
In larger functions/methods I explicitly delete large temporary objects (like lists) when they are no longer needed. Closing resources as early as possible might help too.
Something to keep in mind is the life cycle of your objects. If your objects are not passed around that much, the garbage collector will eventually kick in and free them up. However, if you keep referencing them it may require some cycles for the garbage collector to free them up. This is particularly true in Ruby 1.8, where the garbage collector uses a poor implementation of the mark and sweep technique.
You may run into this situation when you try to apply some "design patterns" like decorator that keep objects in memory for a long time. It may not be obvious when trying an example in isolation, but in real-world applications where thousands of objects are created at the same time the cost of memory growth will be significant.
When possible, use arrays instead of other data structures. Try not to use floats when integers will do.
Be careful when using gem/library methods. They may not be memory optimized. For example, the Ruby PG::Result class has a method 'values' which is not optimized. It will use a lot of extra memory. I have yet to report this.
Replacing the malloc(3) implementation with jemalloc can immediately decrease your memory consumption by up to 30%. I've created the 'jemalloc' gem to achieve this instantly.
'jemalloc' GEM: Inject jemalloc(3) into your Ruby app in 3 min
I try to keep arrays, lists, and datasets as small as possible. The individual objects do not matter much, as creation and garbage collection are pretty fast in most modern languages.
If you have to read some sort of huge dataset from the database, make sure to read it in a forward-only manner and process it in little bits instead of loading everything into memory first.
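As a sketch of what "little bits" can look like in Ruby: the each_row method below is a hypothetical stand-in for a database cursor (no real database library is used), and the batch size of 500 is arbitrary.

```ruby
# Hypothetical stand-in for a database cursor: yields rows one at a time
# instead of building the whole result set in memory.
def each_row(total)
  return enum_for(:each_row, total) unless block_given?

  total.times { |id| yield({ id: id, amount: id * 2 }) }
end

sum = 0
largest_batch = 0
each_row(10_000).each_slice(500) do |batch|
  # Only the current batch of rows is referenced at any point.
  largest_batch = [largest_batch, batch.size].max
  sum += batch.sum { |row| row[:amount] }
end

puts "sum: #{sum}, largest batch held: #{largest_batch}"
```

Only 500 rows are referenced at any time, so earlier batches become garbage as soon as they have been processed.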
Don't use a lot of symbols; they stay in memory until the process gets killed, because symbols never get garbage collected.
