Confused about browser garbage collection algorithms

According to JavaScript: The Definitive Guide, there are two garbage collection techniques:
mark-and-sweep and reference counting, and in early browsers garbage collection was performed by reference counting.
But why did they switch to mark-and-sweep? It seems like the collection condition is the same: when a value is unreachable, its reference count is zero.
So what is the difference?

Reference counting is a simple heuristic for finding garbage objects, but it's not perfect. In particular, reference counting cannot handle reference cycles, where a group of unreachable objects all point to one another. When this happens, all of the refcounts for the objects are nonzero, so the objects are never cleaned up, even though they are indeed garbage. Firefox 2 used pure reference counting for its garbage collection, which over time led to enormous memory leaks as reference cycles ate up all available memory.
Mark-and-sweep doesn't have this problem because it explicitly finds all reachable objects starting from all known reachable locations. It can handle reference cycles perfectly, though it's slower to run. The combination of reference counting (fast but inaccurate) plus mark-and-sweep (slow but perfect) is a good compromise.
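CPython is a concrete example of exactly this compromise: reference counting is its primary mechanism, and a tracing cycle collector handles the cases refcounting misses. The standard `gc` module lets you observe both behaviors (a small demo; `Node` is just an illustrative class):

```python
import gc
import weakref

class Node:
    pass

gc.disable()                  # switch off the cycle collector for the demo
a, b = Node(), Node()
a.partner, b.partner = b, a   # a -> b -> a: a reference cycle
probe = weakref.ref(a)

del a, b                      # both refcounts stay nonzero (the cycle itself)
assert probe() is not None    # pure reference counting alone never frees them

gc.collect()                  # the tracing cycle collector finds the cycle
assert probe() is None        # ...and both objects are reclaimed
gc.enable()
```

The weak reference is only a probe: it doesn't keep the object alive, so it cleanly shows the moment the cycle is actually reclaimed.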
That said, there are much better GC techniques out there. Search "generational garbage collection" for some of the better hybrid techniques.
Hope this helps!

Related

The research papers say it's possible to have real time reference counting AND cycle collection, but how?

Note: there should be a cycle-collection tag. Cycle collection is really the main topic here, but I don't have enough points to create a tag. Also I'm at the max number of tags already. Also a reference-cycles tag would make sense. That also doesn't exist.
I have a lot of ideas about what would make a better computer language, but I'm hung up on making a state of the art garbage collector for it.
I've noticed that Apple and Microsoft (for Windows 8) are moving to reference counting. Apparently people found that regular garbage collection required too much memory overhead and didn't like the thrashing when memory got too tight.
But if I want a programming style that isn't limited by loops in references then normal reference counting hardly seems like an improvement because scanning for reference loops and handling them properly requires algorithms with multiple passes over potential reference loops.
Now there are some papers which suggest that it's possible to scan for loops in parallel with program execution, but I get lost reading them.
For instance:
"Concurrent Cycle Collection in Reference Counted Systems" by David F. Bacon and V.T. Rajan: http://researcher.ibm.com/files/us-bacon/Bacon01Concurrent.pdf
"A Pure Reference Counting Garbage Collector" by David F. Bacon, Clement R. Attanasio, V.T. Rajan, Stephen E. Smith and Han B. Lee
"A Unified Theory of Garbage Collection" by David F. Bacon, Perry Cheng and V.T. Rajan: http://www.cs.virginia.edu/~cs415/reading/bacon-garbage.pdf
Can anyone explain to me how it's possible to find all loops, including connected ones, and on a trial basis, reduce reference counts by the self reference amount, and do it safely while objects are being mutated?
It seems like a tall order.
I remember the "mostly concurrent garbage collection" approach with its rescanning of dirty objects or dirty pages, but I'm not sure if this is the same sort of thing.
I've played a bit with scanning for loops and I've convinced myself that there is no mostly local algorithm that can keep track of loops, no matter how much memory you waste on it. It really is a non-local property. There's no such thing as tagging for loops.
Anyway, does anyone understand the parallel algorithm? Can anyone explain it to me?
edit: http://researcher.ibm.com/files/us-bacon/Paz05Efficient.pdf
That paper looks promising. Though even assuming I can get to the point where I'm sure I understand it, non-blocking parallel algorithms are so hard that it's fairly common for published algorithms to be buggy, and fixing them is hard when it's possible at all; I know both of those facts from experience.
Also I want to be sure exactly what they mean by a "sliding view" :/
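For what it's worth, the synchronous (non-concurrent) version of Bacon and Rajan's trial-deletion idea is much easier to follow than the sliding-view machinery, and seeing it in code may help before tackling the concurrent paper. Here is a sketch in Python; all names are mine, and the concurrency (the hard part the papers are actually about) is omitted entirely:

```python
# Simplified SYNCHRONOUS sketch of trial deletion, after Bacon & Rajan.
BLACK, GRAY, WHITE, PURPLE = range(4)

class Node:
    def __init__(self, name):
        self.name = name
        self.rc = 0           # reference count
        self.color = BLACK
        self.buffered = False # already queued as a possible cycle root?
        self.children = []    # outgoing references
        self.freed = False

roots = []                    # candidate roots of garbage cycles

def increment(s):
    s.rc += 1
    s.color = BLACK

def decrement(s):
    s.rc -= 1
    if s.rc == 0:
        release(s)
    else:
        possible_root(s)      # might now be part of a dead cycle

def release(s):
    for t in s.children:
        decrement(t)
    s.color = BLACK
    if not s.buffered:
        free(s)

def possible_root(s):
    if s.color != PURPLE:
        s.color = PURPLE
        if not s.buffered:
            s.buffered = True
            roots.append(s)

def free(s):
    s.freed = True            # stand-in for actually releasing memory

def mark_gray(s):
    if s.color != GRAY:
        s.color = GRAY
        for t in s.children:
            t.rc -= 1         # trial-delete this internal reference
            mark_gray(t)

def scan(s):
    if s.color == GRAY:
        if s.rc > 0:
            scan_black(s)     # external refs remain: undo trial deletion
        else:
            s.color = WHITE
            for t in s.children:
                scan(t)

def scan_black(s):
    s.color = BLACK
    for t in s.children:
        t.rc += 1             # restore the trial-deleted count
        if t.color != BLACK:
            scan_black(t)

def collect_white(s):
    if s.color == WHITE and not s.buffered:
        s.color = BLACK
        for t in s.children:
            collect_white(t)
        free(s)

def collect_cycles():
    # Phase 1: trial deletion - subtract counts due to internal edges.
    for s in roots[:]:
        if s.color == PURPLE and s.rc > 0:
            mark_gray(s)
        else:
            roots.remove(s)
            s.buffered = False
            if s.rc == 0 and not s.freed:
                free(s)
    # Phase 2: anything still with rc > 0 is externally reachable.
    for s in roots:
        scan(s)
    # Phase 3: whatever stayed white is a garbage cycle - free it.
    for s in roots:
        s.buffered = False
        collect_white(s)
    roots.clear()

# Demo: a two-node cycle whose only external reference is dropped.
a, b = Node("a"), Node("b")
a.children.append(b); increment(b)
b.children.append(a); increment(a)
increment(a)                  # the external reference
decrement(a)                  # drop it: a becomes a candidate root
collect_cycles()
assert a.freed and b.freed    # the cycle was detected and reclaimed
```

The point of the trial deletion is that within a garbage cycle, every reference count is accounted for by edges inside the cycle itself, so subtracting internal edges drives all counts to zero; the concurrent versions then add machinery to make this safe while the mutator is running.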

Garbage collecting a circular list in ruby

I'm learning about Ruby's mark-and-sweep approach to garbage collection. I bumped into a few threads here and there (and this article via an SO thread which I can no longer spot), but they seemed to apply to older versions of Ruby and the information in them wasn't always consistent. (As things stand, I'm getting the impression that it's mostly reference counting.)
Might anyone with some understanding of Ruby 1.9.2's internals be around to chime in on whether Ruby knows how to handle the trickier back references and circular references? (Ideally with a few details/good pointers on how it's actually implemented.)
Mark-and-sweep GC, like almost every algorithm commonly labeled as garbage collection save reference counting, handles circular references just fine. This has nothing to do with the specific implementation. Regardless of the actual GC used by Ruby 1.9, it won't have trouble with cycles. Here's a sketch of the approach of mark-and-sweep collectors, but be assured that other collection schemes handle cyclic references just as well.
1. Mark all things known to be always reachable ("roots": basically everything that's directly in scope - global variables, local variables, etc.)
2. Mark all not-yet-marked objects referenced by marked objects
3. Repeat step 2 until no references from marked to not-yet-marked objects remain
4. Enumerate all allocated objects and deallocate those not marked
You see, a circle of references that's reachable "from the outside" doesn't lead to infinite recursion (we don't visit a given object's references more than once), and a circle of references that isn't reachable is never marked as reachable and is thus freed (each element independently) after marking.
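The steps above can be sketched as a toy model (illustrative Python, not Ruby's actual implementation; `Obj` and `collect` are my names):

```python
class Obj:
    def __init__(self):
        self.refs = []      # outgoing references
        self.marked = False

def collect(heap, roots):
    """Mark everything reachable from roots, then sweep the rest."""
    # Steps 1-3: mark, using a worklist instead of recursion.
    worklist = list(roots)
    while worklist:
        obj = worklist.pop()
        if obj.marked:
            continue        # never visit an object twice: cycles are safe
        obj.marked = True
        worklist.extend(obj.refs)
    # Step 4: sweep - keep marked objects, drop the rest.
    survivors = [o for o in heap if o.marked]
    for o in survivors:
        o.marked = False    # reset for the next collection
    return survivors

# One reachable cycle, one unreachable cycle.
a, b, c, d = Obj(), Obj(), Obj(), Obj()
a.refs.append(b); b.refs.append(a)   # cycle reachable via root a
c.refs.append(d); d.refs.append(c)   # unreachable cycle
heap = collect([a, b, c, d], roots=[a])
assert heap == [a, b]                # c and d are swept despite their cycle
```

The `if obj.marked: continue` check is what makes cycles harmless: each object is visited at most once no matter how its references loop back.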

Overhead of memory allocator

I've been creating a temporary object stack - mainly for the use of heap-based STL structures which only actually have temporary lifetimes, but for any other temporary dynamically sized allocation too. The one stack handles all types, storing them in an unrolled linked list.
I've come a cropper with alignment. I can get the alignment with std::alignment_of<T>, but this isn't really great, because I need the alignment of the next type I want to allocate. Right now, I've just arbitrarily sized each object at a multiple of 16, which, as far as I know, is the maximum alignment for any x86 or x64 type. But now, I have two pointers of memory overhead per object, as well as the cost of allocating them in my vector, plus the cost of rounding every size up to a multiple of 16.
On the plus side, construction and destruction is fast and reliable.
How does this compare to regular operator new/delete? And, what kind of test suites can I run? I'm pretty pleased with my current progress and don't want to find out later that it's bugged in some nasty subtle fashion, so any advice on testing the operations would be nice.
This doesn't really answer your question, but Boost has recently added a memory pool library in its most recent version.
It may not be exactly what you want, but there is a thorough treatment of alignment which might spark an idea? If the docs are not enough, there is always the source code.
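On the alignment question itself: rather than padding everything to 16 bytes, a stack/bump allocator can round each allocation offset up to only the alignment the next type actually needs. The arithmetic is the standard power-of-two trick, shown here in Python for brevity (`align_up` and `bump_alloc` are my names):

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)

def bump_alloc(top, size, alignment):
    """Return (object_offset, new_top) for a stack/bump allocator."""
    start = align_up(top, alignment)
    return start, start + size

assert align_up(13, 8) == 16
assert align_up(16, 16) == 16    # already aligned: no padding wasted
off, top = bump_alloc(1, 4, 4)   # e.g. an int32 placed after a single char
assert (off, top) == (4, 8)
```

This only pads when the current top is actually misaligned for the requested type, so a run of same-alignment allocations wastes nothing, unlike the fixed round-up-to-16 scheme.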

GC using type information

Does anyone know of a GC algorithm which utilises type information to allow incremental collection, optimised collection, parallel collection, or some other nice feature?
By type information, I mean real semantics. Let me give an example: suppose we have an OO style class with methods to maintain a list which hide the representation. When the object becomes unreachable, the collector can just run down the list deleting all the nodes. It knows they're all unreachable now, because of encapsulation. It also knows there's no need to do a general scan of the nodes for pointers, because it knows all the nodes are the same type.
Obviously, this is a special case and easily handled with destructors in C++. The real question is whether there is way to analyse types used in a program, and direct the collector to use the resulting information to advantage. I guess you'd call this a type directed garbage collector.
The idea of at least exploiting containers for garbage collection in some way is not new, though in Java, you cannot generally assume that a container holds the only reference to objects within it, so your approach will not work in that context.
Here are a couple of references. One is for leak detection, and the other (from my research group) is about improving cache locality.
http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4814126
http://www.cs.umass.edu/~emery/pubs/06-06.pdf
You might want to visit Richard Jones's extensive garbage collection bibliography for more references, or ask the folks on gc-list.
I don't think it has anything to do with a specific algorithm.
When the GC computes the graph of object relationships, the information that a collection object is solely responsible for the elements of its list is implicitly present in the graph, if the compiler was good enough to extract it.
Whatever GC algorithm is chosen, the information depends more on how the compiler/runtime extracts it.
Also, I would avoid C and C++ with GC. Because of pointer arithmetic, aliasing and the possibility to point within an object (reference on a data member or in an array), it's incredibly hard to perform accurate garbage collection in these languages. They have not been crafted for it.

How does the Garbage Collection mechanism work? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
In layman's terms, how does the garbage collection mechanism work?
How is an object identified as eligible for garbage collection?
Also, what do Reference Counting, Mark and Sweep, Copying, Train mean in GC algorithms?
When you use a language with garbage collection you won't get access to the memory directly. Rather, you are given access to some abstraction on top of that data. One of the things that is properly abstracted away is the actual location in memory of the data block, as well as pointers to other data blocks. When the garbage collector runs (this happens occasionally) it will check if you still hold a reference to each of the memory blocks it has allocated for you. If you don't, it will free that memory.
The main difference between the different types of garbage collectors is their efficiency as well as any limitations on what kind of allocation schemes they can handle.
The simplest is probably reference counting. Whenever you create a reference to an object, an internal counter on that object is incremented; when you change the reference or it goes out of scope, the counter on the (former) target object is decremented. When this counter reaches zero, the object is no longer referenced at all and can be freed.
The problem with reference counting garbage collectors is that they cannot deal with circular data. If object A has a reference to object B and that in turn has some (direct or indirect) reference to object A, they can never be freed, even if nothing outside the cycle refers to any of the objects in it (and therefore they aren't accessible to the program at all).
The mark-and-sweep algorithm, on the other hand, can handle this. It works by periodically stopping the execution of the program and marking each item the program has allocated as unreachable. The collector then runs through all the variables the program holds and marks what they point to as reachable. If any of these allocations contain references to other data in the program, that data is likewise marked as reachable, and so on.
This is the mark part of the algorithm. At this point everything the program can access, no matter how indirectly, is marked as reachable and everything the program can't reach is marked as unreachable. The garbage collector can now safely reclaim the memory associated with the objects marked as unreachable.
The problem with the mark-and-sweep algorithm is that it isn't very efficient -- the entire program has to be stopped to run it, and many of the object references aren't going to change.
To improve on this, the mark-and-sweep algorithm can be extended with so-called "generational garbage collection". In this mode, objects that have survived some number of garbage collections are promoted to an old generation, which is not checked as often.
This improves efficiency because objects tend to die young (think of a string being changed inside a loop, resulting in perhaps a lifetime of a few hundred cycles) or live very long (the objects used to represent the main window of an application, or the database connection of a servlet).
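A toy model of that promotion policy (all names are mine; a real collector determines liveness by tracing and uses remembered sets for old-to-young references, rather than taking an explicit `live` set):

```python
PROMOTION_AGE = 2   # survive this many minor collections -> old generation

class Obj:
    def __init__(self):
        self.age = 0

young, old = [], []

def allocate():
    o = Obj()
    young.append(o)  # new objects always start in the young generation
    return o

def minor_collect(live):
    """Sweep only the young generation; promote long-lived survivors."""
    global young
    survivors = []
    for o in young:
        if o in live:                # stand-in for a real reachability mark
            o.age += 1
            if o.age >= PROMOTION_AGE:
                old.append(o)        # promoted: scanned far less often
            else:
                survivors.append(o)
        # dead young objects are simply dropped here
    young = survivors

window = allocate()                  # long-lived, like a main window
tmp = allocate()                     # short-lived, like a loop-local string
minor_collect(live={window})         # tmp dies young, window survives
assert tmp not in young and window in young
minor_collect(live={window})         # window survives again...
assert window in old and not young   # ...and is promoted
```

The payoff is that each minor collection only scans the (small) young generation, where most garbage is found anyway.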
Much more detailed information can be found on wikipedia.
Added based on comments:
With the mark-and-sweep algorithm (as well as any other garbage collection algorithm except reference counting), the garbage collector does not run in the context of your program, since it has to be able to access stuff that your program is not capable of accessing directly. Therefore it is not correct to say that the garbage collector runs on the stack.
Reference counting - Each object has a count which is incremented when someone takes a reference to the object, and decremented when someone releases the reference. When the reference count goes to zero, the object is deleted. COM uses this approach.
Mark and sweep - Each object has a flag indicating whether it is in use. Starting at the roots of the object graph (global variables, locals on stacks, etc.), each referenced object gets its flag set, and so on down the chain. At the end, all objects that are not referenced in the graph are deleted.
The garbage collector for the CLR is described in this slide deck. "Roots" on slide 15 are the sources for the objects that first go into the graph. Their member fields and so on are used to find the other objects in the graph.
Wikipedia describes several of these approaches in much more and better detail.
Garbage collection is simply knowing if there is any future need for variables in your program, and if not, collect and delete them.
Emphasis is on the word garbage: something that is completely used up in your house is thrown in the trash, and the garbage man handles it for you by coming to pick it up and take it away, giving you more room in your house trash can.
Reference Counting, Mark and Sweep, Copying, Train etc. are discussed in good detail at GC FAQ
The general way it is done is that the number of references to an object is tracked in the background, and when that number goes to zero, the object is SUBJECT TO garbage collection; however, the GC will not fire up until it is explicitly needed, because it is an expensive operation. What happens when it starts is that the GC goes through the managed area of memory and finds every object that has no references left. The GC deletes those objects by first calling their destructors, allowing them to clean up after themselves, then frees the memory. Commonly the GC will then compact the managed memory area by moving every surviving object to one area of memory, allowing more allocations to take place.
Like I said, this is one method that I know of, and there is a lot of research being done in this area.
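The compaction step described above can be pictured as sliding every survivor to the front of the managed area so free space becomes one contiguous block (a toy model with list slots standing in for memory):

```python
# None marks a freed slot; survivors slide to the front of the "heap".
heap = ["A", None, "B", None, None, "C"]
survivors = [obj for obj in heap if obj is not None]
heap = survivors + [None] * (len(heap) - len(survivors))
assert heap == ["A", "B", "C", None, None, None]
# New allocations can now just bump a pointer at index len(survivors).
```

Real compacting collectors must also rewrite every reference to each moved object, which is why they need exact knowledge of where all pointers live.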
Garbage collection is a big topic, and there are a lot of ways to implement it.
But for the most common case, in a nutshell: the garbage collector keeps a record of all references to anything created via the new operator, even if that operator's use was hidden from you (for example, in a Type.Create() method). Each time you add a new reference to the object, the root of that reference is determined and added to the list, if needed. A reference is removed whenever it goes out of scope.
When there are no more references to an object, it can (not "will") be collected. To improve performance and make sure necessary cleanup is done correctly, collections are batched for several objects at once and happen over multiple generations.
