In MRI Ruby 2.0, how thorough is GC.start?
Does it try to garbage collect all objects that no longer have a reference to them? Or does it only GC objects if it thinks it's necessary?
I'm trying to track the number of objects of a certain class I have, and it seems to keep on increasing even though I think some of the objects no longer have a reference to them. Using GC.start doesn't fix this. I don't use any C extensions, so that can't be complicating things.
Edit: The problem I was having was the same as in Ruby Symbol#to_proc leaks references in 1.9.2-p180? - objects still existed when I thought they ought to have been garbage collected, and like in that case, the problem was to do with using implicit Symbol -> Proc. However, it would be nice to know if GC.start is expected to garbage collect all objects, or merely collect whatever MRI thinks is necessary to garbage collect.
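One way to check what GC.start actually reclaims is to count live instances before and after the call. The following is a minimal sketch, not a definitive test: `Tracked`, `live_count`, and `allocate_garbage` are hypothetical names made up for illustration. Because MRI scans the machine stack conservatively, a few unreferenced objects may survive a collection, so the count is not guaranteed to reach zero.

```ruby
# Hypothetical class used only so its instances can be counted.
class Tracked; end

# Count live instances of a class by walking the heap.
def live_count(klass)
  ObjectSpace.each_object(klass).count
end

# Allocate inside a method so that no local variable in the caller
# keeps the objects reachable afterwards.
def allocate_garbage(n)
  n.times { Tracked.new }
  nil
end

allocate_garbage(10_000)
before = live_count(Tracked)

# full_mark and immediate_sweep both default to true, so this requests
# a full mark followed by an immediate sweep.
GC.start

after = live_count(Tracked)
puts "Tracked instances before GC.start: #{before}, after: #{after}"
```

If the count stays high after GC.start, something is most likely still holding references (as in the Symbol -> Proc case described above), rather than the collector skipping work.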
Related
I have an application running on k8s that uses Ruby 2.7.4, and I'm trying to look at some environment variables that may improve its performance. Can you help me understand the parameters below and how to calculate the right values?
WEB_CONCURRENCY
RUBY_GC_MALLOC_LIMIT
RUBY_GC_MALLOC_LIMIT_MAX
RUBY_GC_OLDMALLOC_LIMIT
RUBY_GC_OLDMALLOC_LIMIT_MAX
Thanks
Most are Garbage Collection Settings
Just don't. Unless you have very specific problems around garbage collection or odd memory constraints because you're running an embedded system, you shouldn't have to worry about garbage collection at all, especially on newer Rubies. You can find most of the values you're looking for in GC.stat, but I have no idea where you're getting "WEB_CONCURRENCY" from. That one is likely tied to your web server rather than Ruby's GC module or any known Ruby environment variable, so you're going to have to figure that one out some other way.
If you're having trouble with memory usage in Ruby, the problem is most often tied to objects that never go out of scope and therefore never get garbage collected. There are many better ways to optimize most Ruby applications than messing around with GC settings, but if you do have a valid use case, the GC module is where you should start.
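As a starting point, here is a small sketch that prints a few GC.stat counters alongside whichever of the environment variables from the question are set in the current process. The list of keys is an assumption about which counters are relevant; the exact set of keys in GC.stat varies between Ruby versions, so missing ones are printed as "n/a".

```ruby
stats = GC.stat

# Counters that relate to the malloc-based GC triggers. Keys missing on
# a given Ruby version are reported as "n/a" instead of raising.
%i[
  count minor_gc_count major_gc_count
  malloc_increase_bytes malloc_increase_bytes_limit
  oldmalloc_increase_bytes oldmalloc_increase_bytes_limit
].each do |key|
  puts format("%-32s %s", key, stats.fetch(key, "n/a"))
end

# Show which tuning variables from the question are set for this process.
%w[
  RUBY_GC_MALLOC_LIMIT RUBY_GC_MALLOC_LIMIT_MAX
  RUBY_GC_OLDMALLOC_LIMIT RUBY_GC_OLDMALLOC_LIMIT_MAX
].each do |name|
  puts "#{name}=#{ENV.fetch(name, '(unset)')}"
end
```

Watching how these counters change under real load is usually more informative than calculating limits up front.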
I am trying to use the GDAL bindings to create geographic datasets in a Ruby on Rails app. However, GDAL only flushes those datasets on disk when the corresponding Ruby objects are destroyed. This (unanswered) question provides a nice explanation of what I am facing.
I tried setting every variable to nil and manually running GC.start, but as I understand it the Ruby GC is somewhat asynchronous (tell me if I'm wrong there, as I have limited Ruby experience), so this doesn't work all the time.
Is there a way to force a synchronous garbage collection so that I can be absolutely certain that my objects are destroyed when it is done?
Note that I would vastly prefer using GDAL over other libraries, as I have a large existing Python codebase that also uses the GDAL bindings, and the Python to Ruby translation is (or should be) relatively painless.
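Relying on the collector to release a resource is fragile in any case, so a common alternative is to release it explicitly in an `ensure` block. The sketch below illustrates only that pattern: `BufferedResource` and its `open`, `write`, and `close` methods are hypothetical stand-ins, not the GDAL bindings' API, and whether those bindings expose an explicit close or flush call is not established here.

```ruby
# Hypothetical stand-in for a resource that only persists its data
# when it is released.
class BufferedResource
  attr_reader :flushed

  def initialize
    @buffer = []
    @flushed = false
  end

  def write(item)
    @buffer << item
  end

  # Explicit, deterministic release; safe to call more than once.
  def close
    return if @flushed

    @flushed = true
  end

  # Block form: the resource is closed when the block exits, even if
  # the block raises.
  def self.open
    resource = new
    yield resource
  ensure
    resource&.close
  end
end

handle = nil
BufferedResource.open do |resource|
  resource.write("some data")
  handle = resource
end
puts "flushed after block: #{handle.flushed}"
```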
Suppose there is simple object like:
object = Object.new
As far as I know, this creates an Object in memory (RAM).
Is there a way to delete this object from RAM?
Other than hacking the underlying C code, no. Garbage collection is managed by the runtime so you don't have to worry about it. Here is a decent reference on the algorithm in Ruby 2.0.
Once you have no more references to the object in memory, the garbage collector will go to work. You should be fine.
The simple answer is, let the GC (garbage collector) do its job.
When you are ready to get rid of that reference, just do object = nil, and make sure nothing else still references the object.
The garbage collector will eventually collect the object and reclaim its memory.
(from ruby site)
=== Implementation from GC
------------------------------------------------------------------------------
GC.start -> nil
GC.start(full_mark: true, immediate_sweep: true) -> nil
------------------------------------------------------------------------------
Initiates garbage collection, unless manually disabled.
This method is defined with keyword arguments that default to true:
def GC.start(full_mark: true, immediate_sweep: true); end
Use full_mark: false to perform a minor GC. Use immediate_sweep: false to
defer sweeping (use lazy sweep).
Note: These keyword arguments are implementation and version dependent. They
are not guaranteed to be future-compatible, and may be ignored if the
underlying implementation does not support them.
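The keyword arguments documented above can be exercised as in the following sketch, which compares the GC run counter around a requested minor collection and a full one. As the note says, the keywords are version dependent, so this assumes an MRI version that accepts them.

```ruby
before = GC.stat(:count)

# Request a minor GC with lazy sweeping.
GC.start(full_mark: false, immediate_sweep: false)
after_minor_request = GC.stat(:count)

# Default arguments: full mark and immediate sweep.
GC.start
after_full = GC.stat(:count)

puts "GC runs: #{before} -> #{after_minor_request} -> #{after_full}"
puts "major GCs so far: #{GC.stat(:major_gc_count)}"
```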
Ruby Manages Garbage Collection Automatically
For the most part, Ruby handles garbage collection automatically. There are some edge cases, of course, but in the general case you should never have to worry about garbage collection in a typical Ruby application.
Implementation details of garbage collection vary between versions of Ruby, but it exposes very few knobs to twiddle and for most purposes you don't need them. If you find yourself under memory pressure, you may want to re-evaluate your design decisions rather than trying to manage the symptom of excess memory consumption.
Manually Trigger Garbage Collection
In general terms, Ruby's garbage collector reclaims objects once they are no longer referenced, for example after the variables holding them go out of scope. However, some objects are never collected: before Ruby 2.2 this applied to all Symbols, and on later versions it still applies to Symbols that appear as literals in your source code. These persist for the entire run-time of your program.
You can manually trigger garbage collection with GC.start, but can't really free blocks of memory the way you can with C programs from within Ruby. If you find yourself needing to do this, you may want to solve the underlying X/Y problem rather than trying to manage memory directly.
You can't explicitly destroy an object. Ruby has automatic memory management. Objects no longer referenced from anywhere are automatically collected by the garbage collector built into the interpreter.
Good article to read on how to do allocation wisely, and a few tools you can use to fine tune.
http://merbist.com/2010/07/29/object-allocation-why-you-should-care/
Short version: The default inspect method for a class displays the object's address.* How can I do this in a custom inspect method of my own?
*(To be clear, I want the 8-digit hex number you would normally get from inspect. I don't care about the actual memory address. I'm just calling it a memory address because it looks like one. I know Ruby is memory-safe.)
Long version: I have two classes, Thing and ThingList. ThingList is a subclass of Array specifically designed to hold Things. Due to the nature of Things and the way they are used in my program, Things have an instance variable #container that points back to the ThingList that holds the Thing.
It is possible for two Things to have exactly the same data. Therefore, when I'm debugging the application, the only way I can reliably differentiate between two Things is to use inspect, which displays their address. When I inspect a Thing, however, I get pages upon pages of output because inspect will recursively inspect #container, causing every Thing in the list to be inspected as well!
All I need is the first part of that output. How can I write a custom inspect method on Thing that will just display this?
#<Thing:0xb7727704>
EDIT: I just realized that the default to_s does exactly this. I didn't notice this earlier because I have a custom to_s that provides human-readable details about the object.
Assume that I cannot use to_s, and that I must write a custom inspect.
You can get the address by multiplying object_id by 2* and displaying it in hex using sprintf (aka %):
"#<Thing:0x%08x>" % (object_id * 2)
Of course, as long as you only need the number to be unique and don't care that it's the actual address, you can just leave out the * 2.
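* For reasons that you don't need to understand (meaning: I don't understand them), object_id returns half the object's memory address, so you need to multiply by 2 to get the actual address. (This relationship holds on older MRI versions; since Ruby 2.7, object_id is no longer derived from the address.)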
This is impossible. There is no way in Ruby to get the memory address of an object, since Ruby is a memory-safe language which has (by design) no methods for accessing memory directly. In fact, in many implementations of Ruby, objects don't even have a memory address. And in most of the implementations that do map objects directly to memory, the memory address potentially changes after every garbage collection.
The reason why using the memory address as an identifier in current versions of MRI and YARV accidentally works, is because they have a crappy garbage collector implementation that never defragments memory. All other implementations have garbage collectors which do defragment memory, and thus move objects around in memory, thereby changing their address.
If you tie your implementation to the memory address, your code will only ever work on slow implementations with crappy garbage collectors. And it isn't even guaranteed that MRI and YARV will always have crappy garbage collectors. In fact, in both implementations the garbage collector has been identified as one of the major performance bottlenecks, and it is safe to assume that there will be changes to the garbage collectors. There are already some major changes to YARV's garbage collector in the SVN, which will be part of YARV 1.9.3 and YARV 2.0.
If you want an ID for objects, use Object#object_id.
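A custom inspect built on object_id might look like the sketch below. The `Thing` constructor arguments are assumptions made for illustration; the number printed is simply a unique identifier in the familiar hex format and should not be read as a memory address.

```ruby
class Thing
  def initialize(name, container)
    @name = name
    @container = container
  end

  # Short form only: class name plus the object's id in hex.
  # Instance variables (including @container) are deliberately omitted,
  # so inspecting a Thing never recurses into its list.
  def inspect
    format("#<%s:0x%08x>", self.class.name, object_id)
  end
end

list = []
a = Thing.new("same data", list)
b = Thing.new("same data", list)
list << a << b

puts a.inspect
puts b.inspect
p list
```

Two Things with identical data still print different ids, which is what is needed for telling them apart while debugging.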
Instead of subclassing Array, your class instances could delegate to one for the desired methods so that you don't inherit the overridden inspect method.
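A minimal sketch of that delegation approach, using the standard library's Forwardable. The set of delegated methods and the format of the inspect string are assumptions chosen for illustration.

```ruby
require "forwardable"

class ThingList
  extend Forwardable
  include Enumerable

  # Forward only the Array methods this class actually needs.
  def_delegators :@items, :<<, :each, :size, :[]

  def initialize
    @items = []
  end

  # Summarize instead of dumping every element.
  def inspect
    "#<ThingList size=#{@items.size}>"
  end
end

things = ThingList.new
things << :first
things << :second

puts things.inspect
puts things.map(&:to_s).inspect
```

Including Enumerable on top of the delegated each keeps methods such as map and select available without inheriting from Array.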
Killing the process while obtaining this information would be fine.
A quick-and-dirty way would be ObjectSpace.each_object{|e| p e}. You could do some tests to determine what you wanted to keep, or Marshal the objects.
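Printing every object produces a lot of output, so a tally per class is often easier to read. The following sketch builds on the same ObjectSpace.each_object call; `Leaky` is a hypothetical class standing in for whatever you suspect of leaking.

```ruby
# Hypothetical class standing in for the one being investigated.
class Leaky; end

# Keep a few instances reachable so they show up in the tally.
kept = Array.new(3) { Leaky.new }

# Tally live objects by class.
counts = Hash.new(0)
ObjectSpace.each_object(Object) { |obj| counts[obj.class] += 1 }

# Print the five most common classes on the heap.
counts.sort_by { |_klass, n| -n }.first(5).each do |klass, n|
  puts "#{klass}: #{n}"
end

puts "Leaky instances: #{ObjectSpace.each_object(Leaky).count} (kept #{kept.size})"
```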
For 1.9.2/1.9.3 there's the heap_dump gem. It can be injected into a running process using gdb (but a more stable way is to include it in the process itself, with no performance overhead).
It dumps references to objects, not the objects themselves, but this is usable if you're into fighting leaks.
For the more hardcore there is also BleakHouse, which gives you a special custom-compiled copy of Ruby with better memory leak tracking powers.