Difference between local and instance variables in ruby - ruby

I am working on a script that creates several fairly complex nested hash datastructures and then iterates through them conditionally creating database records. This is a standalone script using active record. After several minutes of running I noticed a significant lag in server responsiveness and discovered that the script, while being set to be nice +19, was enjoying a steady %85 - %90 total server memory.
In this case I am using instance variables simply for readability. It helps knowing what is going to be re-used outside of the loop vs. what won't. Is there a reason to not use instance variables when they are not needed? Are there differences in memory allocation and management between local and instance variables? Would it help setting #variable = nil when its no longer needed?

An instance variable continues to exist for the life of the object that holds it. A local variable exists only within a single method, block or module body.
If you're assuming objects in your object's instance variables will be garbage-collected just because you don't intend to refer to them in the future, that isn't how it works. The garbage collector only knows whether there's a reachable reference to the object — and there is if it's in an instance variable.

Setting #variable = nil destroys the reference to the object that the instance variable once pointed to. When there are no more remaining references to an object, it should be collected by the garbage collector eventually. I say "eventually" because the GC is somewhat unpredictable. However, it's easy to have a memory leak by a "dangling reference" and possibly (depending on how the GC is implemented) circular references. What else refers to this object?

Related

How to destroy Ruby object?

Suppose there is simple object like:
object = Object.new
As I know this creates Object in memory (RAM).
Is there a way to delete this object from RAM?
Other than hacking the underlying C code, no. Garbage collection is managed by the runtime so you don't have to worry about it. Here is a decent reference on the algorithm in Ruby 2.0.
Once you have no more references to the object in memory, the garbage collector will go to work. You should be fine.
The simple answer is, let the GC (garbage collector) do its job.
When you are ready to get rid of that reference, just do object = nil. And don't make reference to the object.
The garbage collector will eventually collect that and clear the reference.
(from ruby site)
=== Implementation from GC
------------------------------------------------------------------------------
GC.start -> nil
GC.start(full_mark: true, immediate_sweep: true) -> nil
------------------------------------------------------------------------------
Initiates garbage collection, unless manually disabled.
This method is defined with keyword arguments that default to true:
def GC.start(full_mark: true, immediate_sweep: true); end
Use full_mark: false to perform a minor GC. Use immediate_sweep: false to
defer sweeping (use lazy sweep).
Note: These keyword arguments are implementation and version dependent. They
are not guaranteed to be future-compatible, and may be ignored if the
underlying implementation does not support them.
Ruby Manages Garbage Collection Automatically
For the most part, Ruby handles garbage collection automatically. There are some edge cases, of course, but in the general case you should never have to worry about garbage collection in a typical Ruby application.
Implementation details of garbage collection vary between versions of Ruby, but it exposes very few knobs to twiddle and for most purposes you don't need them. If you find yourself under memory pressure, you may want to re-evaluate your design decisions rather than trying to manage the symptom of excess memory consumption.
Manually Trigger Garbage Collection
In general terms, Ruby marks objects for garbage collection when they go out of scope or are no longer referenced. However, some objects such as Symbols never get collected, and persist for the entire run-time of your program.
You can manually trigger garbage collection with GC#start, but can't really free blocks of memory the way you can with C programs from within Ruby. If you find yourself needing to do this, you may want to solve the underlying X/Y problem rather than trying to manage memory directly.
You can't explicitly destroy object. Ruby has automatic memory management. Objects no longer referenced from anywhere are automatically collected by the garbage collector built in the interpreter.
Good article to read on how to do allocation wisely, and a few tools you can use to fine tune.
http://merbist.com/2010/07/29/object-allocation-why-you-should-care/

Remove a class instance in Ruby?

I've read the notes on GC happening at an undetermined time after disconnecting any references by instance variables, but would the second line in the delete method be foolish, unnecessary or thorough?
class MyClass
# new instances added to ##instances
...
def delete
##instances.delete(self)
self.instance_variables.each{|v| self.instance_variable_set(v,nil)}
end
end
Unnecessary. If you really want to trigger GC, use GC.start or ObjectSpace.garbage_collect.
And because it can't be recommended often enough, once again:
http://viewsourcecode.org/why/hacking/theFullyUpturnedBin.html
The method delete executes in the scope of the instance that is being removed from the ##instances structure, hence it cannot be garbage collected. Something triggered that method to run, and that something is currently holding a reference to it, therefore it cannot be garbage collected until after the method has returned (and the reference to the object been cleared).
That being said, the second line is completely unnecessary. Even if one of the instance variables pointed back to the object itself the GC is clever enough to figure that out (or rather, it just ignores it since it's not a reference counting collector).
Don't try to manually manage memory, it will not pay off. Whether or not you clear the references to the objects in those instance variables the GC decides when they will be freed. If I interpret the idea of your code example correctly all references to the host object are cleared after delete has run, and in that case it doesn't matter if its instance variables are cleared too, the objects will be just as eligible for garbage collection either way. Objects are GC'ed when they are no longer reachable, it does not matter if other unreachable objects have references to them.

Object address in Ruby

Short version: The default inspect method for a class displays the object's address.* How can I do this in a custom inspect method of my own?
*(To be clear, I want the 8-digit hex number you would normally get from inspect. I don't care about the actual memory address. I'm just calling it a memory address because it looks like one. I know Ruby is memory-safe.)
Long version: I have two classes, Thing and ThingList. ThingList is a subclass of Array specifically designed to hold Things. Due to the nature of Things and the way they are used in my program, Things have an instance variable #container that points back to the ThingList that holds the Thing.
It is possible for two Things to have exactly the same data. Therefore, when I'm debugging the application, the only way I can reliably differentiate between two Things is to use inspect, which displays their address. When I inspect a Thing, however, I get pages upon pages of output because inspect will recursively inspect #container, causing every Thing in the list to be inspected as well!
All I need is the first part of that output. How can I write a custom inspect method on Thing that will just display this?
#<Thing:0xb7727704>
EDIT: I just realized that the default to_s does exactly this. I didn't notice this earlier because I have a custom to_s that provides human-readable details about the object.
Assume that I cannot use to_s, and that I must write a custom inspect.
You can get the address using object_id and multiplying it by 2* and display it in hex using sprintf (aka %):
"#<Thing:0x%08x>" % (object_id * 2)
Of course, as long as you only need the number to be unique and don't care that it's the actual address, you can just leave out the * 2.
* For reasons that you don't need to understand (meaning: I don't understand them), object_id returns half the object's memory address, so you need to multiply by 2 to get the actual address.
This is impossible. There is no way in Ruby to get the memory address of an object, since Ruby is a memory-safe language which has (by design) no methods for accessing memory directly. In fact, in many implementations of Ruby, objects don't even have a memory address. And in most of the implementations that do map objects directly to memory, the memory address potentially changes after every garbage collection.
The reason why using the memory address as an identifier in current versions of MRI and YARV accidentally works, is because they have a crappy garbage collector implementation that never defragments memory. All other implementations have garbage collectors which do defragment memory, and thus move objects around in memory, thereby changing their address.
If you tie your implementation to the memory address, your code will only ever work on slow implementations with crappy garbage collectors. And it isn't even guaranteed that MRI and YARV will always have crappy garbage collectors, in fact, in both implementations the garbage collector has been identified as one of the major performance bottlenecks and it is safe to assume that there will be changes to the garbage collectors. There are already some major changes to YARV's garbage collector in the SVN, which will be part of YARV 1.9.3 and YARV 2.0.
If you want an ID for objects, use Object#object_id.
Instead of subclassing Array your class instances could delegate to one for the desired methods so that you don't inherit the overridden inspect method.

Debugging Ruby's garbage collection

I'm having problems with a long-lived background ruby process on our server, which isn't cleaning up Tempfiles.
I'm using hijack to inject into the process & inspect things, using, for example,
ObjectSpace.each_object(ActiveRecord::Base){|o| puts o}
- turns out that the Tempfiles in question are being referenced by an instance of one of our ActiveRecord subclasses, and those instances aren't being collected.
I haven't been able to figure out what's referencing those AR instances & keeping them alive. Any tips for getting access to whatever object graph the garbage collector uses?
Ruby's garbage collector is a mark & sweep algorithm: (1) start from the Object instance and then walk through every reachable object marking them along the way (2) then walk the ObjectSpace for every object reference and delete those that are not marked.
Here is some reading on the subject of debugging GC memory problems: http://blog.evanweaver.com/articles/2009/04/09/ruby-gc-tuning/
The only things in your rails app that are long lived are the classes and modules. With that in mind some places to look are:
1) are these active record instances being squirreled away in a class variable
2) somehow being cached by rack middleware
3) being held onto by the active record database connection pool
4) are you using ruby finalizers (notorious for memory leaks when used incorrectly) see eigenclass.org/hiki/deferred-finalizers-in-Ruby
Sorry that I'm just posting a bunch of ideas and few solutions. Hope this gets you thinking in some new directions.
Blessings,
TwP

When a variable goes out of scope does that mean it doesn't exist?

I'm not sure I understand scope - does an out-of-scope variable (I'm using Ruby) exist in memory somewhere or does it stop existing (I know you can't access it). Would it be inaccurate to say that an out-of-scope variable does not exist any more?
Maybe this is a philosophical question.
If you are using managed language then you don't allocate and unallocate memory so as far as you are concerned it no longer exists.
Technically it does but GCs tend not to be deterministic so technically it's hard to say when it actually vanishes.
A variable is not the same as the value it holds.
The variable itself ceases to exist when it goes out of scope. The value that the variable held may represent an object, and that object may continue to exist beyond the lifetime of the variable. The garbage collector reclaims the object later.
When it goes out of scope it still exists (in the sense that it has some memory allocated to it) for some time, until garbage collection cleans it up. But as you imply, it's lost it's name and is unreachable.
When a variable falls out of scope is anyone around to hear it scream?
This isn't a ruby question so much as a general question about garbage collection. In a garbage collected language such as Ruby or C# when a variable falls out of scope it's marked in some manner that says it's no longer in use. When this happens you can't get at it any more and it sits around twiddling its thumbs - but it does still have memory allocated to it.
At some point the garbage collector will wake up and look for variables marked as not in use. It will dispose of them and at that point they're no longer in memory at all.
It can be more complicated than this, depending on how the garbage collector works, but it's close enough :)
It exists for a little bit until the garbage collector disposes it (if it can).
Rob Kennedy has this answered appropriately, but I thought I would add a little more detail.
The important thing to recognize is the difference between a variable and the value it represents.
Here's an example (in C# because I don't know Ruby):
object c = null;
if (1 == 1) // Just to get a different scope
{
var newObj = new SomeClass();
newObj.SomeProperty = true;
c = newObj;
}
In the code above, newObj goes out of scope at the end of the if statement and as such "doesn't exist", but the value that it was referring to is still alive and well, referenced by c. Once all of the references to the object are gone, then the garbage collector will take care of cleaning it up.
If you're talking about file objects, it becomes more than a philosophical question. If I recall correctly, files do not close automatically when they go out of scope - they only close if you ask them to close, or if you use a File.open do |file| style block, or if they get garbage collected. This can be an issue if other code (or unit tests) try to read the contents of that file and it hasn't yet been flushed.

Resources