Debugging Ruby's garbage collection - ruby

I'm having problems with a long-lived background ruby process on our server, which isn't cleaning up Tempfiles.
I'm using hijack to inject into the process & inspect things, using, for example,
ObjectSpace.each_object(ActiveRecord::Base){|o| puts o}
- turns out that the Tempfiles in question are being referenced by an instance of one of our ActiveRecord subclasses, and those instances aren't being collected.
I haven't been able to figure out what's referencing those AR instances & keeping them alive. Any tips for getting access to whatever object graph the garbage collector uses?

Ruby's garbage collector is a mark & sweep algorithm: (1) start from the Object instance and then walk through every reachable object marking them along the way (2) then walk the ObjectSpace for every object reference and delete those that are not marked.
Here is some reading on the subject of debugging GC memory problems: http://blog.evanweaver.com/articles/2009/04/09/ruby-gc-tuning/
The only things in your rails app that are long lived are the classes and modules. With that in mind some places to look are:
1) are these active record instances being squirreled away in a class variable
2) somehow being cached by rack middleware
3) being held onto by the active record database connection pool
4) are you using ruby finalizers (notorious for memory leaks when used incorrectly) see eigenclass.org/hiki/deferred-finalizers-in-Ruby
Sorry that I'm just posting a bunch of ideas and few solutions. Hope this gets you thinking in some new directions.
Blessings,
TwP

Related

When should glDeleteBuffers, glDeleteShader, and glDeleteProgram be used in OpenGL ES 2?

While working with VBOs in OpenGL ES 2, I came across glDeleteBuffers, glDeleteShader, and glDeleteProgram. I looked around on the web but I couldn't find any good answers to when these methods are supposed to be called. Are these calls even necessary or does the computer automatically delete the objects on its own? Any answers are appreciated, thanks.
Every glGen* call should be paired with the appropriate glDelete* call which is called when you are finished with the resource.
The computer will not delete the objects on its own while your application is still running because it doesn't know whether you plan to re-use them later. If you are creating new objects throughout the life of your application and failing to delete old ones, then that's a resource leak which will eventually cause a shutdown of your application due to excessive memory usage.
The computer will delete objects for you when the application terminates, so there's no real benefit to deleting the objects that are permanently required throughout the lifetime of your application, but it is generally considered good practice to have a leak-free clean up.
You can call the glDelete* functions as soon as you are finished with the object (e.g. as soon as you've made your last draw call that uses it). You do not need to worry about whether the object might still be in the GPU's queues or pipelines, that is the OpenGL driver's problem.

How to track/find out which userdata are GC-ed at certain time?

I've written an app in LuaJIT, using a third-party GUI framework (FFI-based) + some additional custom FFI calls. The app suddenly loses part of its functionality at some point soon after being run, and I'm quite confident it's because of some unpinned objects being GC-ed. I assume they're only referenced from the C world1, so Lua GC thinks they're unreferenced and can free them. The problem is, I don't know which of the numerous userdata are unreferenced (unpinned) on Lua side?
To confirm my theory, I've run the app with GC disabled, via:
collectgarbage 'stop'
and lo, with this line, the app works perfectly well long past the point where it got broken before. Obviously, it's an ugly workaround, and I'd much prefer to have the GC enabled, and the app still working correctly...
I want to find out which unpinned object (userdata, I assume) gets GCed, so I can pin it properly on Lua side, to prevent it being GCed prematurely. Thus, my question is:
(How) can I track which userdata objects got collected when my app loses functionality?
One problem is, that AFAIK, the LuaJIT FFI already assigns custom __gc handlers, so I cannot add my own, as there can be only one per object. And anyway, the framework is too big for me to try adding __gc in each and every imaginable place in it. Also, I've already eliminated the "most obviously suspected" places in the code, by removing local from some variables — thus making them part of _G, so I assume not GC-able. (Or is that not enough?)
1 Specifically, WinAPI.
For now, I've added some ffi.gc() handlers to some of my objects (printing some easily visible ALL-CAPS messages), then added some eager collectgarbage() calls to try triggering the issue as soon as possible:
ffi.gc(foo, function()
print '\n\nGC FOO !!!\n\n'
end)
[...]
collectgarbage()
And indeed, this exposed some GCing I didn't expect. Specifically, it led me to discover a note in luajit's FFI docs, which is most certainly relevant in my case:
Please note that [C] pointers [...] are not followed by the garbage collector. So e.g. if you assign a cdata array to a pointer, you must keep the cdata object holding the array alive [in Lua] as long as the pointer is still in use.

How to destroy Ruby object?

Suppose there is simple object like:
object = Object.new
As I know this creates Object in memory (RAM).
Is there a way to delete this object from RAM?
Other than hacking the underlying C code, no. Garbage collection is managed by the runtime so you don't have to worry about it. Here is a decent reference on the algorithm in Ruby 2.0.
Once you have no more references to the object in memory, the garbage collector will go to work. You should be fine.
The simple answer is, let the GC (garbage collector) do its job.
When you are ready to get rid of that reference, just do object = nil. And don't make reference to the object.
The garbage collector will eventually collect that and clear the reference.
(from ruby site)
=== Implementation from GC
------------------------------------------------------------------------------
GC.start -> nil
GC.start(full_mark: true, immediate_sweep: true) -> nil
------------------------------------------------------------------------------
Initiates garbage collection, unless manually disabled.
This method is defined with keyword arguments that default to true:
def GC.start(full_mark: true, immediate_sweep: true); end
Use full_mark: false to perform a minor GC. Use immediate_sweep: false to
defer sweeping (use lazy sweep).
Note: These keyword arguments are implementation and version dependent. They
are not guaranteed to be future-compatible, and may be ignored if the
underlying implementation does not support them.
Ruby Manages Garbage Collection Automatically
For the most part, Ruby handles garbage collection automatically. There are some edge cases, of course, but in the general case you should never have to worry about garbage collection in a typical Ruby application.
Implementation details of garbage collection vary between versions of Ruby, but it exposes very few knobs to twiddle and for most purposes you don't need them. If you find yourself under memory pressure, you may want to re-evaluate your design decisions rather than trying to manage the symptom of excess memory consumption.
Manually Trigger Garbage Collection
In general terms, Ruby marks objects for garbage collection when they go out of scope or are no longer referenced. However, some objects such as Symbols never get collected, and persist for the entire run-time of your program.
You can manually trigger garbage collection with GC#start, but can't really free blocks of memory the way you can with C programs from within Ruby. If you find yourself needing to do this, you may want to solve the underlying X/Y problem rather than trying to manage memory directly.
You can't explicitly destroy object. Ruby has automatic memory management. Objects no longer referenced from anywhere are automatically collected by the garbage collector built in the interpreter.
Good article to read on how to do allocation wisely, and a few tools you can use to fine tune.
http://merbist.com/2010/07/29/object-allocation-why-you-should-care/

Difference between local and instance variables in ruby

I am working on a script that creates several fairly complex nested hash datastructures and then iterates through them conditionally creating database records. This is a standalone script using active record. After several minutes of running I noticed a significant lag in server responsiveness and discovered that the script, while being set to be nice +19, was enjoying a steady %85 - %90 total server memory.
In this case I am using instance variables simply for readability. It helps knowing what is going to be re-used outside of the loop vs. what won't. Is there a reason to not use instance variables when they are not needed? Are there differences in memory allocation and management between local and instance variables? Would it help setting #variable = nil when its no longer needed?
An instance variable continues to exist for the life of the object that holds it. A local variable exists only within a single method, block or module body.
If you're assuming objects in your object's instance variables will be garbage-collected just because you don't intend to refer to them in the future, that isn't how it works. The garbage collector only knows whether there's a reachable reference to the object — and there is if it's in an instance variable.
Setting #variable = nil destroys the reference to the object that the instance variable once pointed to. When there are no more remaining references to an object, it should be collected by the garbage collector eventually. I say "eventually" because the GC is somewhat unpredictable. However, it's easy to have a memory leak by a "dangling reference" and possibly (depending on how the GC is implemented) circular references. What else refers to this object?

Is it safe to manipulate objects that I created outside my thread if I don't explicitly access them on the thread which created them?

I am working on a cocoa software and in order to keep the GUI responsive during a massive data import (Core Data) I need to run the import outside the main thread.
Is it safe to access those objects even if I created them in the main thread without using locks if I don't explicitly access those objects while the thread is running.
With Core Data, you should have a separate managed object context to use for your import thread, connected to the same coordinator and persistent store. You cannot simply throw objects created in a context used by the main thread into another thread and expect them to work. Furthermore, you cannot do your own locking for this; you must at minimum lock the managed object context the objects are in, as appropriate. But if those objects are bound to by your views a controls, there are no "hooks" that you can add that locking of the context to.
There's no free lunch.
Ben Trumbull explains some of the reasons why you need to use a separate context, and why "just reading" isn't as simple or as safe as you might think, in this great post from late 2004 on the webobjects-dev list. (The whole thread is great.) He's discussing the Enterprise Objects Framework and WebObjects, but his advice is fully applicable to Core Data as well. Just replace "EC" with "NSManagedObjectContext" and "EOF" with "Core Data" in the meat of his message.
The solution to the problem of sharing data between threads in Core Data, like the Enterprise Objects Framework before it, is "don't." If you've thought about it further and you really, honestly do have to share data between threads, then the solution is to keep independent object graphs in thread-isolated contexts, and use the information in the save notification from one context to tell the other context what to re-fetch. -[NSManagedObjectContext refreshObject:mergeChanges:] is specifically designed to support this use.
I believe that this is not safe to do with NSManagedObjects (or subclasses) that are managed by a CoreData NSManagedObjectContext. In general, CoreData may do many tricky things with the sate of managed objects, including firing faults related to those objects in separate threads. In particular, [NSManagedObject initWithEntity:insertIntoManagedObjectContext:] (the designated initializer for NSManagedObjects as of OS X 10.5), does not guarantee that the returned object is safe to pass to an other thread.
Using CoreData with multiple threads is well documented on Apple's dev site.
The whole point of using locks is to ensure that two threads don't try to access the same resource. If you can guarantee that through some other mechanism, go for it.
Even if it's safe, but it's not the best practice to use shared data between threads without synchronizing the access to those fields. It doesn't matter which thread created the object, but if more than one line of execution (thread/process) is accessing the object at the same time, since it can lead to data inconsistency.
If you're absolutely sure that only one thread will ever access this object, than it'd be safe to not synchronize the access. Even then, I'd rather put synchronization in my code now than wait till later when a change in the application puts a second thread sharing the same data without concern about synchronizing access.
Yes, it's safe. A pretty common pattern is to create an object, then add it to a queue or some other collection. A second "consumer" thread takes items from the queue and does something with them. Here, you'd need to synchronize the queue but not the objects that are added to the queue.
It's NOT a good idea to just synchronize everything and hope for the best. You will need to think very carefully about your design and exactly which threads can act upon your objects.
Two things to consider are:
You must be able to guarantee that the object is fully created and initialised before it is made available to other threads.
There must be some mechanism by which the main (GUI) thread detects that the data has been loaded and all is well. To be thread safe this will inevitably involve locking of some kind.
Yes you can do it, it will be safe
...
until the second programmer comes around and does not understand the same assumptions you have made. That second (or 3rd, 4th, 5th, ...) programmer is likely to start using the object in a non safe way (in the creator thread). The problems caused could be very subtle and difficult to track down. For that reason alone, and because its so tempting to use this object in multiple threads, I would make the object thread safe.
To clarify, (thanks to those who left comments):
By "thread safe" I mean programatically devising a scheme to avoid threading issues. I don't necessarily mean devise a locking scheme around your object. You could find a way in your language to make it illegal (or very hard) to use the object in the creator thread. For example, limiting the scope, in the creator thread, to the block of code that creates the object. Once created, pass the object over to the user thread, making sure that the creator thread no longer has a reference to it.
For example, in C++
void CreateObject()
{
Object* sharedObj = new Object();
PassObjectToUsingThread( sharedObj); // this function would be system dependent
}
Then in your creating thread, you no longer have access to the object after its creation, responsibility is passed to the using thread.

Resources