Closure memory leaks - memory-management

I'm interested in the possibility of memory leaks (in the sense of unneeded references) in garbage-collected languages, caused by variables captured in closures that are then stored (perhaps as part of an object system, or as part of building actions from input to be evaluated later).
Are there any languages where this sort of thing is fairly common? If so, what patterns should you watch out for in those languages to prevent it?

As long as the closure is referenced, the captured variables will be kept alive. As a result, you need to be careful about where you create references to those closures.
Event handlers that are never unsubscribed are a common source of this kind of leak. However, I can't really think of any generic patterns that will help you in every conceivable way you may be using closures :)
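As a hedged illustration (C#, with hypothetical Publisher and Subscriber types that are not from the original answer), a handler that is never unsubscribed keeps the subscriber, and everything it captures, reachable for as long as the publisher lives:

using System;

class Publisher
{
    public event EventHandler Tick;              // long-lived object holding the handler list
}

class Subscriber
{
    private readonly byte[] buffer = new byte[10000000];  // large state captured by the handler
    private EventHandler handler;

    public void Attach(Publisher p)
    {
        // The lambda captures 'this', so the publisher's invocation list
        // now keeps this Subscriber (and its buffer) reachable.
        handler = (sender, args) => Console.WriteLine(buffer.Length);
        p.Tick += handler;
    }

    public void Detach(Publisher p)
    {
        // Unsubscribing removes the reference and lets the GC reclaim the subscriber.
        p.Tick -= handler;
    }
}

The pattern to watch for is simply a long-lived publisher paired with short-lived subscribers that never call the unsubscribe side.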

It is not really what you mean, but the garbage collector in Internet Explorer < 7 was unable to collect objects involved in circular references between JavaScript objects and DOM elements. This has not much to do with closures per se, but it turns out that closures in JavaScript can create such circular references quite easily.
A pattern like this would do it:
function foo() {
    var div = document.getElementById('mydiv');
    div.onclick = bar;
    function bar() {
        div.style.opacity = 0.5;
    }
}
Now the function bar references the variable div through its closure, and at the same time is assigned to a property of the DOM element div, creating a circular reference between the JavaScript and DOM worlds.
As a consequence, it used to be necessary to pay particular attention when using closures on IE to avoid memory leaks, for example by setting div = null at the end of such a function.

In some languages (C#, for example), if multiple delegates are created that close over variables in a given scope, every delegate which closes over any of those variables will end up closing over all of them. For example:
Action blah(Dictionary<string, int> dict, List<string> list)
{
    int i = 0;
    list.ForEach(st => { if (dict.ContainsKey(st)) i++; });
    return () => Console.WriteLine("The value was {0}", i);
}
That method will create two delegates. The first needs variables dict and i, and will be abandoned before the function exits. The second only needs i but could be held indefinitely by the caller. As long as the caller keeps the delegate returned by this method, the passed-in dictionary will not be collectible.
It would be possible for a compiler to avoid this issue by generating two closure objects, one of which held the dict and an int[1], and the other of which held just the int[1]; both closures would hold a reference to the same int[1], thus preserving the required semantics. In practice, though, compilers generate a single closure per scope, and the extra cost of the occasional excess capture is accepted in exchange for simpler code generation.
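A practical workaround, sketched here rather than taken from the original answer (the helper name MakePrinter is made up for illustration), is to create the long-lived delegate in a helper method that only receives the value it needs, so the returned closure cannot capture dict:

Action blah(Dictionary<string, int> dict, List<string> list)
{
    int i = 0;
    list.ForEach(st => { if (dict.ContainsKey(st)) i++; });
    return MakePrinter(i);   // passes the final value of i; dict is not captured
}

static Action MakePrinter(int value)
{
    // This closure only captures the int parameter, so holding the returned
    // delegate does not keep the dictionary alive.
    return () => Console.WriteLine("The value was {0}", value);
}

Note that this changes the semantics slightly: the returned delegate sees a snapshot of i rather than the live variable, which is fine here because i is not modified after the delegate is created.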

Variables in a closure are not 'caught' but are simply still referenced, so the GC isn't going to collect them until the reference is released. When the application terminates, any outstanding closures are dereferenced and the associated resources released; strictly speaking there is no memory leak, since all allocated resources are eventually released.

Related

Safe way to generically allocate memory in Cocoa which will be automatically deallocated

In my Cocoa project I had a bunch of places where I used malloc/free. However, several months ago I decided to refactor to leverage ARC, and in order to do that I tried to make a replacement for malloc that would return a pointer to something that will be automatically cleaned up.
I used this function (error checking and other logging omitted)
+ (void *) MallocWithAutoCleanup: (size_t) size
{
    NSMutableData * mutableData = [[NSMutableData alloc] initWithLength:size];
    void * data = [mutableData mutableBytes];
    return data;
}
This worked fine for a while, but recently a random memory overwrite issue came up. I tracked the cause down to this function: what appears to be happening is that the NSMutableData instance is being deallocated even though I am keeping a pointer to its mutableBytes.
I guess this is happening because the only direct reference to the object is local and goes away when the method returns, and mutableBytes points inside the object, so ARC isn't smart enough to handle that sort of reference counting.
Is there any way I can refactor this code to retain the mutableData object as long as the mutableBytes pointer is being used (i.e. someone has a reference to it)? I know one option is to just return the NSMutableData itself, but that requires some heavy refactoring and seems very messy.
In the 10.7 SDK, -[NSMutableData mutableBytes] is decorated with the NS_RETURNS_INNER_POINTER attribute. This signals to the compiler that the method returns a pointer whose validity depends on the receiver still existing. What exactly ARC does with this is open to change, but currently it retains and autoreleases the receiver (subject to redundant operations being optimized away).
So, the pointer is valid for the duration of the current autorelease pool's lifetime. This is akin to -[NSString UTF8String] (which is decorated in the same way).
ARC is not capable of keeping the mutable data object alive so long as there's any reference to the byte pointer. ARC is not a garbage collector. It doesn't watch all uses of all pointers. It operates locally. It examines one given function, method, or block and emits retains and releases for the behavior of the code as indicated by naming conventions. (Remember that ARC is interoperable with code which hasn't been compiled with ARC support.)
Since a void* isn't an object pointer and can't be retained or released, ARC can't do anything with it. So, in the code calling your -MallocWithAutoCleanup: method, ARC doesn't see anything it can manage. It doesn't emit any special memory management code. (What could it emit at that point?) While compiling the caller, the compiler likely doesn't know anything about the method implementation or the mutable data object inside it.
Think about it another way: if you were still writing manually reference counting code, what would you do in the caller of your method to keep the pointer valid? For the most part (ignoring __weak references), all ARC does is automatically do what you would do manually. Once you consider that you would have no options in that case, you realize that neither does ARC.
I think you answered your own question. If you want to use NSData to manage generic memory allocations, you need to keep a reference to the NSData objects around until you're done with the memory they own, at which point you nil out your reference(s) to the NSData object in question. This doesn't seem to provide any advantage compared to just manually freeing the malloced memory. Personally, I'd continue to use malloc()/free() explicitly instead of trying to contort my code in such a way that ARC kind of sort of manages malloced memory.
Either that, or I'd write my code such that it doesn't have to use malloc/free in the first place. I'd say the typical "pure" Cocoa project doesn't have many, if any, explicit malloc() calls, and I'd be a little suspicious of code that did unless there was some good reason for it. Why are you using malloc() in the first place?

What is the design rationale behind HandleScope?

V8 requires a HandleScope to be declared in order to clean up any Local handles that were created within scope. I understand that HandleScope will dereference these handles for garbage collection, but I'm interested in why each Local class doesn't do the dereferencing themselves like most internal ref_ptr type helpers.
My thought is that HandleScope can do it more efficiently by dumping a large number of handles all at once rather than one by one as they would in a ref_ptr type scoped class.
Here is how I understand the documentation and the handles-inl.h source code. I, too, might be completely wrong since I'm not a V8 developer and documentation is scarce.
The garbage collector will, at times, move stuff from one memory location to another and, during one such sweep, also check which objects are still reachable and which are not. In contrast to reference-counting types like std::shared_ptr, this is able to detect and collect cyclic data structures. For all of this to work, V8 has to have a good idea about what objects are reachable.
On the other hand, objects are created and deleted quite a lot during the internals of some computation. You don't want too much overhead for each such operation. The way to achieve this is by creating a stack of handles. Each object listed in that stack is available from some handle in some C++ computation. In addition to this, there are persistent handles, which presumably take more work to set up and which can survive beyond C++ computations.
Having a stack of references requires that you use it in a stack-like way. There is no "invalid" mark in that stack: all the objects from bottom to top of the stack are valid object references. The way to ensure this is the HandleScope. It keeps things hierarchical. With reference-counted pointers you can do something like this:
shared_ptr<Object>* f() {
    shared_ptr<Object> a(new Object(1));
    shared_ptr<Object>* b = new shared_ptr<Object>(new Object(2));
    return b;
}

void g() {
    shared_ptr<Object> c = *f();
}
Here object 1 is created first, then object 2 is created, then the function returns and object 1 is destroyed, and only later is object 2 destroyed. The key point is that there is a period of time when object 1 is invalid but object 2 is still valid. That's what HandleScope aims to avoid.
Some other GC implementations examine the C stack and look for pointers they find there. This has a good chance of false positives, since stuff which is in fact data could be misinterpreted as a pointer. For reachability this might seem rather harmless, but when rewriting pointers since you're moving objects, this can be fatal. It has a number of other drawbacks, and relies a lot on how the low level implementation of the language actually works. V8 avoids that by keeping the handle stack separate from the function call stack, while at the same time ensuring that they are sufficiently aligned to guarantee the mentioned hierarchy requirements.
To offer yet another comparison: an object referenced by just one shared_ptr becomes collectible (and actually will be collected) once its C++ block scope ends. An object referenced by a v8::Handle will become collectible when leaving the nearest enclosing scope which did contain a HandleScope object. So programmers have more control over the granularity of stack operations. In a tight loop where performance is important, it might be useful to maintain just a single HandleScope for the whole computation, so that you won't have to access the handle stack data structure so often. On the other hand, doing so will keep all the objects around for the whole duration of the computation, which would be very bad indeed if this were a loop iterating over many values, since all of them would be kept around till the end. But the programmer has full control, and can arrange things in the most appropriate way.
Personally, I'd make sure to construct a HandleScope
At the beginning of every function which might be called from outside your code. This ensures that your code will clean up after itself.
In the body of every loop which might see more than three or so iterations, so that you only keep variables from the current iteration.
Around every block of code which is followed by some callback invocation, since this ensures that your stuff can get cleaned if the callback requires more memory.
Whenever I feel that something might produce considerable amounts of intermediate data which should get cleaned (or at least become collectible) as soon as possible.
In general I'd not create a HandleScope for every internal function if I can be sure that every other function calling this will already have set up a HandleScope. But that's probably a matter of taste.
Disclaimer: this may not be an official answer, more of a conjecture on my part, and the V8 documentation is hardly useful on this topic, so I may be proven wrong.
From my understanding, gained while developing various V8-based backend applications, it's a means of handling the difference between the C++ and JavaScript environments.
Imagine the following sequence, in which a self-dereferencing pointer could break the system:
JavaScript calls a C++-wrapped V8 function: let's say helloWorld()
The C++ function creates a v8::Handle with the value "hello world =x"
C++ returns the value to the V8 virtual machine
The C++ function does its usual cleanup of resources, including dereferencing of handles
Another C++ function / process overwrites the freed memory space
V8 reads the handle: and the data is no longer the same ("hell!#(#...")
And that's just the surface of the complicated inconsistencies between the two. Hence, to tackle the various issues of connecting the JavaScript VM (virtual machine) to the C++ interfacing code, I believe the development team decided to simplify the issue via the following...
All variable handles are to be stored in "buckets", aka HandleScopes, to be built / compiled / run / destroyed by their respective C++ code when needed.
Additionally, all function handles are to refer only to C++ static functions (I know this is irritating), which ensures the "existence" of the function call regardless of constructors / destructors.
Think of it from a development point of view: it marks a very strong distinction between the JavaScript VM development team and the C++ integration team (the Chrome dev team?), allowing both sides to work without interfering with one another.
Lastly, it could also be for the sake of simplicity in emulating multiple VMs: as V8 was originally meant for Google Chrome, a simple HandleScope creation and destruction whenever we open / close a tab makes for much easier GC management, especially in cases where many VMs are running (one per tab in Chrome).

Condition, Block, Module - which way is the most memory and computationally efficient?

There are always several ways to do the same thing in Mathematica. For example, when adapting WReach's solution for my recent problem I used Condition:
ClearAll[ff];
SetAttributes[ff, HoldAllComplete];
ff[expr_] /; (Unset[done]; True) :=
  Internal`WithLocalSettings[Null, done = f[expr],
    AbortProtect[If[! ValueQ[done], Print["Interrupt!"]]; Unset[done]]]
However, we can do the same thing with Block:
ClearAll[ff];
SetAttributes[ff, HoldAllComplete];
ff[expr_] :=
  Block[{done},
    Internal`WithLocalSettings[Null, done = f[expr],
      AbortProtect[If[! ValueQ[done], Print["Interrupt!"]]]]]
Or with Module:
ClearAll[ff];
SetAttributes[ff, HoldAllComplete];
ff[expr_] :=
  Module[{done},
    Internal`WithLocalSettings[Null, done = f[expr],
      AbortProtect[If[! ValueQ[done], Print["Interrupt!"]]]]]
Probably there are several other ways to do the same thing. Which way is the most efficient from the point of view of memory and CPU use (f may return very large arrays of data, but may also return very small ones)?
Both Module and Block are quite efficient, so the overhead induced by them is only noticeable when the body of a function whose variables you localize does very little. There are two major reasons for the overhead: scoping construct overhead (scoping constructs must analyze the code they enclose to resolve possible name conflicts and bind variables - this takes place for both Module and Block), and the overhead of creation and destruction of new symbols in a symbol table (only for Module). For this reason, Block is somewhat faster. To see how much faster, you can do a simple experiment:
In[14]:=
Clear[f,fm,fb,fmp];
f[x_]:=x;
fm[x_]:=Module[{xl = x},xl];
fb[x_]:=Block[{xl = x},xl];
Module[{xl},fmp[x_]:= xl=x]
We defined here 4 functions, with the simplest body possible - just return the argument, possibly assigned to a local variable. We can expect the effect to be most pronounced here, since the body does very little.
In[19]:= f/@Range[100000];//Timing
Out[19]= {0.063,Null}
In[20]:= fm/@Range[100000];//Timing
Out[20]= {0.343,Null}
In[21]:= fb/@Range[100000];//Timing
Out[21]= {0.172,Null}
In[22]:= fmp/@Range[100000];//Timing
Out[22]= {0.109,Null}
From these timings, we see that Block is about twice as fast as Module, but that the version in the last function, which uses a persistent variable created by Module only once, is about twice as efficient as Block and almost as fast as a simple function invocation (because the persistent variable is created only once, and there is no scoping overhead when applying the function).
For real functions, and most of the time, the overhead of either Module or Block should not matter, so I'd use whatever is safer (usually, Module). If it does matter, one option is to use persistent local variables created by Module only once. If even this overhead is significant, I'd reconsider the design - since then obviously your function does too little. There are cases when Block is more beneficial, for example when you want to be sure that all the memory used by local variables will be automatically released (this is particularly relevant for local variables with DownValues, since they are not always garbage-collected when created by Module). Another reason to use Block is when you expect a possibility of interrupts such as exceptions or aborts, and want the local variables to automatically be reset (which Block does). By using Block, however, you risk name collisions, since it binds variables dynamically rather than lexically.
So, to summarize: in most cases, my suggestion is this: if you feel that your function has serious memory or run-time inefficiency, look elsewhere - it is very rare for scoping constructs to be the major bottleneck. Exceptions would include not garbage-collected Module variables with accumulated data, very light-weight functions used very frequently, and functions which operate on very efficient low-level structures such as packed arrays and sparse arrays, where symbolic scoping overhead may be comparable to the time it takes a function to process its data, since the body is very efficient and uses fast functions that by-pass the main evaluator.
EDIT
By combining Block and Module in the fashion suggested here:
Module[{xl}, fmbp[x_] := Block[{xl = x}, xl]]
you can have the best of both worlds: a function as fast as a Block-scoped one and as safe as one that uses Module.

Should I Dispose() DataSet and DataTable?

DataSet and DataTable both implement IDisposable, so, by conventional best practices, I should call their Dispose() methods.
However, from what I've read so far, DataSet and DataTable don't actually have any unmanaged resources, so Dispose() doesn't actually do much.
Plus, I can't just use using(DataSet myDataSet...) because DataSet has a collection of DataTables.
So, to be safe, I'd need to iterate through myDataSet.Tables, dispose of each of the DataTables, then dispose of the DataSet.
So, is it worth the hassle to call Dispose() on all of my DataSets and DataTables?
Addendum:
For those of you who think that DataSet should be disposed:
In general, the pattern for disposing is to use using or try..finally, because you want to guarantee that Dispose() will be called.
However, this gets ugly real fast for a collection. For example, what do you do if one of the calls to Dispose() throws an exception? Do you swallow it (which is "bad") so that you can continue on to dispose the next element?
Or, do you suggest that I just call myDataSet.Dispose(), and forget about disposing the DataTables in myDataSet.Tables?
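To make the "ugly" part concrete, here is a minimal sketch (the structure and names are mine, not from any answer) of what defensively disposing a DataSet and all of its tables looks like; it assumes you want every table disposed even if one Dispose() throws:

// Requires: using System.Data;
// Dispose each table, then the DataSet, without letting one failing Dispose()
// skip the rest. Swallowing exceptions here is exactly the trade-off in question.
void DisposeDataSet(DataSet ds)
{
    if (ds == null) return;
    foreach (DataTable table in ds.Tables)
    {
        try { table.Dispose(); }
        catch { /* swallow ("bad") or rethrow and skip the rest - neither is pretty */ }
    }
    ds.Dispose();
}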
Here are a couple of discussions explaining why Dispose is not necessary for a DataSet.
To Dispose or Not to Dispose ?:
The Dispose method in DataSet exists ONLY because of side effect of inheritance-- in other words, it doesn't actually do anything useful in the finalization.
Should Dispose be called on DataTable and DataSet objects? includes some explanation from an MVP:
The System.Data namespace (ADO.NET) does not contain unmanaged resources. Therefore there is no need to dispose any of those as long as you have not added yourself something special to it.
Should Dispose be called on DataTable and DataSet objects? Understanding the Dispose method and datasets? includes a comment from authority Scott Allen:
"In practice we rarely Dispose a DataSet because it offers little benefit."
So, the consensus there is that there is currently no good reason to call Dispose on a DataSet.
Update (December 1, 2009):
I'd like to amend this answer and concede that the original answer was flawed.
The original analysis does apply to objects that require finalization – and the point that practices shouldn’t be accepted on the surface without an accurate, in-depth understanding still stands.
However, it turns out that DataSets, DataViews, DataTables suppress finalization in their constructors – this is why calling Dispose() on them explicitly does nothing.
Presumably, this happens because they don’t have unmanaged resources; so despite the fact that MarshalByValueComponent makes allowances for unmanaged resources, these particular implementations don’t have the need and can therefore forgo finalization.
(That .NET authors would take care to suppress finalization on the very types that normally occupy the most memory speaks to the importance of this practice in general for finalizable types.)
That said, the fact that these details have remained under-documented since the inception of the .NET Framework (almost 8 years ago) is pretty surprising (and that you're essentially left to your own devices to sift through conflicting, ambiguous material to put the pieces together is frustrating at times, though it does provide a more complete understanding of the framework we rely on every day).
After lots of reading, here’s my understanding:
If an object requires finalization, it could occupy memory longer than it needs to – here’s why: a) Any type that defines a destructor (or inherits from a type that defines a destructor) is considered finalizable; b) On allocation (before the constructor runs), a pointer is placed on the Finalization queue; c) A finalizable object normally requires 2 collections to be reclaimed (instead of the standard 1); d) Suppressing finalization doesn’t remove an object from the finalization queue (as reported by !FinalizeQueue in SOS)
This command is misleading; Knowing what objects are on the finalization queue (in and of itself) isn’t helpful; Knowing what objects are on the finalization queue and still require finalization would be helpful (is there a command for this?)
Suppressing finalization turns a bit off in the object's header, indicating to the runtime that its finalizer doesn't need to be invoked (the object doesn't need to be moved to the FReachable queue); it remains on the finalization queue (and continues to be reported by !FinalizeQueue in SOS). The sketch after this list shows where GC.SuppressFinalize is normally called in the standard dispose pattern.
The DataTable, DataSet, DataView classes are all rooted at MarshalByValueComponent, a finalizable object that can (potentially) handle unmanaged resources
Because DataTable, DataSet, DataView don’t introduce unmanaged resources, they suppress finalization in their constructors
While this is an unusual pattern, it frees the caller from having to worry about calling Dispose after use
This, and the fact that DataTables can potentially be shared across different DataSets, is likely why DataSets don’t care to dispose child DataTables
This also means that these objects will appear under the !FinalizeQueue in SOS
However, these objects should still be reclaimable after a single collection, like their non-finalizable counterparts
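For reference, a minimal sketch (mine, not taken from the discussion above) of the standard dispose pattern, showing where GC.SuppressFinalize is called; DataSet and friends differ only in that they suppress finalization already in their constructors:

using System;

class ResourceHolder : IDisposable
{
    private IntPtr nativeHandle;        // pretend unmanaged resource

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);      // flips the header bit so the finalizer won't run
    }

    protected virtual void Dispose(bool disposing)
    {
        // release nativeHandle here; touch managed state only if disposing == true
    }

    ~ResourceHolder()
    {
        Dispose(false);                 // safety net if Dispose() was never called
    }
}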
4 (new references):
http://www.devnewsgroups.net/dotnetframework/t19821-finalize-queue-windbg-sos.aspx
http://blogs.msdn.com/tom/archive/2008/04/28/asp-net-tips-looking-at-the-finalization-queue.aspx
http://issuu.com/arifaat/docs/asp_net_3.5unleashed
http://msdn.microsoft.com/en-us/magazine/bb985013.aspx
http://blogs.msdn.com/tess/archive/2006/03/27/561715.aspx
Original Answer:
There are a lot of misleading and generally very poor answers on this - anyone who's landed here should ignore the noise and read the references below carefully.
Without a doubt, Dispose should be called on any Finalizable objects.
DataTables are Finalizable.
Calling Dispose significantly speeds up the reclaiming of memory.
MarshalByValueComponent calls GC.SuppressFinalize(this) in its Dispose() - skipping this means having to wait for dozens if not hundreds of Gen0 collections before memory is reclaimed:
With this basic understanding of finalization we can already deduce some very important things:
First, objects that need finalization live longer than objects that do not. In fact, they can live a lot longer. For instance, suppose an object that is in gen2 needs to be finalized. Finalization will be scheduled but the object is still in gen2, so it will not be re-collected until the next gen2 collection happens. That could be a very long time indeed, and, in fact, if things are going well it will be a long time, because gen2 collections are costly and thus we want them to happen very infrequently. Older objects needing finalization might have to wait for dozens if not hundreds of gen0 collections before their space is reclaimed.
Second, objects that need finalization cause collateral damage. Since the internal object pointers must remain valid, not only will the objects directly needing finalization linger in memory but everything the object refers to, directly and indirectly, will also remain in memory. If a huge tree of objects was anchored by a single object that required finalization, then the entire tree would linger, potentially for a long time as we just discussed. It is therefore important to use finalizers sparingly and place them on objects that have as few internal object pointers as possible. In the tree example I just gave, you can easily avoid the problem by moving the resources in need of finalization to a separate object and keeping a reference to that object in the root of the tree. With that modest change only the one object (hopefully a nice small object) would linger and the finalization cost is minimized.
Finally, objects needing finalization create work for the finalizer thread. If your finalization process is a complex one, the one and only finalizer thread will be spending a lot of time performing those steps, which can cause a backlog of work and therefore cause more objects to linger waiting for finalization. Therefore, it is vitally important that finalizers do as little work as possible. Remember also that although all object pointers remain valid during finalization, it might be the case that those pointers lead to objects that have already been finalized and might therefore be less than useful. It is generally safest to avoid following object pointers in finalization code even though the pointers are valid. A safe, short finalization code path is the best.
Take it from someone who's seen 100s of MBs of non-referenced DataTables in Gen2: this is hugely important and completely missed by the answers on this thread.
References:
1 -
http://msdn.microsoft.com/en-us/library/ms973837.aspx
2 -
http://vineetgupta.spaces.live.com/blog/cns!8DE4BDC896BEE1AD!1104.entry
http://www.dotnetfunda.com/articles/article524-net-best-practice-no-2-improve-garbage-collector-performance-using-finalizedispose-pattern.aspx
3 -
http://codeidol.com/csharp/net-framework/Inside-the-CLR/Automatic-Memory-Management/
You should assume it does something useful and call Dispose even if it does nothing in current .NET Framework incarnations. There's no guarantee it will stay that way in future versions, and skipping it could then lead to inefficient resource usage.
Even if an object has no unmanaged resources, disposing might help GC by breaking object graphs. In general, if an object implements IDisposable, Dispose() should be called.
Whether Dispose() actually does something or not depends on the given class. In case of DataSet, Dispose() implementation is inherited from MarshalByValueComponent. It removes itself from container and calls Disposed event. The source code is below (disassembled with .NET Reflector):
protected virtual void Dispose(bool disposing)
{
    if (disposing)
    {
        lock (this)
        {
            if ((this.site != null) && (this.site.Container != null))
            {
                this.site.Container.Remove(this);
            }
            if (this.events != null)
            {
                EventHandler handler = (EventHandler) this.events[EventDisposed];
                if (handler != null)
                {
                    handler(this, EventArgs.Empty);
                }
            }
        }
    }
}
Do you create the DataTables yourself? Because iterating through the children of any Object (as in DataSet.Tables) is usually not needed, as it's the job of the Parent to dispose all its child members.
Generally, the rule is: If you created it and it implements IDisposable, Dispose it. If you did NOT create it, then do NOT dispose it, that's the job of the parent object. But each object may have special rules, check the Documentation.
For .NET 3.5, it explicitly says "Dispose it when not using anymore", so that's what I would do.
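As a small illustration of that rule (a sketch only; whether Dispose does anything useful for a DataSet is exactly what is debated above):

// Requires: using System.Data;
// If you created the DataSet yourself, a using block guarantees Dispose()
// runs when you're done, whatever Dispose() happens to do internally.
using (var ds = new DataSet("Orders"))          // "Orders" is just an example name
{
    var table = new DataTable("Items");
    ds.Tables.Add(table);
    // ... fill and use the DataSet here ...
}   // ds.Dispose() is called here; child tables are left to the parent/GC as discussed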
I call Dispose anytime an object implements IDisposable. It's there for a reason.
DataSets can be huge memory hogs. The sooner they can be marked for clean up, the better.
update
It's been 5 years since I answered this question. I still agree with my answer. If there is a Dispose method, it should be called when you are done with the object. The IDisposable interface was implemented for a reason.
If your intention or the context of this question is really garbage collection, then you can set the DataSets and DataTables to null explicitly, or use the using keyword and let them go out of scope. Dispose does not do much, as Tetraneutron said earlier. GC will collect DataSet objects that are no longer referenced, and also those that are out of scope.
I really wish SO forced people who down-vote to actually write a comment before down-voting an answer.
DataSets implement IDisposable through MarshalByValueComponent, which implements IDisposable. Since DataSets are managed, there is no real benefit to calling Dispose.
Try using the Clear() function. It works great for me for disposing.
DataTable dt = GetDataSchema();
//populate dt, do whatever...
dt.Clear();
No need to call Dispose(), because DataSet inherits from the MarshalByValueComponent class, and MarshalByValueComponent implements the IDisposable interface.
This is the right way to properly Dispose the DataTable.
private DataTable CreateSchema_Table()
{
    DataTable td = null;
    try
    {
        td = new DataTable();
        // use the DataTable here
        return td.Copy();
    }
    catch
    {
        return null;   // without a return here, not all code paths return a value
    }
    finally
    {
        if (td != null)
        {
            td.Constraints.Clear();
            td.Clear();
            td.Dispose();
            td = null;
        }
    }
}
And this can be the best/proper way to Dispose and release the memory consumed by DataSet.
DataSet ds = null;
try
{
    ds = new DataSet("DS");
    // use the DataSet and its DataTables here
}
catch { }
finally
{
    if (ds != null)
    {
        ds.EnforceConstraints = false;
        ds.Relations.Clear();
        int totalCount = ds.Tables.Count;
        for (int i = totalCount - 1; i >= 0; i--)
        {
            DataTable td1 = ds.Tables[i];
            if (td1 != null)
            {
                td1.Constraints.Clear();
                td1.Clear();
                td1.Dispose();
                td1 = null;
            }
        }
        ds.Tables.Clear();
        ds.Dispose();
        ds = null;
    }
}

When a variable goes out of scope does that mean it doesn't exist?

I'm not sure I understand scope: does an out-of-scope variable (I'm using Ruby) exist in memory somewhere, or does it stop existing (I know you can't access it)? Would it be inaccurate to say that an out-of-scope variable does not exist any more?
Maybe this is a philosophical question.
If you are using a managed language then you don't allocate and deallocate memory yourself, so as far as you are concerned it no longer exists.
Technically it still does, but GCs tend not to be deterministic, so it's hard to say when it actually vanishes.
A variable is not the same as the value it holds.
The variable itself ceases to exist when it goes out of scope. The value that the variable held may represent an object, and that object may continue to exist beyond the lifetime of the variable. The garbage collector reclaims the object later.
When it goes out of scope it still exists (in the sense that it has some memory allocated to it) for some time, until garbage collection cleans it up. But as you imply, it has lost its name and is unreachable.
When a variable falls out of scope is anyone around to hear it scream?
This isn't a Ruby question so much as a general question about garbage collection. In a garbage-collected language such as Ruby or C#, when a variable falls out of scope the object it referred to becomes eligible for collection. When this happens you can't get at it any more and it sits around twiddling its thumbs - but it does still have memory allocated to it.
At some point the garbage collector will wake up and look for objects that are no longer reachable. It will dispose of them, and at that point they're no longer in memory at all.
It can be more complicated than this, depending on how the garbage collector works, but it's close enough :)
It exists for a little bit until the garbage collector disposes it (if it can).
Rob Kennedy has this answered appropriately, but I thought I would add a little more detail.
The important thing to recognize is the difference between a variable and the value it represents.
Here's an example (in C# because I don't know Ruby):
object c = null;
if (1 == 1) // Just to get a different scope
{
    var newObj = new SomeClass();
    newObj.SomeProperty = true;
    c = newObj;
}
In the code above, newObj goes out of scope at the end of the if statement and as such "doesn't exist", but the value that it was referring to is still alive and well, referenced by c. Once all of the references to the object are gone, then the garbage collector will take care of cleaning it up.
If you're talking about file objects, it becomes more than a philosophical question. If I recall correctly, files do not close automatically when they go out of scope - they only close if you ask them to close, or if you use a File.open do |file| style block, or if they get garbage collected. This can be an issue if other code (or unit tests) try to read the contents of that file and it hasn't yet been flushed.
