Does the Xcode memory graph offer any smart visual indicators for strong references that aren't memory cycles?

As a follow-up to my previous question, How can I create a reference cycle using dispatchQueues?:
For the strong references that create leaks but aren't reference cycles (e.g. Timer, DispatchSourceTimer, DispatchWorkItem), the memory graph doesn't show a purple icon. I suspect that's simply because it doesn't find two objects pointing back at each other strongly.
I know I can go back and forth and observe that a specific class is just not leaving memory, but I'm wondering whether Xcode provides anything more.
Is there any other indicator?
I know Xcode visually shows the number of instances of a type in memory. But is there a way to filter objects that have more than 3 instances in memory?

You ask:
For the strong references that create leaks but aren't reference cycles (e.g. Timer, DispatchSourceTimer, DispatchWorkItem), the memory graph doesn't show a purple icon. I suspect that's simply because it doesn't find two objects pointing back at each other strongly.
Yes. Or, more accurately, the strong reference cycle warning is produced when there are two (or more) objects whose only strong references are to each other.
But in the case of repeating timers, notification center observers, GCD sources, etc., these are not, strictly speaking, strong reference cycles. The issue is that the owner (the object keeping a strong reference to our app's object) is just some persistent object that won't be released while our app is running. Sure, our object might still be "abandoned memory" from our perspective, but it's not a cycle.
By way of example, consider a repeating timer that keeps a strong reference to our object. The main run loop keeps a strong reference to that timer and won't release it until the timer is invalidated. There's no strong reference cycle, in the narrow sense of the term, as our app doesn't have a strong reference back to the run loop or the timer. But the repeating timer will nonetheless keep a strong reference to our object (unless we used the [weak self] pattern or the like).
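If it helps to see the shape of the problem outside of Swift, here is a minimal C++ analogy (the `Scheduler` and `ViewModel` types are hypothetical, not the Foundation API): a long-lived owner keeps a strong reference to our object through a stored callback, there is no cycle anywhere, and yet the object never goes away unless we capture it weakly.

```cpp
// A minimal C++ analogy (not the Swift/Foundation API): a long-lived
// "scheduler" keeps a strong reference (shared_ptr) to our object via a
// stored callback. No cycle exists, yet the object outlives its intended lifetime.
#include <functional>
#include <iostream>
#include <memory>
#include <vector>

struct Scheduler {                       // stands in for the run loop / timer
    std::vector<std::function<void()>> callbacks;
    void add(std::function<void()> cb) { callbacks.push_back(std::move(cb)); }
};

struct ViewModel {
    void tick() { std::cout << "tick\n"; }
};

int main() {
    Scheduler scheduler;                 // persistent owner, lives "forever"
    {
        auto vm = std::make_shared<ViewModel>();

        // Strong capture: the callback (and therefore the scheduler) keeps
        // the ViewModel alive even after `vm` goes out of scope. No cycle.
        scheduler.add([vm] { vm->tick(); });

        // Weak capture avoids that, analogous to [weak self] in Swift:
        std::weak_ptr<ViewModel> weakVm = vm;
        scheduler.add([weakVm] { if (auto s = weakVm.lock()) s->tick(); });
    }
    // The first callback still holds the ViewModel here: abandoned memory,
    // but nothing in a cycle for a memory-graph tool to flag.
}
```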
It would be lovely if the "Debug Memory Graph" feature knew about these well-known persistent objects (the main run loop, the default notification center, libdispatch, etc.) and drew our attention to cases where one of these persistent objects is the last remaining strong reference to one of our objects. But it doesn't, at least at this point.
This is why we employ the technique of "return to a point where most of my custom objects should have been deallocated" and then "use Debug Memory Graph to identify what wasn't released and see which strong references are persisting". Sure, it would be nice if Xcode could draw our attention to these automatically, but it doesn't.
Still, if our app has some quiescent state in which we know the limited set of object types that should still be around, this Debug Memory Graph feature is extremely useful, even in the absence of an indicator like the strong reference cycle warning.
I know I can go back and forth and observe that a specific class is just not leaving memory, but I'm wondering whether Xcode provides anything more.
Is there any other indicator?
No, not that I know of.
I know Xcode visually shows the number of instances of a type in memory. But is there a way to filter objects that have more than 3 instances in memory?
Again, no, not that I know of.

Related

Is Rust-style ownership and lifetimes possible without Rust-style borrow checking?

Would it be possible for a programming language to consistently have Rust-style ownership and lifetimes (for automatic memory management) while dropping the requirement that only one mutable reference to a piece of data can exist at any time (used to suppress data races)?
In other words, are Rust-style ownership and lifetimes and Rust-style borrow checking two separable concepts? Alternatively, are these two ideas inherently entangled at a semantic level?
Would it be possible for a programming language to consistently have Rust-style ownership and lifetimes (for automatic memory management) while dropping the requirement that only one mutable reference to a piece of data can exist at any time (used to suppress data races)?
A language can do anything, so sure.
The problem is that dropping this requirement would turn a language like Rust into a nest of undefined behaviour: if you drop the rule that mutable references are unique, then they have no purpose, so you just have references (always mutable) whose only property is being lexically scoped. That means you can hold a reference to a sub-part of an object while a second reference mutates the object in a way that invalidates the sub-part (e.g. a reference to a vec item while the vec is cleared)[0], and the first reference is now dangling; it points to garbage.
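For what it's worth, here is that exact hazard written out in C++, which happily compiles it; this is just a sketch of the aliasing problem that Rust's borrow checker statically rejects, not anything Rust-specific.

```cpp
// The aliasing hazard described above, in C++: a reference into a vector is
// invalidated by a mutation through another handle to the same vector.
// Rust's unique-mutable-reference rule rejects the equivalent code at compile time.
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{1, 2, 3};

    int& first = v[0];   // "reference to a vec item"
    v.clear();           // a second handle mutates the object...
    v.shrink_to_fit();   // ...and may release the old storage entirely

    // `first` now dangles; reading it is undefined behaviour.
    std::cout << first << '\n';
}
```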
The way to solve that would be to… add a GC? And from that point on, the value of "Rust-style ownership and references" becomes limited to nonexistent, because you need a GC for non-lexical automated memory management, and your references can keep objects alive, so having all types be affine by default isn't very useful.
Now what can be useful (and what some languages explore) is for sub-normal types to be opt-in, so types would be normal by default but could be opted into being affine, linear, or even ordered, on an as-needed basis. This would be solely a type-safety measure.
If so, are there any existing languages which achieve this?
Not to my knowledge.
If not, why not?
Because nobody's written one? Affine types by default are useful to Rust but they're not super useful in general so most of the research and design has focused around linear types, which provide more guarantees and are therefore more useful if only a small subset of your types are going to be sub-normal.
[0] Which shows that "data races" are not solely about concurrency; it's an entire class of problems that also occurs commonly in sequential code (e.g. iterator invalidation).

A C++11 based signals/slots with ordering

I may be a bit in over my head here, but if you never try new things, you'll never learn, I suppose. I'm working with some multi-touch stuff and have built myself a small but functional GUI library. Up until now I've been using Boost's Signals2 library to have my detected gestures distributed to all active GUI elements (whether on screen or not). I'm a big fan of avoiding premature optimization, so things have been hunky-dory until now.
I've used VS2013's profiler to find out that when the user goes touch-crazy (the device supports up to 41 simultaneous touches), my system grinds to a halt, and Signals2 is the culprit. Keep in mind that each touch can trigger a number of gestures, which are all communicated to every GUI element that has registered to interact with that type of gesture.
Now there are a number of ways to deal with this bottleneck:
1. Have GUI elements work more cleverly and disconnect them when they're off-screen.
2. Optimize the signals/slots system so the calls are resolved faster.
3. Prioritize events.
I'm not a big fan of ever having to deal with Nr. 3, if avoidable, as it'll directly impact the responsiveness of my application. Nr. 1 should probably be implemented, but I'm more interested in getting to Nr. 2 first.
I don't really need any big fancy stuff. The signals/slots system I need really only has to do the core emission work, along with these two features:
Slots must be able to return a value ending the emission - effectively cancelling any subsequent handling of a signal.
Slots must be order-able - and fairly efficient at this. GUI elements that are interacted with will pop-up above others, so this type of change in order is bound to happen quite often.
I stumbled across this really interesting implementation
https://testbit.eu/2013/cpp11-signal-system-performance/
which seems to have everything except the 'ordering' I need. I've only looked over the code once, and it does seem a little intimidating. If I were to try and add ordering capabilities, I'd prefer not to change too much stuff around if possible. Does anyone have experience with this stuff? I'm fairly certain that a linked list is not optimal for constant removal and insertion, but then again, it probably needs to be optimized most for constant emission calls.
Any thoughts are most welcome!
--- Update ---
I've spent a little time adding the features I needed to the code placed in the public domain above and pasted the complete (and somewhat hacky) version here:
SimpleSignal with added features
In short, I've added:
Blockable connections (implemented via a simple if statement)
A depth/order parameter (linear-search insertion)
With those additions, keep in mind it has these current issues:
Blocked connections are simply skipped, not actively removed from the data structure, so having a lot of blocked connections will affect run-time performance.
Depth is only maintained during insertion. So if you'd like to change the depth you'll have to disconnect and reconnect your slot.
Since the SignalLink interface has become exposed (as a result of my block implementation), it's less safe from a user perspective. It's way easier for you to shoot yourself in the foot with this version by messing with existing references and pointers.
This implementation hasn't been as thoroughly tested as the original, I'm sure. I did try out the new functionality a bit. User beware.
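For anyone who wants the gist without reading the full paste, here is a minimal, hypothetical sketch (not the SimpleSignal code itself) of the two requirements: depth-ordered slots via linear-search insertion, and emission that stops as soon as a slot reports it handled the event.

```cpp
// Minimal C++11 sketch of an ordered, cancellable signal: slots are stored in
// depth order (linear-search insertion), and emission stops as soon as a slot
// returns true ("handled"), cancelling subsequent handling.
#include <algorithm>
#include <functional>
#include <iostream>
#include <vector>

class OrderedSignal {
public:
    using Slot = std::function<bool(int)>;    // returns true to cancel emission

    void connect(int depth, Slot slot) {
        // Linear-search insertion keeps slots sorted by depth (lower = earlier).
        auto pos = std::find_if(slots_.begin(), slots_.end(),
            [depth](const Entry& e) { return e.depth > depth; });
        slots_.insert(pos, Entry{depth, std::move(slot)});
    }

    void emit(int touchId) const {
        for (const auto& e : slots_)
            if (e.slot(touchId))
                return;                        // slot consumed the gesture
    }

private:
    struct Entry { int depth; Slot slot; };
    std::vector<Entry> slots_;                 // contiguous storage: cheap iteration on emit
};

int main() {
    OrderedSignal gestureSignal;
    gestureSignal.connect(1, [](int id) { std::cout << "top widget got " << id << "\n"; return true; });
    gestureSignal.connect(5, [](int id) { std::cout << "never reached\n"; return false; });
    gestureSignal.emit(42);                    // only the depth-1 slot runs
}
```

A vector trades slower insertion/removal for fast, cache-friendly emission, which matches the observation above that emission is the hot path.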

Are there DirectX guidelines for binding and unbinding resources between draw calls?

All DirectX books and tutorials strongly recommend reducing resource bindings and state changes between draw calls to a minimum, yet I can't find any guidelines that go into more detail. Reviewing a lot of sample code found on the web, I have concluded that programmers have completely different coding principles regarding this subject. Some even set and unset
VS/PS
VS/PS ResourceViews
RasterizerStage
DepthStencilState
PrimitiveTopology
...
before and after every draw call (although the setup remains unchanged), and others don’t.
I guess that's a bit overdone...
From my own experiments I have found that the only resources I have to bind on every draw call are the ShaderResourceViews (to VS and PS in my case). This requirement may be caused by the use of compute shaders since I bind/unbind UAVs to buffers that are bound to VS / PS later on.
I lost many hours of work before I discovered that this rebinding was necessary. And I guess that many coders aren't sure either and prefer to unbind and rebind a "little too much" rather than run into a similar trap.
Question 1: Are there at least some rules of thumb regarding this problem?
Question 2: Is it possible that my ShaderResourceViews bound to VS/PS are unbound by the driver/DirectX core because I bind UAVs to the same buffers before the CS dispatch call (I don’t unbind the SRVs myself)?
Question 3: I don't even set VS/PS to null before I use the compute shaders. It works without problems so far, yet I constantly feel unsure whether or not I'm digging my next trap with such a "lazy" approach.
You want as little overhead as possible, while also avoiding invalid pipeline state. That's why some people unbind everything (to prevent as much of that as possible); it depends on use cases, and of course you can balance this a bit.
To balance this you can pre-allocate a specific resource to a slot. Depending on the resource type (each has a different number of slots), different rules can apply:
1/Samplers and States
You have 16 slots, and generally 4-5 samplers you use 90% of the time (linear/point/anisotropic/shadow).
So, on application startup, create those states and bind them to each shader stage you need (try not to start at slot zero, since those slots are easily overridden by mistake).
Create a shader header file with the SamplerState -> slot mapping and use it in your shaders, so any slot update is reflected automatically.
Reuse these as much as possible, and only bind custom samplers where needed.
For standard states (blend/depth/rasterizer), building a small collection of common states on application startup and binding them as needed is common practice.
An easy way to minimize render-state binding at low cost is to build a stack: you set a default state, and if a shader needs a more specific state it pushes a new state onto the stack; once it's done, it pops the last state and applies it to the pipeline again.
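As a rough illustration of that stack idea, here is a hedged D3D11 sketch; the StateBlock and RenderStateStack types are hypothetical helpers of my own, only the ID3D11DeviceContext calls are the real API.

```cpp
// A sketch of a render-state stack for D3D11: push a state block to apply it,
// pop to restore whatever was active before.
#include <d3d11.h>
#include <vector>

struct StateBlock {
    ID3D11BlendState*        blend        = nullptr;
    ID3D11DepthStencilState* depthStencil = nullptr;
    ID3D11RasterizerState*   rasterizer   = nullptr;
    UINT                     stencilRef   = 0;
};

class RenderStateStack {
public:
    explicit RenderStateStack(ID3D11DeviceContext* ctx) : ctx_(ctx) {}

    void push(const StateBlock& s) { stack_.push_back(s); apply(s); }

    void pop() {
        stack_.pop_back();
        if (!stack_.empty()) apply(stack_.back());   // restore the previous state
    }

private:
    void apply(const StateBlock& s) {
        const FLOAT blendFactor[4] = {1.f, 1.f, 1.f, 1.f};
        ctx_->OMSetBlendState(s.blend, blendFactor, 0xFFFFFFFF);
        ctx_->OMSetDepthStencilState(s.depthStencil, s.stencilRef);
        ctx_->RSSetState(s.rasterizer);
    }

    ID3D11DeviceContext*    ctx_;
    std::vector<StateBlock> stack_;
};
```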
2/Constant Buffers
You have 14 slots, which is quite a lot; it's pretty rare (at least in my use cases) to use all of them, especially now that you can use buffers/structured buffers as well.
One simple, common case is setting up a reserved slot for the camera (with all the data you need: view/projection/view-projection, plus their inverses, since you might need those too).
Bind it to the relevant shader stage slots (all of them if needed); then the only thing you have to do is update your cbuffer every frame, and it's ready to use anywhere.
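Something along these lines, sketched for D3D11; the CameraConstants struct and the choice of slot 13 are assumptions for illustration, not anything the API mandates.

```cpp
// Reserved camera cbuffer: created once, bound once to a reserved slot on the
// stages that need it, and only updated each frame thereafter.
#include <d3d11.h>
#include <DirectXMath.h>

struct CameraConstants {                       // must match the HLSL cbuffer layout
    DirectX::XMFLOAT4X4 view;
    DirectX::XMFLOAT4X4 projection;
    DirectX::XMFLOAT4X4 viewProjection;
    DirectX::XMFLOAT4X4 inverseViewProjection;
};

ID3D11Buffer* CreateCameraBuffer(ID3D11Device* device) {
    D3D11_BUFFER_DESC desc = {};
    desc.ByteWidth = sizeof(CameraConstants);  // 256 bytes, a multiple of 16
    desc.Usage     = D3D11_USAGE_DEFAULT;
    desc.BindFlags = D3D11_BIND_CONSTANT_BUFFER;
    ID3D11Buffer* buffer = nullptr;
    device->CreateBuffer(&desc, nullptr, &buffer);
    return buffer;
}

void BindCameraBufferOnce(ID3D11DeviceContext* ctx, ID3D11Buffer* camera) {
    const UINT slot = 13;                      // reserved slot, by our own convention
    ctx->VSSetConstantBuffers(slot, 1, &camera);
    ctx->PSSetConstantBuffers(slot, 1, &camera);
}

void UpdateCameraEachFrame(ID3D11DeviceContext* ctx, ID3D11Buffer* camera,
                           const CameraConstants& data) {
    ctx->UpdateSubresource(camera, 0, nullptr, &data, 0, 0);
}
```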
3/Shader stages
You pretty much never need to unbind the compute shader, since it's fully separate from the rest of the pipeline.
For the other pipeline stages, on the other hand, instead of unbinding, a reasonably good practice is to set all the stages you need and set the ones you don't to null.
If you don't follow this and, for example, render a shadow map (depth buffer only), a pixel shader might still be bound.
If you forget to unset a geometry shader that you previously used, you might end up with an invalid layout combination, and your object will not render (the error will only show up in runtime debug mode).
So setting the full set of shader stages adds a little overhead, but the safety benefit is far from negligible.
In your use case (using only VS/PS, plus CS), you can safely ignore that.
4/Uavs-RenderTargets-DepthStencil
For write resources, always unset them when you're done with a unit of work. Within the same routine you can optimize, but at the end of your render/compute function set your outputs back to null, since the pipeline will not allow a resource to be bound as a shader resource while it's still bound as an output.
Not unsetting a write resource at the end of your function is a recipe for disaster.
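A small sketch of what that looks like in D3D11; the function names and the single-UAV setup are just for illustration.

```cpp
// Set outputs back to null when the unit of work is done, so the same
// resource can later be bound as a shader resource without a conflict.
#include <d3d11.h>

void RunComputePass(ID3D11DeviceContext* ctx, ID3D11UnorderedAccessView* uav,
                    UINT groupsX, UINT groupsY) {
    ctx->CSSetUnorderedAccessViews(0, 1, &uav, nullptr);
    ctx->Dispatch(groupsX, groupsY, 1);

    // Done with this unit of work: unbind the UAV so the underlying buffer
    // can be bound as an SRV to VS/PS afterwards.
    ID3D11UnorderedAccessView* nullUav = nullptr;
    ctx->CSSetUnorderedAccessViews(0, 1, &nullUav, nullptr);
}

void EndRenderPass(ID3D11DeviceContext* ctx) {
    // Likewise for render targets / depth-stencil at the end of a render function.
    ID3D11RenderTargetView* nullRtv = nullptr;
    ctx->OMSetRenderTargets(1, &nullRtv, nullptr);
}
```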
5/ShaderResourceView
This is very situational, but the idea is to minimize binding while also avoiding runtime warnings (which can be harmless, but then hide important messages).
One option is to reset all shader resource inputs to null at the beginning of the frame, to avoid, for example, a buffer still bound to the VS being set as a UAV in a CS. This costs you six pipeline calls per frame, but it's generally worth it.
If you have enough spare registers and some constant resources, you can of course also set those in reserved slots and bind them once and for all.
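The frame-start reset mentioned above might look like this; the slot count of 16 is an assumption, so clear however many slots your engine actually uses.

```cpp
// Clear shader resource inputs at the start of the frame: one call per stage
// (six in total), each clearing the first N slots.
#include <d3d11.h>

void ClearShaderResourcesAtFrameStart(ID3D11DeviceContext* ctx) {
    ID3D11ShaderResourceView* nullSrvs[16] = {};   // 16 null views
    ctx->VSSetShaderResources(0, 16, nullSrvs);
    ctx->PSSetShaderResources(0, 16, nullSrvs);
    ctx->GSSetShaderResources(0, 16, nullSrvs);
    ctx->HSSetShaderResources(0, 16, nullSrvs);
    ctx->DSSetShaderResources(0, 16, nullSrvs);
    ctx->CSSetShaderResources(0, 16, nullSrvs);    // six calls, as mentioned above
}
```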
6/IA related resources
For these, you need to set the right data to draw your geometry, so every time you bind them it's pretty reasonable to set the InputLayout/Topology. You can of course organize your draw calls to minimize switches.
I find topology rather critical to set properly: an invalid topology (for example, using a triangle list with a pipeline that includes tessellation) will draw nothing and give you a runtime warning, and it's quite common for it to simply crash the driver on AMD cards, so better to avoid that, as it becomes rather hard to debug.
Generally you never really need to unbind vertex/index buffers (you just overwrite them, and the input layout describes how to fetch from them anyway).
The only exception to this rule is the case where those buffers are generated in compute/stream output, to avoid the above-mentioned runtime warning.
Answer 1: less is better.
Answer 2: it is the opposite: you have to unbind a view before you bind the resource with a different kind of view. You should enable the debug layer to catch errors like this.
Answer 3: that's fine.

Garbage collecting a circular list in ruby

I'm learning about Ruby's mark-and-sweep approach to garbage collection. I bumped into a few threads here and there (and this article via an SO thread which I can no longer spot), but they seemed to apply to older versions of Ruby, and the information in them wasn't always consistent. (As things stand, I'm getting the impression that it's mostly reference counting.)
Might anyone with some understanding of Ruby 1.9.2's internals be around to chime in on whether Ruby knows how to handle the trickier back references and circular references? (Ideally with a few details/good pointers on how it's actually implemented.)
Mark-and-sweep GC, like almost every algorithm commonly labeled as garbage collection save reference counting, handles circular references just fine. This has nothing to do with the specific implementation. Regardless of the actual GC used by Ruby 1.9, it won't have trouble with cycles. Here's a sketch of the approach of mark-and-sweep collectors, but be assured that other collection schemes handle cyclic references just as well.
1. Mark all things known to be always reachable ("roots", basically everything that's directly in scope - global variables, local variables, etc.)
2. Mark all not-yet-marked objects referenced by marked objects.
3. Repeat step 2 until no references from marked to not-yet-marked objects remain.
4. Enumerate all allocated objects and deallocate those not marked.
You see, a circle of references that's reachable "from the outside" doesn't lead to infinite recursion (we don't visit a given object's references more than once), and a circle of references that isn't reachable never gets marked and is thus freed (each element independently) after marking.
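To make that concrete, here is a toy mark-and-sweep sketch in C++ (emphatically not Ruby 1.9's actual collector): the unreachable a/b cycle is never marked, so the sweep frees it.

```cpp
// Toy mark-and-sweep: marking only follows references from the roots, so a
// cycle that no root can reach is never marked and gets swept.
#include <algorithm>
#include <iostream>
#include <vector>

struct Object {
    std::vector<Object*> refs;   // outgoing strong references
    bool marked = false;
};

void Mark(Object* obj) {
    if (obj == nullptr || obj->marked) return;   // visit each object only once
    obj->marked = true;
    for (Object* ref : obj->refs) Mark(ref);
}

void Sweep(std::vector<Object*>& heap) {
    for (Object*& obj : heap) {
        if (!obj->marked) { delete obj; obj = nullptr; }   // unreachable: free it
        else obj->marked = false;                          // reset for the next GC cycle
    }
    heap.erase(std::remove(heap.begin(), heap.end(), nullptr), heap.end());
}

int main() {
    Object* root = new Object;                 // reachable ("in scope")
    Object* a = new Object;                    // a <-> b form a cycle...
    Object* b = new Object;
    a->refs.push_back(b);
    b->refs.push_back(a);                      // ...with no path from the root
    std::vector<Object*> heap{root, a, b};     // every allocated object

    Mark(root);                                // mark phase, starting at the roots
    Sweep(heap);                               // sweep frees the a/b cycle
    std::cout << heap.size() << " object(s) survive\n";   // prints 1
}
```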

How does the Garbage Collection mechanism work? [closed]

In layman's terms, how does the garbage collection mechanism work?
How is an object identified as being available for garbage collection?
Also, what do Reference Counting, Mark and Sweep, Copying, Train mean in GC algorithms?
When you use a language with garbage collection, you won't get access to the memory directly. Rather, you are given access to some abstraction on top of that data. One of the things that is properly abstracted away is the actual location in memory of the data block, as well as pointers to other data blocks. When the garbage collector runs (this happens occasionally), it will check whether you still hold a reference to each of the memory blocks it has allocated for you. If you don't, it will free that memory.
The main difference between the different types of garbage collectors is their efficiency as well as any limitations on what kind of allocation schemes they can handle.
The simplest is probably reference counting. Whenever you create a reference to an object, an internal counter on that object is incremented; when you change the reference or it goes out of scope, the counter on the (former) target object is decremented. When this counter reaches zero, the object is no longer referred to at all and can be freed.
The problem with reference-counting garbage collectors is that they cannot deal with circular data. If object A has a reference to object B, and that in turn has some (direct or indirect) reference to object A, they can never be freed, even if none of the objects in the chain is referred to from outside the chain (and therefore they aren't accessible to the program at all).
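You can see this concretely with C++'s reference-counted shared_ptr (just an illustration, nothing specific to any particular GC'd language): two objects that point at each other keep each other's counts above zero forever, while a tracing collector would reclaim them.

```cpp
// The circular-reference weakness of pure reference counting, illustrated with
// shared_ptr: A and B point at each other, so neither count ever reaches zero
// and neither destructor runs.
#include <iostream>
#include <memory>

struct Node {
    std::shared_ptr<Node> other;     // strong reference keeps the count > 0
    ~Node() { std::cout << "freed\n"; }
};

int main() {
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->other = b;                // b's count: 2
        b->other = a;                // a's count: 2
    }                                // locals drop: both counts fall to 1, never 0
    std::cout << "nothing was freed\n";   // neither "freed" line is ever printed

    // A tracing collector (e.g. mark and sweep) would reclaim this cycle,
    // because reachability is computed from the roots, not from local counts.
}
```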
The mark-and-sweep algorithm, on the other hand, can handle this. It works by periodically stopping the execution of the program and marking each item the program has allocated as unreachable. The program then runs through all the variables it has and marks what they point to as reachable. If any of these allocations contain references to other data, that data is then likewise marked as reachable, and so on.
This is the mark part of the algorithm. At this point everything the program can access, no matter how indirectly, is marked as reachable and everything the program can't reach is marked as unreachable. The garbage collector can now safely reclaim the memory associated with the objects marked as unreachable.
The problem with the mark and sweep algorithm is that it isn't that efficient -- the entire program has to be stopped to run it, and a lot of the object references aren't going to change.
To improve on this, the mark and sweep algorithm can be extended with so called "generational garbage collection". In this mode objects that have been in the system for some number of garbage collections are promoted to the old generation, which is not checked that often.
This improves efficiency because objects tend to die young (think of a string being changed inside a loop, resulting in perhaps a lifetime of a few hundred cycles) or live very long (the objects used to represent the main window of an application, or the database connection of a servlet).
Much more detailed information can be found on wikipedia.
Added based on comments:
With the mark-and-sweep algorithm (as well as any other garbage collection algorithm except reference counting), the garbage collector does not run in the context of your program, since it has to be able to access stuff that your program is not capable of accessing directly. Therefore it is not correct to say that the garbage collector runs on the stack.
Reference counting - Each object has a count which is incremented when someone takes a reference to the object, and decremented when someone releases the reference. When the reference count goes to zero, the object is deleted. COM uses this approach.
Mark and sweep - Each object has a flag indicating whether it is in use. Starting at the root of the object graph (global variables, locals on stacks, etc.), each referenced object gets its flag set, and so on down the chain. At the end, all objects that are not referenced in the graph are deleted.
The garbage collector for the CLR is described in this slide deck. "Roots" on slide 15 are the sources of the objects that first go into the graph. Their member fields and so on are used to find the other objects in the graph.
Wikipedia describes several of these approaches in much more and better detail.
Garbage collection is simply knowing whether there is any future need for the variables in your program and, if not, collecting and deleting them.
The emphasis is on the word garbage: something that is completely used up in your house is thrown in the trash, and the garbage man handles it for you by coming to pick it up and take it away, giving you more room in your house trash can.
Reference Counting, Mark and Sweep, Copying, Train etc. are discussed in good detail at GC FAQ
The general way it is done is that the number of references to an object is tracked in the background, and when that number goes to zero, the object is SUBJECT TO garbage collection; however, the GC will not fire up until it is actually needed, because it is an expensive operation. What happens when it starts is that the GC goes through the managed area of memory and finds every object that has no references left. The GC deletes those objects, first calling their destructors to allow them to clean up after themselves, then freeing the memory. Commonly the GC will then compact the managed memory area by moving every surviving object to one region of memory, allowing more allocations to take place.
Like I said, this is one method that I know of, and there is a lot of research being done in this area.
Garbage collection is a big topic, and there are a lot of ways to implement it.
But for the most common ones, in a nutshell: the garbage collector keeps a record of all references to anything created via the new operator, even if that operator's use was hidden from you (for example, in a Type.Create() method). Each time you add a new reference to the object, the root of that reference is determined and added to the list, if needed. A reference is removed whenever it goes out of scope.
When there are no more references to an object, it can (not "will") be collected. To improve performance and make sure necessary cleanup is done correctly, collections are batched for several objects at once and happen over multiple generations.
