Source of Node<Object> in the Visual Studio memory snapshot - visual-studio

I am doing memory profiling for my application using Visual Studio diagnostic tool. I find that there is Node takes up a lot of memory (based on Inclusive Size Diff. (bytes). (see below #1). And when I click on the first instance of Node, 'Referenced Objects', I see Node is referencing other Node. And I see something like 'Overlapped data' in the attribute.
How can I find out where is creating these Node as they are from mscorlib.ni.dll.

The weapon of choice when you are rooting through these .NET Framework objects is a good decompiler. I use Reflector, there are others.
You see an opaque Node<T> object back. Just type it into the Search box, out pops but a few types that use it. Most are in the System.Collections.Concurrent namespace. Well, look no further, the profiler already told you about that one. Clearly it is the Stack<T> class in the System.Collections.Concurrent namespace that's storing Nodes.
Your profiler told you there is just one Stack<> class object that owns these objects. Well good, that narrows it down to just a single object. It just happens to have 208 elements. Hmm, well, not that much, is it?
That's not where you have to stop, the Stack<> class is a pretty useless class, nobody ever actually uses it in their code. Keep using the decompiler and let it search for usages of that class.
Ah, nice, that's a very short list as well. You see System.Data.ProviderBase show up several times, hmm, this question probably isn't related to querying dbases. Only other set of references is System.PinnableBufferCache.
"Pinnable buffers", whoa, that's a match. Pinning a buffer is important when you ask native code to get a job done to fill a managed array. With BeginRead(), the universal asynchronous I/O call. The driver needs a stable reference to the array while it is working on the async I/O request. Getting a stable buffer requires pinning in .NET. And big bingo in the profiler data, you see OverlappedData, the core data-structure in Windows to do async I/O.
Long story short, you found this guy's project. Programmers noticed, not very often.
Knowing when the stop profiling is very important. You cannot change code written by another programmer. And nobody at Microsoft thinks that guy did anything wrong. He didn't, caches are Good Things.
You are most definitely done. Congratulations.

Related

How can I determine what objects ARC is retaining using Instruments or viewing assembly?

This question is not about finding out who retained a particular object but rather looking at a section of code that appears from the profiler to have excessive retain/release calls and figuring out which objects are responsible.
I have a Swift application that after initial porting was spending 90% of its time in retain/release code. After a great deal of restructuring to avoid referencing objects I have gotten that down to about 25% - but this remaining bit is very hard to attribute. I can see that a given chunk of it is coming from a given section of code using the profiler, but sometimes I cannot see anything in that code that should (to my understanding) be causing a retain/release. I have spent time viewing the assembly code in both Instruments (with the side-by-side view when it's working) and also the output of otool -tvV and sometimes the proximity of the retain/release calls to a recognizable section give me a hint as to what is going on. I have even inserted dummy method calls at places just to give me a better handle on where I am in the code and turned off optimization to limit code reordering, etc. But in many cases it seems like I would have to trace the code back to follow branches and figure out what is on the stack in order to understand the calls and I am not familiar enough with x86 to know know if that is practical. (I will add a couple of screenshots of the assembly view in Instruments and some otool output for reference below).
My question is - what else can I be doing to debug / examine / attribute these seemingly excessive retain/release calls to particular code? Is there something else I can do in Instruments to count these calls? I have played around with the allocation view and turned on the reference counting option but it didn't seem to give me any new information (I'm not actually sure what it did). Alternately, if I just try harder to interpret the assembly should I be able to figure out what objects are being retained by it? Are there any other tools or tricks I should know on that front?
EDIT: Rob's info below about single stepping into the assembly was what I was looking for. I also found it useful to set a symbolic breakpoint in XCode on the lib retain/release calls and log the item on the stack (using Rob's suggested "p (id)$rdi") to the console in order to get a rough count of how many calls are being made rather than inspect each one.
You should definitely focus on the assembly output. There are two views I find most useful: the Instruments view, and the Assembly assistant editor. The problem is that Swift doesn't support the Assembly assistant editor currently (I typically do this kind of thing in ObjC), so we come around to your complaint.
It looks like you're already working with the debug assembly view, which gives somewhat decent symbols and is useful because you can step through the code and hopefully see how it maps to the assembly. I also find Hopper useful, because it can give more symbols. Once you have enough "unique-ish" function calls in an area, you can usually start narrowing down how the assembly maps back to the source.
The other tool I use is to step into the retain bridge and see what object is being passed. To do this, instruction-step (^F7) into the call to swift_bridgeObjectRetain. At that point, you can call:
p (id)$rdi
And it should print out at least some type information about the what's being passed ($rdi is correct on x86_64 which is what you seem to be working with). I don't always have great luck extracting more information. It depends on exactly is in there. For example, sometimes it's a ContiguousArrayStorage<Swift.CVarArgType>, and I happen to have learned that usually means it's an NSArray. I'm sure better experts in LLDB could dig deeper, but this usually gets me at least in the right ballpark.
(BTW, I don't know why I can't call p (id)$rdi before jumping inside bridgeObjectRetain, but it gives strange type errors for me. I have to go into the function call.)
Wish I had more. The Swift tool chain just hasn't caught up to where the ObjC tool chain is for tracing this kind of stuff IMO.

What is System.Signature?

When running Visual Studio's memory profiler (Memory Usage Analysis) on my Windows Store app it shows me that there are many objects of type System.Signature (mscorlib.dll) on the heap.
I can't find System.Signature in the Object Browser, so I assume that it is some internal class only used and only usable by the Framework.
Does anyone have more information about this class?
When or why are System.Signature objects created and what are they used for internally?
According to the memory profiler, the number of System.Signature objects raises constantly while the app is used. This looks like a memory leak to me, but where do these objects come from or what creates these objects? To answer this question, I first need to know what System.Signature actually is.

What kind of memory leaks XCode Analyzer may not notice?

I'm afraid that asking this question may result in some downvotes, but after making some not satisfying research I decided to take a risk and ask more experienced people...
There are many questions here referring to some specific problems connected with the XCode Analayzer Tool. It seems to be very helpful solution. But I would like to ask you - as a beginner in iOS world - what kind of memory management stuff cannot be noticed by this tool.
In other words, are there any common memory management aspects, about which the iOS beginners should think "Oh, be careful with that, because in this case XCode Analyzer may not warn you about your mistake"...
For instance, I've found here Why cannot XCode static analyzer detect un-released retained properties? that:
(...)the analyzer can't reliably detect retain/release issues across
method/library boundaries(...)
It sounds like a good hint to consider, but maybe you know about some other common issues...
The analyzer is very good at finding the routine leaks that plague new programmers writing non-ARC code (failures to call release, returning objects of the wrong retain count, etc.).
In my experience, there are a couple of types of memory issues it does not find:
It cannot generally identify strong reference cycles (a.k.a. retain cycles). For example, you add a repeating NSTimer to a view controller, unaware that the timer maintains a strong reference to the view controller, and if you don't invalidate the timer (or do it in the wrong place, such as the dealloc method), neither the view controller nor the timer will get released.
It cannot find circular logic errors. For example, if you have some circular references where view controller A presents view controller B, which in turn presents a new copy of A (rather than dismissing/popping to get back to A).
It cannot find many non-reference counting memory issues. While it's getting better in dealing with Core Foundation functions, if you have code that is doing manual memory allocations (such as via malloc and free), the static analyzer may be of limited use. The same is true whenever you're using non-reference counting code (e.g. you use SQLite sqlite3_prepare_v2 and fail to call sqlite3_finalize).
I'm sure that's not a complete list of what it doesn't find, but those are the common issues I see asked about on Stack Overflow for which the static analyzer will be of limited help. But the analyzer is still a wonderful tool (it finds issues other than memory issues, too) and for those individuals not using ARC, it's invaluable.
Having said that, while the static analyzer is an under-appreciated first line of defense, you really should use Instruments to find leaks. See Locating Memory Issues in Your App in the Instruments User Guide. That's the best way to identify leaks.

Extending functionality of existing program I don't have source for

I'm working on a third-party program that aggregates data from a bunch of different, existing Windows programs. Each program has a mechanism for exporting the data via the GUI. The most brain-dead approach would have me generate extracts by using AutoIt or some other GUI manipulation program to generate the extractions via the GUI. The problem with this is that people might be interacting with the computer when, suddenly, some automated program takes over. That's no good. What I really want to do is somehow have a program run once a day and silently (i.e. without popping up any GUIs) export the data from each program.
My research is telling me that I need to hook each application (assume these applications are always running) and inject a custom DLL to trigger each export. Am I remotely close to being on the right track? I'm a fairly experienced software dev, but I don't know a whole lot about reverse engineering or hooking. Any advice or direction would be greatly appreciated.
Edit: I'm trying to manage the availability of a certain type of professional. Their schedules are stored in proprietary systems. With their permission, I want to install an app on their system that extracts their schedule from whichever system they are using and uploads the information to a central server so that I can present that information to potential clients.
I am aware of four ways of extracting the information you want, both with their advantages and disadvantages. Before you do anything, you need to be aware that any solution you create is not guaranteed and in fact very unlikely to continue working should the target application ever update. The reason is that in each case, you are relying on an implementation detail instead of a pre-defined interface through which to export your data.
Hooking the GUI
The first way is to hook the GUI as you have suggested. What you are doing in this case is simply reading off from what an actual user would see. This is in general easier, since you are hooking the WinAPI which is clearly defined. One danger is that what the program displays is inconsistent or incomplete in comparison to the internal data it is supposed to be representing.
Typically, there are two common ways to perform WinAPI hooking:
DLL Injection. You create a DLL which you load into the other program's virtual address space. This means that you have read/write access (writable access can be gained with VirtualProtect) to the target's entire memory. From here you can trampoline the functions which are called to set UI information. For example, to check if a window has changed its text, you might trampoline the SetWindowText function. Note every control has different interfaces used to set what they are displaying. In this case, you are hooking the functions called by the code to set the display.
SetWindowsHookEx. Under the covers, this works similarly to DLL injection and in this case is really just another method for you to extend/subvert the control flow of messages received by controls. What you want to do in this case is hook the window procedures of each child control. For example, when an item is added to a ComboBox, it would receive a CB_ADDSTRING message. In this case, you are hooking the messages that are received when the display changes.
One caveat with this approach is that it will only work if the target is using or extending WinAPI controls.
Reading from the GUI
Instead of hooking the GUI, you can alternatively use WinAPI to read directly from the target windows. However, in some cases this may not be allowed. There is not much to do in this case but to try and see if it works. This may in fact be the easiest approach. Typically, you will send messages such as WM_GETTEXT to query the target window for what it is currently displaying. To do this, you will need to obtain the exact window hierarchy containing the control you are interested in. For example, say you want to read an edit control, you will need to see what parent window/s are above it in the window hierarchy in order to obtain its window handle.
Reading from memory (Advanced)
This approach is by far the most complicated but if you are able to fully reverse engineer the target program, it is the most likely to get you consistent data. This approach works by you reading the memory from the target process. This technique is very commonly used in game hacking to add 'functionality' and to observe the internal state of the game.
Consider that as well as storing information in the GUI, programs often hold their own internal model of all the data. This is especially true when the controls used are virtual and simply query subsets of the data to be displayed. This is an example of a situation where the first two approaches would not be of much use. This data is often held in some sort of abstract data type such as a list or perhaps even an array. The trick is to find this list in memory and read the values off directly. This can be done externally with ReadProcessMemory or internally through DLL injection again. The difficulty lies mainly in two prerequisites:
Firstly, you must be able to reliably locate these data structures. The problem with this is that code is not guaranteed to be in the same place, especially with features such as ASLR. Colloquially, this is sometimes referred to as code-shifting. ASLR can be defeated by using the offset from a module base and dynamically getting the module base address with functions such as GetModuleHandle. As well as ASLR, a reason that this occurs is due to dynamic memory allocation (e.g. through malloc). In such cases, you will need to find a heap address storing the pointer (which would for example be the return of malloc), dereference that and find your list. That pointer would be prone to ASLR and instead of a pointer, it might be a double-pointer, triple-pointer, etc.
The second problem you face is that it would be rare for each list item to be a primitive type. For example, instead of a list of character arrays (strings), it is likely that you will be faced with a list of objects. You would need to further reverse engineer each object type and understand internal layouts (at least be able to determine offsets of primitive values you are interested in in terms of its offset from the object base). More advanced methods revolve around actually reverse engineering the vtable of objects and calling their 'API'.
You might notice that I am not able to give information here which is specific. The reason is that by its nature, using this method requires an intimate understanding of the target's internals and as such, the specifics are defined only by how the target has been programmed. Unless you have knowledge and experience of reverse engineering, it is unlikely you would want to go down this route.
Hooking the target's internal API (Advanced)
As with the above solution, instead of digging for data structures, you dig for the internal API. I briefly covered this with when discussing vtables earlier. Instead of doing this, you would be attempting to find internal APIs that are called when the GUI is modified. Typically, when a view/UI is modified, instead of directly calling the WinAPI to update it, a program will have its own wrapper function which it calls which in turn calls the WinAPI. You simply need to find this function and hook it. Again this is possible, but requires reverse engineering skills. You may find that you discover functions which you want to call yourself. In this case, as well as being able to locate the location of the function, you have to reverse engineer the parameters it takes, its calling convention and you will need to ensure calling the function has no side effects.
I would consider this approach to be advanced. It can certainly be done and is another common technique used in game hacking to observe internal states and to manipulate a target's behaviour, but is difficult!
The first two methods are well suited for reading data from WinAPI programs and are by far easier. The two latter methods allow greater flexibility. With enough work, you are able to read anything and everything encapsulated by the target but requires a lot of skill.
Another point of concern which may or may not relate to your case is how easy it will be to update your solution to work should the target every be updated. With the first two methods, it is more likely no changes or small changes have to be made. With the second two methods, even a small change in source code can cause a relocation of the offsets you are relying upon. One method of dealing with this is to use byte signatures to dynamically generate the offsets. I wrote another answer some time ago which addresses how this is done.
What I have written is only a brief summary of the various techniques that can be used for what you want to achieve. I may have missed approaches, but these are the most common ones I know of and have experience with. Since these are large topics in themselves, I would advise you ask a new question if you want to obtain more detail about any particular one. Note that in all of the approaches I have discussed, none of them suffer from any interaction which is visible to the outside world so you would have no problem with anything popping up. It would be, as you describe, 'silent'.
This is relevant information about detouring/trampolining which I have lifted from a previous answer I wrote:
If you are looking for ways that programs detour execution of other
processes, it is usually through one of two means:
Dynamic (Runtime) Detouring - This is the more common method and is what is used by libraries such as Microsoft Detours. Here is a
relevant paper where the first few bytes of a function are overwritten
to unconditionally branch to the instrumentation.
(Static) Binary Rewriting - This is a much less common method for rootkits, but is used by research projects. It allows detouring to be
performed by statically analysing and overwriting a binary. An old
(not publicly available) package for Windows that performs this is
Etch. This paper gives a high-level view of how it works
conceptually.
Although Detours demonstrates one method of dynamic detouring, there
are countless methods used in the industry, especially in the reverse
engineering and hacking arenas. These include the IAT and breakpoint
methods I mentioned above. To 'point you in the right direction' for
these, you should look at 'research' performed in the fields of
research projects and reverse engineering.

Win from application gets slow

I have built an application using Visual Studio .NET and it works fine. After the application is used for more than 2-3 hours it starts to get slow and I don't know why. I have used GC.Collect(); to get memory leak problems but now I have the new one.
Does anyone know a solution?
If you really have a memory leak, just calling GC.Collect() will get you nowhere. The GarbageCollector can only collect those objects, that are not referenced from others anymore.
If you do not cleanup your objects properly, the GC will not collect anything.
When handling with memory consumptions, you should strongly consider the following patterns:
Weak Events (MSDN Documentation here)
If you do not unsubscribe from events, the subscribing objects will never be released into the Garbage Collection. GC.Collect() will NOT remove those objects and they will clutter your memory.
Implement the IDisposable interface (MSDN documentation here)
(I strongly suggest to read this ducumentation as I have seen lots of wrong implementations.)
You should always free resources that you used. Call Dispose() on every object that offers it!
The same applies to streams. Always call Close() on every object that offers this.
To make points 2. and 3. easier you can use the using blocks. (MSDN documentation here)
As soon as these code blocks go out of scope they automatically call the appropriate Dispose() or Close() methods on the given object. This is the same, but more convinient, as using a try... finally combination.
Try a memory profiler, such as the ANTS Memory Profiler. First you need to understand what's going on, then you can think about how to fix it.
http://www.red-gate.com/products/dotnet-development/ants-memory-profiler/

Resources