I'm trying to use BoundsChecker to analyze a rather complex program. Running the program under BoundsChecker is almost too slow to be of any use, since it takes me almost a day to reach the point in the code where I suspect the issue exists. Can anyone give me some ideas for how to inspect only certain parts of my software using BoundsChecker (DevPartner) in Visual Studio 2005?
I last used BoundsChecker a few years ago, and had the same problems. With large projects, it makes everything run so slowly that it is useless. We ended up ditching it.
But we still needed some of its functionality and, like you, not for the whole program. So we had to do it ourselves.
In our case, we mainly used it to try and track down memory leaks. If that's your objective as well, there are other options.
1. Visual Studio does a pretty good job of telling you about memory leaks when your program exits.
2. It reports leaks in the order that they were created.
3. It will tell you exactly where the leaked memory was created if your source files have this at the top:
#ifdef _DEBUG
#undef THIS_FILE
static char THIS_FILE[]=__FILE__;
#define new DEBUG_NEW
#endif
Those help a lot, but it's often not enough. Adding that snippet everywhere isn't always feasible. If you use factory classes, knowing where memory was allocated doesn't help at all. So when all else fails, we take advantage of #2.
Add something like the following:
#define LEAK(str) do { char *s = (char*)malloc(strlen(str) + 1); if (s) strcpy(s, str); } while (0)
Then, pepper your code with "LEAK("leak1");" or whatever. Run the program, and exit it. Your new leaked strings will display in Visual Studio's leak dump surrounding the existing leaks. Keep adding/moving your LEAK statements and re-running the program to narrow your search until you've pinpointed the exact location. Then fix the leak, remove your debugging leaks, and you're all set!
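The marker technique above can also be sketched as a small function instead of a macro. This is only an illustration: `leak_marker` and `enable_leak_report` are hypothetical names, and the `_CrtSetDbgFlag` call assumes a debug build with the MSVC debug CRT (MFC debug builds already dump leaks at exit, so that call mainly matters for non-MFC programs). On other compilers the reporting helper does nothing.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#if defined(_MSC_VER) && defined(_DEBUG)
#include <crtdbg.h>
#endif

// Hypothetical helper: allocates a small labelled block that callers never
// free, so the label appears as a landmark in the debug heap's leak dump.
inline char* leak_marker(const char* label) {
    assert(label != nullptr);
    const std::size_t size = std::strlen(label) + 1;   // room for the '\0'
    char* block = static_cast<char*>(std::malloc(size));
    if (block != nullptr) {
        std::memcpy(block, label, size);
    }
    return block;   // intentionally left allocated by callers
}

// Hypothetical helper: asks the MSVC debug CRT to dump leaks at exit.
// Does nothing on other compilers or in release builds.
inline void enable_leak_report() {
#if defined(_MSC_VER) && defined(_DEBUG)
    _CrtSetDbgFlag(_CrtSetDbgFlag(_CRTDBG_REPORT_FLAG) | _CRTDBG_LEAK_CHECK_DF);
#endif
}
```

Because the dump lists blocks in allocation order, a real leak that appears between the "leak1" and "leak2" markers was allocated between those two calls.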

BoundsChecker tracks all memory allocations and releases in extreme detail. It knows, for instance, that such and such a memory allocation was done from the C runtime heap, which in turn was taken from a Win32 heap, which in turn started life as memory allocated by VirtualAlloc. If the application was instrumented (FinalCheck), it also has detailed information as to which pointers reference the memory.
This is one reason (of many) why the thing is slow.
If BC were to connect to an application late, it would have none of this data built up, and would have to either (1) dig it all up at once, or (2) start guessing about things. Neither solution is very practical.

One way to lighten up BoundsChecker is by excluding from instrumentation all but the few modules you are interested in. I know that's not great, because if you knew where the leak was you wouldn't need BoundsChecker. What I usually recommend is that you use BC's Active Check mode first with only Memory Tracking available. You miss the API Validations, but you could always rerun that separately. After you run Active Check and you get clues regarding which modules tend to be problematic, only then do you enable instrumentation for the module or modules of interest and their dependencies. We know Final Check is annoyingly slow but, as Mistiano correctly states, with Final Check not only does BC keep a graph of all allocated blocks but also all pointers and contexts to them. Therein lies the magic in how Final Check can nail leaks and corruptions at the point of occurrence, not just on application shutdown or fault. Shameless plug: I work on the DevPartner team. We are releasing DPS 10.5 on February 4, 2011 with x64 application support in BC. Unlike the relatively ancient and undersold BC64 for Itanium, which only provided Active Check, DPS 10.5 provides full Final Check support for x64 applications, both for pure C++ and for native modules running in .NET processes. See under MF Developer for details.


Are there any compiler options to make x64 release crash dumps more usable?

Whenever I get a crash dump for an x64 release build of my application, I find it's rare that I can view the locals, which makes it hard or impossible to fix some issues. In x86 I can usually view all locals without any issues.
Are there any compiler options in the release build that will allow me to view the locals in release build crash dumps? Obviously I don't want to turn off optimizations, but perhaps there is some way to force it to save the locals with minor performance impact?
You've said a couple things that hint at why you can't see locals...
#1 - It's a release build.
With certain optimizations turned on, the compiler is free to do a few things that make looking at locals more difficult.
It can inline a function. When this happens, the locals of the function that aren't optimized away are mixed with the calling stack frame.
It can free up a register and save a couple clock cycles on the function call using a trick called Frame-Pointer Omission.
In order to save stack space, the compiler can reuse the location that held a variable earlier in the function body to hold a different variable later in the function. This means that where you are in the function determines which locals you are actually able to see.
#2 - It's an x64 build.
MSVC uses a new calling convention for 64-bit code aptly called the x64 Calling Convention. The first 4 function arguments are stored in registers instead of on the stack. This means that even though you are looking at a stack frame, you will not see some of the arguments and you may not even be able to recover them if the registers have been reused for something else by the time you look at them.
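To make the register assignment concrete, here is a small sketch. The function `combine` and the `volatile` copy are hypothetical illustrations, not something from the answer above; the register names are the ones the Microsoft x64 calling convention uses for integer arguments.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical function with six integer parameters. Under the Microsoft
// x64 calling convention the arguments arrive as follows:
//   a -> RCX, b -> RDX, c -> R8, d -> R9   (registers)
//   e, f -> the caller's stack             (visible in the stack frame)
std::int64_t combine(std::int64_t a, std::int64_t b, std::int64_t c,
                     std::int64_t d, std::int64_t e, std::int64_t f) {
    // Debugging aid (assumption: one extra store is affordable): copying a
    // register argument into a volatile local makes the compiler store it
    // in memory, so it is more likely to be readable in a dump.
    volatile std::int64_t a_on_stack = a;
    return a_on_stack + 2 * b + 3 * c + 4 * d + 5 * e + 6 * f;
}
```

In a dump taken inside `combine`, `e` and `f` can be read from the stack, while `a` through `d` are only recoverable if their registers have not been reused or if a copy such as `a_on_stack` exists.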
So, now what?
Now that we know why you are going to have such a difficult time, let's see what you can do to get around the issues above. None of those issues are really show stoppers, but they all work together to make things just that much more difficult for you.
Turn off some optimizations. You can try making a release build with optimizations at a level that doesn't impede debugging quite so much. You would probably want to start with the optimizations mentioned above that play with stack frames (/Oy and /Ob). Then you would need to hope that you can still reproduce the issue with those optimizations turned off. Also, depending on your internal policy and the contract that you have with your customer, you may need to involve a lawyer before sending an unofficial build to the customer -- probably not the most fun thing in the world.
Build a better symbol file. VS2012 and higher have a new compiler switch, /d2Zi+ in VS2012 and /Zo in VS2013, that generates better debug info when optimizations are turned on. This puts debugging optimized code on par with GCC/Clang. Even though it is undocumented in VS2012, I'd still consider it pretty safe since I saw no difference in the generated code -- only in the symbol file. You may even be able to rebuild locally with this flag and force windbg to use your new symbol file via .symopt+ 0x40. This gives you a chance to get more out of the dumps you already have.
Use windbg extensions to do the heavy lifting. In other StackOverflow answers, I've mentioned a tool called CMKD that has saved my bacon a couple times. It, among other things, attempts to reconstruct the arguments in the x64 calling convention that were passed in registers. It's not a sure thing, but it is probably the best hope of getting them back.
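If turning optimizations off for the whole build is too costly, one variation on the first suggestion is to disable them only around the functions you suspect. The sketch below uses MSVC's `#pragma optimize`; the function `suspect_checksum` is a hypothetical stand-in for your own code, and the guards simply keep other compilers from seeing the pragma.

```cpp
#include <cassert>

#ifdef _MSC_VER
#pragma optimize("", off)   // MSVC: disable optimizations from here on
#endif
// Hypothetical function suspected of being involved in the crash.
int suspect_checksum(const unsigned char* data, int length) {
    int sum = 0;                       // stays inspectable as a local
    for (int i = 0; i < length; ++i) {
        sum = (sum * 31 + data[i]) % 65521;
    }
    return sum;
}
#ifdef _MSC_VER
#pragma optimize("", on)    // MSVC: restore the command-line settings
#endif
```

The rest of the module keeps its normal optimization level, so the performance cost is limited to the bracketed functions. As with the build-wide approach, you still have to hope the issue reproduces with the change in place.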
Anyway, I hope my ramblings will prove helpful in your debugging.

Out of Memory Message box

I have an MFC application developed with VS2003.
It works fine on XP, Vista, etc.
But when I run it on Windows 8 and use it for some time, no window is displayed. Instead, a message box with the message 'Out of Memory' is displayed, and the message box has my application's caption.
This issue also occurs, rarely, on Windows 7.
I have tried watching the handles using tools like Process Explorer, and the count is not increasing, even though many forums say that this error is caused by a growing number of unclosed handles or resources.
Can anyone suggest how I can find where the issue is, or offer a possible reason for it?
I can't set up the development environment on the machine that shows the issue, and I am unsure how to diagnose the problem by running a test build on it.
You clearly have a memory leak somewhere. It's hard to be any more specific without seeing the code.
A debugger is really the best way to solve this problem. If you can reproduce the problem on your development machine, that would be the easiest case. If not, you can attach a debugger to the running process on another machine, either locally or remotely.
The MFC libraries also support some basic memory leak detection, turned on by default for Debug builds and controllable for other builds using the AfxEnableMemoryTracking function. You can use this feature to obtain information about which blocks of memory were allocated but not properly deallocated (i.e. were leaked).
Like you mentioned, Process Explorer is another good way to track down resource leaks. Are you sure that the handle counts are remaining constant rather than trending upwards over time? If the values in the columns are never changing like the question suggests, then you are surely doing something wrong. Your application has to be creating objects in order to do its job. The point is to make sure that it disposes of them when it is finished.
If you can't reproduce the problem with the running application and have only the source code available, you'll need to go through the code and make sure that every use of new has a corresponding use of delete (and that new[] matches up with delete[]). And in general in C++, you should avoid explicit dynamic memory allocation wherever possible. Instead, use the container classes that are provided either by MFC or the standard library. For example, don't allocate arrays manually, use std::vector to do it for you. These container classes ensure that the memory is automatically deallocated in the destructor when the object goes out of scope.
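As a minimal sketch of the container advice above (the function `sum_of_squares` is a hypothetical example, not code from the application in question), a manually allocated array can usually be replaced like this:

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical example: build a table of n squares and return their sum.
long sum_of_squares(std::size_t n) {
    // Instead of: long* table = new long[n]; ... delete[] table;
    std::vector<long> table(n);
    for (std::size_t i = 0; i < n; ++i) {
        table[i] = static_cast<long>(i * i);
    }
    return std::accumulate(table.begin(), table.end(), 0L);
}   // table's destructor releases the memory on every exit path
```

There is no `delete[]` to forget, and an early return or an exception inside the function still frees the buffer.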

Are there memory leaks in Linux?

This question is purely theoretical.
I was wondering whether the Linux source code could have memory leaks, and how they debugged it, considering that it is Linux, after all, that deals with each program's memory?
I obviously understand that Linux, being written in C, has to deal itself with malloc and free. What I don't understand is how we measure the operating system's memory leaks.
Note that this question is not Linux-specific; it also addresses the corresponding issues in Windows and MacOS X (darwin).
Quite frequently, non-mainstream drivers and the staging tree have memory leaks. Follow the LKML and you can see occasional fixes for mistakes in the networking code for corner cases handling lists of SKBs.
Due to the nature of the kernel most work is code review and refactoring, but work is ongoing to make more tools:
In certain cases you can use frameworks like Usermode Linux and then use conventional tools like Valgrind to attempt to peer into the running code:
The implementation of malloc and free (actually brk/sbrk, since malloc and free are implemented by libc in-process) is not magical or special - it's just code, just like anything else, and there are data structures behind it that describe the mappings.
If you want to test correct behavior, one way is to write test programs in user-space that are known to allocate then free all their memory correctly. Run the app, then check the internal memory allocation structures in kernel mode using a debugger (or better yet, make this check a debug assert on process shutdown).
All software has bugs, including operating systems. Some of those bugs will result in memory leaks.
Linux has a kernel debugger to help track down these things, but one has to discover that they exist before one can track them down. Usually, once a bug has been discovered and can be replicated at will, it becomes much easier to fix (Relatively speaking! Obviously you need a good coder to do the job). The hard part is finding the bugs in the first place and creating reliable test cases that demonstrate them. This is where you need to have a skilled QA team.
So I guess the short version of this answer is that good QA is as important as good coding.

Finding GDI/User resource usage from a crash dump

I have a crash dump of an application that is supposedly leaking GDI. The app is running on XP and I have no problems loading it into WinDbg to look at it. Previously we have used the Gdikdx.dll extension to look at GDI information, but this extension is not supported on XP or Vista.
Does anyone have any pointers for finding GDI object usage in WinDbg?
Alternatively, I do have access to the failing program (and its stress testing suite) so I can reproduce on a running system if you know of any 'live' debugging tools for XP and Vista (or Windows 2000 though this is not our target).
I've spent the last week working on a GDI leak finder tool. We also perform regular stress testing and it never lasted longer than a day's worth w/o stopping due to user/gdi object handle overconsumption.
My attempts have been pretty successful as far as I can tell. Of course, I spent some time beforehand looking for an alternative and quicker solution. It is worth mentioning that I had some previous semi-lucky experience with the GDILeaks tool from the MSDN article mentioned above. I had to solve a few problems before putting it to work, and this time it just didn't give me what I wanted in the way I wanted it. The downside of their approach is the heavyweight debugger interface (it slows down the researched target by orders of magnitude, which I found unacceptable). Another downside is that it did not work all the time - on some runs I simply could not get it to report/compute anything! Its complexity (judging by the amount of code) was another scare-away factor. I'm not a big fan of GUIs, as it is my belief that I'm more productive with no windows at all ;o). I also found it hard to make it find and use my symbols.
One more tool I used before setting out to write my own was the leakbrowser.
Anyways, I finally settled on an iterative approach to achieve following goals:
minor performance penalties
implementation simplicity
non-invasiveness (used for multiple products)
relying as much as possible on what is already available
I used Detours (non-commercial use) for core functionality (it is an injectable DLL). I put JavaScript to use for automatic code generation (a 15K script generates 100K of source code - no way I'd code this manually, and no C preprocessor involved!), plus a windbg extension for data analysis and snapshot/diff support.
To tell the long story short - after I was finished, it was a matter of a few hours to collect information during another stress test and another hour to analyze and fix the leaks.
I'll be more than happy to share my findings.
P.S. I spent some time trying to improve on the previous work. My intention was to minimize false positives (I've seen just about too many of those while developing), so it also checks for allocation/release consistency and avoids taking into account allocations that are never leaked.
Edit: Find the tool here
There was a MSDN Magazine article from several years ago that talked about GDI leaks. This points to several different places with good information.
In WinDbg, you may also try the !poolused command for some information.
Finding resource leaks from a crash dump (post-mortem) can be difficult -- if the leak always came from the same place, using the same variable, and you're lucky, you could see the last place where it leaked, etc. It would probably be much easier with a live program running under the debugger.
You can also try using Microsoft Detours, but the license doesn't always work out. It's also a bit more invasive and advanced.
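For the live route, a lightweight first step is to have the program sample its own handle counts around suspect operations. The sketch below is an assumption-laden illustration: `GuiHandleCounts`, `sample_gui_handles` and `gdi_growth` are hypothetical names, while `GetGuiResources` with `GR_GDIOBJECTS` / `GR_USEROBJECTS` is the Win32 call that returns the counts (link against User32.lib). On non-Windows platforms the sampler returns zeros so the code still compiles.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical snapshot of the handle counts for the current process.
struct GuiHandleCounts {
    unsigned long gdi;
    unsigned long user;
};

// Samples the counts with GetGuiResources on Windows; on other platforms
// it returns zeros so the sketch still compiles.
inline GuiHandleCounts sample_gui_handles() {
    GuiHandleCounts counts = {0, 0};
#ifdef _WIN32
    counts.gdi  = GetGuiResources(GetCurrentProcess(), GR_GDIOBJECTS);
    counts.user = GetGuiResources(GetCurrentProcess(), GR_USEROBJECTS);
#endif
    return counts;
}

// Returns how many GDI handles were gained between two snapshots
// (negative when handles were released).
inline long gdi_growth(const GuiHandleCounts& before,
                       const GuiHandleCounts& after) {
    return static_cast<long>(after.gdi) - static_cast<long>(before.gdi);
}
```

Taking a snapshot before and after each step of the stress test and logging the growth narrows down which operation keeps handles alive, without the overhead of a debugger-based tool.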
I have created a WinDbg script for that. See the answer to "Command to get GDI handle count from a crash dump".
To track the allocation stack you could set a ba (Break on Access) breakpoint past the last allocated GDICell object to break just at the point when another GDI allocation happens. That could be a bit complex because the address changes but it could be enough to find pretty much any leak.

Comparing cold-start to warm start

Our application takes significantly more time to launch after a reboot (cold start) than if it was already opened once (warm start).
Most (if not all) of the difference seems to come from loading DLLs; when the DLLs are in cached memory pages they load much faster. We tried using ClearMem to simulate rebooting (since it's much less time consuming than actually rebooting) and got mixed results: on some machines it seemed to simulate a reboot very consistently, and on some it did not.
To sum up my questions are:
Have you experienced differences in launch time between cold and warm starts?
How have you dealt with such differences?
Do you know of a way to dependably simulate a reboot?
Clarifications for comments:
The application is mostly native C++ with some .NET (the first .NET assembly that's loaded pays for the CLR).
We're looking to improve load time, obviously we did our share of profiling and improved the hotspots in our code.
Something I forgot to mention was that we got some improvement by re-basing all our binaries so the loader doesn't have to do it at load time.
As for simulating reboots, have you considered running your app from a virtual PC? Using virtualization you can conveniently replicate a set of conditions over and over again.
I would also consider some type of profiling app to spot the bit of code causing the time lag, and then making the judgement call about how much of that code is really necessary, or if it could be achieved in a different way.
It would be hard to truly simulate a reboot in software. When you reboot, all devices in your machine get their reset bit asserted, which should cause all memory system-wide to be lost.
In a modern machine you've got memory and caches everywhere: there's the VM subsystem which is storing pages of memory for the program, then you've got the OS caching the contents of files in memory, then you've got the on-disk buffer of sectors on the harddrive itself. You can probably get the OS caches to be reset, but the on-disk buffer on the drive? I don't know of a way.
How did you profile your code? Not all profiling methods are equal and some find hotspots better than others. Are you loading lots of files? If so, disk fragmentation and seek time might come into play.
Maybe even sticking basic timing information into the code, writing out to a log file and examining the files on cold/warm start will help identify where the app is spending time.
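A minimal sketch of that kind of timing instrumentation follows. `ScopedTimer` is a hypothetical helper, and it assumes a compiler with `<chrono>` (C++11 or later); on older toolchains the same idea works with `QueryPerformanceCounter` or `GetTickCount`.

```cpp
#include <cassert>
#include <chrono>
#include <ostream>
#include <sstream>   // std::ostringstream works as a log target in tests
#include <string>
#include <utility>

// Hypothetical helper: logs how long the enclosing scope took. Any
// std::ostream can be the target, e.g. a std::ofstream for a log file.
class ScopedTimer {
public:
    ScopedTimer(std::ostream& log, std::string label)
        : log_(log), label_(std::move(label)),
          start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        const auto ms =
            std::chrono::duration_cast<std::chrono::milliseconds>(elapsed)
                .count();
        log_ << label_ << ": " << ms << " ms\n";
    }

private:
    std::ostream& log_;
    std::string label_;
    std::chrono::steady_clock::time_point start_;
};
```

Wrapping each startup phase in a block with its own `ScopedTimer` and comparing the log from a cold start against one from a warm start shows which phases account for the difference.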
Without more information, I would lean towards filesystem/disk cache as the likely difference between the two environments. If that's the case, then you either need to spend less time loading files upfront, or find faster ways to load files.
Example: if you are loading lots of binary data files, speed up loading by combining them into a single file, then slurp the whole file into memory in one read and parse its contents. That means fewer disk seeks and less time spent reading off of disk. Again, maybe that doesn't apply.
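Reading a combined file in one go could look roughly like this. `read_whole_file` is a hypothetical helper and the format of the combined file is left to you; the sketch only shows pulling the entire file into memory with a single stream read.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads an entire file into memory with one read call.
// Returns an empty vector if the file is empty or cannot be read.
inline std::vector<char> read_whole_file(const std::string& path) {
    // Open at the end so the initial position equals the file size.
    std::ifstream in(path.c_str(), std::ios::binary | std::ios::ate);
    if (!in) {
        return std::vector<char>();
    }
    const std::streamsize size = in.tellg();
    if (size <= 0) {
        return std::vector<char>();
    }
    std::vector<char> bytes(static_cast<std::size_t>(size));
    in.seekg(0, std::ios::beg);
    if (!in.read(bytes.data(), size)) {
        bytes.clear();   // a short or failed read is reported as empty
    }
    return bytes;
}
```

Parsing then happens entirely in memory, so the disk sees one sequential request instead of many small scattered ones.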
I don't know offhand of any tools to clear the disk/filesystem cache, but you could write a quick application to read a bunch of unrelated files off of disk to cause the filesystem/disk cache to be loaded with different info.
@Morten Christiansen said:
One way to make apps start cold-start faster (sort of) is used by e.g. Adobe reader, by loading some of the files on startup, thereby hiding the cold start from the users. This is only usable if the program is not supposed to start up immediately.
That makes the customer pay for initializing our app at every boot even when it isn't used, I really don't like that option (neither does Raymond).
One successful way to speed up application startup is to switch DLLs to delay-load. This is a low-cost change (some fiddling with project settings) but can make startup significantly faster. Afterwards, run depends.exe in profiling mode to figure out which DLLs load during startup anyway, and revert the delay-load on them. Remember that you may also delay-load most Windows DLLs you need.
A very effective technique for improving application cold launch time is optimizing function link ordering.
The Visual Studio linker lets you pass in a file that lists all the functions in the module being linked (or just some of them - it doesn't have to be all of them), and the linker will place those functions next to each other in memory.
When your application is starting up, there are typically calls to init functions throughout your application. Many of these calls will be to a page that isn't in memory yet, resulting in a page fault and a disk seek. That's where slow startup comes from.
Optimizing your application so all these functions are together can be a big win.
Check out Profile Guided Optimization in Visual Studio 2005 or later. One of the things that PGO does for you is function link ordering.
It's a bit difficult to work into a build process, because with PGO you need to link, run your application, and then re-link with the output from the profile run. This means your build process needs to have a runtime environment and deal with cleaning up after bad builds and all that, but the payoff is typically 10+ or more faster cold launch with no code changes.
There's some more info on PGO here:
As an alternative to function order list, just group the code that will be called within the same sections:
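#pragma code_seg(".startUp")
// ... functions that run during startup go here ...
#pragma code_seg()   // back to the default code section
#pragma data_seg(".startUp")
// ... initialized globals used during startup go here ...
#pragma data_seg()   // back to the default data section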
It should be easy to maintain as your code changes, but has the same benefit as the function order list.
I am not sure whether a function order list can specify global variables as well, but using #pragma data_seg for them would simply work.
One way to make apps start cold-start faster (sort of) is used by e.g. Adobe reader, by loading some of the files on startup, thereby hiding the cold start from the users. This is only usable if the program is not supposed to start up immediately.
Another note, is that .NET 3.5SP1 supposedly has much improved cold-start speed, though how much, I cannot say.
It could be the NICs (LAN cards): your app may depend on certain other services that require the network to come up. Profiling your application alone may not tell you this, so you should examine your application's dependencies.
If your application is not very complicated, you can just copy all the executables to another directory; launching from there should be similar to a reboot. (Cut and paste does not seem to work: Windows is smart enough to know that files moved to another folder are still cached in memory.)
