Best Ways to Debug a Release Mode Application - debugging

I'm sure this has happened to folks before: something works in debug mode, you compile in release, and something breaks.
This happened to me while working on an Embedded XP environment. The best way I found to track it down was to write a log file to determine where it went wrong.
What are your experiences/discoveries trying to tackle an annoying release-mode bug?

Make sure you have good debug symbols available (you can do this even with a release build, even on embedded devices). You should be able to get a stack trace and hopefully the values of some variables. A good knowledge of assembly language is probably also useful at this point.
My experience is that generally the bug is related to code that is near the area of breakage. That is to say, if you are seeing an issue arising in the function "LoadConfigInfoFromFile" then probably you should start by closely analysing that for issues, rather than "DrawControlsOnScreen", if you know what I mean. "Spooky action at a distance" type bugs do not tend to arise often (although when they do, they tend to be a major bear).

A trace file is always a good idea.
When it's about crashes, I use ADPlus, which is part of the Debugging Tools for Windows. Basically, ADPlus attaches WinDbg to the executable you're monitoring. When the application crashes, you get a crash dump and a log file. You can load the crash dump in your preferred debugger and find out which instruction led to the crash.
As release builds are heavily optimized compared to debug builds, the way you compile your code affects its behaviour. This shows up especially when crashes in multithreaded code happen in the release version but not in the debug version. ADPlus and WinDbg helped me find out where this happened.
ADPlus is explained here:
http://support.microsoft.com/?scid=kb%3Ben-us%3B286350&x=15&y=12
Basically what you have to do is:
1. Download and install WinDbg into C:\debuggers
http://www.microsoft.com/whdc/devtools/debugging/default.mspx
2. Start your application
3. Open a cmd and cd to c:\debuggers
4. Start adplus like this:
"adplus.bat -crash your_exe.exe"
5. Reproduce the crash
6. Analyze the crash dump in VS2005 or in WinDbg

If it's only a small portion of the application that needs debugging then you can change those source files only to be built without optimisations. Presumably you generate debug info for all builds, and so this makes the application run mostly as it would in release, but allows you to debug the interesting parts properly.

How about using Trace statements? They are there for checking values in Release mode.
Trace.WriteLine(myVar);

I agree on log file debugging to narrow it down.
I've used "Entering FunctionName" and "Leaving FunctionName" messages until I can find which method it enters before the crash. Then I add more log messages, re-compile and re-release.
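The "Entering"/"Leaving" approach can be sketched as a small wrapper. This is only an illustration in Python rather than the language of the original application; `traced` and `load_config` are hypothetical names, and the logger still has to be pointed at a file by whatever logging setup you use.

```python
import functools
import logging

log = logging.getLogger("trace")

def traced(func):
    """Log 'Entering <name>' before the call and 'Leaving <name>' after it."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log.info("Entering %s", func.__name__)
        try:
            return func(*args, **kwargs)
        finally:
            # Runs on normal return and on exceptions. A hard crash skips it,
            # which is what leaves an unmatched "Entering" line in the log.
            log.info("Leaving %s", func.__name__)
    return wrapper

@traced
def load_config(path):
    # Hypothetical function standing in for the real work
    return {"path": path}
```

The last "Entering" line without a matching "Leaving" line is the place to add finer-grained messages on the next round.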

Besides playing with turning off optimization and/or turning on debug information for your Release build as pauldoo said, a log file with good data can really help. I once wrote a "trace" app that would capture trace logs for the app if it was running when the release build started (otherwise the results would go to the debugger's output window if running under the debugger). I was able to have end-users email me log files from them reproducing the bugs they were seeing, and it was the only way I would have found the problem in at least one case.

Though it's probably not usable in an embedded environment, I've had good luck with WinDbg for debugging release-mode Windows applications. Even if the application is not compiled with symbol information, you can at least get a usable stack trace and plenty of other useful crash information.

You could also copy your debug symbols to the production environment, even if it's compiled in release mode.
Here's an article with more information

If your problem is synchronization related, dumping the log to a file might be problematic, since the file I/O changes the timing.
In this case I usually use a big array of strings and dump it to the screen or a file after the problem has been reproduced.
This of course depends on your memory restrictions; sometimes I store just a few symbols and numbers in the array if memory on the platform is limited. Reading such logs is no great pleasure, but sometimes it is the only choice.
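A minimal sketch of such an in-memory log, assuming a fixed number of entries is acceptable and that short codes plus a number are enough per entry. The `RingLog` class and its method names are made up for illustration, and Python is used only to keep the example short.

```python
import collections
import threading
import time

class RingLog:
    """Fixed-size in-memory log; the oldest entries are dropped when it fills up."""

    def __init__(self, capacity=1024):
        self._entries = collections.deque(maxlen=capacity)

    def add(self, code, value=0):
        # Store a short code and a number instead of a formatted string,
        # so that recording an event stays cheap and does no I/O.
        self._entries.append((time.monotonic(), threading.get_ident(), code, value))

    def dump(self):
        """Format the buffered entries; call this only after the problem has reproduced."""
        return ["%.6f tid=%d %s %s" % entry for entry in list(self._entries)]
```

Call `add` at the interesting points while the program runs, and write the result of `dump` to the screen or a file once the bug has shown up.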

Related

CICS/COBOL Abend ASRA in debugger only

I have an issue I don't seem to find a solution for.
One of the transactions gives ABEND ASRA when used in debug mode.
When I compile the Cobol program without debug option and run the program, it works fine.
The error looks like this one (quite exactly like this), only I am using Cobol V4:
http://www-01.ibm.com/support/docview.wss?uid=swg1PM96501
Now the question would be: why is it abending in debugger and not without debugger?
I am using the CICS debugger (DTCN transaction). The program starts normally, I can step with F2 and all this, then at some location it abends.
Please note that it is extremely difficult to say where it abends as the program is really big.
This happens only to this program at the moment, others are running fine with debugger. I placed a breakpoint before my modifications, the abend occurs in some other area.
Another weird thing is that this abend is not consistent: if I go through a big portion of the code in small steps (F2 and small breakpoints), sometimes it executes without an abend until the end.
Due to the nature of the issue, I can not post much information.
I was hoping you encountered similar issues and you can tell me where to look for.
Thank you!
The issue was solved by deleting my debug tool profile from the system and then logging in to the debugger (DTCN) again so that it creates a new profile (the profile was 3 files: TOOLTEMP.PDTOOLS.{userid}.DBGTOOL.* ). After this the issue was gone. I asked the guys how this happened; they told me it was because I had modified the program between two debugging sessions without closing CICS. This malfunction can be avoided by closing CICS while we compile programs used in it (I am not sure exactly why, and neither are they).
Hope this helps if you face a similar issue with DTCN debugging.

Side Effects of running the JVM in debug mode

I'd like to release a Java application in debug mode to allow for easier debugging when random or hard to reproduce problems occur on the customer side.
However, I want to get a heads up on potential side effects of doing this. From the Java HotSpot Documentation it seems that there should be no performance penalty.
From the link:
Full Speed Debugging
The Java HotSpot VM now uses full-speed debugging. In previous version of the VM, when debugging was enabled, the program executed using only the interpreter. Now, the full performance advantage of HotSpot technology is available to programs, even with compiled code. The improved performance allows long-running programs to be more easily debugged. It also allows testing to proceed at full speed. Once there is an exception, the debugger launches with full visibility to code sources.
Is this accurate, or are there hidden caveats? What about memory footprint, and are there any other gotchas when using debug mode?
PS: I found this article from AMD which confirmed my initial suspicion that the original article from Oracle doesn't tell the full story.
I can't speak for HotSpot, and won't officially for IBM, but I will say there are certainly legal kinds of optimization that aren't possible to undo fully should a decompilation be required in the middle of them, and thus aren't enabled when debug is being asked for in the production JVMs you are likely to use.
Imagine a situation where the optimizer discovers that a part of the program is provably not required and, by the various language rules (including JSR 133), is legal to remove: the JVM will want to get rid of it. The one wrinkle is debug: removing the code will look odd to the human stepping through it (variables not updating, possibly not stopping on lines when stepping), so the choice is to disable said optimizations in those cases. The same might also be true for opts like stack allocated objects, etc. So while the JVM says it's "full speed", it's actually closer to "nearly full speed, with some of the funkier opts that can't quite be undone removed".
This question is old but came up while I was searching for any performance impact if you just leave -agentlib:jdwp... on but are not actively debugging.
Summary: Starting with debugging options but not connecting shouldn't impact the speed now (Java 7+).
Before Java 6 (ish) you used -Xdebug, and this had a definite impact: it shut off the JIT!
In Java 6 they changed it to -agentlib and made it better. There were some bugs, though, that did cause a performance penalty. Here is one of the bugs that was filed against OpenJDK; my guess is that there were similar problems with the Oracle/Sun version: https://bugs.openjdk.java.net/browse/JDK-6902182
Note, however, that the stated goal is that simply enabling debugging by opening the port should not cause any performance penalty.
It looks like, at least in OpenJDK, the bugs were cleaned up by Java 7. I didn't see anything about performance impacts after that.
If you research this further and find negative results, take note of the Java version the testing was done under. Everything I saw referred to versions before 7.
I'd love to hear if anyone encountered performance problems in a recent VM just leaving the port enabled.
If you plan to run the app with remote debugging enabled, it can affect security also. Remote debugging leaves a port open on your machine, and by connecting to it, I can do all sorts of fun things with your application.
The program definitely does a lot more than simply running when in debugging mode, so it is obvious that performance cannot be the same. However, if you read the statement carefully, it says that the new release can run fully optimized code even in debugging mode, which was not possible earlier. Thus the new JVM is much faster than the previous one, which could only run in interpreted mode with no optimization.

Debugging multiple exe program

I have a project where I am required to fix this program that has the tendency to crash very non-deterministically. This piece of software performs lots of calculations and database calls and can have a very high load, meaning lots of clients.
It is a very critical component and without it nothing works. It needs to perform and be able to run without user interaction for long times.
It is actually a native C++/ATL project with COM for communication between its two executables.
I have spent a lot of time now actually studying the code and looking for obvious code flaws, such as not locking of shared variables (those that are obvious), exception handlers that don't do anything with an exception, besides 'return false', even if this could be a critical exception.
But I wanted to know if anyone has some tips for tackling a project like this, where many people have actually attempted to fix the issue and failed, and now you've taken the challenge and don't want to fail.
I am prepared to go far to fix this, but I need some guidance on how to go about it in a good way.
My idea is to first set up a test environment and hope to collect as much information as possible about crashes that do occur, and then find, through logging, stack traces, etc, the points of the crashes. This may or may not be a good way to debug such a project.
Any input is appreciated.
It may be obvious, but my roadmap for such a bugfixing task is:
1. Collect as much information as possible on the crash source (users, developers, etc).
2. Inspect documentation and dependencies.
3. Inspect source code.
4. Build an isolated test env and try to reproduce.
5. If you still can't find the source of the bug, try to sanitize the source code and add a more verbose logging system.
Regards
Log, log, log, log.

What can we do about a randomly crashing app without source code?

I am trying to help a client with a problem, but I am running out of ideas. They have a custom application, written in-house, that runs on a schedule, but it crashes. I don't know how long it has been like this, so I don't think I can trace the crashes back to any particular software updates. The most unfortunate part is there is no longer any source code for the VB6 DLL which contains the meat of the logic.
This VB6 DLL is kicked off by 2-3 function calls from a VB Script. Obviously, I can modify the VB Script to add error logging, but I'm not having much luck getting quality information to pinpoint the source of the crash. I have put logging messages on either side of all of the function calls and determined which of the calls is causing the crash. However, nothing is ever returned in the err object because the call is crashing wscript.exe.
I'm not sure if there is anything else I can do. Any ideas?
Edit: The main reason I care, even though I don't have the source code is that there may be some external factor causing the crash (insufficient credentials, locked file, etc). I have checked the log file that is created in drwtsn32.log as a result of wscript.exe crashing, and the only information I get is an "Access Violation".
I first tend to think this is something to do with security permissions, but couldn't this also be a memory access violation?
You may consider using one of the Sysinternals tools if you truly think this is a problem with the environment such as file permissions. I once used Filemon to figure out all the files my application was touching and discovered a problem that way.
You may also want to do a quick sanity check with Dependency Walker to make sure you are actually loading the DLL files you think you are. I have seen the wrong version of the C runtime being loaded and causing a mysterious crash.
Depending on the scope of the application, your client might want to consider a rewrite. Without source code, they will eventually be forced to do so anyway when something else changes.
It's always possible to use a debugger - either directly on the PC that's running the crashing app or on a memory dump - to determine what's happening to a greater or lesser extent. In this case, where the code is VB6, that may not be very helpful because you'll only get useful information at the Win32 level.
Ultimately, if you don't have the source code then will finding out where the bug is really help? You won't be able to fix it anyway unless you can avoid that code path for ever in the calling script.
You could use the Debugging Tools for Windows, which might help you pinpoint the error, but without the source to fix it, that won't do you much good.
A lazier way would be to call the dll from code (not a script) so you can at least see what is causing the issue and inspect the err object. You still won't be able to fix it, unless the problem is that it is being called incorrectly.
The guy of Coding The Wheel has a pretty interesting series about building an online poker bot which is full of serious technical info, a lot of which is concerned with how to get into existing applications and mess with them, which is, in some way, what you want to do.
Specifically, he has an article on using WinDbg to get at important info, one on how to bend function calls to your own code and one on injecting DLLs in other processes. These techniques might help to find and maybe work around or fix the crash, although I guess it's still a tough call.
There are a couple of tools that may be helpful. First, you can use dependency walker to do a runtime profile of your app:
http://www.dependencywalker.com/
There is a profile menu and you probably want to make sure that the follow child processes option is checked. This will do two things. First, it will allow you to see all of the lib versions that get pulled in. This can be helpful for some problems. Second, the runtime profile uses the debug memory manager when it runs the child processes. So, you will be able to see if buffers are getting overrun and a little bit of information about that.
Another useful tool is process monitor from Mark Russinovich:
http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx
This tool will report all file, registry and thread operations. This will help you determine whether you are bumping into file or registry credential issues.
Process explorer gives you a lot of the same information:
http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx
This is also a Russinovich tool. I find that it is a bit easier to look at some data through this tool.
Finally, using debugging tools for windows or dev studio can give you some insight into where the errors are occurring.
An access violation is almost always a memory error, all the more likely in this case because it's crashing randomly (permissions problems would likely be more obviously reproducible). In the case of a DLL it could be either of the following:
1. There's an error in the code in the DLL itself. This could be something like a memory allocation error or even a simple loop boundary condition error.
2. There's an error when the DLL tries to link out to another DLL on the system. This will generally be caused by a mismatch between DLL versions on the machine.
Your first step should be to try and get a reproducible crash condition. If you don't have a set of circumstances that will crash the system then you cannot know when you have fixed it.
I would then install the system on a clean machine and attempt to reproduce the error on that. Run a monitor and check precisely what other files (dlls etc) are open when the program crashes. I have seen code that crashes on a hyperthreaded Pentium but not on an earlier one - so restoring an old machine as a testbed may be a good option to cover that one. Varying the amount of ram in the machine is also worthwhile.
Hopefully these steps might give you a clue. Hopefully it will be an environment problem and so can be avoided by using the right version of Windows, DLLs etc. However, if you're still stuck with the crash at this point with no good clues, then your options are either to rewrite or to hunt down the problem further by debugging the DLL at assembler level or disassembling it. If you are not familiar with assembly code then both of these are long shots and it's difficult to see what you will gain, and either option is likely to be a massive time sink. Myself, when faced in the past with a particularly low-level, high-intensity problem like this, I have advertised on one of the 'coder for hire' websites and looked for someone with specialist knowledge. Again, you will need a reproducible error to be able to do this.
In the long run a dll without source code will have to be replaced. Paying a specialist with assembly skills to analyse the functions and provide you with flowcharts may well be worthwhile considering. It is good business practice to do this sooner in a controlled manner than later - like after the machine it is running on has crashed and that version of windows is no longer easily available.
You may want to try using Resource Hacker; you may have luck de-compiling the in-house application. It may not give you the full source code, but it may at least give you some more info about what the app is doing, which may also help you determine your culprit.
Add the maximum possible RAM to the machine
This simple and cheap hack has worked for me in the past. Of course YMMV.
Reverse engineering is one possibility, although a tough one.
In theory you can decompile and even debug/trace a compiled VB6 application - this is the easy part, modifying it without source, in all but the most simple cases, is the hard part.
Free compilers/decompilers:
VB decompilers
VB debuggers
Rewrite would be, in most cases, a more successful and faster way to solve the problem.

Diagnosing Deadlocks in Win32 Program

What are the steps and techniques to debug an apparent hang due to a deadlock in a Win32 production process. I heard that WinDbg can be used for this purpose but could you please provide clear hints on how this can be accomplished?
This post should get you started on the various options. Check the posts tagged with Debugging.
Another useful article on debugging deadlocks.
Debugging a true deadlock is actually kind of easy, if you have access to the source and a memory dump (or live debugging session).
All you do is look at the threads, and find the ones that are waiting on some kind of shared resource (for example hung waiting in WaitForSingleObject). Generally speaking, from there it is a matter of figuring out which two or more threads have locked each other up, and then you just have to figure out which one broke the lock hierarchy.
If you can't easily figure out which threads are locked up, use the method shown in this post here to trace the lock chain for each thread. When you get into a loop, the threads in the loop are the ones that are deadlocked.
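Tracing the lock chain until it loops can be sketched as follows. This is not the method from the linked post, just an illustration in Python; the two dictionaries are hypothetical stand-ins for what you read off the debugger (which lock each thread is blocked on, and which thread owns each lock).

```python
def find_deadlock(waiting_for, owned_by, start):
    """Follow thread -> lock it waits on -> owner of that lock, starting at `start`.

    Returns the list of threads forming a loop, or None if the chain ends.
    """
    chain = []
    thread = start
    while thread is not None and thread not in chain:
        chain.append(thread)
        lock = waiting_for.get(thread)   # lock this thread is blocked on, if any
        thread = owned_by.get(lock)      # thread currently holding that lock
    if thread is None:
        return None                      # chain ended at a thread that is not blocked
    return chain[chain.index(thread):]   # threads from the first repeated one onward

# Hypothetical snapshot: T1 holds A and wants B, T2 holds B and wants A,
# T3 is merely queued behind A.
waiting_for = {"T1": "B", "T2": "A", "T3": "A"}
owned_by = {"A": "T1", "B": "T2"}
print(find_deadlock(waiting_for, owned_by, "T3"))  # ['T1', 'T2']
```

Starting from T3 still finds the T1/T2 loop, which matches the point above: a blocked thread outside the loop is a victim of the deadlock, not a participant in it.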
If you are very lazy, you can install Application Verifier, then add your module and select just "locks" from the basic tests.
Then you can run your application under any debugger.
If a critical section deadlock happens, you will find the reason right away.
What language/IDE are you using?
In .Net you can view the threads of an application: Debug->Windows->Threads or Ctrl+Alt+H
Debugging deadlocks can be tricky. I usually do some kind of logging and see where the log stops. I either log to a file or to the debug console using OutputDebugString().
The best thing is to start by adding logging statements. Generally I would recommend adding them around the shared resources that are deadlocking, but adding them more widely might also point to situations or areas of code you weren't expecting. The much publicized stackoverflow.com database issue actually turned out to be log4net! The stackoverflow team never suspected log4net, and only examining the logging (ironically) showed this. I would initially forgo any complicated tools, e.g. WinDbg, since using them is not very intuitive IMHO.
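As a rough sketch of logging around a shared resource, here is a lock wrapper that records when a thread starts waiting, when it gets the lock and when it lets go. `LoggedLock` is a made-up name, and Python is used only for brevity; the same idea applies to a Win32 critical section or mutex.

```python
import logging
import threading

log = logging.getLogger("locks")

class LoggedLock:
    """Wraps threading.Lock and logs waiting/acquired/released with the thread name."""

    def __init__(self, name):
        self._name = name
        self._lock = threading.Lock()

    def __enter__(self):
        log.info("%s waiting for %s", threading.current_thread().name, self._name)
        self._lock.acquire()
        log.info("%s acquired %s", threading.current_thread().name, self._name)
        return self

    def __exit__(self, exc_type, exc, tb):
        self._lock.release()
        log.info("%s released %s", threading.current_thread().name, self._name)
        return False  # do not swallow exceptions raised inside the with-block
```

When the program hangs, each thread whose last line is "waiting for X" with no "acquired X" after it is stuck on that lock, and the most recent "acquired X" without a "released X" names the holder.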

Resources