Application won't start - how to debug? - debugging

We just released a new version of one of our applications, and are hearing early reports that it won't run on a few customer's computers. This is a 32-bit Win32/MFC application that we have been shipping for about 5 or 6 years, so there is a history that the thing works. The problem I have is trying to get it running on the two computers. I can't reproduce the problem in-house, nor do I have any real access to the two computers in question. This much I do know - the InitInstance method is never called. I've checked it out in dependencywalker and everything looks OK (except for missing IEShims and wer.dll - but this is common, even in-house). What other tools do I have at my disposal to diagnose this problem?
Thanks for your help.

When a program crashes during initialization, it will usually leave an entry in the windows application error log. (Control Panel -> Administrative Tools -> Event Viewer)
There should be something along the lines to "AppCrash" or perhaps "SideBySide"

Related

Troubleshooting VB6 App Crash after XP to Win7 Upgrade

I have a VB6 application that I provide support for. This application works on both Windows XP and Windows 7. Some users were migrated from Windows XP to Windows 7 using the User State Migration tool. These users now receive a generic "Application has crashed" Windows error message when they open certain screens (forms) in the application. My assumption is that there is a missing dll/ocx reference, but I'm having trouble tracking it down.
I've tried many/varied troubleshooting techniques:
Full uninstall and reinstall of my application
Manually re-registering all dll's and ocx's that I know are used
Running Process Monitor on a broken computer and a working computer to compare what dll's and ocx's are accessed. The answer might be here but even after filtering out most of the background noise the amount of data is overwhelming. At a minimum I reviewed all of the calls right before it crashes and all of the calls that were not successful. All of the non-successful calls match between working and non-working.
Installed the Windows Debugger Tools and captured a crash dump. Analyzed the crash dump with DebugDiag. DebugDiag says the exception is in msvbvm60.dll. I tried building a PDB file for my exe and loading it in DebugDiag to get more detail about where the exception is occuring but DebugDiag doesn't want to accept the PDB (might be doing something wrong here, but it just seems to ignore it. This same PDB file works fine when I do remote debugging, however.)
I recompiled my VB6 program without any optimizations in PCode. I've read online that sometimes building in PCode, while bad for performance, will tell you the real exception.
Used the above created PDB file to remote debug the VB6 application. The debugger says that the application crashes after the new window has been created, on a line that sets MousePointer = vbHourGlass... To me it seems unlikely that this is the real cause of the error. There are at least 20 other locations in the program where this same line is called and all work fine.
(Forgot about this one)
Used Dependency Walker and profiled the application on both a working and non-working computer. All errors found by dependency walker were the same between the two computers. There were no additional dependencies found on a working computer, and all missing dependencies on the non-working computer were also missing on the working one.
None of these actions changed my error message or showed me what the error is (unless it really is the mouse cursor issue)... There are no entries in the Windows Event Log related to the app crash.
The non-working and working computers all have the same base Windows 7 image, the only difference is whatever is being changed by USMT, which further convinces me that this is some kind of quirky configuration change or a missing dll/ocx or perhaps an unregistered dll/ocx.
Any ideas or thoughts on how I can track down the root cause of the issue would be greatly appreciated.
Update 1 - Response to questions
#MarkHall I have tried running it as admin, though not with UAC off. The application runs fine on a Windows 7 box as a non-admin with full UAC. Windows XP was 32-bit, Windows 7 is 64-bit, but again it works just fine on a like for like box where the user was not migrated from Windows XP.
#Beaner It's possible that it stores settings somewhere that have been corrupted, but the remote debugging leads me to think that it's more likely something else since it seems to die on a step related to the UI, which then makes me think it's probably a missing dll/ocx reference.
#Bob77 The application is installed into Program Files (x86). While many of the libraries do reside in the same folder, they are all registered.
Peter, often I've noticed that the debugger will indicate a line of code that is actually incorrect, depending on WHERE in the actual assembly language the fault occurs. You should look REAL close around your statement that sets the cursor to vbHourGlass. Your exception is PROBABLY happening BEFORE that line of code, but that line is what the debugger thinks is the actual faulted line of code.
Since you said it happens when a window OPENS, I'd look real close at any ocx's you may have referenced on the form, but perhaps NOT actually being used, or called. You might have one there that you don't intend to be there, that could be causing security issues, or something on Win7? Edit the .frm file by hand if you have to, and look at all the GUIDs the form references.
It is possible that one machine is using PER-USER registration, and the other is using PER-SYSTEM registration?? I don't know...
I would take a much closer look at the form that you are trying to open, and be VERY cautious of everything you are doing in the form load events, and so on. This sounds like it could be something as stupid as Windows Aero being enabled on one system, and not another, or some other sort of "Theme" setting that is throwing the VB Form Rendering routine into a hissyfit... Perhaps even something as stupid as a transparent color index in the icon you selected for that from?
If you are still developing this app, (or at least maintaining it), create an entirely NEW form, and re-create all the controls, etc, on the form (resist the temptation to copy/paste them from the old one...), and then see if THAT does the trick. Then, copy all the event code to the new form one event at a time, with at LEAST enough event code to make the form function, even if it's just a "dead form", that loads no data, or whatever the form is supposed to do. Check and debug after each change, and you WILL find it eventually. Of course, make sure you isolate one of the defunct systems to have a platform that you can duplicate the issue on, or then it's just guessing. I find that using something like Acronis w/ Universal Restore is a great option to then take the image file into a good HV, like VirtualBox, and then restore that image as a VM, so you can debug without interfering with your actual users. This sounds like a lot of work, but then again, so is re-writing an application that already exists, right? :)
Failing THAT... /* and */ are your friends!! (Well, we're dealing with VB, so ' would be your best friend! heh... But I'd start commenting out all the code on the form until that sucker opens. Then once it opens, start putting one line back at a time, and re-running it... That's called "VooDoo Debugging", but sometimes, you gotta do what you gotta do...
THANKS A LOT PETER! :) Now you got ME so involved in this, I feel like I'M the one debugging this sucker! Like if it was MY code I was trying to fix! :)
Let me know if any of this helps... I am actually quite interested in what you discover.

Program runs slow on just a couple of computers

I have a program that I run on multiple network PCs. When I compiled the most recent version, it runs extremely slowly on 2 PCs on the network, but runs fine for everyone else.
This used to happen with my old dev PC when I had an additional 2gb RAM installed. When I would remove the additional 2gb and recompile, it would then work fine for everyone.
Now, I am on a completely new machine and am having the same issue. I've tried to rebuild the project after rebooting, but still have the same issue.
For all other PCs, the program loads in about 3-5 seconds. On these 2 PCs, it takes anywhere from 45 seconds to 1.5 mins to load...
One of the PCs is an older Dell Dimension 8200, but the other is a newer OptiPlex that is identical to several other PCs on the network, so this is what is really making it so confusing.
For now, I've had to revert to the old version so it will run correctly for everyone.
Does anyone have any idea of anything to try?
Thanks in advance!!!
Edit:
Ok, it was an exhausting day yesterday trying various things to solve this issue. Here is what I tried and where the problem begins:
Using the new program
Went back to old versions of all updated components, but still had the same issue
Using the old program
I decided to go back to the drawing board and start from the old version of the application and incrementally add the new features a small piece at a time.
Recompiled the old version using the old components - program works fine
Updated to new DevExpress components - program works fine
Updated to new ESBPCS components - program works fine
Updated to new DeepSoftware components - program works fine
Ok, so now we know there is nothing with the component sets I've updated...
Added 1 image to each of 2 image lists - program works fine
Added new database table - program works fine
Added code to open and close the new table - program works fine
Added new action to action list and added a menu item and toolbar button to new action (action does nothing at this point) - program works fine
Added a new BLANK form to the application and added code to open the new form - BAM!!!
So, adding just one form to the application is what's causing the issue! I removed all the code for the opening of the form, commented out the uses clauses and removed the uses entry from the project source and everything is back to normal!
Anybody have any idea about this?
Thanks!
Edit 2:
For #Warren P - here is my .DPR source:
program Scheduler;
uses
ExceptionLog,
Forms,
SchedulerMainUnit in 'SchedulerMainUnit.pas' {FrmMain},
SchedulerDBInfoUnit in 'SchedulerDBInfoUnit.pas' {FrmDBInfo},
SchedulerHistoryUnit in 'SchedulerHistoryUnit.pas' {FrmHistory},
SchedulerOptionsUnit in 'SchedulerOptionsUnit.pas' {FrmOptions},
SchedulerExtVersionUnit in 'SchedulerExtVersionUnit.pas' {FrmExtVersion},
SchedulerSplashUnit in 'SchedulerSplashUnit.pas' {FrmSplash},
SchedulerInfoUnit in 'SchedulerInfoUnit.pas' {FrmInfo},
SchedulerShippedUnit in 'SchedulerShippedUnit.pas' {FrmShipped}; {<-- This is the new form with the issue}
{$R *.res}
begin
Application.Initialize;
Application.Title := 'SmartWool WIP Scheduling Assistant';
Application.CreateForm(TFrmMain, FrmMain);
Application.CreateForm(TFrmDBInfo, FrmDBInfo);
Application.CreateForm(TFrmHistory, FrmHistory);
Application.CreateForm(TFrmOptions, FrmOptions);
Application.CreateForm(TFrmExtVersion, FrmExtVersion);
Application.Run;
end.
And here is the intialization section of the main form to create the splash:
initialization
FrmSplash:=TFrmSplash.Create(Application);
FrmSplash.Show;
FrmSplash.Refresh;
Edit 3:
Anybody??? Please?
It could be that the program is waiting for timeouts when trying to access resources that are not available on that machine such as network drives or Internet hosts.
Try running Process Monitor when starting up your program and look for file open calls. Filter the output so it only shows your process.
http://technet.microsoft.com/en-us/sysinternals/bb896645
Performance problems initially can seem very daunting at first.
I have been on many teams where people have tried to guess at a reason for performance problems. This sometimes works, but is far less effective than actually measuring the code.
When reproducible on a development machine, I would recommend a profiler.
There was a previous question that asked about
Delphi Profiling tools which has several possible tools you could use.
When you can't reproduce the problem on your development machine, then it becomes a bit more difficult, but not impossible. Typically I have found that problems are related to an application dependency that is different, and not performing well. Understanding the external influences on your application can help pinpoint the problem.
Specifically common external problems in some of my applications.
Network
Database
Application Servers
Installation or Data File Location (i.e. Disk Performance)
Virus and Malware Scanners
Other application interring with yours such as a virus.
To monitor for items related to the network (i.e. Database, web services, etc...)
I typically use Wireshark which allows me to see if resources are responding in expected times. My most common problem is poor performing DNS and can found using Wireshark.
You can use the AutoRuns program to determine everything that starts up when your computer does, it's useful in determine differences between machines.
But most of all I have logging that can be turned on in my applications and this allows me to isolate the problem to a specific area of code. This narrowing down to a specific section of code reduces the guessing, and allows you to focus on a few possible problems.
I created a log function for this that I call at specific places (in your case especially during startup). It adds a timestamp to each log text and stores them in a TMemo that is regularly saved. Not only very helpful when debugging, but may also shed some light on your problem.
Are you using code signing - ie Microsoft Authenticode? If so, then outdated certificate authorities on the computers can cause significant delays to startup.
First, I would try to defragment the hard disk. If still slow, I would check the power supply. Maybe your hard disk are getting insufficient energy.
Check if there is the same antivirus software on those 2 problematic computers. If so, then your Delphi application may match byte pattern used in some virus made in Delphi. Update virus definitions to solve it, or report false alarm to antivirus company, or change antivirus software.
Check if there isn't any printer installed on those 2 problematic computers. If it is so, then add any printer and try again.
Idea 1:
One reason I have seen for very slow application load time, is when printing or reporting system components like Developer Express Express Print, are in your application.
The problem I saw when using Developer Express Printing components, is that I had an offline or non-responsive network printer in my list of printers (check the control panel printer icon) that was not responding. Some of those Developer Express components seem to read some information from each printer you have installed, and the solution was to go to those clients, and delete old printers from their control panel, that were no longer being used. Each not-responding network printer added up to 60 seconds for a TCP Timeout, to the startup time of my application.
Update - Idea 2:
Download MS DebugView and install it on the machine that runs slowly. Now go back to your main development PC, open the IDE, open your main project file (right click on the project, view project source in project viewer), this will show you the contents of your main project source file (.dpr). go to the main begin....end. block. Now set a breakpoint on the main begin statement, and single step INTO (not OVER) and you will see all the module initialization sections. In each one add this: OutputDebugString('ModuleName').
Now when you run this inside the Delphi Ide you will see messages, and see how far apart they come in, and understand what is taking a long time to initialize. Instead of installing the delphi ide onto the machine that runs slowly, Debug View (which is less than 400kb single executable) will be run, and it will show you these debug messages, along with a nice time display (##.# seconds) for each message.
MS Debug view is here.
Are you allowing the forms to be constructed on initialization within the DPR source? If so, you may do well to consider whether or not you want those forms sucking up memory the entire time, more-over if you want those forms to be wasting the application's time on load.
A rule of thumb: If the form is used a LOT during the application's execution, allow it to be constructed when the application loads (this will work out faster over-all than constructing the instance "on-demand").
If the form is not used very often at all (for example, a Dialog or an About Box), delete the "Application.CreateForm" line from the DPR source, and instead construct your instance on request...
var
LForm: TfrmAbout;
begin
Application.CreateForm(LForm, TfrmAbout);
try
LForm.ShowModal;
finally
LForm.Free;
end;
end;
Now that form (which may not even be displayed during the program's execution) is not sucking up system resources, and will not slow down the application's load time.
It may not solve your problem 100%, but it should certainly help!

Delphi program & Windows 64-bit compatibility issue

I have some customers/candidate who complained that my program doesn't work on their Windows 7 64 bit version (confirmed with screenshots). The errors were strange, for example:
in the trial version i am
getting a error message whenever i
click on \"mark\" \"delete\" \"help\".
error msg is: Access violation at
address 0046C978 in module
\'ideduper.exe.\' read of address
00000004
windows 7 ultimate 64bit. i7 920
#2.67GHz 9gb or ram
'Mark', 'delete' and 'help' are just standard TToolButton on TToolbar.
The other example is failing to get a thumbnail from IExtractImage.
I have told them to try Compatibility mode but still doesn't work.
The problem is when I tested it on Windows 7 HP 64-bit on my computer (which I've done it before released it actually) it just works fine! So I don't know what causing it
Do you have any advice ?Are different Windows package (home basic,premium,ultimate,etc) treating 32 bit prog differently ?Are the newer version of Delphis (I use 2006) more compatible with 64 bit Windows ? Do I need to wait until 64 bit compiler out?
Thanks in advance
Your best bet in my opinion is to add MadExcept or EurekaLog or something similar to your application and give it to the customer to try again. MadExcept will generate log with stack trace, which will give you a clearer view of what is happening there.
To answer 2nd part of the question, 32bit Delphi programs work fine on 64bit Windows 7. I think it's more likely you have some memory management problems and the customer just happens to stumble upon them while you don't. Use FastMM4 to track those down.
Your applications is trying to access an invalid pointer. Changing environment may surface issues that are hidden in others. Check your application, and use FastMM + JCL+JCVL/MadExcept/EurekaLog to get a detailed trace of the issue. Some Windows APIs may have some stricter call requisites under 7 and/or 64 bit, but we would have to know what your app actually cals.
A free alternative to MadExcept is JCL Debug stuff. However it is less thorough and doesn't include the cool dialog box to send the stack trace to you via email, or as a file you can attach and manually email.
MadExcept is worth the money, and it is free for non-commercial use. You could try it first on your own PC, observe its functionality, and be sure it functions the way you want, and then buy it.
If buying Delphi is worth it (and it is!) then buying mad Except is a no brainer. But if you insist on rolling your own, JCLDebug (part of jedi code library) is also pretty nice.
Give them a stripped down version of your app and see when the problem goes away. I am betting it is your code as I never had any problems with my (hundreds of) W7/64 clients.
I'd be willing to bet it's an issue in your code. The reason it's failing on your customer's machine and not yours is that your machine probably has the default Data Execution Protection (DEP) enabled (which is turned on only for essential Windows programs and services), while your customer's computer is actually using DEP as intended (turned on for all programs and services).
The default setting (which is compatible with older versions of Windows, like 95/98/ME), allows software to execute code from what should be data segments. The more strict setting won't allow this, and raises a system-level exception instead.
You can check the settings between the two by looking at System Properties. I'm not at a Win7 machine right now, but on WinXP you get there by right-clicking on My Computer, choosing Properties, clicking on Performance Options, and then selecting the "Data Execution Prevention" tab. Find it on Vista/Win7 by using the Help; search for Data Execution Protection.
The solution, as previous answers have told you, is to install MadExcept or EurekaLog. You can also get a free version as part of JEDI, in JCLDebug IIRC. I haven't used it, so I can't vouch for it personally. I've heard it's pretty good, though.
If you don't want to go that route, set a breakpoint somewhere in the startup portion of your app (make sure to build with debugging info turned on). Run your app until the breakpoint is hit, and then use the IDE's Search->Goto Address (which is disabled until the breakpoint is hit). Enter the address from the exception dialog (not the one that's almost all zeros, but the 0046C978 address, prefixed with $ to indicate it's in hex) as in $0046C978. You'll probably end up in the CPU window looking at assembly code, but you can usually pick out a line of Delphi code of some sort that can sometimes give you a place to start looking.
In addition to all previous suggestions, I'll add the difference in accessing Registry under WOW64 compared to Win32. If your application is accessing Registry to read or write some settings, you should be aware of this. First, take a look at this and this page in the MSDN. On this page you will find 2 flags that determine the access you get to Registry from 32- or 64-bit application. KEY_WOW64_64KEY is the one that you should use.
In any case, I agree with others about using madExcept (or any other similar tool) to be able to find the exact cause of your problems.

Application error: fault address 0x00012afb (Expert)

I need some "light" to get a solution. Probably there are tons of things that cause this problem, but maybe somebody could help me.
Scenario: a Windows server running 24/7 a PostgreSQL database and others server applications (for processing tasks on database, etc...). There are differents servers scenarios (~30), with different hardware and windows versions (XP SP3/ WinServer, etc... all NT based). All aplications were written in Delphi7, and link to DLLs (in D7 also).
After some days (sometimes a week, sometimes a couple of months), Windows begins to act strange, like not opening start menu, some buttons are missing in dialogs. And soon some applications do not open, raising a event on eventviewer:
Faulting application x, version y, faulting module kernel32.dll, version 5.1.2600.5781, fault address 0x00012afb
In mean while, others applications open fine, like notepad, iexplore, etc... but SOME of my applications don't, with only event log described above. But if we do not restart system, in a few days even cmd.exe stops open, (and all other applications) with same error on eventlog.
I've tried to find 'what' can cause this, but with no sucess. So, and any advice will be welcome.
Thanks in advance.
I think you are running out of resource handles (Window handles). You can verify this by having a look at the system properties in Sysinternals Process Explorer (a better task manager). I think even the default task manager can help out to display a handle count. Then you can identify which application is causing the trouble.
Once you know the application leaking and if it is yours, you can use Rational purify or Boundschecker to drill down to the problem. If you do not have money for these tools you will have to reduce the problem manually a bit by deactivating some features for example and see if the handle count still increases...
Not sure if it is the problem you are experiencing maybe it is completely unrelated. But easy to check. The track is that some app is stealing some global resources as you experience trouble with other applications. Applications like notepad do not use much resources so appear to work fine, heavy apps are more likely to show up the trouble.
Hope it helps.

Looking for advice on solving problems that occur only on your machine

I'm stuck trying to debug a problem which only occurs on my machine. It doesn't exhibit on any of the other devs' systems, nor on our production test server. I've tried pretty much everything I can think of short of completely wiping my hard disk and starting from scratch, or sneaking into the office in the middle of the night to swap my computer with someone else's.
This brings to mind the titular question, then: short of those drastic measures, what do you do when trying to resolve issues that no one else has? I'm open to advice that's general or specific.
[Not sure if this should be CW or not.]
Have you attached a debugger to the program to find the exact point of failure? That is what I would do first.
Sometimes third party software can be the root cause of these sorts of issues. Things like Anti-virus software install low-level filesystem and network drivers that can cause random intermittent failures. You can try killing all processes that aren't base OS services and your app.
Depending on your OS there are different tools that you can use to see what's going on under the covers. E.g. on Windows you can use Process Monitor to see what Registry keys it opens, what DLLs get loaded, etc. You can run this on your machine and on the success machines and compare to see if perhaps some required module is missing .
But seriously, use a debugger. That's what they are there for.
Two things:
I start with the obvious: What's different on your box? More memory? Odball PCI card? Different Microsoft APIs or service packs?
For oddball random software and/or OS crashes:
Check your system for heat issues.
Check your RAM for bad bits.
In this situation, I would try to check out the code and cleanly rebuild it from a different directory to make sure that there are no miscellaneous files in your working directory that are causing a problem.
If you are doing work against a database, I would also try tearing down the database and reconstructing it, possibly using a dump from another developer's machine.
Check the versions of any external third party software - database version, OS version, even software patches.
Look at the configuration on someone else's machine who doesn't have the problem and compare.
Get another developer to sit at your workstation and try to reproduce the problem and also go to their workstation and try it. True story - a fellow developer had a bug that he could only reproduce on his machine...it turns out that he was doing something slightly different in the GUI that no one else was doing (tabbing to a button and then hitting enter, everyone else just hit enter). It never occurred to him that other people might just hit enter to submit, because that "didn't make sense" to him.

Resources