Troubleshooting VB6 App Crash after XP to Win7 Upgrade - debugging

I have a VB6 application that I provide support for. This application works on both Windows XP and Windows 7. Some users were migrated from Windows XP to Windows 7 using the User State Migration tool. These users now receive a generic "Application has crashed" Windows error message when they open certain screens (forms) in the application. My assumption is that there is a missing dll/ocx reference, but I'm having trouble tracking it down.
I've tried many/varied troubleshooting techniques:
Full uninstall and reinstall of my application
Manually re-registering all dll's and ocx's that I know are used
Running Process Monitor on a broken computer and a working computer to compare what dll's and ocx's are accessed. The answer might be here but even after filtering out most of the background noise the amount of data is overwhelming. At a minimum I reviewed all of the calls right before it crashes and all of the calls that were not successful. All of the non-successful calls match between working and non-working.
Installed the Windows Debugger Tools and captured a crash dump. Analyzed the crash dump with DebugDiag. DebugDiag says the exception is in msvbvm60.dll. I tried building a PDB file for my exe and loading it in DebugDiag to get more detail about where the exception is occuring but DebugDiag doesn't want to accept the PDB (might be doing something wrong here, but it just seems to ignore it. This same PDB file works fine when I do remote debugging, however.)
I recompiled my VB6 program without any optimizations in PCode. I've read online that sometimes building in PCode, while bad for performance, will tell you the real exception.
Used the above created PDB file to remote debug the VB6 application. The debugger says that the application crashes after the new window has been created, on a line that sets MousePointer = vbHourGlass... To me it seems unlikely that this is the real cause of the error. There are at least 20 other locations in the program where this same line is called and all work fine.
(Forgot about this one)
Used Dependency Walker and profiled the application on both a working and non-working computer. All errors found by dependency walker were the same between the two computers. There were no additional dependencies found on a working computer, and all missing dependencies on the non-working computer were also missing on the working one.
None of these actions changed my error message or showed me what the error is (unless it really is the mouse cursor issue)... There are no entries in the Windows Event Log related to the app crash.
The non-working and working computers all have the same base Windows 7 image, the only difference is whatever is being changed by USMT, which further convinces me that this is some kind of quirky configuration change or a missing dll/ocx or perhaps an unregistered dll/ocx.
Any ideas or thoughts on how I can track down the root cause of the issue would be greatly appreciated.
Update 1 - Response to questions
#MarkHall I have tried running it as admin, though not with UAC off. The application runs fine on a Windows 7 box as a non-admin with full UAC. Windows XP was 32-bit, Windows 7 is 64-bit, but again it works just fine on a like for like box where the user was not migrated from Windows XP.
#Beaner It's possible that it stores settings somewhere that have been corrupted, but the remote debugging leads me to think that it's more likely something else since it seems to die on a step related to the UI, which then makes me think it's probably a missing dll/ocx reference.
#Bob77 The application is installed into Program Files (x86). While many of the libraries do reside in the same folder, they are all registered.

Peter, often I've noticed that the debugger will indicate a line of code that is actually incorrect, depending on WHERE in the actual assembly language the fault occurs. You should look REAL close around your statement that sets the cursor to vbHourGlass. Your exception is PROBABLY happening BEFORE that line of code, but that line is what the debugger thinks is the actual faulted line of code.
Since you said it happens when a window OPENS, I'd look real close at any ocx's you may have referenced on the form, but perhaps NOT actually being used, or called. You might have one there that you don't intend to be there, that could be causing security issues, or something on Win7? Edit the .frm file by hand if you have to, and look at all the GUIDs the form references.
It is possible that one machine is using PER-USER registration, and the other is using PER-SYSTEM registration?? I don't know...
I would take a much closer look at the form that you are trying to open, and be VERY cautious of everything you are doing in the form load events, and so on. This sounds like it could be something as stupid as Windows Aero being enabled on one system, and not another, or some other sort of "Theme" setting that is throwing the VB Form Rendering routine into a hissyfit... Perhaps even something as stupid as a transparent color index in the icon you selected for that from?
If you are still developing this app, (or at least maintaining it), create an entirely NEW form, and re-create all the controls, etc, on the form (resist the temptation to copy/paste them from the old one...), and then see if THAT does the trick. Then, copy all the event code to the new form one event at a time, with at LEAST enough event code to make the form function, even if it's just a "dead form", that loads no data, or whatever the form is supposed to do. Check and debug after each change, and you WILL find it eventually. Of course, make sure you isolate one of the defunct systems to have a platform that you can duplicate the issue on, or then it's just guessing. I find that using something like Acronis w/ Universal Restore is a great option to then take the image file into a good HV, like VirtualBox, and then restore that image as a VM, so you can debug without interfering with your actual users. This sounds like a lot of work, but then again, so is re-writing an application that already exists, right? :)
Failing THAT... /* and */ are your friends!! (Well, we're dealing with VB, so ' would be your best friend! heh... But I'd start commenting out all the code on the form until that sucker opens. Then once it opens, start putting one line back at a time, and re-running it... That's called "VooDoo Debugging", but sometimes, you gotta do what you gotta do...
THANKS A LOT PETER! :) Now you got ME so involved in this, I feel like I'M the one debugging this sucker! Like if it was MY code I was trying to fix! :)
Let me know if any of this helps... I am actually quite interested in what you discover.

Related

Spurious "cannot load control, license not found" error?

When I try to load a form in the designer, it shows "runtime error 0" and produces a log file which contains:
Line 15: Cannot load control xxxx; license not found.
But the control in question DOES NOT have any licensing restrictions. It has no installer and requires only registration (regsvr32).
Not only that, but for years this had worked without any problems and only just recently this has started. It affects a number of forms which have any controls from a particular OCX.
So it appears that something is fooling VB6 into either thinking a license check has failed, or at least to show a nonsensical error message.
I have tried to trace this using Process Monitor but I couldn't spot any useful clues in the logs. At least, nothing which was obviously problematic.
Any ideas what could cause this? I'm at a loss so far to find a cause.
Thanks
I'm assuming this is a third-party OCX... Many such products came with their own installer which generates the license file; simply copying, and even regsvr32ing the .ocx is not sufficient to use it in the development environment. If you still have the original installation routine, you can try running that to regenerate the license. Failing that, you could look for an appropriately-named .lic file on an existing, working development machine (in \Windows\System32, or the VB6 installation directory, or in the directory into which the .ocx is installed) and manually copy that to the same place on the new development machine.
(Answering my own question, should anyone else run into this again).
As far as I can determine this error was caused by either a subtly corrupt FRX file and/or an FRM/FRX file pair being out of sync.
By going back in source control I could eventually locate a revision where there was no problem. This alone seemed to eliminate anything in the computing environment from being the cause. (ie, bad VB6 installation, disk space, etc. etc.)
I manually re-did certain changes and brought that older code back up to date, and so far the problem has not reappeared.
EDIT the struck-out text was not incorrect but was not specific enough -- I have since learned what seems to be the root cause.
The problem was that we loaded a 32-bit ICO file (icon) into an imagelist in one of the VB6 forms. Now, traditionally 32-bit color icons were not usable in VB6 and you would get an error even trying to do this. However for some reason certain Windows PCs will now allow this - which can be a time bomb.
The problem is: forms saved that way can cause the errors in this question when run on a different PC which does NOT support such icons.
This will occur in the IDE when the form is loaded, OR if a compiled EXE is run on a different computer which respects the original VB6 icon limitations!
I don't know why the totally meaningless "licensing" error message is shown when this happens.
In my case we didn't intentionally introduce this icon, it was a mistake, and so it took a LONG time to debug and eventually figure this out (plus some very valuable advice from people on VBForums).
I've created a different question specifically to try and get at what underlying element of Windows has changed causing this problem.

How to find out why a MFC program closes silently

I have the following bug in a program:
An MDI MFC program closes silently on Windows 7 (terminates the process without prompting to save changes and without displaying any "crash" dialog) when the user performs this operation : Click a context menu item.
But this happens only on that PC, at least, for the moment. No other PC has encountered that problem. However the bug can always be reproduced following the same steps but only on that PC.
I want to know the reason but probably the customer won't allow me to install many programs to debug, so I need to be able to log when the program terminates, or print the stack in Release version but I'm quite lost.
I had faced a similar bug before and eventually I fixed it logging line by line and changing the problematic part, but I guess there are much better ways to find bug reasons than this.
I have tried on my development PC to create minidumps on Release mode but if there is no exception thrown on that PC, (I haven't confirmed that yet, though...) maybe it's pointless.
Also used an available class on codeproject (Stackwalker) but I don't manage to print all the function calls. Only on simple console programs, but not on MDI or even SDI.
Any ideas on how to find out the reason? Thanks in advance.

Windows 7 and VB6: Event Error ID 1000

I have a completely random error popping up on a particular piece of software out in the field. The application is a game written in VB6 and is running on Windows 7 64-bit. Every once in a while, the app crashes, with a generic "program.exe has stopped responding" message box. This game can run fine for days on end until this message appears, or within a matter of hours. No exception is being thrown.
We run this app in Windows 2000 compatibility mode (this was its original OS), with visual themes disabled, and as an administrator. The app itself is purposely simple in terms of using external components and API calls.
References:
Visual Basic for Applications
Visual Basic runtime objects and procedures
Visual Basic objects and procedures
OLE Automation
Microsoft DAO 3.51 Object Library
Microsoft Data Formatting Object Library
Components:
Microsoft Comm Control 6.0
Microsoft Windows Common Controls 6.0 (SP6)
Resizer XT
As you can see, these are pretty straightforward, Microsoft-standard tools, for the most part. The database components exist to interact with an Access database used for bookkeeping, and the Resizer XT was inserted to move this game more easily from its original 800x600 resolution to 1920x1080.
There is no networking enabled on the kiosks; no network drivers, and hence no connections to remote databases. Everything is encapsulated in a single box.
In the Windows Application event log, when this happens, there is an Event ID 1000 faulting a seemingly random module -- so far, either ntdll.dll or lpk.dll. In terms of API calls, I don't see any from ntdll.dll. We are using kernel32, user32, and winmm, for various file system and sound functions. I can't reproduce as it is completely random, so I don't even know where to start troubleshooting. Any ideas?
EDIT: A little more info. I've tried several different versions of Dependency Walker, at the suggestion of some other developers, and the latest version shows that I am missing IESHIMS.dll and GRPSVC.dll (these two seems to be well-known bugs in Depends.exe), and that I have missing symbols in COMCTRL32.dll and IEFRAME.dll. Any clues there?
The message from the application event log isn't that useful - what you need is a post mortem process dump from your process - so you can see where in your code things started going wrong.
Every time I've seen one of these problems it generally comes down to a bad API parameter rather than something more exotic, this may be caused by bad data coming in, but usually it's a good ol fashioned bug that causes the problem.
As you've probably figured already this isn't going to be easy to debug; ideally you'd have a repeatable failure case to debug, instead of relying on capturing dump files from a remote machine, but until you can make it repeatable remote dumps are the only way forwards.
Dr Watson used to do this, but is no longer shipped, so the alternatives are:
How to use the Userdump.exe tool to create a dump file
Sysinternals ProcDump
Collecting User-Mode dumps
What you need to get is a minidump, these contain the important parts of the process space, excluding standard modules (e.g. Kernel32.dll) - and replacing the dump with a version number.
There are instructions for Automatically Capturing a Dump When a Process Crashes - which uses cdb.exe shipped with the debugging tools, however the crucial item is the registry key \\HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\AeDebug
You can change your code to add better error handling - especially useful if you can narrow down the cause to a few procedures and use the techniques described in Using symbolic debug information to locate a program crash. to directly process the map files.
Once you've got a minidump and the symbol files WinDbg is the tool of choice for digging into these dumps - however it can be a bit of a voyage to discover what the cause is.
The only other thing I'd consider, and this depends on your application structure, is to attempt to capture all input events for replay.
Another option is to find a copy of VMWare 7.1 which has replay debugging and use that as the first step in capturing a reproducible set of steps.
Right click your executable object and let it be WINXP compatible pending
when you discover source of the problem to finally solve it

Program runs slow on just a couple of computers

I have a program that I run on multiple network PCs. When I compiled the most recent version, it runs extremely slowly on 2 PCs on the network, but runs fine for everyone else.
This used to happen with my old dev PC when I had an additional 2gb RAM installed. When I would remove the additional 2gb and recompile, it would then work fine for everyone.
Now, I am on a completely new machine and am having the same issue. I've tried to rebuild the project after rebooting, but still have the same issue.
For all other PCs, the program loads in about 3-5 seconds. On these 2 PCs, it takes anywhere from 45 seconds to 1.5 mins to load...
One of the PCs is an older Dell Dimension 8200, but the other is a newer OptiPlex that is identical to several other PCs on the network, so this is what is really making it so confusing.
For now, I've had to revert to the old version so it will run correctly for everyone.
Does anyone have any idea of anything to try?
Thanks in advance!!!
Edit:
Ok, it was an exhausting day yesterday trying various things to solve this issue. Here is what I tried and where the problem begins:
Using the new program
Went back to old versions of all updated components, but still had the same issue
Using the old program
I decided to go back to the drawing board and start from the old version of the application and incrementally add the new features a small piece at a time.
Recompiled the old version using the old components - program works fine
Updated to new DevExpress components - program works fine
Updated to new ESBPCS components - program works fine
Updated to new DeepSoftware components - program works fine
Ok, so now we know there is nothing with the component sets I've updated...
Added 1 image to each of 2 image lists - program works fine
Added new database table - program works fine
Added code to open and close the new table - program works fine
Added new action to action list and added a menu item and toolbar button to new action (action does nothing at this point) - program works fine
Added a new BLANK form to the application and added code to open the new form - BAM!!!
So, adding just one form to the application is what's causing the issue! I removed all the code for the opening of the form, commented out the uses clauses and removed the uses entry from the project source and everything is back to normal!
Anybody have any idea about this?
Thanks!
Edit 2:
For #Warren P - here is my .DPR source:
program Scheduler;
uses
ExceptionLog,
Forms,
SchedulerMainUnit in 'SchedulerMainUnit.pas' {FrmMain},
SchedulerDBInfoUnit in 'SchedulerDBInfoUnit.pas' {FrmDBInfo},
SchedulerHistoryUnit in 'SchedulerHistoryUnit.pas' {FrmHistory},
SchedulerOptionsUnit in 'SchedulerOptionsUnit.pas' {FrmOptions},
SchedulerExtVersionUnit in 'SchedulerExtVersionUnit.pas' {FrmExtVersion},
SchedulerSplashUnit in 'SchedulerSplashUnit.pas' {FrmSplash},
SchedulerInfoUnit in 'SchedulerInfoUnit.pas' {FrmInfo},
SchedulerShippedUnit in 'SchedulerShippedUnit.pas' {FrmShipped}; {<-- This is the new form with the issue}
{$R *.res}
begin
Application.Initialize;
Application.Title := 'SmartWool WIP Scheduling Assistant';
Application.CreateForm(TFrmMain, FrmMain);
Application.CreateForm(TFrmDBInfo, FrmDBInfo);
Application.CreateForm(TFrmHistory, FrmHistory);
Application.CreateForm(TFrmOptions, FrmOptions);
Application.CreateForm(TFrmExtVersion, FrmExtVersion);
Application.Run;
end.
And here is the intialization section of the main form to create the splash:
initialization
FrmSplash:=TFrmSplash.Create(Application);
FrmSplash.Show;
FrmSplash.Refresh;
Edit 3:
Anybody??? Please?
It could be that the program is waiting for timeouts when trying to access resources that are not available on that machine such as network drives or Internet hosts.
Try running Process Monitor when starting up your program and look for file open calls. Filter the output so it only shows your process.
http://technet.microsoft.com/en-us/sysinternals/bb896645
Performance problems initially can seem very daunting at first.
I have been on many teams where people have tried to guess at a reason for performance problems. This sometimes works, but is far less effective than actually measuring the code.
When reproducible on a development machine, I would recommend a profiler.
There was a previous question that asked about
Delphi Profiling tools which has several possible tools you could use.
When you can't reproduce the problem on your development machine, then it becomes a bit more difficult, but not impossible. Typically I have found that problems are related to an application dependency that is different, and not performing well. Understanding the external influences on your application can help pinpoint the problem.
Specifically common external problems in some of my applications.
Network
Database
Application Servers
Installation or Data File Location (i.e. Disk Performance)
Virus and Malware Scanners
Other application interring with yours such as a virus.
To monitor for items related to the network (i.e. Database, web services, etc...)
I typically use Wireshark which allows me to see if resources are responding in expected times. My most common problem is poor performing DNS and can found using Wireshark.
You can use the AutoRuns program to determine everything that starts up when your computer does, it's useful in determine differences between machines.
But most of all I have logging that can be turned on in my applications and this allows me to isolate the problem to a specific area of code. This narrowing down to a specific section of code reduces the guessing, and allows you to focus on a few possible problems.
I created a log function for this that I call at specific places (in your case especially during startup). It adds a timestamp to each log text and stores them in a TMemo that is regularly saved. Not only very helpful when debugging, but may also shed some light on your problem.
Are you using code signing - ie Microsoft Authenticode? If so, then outdated certificate authorities on the computers can cause significant delays to startup.
First, I would try to defragment the hard disk. If still slow, I would check the power supply. Maybe your hard disk are getting insufficient energy.
Check if there is the same antivirus software on those 2 problematic computers. If so, then your Delphi application may match byte pattern used in some virus made in Delphi. Update virus definitions to solve it, or report false alarm to antivirus company, or change antivirus software.
Check if there isn't any printer installed on those 2 problematic computers. If it is so, then add any printer and try again.
Idea 1:
One reason I have seen for very slow application load time, is when printing or reporting system components like Developer Express Express Print, are in your application.
The problem I saw when using Developer Express Printing components, is that I had an offline or non-responsive network printer in my list of printers (check the control panel printer icon) that was not responding. Some of those Developer Express components seem to read some information from each printer you have installed, and the solution was to go to those clients, and delete old printers from their control panel, that were no longer being used. Each not-responding network printer added up to 60 seconds for a TCP Timeout, to the startup time of my application.
Update - Idea 2:
Download MS DebugView and install it on the machine that runs slowly. Now go back to your main development PC, open the IDE, open your main project file (right click on the project, view project source in project viewer), this will show you the contents of your main project source file (.dpr). go to the main begin....end. block. Now set a breakpoint on the main begin statement, and single step INTO (not OVER) and you will see all the module initialization sections. In each one add this: OutputDebugString('ModuleName').
Now when you run this inside the Delphi Ide you will see messages, and see how far apart they come in, and understand what is taking a long time to initialize. Instead of installing the delphi ide onto the machine that runs slowly, Debug View (which is less than 400kb single executable) will be run, and it will show you these debug messages, along with a nice time display (##.# seconds) for each message.
MS Debug view is here.
Are you allowing the forms to be constructed on initialization within the DPR source? If so, you may do well to consider whether or not you want those forms sucking up memory the entire time, more-over if you want those forms to be wasting the application's time on load.
A rule of thumb: If the form is used a LOT during the application's execution, allow it to be constructed when the application loads (this will work out faster over-all than constructing the instance "on-demand").
If the form is not used very often at all (for example, a Dialog or an About Box), delete the "Application.CreateForm" line from the DPR source, and instead construct your instance on request...
var
LForm: TfrmAbout;
begin
Application.CreateForm(LForm, TfrmAbout);
try
LForm.ShowModal;
finally
LForm.Free;
end;
end;
Now that form (which may not even be displayed during the program's execution) is not sucking up system resources, and will not slow down the application's load time.
It may not solve your problem 100%, but it should certainly help!

What is the "Cannot set allocations" error, who emits it and what can I do about it?

We've been plagued for several years by occasional reports from customers about a non-descript error message "Cannot set allocations" that appears on startup of our app. We have never been able to reproduce the problem in our own test environments so far. I have now run out of ideas for attempting to track this down. Here's a collection of observations that have accumulated over time:
Error message text reads "Cannot set allocations" (note absence of punctuation).
The window title simply reads "Error" (or the localized equivalent).
The "Cannot set allocations" text is always in English regardless of OS locale.
I have so far not been able to locate the DLL or EXE containing the message text.
Google is chock full of reports of this error for a variety of different products - but no solutions.
The only unifying aspect between the affected products I could make out so far was that they all appear to come in the form of DLLs that load into third-party processes (such as addins for Visual Studio or Windows Explorer shell extensions).
Our app is actually a shareware COM-addin for MS Outlook, written in Delphi (i.e. native code - no .NET).
The prime suspect in our case is the third-party licensing wrapper that we're using which decrypts and uncompresses our DLL into memory on the fly. Obviously I couldn't simply give an unprotected version of our app to the affected customers to verify this suspicion. Maybe the other vendors that this has been reported against are using similar products.
Debug versions of the protection wrapper supplied to us by the licensing vendor yielded no results: The log files looked exactly the same as from sessions where the error did not occur. Apparently the "inner" DLL gets decrypted and uncompressed all right but for some reason still fails to get loaded by the host process.
By creating an unprotected "loader" DLL we have been able to pinpoint the occurrence of the error somewhere behind the LoadLibrary call that is supposed to load our DLL into memory.
Extensive logging and global exception hooks in our own code (both the unprotected loader and the protected "core"-DLL) yielded no results at all. The error is obviously raised somewhere else.
The problem described in this earlier question of mine was very probably prompted by the same issue. This was before we created the unprotected loader stub.
The error only occurs at about 1-2% of our customers - whereas typically all installations at any affected customer's site are affected the same.
Sometimes the error goes away after we release a new version but often it will come back again after a couple of weeks or months.
Once the error has started to occur on a machine it does so consistently.
The error never occurs while connected to the affected machine via remote access (e.g. VNC, RDP, TeamViewer, etc.) and none of the affected customers are within travel distance from us so all we have to go by is log files and "eye-witness reports".
One customer reported that the error message dialog apparently was non-modal, i.e. he was able to simply move the dialog box to the side and continue working with the application (minus the functionality that our DLL would have provided). Not sure whether this is universally true in all other occurrences, too.
In some cases customers have been able to permanently rid themselves of the error by disabling or uninstalling other addins from other vendors that were sharing the host application with our own product.
The error has so far been observed on Windows XP, Vista and 7.
During the last few weeks we had a surge of reports from Outlook 2003 / Windows 7 users. Could the situation have been made worse by a recent Windows/Office-update?
Does anyone have any experience with this error at all?
Or any more ideas for investigating this?
I have only recently had this happen, which prompted my search and I ended up here. I can tell you that with me for sure it is in windows 7 home premium BUT ONLY WITH IE9 (which I hate by the way) it reduces the user back to the dummy stage and assumes that we have to be repeated flagged about everything.It will keep asking you if you want to disable add ons to speed up load times but usually on things that aren't really the things slowing the browser down in the first place,it is there is too much garbage loading in the first place.But back to the "Cannot set allocations", I for one have never expirianced it in any other browser which is not to say it doesn't happen with them.
This is going to be pure guess-work, but it sounds like maybe your third-party licensing software is trying to load your DLL at a particular location in memory, which - on these failing systems - happens to already be occupied by something else, perhaps a global hook DLL.
If you have a customer who is willing to work with you a bit, it might shed some light on the situation to get a crash dump (e.g., with ADPlus or maybe simpler with Sysinternals' ProcDump) when the error message is showing. That would show what modules are loaded and possibly the callstack (if it is from a message box at the time of the error as opposed to one that is catching an exception after the problem).
I also have experienced the "Cannot set allocations" issue. Royal pain. I had disabled Java, since I did not seem to need it, I used add/remove programs to remove Java from my system. Then I stated to get those errors. I have reinstalled, but disabled Java in IE explorer. Now I do not get the error anymore. Not a programmer, don't know why this happened. Maybe a clue for someone.
Win 7 - 64bit OS IE Explorer 10. Hope this helps someone figure this out. John
I've seen this happen. In my case a global hook dll created a large memory file mapping, perhaps to the memory the licensing dll was counting on.
I see "Cannot set Allocations" when I open google chrome only. Also after that, chrome closes with a msg saying "Whoa chrome has crashed..."
Still no solution :(
Also not a programmer, but it always happens when I open Chrome. It opens second window with error message 'cannot set allocation'. I usually close it and go on my way. if i don't it usually causes a crash. doesnt happen on any other browsers.

Resources