I have known for a while that VB6 is trouble, but what can I do? I have a legacy system to maintain, and it is still being developed here and there.
And I had the weirdest thing ever happen to me with VB6 just now.
Compiled a new version of our application. Tested it - it worked.
Deployed to a client production site. Application keeps crashing! When? When the user clicks anywhere inside a DHTMLEdit control that we have in a specific window. Or sometimes even on the window that contains it.
Now, I've had this before and the solution was always very cryptic. So I tried the following: incremented the project's revision number by 1, recompiled, and guess what?
Works perfectly.
I cannot show code because we are talking about 50,000 lines of code here, and there is no specific code called when clicking the DHTMLEdit control, the form, whatever.
I'm just wondering if anyone else had encountered this oddity.
The VB6 compiler is not deterministic and does not produce anything near consistent output. In other words, while fixing your bug this way is not by design, the fact that the compiler can produce such behavior is.
But even with deterministic compilers, making a small change can sometimes "fix" (aka hide) bugs that are triggered by memory alignment or variable initialization.
I know this sounds like a weird question.
I have a Xamarin.Android app. I'm going through my code and cleaning up stuff that was too aggressively wrapped with calls to RunOnUiThread. My understanding is that code that runs in response to UI events, for example, is already on the UI thread and therefore doesn't need to be wrapped. I don't like the idea of RunOnUiThread calls just sprinkled everywhere like magical pixie dust.
I've removed a bunch of these calls, and preserved the ones I think are necessary. When I run the app, everything works great. But not even a single crash on the first try seemed almost too good to be true, so as a sanity check, I tried to intentionally crash the app by adding UI update code in places where I expected it to be required to be wrapped by RunOnUiThread. However, nothing I do causes that effect!
This makes me concerned that there is some magic happening somewhere that is preventing me from seeing a problem.
The last thing I tried, which I thought would surely crash the app, was to use a Java.Util.Timer to execute a Java.Util.TimerTask that sets the Alpha on an ImageView to 0.5. But still the app hums merrily along!
Now I am totally suspicious that something -- maybe some new OS feature -- is silently fixing things but that customers running in some other configuration will experience crashes. I want to see the error so that I can have confidence that when I don't see it, things are OK.
So, what can I do to intentionally get the "Only the original thread that created a view hierarchy can touch its views" error?
A co-worker of mine downloaded a VB example project a while back to see how to make a call or two. He came across it again today and noticed that, while it sits in the IDE, the time/date updates automatically even though the project is not running.
How does this work? We looked around for code but can't find anything giving it away.
Any ideas?
Not enough info, really, but is it possible the time and date are part of a user control sited on the main form? In the IDE, user controls operate in a hybrid mode where their code can be running when the app itself is not.
I can't think of any other scenario where if the IDE is truly stopped, any code can be running.
The StatusBar control in Windows Common Controls can automatically display the current date and time. It updates its display even in design mode.
If you have ever developed controls yourself, you might have experienced an AHA! moment when you realised that your control code is running when the control is being used at design time in the IDE.
Bugs can be difficult enough to resolve when they're your (or a coworker's) fault. However, we all know that the technology we use to implement our programs is written by infallible people such as ourselves. So it stands to reason that some people have been affected by bugs in the implementation of the tools they used.
So, have you found a bug in your program that was caused by a widespread underlying technology, such as a programming language or framework? If so, did it fail with some indication, or did it silently overwrite some data? How difficult was it to debug? Did it cause a potential security vulnerability? Were you able to contact the provider and confirm that it was fixed (or fix it yourself)?
Here are some of the worst (in my opinion) technologies to have a bug in (especially one that fails silently):
Programming language
Concurrency framework
Remote API
Database
I deal with one on a daily basis called Internet Explorer.
To be fair though, there are lots of bugs in all of the browsers. I have filed several bugs for Firefox as well, and just the other day I found a strange case where the border doesn't take padding into consideration.
This is a good argument for writing lots of unit tests. If you upgrade your platform to a newer version that has some new bug, hopefully you have a test that reveals the bug.
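As a rough illustration of the kind of test meant here, the sketch below pins down one piece of platform behaviour (the documented [0, 1) range of java.util.Random.nextDouble()), so that a runtime upgrade that broke it would fail loudly instead of silently. The class and method names are hypothetical, and the seed and sample size are arbitrary assumptions.

```java
import java.util.Random;

// Hypothetical platform sanity check: verifies behaviour we rely on from the
// runtime itself, so a platform upgrade that breaks it shows up as a failure.
final class PlatformSanityCheck {

    /** Returns {smallest, largest} value seen in a seeded sample. */
    static double[] sampleRange(long seed, int samples) {
        Random random = new Random(seed);
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < samples; i++) {
            double value = random.nextDouble();
            min = Math.min(min, value);
            max = Math.max(max, value);
        }
        return new double[] {min, max};
    }

    /** True if the sample stays inside [0, 1) and reaches both halves of it. */
    static boolean coversUnitInterval(long seed, int samples) {
        double[] range = sampleRange(seed, samples);
        return range[0] >= 0.0 && range[0] < 0.5
            && range[1] >= 0.5 && range[1] < 1.0;
    }

    public static void main(String[] args) {
        if (!coversUnitInterval(42L, 10_000)) {
            throw new AssertionError("Random.nextDouble() did not cover [0, 1) as expected");
        }
        System.out.println("platform sanity check passed");
    }
}
```

A check like this would normally live in the project's regular test suite, so that it runs automatically after every platform upgrade.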
In one case I had, the vendor was working on a brand new API. They were not ready to release the new API, but they were not very keen to fix the bug in the old one either, as they considered it dead from a $$ perspective.
A colleague once stumbled across a bug in the Jikes Java compiler. He had something like this:
if (condition)
{
}
else
{
    System.out.println("Code that does stuff.");
}
He hadn't intended to leave the top block empty permanently, but just had it that way during development. He discovered that the condition was ignored unless he put a comment in that block so that it was no longer empty.
During my time developing (mostly) with Java I've run into bugs in the following components:
Java compiler
This actually happened several times. Usually we found that ecj (the Eclipse compiler) and javac (the Sun compiler) disagree about the validity of some Java code. Usually I enter bug reports for both systems, one of them gets accepted and the other one is closed as invalid.
Database engine
Those are very rare and very, very nasty, because no one expects the DB itself to have a bug. In our case it was an old product (the bug was already fixed in a newer release) that accepted values in a field that were not within the defined range (similar to having a NOT NULL field containing NULL)
JDBC driver
There were several bug fixes to a JDBC driver due to a project I've been working on. The bug fixes ranged from trivial ("why is there debug output in the production release?") to might-not-even-be-a-real-bug ("you can easily save one round trip per statement by doing that-and-that").
JVM implementations
Those are hard to diagnose and often present themselves as effectively random crashes on one JVM and stable operation on another one. We temporarily switched JVM vendor several times due to things like this.
Each time it took quite some time (and usually the involvement of the vendor of the component) before I actually believed it was a bug in that component.
And yes: the cases of false positives (i.e. the bug was actually in my/our code) were orders of magnitude more common.
The only place where bugs in the third-party component are kind-of expected seems to be web browsers. Almost no one questions you when you say "that's a bug in <insert buggy browser of the week>, we need to work around it like this ...".
I guess almost anyone who has programmed JavaScript with Internet Explorer has found a bug in their program which was caused by a widespread underlying technology.
The indication of failure is the blue "e" on your Windows desktop.
The first that comes to mind was with version 1 of the .NET Framework; for some reason, the Random.NextDouble() method never produced a value greater than 0.5. I was completely baffled, and having run a test app that called the method thousands and thousands of times, I had to presume it was a bug and work round it.
Never did find out what the cause was...
I've run into something with the gcc 4.4.0 but as the product I'm currently working on is still pre-alpha, it was fairly easy to repair locally. Hopefully they'll fix it soon.
I found a very strange bug in gcc on mipsel (OpenWrt). We were testing a small app (about 3K SLOC) that gave me a SIGSEGV even though the code was, in theory, correct.
I don't know the details of the bug (and I don't have that code anymore), but changing the gcc version from 4.1 to 3.6 solved the problem.
I am trying to help a client with a problem, but I am running out of ideas. They have a custom, written in house application that runs on a schedule, but it crashes. I don't know how long it has been like this, so I don't think I can trace the crashes back to any particular software updates. The most unfortunate part is there is no longer any source code for the VB6 DLL which contains the meat of the logic.
This VB6 DLL is kicked off by 2-3 function calls from a VB Script. Obviously, I can modify the VB Script to add error logging, but I'm not having much luck getting quality information to pinpoint the source of the crash. I have put logging messages on either side of all of the function calls and determined which of the calls is causing the crash. However, nothing is ever returned in the err object because the call is crashing wscript.exe.
I'm not sure if there is anything else I can do. Any ideas?
Edit: The main reason I care, even though I don't have the source code is that there may be some external factor causing the crash (insufficient credentials, locked file, etc). I have checked the log file that is created in drwtsn32.log as a result of wscript.exe crashing, and the only information I get is an "Access Violation".
I first tend to think this is something to do with security permissions, but couldn't this also be a memory access violation?
You may consider using one of the Sysinternals tools if you truly think this is a problem with the environment such as file permissions. I once used Filemon to figure out all the files my application was touching and discovered a problem that way.
You may also want to do a quick sanity check with Dependency Walker to make sure you are actually loading the DLL files you think you are. I have seen the wrong version of the C runtime being loaded and causing a mysterious crash.
Depending on the scope of the application, your client might want to consider a rewrite. Without source code, they will eventually be forced to do so anyway when something else changes.
It's always possible to use a debugger - either directly on the PC that's running the crashing app or on a memory dump - to determine what's happening to a greater or lesser extent. In this case, where the code is VB6, that may not be very helpful because you'll only get useful information at the Win32 level.
Ultimately, if you don't have the source code then will finding out where the bug is really help? You won't be able to fix it anyway unless you can avoid that code path for ever in the calling script.
You could use the Debugging Tools for Windows, which might help you pinpoint the error, but without the source to fix it, that won't do you much good.
A lazier way would be to call the dll from code (not a script) so you can at least see what is causing the issue and inspect the err object. You still won't be able to fix it, unless the problem is that it is being called incorrectly.
The author of Coding the Wheel has a pretty interesting series about building an online poker bot, which is full of serious technical info. A lot of it is concerned with how to get into existing applications and mess with them, which is, in some way, what you want to do.
Specifically, he has an article on using WinDbg to get at important info, one on how to bend function calls to your own code and one on injecting DLLs in other processes. These techniques might help to find and maybe work around or fix the crash, although I guess it's still a tough call.
There are a couple of tools that may be helpful. First, you can use dependency walker to do a runtime profile of your app:
http://www.dependencywalker.com/
There is a profile menu and you probably want to make sure that the follow child processes option is checked. This will do two things. First, it will allow you to see all of the lib versions that get pulled in. This can be helpful for some problems. Second, the runtime profile uses the debug memory manager when it runs the child processes. So, you will be able to see if buffers are getting overrun and a little bit of information about that.
Another useful tool is process monitor from Mark Russinovich:
http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx
This tool will report all file, registry and thread operations. This will help you determine whether you are bumping into file or registry credential issues.
Process explorer gives you a lot of the same information:
http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx
This is also a Russinovich tool. I find that it is a bit easier to look at some data through this tool.
Finally, using debugging tools for windows or dev studio can give you some insight into where the errors are occurring.
An access violation is almost always a memory error, all the more likely in this case because it's crashing randomly (a permissions problem would likely be more obviously reproducible). In the case of a DLL it could be either of the following:
There's an error in the code in the dll itself - this could be something like a memory allocation error or even a simple loop boundary condition error.
There's an error when the dll tries to link out to another dll on the system. This will generally be caused by a mismatch between dll versions on the machine.
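To make the distinction between a memory fault and a permissions failure concrete: on Windows the two carry different numeric codes. An invalid memory access raises the NTSTATUS value 0xC0000005 (STATUS_ACCESS_VIOLATION), and a process killed by it typically exits with that value, whereas a refused file or registry operation reports Win32 error 5 (ERROR_ACCESS_DENIED). The sketch below is only an illustrative lookup; the class and method names are hypothetical, and it assumes you have already captured a numeric code from a crash log or a supervising process.

```java
// Sketch: telling a memory fault apart from a permissions failure by its code.
final class CrashCodes {

    // NTSTATUS raised for an invalid memory access. As a signed 32-bit int
    // (for example from Process.exitValue()) it reads as -1073741819.
    static final int STATUS_ACCESS_VIOLATION = 0xC0000005;

    // Win32 error returned when the OS refuses an operation for lack of rights.
    static final int ERROR_ACCESS_DENIED = 5;

    static String describe(int code) {
        if (code == STATUS_ACCESS_VIOLATION) {
            return "access violation (bad memory access inside the process)";
        }
        if (code == ERROR_ACCESS_DENIED) {
            return "access denied (permissions problem)";
        }
        return String.format("unrecognised code 0x%08X", code);
    }

    public static void main(String[] args) {
        System.out.println(describe(0xC0000005));
        System.out.println(describe(5));
    }
}
```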
Your first step should be to try and get a reproducible crash condition. If you don't have a set of circumstances that will crash the system then you cannot know when you have fixed it.
I would then install the system on a clean machine and attempt to reproduce the error on that. Run a monitor and check precisely what other files (dlls etc) are open when the program crashes. I have seen code that crashes on a hyperthreaded Pentium but not on an earlier one - so restoring an old machine as a testbed may be a good option to cover that one. Varying the amount of ram in the machine is also worthwhile.
Hopefully these steps will give you a clue. Hopefully it will be an environment problem and so can be avoided by using the right version of Windows, DLLs, etc. However, if you're still stuck with the crash at this point with no good clues, then your options are either to rewrite or to hunt down the problem further by debugging the DLL at assembler level or disassembling it. If you are not familiar with assembly code then both of these are long shots and it's difficult to see what you will gain, and either option is likely to be a massive time sink. In the past, when faced with a particularly low-level, high-intensity problem like this, I have advertised on one of the 'coder for hire' websites and looked for someone with specialist knowledge. Again, you will need a reproducible error to be able to do this.
In the long run a dll without source code will have to be replaced. Paying a specialist with assembly skills to analyse the functions and provide you with flowcharts may well be worthwhile considering. It is good business practice to do this sooner in a controlled manner than later - like after the machine it is running on has crashed and that version of windows is no longer easily available.
You may want to try Resource Hacker; you may have some luck decompiling the in-house application. It may not give you the full source code, but it may at least give you more information about what the app is doing, which may also help you determine your culprit.
Add the maximum possible RAM to the machine
This simple and cheap hack has worked for me in the past. Of course, YMMV.
Reverse engineering is one possibility, although a tough one.
In theory you can decompile and even debug/trace a compiled VB6 application - this is the easy part, modifying it without source, in all but the most simple cases, is the hard part.
Free compilers/decompilers:
VB decompilers
VB debuggers
Rewrite would be, in most cases, a more successful and faster way to solve the problem.
This is the day of weird behavior.
We have a Win32 project made with Delphi 2007, which hosts the .NET runtime and calls into .NET to show new forms, as part of a transition period.
Recently we've begun experiencing exceptions at seemingly random locations and points of our code: Arithmetic overflow or underflow.
The stack trace of one of these looks like this:
at System.Windows.Forms.UnsafeNativeMethods.DispatchMessageW(MSG& msg)
at System.Windows.Forms.Application.ComponentManager.System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop(Int32 dwComponentID, Int32 reason, Int32 pvLoopData)
at System.Windows.Forms.Application.ThreadContext.RunMessageLoopInner(Int32 reason, ApplicationContext context)
at System.Windows.Forms.Application.ThreadContext.RunMessageLoop(Int32 reason, ApplicationContext context)
at System.Windows.Forms.Application.RunDialog(Form form)
at System.Windows.Forms.Form.ShowDialog(IWin32Window owner)
at System.Windows.Forms.Form.ShowDialog()
at Gatsoft.Gat.UI.Windows.Forms.Remanaging.RemanageForm.DelphiOpenInNewMode(String employeeCode, String departmentCode, DateTime date) in C:\Dev\VS.NET\Gatsoft\Gatsoft.Gat.UI.Windows\Forms\Remanaging\RemanageForm.Delphi.cs:line 67
In the Visual Studio solution, one of the outermost class libraries (i.e. one that pulls in all the references it can) has a specific debug program set, targeted at the Delphi project output. This allows us to debug .NET code from Visual Studio, even though the main bulk of the program is written in Delphi.
The problem only occurs when run from the debugger, not if we just run the exe file directly (either through explorer, shortcuts, or even Ctrl+F5 inside Visual Studio).
There's apparently no spyware on the machine (as hinted by this).
Any other things we can check?
Edit: It looks like the .NET debugger is enabling the SNaN flag, and the Delphi debugger does not. We'll have to investigate this further, but for now I'll accept @Lorenzo Boccaccia's answer.
Apparently Solved
Ok, it looks like we've finally nailed this problem. The problem started occurring without the debugger attached as well, for our testers, so we had to prioritize the problem way up.
Finally we found one common factor among the machines that had the problem: they are Dell Latitude D620 laptops with an NVIDIA Quadro NVS 110M, running an old driver from a system image used to provision the laptops back in 2006.
I found one post on the web, though I lost the url when I rebooted to update the display driver, that had a .NET service crashing, mostly when the machine was busy doing something on the screen. One way to reproduce his problem was to open a command prompt to C:\ and doing a DIR /S to just force a massive amount of screen updates, which would trigger the crash.
He too had a NVIDIA video card.
The problem on my machine occurred roughly every 2-4 startups of our program, but after updating the video driver I've had 123 successful startups without any problems. (BTW I can recommend AutoHotKey for such things).
So it looks like we've found the culprit, an old/buggy NVIDIA driver.
Updated this question so that perhaps someone in the future can save some time.
Now, if you'll excuse me, I'm going to go cry in a corner.
Jinxed!
I must've jinxed it. No sooner had I posted the above update than a colleague's laptop failed, after updating the video driver.
Still, I'm positive it's a problem outside of our application now, so it just remains to figure out which specific things to update.
Further updates: Ok, my machine is now apparently fixed, but not my colleague's machine. So far we've updated the BIOS and the chipset drivers, and SP3 for XP is currently on its way in.
A burn-in test will be done tonight, where the app will be left overnight starting up, as the problem cropped up either during startup, or at the first time some WinForms .NET code was executed. This app is mainly a Delphi Win32 app, but it hosts the .NET runtime, and the problem seems to be related to .NET code. When we "boot" the .NET runtime, the problem can appear, or when we fire the first .NET window from Win32 then it can also appear.
Statistically I'm ready to release this code now. Over the night the application has been started 3051 times without errors, whereas before I updated the video driver it crashed every 2-4 times.
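The statistical argument can be made a little more explicit. The sketch below assumes that every start is an independent trial, which is a simplification, and takes 1 in 4 as the old crash rate (the optimistic end of "every 2-4 times"). Under those assumptions, the chance of 3051 clean starts in a row at the old rate is vanishingly small (around 10^-381, which is why the code works in log10 rather than raw probabilities), and zero crashes in 3051 starts puts the 95% upper bound on the remaining crash rate at roughly one in a thousand. The class and method names are hypothetical.

```java
// Back-of-the-envelope check, assuming every start is an independent trial.
final class BurnInStats {

    /** log10 of the chance that `runs` starts all succeed at the given crash rate. */
    static double log10ProbabilityAllSucceed(double crashRate, int runs) {
        return runs * Math.log10(1.0 - crashRate);
    }

    /**
     * Largest crash rate still consistent, at the given confidence level,
     * with having observed zero crashes in `runs` starts.
     */
    static double upperBoundOnCrashRate(int runs, double confidence) {
        return 1.0 - Math.pow(1.0 - confidence, 1.0 / runs);
    }

    public static void main(String[] args) {
        int runs = 3051;                 // clean overnight starts reported above
        double oldCrashRate = 1.0 / 4.0; // assumption: optimistic end of "every 2-4 times"
        System.out.printf("log10 P(all clean at old rate) = %.1f%n",
                log10ProbabilityAllSucceed(oldCrashRate, runs));
        System.out.printf("95%% upper bound on crash rate = %.5f%n",
                upperBoundOnCrashRate(runs, 0.95));
    }
}
```

The upper bound is the useful number here: a long clean run cannot prove the crash rate is zero, only that it is probably below some small value.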
Prodded and found(!/?)
This bug-fixing ordeal feels like going to the doctor, where the following conversation ensues:
Doc: Does this hurt?
Me: No...
Doc: What about now?
I've prodded and poked the application and finally I think I've found something we did that introduced this problem.
In our app we host the .NET runtime, from a Delphi 2007 Win32 application, and in our glue-code we have the following line (now):
rc := CorBindToRuntimeEx('v2.0.50727', 'wks',
  STARTUP_LOADER_OPTIMIZATION_MULTI_DOMAIN or STARTUP_CONCURRENT_GC,
  @clsid, @iid, UnkRuntimeEngine);
The two constants in the middle there were originally just a 0, meaning pick the defaults. This change was introduced a few months ago, and the problem has been slowly creeping in on us since then. The change was made in order to encourage ANTS profiler to load our Win32 application + hosted .NET runtime for performance profiling, and it made that work. Additionally, the problem with arithmetic overflow/underflow has slowly been getting worse, so I bet the problem didn't appear for a while after the change, which is why it wasn't attributed to any of the changes we made.
Also, since we only (originally) saw the problem when running through the debugger, we thought something was wrong with Visual Studio and/or Delphi.
Anyway, statistically now, with a browser on one screen doing repeated scrolling up and down triggered by a javascript (apparently needed in order to trigger the bug), then I have been able to successfully start the application 726 times with a 0 in the call, and it crashes 5 out of 17 times with the two constants there.
Doc: Does this hurt?
And let's not get into who made that change in the first place. I'm sure the culprit wants to be left anonymous... cough
A debug version of a linked DLL could be compiled with signaling NaN support; see http://blogs.msdn.com/oldnewthing/archive/2008/07/02/8679191.aspx for an example of this problem.
That heisenbug was caused by uninitialized variables; here there could be a linked DLL enabling the SNaN feature of the CPU and forgetting to disable it upon returning.
Do the errors still occur if you attach the debugger after starting the application?