Difficult to debug embedded application - debugging

I'm trying to debug an application on an embedded device running an old version of Linux/Qtopia. I asked for help on QT forums but the people there don't know about old software and embedded systems. I'd really like some help with debug strategies.
My program will crash after the main window has been constructed, i.e. some time into the event loop. But depending on the order of functions in the constructor, sometimes it will run only from the console and sometimes it will only run from the icon. Despite my best efforts I can't narrow down what is causing the problem.
There is no seg fault or signal but my program does not continue and the destructor does not get called. It seems to me that one of the first things that would happen in the event loop is a resize event and when this is called could vary if you ran from the console or icon. Also, the various widgets in my GUI would be initialised and drawn so that is also a potential source of error, if I haven't set up something properly.
My debugging options are limited as the area where the crash actually occurs is not under my control. I tried logging to a file and printing to stderr but this was no help. When I got to the state where it runs from the icon but not console, I tried running in gdb and strace but it ran OK - the classic problem of debug software initialising differently.
My next thought is to try to force a core dump and then analyse that. How do I force a core dump ? Is there a better strategy ?

Logging to a file or to a communication port (serial port, etc.) is probably the simplest way to see what is happening and maintaining the normal runtime (i.e. not in a debugger).
You say that logging to a file and printing to stderr was no help. Why not? Are you printing relevant debugging information to the file? Are you using the Linux/Qtopia sources and adding debug logging?
Assuming you have sources for all of the code you are running, it should be just a matter of adding debug logging in the right places to pinpoint where the problem is occurring.

Related

How to find out why a MFC program closes silently

I have the following bug in a program:
An MDI MFC program closes silently on Windows 7 (terminates the process without prompting to save changes and without displaying any "crash" dialog) when the user performs this operation : Click a context menu item.
But this happens only on that PC, at least, for the moment. No other PC has encountered that problem. However the bug can always be reproduced following the same steps but only on that PC.
I want to know the reason but probably the customer won't allow me to install many programs to debug, so I need to be able to log when the program terminates, or print the stack in Release version but I'm quite lost.
I had faced a similar bug before and eventually I fixed it logging line by line and changing the problematic part, but I guess there are much better ways to find bug reasons than this.
I have tried on my development PC to create minidumps on Release mode but if there is no exception thrown on that PC, (I haven't confirmed that yet, though...) maybe it's pointless.
Also used an available class on codeproject (Stackwalker) but I don't manage to print all the function calls. Only on simple console programs, but not on MDI or even SDI.
Any ideas on how to find out the reason? Thanks in advance.

The thread '<No Name>' (0xb24) has exited with code 0 (0x0)

whenever i try to run program for example,
if i have to run "frmphonebook" so in
Application.Run(new frmphonebook());
I typed but when i run it it run another form, and it happens to each and every form and it is displaying output as
The thread 'vshost.RunParkingWindow' (0x63c) has exited with code 0 (0x0).
The thread '<No Name>' (0xb24) has exited with code 0 (0x0).
how to solve this ?
You can give your threads a name it would also help you in your debugging...
But in many apps threads are created implicitly and you have no control over the name.
So that is not an error message. Code 0 means everything went according to plan. Any non-zero code usually indicates an error.
edit: You can also disable the display of these messages, when debugging, do right click on output, and choose what do you want see.
If a thread has exited with code 0 it ran successfully. On Codeproject is a Beginners-Guide-to-Threading
This article on threading might also be helpfull. This question on so could also be of use. A list of System Error Codes
One of the things you will learn about using the Debugger is that you will see what we might call "the soft white underbelly" (an allusion to alligators' anatomy) of the system: all kinds of DLLs being loaded and unloaded, the somewhat complex arrangement of "helper" threads being started and stopped... etc.
It can be distracting to a less experienced user, to see all of these messages. However, over time, you will come to understand that the Debugger is simply being truthful and verbose. The details it is displaying for you might not really be relevant to your debugging process, but it cannot "know" that; it is only displaying factual information, and you have to sort out what is relevant and what is not.
As for Windows Forms applications, I have myself noticed that there seem to be several "helper" threads, typically with no name, or (as is frequently seen by me when debugging), they are named things like "vshost.RunParkingWindow". Typically, you have to trust that the system is creating threads on your behalf, in addition to any threads you might create yourself. As others have suggested, give your own threads meaningful names so you can tell them apart from the system's threads.
You can get further insight into the multithreaded structure of your Windows Forms app by putting a breakpoint somewhere in your UI update code, and when it hits, use Debug/Windows/Threads to bring up a view of all the threads running in your process space. You'll be quite surprised, I think, by how many there are! Try creating and .Show()-ing several forms in your app, one by one. I think you'll see that each .Show() operation creates a new window, and with it, several supporting threads for that window.
You may also see messages in the debug window such as the following: "A first chance exception of type 'System.ObjectDisposedException' occurred in System.Windows.Forms.dll". Many times there are system exception handlers that perform a reasonable default action on your behalf. This message appearing without a break in the debugger indicates that some default handler took care of this exception for you.
The system support for something like a Windows forms application is somewhat complicated, to make YOUR implementation easier and simpler. When you run the debugger, you get to see some of these details. Over time, you will learn what is "usual" and what is indicative of a problem.
Check to see if there are some files in your web app that have been rendered inaccessible. For my case my chart control created a text file which was read only and it threw an exception. Deleted the file and the folders and voila
i found your solution i think....i the visual studio go to project >properties >linker >system look for the Subsystem line and click the down arrow and change to Console(....words....).
it worked for me !! ENJOY"

Troubleshooting VB6 App Crash after XP to Win7 Upgrade

I have a VB6 application that I provide support for. This application works on both Windows XP and Windows 7. Some users were migrated from Windows XP to Windows 7 using the User State Migration tool. These users now receive a generic "Application has crashed" Windows error message when they open certain screens (forms) in the application. My assumption is that there is a missing dll/ocx reference, but I'm having trouble tracking it down.
I've tried many/varied troubleshooting techniques:
Full uninstall and reinstall of my application
Manually re-registering all dll's and ocx's that I know are used
Running Process Monitor on a broken computer and a working computer to compare what dll's and ocx's are accessed. The answer might be here but even after filtering out most of the background noise the amount of data is overwhelming. At a minimum I reviewed all of the calls right before it crashes and all of the calls that were not successful. All of the non-successful calls match between working and non-working.
Installed the Windows Debugger Tools and captured a crash dump. Analyzed the crash dump with DebugDiag. DebugDiag says the exception is in msvbvm60.dll. I tried building a PDB file for my exe and loading it in DebugDiag to get more detail about where the exception is occuring but DebugDiag doesn't want to accept the PDB (might be doing something wrong here, but it just seems to ignore it. This same PDB file works fine when I do remote debugging, however.)
I recompiled my VB6 program without any optimizations in PCode. I've read online that sometimes building in PCode, while bad for performance, will tell you the real exception.
Used the above created PDB file to remote debug the VB6 application. The debugger says that the application crashes after the new window has been created, on a line that sets MousePointer = vbHourGlass... To me it seems unlikely that this is the real cause of the error. There are at least 20 other locations in the program where this same line is called and all work fine.
(Forgot about this one)
Used Dependency Walker and profiled the application on both a working and non-working computer. All errors found by dependency walker were the same between the two computers. There were no additional dependencies found on a working computer, and all missing dependencies on the non-working computer were also missing on the working one.
None of these actions changed my error message or showed me what the error is (unless it really is the mouse cursor issue)... There are no entries in the Windows Event Log related to the app crash.
The non-working and working computers all have the same base Windows 7 image, the only difference is whatever is being changed by USMT, which further convinces me that this is some kind of quirky configuration change or a missing dll/ocx or perhaps an unregistered dll/ocx.
Any ideas or thoughts on how I can track down the root cause of the issue would be greatly appreciated.
Update 1 - Response to questions
#MarkHall I have tried running it as admin, though not with UAC off. The application runs fine on a Windows 7 box as a non-admin with full UAC. Windows XP was 32-bit, Windows 7 is 64-bit, but again it works just fine on a like for like box where the user was not migrated from Windows XP.
#Beaner It's possible that it stores settings somewhere that have been corrupted, but the remote debugging leads me to think that it's more likely something else since it seems to die on a step related to the UI, which then makes me think it's probably a missing dll/ocx reference.
#Bob77 The application is installed into Program Files (x86). While many of the libraries do reside in the same folder, they are all registered.
Peter, often I've noticed that the debugger will indicate a line of code that is actually incorrect, depending on WHERE in the actual assembly language the fault occurs. You should look REAL close around your statement that sets the cursor to vbHourGlass. Your exception is PROBABLY happening BEFORE that line of code, but that line is what the debugger thinks is the actual faulted line of code.
Since you said it happens when a window OPENS, I'd look real close at any ocx's you may have referenced on the form, but perhaps NOT actually being used, or called. You might have one there that you don't intend to be there, that could be causing security issues, or something on Win7? Edit the .frm file by hand if you have to, and look at all the GUIDs the form references.
It is possible that one machine is using PER-USER registration, and the other is using PER-SYSTEM registration?? I don't know...
I would take a much closer look at the form that you are trying to open, and be VERY cautious of everything you are doing in the form load events, and so on. This sounds like it could be something as stupid as Windows Aero being enabled on one system, and not another, or some other sort of "Theme" setting that is throwing the VB Form Rendering routine into a hissyfit... Perhaps even something as stupid as a transparent color index in the icon you selected for that from?
If you are still developing this app, (or at least maintaining it), create an entirely NEW form, and re-create all the controls, etc, on the form (resist the temptation to copy/paste them from the old one...), and then see if THAT does the trick. Then, copy all the event code to the new form one event at a time, with at LEAST enough event code to make the form function, even if it's just a "dead form", that loads no data, or whatever the form is supposed to do. Check and debug after each change, and you WILL find it eventually. Of course, make sure you isolate one of the defunct systems to have a platform that you can duplicate the issue on, or then it's just guessing. I find that using something like Acronis w/ Universal Restore is a great option to then take the image file into a good HV, like VirtualBox, and then restore that image as a VM, so you can debug without interfering with your actual users. This sounds like a lot of work, but then again, so is re-writing an application that already exists, right? :)
Failing THAT... /* and */ are your friends!! (Well, we're dealing with VB, so ' would be your best friend! heh... But I'd start commenting out all the code on the form until that sucker opens. Then once it opens, start putting one line back at a time, and re-running it... That's called "VooDoo Debugging", but sometimes, you gotta do what you gotta do...
THANKS A LOT PETER! :) Now you got ME so involved in this, I feel like I'M the one debugging this sucker! Like if it was MY code I was trying to fix! :)
Let me know if any of this helps... I am actually quite interested in what you discover.

How to prevent error pop-up message box for failed program (.exe) when running batch file

I'm running a test script from batch file.
Because it is test, the programs are expected to fail once in a while. It is file as long as error code is returned so I can continue and mark specific test as failed.
However there is very annoying behavior of executable files under Microsoft Windows - if something fails it pop-ups window like:
This application has failed to start because foo.dll was not found, Re-installing the application may fix the problem
<OK>
Or even better:
The instruction at "..." referenced to memory at "..." ..
Click on OK to terminate the program
Click on CANCEL to debug the program
The result is known - the script execution blocks till somebody presses "Ok" button. And when we talk about automatic scripts that may run automatically at night in some headless virtual machine, it may be very problematic.
Is there a simple way to prevent such behavior and just make an application to exit with failure code - without changing the code of the program itself?
Is this possible at all?
The answer is following: You need to disable WER.
Simplest description for this I found at http://www.noktec.be/archives/259
Simply (ON XP): Right Click on My Computer > Advanced > Error Reporting > Disable
Voila - programs crash silently!
This does not solves problem when DLL is missing, but this is much rare case and this is good enough for me.
You can suppress AV's and such from showing a dialog box by running your application, or the script (the script engine, like cscript.exe), under a debugger.
Use Gflags.exe, or modify the registry directly, and set Image File Execution Options for the image in question. See this article for details on how to use the appropriate registry keys. You can set it up using a debugger commandline like "C:\Debuggers\ntsd.exe -g -G -c'command'", where you can pass commands to ignore certain types of exceptions in the -c"commmand" argument. This will effectively give you a tool to suppress interactive dialogs as a result of exceptions like AV, and will let the process continue (presumably to immediate end after the exception has occured).
This article explains the commands you can use to control exceptions and events from withing the debugger.
The -g and -G flags make sure that the process won't break into the debugger automatically during process start and end respectively. You'll have to play with the various exception suppression options to make sure that you 'eat' all possible first and second chance exceptiosn that might cause the process to break into the debugger.
Also, if you can tolerate a process being broken into the debugger (as against being stuck showing a dialog box), perhaps that would be a better option overall. You can evaluate each debug break in batch mode at a later time and decide which bugs you care to fix.
It is possible. We used to use IBM's Rational Robot product which could monitor the screen for specific items and, if found, send keystrokes to windows and other sorts of things.
We actually used it for fully automated unit and system testing, much like you're trying to do.
Now I thought that Robot has been through quite a few name changes so it may be hard to find but there it is, right on IBM's web page and with a free downloadable trial for you. It's not cheap, clocking in at a smidgeon under USD5,000 but it was worth it for us.
There's also TestComplete where you could get a licence for just unedr USD1,000 - it touts "Black-box testing - Functional testing of any Windows application" as one of its features and also has a downloadable demo to see if it's suitable before purchase.
However, you may be able to find another product to do the same sort of thing.
I initially thought of Expect but the ActiveState one seems to concentrate on console applications which leads me to believe it may not do graphics well.
The only other option I can suggest is to write your own program in VBScript. I've done this before to automate the starting of many processes (log on to work VPN, start mail, log in and so on) so I could be fully set up with one mouseclick instead of having to start everything manually.
You can use AppActivate to bring a window to the foreground and SendKeys to send arbitrary keypresses to it after that. It's possible you may be able to cobble together something from that if you want a cheaper solution.

I need to find the point in my userland code that crash my kernel

I have big system that make my system crash hard. When I boot up, I don't even have
a coredump. If I log every line that
get executed until my system goes down. I will find that evil code.
Can I log every source code line in GDB to a file?
UPDATE:
ok, I found the bug. It was nasty. The application I started did not
take the system down. After learning about coredump inspection with mdb, and some gdb stepping I found out that the systemcall causing the dump, was not implemented. Updating the system to latest kernel will fix my problem. Thanks to all of you.
MY LESSON:
make sure you know what process causes the coredump. It's not always the one you started.
Sounds like a tricky little problem.
I often try to eliminate as many possible suspects as I can by commenting out large chunks of code, configuring the system to not run certain pieces (if it allows you to do that) etc. This amounts to doing an ad-hoc binary search on the problem, and is a surprisingly effective way of zooming in on offending code relatively quickly.
A potential problem with logging is that the log might not hit the disk before the system locks up - if you don't get a core dump, you might not get the log.
Speaking of core dumps, make sure you don't have a limit on your core dump size (man ulimit.)
You could try to obtain a list of all the functions in your code using objdump, process it a little bit and create a bunch of GDB trace statements on those functions - basically creating a GDB script automatically. If that turns out to be overkill, then a binary search on the code using tracepoints can also help you zoom in on the problem.
And don't panic. You're smarter than the bug - you'll find it.
You can not reasonably track every line of your source using GDB (too slow). Besides, a system crash is most likely a result of a system call, and libc is probably doing the system call on your behalf. Even if you find the line of the application that caused OS crash, you still don't really know anything.
You should start by clarifying which OS is crashing. For Linux, you can try the following approaches:
strace -fo trace.out /path/to/app
After reboot, trace.out will contain syscalls the application was doing just before the crash. If you are lucky, you'll see the last syscall-of-death, but I wouldn't count on it.
Alternatively, try to reproduce the crash on the user-mode Linux, or on kernel with KGDB compiled in.
These will tell you where the problem in the kernel is. Finding the matching system call in your application will likely be trivial.
Please clarify your problem: What part of the system is crashing?
Is it an application?
If so, which application? Is this an application which you have written yourself? Is this an application you have obtained from elsewhere? Can you obtain a clean interrupt if you use a debugger? Can you obtain a backtrace showing which functions are calling the section of code which crashes?
Is it a new hardware driver?
Is it based on an older driver? If so, what has changed? Is it based on a manufacturer's data sheet? Is that data sheet the latest and most correct?
Is it somewhere in the kernel? Which kernel?
What is the OS? I assume it is linux, seeing that you are using the GNU debugger. But of course, that is not necessarily so.
You say you have no coredump. Have you enabled coredumps on your machine? Most systems these days do not have coredumps enabled by default.
Regarding logging GDB output, you may have some success, but it depends where the problem is whether or not you will have the right output logged before the system crashes. There is plenty of delay in writing to disk. You may not catch it in time.
I'm not familiar with the gdb way of doing this, but with windbg the way to go is to have a debugger attached to the kernel and control the debugger remotely over a serial cable (or firewire) from a second debugger. I'm pretty sure gdb has similar capabilities, I could quickly find some hints here: http://www.digipedia.pl/man/gdb.4.html

Resources