Progress - Monitor Database Read & Writes - performance

I'm new to all this and would like some guidance and I'm not sure if this is the correct place, any help would be appreciated and really aimed at someone who doesn't know what they're doing, not my job role but I'm being asked to pick it up.
We have a customer server running our Progress application, the last few days the server has been experiencing extreme performance issues, crippling the performance for all other users.
We know it's being caused by our application, as we can see it writing out to the Progress app server log, unfortunately we can't pinpoint the root cause.
I've been asked to monitor the database and see which tables are being hit to get a further understanding.
I've had a look and can only see third party monitoring solutions.
My question is, is there a native Progress monitoring tool and how would one roughly go about it?
Thanks

The native OpenEdge monitoring tool is PROMON. This is included with every installation. Just open up a proenv window and type:
proenv> promon /path/to/db/dbname
Most of the interesting stuff is under the R&D menu.
"OpenEdge Management" is sometimes considered a monitoring tool but it is an add-on product and is mostly focused on management and administration.
There are also (freemium) 3rd party tools focused on monitoring such as ProTop http://wss.com/protop (full disclosure, that's my product).
Or you can "roll your own" monitoring using the built-in virtual system tables.

Related

Creating a Windows installer using C# Winforms instead of Installer tool [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
I have used InstallAware and InstallShield before, and they are pretty difficult to work with and when something goes wrong it is very difficult to find and resolved the issue.
My question is why can't we use a Windows application written using C# to do this.
I understand that .Net framework may not be installed on the destination computer, so I wonder why no one has ever used this architecture:
I will create a simple installer using IntallSiheld(or any other similar tool) to just install .Net Framework and after that extracts and runs my own Windows application which I have written using C# in elevated mode. My application will run a Wizard with Back and Next button and I will take care of everything in it (copying files, creating and starting Windows Services, adding registry values, creating firewall extensions etc.)
Has anyone ever done this, and is there anything that prevents people from doing this?
In essence: don't try to re-invent the wheel. Use an existing deployment tool and stay with your day job :-). There are many such tools available. See links below.
And below, prolonged, repetitive musing:
Redux: IMHO and with all due respect, if I may say so, making your own installer software is reinventing the wheel for absolutely no gain whatsoever I am afraid. I believe you will "re-discover" the complexities found by others who have walked the path that is involved in deployment as you create your own installer software and find that software can be quick to make, but very hard to perfect. In the process you will expend lots of effort trying to wrap things up - and "the last meter is very long" as you curse yourself dealing with trifles that take up your time at the expense of what would otherwise pay the bills. Sorting out the bugs in any toolkit for whatever technical feature, can take years or even decades. And no, I am not making it up. It is what all deployment software vendors deal with.
Many Existing Tools: there are many existing tools that implement such deployment functionality already - which are not based on Windows Installer (Inno Setup, NSIS, DeployMaster and heaps of other less known efforts):
There is a list of non-MSI installer software here.
There is another list of MSI-capable software here.
My 2 cents - if you do not like MSI, choose one of the free, non-MSI deployment tools. How to create windows installer.
Corporate Deployment: The really important point (for me) is that corporate deployment relies on standardized packaging formats - such as MSI - to allow reliable, remote management of your software's deployment. Making your own installer will not impress any system administrators or corporate deployment specialists (at least until you sort out years of bugs and deficiencies). They want standardized format that they know how to handle (that does not imply that they are that impressed with existing deployment technology). Doing your deployment with standardized deployment formats can get you corporate approval for your software. If you make a weird deployment format that does unusual things on install that can't be easily captured and deployed on a large scale your software is head-first out of any large corporation. No mercy - for real. These are busy environments and you will face little understanding for your unusual solution.
"File-Pushers": Those of us who push files around for a living know that the field of deployment is riddled with silly problems that quickly kill your productiveness in other endeavors - the ones that make you stand out in your field - your day job. Deployment is a high profile, low status endeavor - and we are not complaining. It is just what it is: a necessity that is harder to deal with than you might think. Just spend your time more wisely is what I would conclude.
Complexity: Maybe skim the section "The Complexity of Deployment" here: Windows Installer and the creation of WiX. It is astonishing to deal with all the silly bugs that happen in deployment. It is not just a file copy, though it might be easy to think it is. And if it happens to be just a file copy, then there are existing tools that do the job. Free ones too. See links above. And if you think deployment is only file-copy in general, then please skim this list of tasks a deployment task should be capable of supporting: What is the benefit and real purpose of program installation?
Will your home-grown package handle the following? (just some random thoughts)
A malware-infected terminal server PC in Korea with Unicode characters in the path?
Symbolic links and NTFS junction points paths?
A laptop which shuts itself off in the middle of your file copy because it is out of battery?
Out of disk space situations? What about disk errors? And copy timeouts?
What about reboot requirements? For in-use files or some other reason. How are they to be handled? What if the system is in a reboot pending state and you need to detect it before kicking off your install?
How will you reliably install, configure and start and stop services?
How will you support uninstall and cleanup for your application?
Security software which flags your unknown, unrecognized, non-standard package a security threat and quarantines it? How would you begin to deal with this? Who do you contact to get into the good graces of a "recognized binary" for elevation?
Non-standard NTFS permissioning (ACLs) and NT Privileges? How do you detect it and degrade gracefully when you get permission denied? (for whatever reason).
Deployment of necessary runtimes for your application to work? (has been done by many others before). Download of the lastest runtimes if your embedded ones are out of date? Etc...
Provide a standardized way to extract files from your installation binary?
Provide help and support for your setup binaries for users who try to use them?
Etc... This was just a random list of whatever came to mind quickly. There are obviously many issues.
This was a bit over the top for what you asked, but don't be fooled to think deployment is something you can sort out a solution for in a few hours. And definitely don't take the job promising to do so - if that is what you are being asked. Just my two cents.
The above issues, and many others, are what people discover they have to handle when creating deployment software - for all but the most trivial deployments. Don't waste your time - use some established tool.
Transaction: If you are working in a corporation and just need your files to your testers, you can deploy using batch files for that matter - if you would like to. But you have to support it, and I guarantee you it will take a lot of your time. What do you do when the batch file failed half-way through due to a network error, and your testers are testing files that are inconsistent? Future deployment technologies may be better for such light-weight tasks. Perhaps the biggest feature of a deployment tool is to report whether the deployment completed successfully or not, and to log the errors and to roll the machine back to a stable state if something failed. Windows Installer does a lot of this work for you.
Distribution: A lot of people feel they can "just replicate my build folder to the user's computers". The complexities involved here are many. There is network involved, and network can never be assumed to be reliable, you need lots of error handling here. Then there is the issue of transactions: when do you know when the computer is in a stable state and should stop replicating. How often do you replicate, only on demand? How do you deal with the few computers that failed to replicate. How do you tell the users? These are distribution issues. Corporations have huge tools such as SCCM to deal with all these error conditions. Trying to re-implement all these checks, logging and features will take a long time. In the end you will have re-created an existing distribution system. Full circle. And how do you do inventory of your computers when there is no product registered as installed since only a batch file or script ran? And if you start replicating a lot of packages, how many times do you scan each file to determine if they are up to date? How much network traffic do you want to create? Where does it end? The answer: I guess transactions must be implemented with full logging and error tracking and rollback. Then you are full circle to a distribution system like I mentioned above and a supported package format as well.
This "just replicate my build folder to my users" ideas somehow remind me of this list: https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing. Not a 100% match, but the issues are reminiscent. When networking is involved, things start to become very unpredictable and you need logging, error control, transactions, rollback, network communication, etc... We have re-discovered large scale deployment - the beast that it is.
Network: and let's say you want to replicate your build folder to 10000 desktop machines in your enterprise. How do you kick off the replication? Do you start all replications at once and take down the trading floor of the bank as file replication takes over the whole network like a DDOS attack? Sorry - it is getting out of hand - please pardon the lunacy - but it really is upsetting that this replication approach is seen as viable for large scale deployment with current technology approaches. Built-in Windows features could help, but still need to be tested properly. You need scheduling, queuing, caching, regional distribution shares, logging, reporting / inventory, and God knows what else that a packaging / deployment system gives you already. And re-implementing it will be a pain train of brand new bugs to deal with.
Maybe we one day will see automatic output folder replication based on automatic package generation which really works via an intelligent and transacted distribution system. Many corporate teams are trying, and by using existing tools they get closer with standard package formats used. I guess current cloud deployment systems are moving in this direction with online repositories and easy, interactive installation, but we still need to package our software intelligently. It will be interesting to see what the future holds and what new problems result for packaging and distribution in the age of the cloud.
As we pull files directly from online repositories on-demand we will see a bunch of new problems? Malware, spoofing and injection? (already problematic, but could get worse). Remote files deleted without warning (to get rid of vulnerable releases that should no longer be used - leaving users stranded)? Certificate and signature problems? Firewalls & proxy issues? Auto-magic updates with unfortunate bugs hitting everyone immediately and unexpectedly? And the fallacies of the network and other factors as linked to above. Beats me. We will see.
OK, it became a rant as usual - and that last paragraph is heading over board with speculation (and some of the issues already apply to current deployment). Sorry about that. But do try to get management approval to use an existing packaging & deployment solution is my only advice.
Links:
Stefan Kruger's Installsite.org twitter feed: https://twitter.com/installsite
Choosing a deployment tool:
How to create windows installer
What installation product to use? InstallShield, WiX, Wise, Advanced Installer, etc
Windows Installer and the creation of WiX
WiX quick start tips
More on dark.exe (a bit down the page)

Help needed with windows hooks

I am working on building a system that can monitor how users react to security alerts on their systems (software updates, warnings etc.). It also needs to monitor the web traffic and the processes running on the system and I am looking to the community to help me design this system. We intend to provide users with test laptops and monitor their behavior over a period of time to see how they react to security alerts thrown by various applications and the OS(windows in this case).
Following are my questions
Can I use windows hooks to solve the first problem i.e finding how users reacted to the alerts thrown by various applications. Specifically, can global hooks be used to solve this?
(How this information should be collected (XML?) and relayed back to a server(how frequently?) is another problem)
Can I do this in C# or it has to be done only in c++ or VB?
Do you know any alternate approach to solve the problem? Is there any software that has these capabilities.
I have many more questions but getting these answered would be a good first step. Really hoping for some good insights from the knowledgeable people on this community
Thank you in advance
Edit:
Example scenario is when adobe prompts you to update the flash player or the antivirus prompts you to update definitions or any application displays a notification(security related having keywords like update, warning, install etc.) needing the user to take some action. Windows system updates is another example. I want to know how the user reacted to these alerts/notifications/updates (which are typically a pop-up window). So i was wondering if i placed a global hook that can monitor the content of the windows displayed on screen and notify me(server) when certain words like update, alert, warning etc. appear in the content/title of the windows and what the user did with the message(dismissed it, Oked it etc). Unfortunately, i do not have any more specifications than this. I can use anything I want to achieve this and I am not clear on what my choices are.
Edit 2:
After having reviewed my requirements and having read about hooks, I feel like I could achieve this by a combination of hooks and the following textGrab SDK, http://www.renovation-software.com/en/text-grab-sdk/textgrab-sdk.html. I want some guidance to know if I am on the right track. I am thinking if I can install hooks then it gives me handles to all possible windows on the screen and I can use the textGRAB SDK to look for certain keywords in those windows. Although this may capture some interesting text, I am still not sure how I will know what action the user had taken on the window. Anybody having any experience with either hooks or textGRAB, please let me know if this looks like a reasonable thing to do. If the community has some other Ideas on how I could possibly monitor security related messages thrown by any application in the system, please suggest. I am looking forward to some useful advice for completing a challenging project.
First of all, you need to define, how you will "see" security alerts in code. "Security alert" is quite a vague term. Will it be some window with some caption and some message to the user or ... ?
Next, about web and processes: Windows hooks won't help you with your task. They are more low-level and not as advanced as you'd need. You can't hook network traffic (you need either network filter driver for pre-Vista or Microsoft Filtering Platform for Vista and later). See this question for some information about checking the process list with C# (there seems to be no easy way to catch process startup either).
It honestly sounds like you need a more solid direction. I commend you for trying to provide details, but It appears that you still need more information about your problem(s)..
I will attempt to answer some of your questions, but like I said - it sounds like you need to know more about your problems before we can provide you with optimal answer(s).
-Alerts is too vague a term, you will need to define this better. Are these 'alerts' applications that YOU have control over or are they third party applications? Not every application will show an 'Alert' in the same fashion, and even if they did - I think using a System Level Hook would probably be too problematic to implement your solution with. I'm not saying it's necessarily impossible, but you're talking about possibly implementing a different set of logic(to determine the data for a given application's Alert(s)) for each application that you want to monitor.
-It's impossible for any of us to determine the optimal storage mechanism for your particular needs, that is something that you will either need to provide more details about or decide on your own.
-How often you collect data is also something that you will have to either provide more details for or decide for on your own.
-C/C++ Would probably provide you with the most portable solution, although there is nothing preventing you from using c# to call Win32 API. (Not everyone has the .NET framework installed - believe it or not)
-The problem that you mentioned appears to be a somewhat specialized problem... I don't know of any existing software that will do everything that you want to do.
Another possible issue that you haven't touched on:
You haven't specified your target audience for this 'service', but I want you to know that if I found an application monitoring as many events as what you're talking about doing, I would promptly remove it and write a nasty letter to the company that wrote it.
In summary, Read this Article on hooks to get a better understanding of how they work.

Testing a wide variety of computers with a small company

I work for a small dotcom which will soon be launching a reasonably-complicated Windows program. We have uncovered a number of "WTF?" type scenarios that have turned up as the program has been passed around to the various not-technical-types that we've been unable to replicate.
One of the biggest problems we're facing is that of testing: there are a total of three programmers -- only one working on this particular project, me -- no testers, and a handful of assorted other staff (sales, etc). We are also geographically isolated. The "testing lab" consists of a handful of VMWare and VPC images running sort-of fresh installs of Windows XP and Vista, which runs on my personal computer. The non-technical types try to be helpful when problems arise, we have trained them on how to most effectively report problems, and the software itself sports a wide array of diagnostic features, but since they aren't computer nerds like us their reporting is only so useful, and arranging remote control sessions to dig into the guts of their computers is time-consuming.
I am looking for resources that allow us to amplify our testing abilities without having to put together an actual lab and hire beta testers. My boss mentioned rental VPS services and asked me to look in to them, however they are still largely very much self-service and I was wondering if there were any better ways. How have you, or any other companies in a similar situation handled this sort of thing?
EDIT: According to the lingo, our goal here is to expand our systems testing capacity via an elastic computing platform such as Amazon EC2. At this point I am not sure suggestions of beefing up our unit/integration testing are going to help very much as we are consistently hitting walls at the systems testing phase. Has anyone attempted to do this kind of software testing on a cloud-type service like EC2?
Tom
The first question I would be asking is if you have any automated testing being done?
By this I mean mainly mean unit and integration testing. If not then I think you need to immediately look into unit testing, firstly as part of your build processes, and second via automated runs on servers. Even with a UI based application, it should be possible to find software that can automate the actions of a user and tell you when a test has failed.
Apart from the tests you as developers can think of, every time a user finds a bug, you should be able to create a test for that bug, reproduce it with the test, fix it, and then add the test to the automated tests. This way if that bug is ever re-introduced your automated tests will find it before the users do. Plus you have the confidence that your application has been tested for every known issue before the user sees it and without someone having to sit there for days or weeks manually trying to do it.
I believe logging application activity and error/exception details is the most useful strategy to communicate technical details about problems on the customer side. You can add a feature to automatically mail you logs or let the customer do it manually.
The question is, what exactly do you mean to test? Are you only interested in error-free operation or are you also concerned how the software is accepted at the customer side (usability)?
For technical errors, write a log and manually test in different scenarios in different OS installations. If you could add unit tests, it could also help. But I suppose the issue is that it works on your machine but doesn't work somewhere else.
You could also debug remotely by using IDE features like "Attach to remote process" etc. I'm not sure how to do it if you're not in the same office, likely you need to build a VPN.
If it's about usability, organize workshops. New people will be working with your application, and you will be video and audio recording them. Then analyze the problems they encountered in a team "after-flight" sessions. Talk to users, ask what they didn't like and act on it.
Theoretically, you could also built this activity logging into the application. You'll need to have a clear idea though what to log and how to interpret the data.

Terrible DotNetNuke performance

I'm involved with a project using DotNetNuke version 05.01.04 Community Edition. We are building our new Intranet using it, but performance is terrible.
We have five people adding pages and content to it and every 15-30 seconds they experience a pause of 10 seconds or longer before the system continues and the next screens loads.
The server is Windows 2003, 3.8GHz with 1GB of RAM. I'm told by our server admin that the CPU and memory performance don't appear to be the bottleneck.
We currently have 350 pages in the system, we a plan to add 1000. So we need to resolve this performance problem so that we can enter content and so we can go live.
I just can't see where the bottleneck is. Is there a good why to determine the bottleneck when using DotNetNuke?
Modules installed
Publish:Engage (Not currently in
use)
Page Blaster (Doesn't appear
to providing caching when users
logged in using Integrated
Authentication)
SimpleGallery
XMod
Content Manager
IIS Setup
Application recycling completely disabled (Apart from a 2am recycle)
New findings: 18th March 2010
The main bottleneck was due to version 5.1.4 having a bug which caused 1300 database roundtrips on an average page, due to broken database in-memory caching. We've upgraded to 5.2.4 which has resolved this bottleneck.
Now the next biggest bottleneck is the navigation. We've used both DDR:Menu and DDN:Nav, but both have a major impact on performance.
Is there a navigation interface out there that doesn't drain performance so badly?
I think you need to start investigating this using performance profiling tools. For the DNN application itself I'd grab something like JetBrains DotTrace or Red Gate's ANTS Performance Profiler.
For the database SQL Server Profiler would be the first choice or a tool such as Red Gate's SQL Response.
Without profiling the application these you're going to be pulling at straws.
And as Tim pointed out in his comment, installing Firebug in Firefox with the YSlow add-in to see what resources are taking longest to serve to the browser.
Mitchel Sellers has some good tutorials and checklists to go through with regards to performance in DNN. Start with Explaining High Performance DotNetNuke Configuration and Management (which points to some of his earlier articles).
I have several years of dnn development and maintainance experience, when I have this kind of problem, I start doing things from database clean up. Next thing is, find for missing indexes, and/or rebuild all the indexes periodically (sql job scheduled for that) but major performance gain would be from clean up of table
Another good considerations would be, disabling trace, debug mode to false and turn off features of dnn that you don't use (scheduler is the first one to turn off)
Edit: consider keep alive as well
Hope this helps
Is your database on that server? If so, just throw in some more RAM, or get a faster disk array...
Have you considered creating this lot of pages directly through TSQL? It's not hard to do and may save you a lot of time.

"Works on my machine" - How to fix non-reproducible bugs?

Very occasionally, despite all testing efforts, I get hit with a bug report from a customer that I simply can't reproduce in the office.
(Apologies to Jeff for the 'borrowing' of the badge)
I have a few "tools" that I can use to try and locate and fix these, but it always feels a bit like I'm knife-and-forking it:-
Asking for more and more context from the customer: (systeminfo)
Log files from our application
Ad-hoc tests with the customer to attempt to change the behaviour
Providing customer with a new build with additional diagnostics
Thinking about the problem in the bath...
Site visit (assuming customer is somewhere warm and sunny)
Are there set procedures, or other techniques than anyone uses to resolve problems like this?
One of the attributes of good debuggers, I think is that they always have a lot of weapons in their toolkit. They never seem to get "stuck" for too long and there is always something else for them to try. Some of the things I've been known to do:
ask for memory dumps
install a remote debugger on a client machine
add tracing code to builds
add logging code for debugging purposes
add performance counters
add configuration parameters to various bits of suspicious code so I can turn on and off features
rewrite and refactor suspicious code
try to replicate the issue locally on a different OS or machine
use debugging tools such as application verifier
use 3rd party load generation tools
write simulation tools in-house for load generation when the above failed
use tools like Glowcode to analyse memory leaks and performance issues
reinstall the client machine from scratch
get registry dumps and apply them locally
use registry and file watcher tools
Eventually, I find the bug just gives up out of some kind of awe at my persistence. Or the client realises that it's probably a machine or client side install or configuration issue.
Extensive logging usually helps.
The easiest way is always to see the customer in action (assuming that its readily reproducible by the customer). Oftentimes, problems arise due to issues with the customer's computer environment, conflicts with other programs, etc - these are details which you will not be able to catch on your dev rig. So a site visit might be useful; but if that's not convenient, tools like RealVNC might help as well in letting you see the customer 'do their thing'.
(watching the customer in action also allows you to catch them out in any WTF moments that they might have)
Now, if the problem is intermittent, then things get somewhat more complicated. The best way to get around this problem would be to log useful information in places where you guess problems could occur and perhaps use a tool like Splunk to index the log files during analysis. A diagnostic build (i.e. with extra logging) might be useful in this case.
I'm just in the middle of implementing an automated error reporting system that sends back to me information (currently via email although you could use a webservice) from any exception encountered by the app.
That way I get (nearly) all the information that I would do if I was sitting in front of VS2008 and it really helps me to work out what the problem is.
The customers are also usually (sorta) impressed that I know about their problem as soon as they encounter it!
Also, if you use the Application.ThreadException error handler you can send back info on unexpected exceptions too!
We use all the methods you mention progressively starting with the easiest and proceeding to the harder.
However you forget that sometimes hardware is at fault. For example, memory could be malfunctioning and some computation-intensive code will behave strangely throwing exceptions with weird diagnostics. Of cource, it works on your machine, since you don't have faulty hardware.
Experience is needed to identify such errors and insist that customer tries to install the program on another machine or does hardware check. One thing that helps greatly is good error handling - when your code throws an exception it should provide details, not just indicate that something is bad. With good error indication it's easier to identify such suspicious issues related to faulty hardware.
I think one of the most important things is the ability to ask sensible questions around what the customer has reported... More often than not they're not mentioning something that they don't see as relevant, but is actually key.
Telepathy would also be useful...
We've had good success using EurekaLog with it posting directly to FogBugz. This gets us a bug report containing a call stack, along with related system info (other processes running, memory, network details etc) and a screen shot. Occasionally customers enter further info too, which is helpful. It's certainly, in most cases, made it much easier and quicker to fix bugs.
One technique I've found useful is building an application with an integrated "diagnostic" mode (enabled by a command line switch when you launch the app). That certainly avoids the need to create custom builds with additional logging.
Otherwise, it sounds like what you're doing is as good an approach as any.
Copilot (assuming customer is somewhere cold and rainy :)
The usual procedure for this is to expect something like this will happen and add a ton of logging information. Of course you don't enable it from the beginning, but only when this happens.
Usually customers don't like to have to install a new version or some diagnostic tools. It is not their job to do your debugging. And visiting a client for cases like these is rarely an option. You must involve the client as little as possible. Changing a switch and sending you a log file is OK - anything more than this is too much.
I like the alternative of thinking the problem at the bath. I will start from trying to find out the differences between my machine and the client's configuration.
As a software engineer doing webstuff (booking/shop/member systems etc) the most important thing for us is to get as much information from the customer as possible.
Going from
it's broke!
to
it's broke! & here are screenshots of
every option I picked whilst
generating this particular report
reduces the amount of time it takes us to reproduce and fix an issue no end.
It may be obvious, but it takes a fair amount of chasing to get this kind of information from our customers sometimes! But it's worth it just for those moments you find they're not actually doing what they say they are.
I had these problems also. My solution was to add lots of logging and give the customer a debug build with all the possible debug information. Then make sure dr Watson (it was on Windows NT) created a memory dump with enough information.
After loading the memory dump in the debugger I could find out where and why it crashed.
EDIT: Oh, this obviously only works if the application terminates violently...
I think following the trail of the actions user took can lead us to the reasons of failure or selective failures. But most of the times users are at loss to precisely describe the interactions with the applications, the automatic screenshot taking (if it is desktop app. for .net app you can check Jeff's UnhandledExceptionHandler). Logging all the important action which change state of the objects can also help us in understanding it.
I don't have this problem very often, but if I did, I would use a screen sharing or recorded application to watch the user in action without having to go there (unless, as you said, it's warm and sunny and the company pays the trip).
I have recently been investigating such an issue myself. Over the course of my carrier I have learnt that, while computer systems may be complex, they are predictable so have faith that you can find the problem. My approach to these kinds of issues two fold:
1) Gather as much detailed information as possible from the customer about their failure and analyse it meticulously for patterns. Gather multiple sets of data for multiple failure occurrences to build up a clearer picture.
2) Try and reproduce the failure in house. Continue to make your system more and more similar to the customers system until you can reproduce it, the system is identical or it becomes impractical to make it more similar.
While doing this consider:
1)What differences exist between this system and other working systems.
2)What has recently changed in your product or the customers configuration that has caused the problem to start occurring.
Regards
Depending on the issue you could get WinDbg dumps, they normally give a pretty good idea of what is going on. We have diagnosed quite a few problems that weren't crashed from minidumps.
For .Net apps we also was Trace.Writeline then we can get the user to fire up DbgView and send us the output.
Its very complicated issue . I was thinking writing some procedure for this . I just made some procedure for this non-reproducible bug . it might be helpful
When the Bug accorded .. There are several factors it might to occur.
I am Sure all bugs are reproducible . I always keep eye for these kind of issues..
Get the System Information
what other process the customer did before that.
Time period it occurs . its rare or frequent
its next action happened after the issue ( its always same or different )
Find the factors for this bug ( as developer )
Find the exact position where this issue happened .
Find ALL THE SYSTEM Factors on that time
check all memory leaks or user error issue or wrong condition in code
List out all facotrs to may cause this issue.
How the each factors are affected this and wat are the data is holding those factors
Check memeory issues happened
check the customer have the current update code like yours
check all log from atleast 1 month and find any upnormal operation happened . keep on note
Just a short anecdote (hence 'community wiki'): Last week I thought it was a clever idea in a Django app to import the module pprint for pretty printing Python data only if DEBUG was True:
if settings.DEBUG:
from pprint import pprint
Then I used here and there the pprint command as debugging statement:
pprint(somevar) # show somevar on the console
After finishing the work, I tested the app with setting DEBUG=False. You can guess what happened: The site broke with HTTP500 errors all over the place, and I did not know why, because there is no traceback if DEBUG is False. I was puzzled that the errors disappeared magically, if I switched back to debug mode.
It took me 1-2 hours of putting print statements all over the code to find that the code crashes at exactly the above pprint() line. Then it took me another half an hour to convince myself to stop banging my head on the table.
Now comes the moral of the story:
Not every thing that looks like a clever idea in the first view is such savvy in the end.
An important point to look at for debugging these errors are all configuration options and platform switches your code by itself makes. This can be quite a lot more than just some user preferences. Document good, if you make an assumption about the user's platform (e.g., if you test for Win/Mac/Linux only, will your code crash on BSD or Solaris?)
Cheers,
However tough a non-reproducible problem is - we can still have a structured and strategic approach to solve them - and I can say through experience that it requires out of box thinking in 50% of the cases. Generally speaking, one can categorize the problems into different types which helps to identify what tool to be used. For example if you have a non-reproducible application crash issue or a memory issue you can use profilers and nail down the issue caused in the particular functionality.
Also, one of the most important approach is inforamation rich logging. I also use a lot of enums to describe the state of the process depending on the scenario in question. for exampe, I used like Initiated, Triggered, Running, Waiting Repaired etc to describe the schedules states and saved them to DB at different stages.
Not mentioned yet, but "directed code review" is one good solution, especially if you didn't do a proper review (at least 1 hour per 100 lines of code) before release.
I have also seen impressive demos of AppSight Suite, which is basically an advanced environment monitoring and logging tool. It allows the customer to record what happens on his machine in an extensive but fairly compact log file which you can then replay.
As many have mentioned, extensive logging, and asking the client for the log files when something goes wrong. In addition, as I worked more with web apps, I'll also provide detailed, but succinct deployment documentation (e.g., deployment steps, environmental resources that need to be set up etc).
Here are common problems I've seen that lead to the types of problem you are describing:
Environment not set up properly (e.g., missing environment variables, data sources etc).
Application not fully deployed (e.g., database schema not deployed).
Difference in operating system configuration (default character encoding being the most common culprit for me).
Most of the time, these issues can be identified through the log content.
You can use tools like Microsoft SharedView or TeamViewer to connect to remote PC and inspect problem directly on site. Of course, you'll need cooperation with customer.

Resources