Memory dump for a period of time - debugging

When a program is misbehaving, it is pretty easy to capture a memory dump of the process and then analyze it with a tool like WinDBG. However, this is pretty limited: you only get a snapshot of what the process is doing, and in some cases finding out why a certain part of the code was reached is really difficult.
Is there any way of capturing memory dumps for a period of time, like recording a movie rather than taking a picture, which would indicate what changed in that period of time, and the parts of the code that were executed in that time interval?

Recording many memory dumps
Is there any way of capturing memory dumps for a period of time, like recording a movie rather than taking a picture
Yes, that exists. It's called ProcDump, and you can define the number of dumps with the -n parameter and the seconds between dumps with -s. It might not work well for small values of -s, because taking a full dump can take longer than the interval itself.
Example:
procdump -ma -n 10 -s 1 <PID> ./dumps
However, this technique is usually not very helpful, because you now have 10 dumps to analyze instead of just 1 - and analyzing 1 dump is already difficult. AFAIK, there's no tool that would compare two dumps and give you the differences.
Live debugging
IMHO, what you need is live debugging. And that's possible with WinDbg, too. Development debugging (using an IDE) and production debugging are two different skills, so you don't need to install a complete IDE such as Visual Studio on your customer's production environment. Actually, if you copy an existing WinDbg installation onto a USB stick, it will run portably.
Simply start WinDbg, attach to a process (F6), start a log file (.logopen), set up Microsoft symbols, configure exceptions (sx) and let the program run (g).
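For example, a minimal session might look like this (the paths are placeholders, and sxe av is just one way to configure exceptions - here, breaking on access violations):
$$ start a log file so the whole session is captured
.logopen C:\logs\session.log
$$ use the public Microsoft symbol server (cache path is an example)
.symfix C:\symbols
.reload
$$ break into the debugger when an access violation is raised
sxe av
$$ let the program run
g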
Remote debugging
You may even want to have a look at WinDbg's remote debugging capabilities; however, that's a bit harder to set up, usually due to IT restrictions (firewall etc.).
Visual Studio also offers remote debugging, so you can use VS on your machine and only install a smaller program on your customer's machine. I have hardly any experience with it, so I can't tell you much.
Logging
the parts of the code that were executed in that time interval?
The most typical approach I see applied at almost any company is turning on the logging capabilities of your application.
You can also record useful data with WPT (Windows Performance Toolkit), namely WPR (Windows Performance Recorder) and later analyze it with WPA (Windows Performance Analyzer). It will give you call stacks over time.
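As a rough sketch of such a recording session from an elevated command prompt (CPU is one of WPR's built-in profiles; the file path is an example):
rem start recording with the built-in CPU profile
wpr -start CPU
rem ... reproduce the problem, then stop and save the trace
wpr -stop C:\traces\cpu.etl
rem open the trace in Windows Performance Analyzer
wpa C:\traces\cpu.etl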

Related

Live vs. offline debugging

I've been trying to find the difference between these 2 types of debugging, but couldn't find it anywhere (been googling almost 30 minutes), so I'm asking here: What's the difference between live vs. offline debugging? What do people mean when they say a debugger is "live" vs. "offline"?
Debugging types
There are several ways of debugging that can be distinguished:
live debugging vs. post mortem debugging (what you call "offline" debugging, also called "dump debugging")
kernel debugging vs. user mode debugging
local debugging vs. remote debugging
which give 8 combinations in total.
For live debugging, you can further distinguish between invasive and noninvasive debugging.
Live debugging vs. offline debugging
In live debugging, the program is running and the debugger is attached to it. This means you can still interact with the program. You can set breakpoints, handle exceptions that would normally cause the program to terminate, modify the memory etc.
The downside of live debugging is its transient nature. If you enter a wrong command or step too far, the situation is gone and might not be repeatable.
I mentioned that there are two sub-modes of live debugging, invasive and noninvasive: in noninvasive debugging, the debugger does not attach to the target application. It suspends all of the program's threads and has access to the memory, registers, and other such information; however, the debugger cannot control the target.
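To illustrate the difference on the command line (the PID is a placeholder; -pv is WinDbg's noninvasive attach switch):
rem invasive attach: full control over the target
windbg -p <PID>
rem noninvasive attach: threads suspended, memory readable, but no control
windbg -pv -p <PID>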
In post mortem debugging, someone has captured a memory dump of a running program at a certain point in time. In many cases this is done upon a specific event, e.g. an unhandled exception that causes the program to terminate. Since the memory dump is a file on disk, you can analyze it as often as you want and you get the exact same situation.
The downside of post mortem debugging is, of course, that the program is not running; you can't interact with it, and it's very hard to find out what happens next.
"Online" debugging is the normal process:
Tell the debugger to tell the program to step forwards;
Look at what the program state is at the moment;
Set a breakpoint for the future;
Tell the debugger to simply run the program;
If the breakpoint 'fires', have a look at the program state now.
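In WinDbg terms, that loop might look roughly like this (MyApp!ProcessOrder is a made-up symbol for illustration):
$$ step forwards one statement
p
$$ look at the program state: call stack and local variables
k
dv
$$ set a breakpoint for the future
bp MyApp!ProcessOrder
$$ simply run the program
g
$$ the breakpoint fired - look at the state now
k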
There are two ways to "offline" debug:
You can take your source code and manually step through what the processor ought to be doing, watching for unexpected program paths.
Note that if you do this, you need to be diligent about not "knowing" what the processor is "supposed" to do and just doing that: you need to honestly obey the code as though you were the computer. Often you get other people, who don't know the code, to do this instead of you.
You take the result of a run-log, usually captured by a hardware probe, and use the debugger to "post mortem" the run.
The latter usually requires a processor that will transmit what it is doing out a "Trace" port (not all have this), and a hardware device (like a probe) connected to the Trace port to capture the data. That probe then communicates with a debugger, which takes the data and presents it to the programmer. The programmer can work backwards and forwards through this Trace log, and see the execution path that the code actually took, rather than the code the programmer thought it should take.
Some processors not only transmit what instruction they're currently processing, but also what data they read or wrote while doing this. A more sophisticated debugger can take this extra data and provide a 'snapshot' of the system at any time during the run, allowing the programmer to analyse why the code behaved the way it did.
The reason that it is called "offline" is because once the log has been captured, you can disconnect and power down the target, and look at the saved log at any time in the future without still being connected to the probe or processor.

How is WinDbg used, what exactly is it, and does it relate to .dmp files?

In the past, I have heard references to parsing .dmp files using WinDbg (I think - I might be wrong).
I have also done fairly extensive debugging with the help of .map files, and I have done extensive debugging using standard logical heuristics and the Visual Studio debugger.
However, occasionally, the program I am developing crashes and creates a .dmp file. I have never been able to interpret the .dmp file. A while ago, I posted a SO question regarding how to interpret .dmp files ( How to view .dmp file on Windows 7? ), but after somewhat significant effort I was unable to figure out how to interpret .dmp files using the answer to that question.
Today, I was viewing an unrelated SO question ( C++ try/throw/catch => machine code ), and a useful comment underneath the accepted answer has, once again, made reference to WinDbg.
If you really want to find this out though, it's easy - just trace through it in WinDbg
I would like to follow this advice. However, for me, it's not easy to "just trace through it in WinDbg". I've tried in the past and can't figure out what exactly this means or what to do!
So, I'm trying again. "For once and for all", I would like to have plain-and-simple instructions regarding:
What is WinDbg
Assuming WinDbg is related to .dmp files, what exactly is a dump file and how does it relate to WinDbg (and correct me if my assumption is wrong)
How do you create .dmp files and, correspondingly, how do you use WinDbg to analyze them (again, correct me if I'm wrong about the relationship between WinDbg and .dmp files).
Please answer this question from the "starting point" of a programmer who ONLY has Visual Studio installed and running.
Thanks!
WinDbg is a multipurpose debugger. It can debug a live process by attaching, setting breakpoints, etc., like you would with any other debugger. It can also analyze crash dump (.dmp) files. You drive it by giving it commands.
A .dmp file is a memory dump of something. What that something is depends on what the memory dump is for. It could be for a process, for example. It could also be for the kernel. What is in the memory dump depends, too. In your case, it's probably what your process looked like at the time of it crashing. What the memory dump contains can vary depending on the dump type.
There are various ways. On Windows Vista+ and Server 2008+, you can do it right from Task Manager: right-click the process and click "Create Dump File". WinDbg can make a memory dump from a live process too, using the .dump command. Other tools like ADPlus can be used to automatically create a memory dump on certain conditions, like when a process exceeds a memory or CPU threshold, or when it crashes.
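For instance, those command-line routes might look like this (paths and the process name are examples; /ma writes a full user-mode dump):
$$ from inside an attached WinDbg session
.dump /ma C:\dumps\myapp.dmp
rem or, from a command prompt, let ADPlus write a dump when myapp.exe crashes
adplus -crash -pn myapp.exe -o C:\dumps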
WinDbg can open a crash dump easily enough. What is important is that you get your symbols loaded correctly first, usually in the form of .pdb files or from a symbol server. Symbols are not strictly necessary, or always available, but they are greatly helpful.
Once you have WinDbg running, take a look at the list of commands available to poke around in your crash dump.
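A few typical first commands after opening a dump, as a hedged starting point (the cache path is an example):
$$ point WinDbg at the Microsoft symbol server and reload symbols
.symfix C:\symbols
.reload
$$ let the debugger take a first guess at the failure
!analyze -v
$$ call stack of the current thread, then of every thread
k
~*k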
WinDbg is a GUI version of the command-line debugger cdb.exe. Both are user-mode and kernel-mode debuggers built on the same debugger engine (DbgEng.dll, with DbgHelp.dll handling symbols), which issues commands to your application or the NT kernel; the engine also exposes an API, so you can do the same programmatically.
.dmp files are memory dumps of varying detail: some contain minimal detail, just enough for call stacks of all threads, whilst others contain the entire user-mode memory, handle information, thread information, memory region information and so on. So dump files have nothing to do with WinDbg other than that it can open them; incidentally, you can also open .dmp files in Visual Studio.
Like @vcsjones has already stated, you can do this using Task Manager (at least from Vista onwards), you can use procdump, or you can do it once WinDbg is attached - I usually take a full dump like this: .dump /ma c:\mem.dmp. You can also set Windows to do this when a crash happens, using Dr. Watson.
However, you must have the symbols for Windows and for your application in order to generate sensible call stacks. Note that, for obvious reasons, you cannot step through or set breakpoints in a memory dump; you can only do that on a live process. You can also have WinDbg attach noninvasively - so Visual Studio could be attached invasively while WinDbg is attached noninvasively, letting you use WinDbg's toolset to assist debugging.
For me the main advantages of WinDbg are that it's free, it is a small download and install, it is fast, and it has a very rich toolset for diagnosing problems that are either difficult or impossible to tackle using Visual Studio.

Reason for Windows crashing

I wrote a program which reads (via Windows) information about the hardware of the current PC (it's a big program, so I can't post the code here), and sometimes my Windows 7 crashes. The worst thing is that I have no idea why, and debugging doesn't help me. Is there any way to get some kind of log from Windows 7 telling me why it crashed? Thanks in advance for any help.
The correct (but somewhat ugly) answer:
Go to Computer -> Properties, then 'Advanced System Settings'.
Under startup and recovery, make sure it is set to "Kernel memory dump" and note the location of the dump file (on a completely default install, you are looking at C:\windows\memory.dmp)
Optimally, you also want to install the Windows Debugging Tools (now part of the Windows SDK) and set the Microsoft symbol store in your symbol settings (http://msdn.microsoft.com/en-us/library/ff552208(v=vs.85).aspx).
Once you've done all that, wait for a crash and inspect memory.dmp in the debugger. Usually you will not see the exact crash site because driver vendors don't include symbols, but you will generally get to see the name of the module involved in the crash, which should point you to the driver you are dealing with.
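As a sketch, assuming default paths (the symbol cache location is an example):
rem set the symbol path, then open the kernel dump in WinDbg
set _NT_SYMBOL_PATH=srv*C:\symbols*https://msdl.microsoft.com/download/symbols
windbg -z C:\Windows\memory.dmp
rem inside WinDbg: "!analyze -v" for a first guess, "lm kv" to list the loaded drivers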
If you are not seeing a specific driver name in the stack, that often indicates to me a hardware failure (like bad memory or overheating) that needs to be addressed.
MS has a good article here at technet that describes what I mentioned above (but step by step and in greater detail) http://blogs.technet.com/b/askcore/archive/2008/11/01/how-to-debug-kernel-mode-blue-screen-crashes-for-beginners.aspx
You can also look at the event log, as someone else noted, but generally the information there is next to useless, beyond the actual kernel message (which can sometimes vaguely indicate whether the problem is a driver or something else).

Win32 console processes in VISTA - 10% CPU, but VERY SLOW

I have a Win32 console application which is doing some computations, compiled in Compaq Visual Fortran (which probably doesn't matter).
I need to run a lot of them simultaneously.
In XP, they take around 90-100% CPU together and work very fast.
In Vista, no matter how many of them I run, they take no more than 10% of the CPU (together), and they run correspondingly slowly.
There is quite a bit of console output going on, but not VERY much.
I can minimize all the windows, it does not help. CPU is basically doing nothing...
Any ideas?
Update:
No, these are different machines, but they have roughly the same hardware. Threads are not used; this is a VERY OLD (20 yrs) plain DOS app, compiled for Win32. It is supposed to compute iterations until they converge, consuming everything it can get. My impression: Vista just does NOT GIVE IT MORE CPU.
Have you tried redirecting the console output to a file?
If your applications are being held up writing to the console (this happens sometimes unfortunately) then redirecting the output should help, as it's much quicker to write to a simple file than write to the console.
You do this like so
c:\temp> dir > output.log
If you really don't care about the output at all, you can throw it away, by redirecting to nul. eg:
c:\temp> dir > nul
There was a known "feature" in Vista that limits certain console applications to 32MB of RAM. I don't know if those compiled by Compaq Visual Fortran are affected by this "feature."
This article appears to have been updated as recently as October 2008, so the problem still exists.
To expound on Daok's post - your XP machine might be CPU bound for this process, whereas the Vista machine is bound by some other resource.
To clarify:
output to stdout (or elsewhere) can be slowing down the processing (as can context switching, file access, etc.)
As Tim hinted, console output (stdout) is EXTREMELY expensive.
I suggest rerunning your test while redirecting the console output to a separate log file for each process. If possible, tune down the verbosity of the output in another test run.
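For example, typed at a command prompt (myapp.exe is a placeholder for your program):
rem start four copies, each writing its console output to its own log file
for /L %i in (1,1,4) do start /b cmd /c "myapp.exe > run%i.log 2>&1"
(In a batch file, double the percent signs: %%i.)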
Beyond that, there are other obvious possibilities: is the hardware significantly different, are there other major processes running, is there a shared resource that is under contention?
Other than the obvious, look for a nonobvious resource contention such as a shared file.
But the main area where I would look is whether there is a significant difference in how your code is compiled for the two OS environments--I wonder if your Fortran code is incurring some kind of special penalty when running on Vista, such as a compatibility mode. Look to see how well Vista is supported and whether you can target your compile for Vista specifically. Also look for anyone reporting similar issues, such as in bug reports, feature requests, etc.
Your loops are obviously not simple computations. There is a blocking system call in there somewhere. Just because it worked on XP doesn't mean the app is bug free.
Since you can minimize the console windows and see no improvement, I would not consider that an issue. In my experience console output slows a program down only if the console window is drawing text, not when it's minimized.
Is it the same hardware on your Vista and XP machines? It might use just 10% on Vista because it doesn't need more. Are you using threads? I think we need more information about your project to have a better idea. Have you tried using a profiler to see what's going on?

What causes the MS Windows 'System' Process to go nuts when compiling?

A couple of times recently I have noticed that 'something' is causing the Windows System process to sit at 50+%, and it will not quit until the PC is rebooted. This is happening on Win2k and Win XP so far.
This is particularly troublesome because it currently appears to be triggered by MSVC 2005/Incredibuild and rebooting the build servers is not a nice thing.
At the same time, the 'System Idle Process' is holding the rest of the CPU, and the build steps themselves seem to be starved, i.e. a module that normally takes <5 minutes to compile is currently taking 20+.
I'd take a few guesses at it maybe being the virus checker or TortoiseSVN, but would desperately like some other suggestions.
Edit:
I've been experiencing this as something that is triggered, and the culprit may not be ongoing. That's not to say that some other ongoing process hasn't done something 'stupid' and is managing an active lock-up of System while appearing to be idle itself.
System (100% of 1 core), and System Idle Process are sharing 98-100% of the total CPU.
Occasionally mt.exe, link.exe or buildservice would get a look in at 1-2%.
I'm running VNC to view the machine, so it's getting a look in on occasion.
Edit 2:
When left the previous evening, the build process seemed to be progressing, albeit slowly, but after waiting another 13 hours the 1-hour build process hasn't completed. System is still hogging the 1 core.
My understanding is that the "System" process is the time spent in the kernel (so performing disk I/O, network I/O (you did mention Incredibuild) and the like) -- I'd check for disk fragmentation, virus checkers and possibly look at these on other machines in your Incredibuild cluster.
As the System Idle Process runs at "Low" priority, it's a red herring that it'd be "taking up CPU time" - if anything it's just showing that there is CPU time available. The fact that the processing is stuck to a single processor shows that the process is doing something that is not multi-core aware, or someone has set its thread affinity to 1.
I've noticed the virus checking software that I use can radically slow down compilation, but it does not extend beyond the end of the build. Turning off advanced and heuristic checking improves this to the extent that I do not have to disable the scanner entirely. I have changed my scanning strategy such that I now use scheduled full scans more than advanced on-the-fly scanning, as the latter hurts the performance of a number of apps (n.b. I am using the latest cut of Kaspersky). I'm also using an automated backup tool (AJCBackup) that also needs to be restrained when compiling.
You may also want to consider disabling the Windows Indexing Service on drives that are being used to create a lot of temporary and object files, as it doesn't provide much value in this context for the amount of performance it costs.
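If you want to try that, something like the following should work from an elevated prompt (CiSvc is the Indexing Service on XP; on Vista and later the equivalent service is WSearch):
rem stop the Indexing Service and keep it from restarting
net stop cisvc
sc config cisvc start= disabled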
Edit: Have you checked which processes are actually hogging the CPU core and traced them back to a given app?
We've encountered issues with Kaspersky and Incredibuild in our offices - compiles and sometimes links will just hang and never finish.
It only seems to affect some machines though, which is weird, and only Windows XP (Vista seems immune, from what I've seen).
Only solution I've found so far is to turn Kaspersky off entirely - so if you find a solution then let me know!
RE: smacl, work from the Windows Search/Indexing Service (WSearch) won't be attributed to the System process's CPU time; it should come from the SearchIndexer.exe/SearchFilterHost.exe processes (Vista+).
The majority of activity from System you will see will be in disk activity from the lazy writer and other disk accesses. CPU activity from System will be because of kernel activity such as drivers (ISRs/DPCs) and other kernel-level filters (which could include AV file and process filters).
Process Explorer (http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx) can aid in viewing CPU usage across processes, including System. You can use the public Microsoft Symbol Server and this resource to get you started.
If you can take a trace with Xperf (http://msdn.microsoft.com/en-us/performance/cc825801.aspx), I can help you analyze where the CPU time is being spent in the System (kernel) context. Xperf isn't officially supported on XP, but you can take a trace on XP and analyze it on other systems.
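A capture might look roughly like this (the Latency kernel group includes the sampled-profile, DPC and interrupt events; the output path is an example):
rem start kernel logging with stack walking on the profile interrupt
xperf -on latency -stackwalk profile
rem ... reproduce the slow build, then stop and merge the trace
xperf -d C:\traces\system-cpu.etl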
Xperf and Process Explorer should be able to shine a spotlight on exactly the module(s) that are causing the runaway CPU usage. Symbols may not even be necessary to diagnose the problem; simply the module name can often point to the component in question that is slowing down your system. For example, high CPU usage from ndis.sys can point to network interrupts, or activity from modules such as aavmker4.sys can point to AV software (Avast! in this case).
And as always, check if there are any updated drivers and AV software for your system.
In my office, a conflict between Incredibuild and Spyware Doctor's Immunize feature caused similar issues. Turning off Immunize solved it for us.
What anti-virus/malware do you use?
I'm having the same hangs when compiling using IncrediBuild in VS2003, on a clean Windows 7 without any anti-virus. It worked fine on the same box under XP and Vista.
