When does a memory dump get written after BSOD? - bsod

I have a machine that periodically BSODs. I have full memory dumps configured. Sometimes, when rebooting after the BSOD, the machine sits on a black screen for several minutes before finally coming up to Windows. My colleague recently found out that during this time, the machine was writing the full memory dump to disk.
He identified this by shutting the machine down on the BSOD, plugging the drive in as a slave, seeing that no dump file was present, plugging it back into the machine, seeing the black screen for several minutes, and then finding the memory dump on disk.
So my question is, how does this work, internally? I swear I've seen the BSOD itself telling me that it is currently writing the dump file to disk, with a counter.

What's happening is that the BSOD overwrites the pagefile with the contents of memory in order to avoid doing any complicated processing after a critical system error (bear in mind that the cause of the BSOD could have damaged heaps, code, unloaded drivers, etc., so the BSOD code basically can't rely on anything). When the system reboots, it discovers that the pagefile has been marked as containing a crash dump, and then sets about converting the raw data into a dump file (a full dump or a minidump, depending on the configured dump type) that can be analysed either by Microsoft's crash-reporting server or by a driver developer in WinDbg or Visual Studio. With a full memory dump, that copy can take several minutes, which would explain the black screen.
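
To see which dump type a machine is configured for, and where the dump file will end up after the reboot, the CrashControl registry key can be read. The following is a minimal sketch, not a definitive tool: the helper names are made up for illustration, and `read_crash_control` only works on Windows because it relies on the standard `winreg` module.

```python
# Meaning of the CrashDumpEnabled value under
# HKLM\SYSTEM\CurrentControlSet\Control\CrashControl
DUMP_TYPES = {
    0: "none",
    1: "complete memory dump",
    2: "kernel memory dump",
    3: "small memory dump (minidump)",
    7: "automatic memory dump",
}


def describe_dump_type(value):
    """Translate a CrashDumpEnabled value into a readable name."""
    return DUMP_TYPES.get(value, f"unknown ({value})")


def read_crash_control():
    """Return (dump type, dump file path) for this machine. Windows only."""
    import winreg  # imported here so the rest of the module loads anywhere

    key_path = r"SYSTEM\CurrentControlSet\Control\CrashControl"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        enabled, _ = winreg.QueryValueEx(key, "CrashDumpEnabled")
        dump_file, _ = winreg.QueryValueEx(key, "DumpFile")
    return describe_dump_type(enabled), dump_file
```

On a Windows machine, `print(read_crash_control())` would show the configured type and the target path for the dump file.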

While I don't know for sure, it's possible that it has to write RAM to disk while displaying the blue screen, and then, when it reboots, it pulls the rest of the process memory space out of the swap file to create the full core dump.

This is the first time I've heard of something like that. I thought the dump was always written while the BSOD is shown. You could try connecting the kernel debugger in verbose mode to figure out what's happening.

Related

Detect unclean filesystem shutdown

I have a project where we manipulate large amounts of cached data using memory mapped files. We use Windows 10, NTFS and .NET.
When the user starts the application, we detect whether the previous program session was shut down correctly, and if so we reuse the cache.
However, this is a pain for developers when debugging. It's quite common to just stop the program being debugged. At next startup, the cached data needs to be recalculated, which takes time and is annoying.
So, we've been thinking we could introduce a 'transaction log', so that we can recover even if the previous shutdown was unclean.
Now for the actual problem.
There seem to be no guarantees about the order in which memory mapped files are flushed. If the program is just stopped, there is no problem, since the entire memory mapped file will be flushed to disk by the operating system. The problem comes if there is a power cut. In that case, there are no guarantees about what state the file is in. Our "transaction log" doesn't help either, unless we always flush the transaction log to disk before modifying the cache. This would defeat the purpose of our architecture, since it would introduce unacceptable performance penalties.
If we could somehow know that our memory mapped file on disk was previously left in a state where the OS didn't manage to flush all pages before the operating system shut down, we could just throw the entire file away at next startup. There would be a delay, but it would be totally acceptable since it would only occur after a power cut or similar event.
When the operating system boots, it knows that the file is possibly corrupt, because it knows the filesystem was not cleanly unmounted.
And finally, my question:
Is there some way to ask Windows if the file system was clean when it was mounted?
NTFS periodically commits its own logs, so there's a window in which a power failure could occur and NTFS would (correctly) state that the volume (as in "NTFS data", not user data) is clean.
You will likely have to do what databases do, which is to lock your cache into physical memory so that you can control the writes to disk.
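
The ordering constraint mentioned in the question (flush the transaction log before modifying the cache) can be sketched as follows. This is only an illustration of the ordering, under the assumption of a simple record format of offset, length, and payload; the function names and the format are hypothetical, and it is exactly the approach the question describes as too slow for that workload.

```python
import mmap
import os
import struct

RECORD_HEADER = "<QI"  # hypothetical record layout: 8-byte offset, 4-byte length


def apply_update(log_file, cache_map, offset, data):
    """Make the log record durable first, then modify the mapped cache."""
    record = struct.pack(RECORD_HEADER, offset, len(data)) + data
    log_file.write(record)
    # Push Python's buffer to the OS, then ask the OS to write it to disk.
    log_file.flush()
    os.fsync(log_file.fileno())
    # Only now touch the mapping; the OS may write this page back at any time.
    cache_map[offset:offset + len(data)] = data


def replay(log_path, cache_map):
    """Re-apply every complete log record, e.g. after an unclean shutdown."""
    header_size = struct.calcsize(RECORD_HEADER)
    with open(log_path, "rb") as log:
        while True:
            head = log.read(header_size)
            if len(head) < header_size:
                break
            offset, length = struct.unpack(RECORD_HEADER, head)
            data = log.read(length)
            if len(data) < length:
                break  # torn record at the end of the log, ignore it
            cache_map[offset:offset + length] = data
```

The cost is one synchronous disk write per update, which is why databases batch log records or take control of when cache pages reach the disk.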

Restore memory dump

If in testing on a computer without a debugger, say a client's computer, I encounter a bug that may have corrupted the state of the program but not actually crashed it, I know I can take a memory dump using the Windows Task Manager (right click on process name, create dump file).
I can use these with WinDbg to peek around in memory, etc., but what would be most useful to me is to be able to restore the dump into memory so that I can continue interacting with the program. Is this possible? If so, how? Is there a tool that can restore it or do I need to write my own.
The typical user-mode dumps or minidumps do not contain enough information to do so. While they contain all user-mode memory, they do not contain kernel memory, so open handles to kernel resources like files or network sockets will not be included in the dump (and even if they were, the hard disk has most likely changed, so just trying to write to the hard disk may corrupt your system even more).
The only way I see to restore a memory dump is to restore the full memory and all other state, such as hard disk state, which can be done with most virtual machine software (this will, however, disconnect all your network connections on restore; thankfully, most programs can handle lost network connections better than lost file handles).
I discovered that I could do this with Hyper-V snapshots. If I run my program in a virtual machine, I can optionally dump the memory, create a snapshot, transfer the dump if necessary, come back some time later, restore the snapshot and continue the program.

how to get memory dump after blue screen

I'm getting a lovely BSOD on bootup (STOP: 0x0000007E) from a driver I'm writing, and would like to load up the memory dump for analysis. However, it's not getting dumped anywhere. Everything is set up correctly in the Startup and Recovery settings, but I get no dump file, and nothing in the event log stating that a dump has taken place. It looks like a dump is not even occurring...
I know the exact line of code causing it (a call to IoAttachDevice()), but am not sure why, and would like to view the DbgPrint output to see where exactly it's failing. Could Windows possibly be crashing before the dumping functionality is set up? If so, how do I get access to the state of the machine when the failure occurs?
UPDATE: Other possibly useful information: I'm running Windows XP through VirtualBox on a Linux host.
I don't know why you're not getting a dump file, but if you have ready access to the machine, attach a kernel debugger to it and repro the error. You'll be left with the machine sitting in the debugger, ready to go (you can have the debugger produce the dump file for you if you want to debug offline as well).
Right-click on "My Computer", select "Properties", and open the "Advanced" tab. Under "Startup and Recovery", click "Settings", then select "Kernel memory dump" or "Complete memory dump".
What's the start setting of your driver? If it starts too early in the boot order, the filesystem might not be remounted read-write yet, and therefore there's no place for a dump to go.
Drivers under development shouldn't generally be set to auto-start until you've gotten the driver stable when loaded later. Of course you eventually need to set it to auto-start so you can verify it works correctly, but that comes later.
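
For reference, a driver's start setting is the `Start` value under its service key in the registry. A minimal sketch of decoding it follows; the helper names and the service name `MyDriver` are hypothetical, and the registry read works on Windows only.

```python
# Meaning of the Start value under
# HKLM\SYSTEM\CurrentControlSet\Services\<driver name>
START_TYPES = {
    0: "SERVICE_BOOT_START",
    1: "SERVICE_SYSTEM_START",
    2: "SERVICE_AUTO_START",
    3: "SERVICE_DEMAND_START",
    4: "SERVICE_DISABLED",
}


def starts_early(start_value):
    """True for boot-start and system-start drivers, which load during kernel init."""
    return start_value in (0, 1)


def read_start_type(service_name):
    """Return the Start value of a service or driver. Windows only."""
    import winreg  # imported here so the rest of the module loads anywhere

    key_path = rf"SYSTEM\CurrentControlSet\Services\{service_name}"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        value, _ = winreg.QueryValueEx(key, "Start")
    return value
```

For example, `START_TYPES[read_start_type("MyDriver")]` would report how the hypothetical driver is configured; a driver under development is usually easier to debug with `SERVICE_DEMAND_START` (3), loaded manually once the system is up.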

Can a simple program be responsible for a BSOD?

I've got a customer who told me that my program (a simple user-land program, not a driver) is crashing his system with a Blue Screen Of Death (BSOD). He says he has never encountered that with other programs and that he can reproduce it easily with mine.
The BSOD is of type CRITICAL_OBJECT_TERMINATION (0x000000F4) with object type 0x3 (process): A process or thread crucial to system operation has unexpectedly exited or been terminated.
Can a simple program be responsible for a BSOD (even on Vista...) or should he check the hardware or OS installation?
Just because your program isn't a driver doesn't mean it won't use a driver.
In theory, your code shouldn't be able to BSOD the computer. It's up to the OS to make sure that doesn't happen. By definition, that means there's a problem somewhere either in hardware or in code other than your program. That doesn't preclude there being a bug in your code as well though.
The easiest way to cause a BSOD with a user-space program is (afaik) to kill the Windows subsystem process (csrss.exe). This doesn't need faulty hardware or a bug in the kernel or a driver; it only needs administrator privileges [1].
What exactly is your code doing? The error message ("A process or thread crucial to system operation has unexpectedly exited or been terminated.") sounds like one of the essential system processes terminated. Maybe you are killing a process and unintentionally got the wrong one?
If at all possible, you could try to get a memory dump from that customer. Using the Debugging Tools for Windows you can then further analyze that dump as described here.
[1] Windows doesn't prevent you from doing so because it "keeps administrators in control of their computer". So this is by design and not a bug. Read Raymond's articles and you will see why.
The short answer is yes. The long answer depends on what your program is supposed to do and how it does it.
Normally, it shouldn't. If it does, there must be one of the following:
A bug in the Windows kernel (possible but very unlikely)
A bug in a device driver (not necessarily in a device your program uses, this could get quite complicated)
A fault in the hardware
I would bet on option number two (device driver) but it would be interesting if you could get us a more detailed dump.
Well, yes it can - but for many different reasons.
That's why we test on different machines, operating systems, hardware etc..
Have you set some requirements for your program and is your user following them?
If you can't duplicate it yourself, and your program doesn't need admin to run, I'd be a bit suspicious about:
The stability of that system's hardware
The virus/malware status of that system.
If you can get physical access to the client box, it might be worth running a full virus scan with an up-to-date scanner, and running a full memtest on it.
I had a system once that seemed stable, except that a certain few programs wouldn't run on it (and would sometimes crash the box). Memtest showed my RAM had some bad bits, but they were in the higher SIMMs, so they only got accessed if a program tried to use a lot of RAM.
No, and that is pretty much by definition. The worst thing that you can say is that a user-land application may have "triggered" a Windows bug or a driver bug. But a modern desktop Operating System is fully responsible for its own integrity; a BSOD is a failure of that integrity. Therefore the OS is responsible, and only the OS.
(Example of a BSOD bug that your application alone could expose: a virus scanner implemented as a driver, that crashes when executing a file from sector 0xFFFFFFFF, a sector that on this one machine just happens to contain one DLL of your application)
I had problems when my app exited without stopping all its processes and DB connections (it crashed the entire IDE). I placed the stopping and disconnecting code in the "Terminate" or "Form_Closed" event of my main form and the problem was solved. I don't know if this is your situation.
Another problem can be if the user is trying to access the same resources your app is using (databases, hardware, sockets, etc.). Ask him/her which apps he/she is using when the BSOD happens.
A virus can't be ruled out either.

How can I use up RAM quickly to test garbage collection?

Windows Server 2008. How can I quickly use up RAM so as to induce GC in my app? If there is a way to do it without needing Visual Studio or installing a language runtime, that would be good.
EDIT: I don't want to have to write an app and then copy it over to the server. I'm looking for a way to do it quickly without writing an app that requires an IDE or installation of a runtime/compiler.
Perhaps a powershell or batch script?...
I don't think using up RAM outside your process is going to necessarily trigger GC.
If I understand your question correctly, you have a program Foo.exe that is written in some unknown language, running on some unknown runtime (are you not allowed to post the details for some reason, or do you just not know?), and you want to try to get that program's runtime to trigger a garbage collection. However, you want to do this by using up RAM outside of Foo.exe.
You could do this by creating a simple batch file that just starts up a hundred copies of IE or Word or whatever program you want. However, I don't think that will do what you want it to do. If your process has already allocated a certain amount of memory, it won't necessarily give that memory up or trigger GC just because other processes are being started. It may page to disk, or may force other programs to page to disk. But not all garbage collectors are alike, so we can't really help without more details. I'm pretty sure some VMs never give back memory once they've allocated it, even after GC.
You could run your program inside a virtual machine such as VirtualBox, where you specify the memory ceiling of the guest operating system.
I'm having trouble imagining a scenario where this would be necessary though. Could you provide more information about the problem?
If you are using Java, you can specify the maximum amount of heap memory using the -Xmx option (for example, -Xmx512m). Search for "JVM memory settings".
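
If installing a runtime on the server turns out to be acceptable after all, external memory pressure can be created with a few lines of script. The following is a rough sketch in Python (which the asker would have to install, so it does not meet the original constraint); the function name and chunk size are arbitrary choices, and, as noted above, pressure from another process will not necessarily trigger a GC in your app.

```python
def consume_memory(total_mb, chunk_mb=64):
    """Allocate roughly total_mb of RAM and return the chunks holding it."""
    chunks = []
    remaining = total_mb
    while remaining > 0:
        size_mb = min(chunk_mb, remaining)
        chunk = bytearray(size_mb * 1024 * 1024)
        # Write one byte per 4 KiB so the pages are really backed by memory
        # rather than left as untouched zero pages.
        for i in range(0, len(chunk), 4096):
            chunk[i] = 1
        chunks.append(chunk)
        remaining -= size_mb
    return chunks
```

The memory stays in use for as long as the returned list is kept alive, for example `held = consume_memory(2048)` followed by `input("Press Enter to release")`.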
