Restore memory dump - debugging

If, while testing on a computer without a debugger (say a client's machine), I encounter a bug that may have corrupted the program's state but not actually crashed it, I know I can take a memory dump using the Windows Task Manager (right-click on the process name, Create dump file).
I can use these dumps with WinDbg to peek around in memory, etc., but what would be most useful to me is to restore the dump into a live process so that I can continue interacting with the program. Is this possible? If so, how? Is there a tool that can restore it, or do I need to write my own?

The typical usermode dumps or minidumps do not contain enough information to do so. While they contain all usermode memory, they do not contain kernel memory, so open handles to kernel resources such as files or network sockets are not included in the dump (and even if they were, the contents of the hard disk have most likely changed since, so replaying writes to it could corrupt your system even further).
The only way I can see to restore a memory dump is to restore the full machine state, memory plus disk, which most virtual machine software can do. A restore will, however, drop all your network connections; fortunately, most programs handle lost network connections better than lost file handles.

I discovered that I could do this with Hyper-V snapshots. If I run my program in a virtual machine, I can take a snapshot (and optionally a memory dump as well), transfer the dump if necessary, come back some time later, restore the snapshot, and continue interacting with the program.
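If you script the workflow, the Hyper-V PowerShell module's Checkpoint-VM and Restore-VMSnapshot cmdlets cover the snapshot half (the VM and snapshot names below are just placeholders):

    Checkpoint-VM -Name "ClientRepro" -SnapshotName "corrupted-state"
    # ...analyse the dump offline, then later:
    Restore-VMSnapshot -VMName "ClientRepro" -Name "corrupted-state" -Confirm:$false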

Detect unclean filesystem shutdown

I have a project where we manipulate large amounts of cached data using memory mapped files. We use Windows 10, NTFS and .NET.
When the user starts the application, we detect whether the previous session was shut down correctly, and if so we reuse the cache.
However, this is a pain for developers when debugging, since it's quite common to simply stop the program in the debugger. At the next startup the cached data has to be recalculated, which takes time and is annoying.
So we've been thinking of introducing a 'transaction log' so that we can recover even if the previous shutdown was unclean.
Now for the actual problem.
There seem to be no guarantees about the order in which memory-mapped pages are flushed. If the program is simply stopped, there is no problem, since the entire memory-mapped file will eventually be flushed to disk by the operating system. The problem is a power cut: in that case there are no guarantees about what state the file is in. Our "transaction log" doesn't help either, unless we always flush the log to disk before modifying the cache (something like the sketch below), which would defeat the purpose of our architecture by introducing unacceptable performance penalties.
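To make the cost concrete, the log-first barrier on Win32 would look something like this sketch (names hypothetical). FlushViewOfFile only starts the write; the FlushFileBuffers call, which forces it through the disk cache, is the expensive part we would pay on every cache modification:

    #include <windows.h>

    // Hypothetical write barrier: make one transaction-log record durable
    // BEFORE the cache is touched. FlushViewOfFile only queues the dirty
    // pages for writing; FlushFileBuffers is what forces them (and the
    // file metadata) through the disk cache, and it is the slow part.
    bool CommitLogRecord(void* logView, SIZE_T recordOffset,
                         SIZE_T recordSize, HANDLE hLogFile)
    {
        char* record = (char*)logView + recordOffset;
        if (!FlushViewOfFile(record, recordSize))
            return false;
        return FlushFileBuffers(hLogFile) != 0;
    }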
If we could somehow know that the OS didn't manage to flush all pages of our memory-mapped file before shutdown, we could simply throw the entire file away at the next startup. There would be a delay, but it would be totally acceptable since it would only occur after a power cut or similar event.
When the operating system boots, it knows that the file is possibly corrupt, because it knows the filesystem was not cleanly unmounted.
And finally, my question:
Is there some way to ask Windows if the file system was clean when it was mounted?
NTFS periodically commits its own log, so there is a window in which a power failure can occur and NTFS will still (correctly) report that the volume is clean; "clean" here refers to the NTFS metadata, not your user data.
You will likely have to do what databases do, which is to lock your cache into physical memory so that you can control the writes to disk yourself.
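For completeness, you can ask Windows whether a volume is currently flagged dirty via the FSCTL_IS_VOLUME_DIRTY control code, though per the caveat above a clear bit says nothing about your user data, and autochk may already have repaired and cleared it by the time your application runs. A minimal sketch, assuming the cache lives on C: and the process is allowed to open a volume handle (this usually requires administrator rights):

    #include <windows.h>
    #include <winioctl.h>
    #include <stdio.h>

    int main(void)
    {
        // Open the volume itself, not a directory on it.
        HANDLE hVol = CreateFileA("\\\\.\\C:", GENERIC_READ,
                                  FILE_SHARE_READ | FILE_SHARE_WRITE,
                                  NULL, OPEN_EXISTING, 0, NULL);
        if (hVol == INVALID_HANDLE_VALUE) {
            printf("CreateFile failed: %lu\n", GetLastError());
            return 1;
        }

        ULONG flags = 0;
        DWORD bytes = 0;
        if (DeviceIoControl(hVol, FSCTL_IS_VOLUME_DIRTY, NULL, 0,
                            &flags, sizeof(flags), &bytes, NULL))
            printf("Dirty bit is %s\n",
                   (flags & VOLUME_IS_DIRTY) ? "set" : "clear");
        else
            printf("DeviceIoControl failed: %lu\n", GetLastError());

        CloseHandle(hVol);
        return 0;
    }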

When does a memory dump get written after BSOD?

I have a machine that periodically BSODs, and I have full memory dumps configured. Sometimes, when rebooting after the BSOD, the machine sits on a black screen for several minutes before finally coming up to Windows. My colleague recently discovered that during this time the machine was writing the full memory dump to disk.
He established this by shutting the machine down during the BSOD, attaching the drive to another machine as a slave, seeing that no dump file was present, putting it back into the original machine, watching the black screen for several minutes, and then finding the memory dump on disk.
So my question is: how does this work internally? I swear I've seen the BSOD itself tell me that it is currently writing the dump file to disk, complete with a counter.
What's happening is that the crash handler overwrites the pagefile with the full contents of memory, in order to avoid doing any complicated processing after a critical system error (bear in mind that whatever caused the BSOD could have damaged heaps, code, loaded drivers, and so on, so at that point the kernel can't rely on much of anything). When the system reboots, it discovers that the pagefile has been marked as containing a crash dump, and sets about converting the raw dump into a dump file that can be analysed, either by Microsoft's crash-reporting servers or by a driver developer looking at it in WinDbg or Visual Studio.
While I don't know for sure, it's possible that it has to write RAM to disk while displaying the blue screen, but that on reboot it pulls the rest of the process memory space out of the swap file to create the full dump.
This is the first time I've heard of something like that; I thought the dump was always written while the BSOD is shown. You can try connecting the kernel debugger in verbose mode to figure out what's happening.

Locking sharable memory

Is there a way to page another process's entire image into memory and keep it there? In a couple of weeks, our IT staff will be replacing all of the "core" network switches, which will bring down the network. This will be done after normal business hours, but several users will still be using a program that I have written. Installing local copies of my program on each user's machine would be a nightmare. The program normally runs from a network share, and the only time it accesses the network is when it pages in its executable (image) code. How can I get the Windows memory manager to load the entire image into memory and keep it locked there until the network is back online?
You can relink your program with the /SWAPRUN:NET linker option:
http://msdn.microsoft.com/en-us/library/w0628bwh.aspx
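If relinking is inconvenient, the same flag can also be stamped onto an already-built binary with the EDITBIN tool that ships with Visual Studio (as far as I know this is equivalent):

    editbin /SWAPRUN:NET myprogram.exe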
You could write it so that it copies itself to a local temp directory, launches that copy as a separate process, and then exits (killing the first copy). I've done this little juggling act before, but whether your program tolerates being run from the temp directory depends on how it works.
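A minimal sketch of that juggling act (the --local flag and the file name are made up for illustration; a real version needs error handling and a cleanup story for the temp copy):

    #include <windows.h>
    #include <string>

    // If we are running from the network share, copy ourselves to the temp
    // directory, start the copy with a marker argument, and exit. The copy
    // sees the marker and keeps running.
    void RelaunchFromTempIfNeeded(int argc, char** argv)
    {
        if (argc > 1 && std::string(argv[1]) == "--local")
            return;  // already the local copy, keep going

        char self[MAX_PATH], tmpDir[MAX_PATH];
        GetModuleFileNameA(NULL, self, MAX_PATH);
        GetTempPathA(MAX_PATH, tmpDir);
        std::string local = std::string(tmpDir) + "myprogram.exe";

        if (!CopyFileA(self, local.c_str(), FALSE))  // overwrite old copy
            return;  // fall back to running from the share

        std::string cmd = "\"" + local + "\" --local";
        STARTUPINFOA si = { sizeof(si) };
        PROCESS_INFORMATION pi;
        if (CreateProcessA(local.c_str(), &cmd[0], NULL, NULL, FALSE,
                           0, NULL, NULL, &si, &pi)) {
            CloseHandle(pi.hThread);
            CloseHandle(pi.hProcess);
            ExitProcess(0);  // kill the network-share instance
        }
    }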
This isn't going to work.
Windows doesn't necessarily load a 'static' copy of the executable into memory; it's free to shuffle chunks around and page parts in and out. It commonly loads resources (images, strings, etc.) from the executable after the program has started running, and it loads external libraries dynamically as well.
Edited to add:
There is no such thing as "a process's entire image". Every thread, for example, gets its own allocation.
Maybe you should explain why running from a different location (i.e., a local copy of the binary) won't work for you.

How can I use up RAM quickly to test garbage collection?

Windows Server 2008. How can I quickly use up RAM so as to induce garbage collection in my app? If there is a way to do it without needing Visual Studio or installing a language runtime, so much the better.
EDIT: I don't want to have to write an app and then copy it over to the server. I'm looking for a way to do it quickly, without an IDE or the installation of a runtime/compiler.
Perhaps a PowerShell or batch script?...
I don't think using up RAM outside your process is going to necessarily trigger GC.
If I understand your question correctly, you have a program Foo.exe written in some unknown language, running on some unknown runtime (are you not allowed to post the details for some reason, or do you just not know?), and you want to make that program's runtime trigger a garbage collection by using up RAM outside of Foo.exe.
You could do this by creating a simple batch file that starts up a hundred copies of IE or Word or whatever program you like. However, I don't think that will do what you want. If your process has already allocated a certain amount of memory, it won't necessarily give that memory up or trigger GC just because other processes are starting; it may page to disk, or force other programs to page to disk. And not all garbage collectors are alike, so we can't really help without more details; I'm pretty sure some VMs never give memory back to the OS once they've allocated it, even after a GC.
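That said, if you can relax the no-copying constraint, the memory-pressure tool itself is tiny: compile it once anywhere and drop the .exe on the server. A minimal sketch that commits RAM until allocation fails:

    #include <windows.h>
    #include <stdio.h>
    #include <string.h>

    // Commit memory in 64 MB chunks and touch every page so the OS backs
    // it with physical RAM, until allocation fails. Ctrl+C releases it all.
    int main(void)
    {
        const SIZE_T chunk = 64 * 1024 * 1024;
        SIZE_T totalMB = 0;
        for (;;) {
            void* p = VirtualAlloc(NULL, chunk, MEM_COMMIT | MEM_RESERVE,
                                   PAGE_READWRITE);
            if (!p)
                break;
            memset(p, 0xAB, chunk);  // actually touch the pages
            totalMB += 64;
            printf("Committed %llu MB\n", (unsigned long long)totalMB);
        }
        printf("Allocation failed; holding memory. Ctrl+C to exit.\n");
        Sleep(INFINITE);
        return 0;
    }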
You could run your program inside a virtual machine such as VirtualBox, where you can specify a memory ceiling for the guest operating system.
I'm having trouble imagining a scenario where this would be necessary, though. Could you provide more information about the problem?
If you are using Java, you can cap the heap with the -Xmx flag (for example, java -Xmx64m YourApp); a lower ceiling makes the JVM collect sooner. Search for JVM memory settings.

Let's say I am writing my code and my PC dies: how necessary is it to do a complete disk scan if I don't want my later source code to be contaminated?

Let's say I am writing a Ruby on Rails program and, while I'm editing a file, the machine blue-screens. In this case, how necessary is it to re-scan the whole hard drive if I don't want my future files to be damaged?
Suppose the OS was deleting a tmp file at the moment the computer crashed and still had pointers to some sectors on the hard drive. If my newly created files happen to land in those sectors, then the next time the OS cleans up, it may think those "left-over" sectors weren't cleaned last time, clean them again, and damage our source code (especially with Ruby on Rails, where some source files are generated by Rails rather than written by us, so we may not know why our Rails server doesn't work if a file is affected). We can rely on SVN, but what if a file is affected before we check it in?
I think the official answer will be, "Always scan the disk after a crash or power outage, checking the data and even the free space, and attempt to fix any bad sectors." The thing is, with hard drives as big as they are nowadays, a full scan can take two hours, and at work we can't wait two hours in the middle of the day.
Does anyone know whether, on a modern OS such as XP, Vista, Mac OS, or Linux (say the power cord came loose and the machine shut down at 0% battery instead of shutting down properly), our source code is safe? Do these systems structure their writes to sectors so that, at worst, a sector is wasted, rather than sectors overlapping?
With a modern journaling file system (ext3/4, NTFS), the only problem would be that a file could be left in a "half-written" state. Obviously scanning is not going to help with that (that's what backups are for). The file system itself could not be corrupted. If you are using something like FAT, then yes, you should worry about this.
There's really only one issue here: is any file that was being written left in some kind of "half-written" state?
The primary cause of this would be the application/editor writing the file when the machine dies halfway through. In that case the file being written is, well, half done. If it was overwriting the original file, the original is gone and the new one is half done; if you don't have a backup copy, then, well, you have a problem. (The usual defense, writing to a temp file and renaming it over the original, is sketched at the end of this answer.)
As far as a file having dangling pointers, or references to sectors never written, or some such thing: that depends on your file system.
The major modern file systems are journaled and "won't allow" this to happen. You may have a half-written file, but that's because the application only got to write half of it, not because the file system lost track of a sector pointer.
If you're playing file system games for performance or whatever (such as using UFS without logging), then you would want to run fsck to clean up the file system's metadata.
But if you're using a modern operating system and file system (i.e. anything from the past 5 years), you won't have this problem.
Finally, if you do have version control running, just run svn status: any "corrupted" files will show up as changed, so it will detect them as well.
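For what it's worth, the standard defense against the half-overwritten-file problem described above is to write the new contents to a temporary file, flush it, and atomically swap it over the original; a minimal Win32 sketch (SafeSave and the .tmp naming are made up for illustration):

    #include <windows.h>
    #include <string>

    // Write the new contents to a temp file, force it to disk, then swap
    // it over the original with ReplaceFileA. A crash at any point should
    // leave either the complete old file or the complete new one.
    bool SafeSave(const char* path, const void* data, DWORD size)
    {
        std::string tmp = std::string(path) + ".tmp";

        HANDLE h = CreateFileA(tmp.c_str(), GENERIC_WRITE, 0, NULL,
                               CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        if (h == INVALID_HANDLE_VALUE)
            return false;

        DWORD written = 0;
        BOOL ok = WriteFile(h, data, size, &written, NULL) && written == size;
        ok = ok && FlushFileBuffers(h);  // make the temp file durable first
        CloseHandle(h);
        if (!ok)
            return false;

        // ReplaceFileA keeps the original's attributes; it requires the
        // original to exist (use MoveFileExA for the first-ever save).
        return ReplaceFileA(path, tmp.c_str(), NULL, 0, NULL, NULL) != 0;
    }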
I found some information on
http://en.wikipedia.org/wiki/Journaling_file_system
Journalized file systems
File systems may provide journaling, which provides safe recovery in the event of a system crash. A journaled file system writes some information twice: first to the journal, which is a log of file system operations, then to its proper place in the ordinary file system. Journaling is handled by the file system driver, and keeps track of each operation taking place that changes the contents of the disk. In the event of a crash, the system can recover to a consistent state by replaying a portion of the journal. Many UNIX file systems provide journaling including ReiserFS, JFS, and Ext3.
In contrast, non-journaled file systems typically need to be examined in their entirety by a utility such as fsck or chkdsk for any inconsistencies after an unclean shutdown. Soft updates is an alternative to journaling that avoids the redundant writes by carefully ordering the update operations. Log-structured file systems and ZFS also differ from traditional journaled file systems in that they avoid inconsistencies by always writing new copies of the data, eschewing in-place updates.
