Imagine there's a memory-mapped file and the application is writing to it constantly. Eventually, Windows will probably flush that page to disk. How does Windows ensure that a stable snapshot of that page is flushed to disk?
Presumably the disk hardware copies the memory into its internal buffer before writing it. That copy is not atomic, so if the application keeps writing to the page, the hardware might capture a mix of old and new bytes that never existed at any single point in time.
Does this mean that memory-mapped files might leave a page on disk in a state that has never actually existed? That could be a problem for consistency.
Or does Windows lock the page during flushing? That could be a problem too, because a write to that page might then incur very high latency.
How does Windows ensure that a stable snapshot of that page is flushed to disk?
It doesn't need to. If the page doesn't get changed during the flush operation, the data is consistent. If the page does get changed during the flush, the page is marked dirty again, so it will be re-flushed in due course, and whatever was written to disk in the meantime is simply superseded.
(Incidentally, the data is probably not copied internally. The system should normally be able to use DMA to transfer it directly to the physical device.)
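For completeness, here is a minimal sketch of requesting a flush explicitly (the file name and mapping size are placeholder values): FlushViewOfFile queues the dirty pages for writing, and FlushFileBuffers then makes them durable on disk.

```cpp
// Minimal sketch, error handling trimmed. "data.bin" and the 64 KiB
// mapping size are illustrative, not from the thread.
#include <windows.h>

int main() {
    HANDLE file = CreateFileW(L"data.bin", GENERIC_READ | GENERIC_WRITE,
                              0, nullptr, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL,
                              nullptr);
    HANDLE mapping = CreateFileMappingW(file, nullptr, PAGE_READWRITE,
                                        0, 64 * 1024, nullptr); // grow to 64 KiB
    char* view = static_cast<char*>(
        MapViewOfFile(mapping, FILE_MAP_WRITE, 0, 0, 0));

    view[0] = 42;                 // dirty the first page

    FlushViewOfFile(view, 0);     // queue dirty pages; 0 = whole view
    FlushFileBuffers(file);       // wait until the data is actually on disk

    UnmapViewOfFile(view);
    CloseHandle(mapping);
    CloseHandle(file);
}
```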
I have a project where we manipulate large amounts of cached data using memory mapped files. We use Windows 10, NTFS and .NET.
When the user starts the application, we detect whether the previous program session was shut down correctly, and if so, we reuse the cache.
However, this is a pain for developers when debugging. It's quite common to just stop the program being debugged. At next startup, the cached data needs to be recalculated, which takes time and is annoying.
So, we've been thinking we could introduce a 'transaction log', so that we can recover even if the previous shutdown was unclean.
Now for the actual problem.
There seem to be no guarantees about the order in which memory-mapped pages are flushed. If the program is simply stopped, there is no problem, since the operating system will flush the entire memory-mapped file to disk. The problem comes if there is a power cut: in that case, there are no guarantees about what state the file is in. Our "transaction log" doesn't help either, unless we always flush the transaction log to disk before modifying the cache. That would defeat the purpose of our architecture, since it would introduce unacceptable performance penalties.
If we could somehow know that the OS didn't manage to flush all pages of our memory-mapped file before the previous shutdown, we could just throw the entire file away at the next startup. There would be a delay, but it would be totally acceptable since it would only occur after a power cut or a similar event.
When the operating system boots, it knows that the file is possibly corrupt, because it knows the filesystem was not cleanly unmounted.
And finally, my question:
Is there some way to ask Windows if the file system was clean when it was mounted?
NTFS periodically commits its own logs, so there is a window in which a power failure could occur and NTFS would still (correctly) report the volume as clean; "clean" here refers to NTFS metadata, not your user data.
You will likely have to do what databases do, which is to lock your cache into physical memory so that you can control the writes to disk.
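That said, the dirty bit is queryable. A minimal sketch, assuming the C: volume (opening a volume handle typically requires administrator rights); this is the same flag that fsutil dirty query reports:

```cpp
// Ask NTFS whether the volume's dirty bit is set. Remember the caveat
// above: this reflects NTFS metadata consistency, not whether your own
// memory-mapped data was flushed.
#include <windows.h>
#include <winioctl.h>
#include <cstdio>

int main() {
    HANDLE vol = CreateFileW(L"\\\\.\\C:", GENERIC_READ,
                             FILE_SHARE_READ | FILE_SHARE_WRITE,
                             nullptr, OPEN_EXISTING, 0, nullptr);
    if (vol == INVALID_HANDLE_VALUE) return 1;  // usually needs admin rights

    ULONG flags = 0;
    DWORD bytes = 0;
    if (DeviceIoControl(vol, FSCTL_IS_VOLUME_DIRTY, nullptr, 0,
                        &flags, sizeof(flags), &bytes, nullptr)) {
        std::printf("volume dirty: %s\n",
                    (flags & VOLUME_IS_DIRTY) ? "yes" : "no");
    }
    CloseHandle(vol);
}
```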
I read the following article:
http://surbhistechmusings.blogspot.com/2016/05/neo4j-performance-tuning.html
In it, I came across this statement:
We loaded the files into OS cache
It is about loading a file (on a local disk) into the OS cache.
Could you explain how such loading is done?
Also, is this cache in memory?
And why does it help performance?
Most of the time, this is done simply by reading the files.
Modern OSes fill the unused parts of your RAM with filesystem caches. This happens when applications open files: the first time, the data is read from disk, but from then on it is (transparently) read from memory. This is why an app you've just closed seems to start much faster the next time you launch it, provided you did not shut down your computer between the two launches.
Many programs rely on this mechanism, such as Kafka, PostgreSQL, and so on.
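A minimal sketch of the "just read the files" approach on Windows (the path and chunk size are placeholders):

```cpp
// Warm the OS file cache by reading a file once with ordinary buffered I/O.
// The bytes themselves are discarded; only the caching side effect matters.
#include <windows.h>
#include <vector>

void warm_cache(const wchar_t* path) {
    HANDLE f = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                           OPEN_EXISTING,
                           FILE_FLAG_SEQUENTIAL_SCAN,  // hint for read-ahead
                           nullptr);
    if (f == INVALID_HANDLE_VALUE) return;

    std::vector<char> buf(1 << 20);   // read in 1 MiB chunks
    DWORD got = 0;
    while (ReadFile(f, buf.data(), static_cast<DWORD>(buf.size()), &got,
                    nullptr) && got != 0) {
        // nothing to do: each ReadFile pulls pages into the system cache
    }
    CloseHandle(f);
}
```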
I am the author of that blog. This is what I used: https://hoytech.com/vmtouch/
There is extensive literature around this that you can read up on.
The idea behind doing it is that almost every lookup is a random disk read, which is very slow; if the file is in the OS cache, however, lookups become much faster.
If, while testing on a computer without a debugger, say a client's computer, I encounter a bug that may have corrupted the state of the program but not actually crashed it, I know I can take a memory dump using the Windows Task Manager (right-click on the process name, then "Create dump file").
I can use these dumps with WinDbg to peek around in memory, etc., but what would be most useful to me is to restore the dump into memory so that I can continue interacting with the program. Is this possible? If so, how? Is there a tool that can restore it, or do I need to write my own?
The typical usermode dumps or minidumps do not contain enough information to do so. While they contain all usermode memory, they do not contain kernel memory, so open handles to kernel resources like files or network sockets will not be included in the dump (and even if they were, the hard disk has most likely changed, so just trying to write to it might corrupt your system even more).
The only way I see to restore a memory dump is to restore the full memory and all other state, such as hard disk state, which can be done with most virtual machine software (this will, however, disconnect all your network connections on restore; fortunately, most programs handle lost network connections better than lost file handles).
I discovered that I could do this with Hyper-V snapshots. If I run my program in a virtual machine, I can optionally dump the memory, create a snapshot, transfer the dump if necessary, come back some time later, restore the snapshot and continue the program.
I am currently on a mission to load files into the pagecache, and I want to load locked files, too. The goal is nothing more than proactively keeping a dataset in RAM, reducing loading times within third-party applications.
Shadow copies were my first thought on this, but unfortunately they seem to have separate pagecaches.
So is there any way of cheating around the exclusive-lock mechanism, like fetching the file's fragment locations on disk and then reading directly from the raw disk (which I fear goes through yet another separate pagecache anyway)?
Or is there a very different approach to directing the pagecache, e.g. some Windows API that can be told to load a specific file into the pagecache?
You can access locked files in Windows from a kernel-mode driver, or using our RawDisk product. But for your task (speeding up DB file access) this won't work well, as the size of Windows' filesystem cache is limited (it won't accommodate gigabytes of data).
In general, if I were to develop this as a large software project (for a small application the amount of work needed is just enormous), I'd do the following: create a virtual drive backed by in-memory storage, present the DB file to the application via that virtual disk, and flush the drive contents to the physical disk asynchronously on change. All of this should be done in kernel mode (and this is where development time grows to 12-15 man-months of work).
In theory the same can be done using one of our Virtual Storage products, but going back into user mode for callback handling would eliminate all that you gain from moving the data into RAM.
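On the narrower question of an API for directing the pagecache: Windows 8 and later expose PrefetchVirtualMemory, which can pull a mapped file into memory, though it cannot get past an exclusive lock, because the file still has to be opened and mapped. A minimal sketch (error handling trimmed, path is a placeholder):

```cpp
// Requires Windows 8+ (_WIN32_WINNT >= 0x0602). Maps the file read-only
// and asks the memory manager to prefetch the whole view.
#include <windows.h>

bool prefetch_file(const wchar_t* path) {
    HANDLE f = CreateFileW(path, GENERIC_READ,
                           FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                           nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (f == INVALID_HANDLE_VALUE) return false;  // locked or missing

    LARGE_INTEGER size{};
    GetFileSizeEx(f, &size);
    HANDLE map = CreateFileMappingW(f, nullptr, PAGE_READONLY, 0, 0, nullptr);
    void* view = MapViewOfFile(map, FILE_MAP_READ, 0, 0, 0);

    WIN32_MEMORY_RANGE_ENTRY range{ view, static_cast<SIZE_T>(size.QuadPart) };
    BOOL ok = PrefetchVirtualMemory(GetCurrentProcess(), 1, &range, 0);

    UnmapViewOfFile(view);
    CloseHandle(map);
    CloseHandle(f);
    return ok != FALSE;
}
```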
Possible Duplicate:
How to invalidate the file system cache?
I'm writing a disk intensive win32 program. The first time it runs, it runs a lot slower while it scans the user's folders using FindFirstFile()/FindNextFile().
How can I repeat this first time performance without rebooting? Is there any way to force the system to discard everything in its disk cache?
I know that if I were reading a single file, I can disable caching by passing the FILE_FLAG_NO_BUFFERING flag to a call to CreateFile(). But it doesn't seem possible to do this when searching for files.
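For reference, the single-file case I mentioned looks roughly like this (a minimal sketch; FILE_FLAG_NO_BUFFERING imposes sector-alignment requirements on the buffer, offsets, and read lengths):

```cpp
// Read a file while bypassing the system cache entirely.
#include <windows.h>

void read_unbuffered(const wchar_t* path) {
    HANDLE f = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                           OPEN_EXISTING,
                           FILE_FLAG_NO_BUFFERING,   // bypass the system cache
                           nullptr);
    if (f == INVALID_HANDLE_VALUE) return;

    // The buffer must be aligned to the volume's sector size; VirtualAlloc
    // returns page-aligned memory, which satisfies that on typical volumes.
    void* buf = VirtualAlloc(nullptr, 64 * 1024,
                             MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    DWORD got = 0;
    // Read lengths must also be sector-multiples; 64 KiB qualifies.
    while (ReadFile(f, buf, 64 * 1024, &got, nullptr) && got != 0) {}
    VirtualFree(buf, 0, MEM_RELEASE);
    CloseHandle(f);
}
```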
Have you thought about doing it on a different volume, and dismounting / remounting the volume? That will cause the vast majority of everything to be re-read from disk (though the cache down there won't care).
You need to create enough memory pressure to cause the memory manager and cache manager to discard the previously cached results. For the cache manager, you could try to open a large (i.e., bigger than physical RAM) file with caching enabled and then read it backwards (to avoid any sequential I/O optimizations). The interactions between the VM and cache manager are a little more complex and much more dependent on the OS version.
There are also caches on the controller (possibly, but unlikely) and on the disk drive itself (likely). There are specific IOCTLs to flush these caches, but in my experience, disk firmware is untested in this arena.
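A minimal sketch of that approach, assuming a suitably huge file is available (the path and chunk size are placeholders):

```cpp
// Create cache pressure: read a file larger than physical RAM backwards,
// with caching enabled, to push previously cached data out.
#include <windows.h>
#include <vector>

void evict_by_pressure(const wchar_t* hugeFilePath) {
    HANDLE f = CreateFileW(hugeFilePath, GENERIC_READ, FILE_SHARE_READ,
                           nullptr, OPEN_EXISTING,
                           FILE_FLAG_RANDOM_ACCESS,  // discourage read-ahead
                           nullptr);
    if (f == INVALID_HANDLE_VALUE) return;

    LARGE_INTEGER size{};
    GetFileSizeEx(f, &size);

    const DWORD chunk = 1 << 20;          // 1 MiB per read
    std::vector<char> buf(chunk);
    // Walk from the tail toward the head to defeat sequential-I/O detection.
    for (LONGLONG pos = size.QuadPart - chunk; pos >= 0; pos -= chunk) {
        LARGE_INTEGER off{};
        off.QuadPart = pos;
        SetFilePointerEx(f, off, nullptr, FILE_BEGIN);
        DWORD got = 0;
        ReadFile(f, buf.data(), chunk, &got, nullptr);
    }
    CloseHandle(f);
}
```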
Check out the Clear function of CacheSet by SysInternals.
You could avoid a physical reboot by using a virtual machine.
I tried all the methods in the answers, including CacheSet, but they would not work for FindFirstFile()/FindNextFile(). Here is what worked:
Scanning files over the network. When scanning a shared drive, it seems that Windows does not cache the folders, so it is slow every time.
The simplest way to make any algorithm slower is to insert calls to Sleep(). This can reveal lots of problems in multi-threaded code, and that is what I was really trying to do.