Performance Counter for Memory Mapped Files - memory-management

When using memory-mapped files I get into situations where Windows stalls, because new memory is allocated and filled faster than it can be written back to disk through the memory-mapped files.
The only solution I see is to throttle my processing while the MiMappedPageWriter and the KeBalanceSetManager are doing their jobs. I would be completely fine with the application running slower instead of a complete OS freeze.
It already helped to use SetProcessWorkingSetSizeEx with a hard limit, because the MiMappedPageWriter then starts paging out to disk earlier, but on some drives the data is still allocated faster than it can be written. For example, an SSD at 250 MB/s does not keep up, while one at 500 MB/s fares better. But I have to support a wide range of hardware and cannot rely on fast drives.
I found that there once was a performance counter, for example Memory\Mapped File Bytes Written/sec, that I could use to monitor this periodically (see: https://docs.microsoft.com/en-us/windows-server/management/windows-performance-monitor/memory-performance-counter-mapped-file-bytes-written-sec), but it seems that all the links are gone.
I have searched in many places, but couldn't find these performance counters.
Is there still a source for this?

Related

How to limit Hard Drive Disk I/O when reading/writing a file on disk?

I have a few Rust programs that read data from a file, do some operations, and write data on another file.
Simple enough, but I've been having a big issue in that my programs saturate the HDD's maximum I/O and can only be executed when no other process is using the disk.
To be more precise, I'm currently using BufReader and BufWriter with a buffer size of 64 KB, which is fantastic in and of itself for reading/writing a file as quickly as possible. But reading at 250 MB/s while simultaneously writing at 250 MB/s tends to overwhelm what the HDD can manage. Suffice it to say that I'm all for speed, but I realized that these Rust programs demand too many resources from the HDD and seem to be stalled by the operating system (Windows) to let other processes work in peace. The files I'm reading/writing are generally a few gigabytes.
Now I know I could just add some form of wait() between each read/write operation on the disk, but I don't know how to find out at what speed I'm currently reading/writing, and I'm looking for a more optimal solution. Also, even after reading the docs, I still can't find an option on BufReader/BufWriter that could limit HDD I/O operations to some arbitrary value (let's say 100 MB/s, for example).
I looked through the sysinfo crate but it does not seem to help in finding out current and maximum I/O for the HDD.
Am I out of luck, and should I delve deeper into systems programming to find a solution? Or is there already something that could teach me how to prioritize my calls to the HDD, or simply limit them to some arbitrary value calculated from the currently available I/O rate of the HDD?
After reading a bit more on the subject: apart from reading/writing a lot of data and calculating from its performance, it seems that you can't find out the HDD's maximum I/O rate during the execution of the program, and can only guess a constant that the HDD's I/O rate won't exceed. (See https://superuser.com/questions/795483/how-to-limit-hdd-write-speed-for-chosen-programs/795488#795488.)
But you can still monitor disk activity, and with the number guessed earlier you can use wait() more accurately than by always limiting yourself to a constant speed. (Here is a crate for Rust: https://github.com/myfreeweb/systemstat.)
Prioritizing the process with the OS might be overkill, since I'm trying to slip in between other processes and share whatever resources are available at the time.
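In the meantime, a crude but effective workaround is to time each chunk and sleep whenever the effective rate exceeds a chosen budget, with no need to know the drive's real maximum. A minimal sketch (the `throttled_copy` helper and the rate constant are my own illustration, not an existing API):

```rust
use std::io::{self, Read, Write};
use std::time::{Duration, Instant};

/// Copy `reader` to `writer` in 64 KB chunks, sleeping as needed so the
/// average throughput stays at or below `limit_bytes_per_sec`.
fn throttled_copy<R: Read, W: Write>(
    reader: &mut R,
    writer: &mut W,
    limit_bytes_per_sec: u64,
) -> io::Result<u64> {
    let mut buf = [0u8; 64 * 1024];
    let mut total: u64 = 0;
    let start = Instant::now();
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break;
        }
        writer.write_all(&buf[..n])?;
        total += n as u64;
        // How long this much data *should* have taken at the chosen rate;
        // if we are ahead of schedule, sleep off the difference.
        let target = Duration::from_secs_f64(total as f64 / limit_bytes_per_sec as f64);
        let elapsed = start.elapsed();
        if target > elapsed {
            std::thread::sleep(target - elapsed);
        }
    }
    writer.flush()?;
    Ok(total)
}
```

You can wrap the file handles in BufReader/BufWriter as before; the throttling sits above the buffering layer, so the 64 KB buffered reads/writes are unchanged.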

Pre-warm disk cache

After some theoretical discussion today I decided to do some research, but I did not find anything conclusive.
Here's the problem:
We have written a tool that reads around 10 GB of image files from a data set of several terabytes. We want to speed up the execution time by minimizing I/O overhead. The idea would be to "pre-warm" the disk cache, as we know beforehand what directory we will be reading from as the tool executes. Is there any API or method to give this hint to Windows so that it can start pre-warming the disk cache, speeding up future disk accesses because the files are already in RAM (of which there is plenty on the machines we run the tool on)?
I know Windows does readahead on a single file, but what if I have a directory with thousands of files?
I haven't found any direct win32 APIs or command line tools to do this directly.
What if I start a low priority background thread, opening all the files for reading and closing them?
I could of course memory map all the files and pin them in RAM, but that would probably run the risk of starving the main worker thread of I/O.
The general idea here is that the tool "bursts" I/O requests, as each thread will do I/O and CPU processing in sequence, hence we could use the "idle" I/O time to preload the remaining files into RAM.
(I could of course benchmark, and I will, but I would like to understand a bit more of how this works in order to be more scientific and less cargo culty).
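The background-reader idea from the question can be sketched with nothing more than the standard library: a helper thread walks the directory and reads every file once, discarding the bytes, so the reads populate the OS file cache as a side effect. (`prewarm_dir` and `prewarm_in_background` are hypothetical names of my own; setting the thread's priority and the actual caching behaviour are OS-specific and not handled here.)

```rust
use std::fs;
use std::io::{self, Read};
use std::path::{Path, PathBuf};
use std::thread;

/// Read every regular file in `dir` once and discard the contents.
/// The reads pull the file data through the OS page cache as a side effect.
fn prewarm_dir(dir: &Path) -> io::Result<u64> {
    let mut total = 0u64;
    let mut sink = vec![0u8; 1 << 20]; // 1 MB scratch buffer, contents ignored
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_file() {
            let mut f = fs::File::open(&path)?;
            loop {
                let n = f.read(&mut sink)?;
                if n == 0 {
                    break;
                }
                total += n as u64;
            }
        }
    }
    Ok(total)
}

/// Kick off pre-warming on a background thread so the main worker
/// can start processing immediately.
fn prewarm_in_background(dir: PathBuf) -> thread::JoinHandle<io::Result<u64>> {
    thread::spawn(move || prewarm_dir(&dir))
}
```

Whether this actually helps depends on the cache-eviction policy and on how much the extra reads compete with the worker thread's own I/O, which is exactly what the benchmark should measure.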

What happens when I try to write a file (say, text) that consumes more memory than the RAM?

Suppose I open Notepad (not necessarily) and write a text file of 6 GB (again, suppose). I have no running processes other than Notepad itself, and the memory assigned to user processes has a limit of less than 6 GB. My disk space is sufficient, though.
What happens to the file now? I know that writing is definitely possible and virtual memory may get involved, but I am not sure how. Does virtual memory actually get involved? Either way, can you please explain what happens from an OS point of view?
Thanks
From the memory point of view, Notepad allocates a 6 GB buffer in memory to store the text you're seeing. The process consists of a data segment (which includes that buffer, among other things) and a code segment (the Notepad native code), so the total process space is going to be larger than 6 GB.
Now, it's all virtual memory as far as the process is concerned (yes, it is involved). If I understand your case correctly, the process won't fit into physical memory, so it's going to crash due to insufficient memory.
the memory assigned to the user processes has a limit of less than 6 GB.
If this is a hard limit enforced by the operating system, it may at its own discretion kill the process with some error message. It may also do anything else it wants, depending on its implementation. This part of the answer disregards virtual memory or any other way of swapping RAM to disk.
My disc memory is sufficient though. What happens to the file now? I know that writing is definitely possible and virtual memory may get involved , but I am not sure how.
At this point, when your question starts involving the disk, we can start talking about virtual memory and swapping. If virtual memory is involved and the 6 GB limit is on RAM usage, not total virtual memory usage, parts of the file can be moved to disk. This could be the parts of the file currently out of view on screen, or similar. The OS then manages which parts of the (more than 6 GB of) data are available in RAM, and swaps data in and out depending on what the program needs (i.e. where in the file you are working).
Does virtual memory actually get involved?
Depends on whether it is enabled in the OS in question, and how it is configured.
Yes, a lot of this depends on the OS in question, its implementation, and how it handles cases like this. If the OS is poorly written it may even crash itself.
Before I give you a precise answer, let me explain a few things.
I suggest you open Linux System Monitor or Windows Task Manager, and then open heavy software like a game, Android Studio, IntelliJ, etc.
Go to the memory visualization tab. You will notice that each of the applications (processes) consumes a certain amount of memory. Okay, fine!
Some machines, if not most, support memory virtualization.
It's a concept of allocating a certain amount of hard disk space as a backup plan: if some application (process) consumes a lot of memory but is not active at times, it gets moved from main memory to virtual memory to give priority to other tasks that are currently busy.
Virtual memory is slower than main memory, as it lives on the hard disk; however, it is still faster than fetching data directly from the disk.
When launching an application, not all of it is loaded into memory; only the files required at that particular time are. That is why you can play a 60 GB game on a machine that has 4 GB of RAM.
To answer your question:
If you happen to launch software that consumes all the memory resources of your machine, your machine will freeze. You will even hear its cooling system getting louder and faster.
I hope I clarified this well.

What pitfalls should I be wary of when memory mapping BIG files?

I have a bunch of big files; each file can be over 100 GB, the total amount of data can be 1 TB, and they are all read-only (just random reads).
My program does small reads in these files on a computer with about 8GB main memory.
In order to increase performance (no seek() and no buffer copying) I thought about using memory mapping, basically memory-mapping the whole 1 TB of data.
Although it sounds crazy at first, as main memory << disk, with some insight into how virtual memory works you should see that on 64-bit machines there should be no problems.
All the pages read from disk to serve my read()s will be considered "clean" by the OS, as these pages are never overwritten. This means that all these pages can go directly onto the list of pages the OS can reuse, without being written back to disk or swapped out. So the operating system could keep just the LRU pages in physical memory and issue reads only when a page is not in main memory.
This would mean no swapping and no increase in i/o because of the huge memory mapping.
This is theory; what I'm looking for is any of you who has every tried or used such an approach for real in production and can share his experience: are there any practical issues with this strategy?
What you are describing is correct. With a 64-bit OS you can map 1TB of address space to a file and let the OS manage reading and writing to the file.
You didn't mention what CPU architecture you are on, but on most of them (including amd64) the CPU maintains a bit in each page table entry recording whether the page has been written to. The OS can indeed use that flag to avoid writing unmodified pages back to disk.
There would be no increase in IO just because the mapping is large. The amount of data you actually access would determine that. Most OSes, including Linux and Windows, have a unified page cache model in which cached blocks use the same physical pages of memory as memory mapped pages. I wouldn't expect the OS to use more memory with memory mapping than with cached IO. You're just getting direct access to the cached pages.
One concern you may have is with flushing modified data to disk. I'm not sure what the policy is on your OS specifically, but the time between modifying a page and when the OS actually writes that data to disk may be a lot longer than you're expecting. Use a flush API to force the data to be written to disk if it's important that it be written by a certain time.
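For ordinary (non-mapped) writes, Rust's standard library exposes exactly this kind of flush API: `flush` only empties user-space buffers, while `sync_all` asks the OS to push the page-cache contents (and metadata) to the device, roughly analogous to `FlushViewOfFile` followed by `FlushFileBuffers` on a mapping. A minimal sketch (`write_durably` is an illustrative helper, not a standard function):

```rust
use std::fs::File;
use std::io::{self, Write};

/// Write `data` to `path` and force it to stable storage before returning.
fn write_durably(path: &str, data: &[u8]) -> io::Result<()> {
    let mut f = File::create(path)?;
    f.write_all(data)?;
    f.flush()?;    // user-space buffers -> OS page cache
    f.sync_all()?; // OS page cache -> disk (blocks until the device reports completion)
    Ok(())
}
```

Use `sync_data` instead of `sync_all` if file metadata (such as the modification time) does not need to be durable, which can save an extra device write.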
I haven't used file mappings quite that large in the past but I would expect it to work well and at the very least be worth trying.

Memory mapped files causes low physical memory

I have 2 GB of RAM and am running a memory-intensive application; the system gets into a low-available-physical-memory state and stops responding to user actions, like opening an application or invoking a menu.
How do I trigger, or tell the system, to swap memory out to the pagefile and free up physical memory?
I'm using Windows XP.
If I run the same application on a 4 GB RAM machine, this is not the case; system response is good. Once available physical memory runs low, the system automatically swaps to the pagefile and frees physical memory, and it is not as bad as on the 2 GB system.
To overcome this problem (on the 2 GB machine) I attempted to use memory-mapped files for the large datasets allocated by the application. In this case the virtual memory of the application (process) is fine, but the system cache is high, with the same problem as above: physical memory is low.
Even though the memory-mapped file is not mapped into the process's virtual memory, the system cache is high. Why?! :(
Any help is appreciated.
Thanks.
If your data access pattern for using the memory mapped file is sequential, you might get slightly better page recycling by specifying the FILE_FLAG_SEQUENTIAL_SCAN flag when opening the underlying file. If your data pattern accesses the mapped file in random order, this won't help.
You should consider decreasing the size of your map view. That's where all the memory is actually consumed and cached. Since it appears that you need to handle files that are larger than available contiguous free physical memory, you can probably do a better job of memory management than the virtual memory page swapper since you know more about how you're using the memory than the virtual memory manager does. If at all possible, try to adjust your design so that you can operate on portions of the large file using a smaller view.
Even if you can't get rid of the need for full random access across the entire range of the underlying file, it might still be beneficial to tear down and recreate the view as needed to move the view to the section of the file that the next operation needs to access. If your data access patterns tend to cluster around parts of the file before moving on, then you won't need to move the view as often. You'll take a hit to tear down and recreate the view object, but since tearing down the view also releases all the cached pages associated with the view, it seems likely you'd see a net gain in performance because the smaller view significantly reduces memory pressure and page swapping system wide. Try setting the size of the view based on a portion of the installed system RAM and move the view around as needed by your file processing. The larger the view, the less you'll need to move it around, but the more RAM it will consume potentially impacting system responsiveness.
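For comparison, the same sliding-window idea can be expressed with plain seeks and reads instead of map views: read a bounded window of the file into a heap buffer, work on it, then drop it and read the next window, so memory consumption is capped at the window size. A sketch under those assumptions (`read_window` is an illustrative helper, and the window size would be tuned against installed RAM):

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};

/// Read up to `len` bytes starting at `offset`, clamped at end of file.
/// Stands in for tearing down the old view and mapping a new one at `offset`.
fn read_window(file: &mut File, offset: u64, len: usize) -> io::Result<Vec<u8>> {
    file.seek(SeekFrom::Start(offset))?;
    let mut buf = vec![0u8; len];
    let mut filled = 0;
    while filled < len {
        let n = file.read(&mut buf[filled..])?;
        if n == 0 {
            break; // hit end of file
        }
        filled += n;
    }
    buf.truncate(filled);
    Ok(buf)
}
```

Each time the access pattern moves on, call `read_window` with the new offset; dropping the old buffer releases its memory, much as tearing down a view releases its cached pages.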
As I think you are hinting in your post, the slow response time is probably at least partially due to delays in the system while the OS writes the contents of memory to the pagefile to make room for other processes in physical memory.
The obvious solution (and possibly not practical) is to use less memory in your application. I'll assume that is not an option or at least not a simple option. The alternative is to try to proactively flush data to disk to continually keep available physical memory for other applications to run. You can find the total memory on the machine with GlobalMemoryStatusEx. And GetProcessMemoryInfo will return current information about your own application's memory usage. Since you say you are using a memory mapped file, you may need to account for that in addition. For example, I believe the PageFileUsage information returned from that API will not include information about your own memory mapped file.
If your application is monitoring the usage, you may be able to use FlushViewOfFile to proactively force data to disk from memory. There is also an API (EmptyWorkingSet) that I think attempts to write as many dirty pages to disk as possible, but that seems like it would very likely hurt performance of your own application significantly. Although, it could be useful in a situation where you know your application is going into some kind of idle state.
And, finally, one other API that might be useful is SetProcessWorkingSetSizeEx. You might consider using this API to give a hint on an upper limit for your application's working set size. This might help preserve more memory for other applications.
Edit: This is another obvious statement, but I forgot to mention it earlier. It also may not be practical for you, but it sounds like one of the best things you might do considering that you are running into 32-bit limitations is to build your application as 64-bit and run it on a 64-bit OS (and throw a little bit more memory at the machine).
Well, it sounds like your program needs more than 2GB of working set.
Modern operating systems are designed to use most of the RAM for something at all times, only keeping a fairly small amount free so that it can be immediately handed out to processes that need more. The rest is used to hold memory pages and cached disk blocks that have been used recently; whatever hasn't been used recently is flushed back to disk to replenish the pool of free pages. In short, there isn't supposed to be much free physical memory.
The principal difference between using a normal memory allocation and memory-mapped files is where the data gets stored when it must be paged out of memory. It doesn't necessarily have any effect on when the memory will be paged out, and has little effect on the time it takes to page it out.
The real problem you are seeing is probably not that you have too little free physical memory, but that the paging rate is too high.
My suggestion would be to attempt to reduce the amount of storage needed by your program, and see if you can increase the locality of reference to reduce the amount of paging needed.

Resources