When developing a Windows application for x64, then the user address space on Windows Vista and Windows 7 x64 is 8TB.
Let's say that I have an application which consumes significantly less than available physical memory (500MB-1GB) in normal working set, and in addition, I have much more than that (let's say 3GB-4GB) in distinct chunks (much smaller than the remaining memory size- let's say 100MB) which are to be loaded exclusively. Of course, whilst technically, I could easily fit an extra 4GB in the address space, the reality is that most of it would have to be paged, except on the higher-end computers which have 6-8GB of RAM.
The question is, am I going to destroy the computer's performance by exhausting the page file by consuming very large amounts of page memory for a single application? Or, equivalently, what would be an appropriate maximum for the amount of memory that I can put into page file?
In addition, would this actually increase my performance on the higher end of machines, as opposed to just loading the data manually from the associated files at the appropriate time?
If your distinct chunks are already stored as files, then treat them as memory mapped files. By doing so, your application doesn't have to manage reading / writing the data. Furthermore (and germane to your question), the data is backed by your files on disk and not the system's page file.
It is up to the operating system to manage the system's resources. Current page files are usually only constrained by available disk space. The OS will manage the system's performance by balancing the physical allocations given to each process. Physical memory allocation and not page file use is more likely to cause performance issues in other running applications.
You may consider providing a setting to toggle how much memory your application will use in case any customers see adverse performance effects.
Related
I am intending to write a program to create huge relational networks out of unstructured data - the exact implementation is irrelevant but imagine a GPT-3-style large language model. Training such a model would require potentially 100+ gigabytes of available random access memory as links get reinforced between new and existing nodes in the graph. Only a small portion of the entire model would likely be loaded at any given time, but potentially any region of memory may be accessed randomly.
I do not have a machine with 512 Gb of physical RAM. However, I do have one with a 512 Gb NVMe SSD that I can dedicate for the purpose. I see two potential options for making this program work without specialized hardware:
I can write my own memory manager that would swap pages between "hot" resident memory and "cold" on the hard disk, probably using memory-mapped files or some similar construct. This would require me coding all memory accesses in the modeling program to use this custom memory manager, and coding the page cache and concurrent access handlers and all of the other low-level stuff that comes along with it, which would take days and very likely introduce bugs. Also performance would likely be poor. Or,
I can configure the operating system to use the entire SSD as a page file / SWAP filesystem, and then just have the program reserve as much virtual memory as it needs - the same as any other normal program, relying on the kernel's memory manager which is already doing the page mapping + swapping + caching for me.
The problem I foresee with #2 is making the operating system understand what I am trying to do in a "cooperative" way. Ideally I would like to hint to the OS that I would only like a specific fraction of resident memory and swap the rest, to keep overall system RAM usage below 90% or so. Otherwise the OS will allocate 99% of physical RAM and then start aggressively compacting and cutting down memory from other background programs, which ends up making the whole system unresponsive. Linux apparently just starts sacrificing entire processes if it gets too bad.
Does there exist a kernel command in any language or operating system that would let me tell the OS to chill out and proactively swap user memory to disk? I have looked through VMM functions in kernel32.dll and the Linux paging and swap daemon (kswapd) documentation, but nothing looks like what I need. Perhaps some way to reserve, say, 1Gb of pages and then "donate" them back to the kernel to make sure they get used for processes that aren't my own? Some way to configure memory pressure or limits or make kswapd work more aggressively for just my process?
Suppose I open notepad ( not necessarily ) and write a text file of 6 GB ( again, suppose) . I have no running processes other than notepad itself, and the memory assigned to the user processes has a limit of less than 6 GB. My disc memory is sufficient though.
What happens to the file now? I know that writing is definitely possible and virtual memory may get involved , but I am not sure how. Does virtual memory actually get involved? Either way, can you please explain what happens from an OS point of view?
Thanks
From the memory point of view, the notepad allocates a 6Gb buffer in memory to store the text you're seeing. The process consists of data segment (which includes the buffer above, but not only) and a code segment (the notepad native code), so the total process space is going to be larger than 6Gb.
Now, it's all virtual memory as far the process is concerned (yes, it is involved). If I understand your case right, the process won't fit into physical memory, so it's going to crash due to insufficient memory.
the memory assigned to the user processes has a limit of less than 6 GB.
If this is a hard limit enforced by the operating system, it may at it's own discretion kill the process with some error message. It may also do anything else it wants, depending on it's implementation. This part of the answer disregards virtual memory or any other way of swapping RAM to disk.
My disc memory is sufficient though. What happens to the file now? I know that writing is definitely possible and virtual memory may get involved , but I am not sure how.
At this point, when your question starts involving the disk, we can start talking about virtual memory and swapping. If virtual memory is involved and the 6GB limit is in RAM usage, not total virtual memory usage, parts of the file can be moved to disk. This could be parts of the file currently out of view on screen or similar. The OS then manages what parts of the (more than 6GB of) data is available in RAM, and swaps in/out data depending on what the program needs (i.e. where in the file you are working).
Does virtual memory actually get involved?
Depends on weather it is enabled in the OS in question or not, and how it is configured.
Yes a lot of this depends on the OS in question and it's implementation and how it handles cases like this. If the OS is poorly written it may crash itself.
Before I give you a precise answer, let me explain few things.
I could suggest you to open Linux System Monitor or Windows Task Manager, and then open heavy softwares like a game, Android Studio, IntelliJ e.t.c
Go to the memory visualization tap. You will notice that each of the applications( processes) consume a certain amount of memory. Okey fine!
Some machines if not most, support memory virtualization.
It's a concept of allocating a certain amount of the hard disk space as a back up plan just in case some application( process) consumes a lot of memory and if it is not active at times, then it gets moved from the main memory to the virtual memory to create a priority for other tasks that are currently busy.
A virtual memory is slower that the main memory as it is located in the hard disk. However is it still faster than fetching data directly from the disk.
When launching any application, not all its application size will be loaded to memory, but only the necessary files that are required at that particular time will be loaded to memory. Thus why you can play a 60GB game in a machine that has a 4GB RAM.
To answer your question:
If you happen to launch a software that consumes all the memory resources of your machine, your machine will freeze. You will even hear the sounds made by its cooling system. It will be louder and faster.
I hope I clarified this well
I have a bunch of big files, each file can be over 100GB, the total amount of data can be 1TB and they are all read-only files (just have random reads).
My program does small reads in these files on a computer with about 8GB main memory.
In order to increase performance (no seek() and no buffer copying) i thought about using memory mapping, and basically memory-map the whole 1TB of data.
Although it sounds crazy at first, as main memory << disk, with an insight on how virtual memory works you should see that on 64bit machines there should not be problems.
All the pages read from disk to answer to my read()s will be considered "clean" from the OS, as these pages are never overwritten. This means that all these pages can go directly to the list of pages that can be used by the OS without writing back to disk OR swapping (wash them). This means that the operating system could actually store in physical memory just the LRU pages and would operate just reads() when the page is not in main memory.
This would mean no swapping and no increase in i/o because of the huge memory mapping.
This is theory; what I'm looking for is any of you who has every tried or used such an approach for real in production and can share his experience: are there any practical issues with this strategy?
What you are describing is correct. With a 64-bit OS you can map 1TB of address space to a file and let the OS manage reading and writing to the file.
You didn't mention what CPU architecture you are on but most of them (including amd64) the CPU maintains a bit in each page table entry as to whether data in the page has been written to. The OS can indeed use that flag to avoid writing pages that haven't been modified back to disk.
There would be no increase in IO just because the mapping is large. The amount of data you actually access would determine that. Most OSes, including Linux and Windows, have a unified page cache model in which cached blocks use the same physical pages of memory as memory mapped pages. I wouldn't expect the OS to use more memory with memory mapping than with cached IO. You're just getting direct access to the cached pages.
One concern you may have is with flushing modified data to disk. I'm not sure what the policy is on your OS specifically but the time between modifying a page and when the OS will actually write that data to disk may be a lot longer than your expecting. Use a flush API to force the data to be written to disk if it's important to have it written by a certain time.
I haven't used file mappings quite that large in the past but I would expect it to work well and at the very least be worth trying.
I have a 2GB RAM and running a memory intensive application and going to low available physical memory state and system is not responding to user actions, like opening any application or menu invocation etc.
How do I trigger or tell the system to swap the memory to pagefile and free physical memory?
I'm using Windows XP.
If I run the same application on 4GB RAM machine it is not the case, system response is good. After getting choked of available physical memory system automatically swaps to pagefile and free physical memory, not that bad as 2GB system.
To overcome this problem (on 2GB machine) attempted to use memory mapped files for large dataset which are allocated by application. In this case virtual memory of the application(process) is fine but system cache is high and same problem as above that physical memory is less.
Even though memory mapped file is not mapped to process virtual memory system cache is high. why???!!! :(
Any help is appreciated.
Thanks.
If your data access pattern for using the memory mapped file is sequential, you might get slightly better page recycling by specifying the FILE_FLAG_SEQUENTIAL_SCAN flag when opening the underlying file. If your data pattern accesses the mapped file in random order, this won't help.
You should consider decreasing the size of your map view. That's where all the memory is actually consumed and cached. Since it appears that you need to handle files that are larger than available contiguous free physical memory, you can probably do a better job of memory management than the virtual memory page swapper since you know more about how you're using the memory than the virtual memory manager does. If at all possible, try to adjust your design so that you can operate on portions of the large file using a smaller view.
Even if you can't get rid of the need for full random access across the entire range of the underlying file, it might still be beneficial to tear down and recreate the view as needed to move the view to the section of the file that the next operation needs to access. If your data access patterns tend to cluster around parts of the file before moving on, then you won't need to move the view as often. You'll take a hit to tear down and recreate the view object, but since tearing down the view also releases all the cached pages associated with the view, it seems likely you'd see a net gain in performance because the smaller view significantly reduces memory pressure and page swapping system wide. Try setting the size of the view based on a portion of the installed system RAM and move the view around as needed by your file processing. The larger the view, the less you'll need to move it around, but the more RAM it will consume potentially impacting system responsiveness.
As I think you are hinting in your post, the slow response time is probably at least partially due to delays in the system while the OS writes the contents of memory to the pagefile to make room for other processes in physical memory.
The obvious solution (and possibly not practical) is to use less memory in your application. I'll assume that is not an option or at least not a simple option. The alternative is to try to proactively flush data to disk to continually keep available physical memory for other applications to run. You can find the total memory on the machine with GlobalMemoryStatusEx. And GetProcessMemoryInfo will return current information about your own application's memory usage. Since you say you are using a memory mapped file, you may need to account for that in addition. For example, I believe the PageFileUsage information returned from that API will not include information about your own memory mapped file.
If your application is monitoring the usage, you may be able to use FlushViewOfFile to proactively force data to disk from memory. There is also an API (EmptyWorkingSet) that I think attempts to write as many dirty pages to disk as possible, but that seems like it would very likely hurt performance of your own application significantly. Although, it could be useful in a situation where you know your application is going into some kind of idle state.
And, finally, one other API that might be useful is SetProcessWorkingSetSizeEx. You might consider using this API to give a hint on an upper limit for your application's working set size. This might help preserve more memory for other applications.
Edit: This is another obvious statement, but I forgot to mention it earlier. It also may not be practical for you, but it sounds like one of the best things you might do considering that you are running into 32-bit limitations is to build your application as 64-bit and run it on a 64-bit OS (and throw a little bit more memory at the machine).
Well, it sounds like your program needs more than 2GB of working set.
Modern operating systems are designed to use most of the RAM for something at all times, only keeping a fairly small amount free so that it can be immediately handed out to processes that need more. The rest is used to hold memory pages and cached disk blocks that have been used recently; whatever hasn't been used recently is flushed back to disk to replenish the pool of free pages. In short, there isn't supposed to be much free physical memory.
The principle difference between using a normal memory allocation and memory mapped a files is where the data gets stored when it must be paged out of memory. It doesn't necessarily have any effect on when the memory will be paged out, and will have little effect on the time it takes to page it out.
The real problem you are seeing is probably not that you have too little free physical memory, but that the paging rate is too high.
My suggestion would be to attempt to reduce the amount of storage needed by your program, and see if you can increase the locality of reference to reduce the amount of paging needed.
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
Virtual memory was introduced to help run more programs with limited memory. But in todays environment of inexpensive RAM, is it still relevant?
Since there will be no disk access if it is disabled and all the programs will be memory resident, will it not improve the performance and program response times?
Is there any essential requirement for virtual memory in windows apart from running more programs as stated above? Something windows internal not known to us.
Some pedantry: virtual memory is not just the pagefile. The term encompasses a whole range of techniques that give the program the illusion that it has one single contiguous address space, some of which is the program's code, some of which is data, and some of which are DLLs or memory-mapped files.
So to your lead-in question: yes, virtual memory is required. It's what makes modern OS's work.
Don't disable virtual memory. 2GB is not nearly enough to even consider this. Regardless, you should always keep virtual memory on even if you do have enough since it will only ever be used when you actually need it. Much better to be safe than sorry since NOT having it active means you simply hit a wall, while having it active means your computer starts swapping to the hard drive but continues to run.
Yes, because it's the basis of all on-demand paging that occurs in a modern operating system, not just Windows.
Windows will always use all of your memory, if not for applications then for caching whatever you read from your hard drive. Because if that memory is not used, then you're throwing your investment in memory away. Basically, Windows uses your RAM as a big fat cache to your hard drives. And this happens all the time, as the relevant pages are only brought into main memory when you address the content of that page.
The question is really what is the use of a pagefile considering how much memory modern computers have and what's going on under the hood in the OS.
It's common for the Windows task manager to show not much physical memory being used, but, your having many page faults? Win32 will never allocate all it's physical memory. It always saves some for new resource needs. With a big pagefile vs small pagefile, Win32 will be slower to allocate physical memory to a process.
For a few days now I've been using a very small pagefile (200 MB fixed) in Vista with 3GB of addressable physical memory. I have had no crashes or problems. Haven't tried things like large video editing or many different processes open at once. I wouldn't recommend no pagefile since the OS can never shuffle pages around in physical memory leading to the development of holes. A large pagefile is fail-safe for people who wouldn't know how to manually increase the pagefile if a low memory warning pops up or the OS crashes.
Some points:
The kernel will use some of the physical memory and this will be shared through VM mapping with all other processes. Other processes will be in the remaining physical memory. VM makes each process see a 4GB mem space, the OS at the lower 2GB. Each process will need much less than the 4GB of physical memory, this amount is it's committed memory requirement. When programming, a malloc or new will reserve memory but not commit it. Things like the first write to the memory will commit it. Some memory is immedietely committed by the OS for each process.
Your question is really about using a page file, and not virtual memory, as kdgregory said. Probably the most important use for virtual memory is so that the OS can protect once process's memory from another processes memory, while still giving each process the illusion of a contiguous, flat virtual address space. The actual physical addresses can and will become fragmented, but the virtual addresses will appear contiguous.
Yes, virtual memory is vital. The page file, maybe not.
Grrr. Disk space is probably always going to be cheaper than RAM. One of my lab computers has 512MB of RAM. That used to be enough when I got it, but now it has slowed to a crawl swapping and I need to put more RAM in it. I am not running more software programs now than I was then, but they have all gotten more bloated, and they often spawn more "daemon" programs that just sit there doing nothing but wait for some event and use up memory. I look at my process list and the "in-memory" column for the file explorer is 40MB. For Firefox it's 162MB. Java's "update scheduler" jusched.exe uses another 3.6MB. And that's just the physical-memory, these numbers don't include the swap space.
So it's really important to save the quicker, more expensive memory for what can't be swapped out. I can spare tens of GB on my hard drive for swap space.
Memory is seen as cheap enough that the OS and many programs don't try to optimize any more. On the one hand it's great because it makes programs more maintainable and debuggable and quicker to develop. But I hate having to keep putting in more RAM into my computer.
A good explanation at
http://blogs.technet.com/markrussinovich/archive/2008/11/17/3155406.aspx
To optimally size your paging file you
should start all the applications you
run at the same time, load typical
data sets, and then note the commit
charge peak (or look at this value
after a period of time where you know
maximum load was attained). Set the
paging file minimum to be that value
minus the amount of RAM in your system
(if the value is negative, pick a
minimum size to permit the kind of
crash dump you are configured for). If
you want to have some breathing room
for potentially large commit demands,
set the maximum to double that number.
Virtual memory is much more than simply an extension of RAM. In reality, virtual memory is a system they virtualizes access to physical memory. Applications are presented with a consistent environment that is completely independent of RAM size. This offers a number of important advantages quite appart from the increased memory availabilty. Virtual memory is an integral part of the OS and cannot possibly be disabled.
The pagefile is NOT virtual memory. Many sources have claimed this, including some Microsoft articles. But it is wrong. You can disable the pagefile (not recommended) but this will not disable virtual memory.
Virtual mmeory has been used in large systems for some 40 years now and it is not going away anytime soon. The advantages are just too great. If virtual memory were eliminated all current 32 bit applications (and 64 bit as well) would become obsolete.
Larry Miller
Microsoft MCSA
Virtual memory is a safety net for situations when there is not enough RAM available for all running application. This was very common some time ago and today when you can have large amounts of system RAM it is less so.
Some say to leave page file alone and let it be managed by Windows. Some people say that even if you have large RAM keeping big pagefile cannot possibly hurt because it will not be used. That is not true since Windows does pre-emptive paging to prepare for spikes of memory demand. If that demand never comes this is just wasted HDD activity and we all know that HDD is the slowest component of any system. Pre-emptive paging with big enough RAM is just pointless and the only thing it does is to slow down any other disk activity that happens at the same time. Not to mention additional disk wear. Plus big page file means gigabytes of locked disk space.
A lot of people point to Mark Russinovich article to back up their strong belief that page file should not be disabled at any circumstances and so many clever people at Microsoft have thought it so thoroughly that we, little developers, should never question default Windows policy on page file size. But even Russinovich himself writes:
Set the paging file minimum to be that value (Peak Commit Charge) minus the amount of RAM in your system (if the value is negative, pick a minimum size to permit the kind of crash dump you are configured for).
So if you have large RAM amounts and your peek commit charge is never more than 50% of your RAM even when you open all your apps at once and then some, there is no need have page file at all. So in those situations 99.99% of time you will never need more memory than your RAM.
Now I am not advocating for disabling page file it but having it in size of your RAM or more is just waste of space and unnecessary activity that can slow down something else. Page file gives you a safety net in those rare (with plenty of RAM) situations when system does need more memory and to prevent it from getting out of memory which will most likely make your system unstable and unusable.
The only real need for page file is kernel dumps. If you need full kernel dumps you need at least 400 MB of paging file. But if you are happy with mini dumps, minimum is 16 MB. So to have best of both worlds which is
virtually no page file
safety net of virtual memory
I would suggest to configure Windows for mini kernel dumps, set minimum page file size to 16 MB and maximum to whatever you want. This way page file would be practically unused but would automatically expand after first out of memory error to prevent your system from being unusable. If you happen to have at least one out of memory issue you should of course reconsider your minimum size. If you really want to be safe make page file min. size 1 GB. For servers though you should be more careful.
Unfortunately, it is still needed because the windows operating system has a tendency to 'overcache'.
Plus, as stated above, 2GB isn't really enough to consider turning it off. Heck, I probably wouldn't turn it off until I had 8GB or more.
G-Man
Since there will be no disk access if it is disabled and all the programs will be memory resident, will it not improve the performance and program response times?
I'm not totally sure about other platforms, but I had a Linux machine where the swap-space had been accidently disabled. When a process used all available memory, the machine basically froze for 5 minutes, the load average went to absurd numbers and the kernel OOM killer kicked in and terminated several processes. Reenabling swap fixed this entirely.
I never experienced any unnecessary swapping to disc - it only happened when I used all the available memory. Modern OS's (even 5-10 year old Linux distros) deal with swap-space quite intelligently, and only use it when required.
You can probably get by without swap space, since it's quite rare to reach 4GB of memory usage with a single process. With a 64-bit OS and say 8GB of RAM it's even more rare.. but, there's really no point disabling swap-space, you don't gain much (if anything), and when you run out of physical memory without it, bad things happen..
Basically - any half-decent OS should only use disc-swap (or virtual-memory) when required. Disabling swap only stops the OS being able to fall back on it, which causes the OOM killer to strike (and thus data-loss when processes are terminated).