What is Working Set? - windows

I'm confused by the concept of the Working Set while reading the Memory Management code of the Windows Research Kernel.

The "working set" is short hand for "parts of memory that the current algorithm is using" and is determined by which parts of memory the CPU just happens to access. It is totally automatic to you. If you are processing an array and storing the results in a table, the array and the table are your working set.
This matters because the CPU automatically stores accessed memory in its cache, close to the processor. The working set is a convenient way to describe the memory you want kept there. If it is small enough, it can all fit in the cache and your algorithm will run very fast. At the OS level, the kernel has to tell the CPU where to find the physical memory your application is using (resolving virtual addresses) every time you access a new page (typically 4K in size), so you also want to avoid that hit as much as possible.
See What Every Programmer Should Know About Memory (PDF) for graphs of algorithm performance vs. working set size (around page 23) and lots of other interesting info.
Basically, write your code to access the smallest amount of memory possible (i.e. keep classes small and don't create too many of them), and try to ensure tight loops run on a very small subset of that memory.

The "working set" is an informal term meaning the memory that's being accessed "frequently" (for some definition of frequently) by an application or set of applications. Applications may also allocate memory that they access infrequently (no more than once every few dozen seconds, perhaps not even once an hour); this would be outside of the working set.
An example might be if you have two Firefox windows: a minimized one that you haven't looked at for several hours, and an open one that you're browsing in right now. The memory used to store the data associated with the open window is in the working set; the memory used to store the data associated with the window that's minimized and that you haven't looked at for several hours is not.
This is mainly used in discussions about whether you have enough RAM in your system. If your working set is smaller than your RAM, you can work comfortably, because the data your program or programs frequently access is always in memory. If your working set is larger than your RAM, the operating system will be constantly swapping pages out to disk to make room to swap in pages that an application wants to access; these swapped out pages, being in the working set, will almost immediately be needed again, meaning that you've got to take other pages and write them out to disk, and it just goes on and on like this. This is referred to as "thrashing."
If you're not reading or writing many files, your disk light is on all the time, and your system feels very slow, that's a pretty good sign that you're thrashing.

Roughly, the working set is the areas of memory in active use. http://en.wikipedia.org/wiki/Working_set

All the memory your program is using that is not in the working set is a candidate to be swapped to disk. When the operating system needs more memory for other work, it will try to keep the working set of each program in memory, but everything else is up for grabs.

The working set is the set of pages that are physically in memory at any one time. Although the working set is quoted and displayed in kilobytes, the smallest working set you can have is 4K (8K on Itanium), as that's the size of a page in Windows.
To see the working set of a process look at the "Mem Usage" column in the task manager's "Processes" tab.
If you're running a .NET app, you can watch the working set be reduced by looking at the process in the Task Manager's Processes tab and then minimizing the application. Its working set is dramatically reduced as Windows swaps it out to the page file (since the process is assumed not to be "working" as much).
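If you want to read the same number programmatically rather than from Task Manager, a minimal sketch (assuming a native C++ process; link against Psapi.lib) could use GetProcessMemoryInfo:

    // Sketch: query the current process's working set in bytes.
    #include <windows.h>
    #include <psapi.h>
    #include <cstdio>

    int main()
    {
        PROCESS_MEMORY_COUNTERS pmc = {};
        pmc.cb = sizeof(pmc);

        // WorkingSetSize is reported in bytes; Task Manager shows it in KB.
        if (GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc)))
        {
            printf("Working set: %zu KB (peak %zu KB)\n",
                   pmc.WorkingSetSize / 1024, pmc.PeakWorkingSetSize / 1024);
        }
        return 0;
    }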

The term "Working Set" is related to the Working Set Page Replacement Algorithm (the algorithm is very well explained in this article of Andrew Tanenbaum).
So the working set is the set of pages a process has demanded to be loaded into memory during execution. The working set consists of the most recently loaded page and the k pages loaded before it. When a frame must be freed for a newly demanded page, only pages that are not in the working set may be swapped out.
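As a rough illustration of the idea (a hypothetical C++ simulation, not code from Tanenbaum's article, using the time-window variant of the algorithm): keep the virtual time of each resident page's last reference, and treat anything older than a window tau as evictable.

    // Sketch: working-set-style replacement over a time window tau.
    #include <cstdint>
    #include <unordered_map>

    class WorkingSetSim {
    public:
        explicit WorkingSetSim(uint64_t tau) : tau_(tau) {}

        // Reference a page at the given virtual time; returns true on a page fault.
        bool touch(uint64_t page, uint64_t now) {
            auto it = resident_.find(page);
            if (it != resident_.end()) {           // hit: just refresh the last-use time
                it->second = now;
                return false;
            }
            // Fault: pages whose age exceeds tau are no longer in the working set,
            // so they are the ones that may be evicted.
            for (auto i = resident_.begin(); i != resident_.end(); ) {
                if (now - i->second > tau_) i = resident_.erase(i);
                else ++i;
            }
            resident_[page] = now;                 // load the demanded page
            return true;
        }

        size_t workingSetSize() const { return resident_.size(); }

    private:
        uint64_t tau_;
        std::unordered_map<uint64_t, uint64_t> resident_;  // page -> time of last use
    };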

Working set is a subset of virtual pages resident in physical memory.

Related

Flexibility of page file on Windows x64

When developing a Windows application for x64, the user address space on Windows Vista and Windows 7 x64 is 8TB.
Let's say that I have an application which consumes significantly less than the available physical memory (500MB-1GB) in its normal working set, and in addition, much more than that (say 3GB-4GB) in distinct chunks (each much smaller than the remaining memory, say 100MB) which are to be loaded exclusively. Of course, while technically I could easily fit an extra 4GB in the address space, in reality most of it would have to be paged, except on higher-end computers which have 6-8GB of RAM.
The question is, am I going to destroy the computer's performance by exhausting the page file, i.e. by consuming very large amounts of paged memory for a single application? Or, equivalently, what would be an appropriate maximum for the amount of memory that I can put into the page file?
In addition, would this actually increase my performance on the higher end of machines, as opposed to just loading the data manually from the associated files at the appropriate time?
If your distinct chunks are already stored as files, then treat them as memory mapped files. By doing so, your application doesn't have to manage reading / writing the data. Furthermore (and germane to your question), the data is backed by your files on disk and not the system's page file.
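A minimal sketch of that suggestion (the file name chunk.dat is a placeholder), using the usual CreateFile / CreateFileMapping / MapViewOfFile sequence:

    // Sketch: map one "chunk" file read-only; the view is backed by the file
    // itself, not by the system page file.
    #include <windows.h>
    #include <cstdio>

    int main()
    {
        HANDLE file = CreateFileA("chunk.dat", GENERIC_READ, FILE_SHARE_READ,
                                  nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
        if (file == INVALID_HANDLE_VALUE) return 1;

        HANDLE mapping = CreateFileMappingA(file, nullptr, PAGE_READONLY, 0, 0, nullptr);
        if (!mapping) { CloseHandle(file); return 1; }

        // Pages are faulted in on first access and can be discarded by the OS
        // without being written to the page file.
        const unsigned char* data =
            static_cast<const unsigned char*>(MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0));
        if (data)
        {
            printf("first byte: %u\n", data[0]);
            UnmapViewOfFile(data);
        }
        CloseHandle(mapping);
        CloseHandle(file);
        return 0;
    }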
It is up to the operating system to manage the system's resources. Current page files are usually only constrained by available disk space. The OS will manage the system's performance by balancing the physical allocations given to each process. Physical memory allocation and not page file use is more likely to cause performance issues in other running applications.
You may consider providing a setting to toggle how much memory your application will use in case any customers see adverse performance effects.

How much memory should a caching system use on Windows?

I'm developing a client/server application where the server holds large pieces of data, such as big images or video files, which are requested by the client, and I need to create an in-memory client caching system to hold a few of those large items to speed up the process. Just to be clear, each individual image or video is not that big, but the overall size of all of them can be really big.
But I'm faced with the "how much data should I cache" problem and was wondering if there are some kind of golden rules on Windows about what strategy I should adopt. The caching is done on the client, I do not need caching on the server.
Should I stay under x% of global memory usage at all times? And how much would that be? What will happen if another program is launched and takes up a lot of memory; should I empty the cache?
Should I query how much free memory is available prior to caching and use a fixed percentage of that memory for my needs?
I hope I do not have to go there, but should I ask the user how much memory he is willing to allocate to my application? If so, how can I calculate the default value for that property for those who will never use that setting?
Rather than create your own caching algorithm, why don't you write the data to a file with the FILE_ATTRIBUTE_TEMPORARY attribute and make use of the client machine's own cache?
Although this approach appears to imply that you use a file, if there is memory available in the system then the file will never leave the cache and will remain in memory the whole time.
Some advantages:
You don't need to write any code.
The system cache takes account of all the other processes running. It would not be practical for you to take that on yourself.
On 64 bit Windows the system can use all the memory available to it for the cache. In a 32 bit Delphi process you are limited to the 32 bit address space.
Even if your cache is full and your files get flushed to disk, local disk access is much faster than querying the database and then transmitting the files over the network.
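For illustration, a minimal sketch of what opening such a file could look like (the file name is made up; FILE_FLAG_DELETE_ON_CLOSE is an optional extra so the cache file cleans itself up):

    // Sketch: create a cache file the system cache manager prefers to keep in memory.
    #include <windows.h>

    HANDLE CreateCacheFile()
    {
        // FILE_ATTRIBUTE_TEMPORARY tells the cache manager to avoid writing the
        // data out unless memory pressure forces it to; DELETE_ON_CLOSE removes
        // the file automatically when the last handle is closed.
        return CreateFileA("cache_item.tmp",
                           GENERIC_READ | GENERIC_WRITE,
                           0, nullptr, CREATE_ALWAYS,
                           FILE_ATTRIBUTE_TEMPORARY | FILE_FLAG_DELETE_ON_CLOSE,
                           nullptr);
    }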
It depends on what other software runs on the server. I would make it possible to configure it manually at first. Develop a system that can use a specific amount of memory. If you can, build it so that you can change that value while it is running.
If you have those options, you can try some tweaking to see what works best. I don't know any golden rules, but I'd figure you should be able to set a percentage of total memory, or of total available memory, with a specific minimum amount of memory kept free for the system at all times. If you reserve a minimum of, say, 500 MB for the OS, you can use the rest, or 90% of the rest, for your cache. But those numbers depend on the version of the OS and the other applications running on the machine.
I think it's best to make the numbers configurable from the outside and create a management tool that lets you set the values manually first. Then, once you have found out what works best, you can derive formulas to calculate those values and integrate them into your management tool. This tool should not be an integral part of the cache program itself (which will probably be a service without a GUI anyway).
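As a sketch of that rule of thumb (the 500 MB reserve and the 90% figure are just the example numbers from above, not recommendations), a default could be computed along these lines:

    // Sketch: suggest a cache size as a percentage of RAM after an OS reserve.
    #include <windows.h>

    ULONGLONG SuggestedCacheBytes()
    {
        MEMORYSTATUSEX ms = {};
        ms.dwLength = sizeof(ms);
        if (!GlobalMemoryStatusEx(&ms)) return 0;

        const ULONGLONG reserveForOs = 500ull * 1024 * 1024;   // keep ~500 MB for the system
        if (ms.ullTotalPhys <= reserveForOs) return 0;

        // Use 90% of whatever is left after the OS reserve.
        return (ms.ullTotalPhys - reserveForOs) * 90 / 100;
    }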
Questions:
Can one image be requested by multiple clients? Or can one image be requested multiple times in a short interval?
How short is the interval?
Is the speed of the network really high? Higher than the speed of the hard drive? If you have a normal network, the hard drive will be able to read the files from disk and deliver them over the network in real time, especially since Windows already does some good caching, so the most recently used files are already in the cache.
Is the main purpose of the computer that is running the server app to run the server? Or is it just a normal computer also used for other tasks? In other words, is it a dedicated server or a normal workstation/desktop?
but should I ask the user how much memory he is willing to allocate to my application?
I would definitely go there!
If the user thinks the server application is not an important application, he will probably give it a low priority (a small cache). Otherwise, if he thinks it is the most important running app, he will allow it to allocate all the RAM it needs, to the detriment of other, less important applications.
Just deliver the application with that setting set by default to an acceptable value (something like x% of the total amount of RAM). I would use something like 70% of total RAM if the main purpose of the computer is to host this server application, and about 40-50% if it is a 'general use' computer.
A server application usually needs resources set aside for its own use by its administrator. I would not worry about other applications' behaviour; I would care about being a "polite" application, so the memory cache size and so on should be configurable by the administrator, who is the only one who knows how to configure his systems properly (usually...).
Default values should in any case take into account how much memory is available overall, especially on 32-bit systems with less than 4GB of memory (as long as Delphi delivers only 32-bit apps), to leave something free for the operating system and avoid too-frequent swapping. Asking the user to select it at setup is also advisable.
If the application is the only one running on a server, a value between 40% and 75% of available memory could be OK (depending on how much memory is needed beyond the cache), but again, ask the user because it's almost impossible to know what other running applications may need. You can also have a minimum cache size and a maximum cache size: start by allocating the lower value, then grow it when and if needed, and shrink it if necessary.
On a 32 bit system this is a kind of memory usage that could benefit from using PAE/AWE to access more than 3GB of memory.
Update: you can also monitor cache hits/misses and calculate which cache size would best fit the user's needs (it could be too small, but too large as well), and then advise the user about that.
To be honest, the questions you ask would not be my main concern. I would be more concerned with how effective my cache would be. If your files are really that big, how many can you hold in the cache? And if your client/server app has many users, what are the chances that your cache will actually hold something someone else will use?
It might be worth doing an analysis before you burn too much time on the fine details.

Why aren't locked pages counted into the working set size?

The purpose of the VirtualLock WinAPI call is to lock pages into the working set of a process. However, the WorkingSet64 API inexplicably doesn't count those pages.
Possibly as a result of this, neither Process Explorer nor the standard Task Manager count locked pages in their per-process memory usage statistics.
What's up with this? Could someone intimately familiar with virtual memory in WinNT shed some light on this inconsistency, which can cause gigabytes of used RAM to go essentially undetected? (think of SQL Server or VirtualBox)
Ah, that is easily explained: you're using the wrong API. GetProcessWorkingSetSize queries the minimum and maximum working set sizes. Those are quotas, not actual values.
The minimum working set size is what Windows will guarantee to keep locked in RAM as long as the world does not end. The maximum working set size is the amount of memory that Windows will allow your process before pages are moved into the pool (they are not necessarily gone, but accessing them causes a fault and re-mapping).
You want GetProcessMemoryInfo
EDIT:
Since it is now clear that you were not using the wrong API (you only named the wrong function), I've done some testing (VirtualAlloc and memory-mapped files, both in combination with VirtualLock) on my XP system. At first sight, it looked like you were totally right. Allocating 512MB, or memory-mapping 512MB out of a 650MB file, added 512MB to the virtual size but did not increase the working set. A following VirtualLock(512MB) did not affect the working set at all!
Then it occurred to me that VirtualLock took exactly zero time in every case, which did not seem plausible, e.g. for having to fetch half a gigabyte from disk. So I checked the return code, and guess what: Windows doesn't think that locking 512MB is a good idea, and refuses to do it.
I repeated the experiment with only 64MB, and behold, the working set immediately went up by 64MB, just as it should. So, in one word: "works for me".
Just to be sure, you did check the return code?
On a second look, this behaviour is even well-defined and well-documented. The docs to VirtualLock state explicitly:
The maximum number of pages that a process can lock is equal to the number of pages in its minimum working set minus a small overhead.
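To illustrate, here is a minimal sketch of the experiment described above, assuming a 64MB test block (the quota values are arbitrary examples, and raising the minimum quota may require appropriate privileges); the return codes are the part that matters:

    // Sketch: grow the working-set quota, then lock a region and check the results.
    #include <windows.h>
    #include <cstdio>

    int main()
    {
        const SIZE_T size = 64 * 1024 * 1024;

        // The lockable amount is bounded by the minimum working-set quota,
        // so raise the quota before calling VirtualLock.
        if (!SetProcessWorkingSetSize(GetCurrentProcess(),
                                      size + 8 * 1024 * 1024,    // minimum
                                      size + 64 * 1024 * 1024))  // maximum
        {
            printf("SetProcessWorkingSetSize failed: %lu\n", GetLastError());
            return 1;
        }

        void* p = VirtualAlloc(nullptr, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
        if (!p) return 1;

        if (!VirtualLock(p, size))            // this is the check that matters
            printf("VirtualLock failed: %lu\n", GetLastError());
        else
            printf("Locked %zu MB into the working set\n", size / (1024 * 1024));

        return 0;
    }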
With and without locking, after appropriately setting the WS quotas:
VirtualBox is a different matter: what you see in the Task Manager is only the working set of the "Interface" program and the "Manager" frontend, both of which maintain working set sizes below 64M at all times. I'm not sure what memory it may allocate in drivers, though, or whether it locks memory at all.
I'm currently running 2 virtual machines with 1.6GB of main memory each. Seeing how my 32-bit Windows only sees 3.25GB, that would leave a mere 50MB if the memory belonging to the VMs were locked. Besides, Process Explorer tells me that Firefox alone has a working set of 474MB, and it's going up while I'm typing this (holy...?!!). That makes it unlikely that all the memory in the virtual machines is really locked, because such figures would be entirely impossible then.
As requested, here's a shot of VMMap:
The figures are admittedly funny... the VM has 1.6GB in total, of which, according to VMMap, 821MiB are reserved and 772MiB are committed; Process Explorer shows only 163MiB and 54MiB, respectively. Something is definitely fishy there, but I suspect this is some obscure VirtualBox hackery rather than a Windows issue.

Memory mapped files causes low physical memory

I have 2GB of RAM and am running a memory-intensive application; the system goes into a low-available-physical-memory state and stops responding to user actions, such as opening any application or invoking a menu.
How do I trigger or tell the system to swap the memory to pagefile and free physical memory?
I'm using Windows XP.
If I run the same application on a 4GB RAM machine this is not the case; system response is good. When available physical memory is exhausted, the system automatically swaps to the pagefile and frees physical memory, and it is not as bad as on the 2GB system.
To overcome this problem (on the 2GB machine) I attempted to use memory-mapped files for the large datasets allocated by the application. In this case the virtual memory of the application (process) is fine, but the system cache is high and there is the same problem as above: physical memory is low.
Even though the memory-mapped file is not mapped into the process's virtual memory, the system cache is high. Why? :(
Any help is appreciated.
Thanks.
If your data access pattern for using the memory mapped file is sequential, you might get slightly better page recycling by specifying the FILE_FLAG_SEQUENTIAL_SCAN flag when opening the underlying file. If your data pattern accesses the mapped file in random order, this won't help.
You should consider decreasing the size of your map view. That's where all the memory is actually consumed and cached. Since it appears that you need to handle files that are larger than available contiguous free physical memory, you can probably do a better job of memory management than the virtual memory page swapper since you know more about how you're using the memory than the virtual memory manager does. If at all possible, try to adjust your design so that you can operate on portions of the large file using a smaller view.
Even if you can't get rid of the need for full random access across the entire range of the underlying file, it might still be beneficial to tear down and recreate the view as needed to move the view to the section of the file that the next operation needs to access. If your data access patterns tend to cluster around parts of the file before moving on, then you won't need to move the view as often. You'll take a hit to tear down and recreate the view object, but since tearing down the view also releases all the cached pages associated with the view, it seems likely you'd see a net gain in performance because the smaller view significantly reduces memory pressure and page swapping system wide. Try setting the size of the view based on a portion of the installed system RAM and move the view around as needed by your file processing. The larger the view, the less you'll need to move it around, but the more RAM it will consume potentially impacting system responsiveness.
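As a hedged sketch of both of those points (the file name and the 64MB window size are placeholders), something like this maps a small, moving view instead of the whole file; note that the view offset must be aligned to the allocation granularity (64KB on x86/x64 Windows):

    // Sketch: process a large file through a sliding 64MB mapped view.
    #include <windows.h>

    int main()
    {
        HANDLE file = CreateFileA("large_data.bin", GENERIC_READ, FILE_SHARE_READ,
                                  nullptr, OPEN_EXISTING,
                                  FILE_FLAG_SEQUENTIAL_SCAN,   // hint for sequential access
                                  nullptr);
        if (file == INVALID_HANDLE_VALUE) return 1;

        LARGE_INTEGER fileSize;
        GetFileSizeEx(file, &fileSize);

        HANDLE mapping = CreateFileMappingA(file, nullptr, PAGE_READONLY, 0, 0, nullptr);
        if (!mapping) { CloseHandle(file); return 1; }

        const ULONGLONG viewSize = 64ull * 1024 * 1024;        // small window, not the whole file
        for (ULONGLONG offset = 0; offset < (ULONGLONG)fileSize.QuadPart; offset += viewSize)
        {
            SIZE_T bytes = (SIZE_T)min(viewSize, (ULONGLONG)fileSize.QuadPart - offset);
            const BYTE* view = (const BYTE*)MapViewOfFile(mapping, FILE_MAP_READ,
                                                          (DWORD)(offset >> 32),
                                                          (DWORD)(offset & 0xFFFFFFFF),
                                                          bytes);
            if (!view) break;

            // ... process this window of the file ...

            UnmapViewOfFile(view);   // unmapping releases the cached pages for this window
        }
        CloseHandle(mapping);
        CloseHandle(file);
        return 0;
    }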
As I think you are hinting in your post, the slow response time is probably at least partially due to delays in the system while the OS writes the contents of memory to the pagefile to make room for other processes in physical memory.
The obvious solution (and possibly not practical) is to use less memory in your application. I'll assume that is not an option or at least not a simple option. The alternative is to try to proactively flush data to disk to continually keep available physical memory for other applications to run. You can find the total memory on the machine with GlobalMemoryStatusEx. And GetProcessMemoryInfo will return current information about your own application's memory usage. Since you say you are using a memory mapped file, you may need to account for that in addition. For example, I believe the PageFileUsage information returned from that API will not include information about your own memory mapped file.
If your application is monitoring the usage, you may be able to use FlushViewOfFile to proactively force data to disk from memory. There is also an API (EmptyWorkingSet) that I think attempts to write as many dirty pages to disk as possible, but that seems like it would very likely hurt performance of your own application significantly. Although, it could be useful in a situation where you know your application is going into some kind of idle state.
And, finally, one other API that might be useful is SetProcessWorkingSetSizeEx. You might consider using this API to give a hint on an upper limit for your application's working set size. This might help preserve more memory for other applications.
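A rough sketch of how those calls could fit together (the 90% load threshold and the 200MB cap are made-up numbers, and RelievePressure is a hypothetical helper, not an API):

    // Sketch: under memory pressure, flush a mapped view and hint a working-set cap.
    #include <windows.h>

    void RelievePressure(void* mappedView, SIZE_T viewSize)
    {
        MEMORYSTATUSEX ms = {};
        ms.dwLength = sizeof(ms);
        if (!GlobalMemoryStatusEx(&ms)) return;

        if (ms.dwMemoryLoad > 90)   // more than 90% of physical memory in use
        {
            // Write dirty mapped pages back to their file so they are cheap to evict.
            FlushViewOfFile(mappedView, viewSize);

            // Suggest soft bounds on our own working set (16MB..200MB, as an example).
            SetProcessWorkingSetSizeEx(GetCurrentProcess(),
                                       16 * 1024 * 1024,
                                       200 * 1024 * 1024,
                                       QUOTA_LIMITS_HARDWS_MIN_DISABLE |
                                       QUOTA_LIMITS_HARDWS_MAX_DISABLE);
        }
    }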
Edit: This is another obvious statement, but I forgot to mention it earlier. It also may not be practical for you, but it sounds like one of the best things you might do considering that you are running into 32-bit limitations is to build your application as 64-bit and run it on a 64-bit OS (and throw a little bit more memory at the machine).
Well, it sounds like your program needs more than 2GB of working set.
Modern operating systems are designed to use most of the RAM for something at all times, only keeping a fairly small amount free so that it can be immediately handed out to processes that need more. The rest is used to hold memory pages and cached disk blocks that have been used recently; whatever hasn't been used recently is flushed back to disk to replenish the pool of free pages. In short, there isn't supposed to be much free physical memory.
The principal difference between using a normal memory allocation and a memory-mapped file is where the data gets stored when it must be paged out of memory. It doesn't necessarily have any effect on when the memory will be paged out, and it will have little effect on the time it takes to page it out.
The real problem you are seeing is probably not that you have too little free physical memory, but that the paging rate is too high.
My suggestion would be to attempt to reduce the amount of storage needed by your program, and see if you can increase the locality of reference to reduce the amount of paging needed.

Windows Stalls When My Program Uses Swapfile

I am running a user-mode program at normal priority. My program is searching an NP problem and, as a result, uses up a lot of memory which eventually ends up in the swap file.
Then my mouse freezes up, and it takes forever for task manager to open up and let me end the process.
What I want to know is how I can stop my Windows operating system from completely locking up from this, even though only 1 out of my 2 cores is being used.
Edit:
Thanks for the replies.
I know that making it use less memory will help, but it just doesn't make sense to me that the whole OS should lock up.
The obvious answer is "use less memory". When your app uses up all the
available memory, the OS has to page the task manager (etc.) out to make room for your app. When you switch programs, the OS has to page the other programs back in (as they are needed).
Disk reads are slower than memory reads, so everything appears to be
going slower.
If you want to avoid this, have your app manage its own memory, or
use a better algorithm than brute force. (There are genetic
algorithms, simulated annealing, etc.)
The problem is that when another program (e.g. explorer.exe) is going to execute, all of its code and memory have been swapped out. To make room for the other program, Windows has to first write data that your program is using to disk, then load the other program's memory. Every new page of code executed in the other program requires disk access, causing it to run slowly.
I don't know the access pattern of your program, but I'm guessing it touches all of its memory pages a lot in a random fashion, which makes the problem worse because as soon as Windows evicts a memory page from your program, suddenly you need it again and Windows has to find some other page to give the same treatment.
To give other processes more RAM to live in, you can use SetProcessWorkingSetSize to reduce the maximum amount of RAM that your program may use. Of course this will make your program run more slowly because it has to do more swapping.
Another alternative you could try is to add more drives to the system, and distribute the swap files over those. You may have a dual-core CPU, but you have only a single drive. Distributing the swap file over multiple drives allows Windows to balance work across them (although I don't have first-hand experience of how well it does this).
I don't think there's a programming answer to this question, aside from "restructure your app to use less memory." The swapfile problem is most likely due to the bottleneck in accessing the disk, especially if you're using an IDE HDD or a highly fragmented swapfile.
It's a bit extreme, but you could always minimise your swap file so you don't have all the disk thrashing, and your program isn't allowed to allocate much virtual memory. Under Control Panel / Advanced / Advanced tab / Performance / Virtual memory, set the page file to a custom size and enter a value of 2MB (the smallest allowed on XP). When an allocation fails, you should get an exception and be able to exit gracefully. It doesn't quite fix your problem, it just speeds it up ;)
Another thing worth considering: if you are on a 32-bit platform, port to a 64-bit system and get a box with much more addressable RAM.
