Can VirtualProtect be leveraged for performance? - winapi

The VirtualProtect() function in the Win32 API allows one to make memory pages read-only, write-only, executable-only, and a wide range of other settings.
I can see the security motivation, but if I had some memory I'd allocated, say on the heap, and I knew my application would only be reading from that memory, would setting the page to read-only improve access performance?
Likewise, does the same hold for setting it to write-only if I know the application will only be writing to that memory?
I ask, as I have a dim memory from my research into the Vulkan API, that marking certain memory objects, like attachments, as having certain access patterns, tells the driver to optimize that memory object and its layout.

Related

Why should userspace applications lock Ebpf maps?

When you create EBPF maps, memory is allocated in kernel space. And kernel memory never gets swapped out. Then, why is there a need for the userspace application to call setrlimit() with RLIMIT_MEMLOCK?
RLIMIT_MEMLOCK refers to the amount of memory that may be locked into RAM, it is not specific to allocations in the user space memory address range. On pre-5.11 kernels, memory used for eBPF objects (programs, maps, BTF objects etc.) is accounted against this resource, meaning that if you create too many of them at a given time, or even in a short interval of time (given that there is a small delay after the deletion of an object before the kernel reclaims the corresponding memory area), you may hit the resource limit and get a -EPERM as an answer. For privileged users, calling setrlimit() to lift the restrictions on this resource is the usual workaround indeed.
Note that in Linux 5.11, rlimit-based accounting was dropped in favour of cgroup-based memory accounting. The rlimit had a number of downsides (refer to the cover letter linked above for details), and the cgroup-based accounting is more flexible, while allowing a better control and offering easier ways to retrieve the current amount of memory used. It should also reflect the actual memory consumption, while this was not necessarily the case with rlimit.

Memory mapped files causes low physical memory

I have a 2GB RAM and running a memory intensive application and going to low available physical memory state and system is not responding to user actions, like opening any application or menu invocation etc.
How do I trigger or tell the system to swap the memory to pagefile and free physical memory?
I'm using Windows XP.
If I run the same application on 4GB RAM machine it is not the case, system response is good. After getting choked of available physical memory system automatically swaps to pagefile and free physical memory, not that bad as 2GB system.
To overcome this problem (on 2GB machine) attempted to use memory mapped files for large dataset which are allocated by application. In this case virtual memory of the application(process) is fine but system cache is high and same problem as above that physical memory is less.
Even though memory mapped file is not mapped to process virtual memory system cache is high. why???!!! :(
Any help is appreciated.
Thanks.
If your data access pattern for using the memory mapped file is sequential, you might get slightly better page recycling by specifying the FILE_FLAG_SEQUENTIAL_SCAN flag when opening the underlying file. If your data pattern accesses the mapped file in random order, this won't help.
You should consider decreasing the size of your map view. That's where all the memory is actually consumed and cached. Since it appears that you need to handle files that are larger than available contiguous free physical memory, you can probably do a better job of memory management than the virtual memory page swapper since you know more about how you're using the memory than the virtual memory manager does. If at all possible, try to adjust your design so that you can operate on portions of the large file using a smaller view.
Even if you can't get rid of the need for full random access across the entire range of the underlying file, it might still be beneficial to tear down and recreate the view as needed to move the view to the section of the file that the next operation needs to access. If your data access patterns tend to cluster around parts of the file before moving on, then you won't need to move the view as often. You'll take a hit to tear down and recreate the view object, but since tearing down the view also releases all the cached pages associated with the view, it seems likely you'd see a net gain in performance because the smaller view significantly reduces memory pressure and page swapping system wide. Try setting the size of the view based on a portion of the installed system RAM and move the view around as needed by your file processing. The larger the view, the less you'll need to move it around, but the more RAM it will consume potentially impacting system responsiveness.
As I think you are hinting in your post, the slow response time is probably at least partially due to delays in the system while the OS writes the contents of memory to the pagefile to make room for other processes in physical memory.
The obvious solution (and possibly not practical) is to use less memory in your application. I'll assume that is not an option or at least not a simple option. The alternative is to try to proactively flush data to disk to continually keep available physical memory for other applications to run. You can find the total memory on the machine with GlobalMemoryStatusEx. And GetProcessMemoryInfo will return current information about your own application's memory usage. Since you say you are using a memory mapped file, you may need to account for that in addition. For example, I believe the PageFileUsage information returned from that API will not include information about your own memory mapped file.
If your application is monitoring the usage, you may be able to use FlushViewOfFile to proactively force data to disk from memory. There is also an API (EmptyWorkingSet) that I think attempts to write as many dirty pages to disk as possible, but that seems like it would very likely hurt performance of your own application significantly. Although, it could be useful in a situation where you know your application is going into some kind of idle state.
And, finally, one other API that might be useful is SetProcessWorkingSetSizeEx. You might consider using this API to give a hint on an upper limit for your application's working set size. This might help preserve more memory for other applications.
Edit: This is another obvious statement, but I forgot to mention it earlier. It also may not be practical for you, but it sounds like one of the best things you might do considering that you are running into 32-bit limitations is to build your application as 64-bit and run it on a 64-bit OS (and throw a little bit more memory at the machine).
Well, it sounds like your program needs more than 2GB of working set.
Modern operating systems are designed to use most of the RAM for something at all times, only keeping a fairly small amount free so that it can be immediately handed out to processes that need more. The rest is used to hold memory pages and cached disk blocks that have been used recently; whatever hasn't been used recently is flushed back to disk to replenish the pool of free pages. In short, there isn't supposed to be much free physical memory.
The principle difference between using a normal memory allocation and memory mapped a files is where the data gets stored when it must be paged out of memory. It doesn't necessarily have any effect on when the memory will be paged out, and will have little effect on the time it takes to page it out.
The real problem you are seeing is probably not that you have too little free physical memory, but that the paging rate is too high.
My suggestion would be to attempt to reduce the amount of storage needed by your program, and see if you can increase the locality of reference to reduce the amount of paging needed.

What is private bytes, virtual bytes, working set?

I am trying to use the perfmon windows utility to debug memory leaks in a process.
This is how perfmon explains the terms:
Working Set is the current size, in bytes, of the Working Set of this process. The Working Set is the set of memory pages touched recently by the threads in the process. If free memory in the computer is above a threshold, pages are left in the Working Set of a process even if they are not in use. When free memory falls below a threshold, pages are trimmed from Working Sets. If they are needed they will then be soft-faulted back into the Working Set before leaving main memory.
Virtual Bytes is the current size, in bytes, of the virtual address space the process is using. Use of virtual address space does not necessarily imply corresponding use of either disk or main memory pages. Virtual space is finite, and the process can limit its ability to load libraries.
Private Bytes is the current size, in bytes, of memory that this process has allocated that cannot be shared with other processes.
These are the questions I have:
Is it the Private Bytes which I should measure to be sure if the process is having any leaks as it does not involve any shared libraries and any leaks, if happening, will come from the process itself?
What is the total memory consumed by the process? Is it the Virtual Bytes or is it the sum of Virtual Bytes and Working Set?
Is there any relation between Private Bytes, Working Set and Virtual Bytes?
Are there any other tools that give a better idea of the memory usage?
The short answer to this question is that none of these values are a reliable indicator of how much memory an executable is actually using, and none of them are really appropriate for debugging a memory leak.
Private Bytes refer to the amount of memory that the process executable has asked for - not necessarily the amount it is actually using. They are "private" because they (usually) exclude memory-mapped files (i.e. shared DLLs). But - here's the catch - they don't necessarily exclude memory allocated by those files. There is no way to tell whether a change in private bytes was due to the executable itself, or due to a linked library. Private bytes are also not exclusively physical memory; they can be paged to disk or in the standby page list (i.e. no longer in use, but not paged yet either).
Working Set refers to the total physical memory (RAM) used by the process. However, unlike private bytes, this also includes memory-mapped files and various other resources, so it's an even less accurate measurement than the private bytes. This is the same value that gets reported in Task Manager's "Mem Usage" and has been the source of endless amounts of confusion in recent years. Memory in the Working Set is "physical" in the sense that it can be addressed without a page fault; however, the standby page list is also still physically in memory but not reported in the Working Set, and this is why you might see the "Mem Usage" suddenly drop when you minimize an application.
Virtual Bytes are the total virtual address space occupied by the entire process. This is like the working set, in the sense that it includes memory-mapped files (shared DLLs), but it also includes data in the standby list and data that has already been paged out and is sitting in a pagefile on disk somewhere. The total virtual bytes used by every process on a system under heavy load will add up to significantly more memory than the machine actually has.
So the relationships are:
Private Bytes are what your app has actually allocated, but include pagefile usage;
Working Set is the non-paged Private Bytes plus memory-mapped files;
Virtual Bytes are the Working Set plus paged Private Bytes and standby list.
There's another problem here; just as shared libraries can allocate memory inside your application module, leading to potential false positives reported in your app's Private Bytes, your application may also end up allocating memory inside the shared modules, leading to false negatives. That means it's actually possible for your application to have a memory leak that never manifests itself in the Private Bytes at all. Unlikely, but possible.
Private Bytes are a reasonable approximation of the amount of memory your executable is using and can be used to help narrow down a list of potential candidates for a memory leak; if you see the number growing and growing constantly and endlessly, you would want to check that process for a leak. This cannot, however, prove that there is or is not a leak.
One of the most effective tools for detecting/correcting memory leaks in Windows is actually Visual Studio (link goes to page on using VS for memory leaks, not the product page). Rational Purify is another possibility. Microsoft also has a more general best practices document on this subject. There are more tools listed in this previous question.
I hope this clears a few things up! Tracking down memory leaks is one of the most difficult things to do in debugging. Good luck.
The definition of the perfmon counters has been broken since the beginning and for some reason appears to be too hard to correct.
A good overview of Windows memory management is available in the video "Mysteries of Memory Management Revealed" on MSDN: It covers more topics than needed to track memory leaks (eg working set management) but gives enough detail in the relevant topics.
To give you a hint of the problem with the perfmon counter descriptions, here is the inside story about private bytes from "Private Bytes Performance Counter -- Beware!" on MSDN:
Q: When is a Private Byte not a Private Byte?
A: When it isn't resident.
The Private Bytes counter reports the commit charge of the process. That is to say, the amount of space that has been allocated in the swap file to hold the contents of the private memory in the event that it is swapped out. Note: I'm avoiding the word "reserved" because of possible confusion with virtual memory in the reserved state which is not committed.
From "Performance Planning" on MSDN:
3.3 Private Bytes
3.3.1 Description
Private memory, is defined as memory allocated for a process which cannot be shared by other processes. This memory is more expensive than shared memory when multiple such processes execute on a machine. Private memory in (traditional) unmanaged dlls usually constitutes of C++ statics and is of the order of 5% of the total working set of the dll.
You should not try to use perfmon, task manager or any tool like that to determine memory leaks. They are good for identifying trends, but not much else. The numbers they report in absolute terms are too vague and aggregated to be useful for a specific task such as memory leak detection.
A previous reply to this question has given a great explanation of what the various types are.
You ask about a tool recommendation:
I recommend Memory Validator. Capable of monitoring applications that make billions of memory allocations.
http://www.softwareverify.com/cpp/memory/index.html
Disclaimer: I designed Memory Validator.
There is an interesting discussion here: http://social.msdn.microsoft.com/Forums/en-US/vcgeneral/thread/307d658a-f677-40f2-bdef-e6352b0bfe9e/
My understanding of this thread is that freeing small allocations are not reflected in Private Bytes or Working Set.
Long story short:
if I call
p=malloc(1000);
free(p);
then the Private Bytes reflect only the allocation, not the deallocation.
if I call
p=malloc(>512k);
free(p);
then the Private Bytes correctly reflect the allocation and the deallocation.

How does the operating system know how much memory my app is using? (And why doesn't it do garbage collection?)

When my task manager (top, ps, taskmgr.exe, or Finder) says that a process is using XXX KB of memory, what exactly is it counting, and how does it get updated?
In terms of memory allocation, does an application written in C++ "appear" different to an operating system from an application that runs as a virtual machine (managed code like .NET or Java)?
And finally, if memory is so transparent - why is garbage collection not a function-of or service-provided-by the operating system?
As it turns out, what I was really interested in asking is WHY the operating system could not do garbage collection and defrag memory space - which I see as a step above "simply" allocating address space to processes.
These answers help a lot! Thanks!
This is a big topic that I can't hope to adequately answer in a single answer here. I recommend picking up a copy of Windows Internals, it's an invaluable resource. Eric Lippert had a recent blog post that is a good description of how you can view memory allocated by the OS.
Memory that a process is using is basically just address space that is reserved by the operating system that may be backed by physical memory, the page file, or a file. This is the same whether it is a managed application or a native application. When the process exits, the operating system deletes the memory that it had allocated for it - the virtual address space is simply deleted and the page file or physical memory backings are free for other processes to use. This is all the OS really maintains - mappings of address space to some physical resource. The mappings can shift as processes demand more memory or are idle - physical memory contents can be shifted to disk and vice versa by the OS to meet demand.
What a process is using according to those tools can actually mean one of several things - it can be total address space allocated, total memory allocated (page file + physical memory) or memory a process is actually using that is resident in memory. Task Manager has a separate column for each of these possibilities.
The OS can't do garbage collection since it has no insight into what that memory actually contains - it just sees allocated pages of memory, it doesn't see objects which may or may not be referenced.
Whereas the OS handles allocates at the virtual address level, in the process itself there are other memory managers which take these large, page-sized chunks and break them up into something useful for the application to use. Windows will return memory allocated in 64k boundaries, but then the heap manager breaks it up into smaller chunks for use by each individual allocation done by the program via new. In .Net applications, the CLR will hand off new objects off of the garbage collected heap and when that heap reaches its limits, will perform garbage collection.
I can't speak to your question about the differences in how the memory appears in C++ vs. virtual machines, etc, but I can say that applications are typically given a certain memory range to use upon initialization. Then, if the application ever requires more, it will request it from the operating system, and the operating system will (generally) grant it. There are many implementations of this - in some operating systems, other applications' memory is moved away so as to give yours a larger contiguous block, and in others, your application gets various chunks of memory. There may even be some virtual memory handling involved. Its all up to an abstracted implementation. In any case, the memory is still treated as contiguous from within your program - the operating system will handle that much at least.
With regard to garbage collection, the operating system knows the bounds of your memory, but not what is inside. Furthermore, even if it did look at the memory used by your application, it would have no idea what blocks are used by out-of-scope variables and such that the GC is looking for.
The primary difference is application management. Microsoft distinguishes this as Managed and Unmanaged. When objects are allocated in memory they are stored at a specific address. This is true for both managed and unmanaged applications. In the managed world the "address" is wrapped in an object handle or "reference".
When memory is garbage collected by a VM, it can safely suspend the application, move objects around in memory and update all the "references" to point to their new locations in memory.
In a Win32 style app, pointers are pointers. There's no way for the OS to know if it's an object, an arbitrary block of data, a plain-old 32-bit value, etc. So it can't make any inferences about pointers between objects, so it can't move them around in memory or update pointers between objects.
Because the way references are handled, the OS can't take over the GC process and instead it's left up to the VM to manage the memory used by the application. For that reason, VM applications appear exactly the same to the OS. They simply request blocks of memory for use and the OS gives it to them. When the VM performs GC and compacts it's memory it's able to free memory back to the OS for use by another app.

What is the purpose of allocating pages in the pagefile with CreateFileMapping?

The function CreateFileMapping can be used to allocate space in the pagefile (if the first argument is INVALID_HANDLE_VALUE). The allocated space can later be memory mapped into the process virtual address space.
Why would I want to do this instead of using just VirtualAlloc?
It seems that both functions do almost the same thing. Memory allocated by VirtualAlloc may at some point be pushed out to the pagefile. Why should I need an API that specifically requests that my pages be allocated there in the first instance? Why should I care where my private pages live?
Is it just a hint to the OS about my expected memory usage patterns? (Ie, the former is a hint to swap out those pages more aggressively.)
Or is it simply a convenience method when working with very large datasets on 32-bit processes? (Ie, I can use CreateFileMapping to make >4Gb allocations, then memory map smaller chunks of the space as needed. Using the pagefile saves me the work of manually managing my own set of files to "swap" to.)
PS. This question is sparked by an article I read recently: http://blogs.technet.com/markrussinovich/archive/2008/11/17/3155406.aspx
From the CreateFileMappingFunction:
A single file mapping object can be shared by multiple processes.
Can the Virtual memory be shared across multiple processes?
One reason is to share memory among different processes. Different processes by only knowing the name of the mapping object can communicate over page file. This is preferable over creating a real file and doing the communications. Of course there may be other use cases. You can refer to Using a File Mapping for IPC at MSDN for more information.

Resources