Does Windows clear memory pages? - windows

I know that Windows has an option to clear the page file when it shuts down.
Does Windows do anything special with the actual physical/virtual memory when it goes in or out of scope?
For instance, let's say I run application A, which writes a recognizable string to a variable in memory, and then I close the application. Then I run application B. It allocates a large chunk of memory, leaves the contents uninitialized, and searches it for the known string written by application A.
Is there ANY possibility that application B will pick up the string written by application A? Or does Windows scrub the memory before making it available?

Windows does "scrub" the freed memory returned by a process before allocating it to other processes. There is a kernel thread specifically for this task alone.
The zero page thread runs at the lowest priority and is responsible for zeroing out free pages before moving them to the zeroed page list[1].
Rather than worrying about retaining sensitive data in the paging file, you should be worried about continuing to retain it in memory (after use) in the first place. Clearing the page-file on shutdown is not the default behavior. Also a system crash dump will contain any sensitive info that you may have in "plain-text" in RAM.
Windows does NOT "scrub" the memory as long as it is allocated to a process (obviously). Rather it is left to the program(mer) to do so. For this very purpose one can use the SecureZeroMemory() function.
This function is defined as the RtlSecureZeroMemory() function ( see WinBase.h). The implementation of RtlSecureZeroMemory() is provided inline and can be used on any version of Windows ( see WinNT.h)
Use this function instead of ZeroMemory() when you want to ensure that your data will be overwritten promptly, as some C++ compilers can optimize a call to ZeroMemory() by removing it entirely.
WCHAR szPassword[MAX_PATH];
/* Obtain the password */
if (GetPasswordFromUser(szPassword, MAX_PATH))
{
UsePassword(szPassword);
}
/* Before continuing, clear the password from memory */
SecureZeroMemory(szPassword, sizeof(szPassword));
Don't forget to read this interesting article by Raymond Chen.

Related

accessing process memory parts

I'm currently studying memory management of OS by the video lecture. The instructor says,
In fact, you may have, and it is quite often the case that there may
be several parts of the process memory, which are not even accessed at
all. That is, they are neither executed, loaded or stored from memory.
I don't understand the saying since even if in a simple C program, we access whole address space of it. Don't we?
#include <stdio.h>
int main()
{
printf("Hello, World!");
return 0;
}
Could you elucidate the saying? If possible could you provide an example program wherein "several parts of the process memory, which are not even accessed at all" when it is run.
Imagine you have a large and complicated utility (e.g. a compiler), and the user asks it for help (e.g. they type gcc --help instead of asking it to compile anything). In this case, how much of the utility's code and data is used?
Most programs have various optional parts that aren't used (e.g. maybe something that works with graphics will have some code for 16 bits per pixel and other code for 32 bits per pixel, and will determine which code to use and not use the other code). Most heap allocators are "eager" (e.g. they'll ask the OS for 20 MiB of space and then might only "malloc() 2 MiB of it). Sometimes a program will memory map a huge file but then only access a small part of it.
Even for your trivial "hello world" example code; the virtual address space probably contains a huge (several MiB) shared library to support lots of C standard library functions (e.g. puts(), fprintf(), sprintf(), ...) and your program only uses a small part of that shared library; and your program probably reserves a conservative amount of space for its stack (e.g. maybe 20 KiB of space for its stack) and then probably only uses a few hundred bytes of stack.
In a virtual memory system, the address space of the process is created in secondary store at start up. Little or nothing gets placed in memory. For example, the operating system may use the executable file as the page file for the code and static data. It just sets up an internal structure that says some range of memory is mapped to these blocks in the executable file. The same goes for shared libraries. The other data gets mapped to the page file.
As your program runs it starts page faulting rapidly because nothing is in memory and the operating system has to load it from secondary storage.
If there is something that your program does not reference, it never gets loaded into memory.
If you had global variable declared like
char somedata [1045] ;
and your program never references that variable, it will never get loaded into memory. The same goes for code. If you have pages of code that done get execute (e.g. error handling code) it does not get loaded. If you link to shared libraries, you will likely bece including a lot of functions that you never use. Likewise, they will not get loaded if you do not execute them.
To begin with, not all of the address space is backed by physical memory at all times, especially if your address space covers 248+ bytes, which your computer doesn't have (which is not to say you can't map most of the address space to a single physical page of memory, which would be of very little utility for anything).
And then some portions of the address space may be purposefully permanently inaccessible, like a few pages near virtual address 0 (to catch NULL pointer dereferences).
And as it's been pointed out in the other answers, with on-demand loading of programs, you may have some portions of the address space reserved for your program but if the program doesn't happen to need any of its code or data there, nothing needs to be loader there either.

x86 - kernel - programs, cleaning and memory overwrite

I am not sure about something. Take linux for example; when a program exits, the kernel is responsible for cleaning after the process.
How can one be sure that physical memory is never overwritten from process A to process B (different virtual memories (page entries) leading to the same physical allocation)?
How is it prevented?
Linux assigns pages to and frees pages from processes using the facilities described here.(Search the kernel sources for more detailed information.)
That means, the kernel saves information about the used pages in some data structure (could be a bitmap, for example) and only the unused ones are exposed as usable to new processes.
That prevents mistakenly assigning pages in use to new process. Any behavior beyond that would be a bug and a magnificent security hole.

CreateFileMapping and MapViewOfFile with interprocess (un)synchronized multithreaded access?

I use a Shared Memory area to get som data to a second process.
The first process uses CreateFileMapping(INVALID_HANDLE_VALUE, ..., PAGE_READWRITE, ...) and MapViewOfFile( ... FILE_MAP_WRITE).
The second process uses OpenFileMapping(FILE_MAP_WRITE, ...) and MapViewOfFile( ... FILE_MAP_WRITE).
The docs state:
Multiple views of a file mapping object
are coherent if they contain identical data at a specified time.
This occurs if the file views are derived from any file mapping object
that is backed by the same file. (...)
With one important exception, file views derived from any file mapping
object that is backed by the same file are coherent or identical at a
specific time. Coherency is guaranteed for views within a process and
for views that are mapped by different processes.
The exception is related to remote files. (...)
Since I'm just using the Shared Memory as is (backed by the paging file) I would have assumed that some synchronization is needed between processes to see a coherent view of the memory another process has written. I'm unsure however what synchronization would be needed exactly.
The current pattern I have (simplified) is like this:
Process1 | Process2
... | ...
/* write to shared mem, */ | ::WaitForSingleObject(hDataReady); // real code has error handling
/* then: */
::SetEvent(hDataReady); | /* read from shared mem after wait returns */
... | ...
Is this enough synchronization, even for shared memory?
What sync is needed in general between the two processes?
Note that inside of one single process, the call to SetEvent would certainly constitute a full memory barrier, but it isn't completely clear to me whether that holds for shared memory across processes.
I have since come to believe that for memory-access synchronization purposes, it really does not matter if the concurrently accessed memory is shared between processes or just withing one process between threads.
That is, for Shared Memory (the one shared between processes) on Windows, the same restrictions and guidelines apply as with "normal" memory within a process that is just shared between the threads of the process.
The reason I believe this is that a process and a thread are somewhat orthogonal on Windows. A process is a "container" for threads, and in order for the process to be able to do anything, it needs at least one thread. So, for memory that is mapped into multiple process' address space, the synchronization requirements on the threads running within these different processes should be actually the same as for threads running within the same process.
So, the answer to my question Is this enough synchronization, even for shared memory? is that shared memory requires the same synchronization as "normal" memory. But of course, not all synchronization techniques works across process boundaries, so you are restricted in what you can use. (A Critical Section for exampled cannot be used across processes.)
If both of those code snippets are in a loop then in addition to the event you'll need a mutex so that Process1 doesn't start writing again while Process2 is still reading. To be more specific, the mutex must be acquired before reading or writing and released after reading or writing. Make sure the mutex has been released before calling WFSO in Process2.
My understanding is that although Windows may guarantee view coherency, it does not guarantee a write is fully completed before the client reads it.
For example, if you were writing "Hello world!" to the view, it could only be partially written when the client reads it, such as "Hello w".
Therefore, the view would be byte coherent, but not message coherent.
Personally, I use a mutex to guarantee thread-safe access.
Use Semaphore should be better than Event.

Mapping of Page allocated to user process in Kernel virtual address space

When a page is created for a process (which will be mapped into process address space), will that page be mapped into kernel address space ?
If not, then it won't have kernel virtual address. Then how the swapper will find the page and swap that out, if a need arises ?
If we're talking about the x86 or similar (in terms of page translation) architectures, at any given time there's one virtual address space and normally one part of it is reserved for the kernel and the other for user-mode processes.
On a context switch between two processes only the user-mode part of the virtual address space changes.
With such an organization, the kernel always has full access to the current user-mode process, because, again, there's only one current virtual address space at any moment for both the kernel and a user-mode process, it's not two, it's one. So, the kernel doesn't really have to have another, extra mapping for user-mode pages. But that's not the main point.
The main point is that the kernel keeps some sort of statistics for every page that if needed can be saved to the disk and reused elsewhere. The CPU marks each page's page table entry (PTE) as accessed when the page is first read from or written to and as dirty when it's first written to.
The kernel scans the PTEs periodically, reads the accessed and dirty markers to update said statistics and clears accessed and dirty so it can detect a change in them later (of course, if any). Based on this statistics it determines which pages are rarely used or long unused and can be repurposed.
If the "swapper" runs in the context of the current process and if it runs in the kernel, then in theory it has enough information from the kernel (the list of rarely used or long unused pages to save and unmap if dirty or just unmap if not dirty) and sufficient access to the pages of interest.
If the "swapper" itself runs as a user-mode process, things become more complicated because it doesn't have access to another process' pages by default and has to either create a mapping or ask the kernel do some extra work for it in the context of the process of interest.
So, finding rarely used and long unused pages and their addresses occurs in the kernel. The CPU helps by automatically marking PTEs as accessed and dirty. There may need to be an extra mapping to dirty pages if they get saved to the disk not in the context of the process that owns them.

Can address space be recycled for multiple calls to MapViewOfFileEx without chance of failure?

Consider a complex, memory hungry, multi threaded application running within a 32bit address space on windows XP.
Certain operations require n large buffers of fixed size, where only one buffer needs to be accessed at a time.
The application uses a pattern where some address space the size of one buffer is reserved early and is used to contain the currently needed buffer.
This follows the sequence:
(initial run) VirtualAlloc -> VirtualFree -> MapViewOfFileEx
(buffer changes) UnMapViewOfFile -> MapViewOfFileEx
Here the pointer to the buffer location is provided by the call to VirtualAlloc and then that same location is used on each call to MapViewOfFileEx.
The problem is that windows does not (as far as I know) provide any handshake type operation for passing the memory space between the different users.
Therefore there is a small opportunity (at each -> in my above sequence) where the memory is not locked and another thread can jump in and perform an allocation within the buffer.
The next call to MapViewOfFileEx is broken and the system can no longer guarantee that there will be a big enough space in the address space for a buffer.
Obviously refactoring to use smaller buffers reduces the rate of failures to reallocate space.
Some use of HeapLock has had some success but this still has issues - something still manages to steal some memory from within the address space.
(We tried Calling GetProcessHeaps then using HeapLock to lock all of the heaps)
What I'd like to know is there anyway to lock a specific block of address space that is compatible with MapViewOfFileEx?
Edit: I should add that ultimately this code lives in a library that gets called by an application outside of my control
You could brute force it; suspend every thread in the process that isn't the one performing the mapping, Unmap/Remap, unsuspend the suspended threads. It ain't elegant, but it's the only way I can think of off-hand to provide the kind of mutual exclusion you need.
Have you looked at creating your own private heap via HeapCreate? You could set the heap to your desired buffer size. The only remaining problem is then how to get MapViewOfFileto use your private heap instead of the default heap.
I'd assume that MapViewOfFile internally calls GetProcessHeap to get the default heap and then it requests a contiguous block of memory. You can surround the call to MapViewOfFile with a detour, i.e., you rewire the GetProcessHeap call by overwriting the method in memory effectively inserting a jump to your own code which can return your private heap.
Microsoft has published the Detour Library that I'm not directly familiar with however. I know that detouring is surprisingly common. Security software, virus scanners etc all use such frameworks. It's not pretty, but may work:
HANDLE g_hndPrivateHeap;
HANDLE WINAPI GetProcessHeapImpl() {
return g_hndPrivateHeap;
}
struct SDetourGetProcessHeap { // object for exception safety
SDetourGetProcessHeap() {
// put detour in place
}
~SDetourGetProcessHeap() {
// remove detour again
}
};
void MapFile() {
g_hndPrivateHeap = HeapCreate( ... );
{
SDetourGetProcessHeap d;
MapViewOfFile(...);
}
}
These may also help:
How to replace WinAPI functions calls in the MS VC++ project with my own implementation (name and parameters set are the same)?
How can I hook Windows functions in C/C++?
http://research.microsoft.com/pubs/68568/huntusenixnt99.pdf
Imagine if I came to you with a piece of code like this:
void *foo;
foo = malloc(n);
if (foo)
free(foo);
foo = malloc(n);
Then I came to you and said, help! foo does not have the same address on the second allocation!
I'd be crazy, right?
It seems to me like you've already demonstrated clear knowledge of why this doesn't work. There's a reason that the documention for any API that takes an explicit address to map into lets you know that the address is just a suggestion, and it can't be guaranteed. This also goes for mmap() on POSIX.
I would suggest you write the program in such a way that a change in address doesn't matter. That is, don't store too many pointers to quantities inside the buffer, or if you do, patch them up after reallocation. Similar to the way you'd treat a buffer that you were going to pass into realloc().
Even the documentation for MapViewOfFileEx() explicitly suggests this:
While it is possible to specify an address that is safe now (not used by the operating system), there is no guarantee that the address will remain safe over time. Therefore, it is better to let the operating system choose the address. In this case, you would not store pointers in the memory mapped file, you would store offsets from the base of the file mapping so that the mapping can be used at any address.
Update from your comments
In that case, I suppose you could:
Not map into contiguous blocks. Perhaps you could map in chunks and write some intermediate function to decide which to read from/write to?
Try porting to 64 bit.
As the earlier post suggests, you can suspend every thread in the process while you change the memory mappings. You can use SuspendThread()/ResumeThread() for that. This has the disadvantage that your code has to know about all the other threads and hold thread handles for them.
An alternative is to use the Windows debug API to suspend all threads. If a process has a debugger attached, then every time the process faults, Windows will suspend all of the process's threads until the debugger handles the fault and resumes the process.
Also see this question which is very similar, but phrased differently:
Replacing memory mappings atomically on Windows

Resources