Two views of the same memory-mapped file in Windows: how do they interact flush-wise?

I'm specifically interested in the flush behaviour.
Suppose we created a MMF with CreateFileMapping(), and opened two views, V1 and V2 using MapViewOfFile() with zero offset.
Then I write something to A=V1+a and something to B=V2+b such that A and B belong to different physical memory pages.
Then if I flush the whole first view using FlushViewOfFile(V1, 0), will the dirty pages of the second view also be affected?
My goal is to have 2 views of the same file, where the first view is used for very small writes and very frequent flushes, while the second view is used for massive writes and is flushed only once in a while.
It is important that flushing small writes wouldn't cause flushing of massive writes.
Is this a default behaviour? If not, how to achieve it?
Thanks

From the documentation for CreateFileMappingA (see also the CreateProcess, DuplicateHandle and OpenFileMapping functions):
"Creating a file mapping object does not actually map the view into a
process address space. The MapViewOfFile and MapViewOfFileEx
functions map a view of a file into a process address space.
With one important exception, file views derived from any file mapping object that is backed by the same file are coherent
or identical at a specific time. Coherency is guaranteed for views
within a process and for views that are mapped by different
processes.
The exception is related to remote files. Although CreateFileMapping works with remote files, it does not keep them
coherent. For example, if two computers both map a file as writable,
and both change the same page, each computer only sees its own writes
to the page. When the data gets updated on the disk, it is not
merged."
According to the FlushViewOfFile documentation:
"Flushing a range of a mapped view initiates writing of dirty
pages within that range to the disk. Dirty pages are those whose
contents have changed since the file view was mapped. The
FlushViewOfFile function does not flush the file metadata, and it does not wait to return until the changes are flushed from the underlying hardware disk cache and physically written to disk. To
flush all the dirty pages plus the metadata for the file and ensure
that they are physically written to disk, call FlushViewOfFile and
then call the FlushFileBuffers function."
And for closing views, the UnmapViewOfFile documentation says:
"When a process has finished with the file mapping object, it should
destroy all file views in its address space by using the
UnmapViewOfFile function for each file view.
Unmapping a mapped view of a file invalidates the range occupied by the view in the address space of the process and makes the range
available for other allocations. It removes the working set entry
for each unmapped virtual page that was part of the working set of the
process and reduces the working set size of the process. It also
decrements the share count of the corresponding physical page.
Modified pages in the unmapped view are not written to disk until their share count reaches zero, or in other words, until they are
unmapped or trimmed from the working sets of all processes that share
the pages. Even then, the modified pages are written "lazily" to
disk; that is, modifications may be cached in memory and written to
disk at a later time. To minimize the risk of data loss in the event
of a power failure or a system crash, applications should explicitly
flush modified pages using the FlushViewOfFile function."
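Since the quoted documentation describes flushing in terms of a range ("dirty pages within that range"), one approach is to keep the small, frequently flushed writes in their own page range and flush only that range instead of the whole view. Below is a minimal sketch of that idea using Python's mmap module, whose flush(offset, size) is implemented with FlushViewOfFile on Windows and whose os.fsync corresponds to FlushFileBuffers. The scratch file and the four-page layout are assumptions made up for the illustration, and the sketch does not prove which pages the OS actually writes at any given moment, since the lazy writer may still write other dirty pages whenever it likes.

```python
import mmap
import os
import tempfile

# Safe alignment unit for flush offsets (assumption: one "page" per region).
PAGE = mmap.ALLOCATIONGRANULARITY

# Hypothetical scratch file, four pages long.
fd, path = tempfile.mkstemp()
os.ftruncate(fd, 4 * PAGE)

v1 = mmap.mmap(fd, 4 * PAGE)  # first view, offset 0
v2 = mmap.mmap(fd, 4 * PAGE)  # second view, offset 0

v1[0:5] = b"small"                   # small write, lands in page 0
v2[2 * PAGE:2 * PAGE + 4] = b"bulk"  # bulk write, lands in page 2

# Views of the same local file are coherent: v1 sees what v2 wrote.
seen_through_v1 = bytes(v1[2 * PAGE:2 * PAGE + 4])

# Ask for a flush of page 0 only, then ask the OS to push it to the device.
v1.flush(0, PAGE)
os.fsync(fd)

with open(path, "rb") as f:
    on_disk = f.read(5)

v1.close()
v2.close()
os.close(fd)
os.remove(path)
```

In the question's terms this corresponds to passing an explicit byte count to FlushViewOfFile rather than 0, because a count of 0 means "from the base address to the end of the mapping".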

Related

Fastest way to send large blobs of data from one program to another in Windows?

I need to send large blobs of data (~10MB) from one program to another in Windows 7. I would like a method that allows for at least a gigabyte per second total throughput with very low system load. To simplify this, all blobs may be the same size, and one program may be a child process of the other.
Method 1: Memory map the same file in both programs: CreateFileMapping() / MapViewOfFile()
In this case, the memory mapped file(s) presumably contains room for several blobs in a ring buffer. There would need to be some external mechanism to synchronize access to the ring buffer.
Method 2: Create named data sections
Method 3: WriteProcessMemory (suggested by Hristo Iliev below, thanks!)
Method 4: Read/write files on a RAM disk.
Method 5: Read/write to an anonymous pipe.
Method ?: Anything else? Perhaps write over TCP, use MPI, ...
I know that memory-mapped files (method 1) are considered the standard solution to this problem :)
How fast are memory-mapped files? (rough order of magnitude)
Is there an even faster method?
How much worse is the performance of the other methods? Which ones of them can hit GB/sec throughput?
If using memory mapped files, what is the best way for the programs to synchronize access to the data being passed? (ie: how would the producer indicate to the consumer that a new blob is available, and how would the consumer indicate it is done with a particular blob?)
If using memory mapped files, is it better to have one file for all blobs together (ring buffer in a file), or one file for each blob (ring buffer of files)?
You could also use WriteProcessMemory and have the first process post the data directly into the address space of the second process. You'd need to develop a protocol of some kind. For example, the second process could send the virtual address of its receive buffer to the first process via a named pipe or a shared memory block; the first process then copies the data using WriteProcessMemory and, when it is finished, signals the second one via a semaphore or something similar. This ought to be the fastest way to send data between two processes, as it involves a single copy operation. The first process would need to obtain the proper rights on the second one, and that should not be a problem as long as both processes belong to the same user.
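For the ring buffer mentioned under Method 1, the layout could look roughly like the sketch below: a small header with two counters followed by fixed-size slots. Everything here is an assumption for illustration (slot size, slot count, header format, function names), and it runs in a single process on an anonymous mapping. A real producer/consumer pair would share a named or file-backed mapping, update the counters atomically, and signal each other with events or semaphores instead of polling.

```python
import mmap
import struct

SLOT_SIZE = 1024   # hypothetical blob size (the question uses ~10 MB)
SLOT_COUNT = 4     # hypothetical ring depth
# Header: head = blobs written so far, tail = blobs consumed so far.
HEADER = struct.Struct("<QQ")

# Anonymous mapping standing in for the shared memory-mapped file.
buf = mmap.mmap(-1, HEADER.size + SLOT_SIZE * SLOT_COUNT)
HEADER.pack_into(buf, 0, 0, 0)


def slot_offset(index):
    """Byte offset of the slot used by the index-th blob."""
    return HEADER.size + (index % SLOT_COUNT) * SLOT_SIZE


def produce(blob):
    """Copy one SLOT_SIZE blob into the ring; return False if it is full."""
    head, tail = HEADER.unpack_from(buf, 0)
    if head - tail == SLOT_COUNT:
        return False
    off = slot_offset(head)
    buf[off:off + SLOT_SIZE] = blob
    # Publish the blob only after its bytes are in place.
    HEADER.pack_into(buf, 0, head + 1, tail)
    return True


def consume():
    """Return the oldest blob, or None if the ring is empty."""
    head, tail = HEADER.unpack_from(buf, 0)
    if head == tail:
        return None
    off = slot_offset(tail)
    blob = bytes(buf[off:off + SLOT_SIZE])
    # Releasing the slot tells the producer it may be reused.
    HEADER.pack_into(buf, 0, head, tail + 1)
    return blob
```

Advancing head is how the producer would indicate that a new blob is available, and advancing tail is how the consumer would indicate it is done with one.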

Can I have a memory mapped file, mapped to two or more processes at the same time (windows)?

I need to have two processes share information through a memory mapped file. One of them is only going to read from the file and the other is only going to write to it.
Is it OK for me just to leave the file always mapped to those two processes? I am currently:
Mapping the file to the writer process
Writing
Unmapping the file
Mapping the file to the reader process
Reading
Unmapping
And repeating over and over every time I need the processes to share information. My concern is that all these calls to map and unmap may be expensive. Should I keep the file mapped to both processes all the time? I could regulate access to the shared memory through mutexes.
What is the best way to do this kind of task?
You don't need to unmap the file after reading or writing at all. Windows guarantees that the data "visible" in the mapping in two processes will be the same when the local file is mapped on one computer.
If you need to do this repeatedly, then maintain the mapping. Don't prematurely optimize. (If you do find there are problems, you can go back and fix them at that time.)

How do I create a memory-mapped file without a backing file on OSX?

I want to use a library that uses file descriptors as the basic means to access its data. For performance reasons, I don't want to have to commit files to disk each time before I use this library's functions.
I want to create (large) data blobs on the fly, and call into the library to send them to a server. As it stands, I have to write the file to disk, open it, pass the FD to the library, wait for it to finish, then delete the file on disk. Since I can re-create the blobs on demand (and they're not so large that they cause excessive virtual memory paging), saving them to disk buys me nothing, and incurs a large performance penalty.
Is it possible to assign a FD to a block of data that resides only as a memory-mapped entity?
You could mount a memory-backed filesystem: http://lists.apple.com/archives/darwin-kernel/2004/Sep/msg00004.html
Using this mechanism will increase memory pressure on the system, and will probably be paged out if memory pressure is great enough. It might be worthwhile to make it a configuration option, in case the user would rather some other application have first-choice of the memory.
Another option is to use POSIX shared memory segments: http://opengroup.org/onlinepubs/007908799/xsh/shm_open.html (I haven't used POSIX shared memory segments myself; if I understand them correctly, they were designed to solve exactly this problem.)
The shm_open() function creates a memory object and returns a file descriptor. You could then mmap(2) that file descriptor, do your work, and pass the file descriptor to the library.
Don't forget to shm_unlink the object when you're done; POSIX shared memory segments, message queues, and semaphore arrays don't automatically go away when the last process exits.
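The create / map / unlink lifecycle described above can be sketched with Python's multiprocessing.shared_memory, which wraps shm_open, mmap and shm_unlink on POSIX systems. This is only an illustration of the lifecycle: the module does not expose the file descriptor through a documented attribute, so handing the descriptor to a library is not shown, and the payload is made up.

```python
from multiprocessing import shared_memory

payload = b"blob generated on the fly"  # hypothetical data

# Create a named segment that lives in memory, with no backing file on disk.
seg = shared_memory.SharedMemory(create=True, size=len(payload))
# The OS may round the size up, so slice to the payload length.
seg.buf[:len(payload)] = payload

# A second handle attaches by name, as another process could.
peer = shared_memory.SharedMemory(name=seg.name)
received = bytes(peer.buf[:len(payload)])

# Close every handle, then unlink once so the segment does not outlive us.
peer.close()
seg.close()
seg.unlink()
```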

Opening a custom file on-demand

I have a custom file type that is implemented in sections, with a header that shows the offset and length of each section within the file.
Currently, whenever I want to interact with the file, I must either load and parse the entire thing up front, or else pick only the sections that I need and load just them.
What I would like to do is to achieve a hybrid approach where each of the sections is loaded on-demand.
It seems however that doing this has a lot of potential downsides in terms of leaving filesystem handles open for longer than I would like and the additional code complexity that I would incur.
Are there any standard patterns for this sort of thing? It seems that my options are to:
Just load the entire file and stop grousing about the cycles/memory wasted
Load the entire file into memory as raw bytes and then satisfy any requests for unloaded sections from the memory buffer rather than disk. This saves me the cost of parsing the unneeded sections and requires less memory (since the disk representation is much more compact than the object model around it), but still means that I waste memory for sections that I never end up loading.
Load whatever sections I need right away and close the file but hold onto the source location of the file. Then if another section is requested, re-open the file and load the data. In this case I could get strange results if the underlying file is changed.
Same as the above but leave a file handle open (perhaps allowing read sharing).
Load the file using Memory-Mapped IO and leave a view on the file open.
Any thoughts
If possible, MMAP-ing the whole file is usually the easiest thing to do if you have a random-access pattern. This way you just delegate the loading/unloading issue to the OS and you have 1 & 2 for free.
If you have very special access patterns, you can even use something like fadvise() (I don't know the exact Win32 equivalent) to tell the OS your intended access pattern.
If your file is larger than 2GB, you can either go the 64-bit way or mmap() parts of the file on demand.
If the file is relatively small, mmap-ing the entire file is good enough. If the file is large, you could leave a mmap view open, and just move it around the file and resize it to view each section when needed.
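As a rough sketch of the "map the whole file and load sections on demand" approach: the file layout below (a 32-bit section count followed by offset/length pairs) is a made-up stand-in for the custom format in the question, and the class and function names are hypothetical. Only the header table is parsed up front; a section's bytes are touched the first time it is requested, and the OS pages them in as needed.

```python
import mmap
import os
import struct
import tempfile

COUNT = struct.Struct("<I")    # hypothetical header: number of sections
ENTRY = struct.Struct("<QQ")   # hypothetical entry: offset, length


def write_demo_file(path, sections):
    """Write a file in the made-up format so the example has input."""
    offset = COUNT.size + ENTRY.size * len(sections)
    with open(path, "wb") as f:
        f.write(COUNT.pack(len(sections)))
        for data in sections:
            f.write(ENTRY.pack(offset, len(data)))
            offset += len(data)
        for data in sections:
            f.write(data)


class SectionFile:
    """Keeps a read-only view open and loads sections lazily."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        (count,) = COUNT.unpack_from(self._map, 0)
        self._table = [
            ENTRY.unpack_from(self._map, COUNT.size + i * ENTRY.size)
            for i in range(count)
        ]
        self.loaded = {}  # index -> bytes, filled on first request

    def section(self, index):
        if index not in self.loaded:
            offset, length = self._table[index]
            self.loaded[index] = self._map[offset:offset + length]
        return self.loaded[index]

    def close(self):
        self._map.close()
        self._file.close()


fd, demo_path = tempfile.mkstemp()
os.close(fd)
write_demo_file(demo_path, [b"first", b"second", b"third"])

sf = SectionFile(demo_path)
second = sf.section(1)           # only this section is copied out
loaded_indexes = sorted(sf.loaded)
sf.close()
os.remove(demo_path)
```

The trade-off raised in the question still applies: the file handle and the view stay open for the lifetime of the object.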

Memory mapped files optional write possible?

When using memory-mapped files it seems it is either read-only, or write-only. By this I mean you can't:
have one open for writing, and later decide not to save it
have one open for reading, and later decide to save it
Our application uses a writeable memory-mapped file to save data files, but since the user might want to exit without saving changes, we have to use a temporary file which the user actually edits. When the user opts to save the changes, the original file is overwritten with the temporary file so it has the latest changes. This is cumbersome because the files can be very large (>1GB) and it takes a long time to copy them.
I've tried many combinations of the flags used to create the file mapping but none seem to allow the flexibility of saving on demand. Can anyone confirm this is the case? Our application is written in Delphi, but it uses the standard Windows API to create the mapping, in our case
FMapHandle := CreateFileMapping(FFileHandle, nil, PAGE_READWRITE, 0, 2 * 65536, nil);
FBasePointer := MapViewOfFile(FMapHandle, FILE_MAP_WRITE, FileOffsetHigh,
  FileOffsetLow, NumBytes);
I don't think you can. By that I mean you may be able to, but it doesn't make any sense to me :-)
The whole point of a memory-mapped file is that it's a window onto the actual file. If you don't want changes reflected in the file, you'll probably have to do something like batch up the changes in a data structure (e.g., an array of base address, size and data) and apply them when saving.
In which case, you wouldn't actually need the memory mapped file, just read in and maintain the chunks you want to change (lock the file first if there's a chance of multi-user access).
Update:
Have you thought of the possibility of, when doing a save, deleting the original file and just renaming the temporary file to the original file name? That's likely to be much faster than copying 1G of data from temporary to original. That way, if you don't want it saved, just delete the temporary file and keep the original.
You'll still have to copy the original data to the temporary file when loading but you won't have to copy the temporary data back (whether you save it or not) - that would halve the time taken.
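The rename-on-save idea can be sketched as follows, using os.replace to swap the working copy in over the original. The file names and contents are made up, and the save/discard helpers are hypothetical. On Windows the files would generally need to be unmapped and closed before the replace can succeed.

```python
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "data.bin")      # hypothetical file name
working = os.path.join(workdir, "data.bin.tmp")   # hypothetical working copy

with open(original, "wb") as f:
    f.write(b"original contents")

# Loading: the one unavoidable copy, original -> working file.
shutil.copyfile(original, working)

# Editing happens on the working file only.
with open(working, "r+b") as f:
    f.write(b"EDITED")


def save():
    """Replace the original with the working file; no data is copied back."""
    os.replace(working, original)


def discard():
    """Throw the edits away; the original was never touched."""
    os.remove(working)


save()
with open(original, "rb") as f:
    saved = f.read()
working_exists = os.path.exists(working)
shutil.rmtree(workdir)
```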
Possible, but non-trivial.
You have to understand memory-mapping basics, and the difference between the three modes of memory-mapped files. All three set aside a part of your virtual address space and create a mapping entry in an internal table. No physical RAM is initially allocated. Hence, when you DO try to access the memory, the CPU faults and the OS has to fix it up. It does so by copying the file contents to RAM and mapping the RAM to your process, at the faulting address.
Now, the difference between the three modes is how the descriptors are set on the mapped pages. In all cases you get read access on the pages. (The first mode). However, if you ask for write access and subsequently write to it, on your first write the page is marked as writeable and dirty. It can then be written back to the original file, at the discretion of the OS (Second mode). Finally, it's possible to get copy-on-write semantics. You still start out with only read access to the page in memory. When you write to it, the CPU still faults and the OS needs to fix it up. With copy-on-write, that fixup is done by setting the backing store of the changed page to the page file, instead of the original mapped file.
So, in your case you want to use copy-on-write mode. If the user decides to discard the modifications, no problem. You simply discard the memory mapping. All pages that were modified in memory, and were backed by the page file are also discarded.
If the user does decide to save, you've got a slightly harder task. You now need to figure out which parts of the file have changed. Those changes are in memory, and you need to reapply those to the source file. You can do this with Page Guards. So, when the user decides to save, copy all modified pages to a separate memory block, remap the (unchanged) file for write, and apply the changes.
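A minimal sketch of the copy-on-write mode, using Python's mmap.ACCESS_COPY (which asks the OS for a private, copy-on-write view). The scratch file and its contents are assumptions for illustration. Note that the "save" step here simply writes the whole buffer back through the file handle; it does not track which pages changed with guard pages as described above, so it is a simplification rather than the technique itself.

```python
import mmap
import os
import tempfile

# Hypothetical data file.
fd, path = tempfile.mkstemp()
os.write(fd, b"original data on disk")
os.close(fd)


def read_file():
    with open(path, "rb") as f:
        return f.read()


with open(path, "r+b") as f:
    # Copy-on-write view: writes go to private pages, never to the file.
    view = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)
    view[0:8] = b"MODIFIED"
    in_memory = bytes(view)
    on_disk_before_save = read_file()   # file is still untouched

    # Discarding would just be view.close() with nothing else to do.
    view.close()

    # Saving (simplified): write the modified bytes back explicitly.
    f.seek(0)
    f.write(in_memory)

on_disk_after_save = read_file()
os.remove(path)
```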

Resources