Memory-mapped files: optional write possible? - Windows

When using memory-mapped files, it seems a mapping is either read-only or write-only. By this I mean you can't:
have one open for writing, and later decide not to save it
have one open for reading, and later decide to save it
Our application uses a writeable memory-mapped file to save data files, but since the user might want to exit without saving changes, we have to use a temporary file which the user actually edits. When the user opts to save the changes, the original file is overwritten with the temporary file so it has the latest changes. This is cumbersome because the files can be very large (>1GB) and it takes a long time to copy them.
I've tried many combinations of the flags used to create the file mapping but none seem to allow the flexibility of saving on demand. Can anyone confirm this is the case? Our application is written in Delphi, but it uses the standard Windows API to create the mapping, in our case
    FMapHandle := CreateFileMapping(FFileHandle, nil, PAGE_READWRITE, 0, 2 * 65536, nil);
    FBasePointer := MapViewOfFile(FMapHandle, FILE_MAP_WRITE, FileOffsetHigh,
                                  FileOffsetLow, NumBytes);

I don't think you can. By that I mean you may be able to, but it doesn't make any sense to me :-)
The whole point of a memory-mapped file is that it's a window onto the actual file. If you don't want changes reflected in the file, you'll probably have to do something like batch up the changes in a data structure (e.g., an array of base address, size and data) and apply them when saving.
In which case, you wouldn't actually need the memory-mapped file, just read in and maintain the chunks you want to change (lock the file first if there's a chance of multi-user access).
Update:
Have you thought of the possibility of, when doing a save, deleting the original file and just renaming the temporary file to the original file name? That's likely to be much faster than copying 1G of data from temporary to original. That way, if you don't want it saved, just delete the temporary file and keep the original.
You'll still have to copy the original data to the temporary file when loading but you won't have to copy the temporary data back (whether you save it or not) - that would halve the time taken.
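A minimal sketch of that save step using the Win32 MoveFileEx (shown here in C++; the same call works from Delphi). The file names are illustrative:

    #include <windows.h>

    // Replace the original with the edited temporary in one rename,
    // instead of copying >1GB of data back.
    bool CommitSave(const wchar_t* tempPath, const wchar_t* originalPath) {
        // MOVEFILE_REPLACE_EXISTING replaces the existing target;
        // MOVEFILE_WRITE_THROUGH returns only after the move is on disk.
        return MoveFileExW(tempPath, originalPath,
                           MOVEFILE_REPLACE_EXISTING | MOVEFILE_WRITE_THROUGH) != 0;
    }

On the same volume this is a metadata-only operation, which is why it beats copying the data.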

Possible, but non-trivial.
You have to understand memory-mapped file basics, and the difference between the three modes of memory-mapped files. All of them set aside a part of your virtual address space and create a mapping entry in an internal table. No physical RAM is initially allocated. Hence, when you DO try to access the memory, the CPU faults and the OS has to fix up. It does so by copying the file contents to RAM and mapping that RAM into your process, at the faulting address.
Now, the difference between the three modes is how the descriptors are set on the mapped pages. In all cases you get read access to the pages; read-only access is the first mode. If you ask for write access and subsequently write to a page, on your first write the page is marked as writeable and dirty. It can then be written back to the original file, at the discretion of the OS; that's the second mode. Finally, it's possible to get copy-on-write semantics, the third mode. You still start out with only read access to the page in memory. When you write to it, the CPU still faults and the OS needs to fix it up. With copy-on-write, that fixup is done by setting the backing store of the changed page to the page file, instead of the original mapped file.
So, in your case you want to use copy-on-write mode. If the user decides to discard the modifications, no problem: you simply discard the memory mapping, and all the modified pages backed by the page file are discarded with it.
If the user does decide to save, you've got a slightly harder task. You now need to figure out which parts of the file have changed. Those changes are in memory, and you need to reapply those to the source file. You can do this with Page Guards. So, when the user decides to save, copy all modified pages to a separate memory block, remap the (unchanged) file for write, and apply the changes.
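A minimal sketch of the copy-on-write setup (shown in C++; the same Win32 calls apply from Delphi, where FILE_MAP_COPY would replace the FILE_MAP_WRITE in the question's snippet):

    #include <windows.h>

    // Copy-on-write view: writes through the returned pointer land in
    // pagefile-backed pages, so unmapping discards them and the file
    // on disk is never modified.
    void* MapFileCopyOnWrite(HANDLE file, SIZE_T numBytes, HANDLE* outMapping) {
        *outMapping = CreateFileMappingW(file, nullptr, PAGE_WRITECOPY, 0, 0, nullptr);
        if (*outMapping == nullptr) return nullptr;
        return MapViewOfFile(*outMapping, FILE_MAP_COPY, 0, 0, numBytes);
    }

To discard, just UnmapViewOfFile the pointer and CloseHandle the mapping; the original file is untouched.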

Related

Two views of the same memory-mapped file in Windows: how do they interact flush-wise?

I'm specifically interested in the flush behaviour.
Suppose we created an MMF with CreateFileMapping(), and opened two views, V1 and V2, using MapViewOfFile() with zero offset.
Then I write something to A=V1+a and something to B=V2+b such that A and B belong to different physical memory pages.
Then if I flush the whole first view using FlushViewOfFile(V1, 0), will the dirty pages of the second view also be affected?
My goal is to have 2 views of the same file, where the first view is used for very small writes and very frequent flushes, while the second view is used for massive writes and is flushed only once in a while.
It is important that flushing small writes wouldn't cause flushing of massive writes.
Is this the default behaviour? If not, how do I achieve it?
Thanks
From the CreateFileMappingA documentation (see also the CreateProcess, DuplicateHandle and OpenFileMapping functions):
"Creating a file mapping object does not actually map the view into a
process address space. The MapViewOfFile and MapViewOfFileEx
functions map a view of a file into a process address space.
With one important exception, file views derived from any file mapping object that is backed by the same file are coherent
or identical at a specific time. Coherency is guaranteed for views
within a process and for views that are mapped by different
processes.
The exception is related to remote files. Although CreateFileMapping works with remote files, it does not keep them
coherent. For example, if two computers both map a file as writable,
and both change the same page, each computer only sees its own writes
to the page. When the data gets updated on the disk, it is not
merged."
According to the FlushViewOfFile documentation:
"Flushing a range of a mapped view initiates writing of dirty
pages within that range to the disk. Dirty pages are those whose
contents have changed since the file view was mapped. The
FlushViewOfFile function does not flush the file metadata, and it does not wait to return until the changes are flushed from the underlying hardware disk cache and physically written to disk. To
flush all the dirty pages plus the metadata for the file and ensure
that they are physically written to disk, call FlushViewOfFile and
then call the FlushFileBuffers function."
And for when you close views, from the UnmapViewOfFile documentation:
"When a process has finished with the file mapping object, it should
destroy all file views in its address space by using the
UnmapViewOfFile function for each file view.
Unmapping a mapped view of a file invalidates the range occupied by the view in the address space of the process and makes the range
available for other allocations. It removes the working set entry
for each unmapped virtual page that was part of the working set of the
process and reduces the working set size of the process. It also
decrements the share count of the corresponding physical page.
Modified pages in the unmapped view are not written to disk until their share count reaches zero, or in other words, until they are
unmapped or trimmed from the working sets of all processes that share
the pages. Even then, the modified pages are written "lazily" to
disk; that is, modifications may be cached in memory and written to
disk at a later time. To minimize the risk of data loss in the event
of a power failure or a system crash, applications should explicitly
flush modified pages using the FlushViewOfFile function."
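What those quotes imply for the question: views of the same mapping within one process share the same physical pages, so FlushViewOfFile(V1, 0) covers the whole file range and will also write out pages dirtied through V2. To keep the small writes from flushing the massive ones, restrict the flush to the byte range you actually care about. A minimal sketch, with a and smallLen standing in for the question's offsets:

    #include <windows.h>

    // Two full-file views of one mapping; flushing is per byte range,
    // not per view, so pass the small region instead of 0 ("to the end").
    void FlushSmallWritesOnly(HANDLE mapping, SIZE_T a, SIZE_T smallLen) {
        char* v1 = static_cast<char*>(MapViewOfFile(mapping, FILE_MAP_WRITE, 0, 0, 0));
        char* v2 = static_cast<char*>(MapViewOfFile(mapping, FILE_MAP_WRITE, 0, 0, 0));
        // ... frequent small writes through v1 + a, bulk writes through v2 ...
        FlushViewOfFile(v1 + a, smallLen);  // only dirty pages in this range
        UnmapViewOfFile(v2);
        UnmapViewOfFile(v1);
    }

Note that flushing works at page granularity, so if a small write and a massive write land on the same page, they will still go to disk together.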

How to get a Win32 program to update the file size while still writing files

I have a Win32 program that keeps a file open and writes data to it over a period of several hours. I'd like for the file size, as shown in an Explorer window, to be updated every so often.
As an example, when a browser is downloading a large file, you can see the file size change over time, even though the file is still downloading.
With my current naive implementation, the file size remains zero until I close the file.
How do I do this in Win32? Currently the file is opened using std::ofstream. Is this a proper application of std::ostream::flush()? Or do I need to close and reopen the file with some regularity?
std::ostream::flush() makes sure your data is safe on disk. Flushing the buffer is a valid use case in situations where the automatic flushes aren't good enough for you (e.g. too little data is written over too long a period, the data is written constantly but needs to be accessible constantly too, or you need to be sure the data gets logged in case of a crash or power-down). Yet, on some OS/filesystem combinations (see Why is the file size reported incorrectly for files that are still being written to?), that still won't update the file size accordingly. On Win32 you usually won't see size updates before actually closing/reopening the handle; sometimes re-reading the directory will help, and sometimes it simply won't.
As such, you can use e.g. ReOpenFile to force that update, or simply use close/open instead of flushing. The exact solution depends on whether you need the updated file size badly enough that the reduced output rate is not a real problem (in which case reopening is the best option), or whether you can live with a wrong size being reported (in which case flushes are your best option IMO).
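A minimal sketch of the close/reopen approach with the question's std::ofstream; the path and the decision of when to refresh are illustrative:

    #include <fstream>
    #include <string>

    // Closing updates the directory entry (the size Explorer shows);
    // reopening in append mode resumes writing where we left off.
    void refreshVisibleSize(std::ofstream& out, const std::string& path) {
        out.close();
        out.open(path, std::ios::binary | std::ios::app);
    }

Call it every N megabytes or every few seconds; each call costs a handle close/open, which is the reduced output rate mentioned above.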

How data is stored in the Windows clipboard

Whenever we copy a multimedia file, or any file except text (not sure about that), to the clipboard, does it store the address of the file or a copy of the data? Whenever we copy a movie of around 3 GB, the C drive's free space doesn't decrease, which suggests the clipboard stores the address, not a copy. Is that true?
If you're copying files, you're dealing with file references like CF_HDROP, which take almost no space and almost no time to put on the clipboard. If you actually had to wait for 3 GB to be copied into a memory buffer, you would be waiting a long time, there would be a lot of I/O, and unless you had a lot of memory, your system would need to use pagefile space, causing even more I/O.
You should also realize that unlike a text/HTML/RTF/graphic copy (where the data is actually on the clipboard), the clipboard cannot be used as a safety net. With text, you can copy, then delete the text, and paste it to get it back. Not so with files. If you copy a file, then delete that file, you won't be able to paste it. This may seem obvious, but it's important to understand when you're using any kind of clipboard manager that lets you go back and paste prior clips. You can paste a file pointer from 3 days ago, for example, but the result won't be that file from 3 days ago. It'll be whatever that file pointer references on today's disk.
does it store the address of the file
Basically yes, though not the real address but rather a so-called handle of the file.
It's an abstract reference value to a resource, often memory, an open file, or a pipe.
Properly, in Windows (and generally in computing), a handle is an abstraction which hides a real memory address from the API user, allowing the system to reorganize physical memory transparently to the program. Resolving a handle into a pointer locks the memory, and releasing the handle invalidates the pointer. In this case, think of it as an index into a table of pointers.
You use the index for the system API calls, and the system can change the pointer in the table at will.
You can take a look at this article if you want to know how exactly the clipboard works: http://blogs.msdn.com/b/ntdebugging/archive/2012/03/16/how-the-clipboard-works-part-1.aspx
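A minimal sketch of reading such a file reference back, assuming a CF_HDROP entry is currently on the clipboard; it prints paths, showing that only references are stored, not the file contents:

    #include <windows.h>
    #include <shellapi.h>
    #include <cstdio>

    // Enumerate the paths behind a CF_HDROP clipboard entry.
    int main() {
        if (!OpenClipboard(nullptr)) return 1;
        if (HANDLE h = GetClipboardData(CF_HDROP)) {
            HDROP drop = static_cast<HDROP>(h);
            UINT count = DragQueryFileW(drop, 0xFFFFFFFF, nullptr, 0); // file count
            for (UINT i = 0; i < count; ++i) {
                wchar_t path[MAX_PATH];
                if (DragQueryFileW(drop, i, path, MAX_PATH))
                    wprintf(L"%s\n", path);  // a path, not the file's bytes
            }
        }
        CloseClipboard();
        return 0;
    }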
@Hot Cool Stud:
To copy the path of a file/folder:
Hold down Shift, select the file or folder, and right-click; you will see an extra menu option, "Copy as path". Select it, and the path is copied to the clipboard.

What does SetFileValidData do? What is the difference from SetEndOfFile?

I'm looking for a way to extend a file asynchronously and efficiently.
The support document Asynchronous Disk I/O Appears as Synchronous on Windows NT, Windows 2000, and Windows XP says:
NOTE: Applications can make the previously mentioned write operation asynchronous by changing the Valid Data Length of the file by using the SetFileValidData function, and then issuing a WriteFile.
In MSDN, SetFileValidData is described as a function that "sets the valid data length of the specified file".
But I still don't understand what the "valid data" is, and what the difference is between it and the size of the file.
I can use SetFilePointerEx and SetEndOfFile to extend the file size, but how do I do this with SetFileValidData?
SetFileValidData cannot take an argument larger than the size of the file. In that case, what is the point of SetFileValidData?
When you use SetEndOfFile to increase the length of a file, the logical file length changes and the necessary disk space is allocated, but no data is actually physically written to the disk sectors corresponding to the new part of the file. The valid data length remains the same as it was.
This means you can use SetEndOfFile to make a file very large very quickly, and if you read from the new part of the file you'll just get zeros. The valid data length increases when you write actual data to the new part of the file.
That's fine if you just want to reserve space, and will then be writing data to the file sequentially. But if you make the file very large and immediately write data near the end of it, zeros need to be written to the new part of the file, which will take a long time. If you don't actually need the file to contain zeros, you can use SetFileValidData to skip this step; the new part of the file will then contain random data from previously deleted files.
Addendum:
The rules for sparse files are different.
You should not use SetFileValidData on a file that non-privileged users have read access to; this could leak content from deleted files that belonged to other users.
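A minimal sketch of the difference in practice (it assumes the process has already enabled the SE_MANAGE_VOLUME_NAME privilege; see the next answer for that):

    #include <windows.h>

    // Extend a file without the zero-fill cost. SetEndOfFile allocates
    // the space but leaves the valid data length (VDL) alone;
    // SetFileValidData then jumps the VDL to the new EOF so later random
    // writes don't trigger zero-filling. The new region may expose stale
    // disk data, so the file must not be readable by untrusted users.
    bool ExtendWithoutZeroFill(HANDLE file, LONGLONG newSize) {
        LARGE_INTEGER size;
        size.QuadPart = newSize;
        if (!SetFilePointerEx(file, size, nullptr, FILE_BEGIN)) return false;
        if (!SetEndOfFile(file)) return false;
        return SetFileValidData(file, newSize) != FALSE;
    }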
Please note that SetEndOfFile() doesn't write any zeros to the allocated sectors on disk; it just allocates the space pointers inside the MFT records and then updates the space bitmap of the whole file system. The OS/FS records the valid/logical file length in the file's MFT record.
If you enlarge the file from 1GB to 2GB, the appended 1GB should read as all zeros, but the FS won't write those zeros to disk; it consults the file's valid length to know that the extra 1GB should be zeros. If you try to read from this enlarged 1GB portion, the FS fills zeros directly into RAM and feeds them back to your application. But if you write any byte inside this 1GB portion, the FS has to fill with zeros from the original 1GB offset up to the location your application is writing to (but not the bytes from that location to the tail of the file). Meanwhile, it records the valid/logical length as running from 0 to that location, while the physical/allocated size is still 2GB.
But if you use SetFileValidData(), the FS sets the valid length to 2GB directly and doesn't bother to fill any zeros. Wherever you write, it just writes; but wherever you read, you may read out garbage data previously left by other applications before the file was extended over that disk space.
Agreeing with Harry Johnston's answer, and from a practical point of view: while SetFileValidData has a performance advantage because it does not require writing zeros, it has security implications because the file might contain data from other deleted files. So a special privilege, SE_MANAGE_VOLUME_NAME, is required, as MSDN mentions: http://msdn.microsoft.com/en-us/library/windows/desktop/aa365544(v=vs.85).aspx
The reason is that if the user account of the running program doesn't have that privilege, using SetFileValidData could expose other users' deleted data in the contents of that particular file, so normal (non-administrator) users are not allowed to do it. Even privileged users still need to take care to use ACLs (access control lists) in the file system to protect the file so that it is not shared with non-privileged users.
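A minimal sketch of enabling that privilege on the current process token; it succeeds only for accounts that actually hold SeManageVolumePrivilege, typically administrators:

    #include <windows.h>

    // Enable SeManageVolumePrivilege (SE_MANAGE_VOLUME_NAME), which
    // SetFileValidData requires.
    bool EnableManageVolumePrivilege() {
        HANDLE token;
        if (!OpenProcessToken(GetCurrentProcess(), TOKEN_ADJUST_PRIVILEGES, &token))
            return false;
        TOKEN_PRIVILEGES tp = {};
        tp.PrivilegeCount = 1;
        tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
        bool ok = LookupPrivilegeValueW(nullptr, L"SeManageVolumePrivilege",
                                        &tp.Privileges[0].Luid)
               && AdjustTokenPrivileges(token, FALSE, &tp, 0, nullptr, nullptr)
               && GetLastError() == ERROR_SUCCESS;  // catches ERROR_NOT_ALL_ASSIGNED
        CloseHandle(token);
        return ok;
    }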
It seems that SetEndOfFile does not really allocate the reserved disk space for the target file; SetFileValidData is responsible for that job.
Referring to MSDN:
You can use the SetFileValidData function to create large files in very specific circumstances so that the performance of subsequent file I/O can be better than other methods. Specifically, if the extended portion of the file is large and will be written to randomly, such as in a database type of application, the time it takes to extend and write to the file will be faster than using SetEndOfFile and writing randomly.
If SetEndOfFile really allocated the space, then SetFileValidData would do nothing better than SetEndOfFile when writing randomly. So SetEndOfFile may just create a sparse file with hole(s), while SetFileValidData does the actual allocation.

Opening a custom file on-demand

I have a custom file type that is implemented in sections, with a header at the start that shows the offset and length of each section within the file.
Currently, whenever I want to interact with the file, I must either load and parse the entire thing up front, or else pick only the sections that I need and load just them.
What I would like to do is to achieve a hybrid approach where each of the sections is loaded on-demand.
It seems however that doing this has a lot of potential downsides in terms of leaving filesystem handles open for longer than I would like and the additional code complexity that I would incur.
Are there any standard patterns for this sort of thing? It seems that my options are to:
Just load the entire file and stop grousing about the cycles/memory wasted
Load the entire file into memory as raw bytes and then satisfy any requests for unloaded sections from the memory buffer rather than disk. This saves me the cost of parsing the unneeded sections and requires less memory (since the disk representation is much more compact than the object model around it), but still means that I waste memory for sections that I never end up loading.
Load whatever sections I need right away and close the file but hold onto the source location of the file. Then if another section is requested, re-open the file and load the data. In this case I could get strange results if the underlying file is changed.
Same as the above but leave a file handle open (perhaps allowing read sharing).
Load the file using Memory-Mapped IO and leave a view on the file open.
Any thoughts?
If possible, mmap-ing the whole file is usually the easiest thing to do if you have a random-access pattern. This way you just delegate the loading/unloading issue to the OS and you get options 1 & 2 for free.
If you have very special access patterns, you can even use something like fadvise() (I don't know the exact Win32 equivalent) to tell the OS your access intent.
If your file is more than 2GB, you can either go the 64-bit way or mmap() parts of the file on demand.
If the file is relatively small, mmap-ing the entire file is good enough. If the file is large, you could leave a mmap view open, and just move it around the file and resize it to view each section when needed.
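A minimal sketch of mapping a single section on demand with the Win32 APIs; the offset and length would come from the custom header, and the names are illustrative. View offsets must be aligned to the system allocation granularity, hence the rounding:

    #include <windows.h>

    // Map only one section of a large file; returns a pointer to the
    // section's first byte. Caller passes *viewBase to UnmapViewOfFile
    // when done with the section.
    void* MapSection(HANDLE file, ULONGLONG offset, SIZE_T length, void** viewBase) {
        SYSTEM_INFO si;
        GetSystemInfo(&si);
        ULONGLONG aligned = offset - (offset % si.dwAllocationGranularity);
        SIZE_T delta = static_cast<SIZE_T>(offset - aligned);

        HANDLE mapping = CreateFileMappingW(file, nullptr, PAGE_READONLY, 0, 0, nullptr);
        if (!mapping) return nullptr;
        *viewBase = MapViewOfFile(mapping, FILE_MAP_READ,
                                  static_cast<DWORD>(aligned >> 32),
                                  static_cast<DWORD>(aligned & 0xFFFFFFFFu),
                                  delta + length);
        CloseHandle(mapping);  // the view keeps the mapping object alive
        return *viewBase ? static_cast<char*>(*viewBase) + delta : nullptr;
    }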
