How do I create a CFSTR_FILEDESCRIPTOR of unknown size? - winapi

I have an email client that allows the user to export a folder of email as an MBOX file. The UX is that they drag the folder from inside the application to an Explorer folder (e.g. the Desktop) and a file copy commences via the application adding CFSTR_FILEDESCRIPTOR and CFSTR_FILECONTENTS to the data object being dropped. The issue arises when working out how to specify the size of the "file": internally I store the email in a database, and it takes quite a while to fully encode the output MBOX, especially if the folder has many emails. Until that encoding is complete I don't have an exact size... just an estimate.
Currently I return an IStream pointer to Windows and over-specify the size in the file descriptor (estimate * 3 or something). Then when I hit the end of my data I return an IStream::Read length less than the input buffer size, which causes Windows to give up on the copy. On Windows 7 it leaves the "partial" file there in the destination folder, which is perfect, but on XP it fails the copy completely, leaving nothing in the destination folder. Other versions may exhibit different behaviour.
Is there a way of dropping a file of unknown size onto explorer that has to be generated by the source application?
Alternatively can I just get the destination folder path and do all the copy progress + output internally to my application? This would be great, I have all the code to do it already. Problem is I'm not the process accepting the drop.
Bonus round: This also needs to work on Linux/GTK and Mac/Carbon so any pointers there would be helpful too.

Windows Explorer uses three methods to detect the size of a stream (in order of priority):
The nFileSizeHigh/Low fields of the FILEDESCRIPTOR structure, if the FD_FILESIZE flag is present.
Calling IStream::Seek(0, STREAM_SEEK_END, FileSize).
Calling IStream::Stat. The cbSize field of the STATSTG structure is used as the MAX file size only.
To pass Explorer a file of unknown size it is necessary to do the following (see the sketch after this list):
Remove the FD_FILESIZE flag from the FILEDESCRIPTOR structure.
IStream::Seek must not be implemented (it must return E_NOTIMPL).
IStream::Stat must set the cbSize field to -1 (0xFFFFFFFFFFFFFFFF).
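A minimal sketch of those three points (untested; the class name MboxStream is illustrative, not from the question):

#include <shlobj.h>   // FILEDESCRIPTORW, FILEGROUPDESCRIPTOR

FILEDESCRIPTORW fd = {};
wcscpy_s(fd.cFileName, L"Folder.mbox");
fd.dwFlags = FD_ATTRIBUTES;                 // note: FD_FILESIZE deliberately absent
fd.dwFileAttributes = FILE_ATTRIBUTE_NORMAL;

// Inside the source application's IStream implementation:
STDMETHODIMP MboxStream::Seek(LARGE_INTEGER, DWORD, ULARGE_INTEGER*)
{
    return E_NOTIMPL;                       // stops Explorer from seeking to the end
}

STDMETHODIMP MboxStream::Stat(STATSTG* stat, DWORD grfStatFlag)
{
    ZeroMemory(stat, sizeof(*stat));
    stat->type = STGTY_STREAM;
    stat->cbSize.QuadPart = ~0ULL;          // -1, the "size unknown" sentinel
    return S_OK;
}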

Is there a way of dropping a file of unknown size onto explorer that has to be generated by the source application?
When providing CFSTR_FILEDESCRIPTOR, you don't have to provide a file size at all if you don't know it ahead of time. The FD_FILESIZE flag in the FILEDESCRIPTOR::dwFlags field is optional. Provide an exact size only if you know it, otherwise don't provide a size at all, not even an estimate. The copy will still proceed, but the target won't know the final size until IStream::Read() returns S_FALSE to indicate the end of the stream has been reached.
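For illustration, a hedged sketch of that end-of-stream contract; m_source and EncodeNextChunk are hypothetical stand-ins for the database-to-MBOX encoder:

STDMETHODIMP MboxStream::Read(void* pv, ULONG cb, ULONG* pcbRead)
{
    ULONG produced = m_source.EncodeNextChunk(static_cast<BYTE*>(pv), cb);
    if (pcbRead)
        *pcbRead = produced;
    // S_OK while the buffer is filled completely; S_FALSE signals that the
    // stream ended short of the requested byte count.
    return (produced == cb) ? S_OK : S_FALSE;
}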
Alternatively can I just get the destination folder path and do all the copy progress + output internally to my application?
A drop operation does not provide any information about the target at all. And for good reason - the source doesn't need to know. A drop target does not need to know where the source data is coming from, only how to access it. The drag source does not need to know how the drop target will use the data, only how to provide the data to it.
Think of what happens if you drop a file (virtual or otherwise) onto an Explorer folder that is implemented as a Shell Namespace Extension that saves the file to another database, or uploads it to a remote server. The filesystem is not involved, so you wouldn't be able to manually copy your data to the target even if you wanted to. Only the target knows how its data is stored.
That being said, the only way I know to get the path of a filesystem folder being dropped into is to drag&drop a dummy file, and then monitor the filesystem for where the drop target creates/copies the file to. Then you can replace the dummy file with your real file. But this is not very reliable, and not very friendly to the target.

Related

Meta data file size changeable

I recently learned about metadata and how it's information about the data itself.
Seeing how file size is included among those statistics, would it be possible to change the file size to something absurd and unreasonable, like 1,000 petabytes? If you can, what would the effects be on a computer, and how would it affect a Windows 11 computer?
Not all metadata is directly editable. Some of it is simply a property of the file. You can't just set the file size to an arbitrary number; you have to actually edit the file itself. So to create a 1,000-petabyte file, you would need to have that much disk space first.
Another example would be the file type. You can't change a JPEG into a PNG by setting filetype=png. You have to process and convert the file, giving you an entirely new and different file with its own set of metadata/properties.
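To make the first point concrete, a small sketch (assuming the Win32 API; error handling trimmed) of the documented way to change a file's size: you move the end-of-file marker, which actually allocates space, rather than editing a size field somewhere:

#include <windows.h>

bool GrowFile(const wchar_t* path, LONGLONG newSize)
{
    HANDLE h = CreateFileW(path, GENERIC_WRITE, 0, nullptr,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (h == INVALID_HANDLE_VALUE) return false;

    LARGE_INTEGER size;
    size.QuadPart = newSize;
    bool ok = SetFilePointerEx(h, size, nullptr, FILE_BEGIN)
           && SetEndOfFile(h);   // the size change happens here, on real disk
    CloseHandle(h);
    return ok;
}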

How does physical disk read work with volume shadow copy for NTFS?

My goal is to make a backup program that reads a physical disk (with NTFS partitions) while using VSS for data consistency.
I use the Windows API function CreateFile with '\\.\PhysicalDriveN',
as described here (basically, it allows me to access a disk as one big file):
https://support.microsoft.com/en-us/help/100027/info-direct-drive-access-under-win32
For tests I create volume shadows with this command:
wmic shadowcopy call create Volume='C:\'
This is a temporary solution; I plan on using VSS via the program itself.
My questions are:
How are volume shadows stored? Do they store the data that has been modified since the volume shadow was taken, or do they store modifications made since the last volume shadow?
In the first case: when I read the disk, will I get consistent data (including NTFS metadata files)?
In the other case: can I access a volume shadow the same way I would access a disk/partition (in order to read hidden metadata files, etc.)?
- I am currently using Windows 7, but plan on running this on different versions of Windows Server.
- I've read a lot of Microsoft documentation about VSS, but how it works seems really unclear to me (if you answer with a quote, please explain its meaning a bit).
- I know that volume shadows are stored in the folder "System Volume Information" as files with names like {3808876b-c176-4e48-b7ae-04046e6cc752}
"how are stored Volume shadows? does it stores data that have been modified since the volume shadow or does it store modification made since the last volume shadow?"
A hardware or software shadow copy provider uses one of the following methods for creating a shadow copy:(Answer by msdn doc)
Complete copy: This method makes a complete copy (called a "full copy" or "clone") of the original volume at a given point in time. This copy is read-only.
Copy-on-write: This method does not copy the original volume. Instead, it makes a differential copy by copying all changes (completed write I/O requests) that are made to the volume after a given point in time.
Redirect-on-write: This method does not copy the original volume, and it does not make any changes to the original volume after a given point in time. Instead, it makes a differential copy by redirecting all changes to a different volume.
"when i read the disk, will i get consistent data (including ntfs metadata files)?"
Even if an application does not have its files open in exclusive mode, it is possible—because of the finite time needed to open, back up, and close a file—that files copied to storage media may not all reflect the same application state.
"can i access a volume shadow the same way i would access a disk/partition? (in order to read hidden metadata files, etc)"
Requester Access to Shadow Copied Data:
Paths on the shadow copied volume are obtained by replacing the root of the original path with the device object. For example, given a path on the original volume of "C:\DATABASE\*.mdb" and a VSS_SNAPSHOT_PROP instance of snapProp, you would obtain the path on the shadow copied volume by concatenating snapProp.m_pwszSnapshotDeviceObject, "\", and "DATABASE\*.mdb".
So I did more testing, and shadow volumes are actually made at the block level, not the file level. That means using CreateFile with the path \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy1 works in a similar way to using CreateFile with the path \\.\C:
So yes, you can access a shadow copy's file system; it has its own boot sector, MFT, etc.
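A sketch of that (untested; HarddiskVolumeShadowCopy1 is just an example device name, enumerate the real ones via the VSS API or vssadmin list shadows):

#include <windows.h>

HANDLE h = CreateFileW(
    L"\\\\?\\GLOBALROOT\\Device\\HarddiskVolumeShadowCopy1",
    GENERIC_READ, FILE_SHARE_READ | FILE_SHARE_WRITE, nullptr,
    OPEN_EXISTING, 0, nullptr);
if (h != INVALID_HANDLE_VALUE)
{
    BYTE bootSector[512];
    DWORD read = 0;
    // Reads must be sector-aligned, exactly as with a raw \\.\C: handle.
    ReadFile(h, bootSector, sizeof(bootSector), &read, nullptr);
    CloseHandle(h);
}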

Where does NTFS store file attributes

I'm currently developing a file system and doing some research on existing ones. In the file system I have in mind, I would like to add extra metadata (or file attributes) to files beyond the ones generally stored by file systems like NTFS, which stores for each file its filename, type, path, size, dates of creation and modification, and other proprietary attributes.
In NTFS in particular I found that the $MFT stores for each file attributes like the file's name in $FILENAME and its timestamps in $STANDARD_INFORMATION, but what about the rest of its attributes, like its owner, location, size and type?
I ask this in order to understand whether it's possible to complement an FS like NTFS with extra metadata about files, like I said before, but I can't seem to understand where it stores the metadata it already has...
The owner can be determined via the $SECURITY_DESCRIPTOR attribute. The location (I believe you mean the path on the volume) can only be determined by parsing directories until you come across that particular file (the INDX blocks that make up the B*-trees of the file system store references to file records in the MFT). File size can be determined accurately from the $DATA attribute.
The file type can only be determined either from the file's content (certain file formats have markers) or from the extension contained in the file name. The file system is agnostic when it comes to file types. If you were referring to file types as in files, directories, links, etc., these can be determined from the file record itself.
As for adding extra metadata, it would be unwise to add additional attributes that the NTFS driver doesn't recognize, since you would have to write your own proprietary driver and distribute it. Machines that do not have that driver will see the drive as corrupt. You should also consider what happens when the attributes take more than the size of the file record (which is fixed at 1,024 bytes in newer versions of NTFS, unlike old versions where the size of a record may vary).
A good way to solve this problem, and make the file system available to users who don't have your software or driver installed, is to use named streams. You could use your own naming convention and store whatever you like, and the NTFS driver will take care of the records for you even if they grow outside the 1,024-byte limit. Users who do not have the software installed can view the file system and won't know those named streams exist, since applications typically open the unnamed stream provided by the NTFS driver by default (unless otherwise specified).
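A hedged sketch of the idea (the stream name "meta", the path, and the payload are arbitrary examples of a convention you might pick):

#include <windows.h>

HANDLE h = CreateFileW(L"C:\\data\\report.docx:meta",   // file.ext:streamname
                       GENERIC_WRITE, 0, nullptr,
                       CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
if (h != INVALID_HANDLE_VALUE)
{
    const char payload[] = "{\"tags\":[\"draft\"]}";
    DWORD written = 0;
    WriteFile(h, payload, sizeof(payload) - 1, &written, nullptr);
    CloseHandle(h);   // the unnamed stream (the document itself) is untouched
}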

Get file offset on disk/cluster number

I need to get any information about where the file is physically located on the NTFS disk. Absolute offset, cluster ID... anything.
I need to scan the disk twice: once to get the allocated files, and a second time opening the partition directly in RAW mode to try to find the rest of the data (from deleted files). I need a way to recognize that data I find in the raw scan is the same data I've already handled as a file. As I'm scanning the disk in raw mode, the offset of the data I find can presumably be converted to an offset within a file (given information about the disk geometry). Is there any way to do this? Other solutions are accepted as well.
Now I'm playing with FSCTL_GET_NTFS_FILE_RECORD, but can't make it work at the moment and I'm not really sure it will help.
UPDATE
I found the following function
http://msdn.microsoft.com/en-us/library/windows/desktop/aa364952(v=vs.85).aspx
It returns a structure that contains the nFileIndexHigh and nFileIndexLow members.
Documentation says
The identifier that is stored in the nFileIndexHigh and nFileIndexLow members is called the file ID. Support for file IDs is file system-specific. File IDs are not guaranteed to be unique over time, because file systems are free to reuse them. In some cases, the file ID for a file can change over time.
I don't really understand what this is. I can't connect it to the physical location of the file. Is it possible to later extract this file ID from the MFT?
UPDATE
Found this:
This identifier and the volume serial number uniquely identify a file. This number can change when the system is restarted or when the file is opened.
This doesn't satisfy my requirements, because I'm going to open the file, and the fact that the ID might change doesn't make me happy.
Any ideas?
Use the Defragmentation IOCTLs. For example, FSCTL_GET_RETRIEVAL_POINTERS will tell you the extents which contain file data.
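A sketch of that call (assuming a buffer big enough for one round; a full implementation would loop on ERROR_MORE_DATA):

#include <windows.h>
#include <winioctl.h>
#include <cstdio>

void DumpExtents(const wchar_t* path)
{
    HANDLE h = CreateFileW(path, GENERIC_READ,
                           FILE_SHARE_READ | FILE_SHARE_WRITE, nullptr,
                           OPEN_EXISTING, 0, nullptr);
    if (h == INVALID_HANDLE_VALUE) return;

    STARTING_VCN_INPUT_BUFFER in = {};   // start at virtual cluster 0
    BYTE out[4096];
    DWORD bytes = 0;
    if (DeviceIoControl(h, FSCTL_GET_RETRIEVAL_POINTERS,
                        &in, sizeof(in), out, sizeof(out), &bytes, nullptr))
    {
        auto* rp = reinterpret_cast<RETRIEVAL_POINTERS_BUFFER*>(out);
        LONGLONG vcn = rp->StartingVcn.QuadPart;
        for (DWORD i = 0; i < rp->ExtentCount; ++i)
        {
            // An Lcn of -1 marks a hole (sparse/compressed) with no disk location.
            wprintf(L"VCN %lld..%lld -> LCN %lld\n", vcn,
                    rp->Extents[i].NextVcn.QuadPart - 1,
                    rp->Extents[i].Lcn.QuadPart);
            vcn = rp->Extents[i].NextVcn.QuadPart;
        }
    }
    CloseHandle(h);
}

Multiply the LCN by the volume's cluster size and add the partition's starting offset to get an absolute disk offset.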

Memory mapped files optional write possible?

When using memory-mapped files it seems it is either read-only or write-only. By this I mean you can't:
have one open for writing, and later decide not to save it
have one open for reading, and later decide to save it
Our application uses a writeable memory-mapped file to save data files, but since the user might want to exit without saving changes, we have to use a temporary file which the user actually edits. When the user opts to save the changes, the original file is overwritten with the temporary file so it has the latest changes. This is cumbersome because the files can be very large (>1GB) and it takes a long time to copy them.
I've tried many combinations of the flags used to create the file mapping, but none seem to allow the flexibility of saving on demand. Can anyone confirm this is the case? Our application is written in Delphi, but it uses the standard Windows API to create the mapping, in our case:
FMapHandle := CreateFileMapping(FFileHandle, nil, PAGE_READWRITE, 0, 2 * 65536, nil);
FBasePointer := MapViewOfFile(FMapHandle, FILE_MAP_WRITE, FileOffsetHigh,
FileOffsetLow, NumBytes);
I don't think you can. By that I mean you may be able to, but it doesn't make any sense to me :-)
The whole point of a memory-mapped file is that it's a window onto the actual file. If you don't want changes reflected in the file, you'll probably have to do something like batch up the changes in a data structure (e.g., an array of base address, size and data) and apply them when saving.
In which case, you wouldn't actually need the memory mapped file, just read in and maintain the chunks you want to change (lock the file first if there's a chance of multi-user access).
Update:
Have you thought of the possibility of, when doing a save, deleting the original file and just renaming the temporary file to the original file name? That's likely to be much faster than copying 1G of data from temporary to original. That way, if you don't want it saved, just delete the temporary file and keep the original.
You'll still have to copy the original data to the temporary file when loading but you won't have to copy the temporary data back (whether you save it or not) - that would halve the time taken.
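A sketch of that save path (assuming both files live on the same volume, which is what makes the rename cheap):

#include <windows.h>

bool SaveByRename(const wchar_t* tempPath, const wchar_t* originalPath)
{
    // ReplaceFileW keeps the original's attributes/ACLs where possible;
    // MoveFileExW(tempPath, originalPath, MOVEFILE_REPLACE_EXISTING) also works.
    return ReplaceFileW(originalPath, tempPath, nullptr, 0, nullptr, nullptr) != 0;
}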
Possible, but non-trivial.
You have to understand memory-mapping basics, and the difference between the three modes of memory-mapped files. All three set aside a part of your virtual address space and create a mapping entry in an internal table. No physical RAM is initially allocated. When you DO try to access the memory, the CPU faults and the OS has to fix it up. It does so by copying the file contents to RAM and mapping the RAM into your process at the faulting address.
Now, the difference between the three modes is how the descriptors are set on the mapped pages. In all cases you get read access on the pages (the first mode). However, if you ask for write access and subsequently write to a page, on your first write the page is marked as writeable and dirty. It can then be written back to the original file at the discretion of the OS (the second mode). Finally, it's possible to get copy-on-write semantics. You still start out with only read access to the page in memory. When you write to it, the CPU still faults and the OS needs to fix it up. With copy-on-write, that fixup is done by setting the backing store of the changed page to the page file instead of the original mapped file.
So, in your case you want to use copy-on-write mode. If the user decides to discard the modifications, no problem. You simply discard the memory mapping. All pages that were modified in memory, and were backed by the page file are also discarded.
If the user does decide to save, you've got a slightly harder task. You now need to figure out which parts of the file have changed. Those changes are in memory, and you need to reapply those to the source file. You can do this with Page Guards. So, when the user decides to save, copy all modified pages to a separate memory block, remap the (unchanged) file for write, and apply the changes.
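A minimal sketch of the copy-on-write mode (error checks omitted for brevity; the path is an example):

#include <windows.h>

HANDLE file = CreateFileW(L"C:\\data\\big.dat", GENERIC_READ,
                          FILE_SHARE_READ, nullptr, OPEN_EXISTING, 0, nullptr);
HANDLE mapping = CreateFileMappingW(file, nullptr, PAGE_WRITECOPY, 0, 0, nullptr);
BYTE* view = static_cast<BYTE*>(MapViewOfFile(mapping, FILE_MAP_COPY, 0, 0, 0));

view[0] = 0xFF;            // faults, copies the page, edits the private copy
UnmapViewOfFile(view);     // discard: modified pagefile-backed pages just vanish
CloseHandle(mapping);
CloseHandle(file);

The file on disk never changes; saving means finding the dirty pages (page guards, or diffing against a re-read of the file) and writing them back yourself.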
