Why isn't copy operation implemented in kernel? - windows

It's my understanding that most file IO operations are implemented in the kernel, such as CRUD, move or remove. However file copy is not implemented as a kernel level API.
In order to detect a file copy in the kernel one will need to use heuristics approach (discussion on this approach), e.g. as detect file reads, file creates and file writes from the same user with the same file name, but different paths.
Why copy is a user land operation?

First, because caring about whether or not two different files have the same content, where one file's content is copied directly from the other, is a user-space concern that has no logical reason to exist inside a kernel.
At best.
Bytes are bytes.
Second, how would the kernel distinguish copying a file between what are just two different file descriptors? See the man page for sendfile(). Why should the kernel track if the calling user called sendfile() to send the contents of a file to a TCP socket to who-knows-where or to another file?
Third, even if the kernel tracked copying a file, what on God's good Earth would it do with such data?
If you care about such file copy events, set up auditing.

Related

Compare-and-Swap over POSIX-compliant filesystem objects

There are several operations which POSIX-compliant operating systems can do atomically with filesystem objects (files and folders). Here is a list of such presumably atomic operations:
rename or move file or folder
create hardlink
create symlink
create folder
create and open an empty file
Is it possible to build Compare-and-Swap algorithm for manipulating a file based on these operations?
Let’s suppose we have several processes which are performing concurrent read/write on a single file. A file is characterized by its revision. Let’s say the revision is added to file name, and there is a symlink to the file which can be used by the processes to read it. The processes cannot (for some reasons) synchronize with mutexes, semaphores and so on, but they are able to create auxiliary files and folders. Are they able to perform revision-based Compare-and-Swap modifications of the file (create a new file, create and rename symlink), in the meaning that if several processes are going to modify it simultaneously, one will success and the rest will fail with some error code?
The algorithm has to be resistant to sudden termination of any processes at any step of algorithm.
Oh boy.
Let's assume that each process has access to a unique identifier, to avoid problems breaking symmetry. Here's a wait-free implementation of a one-shot consensus object.
Create a directory with a unique name.
Create a file in that directory whose name is the creating process's input.
Rename the directory to the name of the consensus object. This will fail unless this is the first such rename.
List the directory to which we tried to rename our own. The name of the file inside is the consensus decision.
Now it's possible to simulate an arbitrary object in a wait-free manner, using standard results in distributed computing. Have fun garbage collecting =P
If you consider fcntl(2) in your list of atomic operations, you can easily build a general mutex primitive. I use the flock(1) command line tool to do this in shell scripts regularly. (flock(1) is part of the util-linux-ng package.)
flock(2) is not specified by POSIX but fcntl(2) is. I think flock(1) may use fcntl(2) in some cases (e.g. NFS).
So the algorithm is something like:
Do a non-blocking fcntl() on some file that is unique to the resource you want to manipulate. This may be the data file itself, or some empty file that every process agrees to use as the mutex object.
2a. If the fcntl is successful, swap the data in the file.
2b. If the fcntl is not successful, don't touch the data file.
Release the fcntl on the file.
You could of course do a blocking fcntl(2), but there won't be any way to know what order each process blocks and gets woken up, so whether this is appropriate depends on the application.
Note that fcntl(2) is advisory, so it won't prevent unwanted manipulation of the data file.

True file descriptor clone

Why is there no true file descriptor clone mechanism when possible, like it is for disk files.
POSIX:
After a successful return from one of these system calls, the old and
new file descriptors may be used interchangeably. They refer to the
same open file description (see open(2)) and thus share file offset
and file status flags; for example, if the file offset is modified by
using lseek(2) on one of the descriptors, the offset is also changed
for the other.
Windows:
The duplicate handle refers to the same object as the original handle. Therefore, any changes to the object are reflected through both handles. For example, if you duplicate a file handle, the current file position is always the same for both handles. For file handles to have different file positions, use the CreateFile function to create file handles that share access to the same file.
Reasons for having a clone primitive:
When manipulating a file archive, I want each file in the archive has to be accessible independently. The file archive should behave somewhat like a virtual filesystem.
File type checking. Being able to clone file offsets makes it possible to read a small portion of the file without affecting the original position.
You should consider the following: file descriptor is merely an offset into the array of "file" (literally, that's what they are called) object pointers on the kernel side. So when you duplicate the file descriptor, the kernel will simply copy the value of the file pointer from one location in the array to another and increment the reference count on the pointed to object.
Thus, your issue is not with file descriptor duplication, but with management of the file offsets. The easy answer for this: do it yourself. That is, associate the current file offset with each file descriptor on the application side explicitly.
Of course, the most basic file access system calls read() and write() make use of kernel maintained file offset variable, if it's available (and it's only available if you are dealing with "normal" random access files). But more advanced file access system calls will expect the desired file offset to be supplied by the application on each invocation. Those include pread()/pwrite(), preadv()/pwritev() and aio_read()/aio_write (the later is probably the best approach for writing parallel access applications like the one you described).
On Windows, ReadFile()/WriteFile(), ReadFileScatter()/WriteFileGather() and ReadFileEx()/WriteFileEx() analogously expect to be passed the file offset on every invocation (via the lpOverlapped argument).

Storing data associated with opened file in linux kernel space

I want to observe some operations performed on files (and do it from kernel level). I have to attach some data to opened file descriptor (for instance a single int value). More precisely, for each opened file in sys_do_open I decide whether to track the file or not. For further usage I have to store somewhere this decision.
There is a field private_data in struct file, it seems to be good enough for my needs, but I suppose it's used also by other modules.
So, how can I store some data associated with opened file descriptor (for every opened file)?Any suggestions?

Move or copy and truncate a file that is in use

I want to be able to (programmatically) move (or copy and truncate) a file that is constantly in use and being written to. This would cause the file being written to would never be too big.
Is this possible? Either Windows or Linux is fine.
To be specific what I'm trying to do is log video with FFMPEG and create hour long videos.
It is possible in both Windows and Linux, but it would take cooperation between the applications involved. If the application that is writing the new data to the file is not aware of what the other application is doing, it probably would not work (well ... there is some possibility ... back to that in a moment).
In general, to get this to work, you would have to open the file shared. For example, if using the Windows API CreateFile, both applications would likely need to specify FILE_SHARE_READ and FILE_SHARE_WRITE. This would allow both (multiple) applications to read and write the file "concurrently".
Beyond sharing the file, though, it would also be necessary to coordinate the operations between the applications. You would need to use some kind of locking mechanism (either by locking some part of the file or some shared mutex/semaphore). Note that if you use file locking, you could lock some known offset in the file to act as a "semaphore" (it can even be a byte value beyond the physical end of the file). If one application were appending to the file at the same exact time that the other application were truncating it, then it would lead to unpredictable results.
Back to the comment about both applications needing to be aware of each other ... It is possible that if both applications opened the file exclusively and kept retrying the operations until they succeeded, then perform the operation, then close the file, it would essentially allow them to work without "knowledge" of each other. However, that would probably not work very well and not be very efficient.
Having said all that, you might want to consider alternatives for efficiency reasons. For example, if it were possible to have the writing application write to new files periodically, it might be more efficient than having to "move" the data constantly out of one file to another. Also, if you needed to maintain some portion of the file (e.g., move out the first 100 MB to another file and then move the second 100 MB to the beginning) that could be a fairly expensive operation as well.
logrotate would be a good option is linux, comes stock on just about any distro. I'm sure there's a similar windows service out there somewhere

Memory mapped files optional write possible?

When using memory-mapped files it seems it is either read-only, or write-only. By this I mean you can't:
have one open for writing, and later decide not to save it
have open open for reading, and later decide to save it
Our application uses a writeable memory-mapped file to save data files, but since the user might want to exit without saving changes, we have to use a temporary file which the user actually edits. When the user opts to save the changes, the original file is overwritten with the temporary file so it has the latest changes. This is cumbersome because the files can be very large (>1GB) and it takes a long time to copy them.
I've tried many combinations of the flags used to create the file mapping but none seem to allow the flexibility of saving on demand. Can anyone confirm this is the case? Our application is written in Delphi, but it uses the standard Windows API to create the mapping, in our case
FMapHandle := CreateFileMapping(FFileHandle, nil, PAGE_READWRITE, 0, 2 * 65536, nil);
FBasePointer := MapViewOfFile(FileMapHandle, FILE_MAP_WRITE, FileOffsetHigh,
FileOffsetLow, NumBytes);
I don't think you can. By that I mean you may be able to, but it doesn't make any sense to me :-)
The whole point of a memory-mapped file is that it's a window onto the actual file. If you don't wany changes reflected in the file, you'll probably have to do something like batch up the changes in a data structure (e.g., an array of base address, size and data) and apply them when saving.
In which case, you wouldn't actually need the memory mapped file, just read in and maintain the chunks you want to change (lock the file first if there's a chance of multi-user access).
Update:
Have you thought of the possibility of, when doing a save, deleting the original file and just renaming the temporary file to the original file name? That's likely to be much faster than copying 1G of data from temporary to original. That way, if you don't want it saved, just delete the temporary file and keep the original.
You'll still have to copy the original data to the temporary file when loading but you won't have to copy the temporary data back (whether you save it or not) - that would halve the time taken.
Possible, but non-trivial.
You have to understand memory mapped basics, and the difference between the three modes of memory-mapped files. Both set aside a part of your virtual address space and create a mapping entry in an internal table. No physical RAM is initially allocated. Hence, when you DO try to access the memory, the CPU faults and the OS has to fix up. It does so by copying the file contents to RAM and mapping the RAM to your process, at the faulting address.
Now, the difference between the three modes is how the descriptors are set on the mapped pages. In all cases you get read access on the pages. (The first mode). However, if you ask for write access and subsequently write to it, on your first write the page is marked as writeable and dirty. It can then be written back to the original file, at the discretion of the OS (Second mode). Finally, it's possible to get copy-on-write semantics. You still start out with only read access to the page in memory. When you write to it, the CPU still faults and the OS needs to fix it up. With copy-on-write, that fixup is done by setting the backing store of the changed page to the page file, instead of the original mapped file.
So, in your case you want to use copy-on-write mode. If the user decides to discard the modifications, no problem. You simply discard the memory mapping. All pages that were modified in memory, and were backed by the page file are also discarded.
If the user does decide to save, you've got a slightly harder task. You now need to figure out which parts of the file have changed. Those changes are in memory, and you need to reapply those to the source file. You can do this with Page Guards. So, when the user decides to save, copy all modified pages to a separate memory block, remap the (unchanged) file for write, and apply the changes.

Resources