Why do we need to change files i_generation number? - linux-kernel

Why we have an interface to change the i_generation number of file using chattr -v when FS already increments it when allocating a new inode from unused list?
Is there any other use of it apart from its usage in NFS(detecting stale file handles on inode reused after file deletion)?

Changing the gen number invalidates any open file handles right? If so, all future reads of the file will fail. So, if I were to guess, I would say this is one way to invalidate all open file handles, kind of in a lazy fashion, without the server maintaining any state about who has open file handles.

Related

How to get the inode structure of the /proc/pid/fd/n file of an opened file in kernel?

For a given fd number, I know I can get the inode structure of the opened file by calling fget_raw. But what I actually want to get is the inode structure of the file /proc/pid/fd/n, which is actually another file in procfs.
This may be done by calling path_lookup with a spliced path of proc/pid/fd/n, but I think it's not the best way of doing this. Is there a way to get the inode directly from a fd number or a file structure? I think such process should occur when a close syscall is made, because a close call should get the /proc/pid/fs/n entry in order to delete it, but I can't locate related code.
I'm doing this because I want to get the time when the socket was opened.

Golang simultaneous read/write to the file without explicit file lock

I have a situation where I need to concurrently read/write from/to the file, but the scope of operations is limited:
append only, no random offset writes
read from random position, where I know for sure the content has been written before(via append, internal access serialization via golang channel to ensure random read happens only after content's been appended)
there is only one process running
This is a high loaded application and I would like to avoid locking file for each read/write I do
I was going to open 2 files - one for read, another for append only
would doing so create some potential issues/bugs?
what is the recommended practice if I would like to avoid file locking for each read/write I do?
p.s. golang, linux, ext4
I'll assume by "random read" you actually mean "arbitrary read".
If I understand your use case correctly, you don't need to seek or lock or do anything manual. UNIX has this covered via O_APPEND. Here is what you can do:
Open the file with os.O_APPEND. This way every write, regardless of any preceding operations, will go to the end of the file
When reading use File.ReadAt. This lets you specify arbitrary offsets for your reads
Using this scheme you can avoid any sort of locking: the OS will do it for you. Because of the buffer cache this scheme is not even inefficient: appends and reads are pretty much independent.

True file descriptor clone

Why is there no true file descriptor clone mechanism when possible, like it is for disk files.
POSIX:
After a successful return from one of these system calls, the old and
new file descriptors may be used interchangeably. They refer to the
same open file description (see open(2)) and thus share file offset
and file status flags; for example, if the file offset is modified by
using lseek(2) on one of the descriptors, the offset is also changed
for the other.
Windows:
The duplicate handle refers to the same object as the original handle. Therefore, any changes to the object are reflected through both handles. For example, if you duplicate a file handle, the current file position is always the same for both handles. For file handles to have different file positions, use the CreateFile function to create file handles that share access to the same file.
Reasons for having a clone primitive:
When manipulating a file archive, I want each file in the archive has to be accessible independently. The file archive should behave somewhat like a virtual filesystem.
File type checking. Being able to clone file offsets makes it possible to read a small portion of the file without affecting the original position.
You should consider the following: file descriptor is merely an offset into the array of "file" (literally, that's what they are called) object pointers on the kernel side. So when you duplicate the file descriptor, the kernel will simply copy the value of the file pointer from one location in the array to another and increment the reference count on the pointed to object.
Thus, your issue is not with file descriptor duplication, but with management of the file offsets. The easy answer for this: do it yourself. That is, associate the current file offset with each file descriptor on the application side explicitly.
Of course, the most basic file access system calls read() and write() make use of kernel maintained file offset variable, if it's available (and it's only available if you are dealing with "normal" random access files). But more advanced file access system calls will expect the desired file offset to be supplied by the application on each invocation. Those include pread()/pwrite(), preadv()/pwritev() and aio_read()/aio_write (the later is probably the best approach for writing parallel access applications like the one you described).
On Windows, ReadFile()/WriteFile(), ReadFileScatter()/WriteFileGather() and ReadFileEx()/WriteFileEx() analogously expect to be passed the file offset on every invocation (via the lpOverlapped argument).

Is appending to a file atomic with Windows/NTFS?

If I'm writing a simple text log file from multiple processes, can they overwrite/corrupt each other's entries?
(Basically, this question Is file append atomic in UNIX? but for Windows/NTFS.)
You can get atomic append on local files. Open the file with FILE_APPEND_DATA access (Documented in WDK). When you omit FILE_WRITE_DATA access then all writes will ignore the the current file pointer and be done at the end-of file. Or you may use FILE_WRITE_DATA access and for append writes specify it in overlapped structure (Offset = FILE_WRITE_TO_END_OF_FILE and OffsetHigh = -1 Documented in WDK).
The append behavior is properly synchronized between writes via different handles. I use that regularly for logging by multiple processes. I do write BOM at every open to offset 0 and all other writes are appended. The timestamps are not a problem, they can be sorted when needed.
Even if append is atomic (which I don't believe it is), it may not give you the results you want. For example, assuming a log includes a timestamp, it seems reasonable to expect more recent logs to be appended after older logs. With concurrency, this guarantee doesn't hold - if multiple processes are waiting to write to the same file, any one of them might get the write lock - not just the oldest one waiting. Thus, logs can be written out of sequence.
If this is not desirable behaviour, you can avoid it by publishing logs entries from all processes to a shared queue, such as a named pipe. You then have a single process that writes from this queue to the log file. This avoids the conccurrency issues, ensures that logs are written in order, and works when file appends are not atomic, since the file is only written to directly by one process.
From this MSDN page on creating and opening Files:
An application also uses CreateFile to specify whether it wants to share the file for reading, writing, both, or neither. This is known as the sharing mode. An open file that is not shared (dwShareMode set to zero) cannot be opened again, either by the application that opened it or by another application, until its handle has been closed. This is also referred to as exclusive access.
and:
If you specify an access or sharing mode that conflicts with the modes specified in the previous call, CreateFile fails.
So if you use CreateFile rather than say File.Open which doesn't have the same level of control over the file access, you should be able to open a file in such a way that it can't get corrupted by other processes.
You'll obviously have to add code to your processes to cope with the case where they can't get exclusive access to the log file.
No it isn't. If you need this there is Transactional NTFS in Windows Vista/7.

File Unlocking and Deleting as single operation

Please note this is not duplicate of File r/w locking and unlink. (The difference - platform. Operations of files like locking and deletion have totally different semantics, thus the sultion would be different).
I have following problem. I want to create a file system based session storage where each session data is stored in simple file named with session ids.
I want following API: write(sid,data,timeout), read(sid,data,timeout), remove(sid)
where sid==file name, Also I want to have some kind of GC that may remove all timed-out sessions.
Quite simple task if you work with single process but absolutly not trivial when working with multiple processes or even over shared folders.
The simplest solution I thought about was:
write/read:
hanlde=CreateFile
LockFile(handle)
read/write data
UnlockFile(handle)
CloseHanlde(handle)
GC (for each file in directory)
hanlde=CreateFile
LockFile(handle)
check if timeout occured
DeleteFile
UnlockFile(handle)
CloseHanlde(handle)
But AFIAK I can't call DeleteFile on opended locked file (unlike in Unix where file locking is
not mandatory and unlink is allowed for opened files.
But if I put DeleteFile outside of Locking loop bad scenario may happen
GC - CreateFile/LockFile/Unlock/CloseHandle,
write - oCreateFile/LockFile/WriteUpdatedData/Unlock/CloseHandle
GC - DeleteFile
Does anybody have an idea how such issue may be solved? Are there any tricks that allow
combine file locking and file removal or make operation on file atomic (Win32)?
Notes:
I don't want to use Database,
I look for a solution for Win32 API for NT 5.01 and above
Thanks.
I don't really understand how this is supposed to work. However, deleting a file that's opened by another process is possible. The process that creates the file has to use the FILE_SHARE_DELETE flag for the dwShareMode argument of CreateFile(). A subsequent DeleteFile() call will succeed. The file doesn't actually get removed from the file system until the last handle on it is closed.
You currently have data in the record that allows the GC to determine if the record is timed out. How about extending that housekeeping info with a "TooLateWeAlreadyTimedItOut" flag.
GC sets TooLateWeAlreadyTimedItOut = true
Release lock
<== writer comes in here, sees the "TooLate" flag and so does not write
GC deletes
In other words we're using a kind of optimistic locking approach. This does require some additional complexity in the Writer, but now you're not dependent upon any OS-specifc wrinkles.
I'm not clear what happens in the case:
GC checks timeout
GC deletes
Writer attempts write, and finds no file ...
Whatever you have planned for this case can also be used in the "TooLate" case
Edited to add:
You have said that it's valid for this sequence to occur:
GC Deletes
(Very slightly later) Writer attempts a write, sees no file, creates a new one
The writer can treat "tooLate" flag as a identical to this case. It just creates a new file, with a different name, use a version number as a trailing part of it's name. Opening a session file the first time requires a directory search, but then you can stash the latest name in the session.
This does assume that there can only be one Writer thread for a given session, or that we can mediate between two Writer threads creating the file, but that must be true for your simple GC/Writer case to work.
For Windows, you can use the FILE_FLAG_DELETE_ON_CLOSE option to CreateFile - that will cause the file to be deleted when you close the handle. But I'm not sure that this satisfies your semantics (because I don't believe you can clear the delete-on-close attribute.
Here's another thought. What about renaming the file before you delete it? You simply can't close the window where the write comes in after you decided to delete the file but what if you rename the file before deleting it? Then when the write comes in it'll see that the session file doesn't exist and recreate it.
The key thing to keep in mind is that you simply can't close the window in question. IMHO there are two solutions:
Adding a flag like djna mentioned or
Require that a per-session named mutex be acquired which has the unfortunate side effect of serializing writes on the session.
What is the downside of having a TooLate flag? In other words, what goes wrong if you delete the file prematurely? After all your system has to deal with the file not being present...

Resources