How is /proc/[pid]/ns/mnt created by clone in kernel? - linux-kernel

I know the clone function creates the /proc/[pid]/ns/mnt link, but I couldn't find where it calls something like proc_mkdir to create such a directory.

clone() doesn't create anything in /proc.
When you open /proc/[pid]/ns/mnt, the path lookup logic in VFS finds /proc and then gets to proc_root_lookup().
From there it goes through proc_pid_lookup() to proc_pid_instantiate(), which creates, on the fly, an in-memory directory inode for /proc/[pid] via proc_pid_make_inode(). The entries below it, such as ns/mnt, are likewise instantiated when the lookup reaches them.
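One way to observe this from userspace is simply to resolve the link for the current process. The following is a minimal sketch in Python, assuming a Linux system with procfs mounted at /proc; the helper name mnt_namespace_id is hypothetical and only serves the illustration.

```python
import os


def mnt_namespace_id(pid="self"):
    """Return the target of /proc/<pid>/ns/mnt, or None if it cannot be read.

    Nothing has to create this entry beforehand: procfs builds it when the
    path is looked up.
    """
    path = f"/proc/{pid}/ns/mnt"
    try:
        # On Linux the link text has the form "mnt:[<inode number>]".
        return os.readlink(path)
    except OSError:
        # No procfs here, no such process, or insufficient permission.
        return None


if __name__ == "__main__":
    print(mnt_namespace_id())
```

Calling mnt_namespace_id() for two processes and comparing the results is a common way to check whether they share a mount namespace.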

Related

Create system call function

I need to create a system call function that returns all child folders of a directory, but I have no idea how to do that. Can you give me some keywords or advice on implementing it?
asmlinkage long sys_get_child_folder(char* path, char** child_folder);
I smell an XY problem; what is the actual problem you're trying to solve?
Why the heck do you want to create a new system call for that? Just open the directory, enumerate all its entries, and filter out those that are not directories. The canonical way to do this is to use the opendir function. https://linux.die.net/man/3/opendir
Also keep in mind that if you're writing code that's supposed to run inside the kernel, the usual file system mechanisms are difficult to reach from there. The reason is that filesystems are seen through namespaces, which depend on the task context; the only robust way to access files from within the kernel is to have a userspace process open them and then hand the file descriptor to some kernel code. But this is strongly discouraged.
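The userspace approach described above (open the directory, enumerate, keep only directories) fits in a few lines. Here is a minimal sketch in Python rather than C, using os.scandir as the counterpart of opendir; the function name child_directories is hypothetical.

```python
import os


def child_directories(path):
    """Return the sorted names of the immediate subdirectories of `path`."""
    with os.scandir(path) as entries:
        # follow_symlinks=False keeps only real directories and skips
        # symlinks that merely point at a directory.
        return sorted(
            entry.name for entry in entries
            if entry.is_dir(follow_symlinks=False)
        )


if __name__ == "__main__":
    print(child_directories("."))
```

No new system call is involved: the enumeration and the filtering both happen in the calling process.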

How to get the inode structure of the /proc/pid/fd/n file of an opened file in kernel?

For a given fd number, I know I can get the inode structure of the opened file by calling fget_raw. But what I actually want to get is the inode structure of the file /proc/pid/fd/n, which is actually another file in procfs.
This may be done by calling path_lookup with a spliced path of /proc/pid/fd/n, but I don't think that's the best way of doing it. Is there a way to get the inode directly from an fd number or a file structure? I assume something like this must happen when a close syscall is made, because close should need to get the /proc/pid/fd/n entry in order to delete it, but I can't locate the related code.
I'm doing this because I want to get the time when the socket was opened.
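The distinction between the two inodes can at least be seen from userspace. The sketch below is in Python, not kernel code, and assumes a Linux procfs at /proc; the helper name fd_inodes is hypothetical. It only shows that the descriptor's target and the /proc/self/fd/n entry are different objects, and makes no claim about what the entry's timestamps mean.

```python
import os


def fd_inodes(fd):
    """Return (inode of the open file, inode of its /proc/self/fd entry).

    The second value is None when /proc/self/fd is not available.
    """
    target_ino = os.fstat(fd).st_ino
    try:
        # lstat() does not follow the link, so it describes the procfs
        # entry itself rather than the file the descriptor refers to.
        entry = os.lstat(f"/proc/self/fd/{fd}")
    except OSError:
        return target_ino, None
    return target_ino, entry.st_ino
```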

What is the purpose of creating a symbolic link between files?

Recently I came across the os library in Python and found out about the existence of symbolic links. I would like to know what a symbolic link is, why it exists, and what are various uses of it?
I will answer this from the perspective of a *nix user (specifically Linux). If you're interested in how this relates to Windows I suggest you look for tutorials like this one. This will be a bit roundabout, but I find that symbolic links, or symlinks, are best explained together with hard links and the generic properties of a filesystem on Linux.
Links and files on Linux
As a rule of thumb, in Linux everything is treated as a file. Directories are files that contain mappings from names to inodes, which are just unique identifiers of the different objects residing on your system. Basically, if I give you a name like /home/gst/mydog.png, the accessing process will first look into the / directory (the root directory), where it will find information on where to find home. Opening that file, it will look for gst, and finally in that file it will try to find the location of mydog.png; if successful, it will try to do whatever it set out to do with it. Going back to directory files, the mappings they contain are called links. Which brings us to hard and symbolic links.
Hard vs Symbolic links
A hard link is just a mapping like the one we discussed previously. It points directly to a certain object. A symlink on the other hand does not point directly to an object. Rather it just saves a path to an object. For example, say that I created a symbolic link to /home/gst/mydog.png at /home/gst/Desktop/mycat.png with os.symlink("/home/gst/mydog.png", "/home/gst/Desktop/mycat.png"). When I try to open it, the name /home/gst/Desktop/mycat.png is usually resolved to /home/gst/mydog.png. By following the symlink located at /home/gst/Desktop/mycat.png I actually (try to) access an object pointed to by /home/gst/mydog.png.
If I create a hard link (for example by calling os.link) I just add entries to the relevant directory files, such that the specific name can be followed to the linked object. When I create a symbolic link I create a file that contains a path to another file (which might be another symbolic link).
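The difference can be checked directly with the os module. A minimal sketch, assuming a POSIX filesystem that supports both link types; the function name link_demo and the file names are only illustrative.

```python
import os


def link_demo(workdir):
    """Create a file, a hard link and a symlink to it; report what each is."""
    target = os.path.join(workdir, "mydog.png")
    hard = os.path.join(workdir, "hard.png")
    soft = os.path.join(workdir, "mycat.png")
    with open(target, "wb") as f:
        f.write(b"woof")

    os.link(target, hard)     # a second directory entry for the same inode
    os.symlink(target, soft)  # a new small file whose content is a path

    return {
        "same_inode": os.stat(target).st_ino == os.stat(hard).st_ino,
        "link_count": os.stat(target).st_nlink,
        "symlink_text": os.readlink(soft),
        "symlink_is_link": os.path.islink(soft),
        "hard_is_link": os.path.islink(hard),
    }
```

The hard link is indistinguishable from the original name (same inode, link count of 2), while the symlink is a separate object that merely stores the path.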
More specific to your question, if I pass /home/gst/Desktop/mycat.png to os.readlink it will return /home/gst/mydog.png. This name resolution also happens when calling functions in os with the (optional) parameter follow_symlinks set to True. If it's set to False, the name does not get resolved (for instance, you'd set it to False when you want to manipulate the symlink itself, not the object it points to). From the module documentation:
not following symlinks: If follow_symlinks is False, and the last element of the path to operate on is a symbolic link, the function will operate on the symbolic link itself instead of the file the link points to. (For POSIX systems, Python will call the l... version of the function.)
You can check whether or not follow_symlinks is supported on your platform using os.supports_follow_symlinks. If it is unavailable, using it will raise a NotImplementedError.
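As an illustration of the parameter, here is a minimal sketch that stats a symlink both ways; the function name stat_both_ways is hypothetical, and the fallback to os.lstat is just one way of handling a platform where os.stat does not accept follow_symlinks=False.

```python
import os


def stat_both_ways(link_path):
    """Return (mode of the link's target, mode of the link itself)."""
    followed = os.stat(link_path)  # resolves the symlink first
    if os.stat in os.supports_follow_symlinks:
        unfollowed = os.stat(link_path, follow_symlinks=False)
    else:
        unfollowed = os.lstat(link_path)  # the "l..." variant
    return followed.st_mode, unfollowed.st_mode
```

Passing the two modes to stat.S_ISREG and stat.S_ISLNK shows that the first describes the target file and the second the link.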
Why use hard links?
This question has already been answered here, quoting from the accepted answer:
The main advantage of hard links is that, compared to soft links, there is no size or speed penalty. Soft links are an extra layer of indirection on top of normal file access; the kernel has to dereference the link when you open the file, and this takes a small amount of time. The link also takes a small amount of space on the disk, to hold the text of the link. These penalties do not exist with hard links because they are built into the very structure of the filesystem.
I'd like to add that hard links allow for an easy method of file backup. For every file the system keeps a count of its hard links. Once this count reaches 0, the storage blocks holding the file are marked as free, meaning that the system will eventually overwrite them with other data (effectively deleting the file; this doesn't happen for as long as a running process still has the file open, but that's another story). Why would that matter?
Let's say you have a huge directory full of files you'd like to manipulate somehow (rename some, delete others, etc.) and you write a script to do this for you. However, you're not completely sure that the script will work as intended and you fear it might delete some wrong files. You also don't want to copy all the files, as this would take up too much space and time. One solution is to just create a hard link for each file at some other point in the filesystem. If you delete a file in the target directory, the associated object is still available because there's another hard link associated with it. Creating that many hard links will consume much less time and space than copying all the files, yet it will give you a reasonable backup strategy. Keep in mind that this protects only against deletion and renaming: a hard link shares the file's data, so a script that modifies file contents in place changes the "backup" too.
This is not the case with symbolic links. Remember, a symlink stores a path (which may itself lead to another symlink), not a reference to the actual file. Hence, I might create a symlink to a file, but it will only save the path. If the (eventual) hard link that the symlink leads to gets removed from the system, trying to resolve the symlink won't lead you to a file. Such symlinks are said to be "broken" or "dangling". Thus you cannot rely on symlinks to preserve access to a certain file. (Conversely, deleting a symlink does not affect the link count associated with a target file.) So what's their use?
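The contrast between the two kinds of link after the original name is deleted can be sketched as follows; the function and file names are hypothetical.

```python
import os


def delete_original(workdir):
    """Delete a file that has a hard link and a symlink; report the result."""
    original = os.path.join(workdir, "data.txt")
    backup = os.path.join(workdir, "backup.txt")      # hard link
    shortcut = os.path.join(workdir, "shortcut.txt")  # symlink
    with open(original, "w") as f:
        f.write("payload")
    os.link(original, backup)
    os.symlink(original, shortcut)

    os.remove(original)  # link count drops from 2 to 1, data stays

    with open(backup) as f:
        surviving = f.read()
    # islink() inspects the link itself; exists() follows it and fails.
    dangling = os.path.islink(shortcut) and not os.path.exists(shortcut)
    return surviving, dangling
```

After the call, the data is still readable through the hard link, while the symlink is left dangling.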
Why use symbolic links?
You can operate on symlinks as if they were the actual files to which they point somewhere down the line (except when deleting them, which removes only the symlink). This allows you to have multiple "access points" to a file without having excess copies, and these access points remain up to date, since they always access the same file. If you want to replace the file that is being accessed, you only need to change it once and all of the symlinks will point to the new one (as long as the path saved by them is not changed). However, if you have hard links to a certain file and you then replace that file with another one, you also need to replace the hard links, as otherwise they'll still be pointing to the old file.
Lastly, it is not uncommon to have different filesystems mounted on the same Linux machine. That is to say, the way data is organized and interpreted at some point in the file hierarchy (say /home/gst/fs1) can be different from how it is organized and interpreted at another point (say /home/gst/Desktop/fs2). A hard link can only reside on the same filesystem as the file it's pointing to, whereas a symlink can be created on one filesystem while pointing to a file on another (see answers to this question).
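The replacement scenario can be sketched like this, assuming "replace" means putting a new file under the old name (here with os.replace); the function and file names are hypothetical.

```python
import os


def replace_and_read(workdir):
    """Replace a file, then read it through a symlink and a hard link."""
    path = os.path.join(workdir, "config.txt")
    soft = os.path.join(workdir, "via_symlink.txt")
    hard = os.path.join(workdir, "via_hardlink.txt")
    with open(path, "w") as f:
        f.write("old")
    os.symlink(path, soft)
    os.link(path, hard)

    # Put a brand-new file (new inode) under the same name. Overwriting the
    # contents in place would instead be visible through every hard link,
    # because they all share one inode.
    replacement = os.path.join(workdir, "config.txt.new")
    with open(replacement, "w") as f:
        f.write("new")
    os.replace(replacement, path)

    with open(soft) as f:
        through_symlink = f.read()
    with open(hard) as f:
        through_hardlink = f.read()
    return through_symlink, through_hardlink
```

The symlink resolves the stored path again on each access and so reaches the new file; the hard link still names the old inode.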
Symbolic links, also known as soft links, are special types of files that point to other files, much like shortcuts in Windows and Macintosh aliases. The data in the target file does not appear in a symbolic link, unlike a hard link. Instead, it points to another file system entry.
Read more here:
https://kb.iu.edu/d/abbe#:~:text=A%20symbolic%20link%2C%20also%20termed,somewhere%20in%20the%20file%20system.

ioutil.TempFile and umask

In my Go application, instead of writing to a file directly, I would like to write to a temporary file that is renamed to the final file when everything is done. This is to avoid leaving partially written content in the file if the application crashes.
Currently I use ioutil.TempFile, but the issue is that it creates the file with permission 0600, not 0666. Thus one gets 0600 instead of the 0644 or 0660 one would expect with typical umask values. This is not a problem if the destination file already exists, as I can fix the permissions on the temporary file to match the existing ones, but if the file does not exist, then I need to somehow deduce the current umask.
I suppose I can just duplicate the ioutil.TempFile implementation to pass 0666 into os.OpenFile, but that does not sound nice. So the question is: is there a better way?
I don't quite grok your problem.
Temporary files must be created with as tight permissions as possible because the whole idea of having them is to provide your application with a secure means of temporarily storing data which is too big to fit in memory (or to hand the generated file over to another process). (Note that on POSIX systems, where an opened file counts as a live reference to it, it's even customary to immediately remove the file while having it open so that there's no way to modify its data other than writing it from the process which created it.)
So in my opinion you're trying to use a wrong solution to your problem.
So what I do in a case like yours is:
Create a file with the same name as old one but with the ".temp" suffix appended.
Write data there.
Close, rename it over the old one.
If you feel like using a fixed suffix is lame, you can "steal" the implementation of picking a unique non-conflicting file name from ioutil.TempFile(). But IMO this would be overengineering.
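The three steps above can be sketched as follows. This is an illustration in Python rather than Go, under the assumption of a POSIX system; the function name write_atomically is hypothetical. The point it shows is that creating the file with mode 0666 lets the kernel apply the umask, so the program never has to deduce it.

```python
import os


def write_atomically(path, data, mode=0o666):
    """Write `data` to `path` via a sibling ".temp" file and a rename."""
    tmp = path + ".temp"  # fixed suffix, as in the steps above
    # O_EXCL makes creation fail if a stale temp file is already there.
    # The kernel masks `mode` with the process umask, so 0o666 typically
    # ends up as 0o644.
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL, mode)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # Both names are in the same directory, so this is a plain rename.
        os.replace(tmp, path)
    except BaseException:
        # Best-effort cleanup of the temporary file, then re-raise.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
```

In Go, the analogous call would be os.OpenFile with the same flags and 0666, as the question already suggests.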
You can use ioutil.TempDir to get the folder where temporary files should be stored and then create the file on your own with the right permissions.

True file descriptor clone

Why is there no true file descriptor clone mechanism where one would be possible, as it is for disk files?
POSIX:
After a successful return from one of these system calls, the old and
new file descriptors may be used interchangeably. They refer to the
same open file description (see open(2)) and thus share file offset
and file status flags; for example, if the file offset is modified by
using lseek(2) on one of the descriptors, the offset is also changed
for the other.
Windows:
The duplicate handle refers to the same object as the original handle. Therefore, any changes to the object are reflected through both handles. For example, if you duplicate a file handle, the current file position is always the same for both handles. For file handles to have different file positions, use the CreateFile function to create file handles that share access to the same file.
Reasons for having a clone primitive:
When manipulating a file archive, I want each file in the archive to be accessible independently. The file archive should behave somewhat like a virtual filesystem.
File type checking. Being able to clone file offsets makes it possible to read a small portion of the file without affecting the original position.
You should consider the following: a file descriptor is merely an index into the array of "file" (literally, that's what they are called) object pointers on the kernel side. So when you duplicate the file descriptor, the kernel simply copies the value of the file pointer from one location in the array to another and increments the reference count on the pointed-to object.
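The shared offset is easy to observe. A minimal sketch in Python on a POSIX system; the function name dup_shares_offset is hypothetical. It compares a duplicated descriptor with a second, independent open of the same file.

```python
import os


def dup_shares_offset(path):
    """Return (offset seen via a dup'ed fd, offset seen via a fresh open)."""
    fd = os.open(path, os.O_RDONLY)
    dup = os.dup(fd)  # same open file description, hence same offset
    try:
        os.lseek(fd, 3, os.SEEK_SET)
        shared = os.lseek(dup, 0, os.SEEK_CUR)
        # A second open() creates a new open file description with its own
        # offset, which is the closest thing to a "clone" for disk files.
        other = os.open(path, os.O_RDONLY)
        try:
            independent = os.lseek(other, 0, os.SEEK_CUR)
        finally:
            os.close(other)
        return shared, independent
    finally:
        os.close(dup)
        os.close(fd)
```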
Thus, your issue is not with file descriptor duplication, but with management of the file offsets. The easy answer for this: do it yourself. That is, associate the current file offset with each file descriptor on the application side explicitly.
Of course, the most basic file access system calls read() and write() make use of a kernel-maintained file offset variable, if it's available (and it's only available if you are dealing with "normal" random access files). But more advanced file access system calls expect the desired file offset to be supplied by the application on each invocation. Those include pread()/pwrite(), preadv()/pwritev() and aio_read()/aio_write() (the latter is probably the best approach for writing parallel access applications like the one you described).
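Keeping the offset on the application side can be sketched with pread(). The Cursor class below is a hypothetical illustration in Python, assuming a POSIX system where os.pread is available; it is not an existing library type.

```python
import os


class Cursor:
    """An application-side file position over a shared descriptor."""

    def __init__(self, fd, offset=0):
        self.fd = fd
        self.offset = offset

    def read(self, n):
        # pread() takes the offset explicitly and leaves the descriptor's
        # own kernel-maintained offset untouched.
        data = os.pread(self.fd, n, self.offset)
        self.offset += len(data)
        return data
```

Several cursors can then walk the same descriptor independently, for example one per archive member, or a throwaway one to sniff the file type without disturbing the main position.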
On Windows, ReadFile()/WriteFile(), ReadFileScatter()/WriteFileGather() and ReadFileEx()/WriteFileEx() analogously expect to be passed the file offset on every invocation (via the lpOverlapped argument).

Resources