When is an inode's file_operations linked to its character device file_operations? - linux-kernel

I am reading the LDD3, and I would like to understand how the device driver file operations are called at the time a system call is performed.
From my understanding, when the open system call is performed, the struct file *filp gets its f_op attribute populated from the inode's i_fop.
But when/where does the inode get its i_fop attribute populated with the cdev's ops attribute?
My intuition is that when we call cdev_add in the driver, our device is added to the cdev_map with the MAJOR and MINOR numbers, but the inode is not yet linked to the character device. The inode would only be linked either when mknod is called to create the device file in the /dev directory, or when the device file is opened through the syscall.

The struct inode's i_fop member gets set to &def_chr_fops (in "fs/char_dev.c") for character special files by the init_special_inode() function (in "fs/inode.c"). That is called by the underlying filesystem, e.g. when it is populating its directory structures and inodes while being mounted, or when a new character special file is created in the filesystem by mknod().
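A simplified paraphrase of init_special_inode() (not verbatim; the exact branches vary by kernel version) shows where the character-device case picks up &def_chr_fops:
void init_special_inode(struct inode *inode, umode_t mode, dev_t rdev)
{
        inode->i_mode = mode;
        if (S_ISCHR(mode)) {
                inode->i_fop = &def_chr_fops;   /* its .open is chrdev_open() */
                inode->i_rdev = rdev;
        } else if (S_ISBLK(mode)) {
                inode->i_fop = &def_blk_fops;
                inode->i_rdev = rdev;
        } else if (S_ISFIFO(mode)) {
                inode->i_fop = &pipefifo_fops;
        }
        /* sockets and anything unexpected are handled separately */
}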
When opening the file, the struct inode's i_fop is copied to the struct file's f_op member by the do_dentry_open() function called from the vfs_open() function (in "fs/open.c"). do_dentry_open() calls the open file operation handler. For character special files, the open file operation handler from def_chr_fops is the chrdev_open() function (in "fs/char_dev.c").
The chrdev_open() function looks up the struct cdev (if any) associated with the MAJOR/MINOR device number (from the inode's i_rdev member), copies the ops member from the struct cdev to the struct file's f_op member to replace the file operations, and calls the replacement open handler if there is one.
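In rough outline (simplified from "fs/char_dev.c"; locking, reference counting and error paths are omitted, and details vary by kernel version), chrdev_open() does something like this:
static int chrdev_open(struct inode *inode, struct file *filp)
{
        const struct file_operations *fops;
        struct cdev *p = inode->i_cdev;

        if (!p) {
                /* first open of this inode: look up the cdev registered
                 * for this MAJOR/MINOR in cdev_map */
                int idx;
                struct kobject *kobj = kobj_lookup(cdev_map, inode->i_rdev, &idx);

                if (!kobj)
                        return -ENXIO;
                p = container_of(kobj, struct cdev, kobj);
                inode->i_cdev = p;      /* cache it for later opens */
        }

        fops = fops_get(p->ops);        /* the driver's file_operations */
        replace_fops(filp, fops);       /* filp->f_op now points at the driver */
        if (filp->f_op->open)
                return filp->f_op->open(inode, filp);
        return 0;
}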

Related

cdev_alloc() vs cdev_init()

In Linux kernel modules, two different approaches can be followed when creating a struct cdev, as suggested in this site and in this answer:
First approach, cdev_alloc()
struct cdev *my_dev;
...
static int __init example_module_init(void) {
        ...
        my_dev = cdev_alloc();
        if (my_dev != NULL) {
                my_dev->ops = my_fops;  /* the file_operations structure */
                my_dev->owner = THIS_MODULE;
        } else
                ...
}
Second approach, cdev_init()
static struct cdev my_cdev;
...
static int __init example_module_init(void) {
        ...
        cdev_init(&my_cdev, my_fops);
        my_cdev.owner = THIS_MODULE;
        ...
}
(assuming that my_fops is a pointer to an initialized struct file_operations).
Is the first approach deprecated, or still in use?
Can cdev_init() also be used in the first approach, with cdev_alloc()? If not, why?
The second question is also in a comment in the linked answer.
Can cdev_init() also be used in the first approach, with cdev_alloc()?
No, cdev_init shouldn't be used for a character device allocated with cdev_alloc.
To some extent, cdev_alloc is equivalent to kmalloc plus cdev_init, so calling cdev_init on a character device created with cdev_alloc makes no sense.
Moreover, a character device allocated with cdev_alloc contains a hint that the structure should be deallocated when it is no longer used. Calling cdev_init on that device will clear that hint, so you will get a memory leak.
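That hint lives in the cdev's embedded kobject. A simplified paraphrase of the two functions from "fs/char_dev.c" (not verbatim; details vary by kernel version):
struct cdev *cdev_alloc(void)
{
        struct cdev *p = kzalloc(sizeof(struct cdev), GFP_KERNEL);

        if (p) {
                INIT_LIST_HEAD(&p->list);
                /* "dynamic" ktype: its release callback kfree()s the cdev */
                kobject_init(&p->kobj, &ktype_cdev_dynamic);
        }
        return p;
}

void cdev_init(struct cdev *cdev, const struct file_operations *fops)
{
        memset(cdev, 0, sizeof *cdev);
        INIT_LIST_HEAD(&cdev->list);
        /* "default" ktype: its release callback does NOT free the cdev */
        kobject_init(&cdev->kobj, &ktype_cdev_default);
        cdev->ops = fops;
}
Calling cdev_init() on a device obtained from cdev_alloc() overwrites ktype_cdev_dynamic with ktype_cdev_default, so the kzalloc()ed structure is never freed.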
Selection between cdev_init and cdev_alloc depends on the lifetime you want the character device to have.
Usually, one wants the lifetime of the character device to be the same as the lifetime of the module. In that case (see the sketch after this list):
Define a static or global variable of type struct cdev.
Create the character device in the module's init function using cdev_init.
Destroy the character device in the module's exit function using cdev_del.
Make sure that the file operations for the character device have the .owner field set to THIS_MODULE.
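A minimal sketch of that pattern (hypothetical names; device-number registration and error handling omitted; here my_fops is a statically defined struct file_operations, unlike in the question where it is a pointer):
static dev_t my_devt;   /* obtained via alloc_chrdev_region() elsewhere */
static const struct file_operations my_fops = {
        .owner = THIS_MODULE,
        /* .open, .read, .write, ... */
};
static struct cdev my_cdev;

static int __init my_module_init(void)
{
        cdev_init(&my_cdev, &my_fops);
        my_cdev.owner = THIS_MODULE;
        return cdev_add(&my_cdev, my_devt, 1);
}

static void __exit my_module_exit(void)
{
        cdev_del(&my_cdev);
}

module_init(my_module_init);
module_exit(my_module_exit);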
In complex cases, one wants to create a character device at a specific point after the module's initialization. E.g. a module could provide a driver for some hardware, and a character device should be bound to that hardware. In that case the character device cannot be created in the module's init function (because the hardware has not been detected yet), and, more importantly, the character device cannot be destroyed in the module's exit function. In that case (see the sketch after this list):
Define a field of pointer type struct cdev * inside the structure describing the hardware.
Create the character device with cdev_alloc in the function which creates (probes) the hardware.
Destroy the character device with cdev_del in the function which destroys (disconnects) the hardware.
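A rough sketch of that pattern (hypothetical names; how probe/disconnect are wired up depends on the bus, and device-number allocation and error handling are omitted):
struct my_hw {
        dev_t devt;
        struct cdev *cdev;      /* created only once the hardware appears */
};

static int my_hw_probe(struct my_hw *hw)
{
        hw->cdev = cdev_alloc();
        if (!hw->cdev)
                return -ENOMEM;
        hw->cdev->ops = &my_fops;
        hw->cdev->owner = THIS_MODULE;
        return cdev_add(hw->cdev, hw->devt, 1);
}

static void my_hw_disconnect(struct my_hw *hw)
{
        /* the struct cdev itself is freed only after the last open file
         * descriptor referring to it goes away */
        cdev_del(hw->cdev);
}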
In the first case cdev_del is called at a time when the character device is not in use by userspace. This guarantee is provided by THIS_MODULE in the file operations: a module cannot be unloaded while a file corresponding to the character device is open.
In the second case there is no such guarantee (because cdev_del is called NOT in the module's exit function). So, at the time cdev_del returns, the character device may still be in use. And here cdev_alloc really matters: deallocation of the character device will be deferred until the last file descriptor associated with it is closed. Such behavior cannot be obtained without cdev_alloc.
They do different things. The usual preference applies: avoid dynamic allocation when it is not needed and use a statically allocated structure when possible.
cdev_alloc() dynamically allocates my_dev, and the structure is freed (via its kobject release) once cdev_del() has been called and the last reference is dropped.
cdev_init() does not allocate anything, so cdev_del() will not free the structure.
Most importantly, the lifetime of the structure my_cdev is different. In the cdev_init() case struct cdev my_cdev is bound to the enclosing scope (here, the module), while cdev_alloc() returns a dynamically allocated pointer that stays valid until it is freed.

What's the difference between &inode->i_data and inode->i_mapping

I found that in most cases i_data is just the object that i_mapping points to, as shown below. Why store what amounts to the same value twice in one inode structure?
crash> struct inode ffffffc073c1f360 -o
struct inode {
...
[ffffffc073c1f4a8] struct file_lock *i_flock;
**[ffffffc073c1f4b0] struct address_space i_data;**
[ffffffc073c1f558] struct list_head i_devices;
...
crash> struct inode ffffffc073c1f360
struct inode {
...
i_op = 0xffffffc0007ad1c0 <ext4_file_inode_operations>,
i_sb = 0xffffffc002010000,
**i_mapping = 0xffffffc073c1f4b0,**
i_security = 0xffffffc07230d050,
...
An address_space always deals with the page cache. When the owner of a page in the page cache is a regular file, the address_space object is embedded in the i_data field of that file's VFS inode. The i_mapping field of an inode always points to the address_space object of the owner of the pages containing the inode's data, and the host field of the address_space object points back to the inode in which the descriptor is embedded.
E.g. if a page belongs to a regular file stored in an ext4 filesystem, the address_space object is embedded in the i_data field of that file's inode, the i_mapping field of the inode points to that same i_data, and the host field of the address_space object points back to the same inode.
However, things are not always that simple. Suppose a page contains data read from a block device file, i.e. "raw" data of a block device. Then the address_space object is embedded in the "master" inode of the bdev special filesystem associated with the block device (referenced by bd_inode). Hence the i_mapping field of the inode of the block device file points to the address_space object embedded in the master inode; correspondingly, the host field of the address_space object points to the master inode. In this way, all pages containing data read from a block device share the same address_space object, even if they have been accessed through different block device files.
So there is a slight distinction between the two when it comes to whether a page belongs to a regular file or to a block device special file.
From http://lkml.iu.edu/hypermail/linux/kernel/0105.2/1363.html:
i_data is "pages read/written by this inode"
i_mapping is "whom should I ask for pages?"
IOW, everything outside of individual filesystems should use the latter.
They are same if (and only if) inode owns the data. CODA (or anything that
caches data on a local fs) will have i_mapping pointing to the i_data of
inode it caches into. Ditto for block devices if/when they go into pagecache -
we should associate pagecache with struct block_device, since we can have
many inodes with the same major:minor. IOW, ->i_mapping should be pointing
to the same place for all of them.
From https://marc.info/?l=linux-fsdevel&m=99470104708354&w=2:
It is used by filesystems that wrap around existing ones. AFAIK Coda is
the only one in the tree that actually uses this.
All VFS functions always use inode->i_mapping->a_ops. Coda copies the
i_mapping of the underlying inode to its own inode. This way we use the
same address space as the container file and avoid mapping the same file
pages in different locations in memory.
i_mapping is the true page cache pointer. i_data is where an address_space resides; it is allocated and freed together with the inode, and is normally what i_mapping points to. But a filesystem can leave the i_data of an inode unused and point i_mapping at the i_data of another inode, to avoid multiple page caches for the same data.
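A hedged sketch of the relationship (field names as in include/linux/fs.h; the surrounding members and exact layout vary by kernel version):
struct inode {
        ...
        struct address_space *i_mapping;   /* "whom should I ask for pages?"     */
        ...
        struct address_space i_data;       /* "pages read/written by this inode" */
        ...
};

/* For an inode that owns its data (the common case), the VFS sets up:
 *     inode->i_mapping == &inode->i_data
 *     inode->i_data.host == inode
 * A wrapping filesystem such as Coda, or a block device special file, instead
 * points i_mapping at the address_space embedded in another ("container" or
 * "master") inode and leaves its own i_data unused. */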

How is the open() system call transferred to a kernel module?

I am writing a character device driver. The sample code I found on the internet mentions that we need to attach some file operations to the character device. Among those file_operations there is a function named open, but the sample open implementation does not do anything significant.
Yet if we want to use this character device, we first need to open it, and only then can we read/write anything on it. So I want to know how exactly the open() call works. Here is the link I am referring to for the character device driver:
http://appusajeev.wordpress.com/2011/06/18/writing-a-linux-character-device-driver/
The sequence for open() on the user side is very straightforward: it will invoke sys_open() on the kernel side, which does some path resolution and permission checking, then passes everything it has got to dev_open() (and does not do anything else).
dev_open() gets the parameters you have passed to it through the open() system call (plus quite a lot of information specific to the kernel VFS subsystem, but this is rarely of concern).
Notice that you are getting a struct file parameter passed in. It has several useful fields:
struct file {
        ....
        struct path f_path;         // path of the file passed to open()
        ....
        unsigned int f_flags;       // 'flags' + 'mode' as passed to open()
        fmode_t f_mode;             // 'mode' as set by kernel (FMODE_READ/FMODE_WRITE)
        loff_t f_pos;               // position in file used by _llseek
        struct fown_struct f_owner; // opening process credentials, like uid and euid
        ....
};
The rest you can dig out yourself by checking out examples in the source.
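For illustration, a minimal hedged sketch of an open handler that only inspects a few of those fields (my_open is a made-up name; a real driver would normally also stash device state in filp->private_data, as in the scull example further down):
static int my_open(struct inode *inode, struct file *filp)
{
        /* refuse anything but read-only access, just to show f_flags in use */
        if ((filp->f_flags & O_ACCMODE) != O_RDONLY)
                return -EACCES;

        pr_info("opened %s, f_mode=%#x\n",
                filp->f_path.dentry->d_name.name,
                (unsigned int)filp->f_mode);
        return 0;
}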

Should open method in Linux device driver return a file descriptor?

I'm studying Linux Device Drivers, 3rd edition, and I have some questions about the open method. Here's the scull_open method used in that book:
int scull_open(struct inode *inode, struct file *filp)
{
        struct scull_dev *dev; /* device information */

        dev = container_of(inode->i_cdev, struct scull_dev, cdev);
        filp->private_data = dev; /* for other methods */

        /* now trim to 0 the length of the device if open was write-only */
        if ((filp->f_flags & O_ACCMODE) == O_WRONLY) {
                if (down_interruptible(&dev->sem))
                        return -ERESTARTSYS;
                scull_trim(dev); /* ignore errors */
                up(&dev->sem);
        }
        return 0; /* success */
}
And my questions are:
Shouldn't this function return a file descriptor to the device just opened?
Isn't *filp local to this function? Then why do we copy the contents of dev into it?
How could we use it later in the read and write methods?
Could someone show me a typical "non-fatty" implementation of the open method?
ssize_t scull_read(struct file *filp, char __user *buf, size_t count, loff_t *f_pos)
{
        struct scull_dev *dev = filp->private_data;
        ...
}
The userspace open function is what you are thinking of: that is a system call which returns a file descriptor (an int). There are plenty of good references for that, such as APUE 3.3.
The device driver "open method" is a function within the file_operations structure. It is different from the userspace "file open". With the device driver installed, when user code opens the device (e.g. accessing /dev/scull0), this "open method" gets called.
Shouldn't this function return a file descriptor to the device just opened?
In a Linux device driver, open() returns either 0 or a negative error code. The file descriptor is managed internally by the kernel.
Isn't the "*filp" local to this function, then why we copy the contents of dev to it ?
filp represents opened file and a pointer to it is passed to driver by kernel. Sometimes this filp is used sometimes it is not needed by driver. Copy contents of dev is required so that when some other function let us say read() is called driver can retrieve some device specific data.
How could we use it later in the read and write methods?
One of the most common ways to use filp in read()/write() is to acquire a lock. When the device is opened, the driver sets up (or looks up) a lock; when a read or write occurs, the same lock is used to protect the data buffer from corruption when multiple processes access the same device.
Could someone show me a typical "non-fatty" implementation of the open method?
As you are just studying, please enjoy exploring more. An implementation can be found here.
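For illustration only, here is a minimal hedged sketch of the usual pattern (hypothetical names; a struct mutex is used instead of the book's semaphore, and error handling is kept to a minimum): open() stashes the device structure in filp->private_data, and read() takes the device's lock before touching the data buffer.
struct my_dev {
        struct cdev cdev;
        struct mutex lock;
        char *buffer;
        size_t size;
};

static int my_open(struct inode *inode, struct file *filp)
{
        struct my_dev *dev = container_of(inode->i_cdev, struct my_dev, cdev);

        filp->private_data = dev;       /* make the device visible to read/write */
        return 0;
}

static ssize_t my_read(struct file *filp, char __user *buf,
                       size_t count, loff_t *f_pos)
{
        struct my_dev *dev = filp->private_data;
        ssize_t ret = 0;

        if (mutex_lock_interruptible(&dev->lock))
                return -ERESTARTSYS;
        if (*f_pos < dev->size) {
                if (count > dev->size - *f_pos)
                        count = dev->size - *f_pos;
                if (copy_to_user(buf, dev->buffer + *f_pos, count)) {
                        ret = -EFAULT;
                } else {
                        *f_pos += count;
                        ret = count;
                }
        }
        mutex_unlock(&dev->lock);
        return ret;
}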

Where do you store user context in Linux character drivers?

It's been a while since I worked on a Linux kernel module, and I seem to remember that there was a place to stash context in your open() implementation that would be available in your other file_operations... For example, if I want to maintain some state associated with everyone who opens my device node, and if either the inode structure or the file structure that is passed to all the file_operations functions had a void * I could fill, I could very easily support any number of users... Is this possible?
Found the answer. The struct file * that is passed to all the file_operations functions has a field called private_data. It is a void *, so you can populate it in open(), use it in read(), write() and ioctl(), and free it in release().
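A hedged sketch of that per-open pattern (hypothetical struct my_session; error handling kept minimal): allocate the context in open(), reach it through filp->private_data in the other operations, and free it in release().
struct my_session {
        int id;                 /* whatever per-open state you need */
};

static int my_open(struct inode *inode, struct file *filp)
{
        struct my_session *s = kzalloc(sizeof(*s), GFP_KERNEL);

        if (!s)
                return -ENOMEM;
        filp->private_data = s;
        return 0;
}

static ssize_t my_read(struct file *filp, char __user *buf,
                       size_t count, loff_t *f_pos)
{
        struct my_session *s = filp->private_data;

        /* ... use the per-open state in s ... */
        return 0;
}

static int my_release(struct inode *inode, struct file *filp)
{
        kfree(filp->private_data);
        return 0;
}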