I found that in most cases i_mapping simply points to the i_data of the same inode, as shown below. Why keep two fields that refer to the same thing in one inode structure?
crash> struct inode ffffffc073c1f360 -o
struct inode {
...
[ffffffc073c1f4a8] struct file_lock *i_flock;
**[ffffffc073c1f4b0] struct address_space i_data;**
[ffffffc073c1f558] struct list_head i_devices;
...
crash> struct inode ffffffc073c1f360
struct inode {
...
i_op = 0xffffffc0007ad1c0 <ext4_file_inode_operations>,
i_sb = 0xffffffc002010000,
**i_mapping = 0xffffffc073c1f4b0,**
i_security = 0xffffffc07230d050,
...
An address_space always deals with the page cache. When the owner of a page in the page cache is a file, the address_space object is embedded in the i_data field of the VFS inode object. The i_mapping field of an inode always points to the address_space object of the owner of the pages containing the inode's data. The host field of the address_space object points to the inode object in which the descriptor is embedded.
E.g. if a page belongs to a regular file that is stored in an ext4 filesystem, the owner of the page is the inode of that file: the address_space is the i_data field of that VFS inode, the i_mapping field of the inode points to the i_data of the same inode, and the host field of the address_space object points back to the same inode.
However, things are not always that simple. Suppose a page contains data read from a block device file, i.e. the "raw" data of a block device. In that case the address_space is embedded in the "master" inode of the file in bdev, the special filesystem associated with the block device (the master inode is referenced by bd_inode). Hence the i_mapping field of the inode of the block device file points to the address_space object embedded in the master inode; correspondingly, the host field of the address_space object points to the master inode. In this way, all pages containing data read from the block device share the same address_space object, even if they have been accessed through different block device files.
So there is a slight distinction between the two, depending on whether the page belongs to a regular file or to a block device special file.
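The two cases above can be sketched with a toy user-space model. This is not kernel code: the structures are cut down to the three fields discussed here, and the helper names (init_regular_inode, init_blkdev_node, owns_page_cache) are made up for illustration.

```c
#include <stddef.h>

/* Toy stand-ins for the kernel structures; only the fields discussed
 * above are modelled. */
struct inode;

struct address_space {
    struct inode *host;               /* inode that owns this page cache */
};

struct inode {
    struct address_space *i_mapping;  /* "whom should I ask for pages?" */
    struct address_space  i_data;     /* address_space embedded in this inode */
};

/* Regular file: the inode owns its data, so i_mapping points at its
 * own embedded i_data and host points back at the inode. */
void init_regular_inode(struct inode *inode)
{
    inode->i_data.host = inode;
    inode->i_mapping = &inode->i_data;
}

/* Block device node: its own i_data stays unused and i_mapping is
 * redirected to the address_space of the master inode. */
void init_blkdev_node(struct inode *node, struct inode *master)
{
    node->i_data.host = node;
    node->i_mapping = master->i_mapping;
}

/* True when the inode's pages live in its own embedded address_space. */
int owns_page_cache(const struct inode *inode)
{
    return inode->i_mapping == &inode->i_data;
}
```

With one master inode and two device nodes initialised through init_blkdev_node(), both nodes end up with the same i_mapping, and i_mapping->host is the master inode, which is the sharing described above.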
From http://lkml.iu.edu/hypermail/linux/kernel/0105.2/1363.html:
i_data is "pages read/written by this inode"
i_mapping is "whom should I ask for pages?"
IOW, everything outside of individual filesystems should use the latter.
They are same if (and only if) inode owns the data. CODA (or anything that
caches data on a local fs) will have i_mapping pointing to the i_data of
inode it caches into. Ditto for block devices if/when they go into pagecache -
we should associate pagecache with struct block_device, since we can have
many inodes with the same major:minor. IOW, ->i_mapping should be pointing
to the same place for all of them.
From https://marc.info/?l=linux-fsdevel&m=99470104708354&w=2:
It is used by filesystems that wrap around existing ones. AFAIK Coda is
the only one in the tree that actually uses this.
All VFS functions always use inode->i_mapping->a_ops. Coda copies the
i_mapping of the underlying inode to it's own inode. This way we use the
same address space as the container file and avoid mapping the same file
pages in different locations in memory.
i_mapping is the true page cache. i_data is where an address_space resides; it is allocated and freed with the inode, and it is normally what i_mapping points to. But a filesystem can leave the i_data of an inode empty and point i_mapping to the i_data of another inode, to avoid having multiple page caches for the same data.
I am reading the LDD3, and I would like to understand how the device driver file operations are called at the time a system call is performed.
From my understanding, when the open system call is performed, the struct file *filp gets its f_op attribute populated from the inode's i_fop.
But when/where does the inode get its i_fop attribute populated with the cdev's ops attribute?
My intuition is that when we call cdev_add in the driver, our device is added to the cdev_map with the MAJOR and MINOR numbers, but the inode is not yet linked to the character device. The inode would only be linked either when mknod is called to create the device file in the /dev directory, or when the device file is opened through the syscall.
The struct inode's i_fop member gets set to &def_chr_fops (in "fs/char_dev.c") for character special files by the init_special_inode() function (in "fs/inode.c"). That function is called by the underlying filesystem, e.g. when it is populating its directory structures and inodes on mount, or when a new character special file is created in the filesystem by mknod().
When opening the file, the struct inode's i_fop is copied to the struct file's f_op member by the do_dentry_open() function called from the vfs_open() function (in "fs/open.c"). do_dentry_open() calls the open file operation handler. For character special files, the open file operation handler from def_chr_fops is the chrdev_open() function (in "fs/char_dev.c").
The chrdev_open() function looks up the struct cdev (if any) associated with the MAJOR/MINOR device number (from the inode's i_rdev member), copies the ops member from the struct cdev to the struct file's f_op member to replace the file operations, and calls the replacement open handler if there is one.
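The chain described above (init_special_inode() sets i_fop, do_dentry_open() copies it to f_op, chrdev_open() swaps in the driver's ops) can be sketched as a toy user-space model. Everything prefixed with toy_ or my_ is hypothetical, and the array toy_cdev_map is only a stand-in for the kernel's cdev_map lookup by device number.

```c
#include <errno.h>
#include <stddef.h>

struct inode;
struct file;

struct file_operations {
    int (*open)(struct inode *, struct file *);
};

struct cdev  { const struct file_operations *ops; };
struct inode { const struct file_operations *i_fop; unsigned i_rdev; };
struct file  { const struct file_operations *f_op; int opened_by_driver; };

/* Stand-in for cdev_map: device number -> registered cdev. */
#define TOY_MAX_DEVS 16
struct cdev *toy_cdev_map[TOY_MAX_DEVS];

/* Models cdev_add(): registers the cdev, touches no inode. */
int toy_cdev_add(struct cdev *cdev, unsigned dev)
{
    if (dev >= TOY_MAX_DEVS)
        return -EINVAL;
    toy_cdev_map[dev] = cdev;
    return 0;
}

/* Hypothetical driver open handler. */
int my_driver_open(struct inode *inode, struct file *filp)
{
    (void)inode;
    filp->opened_by_driver = 1;
    return 0;
}
const struct file_operations my_driver_fops = { .open = my_driver_open };

/* Models chrdev_open(): look up the cdev by i_rdev, replace f_op with
 * the driver's ops, then call the driver's open handler if present. */
int toy_chrdev_open(struct inode *inode, struct file *filp)
{
    struct cdev *cdev = NULL;

    if (inode->i_rdev < TOY_MAX_DEVS)
        cdev = toy_cdev_map[inode->i_rdev];
    if (!cdev)
        return -ENXIO;
    filp->f_op = cdev->ops;
    return filp->f_op->open ? filp->f_op->open(inode, filp) : 0;
}
const struct file_operations toy_def_chr_fops = { .open = toy_chrdev_open };

/* Models init_special_inode() for a character special file. */
void toy_init_special_inode(struct inode *inode, unsigned rdev)
{
    inode->i_fop = &toy_def_chr_fops;
    inode->i_rdev = rdev;
}

/* Models do_dentry_open(): copy i_fop to f_op and call its open handler. */
int toy_do_dentry_open(struct inode *inode, struct file *filp)
{
    filp->f_op = inode->i_fop;
    return filp->f_op->open ? filp->f_op->open(inode, filp) : 0;
}
```

In this model the inode never learns about the driver until open time: opening a node whose device number has no registered cdev fails with -ENXIO, and after toy_cdev_add() the same open ends with filp->f_op pointing at my_driver_fops.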
I am trying to understand how memory mapping takes place using the mmap system call.
So far I know that mmap takes arguments from the user and returns a logical address at which the file is mapped. When the user accesses that address, it is looked up in the page tables, converted to a physical address, and the requested operation is carried out.
However, I found articles that give a code example and a theoretical explanation.
What it mentions is the memory mapping is carried out as:
A. Using system call mmap ()
B. file operations using (struct file *filp, struct vm_area_struct *vma)
What I am trying to figure out is:
How are the arguments passed in the mmap system call used in the struct vm_area_struct *vma? More generally, how are these two related?
For instance, struct vm_area_struct has fields such as the starting address, ending address, permissions, etc. How are the values sent by the user used to fill in these fields?
I am trying to write a driver, so: does the kernel fill in the fields of the structure for us, so that I simply use it to call and pass values to remap_pfn_range?
And a more fundamental question: why is a separate file operation needed? The fact that mmap returns a virtual address means that it has already achieved a mapping, doesn't it?
Finally, I am not that clear about how the entire process works in user space as well as kernel space. Any documentation explaining the process in detail would be helpful.
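As a point of reference for the question, here is a minimal user-space sketch of the call itself, with comments noting which vm_area_struct field each argument ends up in. The function name map_second_page_first_byte is made up for this example, and the comments describe the usual correspondence rather than any particular driver.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Maps the second page of a temporary file read-only and returns the
 * first byte seen through the mapping, or -1 on error. */
int map_second_page_first_byte(void)
{
    long page = sysconf(_SC_PAGESIZE);
    FILE *tmp = tmpfile();
    int value = -1;

    if (!tmp || page <= 0)
        return -1;

    int fd = fileno(tmp);
    char marker = 'K';

    /* Make the file two pages long and mark the start of page 1. */
    if (ftruncate(fd, 2 * page) == 0 &&
        pwrite(fd, &marker, 1, (off_t)page) == 1) {
        /*
         * addr   = NULL       -> the kernel picks vma->vm_start
         * length = one page   -> vma->vm_end = vm_start + length
         *                        (rounded up to a page boundary)
         * prot / flags        -> vma->vm_flags and vma->vm_page_prot
         * fd                  -> vma->vm_file
         * offset = one page   -> vma->vm_pgoff = offset / page size
         */
        void *addr = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED,
                          fd, (off_t)page);
        if (addr != MAP_FAILED) {
            value = *(unsigned char *)addr;
            munmap(addr, (size_t)page);
        }
    }
    fclose(tmp);
    return value;
}
```

The kernel builds the vma from these arguments before it calls the file's mmap operation, so a handler with the (struct file *filp, struct vm_area_struct *vma) signature receives a structure that is already filled in.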
I have one doubt in ip_rt_ioctl function
In the case of route addition, a copy_from_user is first made for the struct rtentry, and the copied data is subsequently used in the rtentry_to_fib_config function, including the rtentry.rt_dev field, which usually is the device name.
My understanding is that copy_from_user does a shallow copy. Since the rtentry.rt_dev field is itself a character pointer, the contents it points to will not get copied.
Hence, even after the copy, the device name will still be a pointer to a user-space address.
So is it right to access the user-space address from kernel space?
A user-space address remains meaningful in kernel space while the kernel is running in that process's context (this is true for syscall handlers). In that case the process's page tables are active, so the address still refers to the user process's memory.
However, kernel code should not dereference such a pointer directly. Always go through copy_from_user() or a related helper, which checks the validity of the address and handles faults.
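The shallow-copy point can be illustrated with a toy user-space model. Here memcpy stands in for copy_from_user() and a small loop stands in for strncpy_from_user(); struct toy_rtentry and the toy_ helpers are hypothetical and carry none of the real access checks.

```c
#include <stddef.h>
#include <string.h>

#define TOY_IFNAMSIZ 16

/* Hypothetical cut-down version of struct rtentry. */
struct toy_rtentry {
    int   rt_metric;
    char *rt_dev;      /* pointer into the caller's memory */
};

/* Stand-in for copy_from_user(): the real helper also validates the
 * source range and returns the number of bytes it could not copy. */
unsigned long toy_copy_from_user(void *to, const void *from, size_t n)
{
    memcpy(to, from, n);
    return 0;
}

/* Stand-in for strncpy_from_user(): copies at most count bytes and
 * stops after the terminating NUL; returns the string length, or
 * count if no NUL was found. */
long toy_strncpy_from_user(char *dst, const char *src, long count)
{
    long i;

    for (i = 0; i < count; i++) {
        dst[i] = src[i];
        if (src[i] == '\0')
            return i;
    }
    return count;
}

/* First copy: the struct itself (shallow, rt_dev still points at the
 * caller's buffer). Second copy: the device name behind rt_dev. */
int toy_route_ioctl(const struct toy_rtentry *user_arg,
                    struct toy_rtentry *kcopy,
                    char devname[TOY_IFNAMSIZ])
{
    if (toy_copy_from_user(kcopy, user_arg, sizeof(*kcopy)))
        return -1;

    memset(devname, 0, TOY_IFNAMSIZ);
    if (kcopy->rt_dev)
        toy_strncpy_from_user(devname, kcopy->rt_dev, TOY_IFNAMSIZ - 1);
    return 0;
}
```

After toy_route_ioctl() returns, kcopy->rt_dev is still the caller's pointer, so a later change to the caller's buffer is visible through it, while devname holds a private copy that is unaffected.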
I'm studying memory management (MM) in Linux and I got very confused when I couldn't find where the raw data is stored. I thought it was stored in some field of struct page, but I couldn't find it there.
Where is the actual data represented by a page stored? And how to get a pointer to it?
struct page is just a helper that stores metadata. It doesn't actually store any data, only the information needed to locate the data in memory, such as the address space mapping to the physical addresses. The actual data is still stored in physical memory.
Where is the actual data represented by a page stored?
The actual data is in a physical page addressed by at least one virtual address AND/OR it is on disk, backed by an inode, and has never been mapped. For the inode case, accessing the virtual address will trigger a page fault; the fault handler will read the data into a physical page and the faulting code will resume.
And how to get a pointer to it?
I believe that each struct page is contained in an array, like mem_map. For instance, the function mem_map_next is used to iterate through an array of struct page. Perhaps the structure you are interested in is struct vm_area_struct? This is a virtual address tracking structure. There may be multiple virtual addresses mapping to the same physical page.
You need to know the context of the containing array to know the address a struct page represents. Then it is simply a base address plus the index of the struct page in the array multiplied by the page size.
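The "base address plus index times page size" arithmetic can be sketched in user space as follows. struct toy_page, the 4 KiB page size, and the helper names are assumptions made for this example; the kernel's own helpers depend on the configured memory model.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u   /* assumed page size for the sketch */

/* Toy metadata descriptor: holds no page contents at all. */
struct toy_page {
    unsigned long flags;
};

/* Index of a descriptor within the array that contains it. */
size_t toy_page_index(const struct toy_page *mem_map,
                      const struct toy_page *page)
{
    return (size_t)(page - mem_map);
}

/* Address of the memory described by 'page', given the base address
 * that corresponds to mem_map[0]. */
uintptr_t toy_page_address(uintptr_t base,
                           const struct toy_page *mem_map,
                           const struct toy_page *page)
{
    return base + (uintptr_t)toy_page_index(mem_map, page) * TOY_PAGE_SIZE;
}
```

For a descriptor array whose first entry corresponds to address 0x100000, the fourth descriptor (index 3) maps to 0x100000 + 3 * 4096 = 0x103000.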
You could use page_address() to get the virtual address of a page.
But the returned address might be NULL, because not all pages have a mapped virtual address.
void *page_address(const struct page *page);
You could use kmap to map a highmem page to a virtual address.
Also, remember to use kunmap to unmap the page when you no longer need to access it.
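struct page *page = alloc_pages(GFP_KERNEL | __GFP_HIGHMEM, 0);
if (page) {
    /* kmap() returns a kernel virtual address for the (possibly highmem) page */
    void *addr = kmap(page);
    if (addr) {
        memset(addr, 0, PAGE_SIZE);
        /* kunmap() takes the struct page, not the mapped address */
        kunmap(page);
    }
}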
I'm trying to retrieve, in a kernel module, the direct/indirect etc. block addresses stored in an ext4 file system inode. I understand that I need to look into the ext4_inode_info struct (I get it via container_of on the relevant vfs_inode).
But to which field am I supposed to look into?
Where can I find for example the first direct pointer? I thought it was stored in i_data array (it is in ext3_inode_info).
But for an ext4 inode when I examine the first entry in i_data, I get a sector address that is not remotely similar to the real sector holding the address of the first data block.
Any help will be appreciated.
==EDIT==
OK, so I seem to have understood the basic problem. I have an extent-based ext4 file system. I wasn't aware of this change, or that it is enabled by default. So is there a simple way to extract the physical addresses of blocks by offset? As a verification, I'm trying again to look at the first physical block (logical block 0) by looking at the first extent, but I get some gibberish numbers (though they are consistent and unique for every inode/file, so some progress was made).
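For reference, a sketch of how the 60-byte i_data area is laid out when the inode uses extents: it starts with a 12-byte extent header (magic 0xF30A, entry count, maximum entries, depth, generation), followed by 12-byte extent entries. Reading i_data[0] as a block pointer therefore yields header bytes, not an address. The function below is a user-space illustration with made-up names; it only handles a depth-0 tree, where the extents sit directly in the inode, and it returns filesystem block numbers, not sectors.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_EXT4_EXT_MAGIC 0xF30Au
#define TOY_EXT_INIT_MAX   32768u   /* larger ee_len marks an unwritten extent */

/* On-disk fields are little-endian; decode byte by byte. */
static uint16_t toy_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t toy_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Finds the extent covering logical block 'lblk' in a depth-0 extent
 * tree. Returns 0 and stores the physical block in *pblk, or -1 if
 * the area is not a depth-0 extent tree or no extent covers lblk. */
int toy_ext4_map_block(const uint8_t i_data[60], uint32_t lblk,
                       uint64_t *pblk)
{
    uint16_t magic   = toy_le16(i_data + 0);  /* eh_magic   */
    uint16_t entries = toy_le16(i_data + 2);  /* eh_entries */
    uint16_t depth   = toy_le16(i_data + 6);  /* eh_depth   */
    uint16_t i;

    if (magic != TOY_EXT4_EXT_MAGIC || depth != 0 || entries > 4)
        return -1;

    for (i = 0; i < entries; i++) {
        const uint8_t *ex = i_data + 12 + 12 * (size_t)i;
        uint32_t ee_block    = toy_le32(ex + 0);  /* first logical block  */
        uint32_t ee_len      = toy_le16(ex + 4);  /* blocks in the extent */
        uint64_t ee_start_hi = toy_le16(ex + 6);  /* high 16 bits         */
        uint64_t ee_start_lo = toy_le32(ex + 8);  /* low 32 bits          */

        if (ee_len > TOY_EXT_INIT_MAX)
            ee_len -= TOY_EXT_INIT_MAX;

        if (lblk >= ee_block && lblk - ee_block < ee_len) {
            *pblk = ((ee_start_hi << 32) | ee_start_lo) + (lblk - ee_block);
            return 0;
        }
    }
    return -1;
}
```

To compare the result with a sector number, multiply the block number by the filesystem block size divided by the sector size. Trees with a non-zero depth store index entries in the inode instead, which this sketch deliberately rejects.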