Connection between mmap offset and vm_area_struct vm_pgoff field - linux-kernel

What is the connection between the two fields?
I'm implementing my driver's mmap function. Is it safe to assume that the two values are equal: the offset argument originally passed to the mmap call, and the vm_pgoff field of the vm_area_struct that my handler receives as an argument?
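For file-backed mappings the two values are related but not equal: the kernel records the byte offset in units of pages. A minimal user-space sketch of that arithmetic follows. The helper names are made up for illustration, and a driver would use PAGE_SHIFT instead of taking the page size as a parameter:

```c
#include <assert.h>
#include <sys/types.h>

/* Hypothetical helper: the page offset the kernel would record in
 * vma->vm_pgoff for a byte offset passed to mmap(). mmap() rejects
 * offsets that are not a multiple of the page size, so the division
 * is exact for any offset that reaches the driver. */
unsigned long offset_to_pgoff(off_t offset, long page_size)
{
    assert(offset >= 0 && page_size > 0 && offset % page_size == 0);
    return (unsigned long)(offset / page_size);
}

/* Hypothetical helper: the inverse, i.e. what a driver computes with
 * vma->vm_pgoff << PAGE_SHIFT to get the byte offset back. */
off_t pgoff_to_offset(unsigned long pgoff, long page_size)
{
    return (off_t)pgoff * page_size;
}
```

With 4096-byte pages, an mmap offset of 8192 corresponds to a vm_pgoff of 2, so a handler that needs the byte offset has to convert vm_pgoff back rather than use it directly.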

Related

Why does SetFilePointerEx(1) succeed for an empty file that's open for reading only?

I do not understand why the following code, which sets the position of an open file handle relative to the beginning of the file (i.e. sets the absolute position), succeeds when setting a positive position on an empty file that is open for reading only:
LARGE_INTEGER offset;
offset.QuadPart = 100;
LARGE_INTEGER pos = {0};
return ::SetFilePointerEx(_h, offset, &pos, FILE_BEGIN) != 0;
It returns a non-zero result, and the pos variable receives the value 100. That behavior is counter-intuitive for a GENERIC_READ file of size zero. What is the logic? I understand that this is normal behavior for files with write access.
P.S. The file is not overlapped and is as simple as it can be, with no fancy flags.
Does SetFilePointerEx ever fail at all for valid handles, positive absolute positions and plain files?
SetFilePointerEx internally calls ZwSetInformationFile with FilePositionInformation, using a FILE_POSITION_INFORMATION structure as input.
The documentation places only this restriction on the value:
If the file was opened or created with the
FILE_NO_INTERMEDIATE_BUFFERING option, the value of CurrentByteOffset
must be an integral multiple of the sector size of the underlying
device.
In addition, CurrentByteOffset.QuadPart must always be >= 0, so the position must not be negative.
There are no other restrictions on the position value. You can set it to anything, regardless of the file size. The call never even reaches the file system; it is handled by the I/O manager.
All it does is set CurrentByteOffset in the FILE_OBJECT.
How is this value used? ZwReadFile and ZwWriteFile take an optional ByteOffset parameter:
Pointer to a variable that specifies the starting byte offset in the
file where the read operation will begin. If an attempt is made to
read beyond the end of the file, ZwReadFile returns an error.
If the call to ZwCreateFile set either of the CreateOptions flags
FILE_SYNCHRONOUS_IO_ALERT or FILE_SYNCHRONOUS_IO_NONALERT, the I/O
Manager maintains the current file position. If so, the caller of
ZwReadFile can specify that the current file position offset be used
instead of an explicit ByteOffset value. This specification can be
made by using one of the following methods:
Specify a pointer to a LARGE_INTEGER value with the HighPart member
set to -1 and the LowPart member set to the system-defined value
FILE_USE_FILE_POINTER_POSITION.
Pass a NULL pointer for ByteOffset.
ZwReadFile updates the current file position by adding the number of
bytes read when it completes the read operation, if it is using the
current file position maintained by the I/O Manager.
Even when the I/O Manager is maintaining the current file position,
the caller can reset this position by passing an explicit ByteOffset
value to ZwReadFile. Doing this automatically changes the current file
position to that ByteOffset value, performs the read operation, and
then updates the position according to the number of bytes actually
read. This technique gives the caller atomic seek-and-read service.
So we can either pass an explicit ByteOffset value, or make an additional API call to store the position in the FILE_OBJECT first, from which the I/O manager takes it when no explicit ByteOffset pointer is given.
Note: for asynchronous I/O we must always pass an explicit ByteOffset value, or the call simply fails (pipes and mailslot files are an exception).
For ReadFile and WriteFile, ByteOffset is taken from the OVERLAPPED parameter. If that pointer is NULL, a NULL ByteOffset pointer is passed and CurrentByteOffset from the FILE_OBJECT is used. If the OVERLAPPED pointer is not NULL, the exact value from OVERLAPPED is passed explicitly as ByteOffset and CurrentByteOffset in the FILE_OBJECT is ignored.
It is also always fine to pass a pointer to OVERLAPPED, not only for asynchronous file handles. For asynchronous handles the parameter is mandatory; for synchronous handles it is optional.
It is faster and better to pass the offset directly to the read/write call than to use a separate API call, which takes time, can (in theory) fail, and so on.
Using SetFilePointer probably makes sense only in legacy code, where it is called from a huge number of places and replacing it would mean modifying too much code.
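The behavior described in this answer can be modeled in a few lines of portable C. This is only a toy model, not the Windows API: struct toy_file, toy_set_position and toy_read are hypothetical names, and the rule that the stored position advances only after a successful read is an assumption of the sketch:

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for FILE_OBJECT: only the fields this sketch needs. */
struct toy_file {
    const char *data;              /* file contents */
    long long size;                /* file size in bytes */
    long long current_byte_offset; /* models CurrentByteOffset */
};

/* Models setting the file position: the only check is that the position
 * is not negative; the file size is never consulted. */
int toy_set_position(struct toy_file *f, long long pos)
{
    if (pos < 0)
        return -1;
    f->current_byte_offset = pos;
    return 0;
}

/* Models a synchronous read. A NULL byte_offset means "use the stored
 * position"; otherwise the explicit offset is used. Returns the number of
 * bytes read, or -1 when the read would start at or beyond end of file. */
long long toy_read(struct toy_file *f, const long long *byte_offset,
                   char *buf, long long len)
{
    long long start = byte_offset ? *byte_offset : f->current_byte_offset;
    long long n;

    if (start < 0 || len < 0 || start >= f->size)
        return -1;
    n = f->size - start;
    if (n > len)
        n = len;
    memcpy(buf, f->data + start, (size_t)n);
    f->current_byte_offset = start + n; /* position follows the read */
    return n;
}
```

In this model, setting position 100 on an empty file succeeds, and only the later read fails. That matches the observation in the question: the seek merely stores a number, and the file size matters only once data is actually requested.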

Flag value of shmat() function

This function attaches an allocated shared memory segment to the calling process. It takes three arguments. The first argument is the identifier of the memory segment. The second argument is a pointer to the address at which to attach the segment. NULL or 0 is usually passed for the second argument, since when we allocate the shared memory we know only its identifier, not its memory address.
However, I cannot find out what the third argument does. Some code that I have come across sets the flag value to 0. NULL and 0 have the same meaning in C, and I assume that no additional adjustments are needed; hence, NULL is passed to the function as the third argument.
Can anyone explain the purpose of the flag value in the shmat() function?
Four flags are defined:
SHM_RDONLY - the segment is attached for reading; default is Read/Write
SHM_RND - the attach occurs at the address equal to shmaddr rounded down to the nearest multiple of SHMLBA (usually defined as the page size)
SHM_REMAP - the mapping of the segment replaces any existing mapping in the range starting at shmaddr and continuing for the size of the segment. Linux-specific.
SHM_EXEC - allow the contents of the segment to be executed. Linux-specific.
Passing the value 0 means that all flags are unset. I wouldn't use NULL here, since NULL implies the parameter type is a pointer, which it is not.
See the shmat(2) man page.
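As a rough illustration of the flag argument, the sketch below attaches the same segment twice, once with 0 (read/write) and once with SHM_RDONLY. The function name shmat_flags_demo and the 4096-byte segment size are arbitrary choices for the example:

```c
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Creates a private segment, attaches it with flags 0 and SHM_RDONLY, and
 * checks that data written through the first attachment is visible through
 * the second. Returns 0 on success, -1 on any failure. */
int shmat_flags_demo(void)
{
    const char msg[] = "hello";
    int rc = -1;
    void *rw;
    void *ro;
    int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);

    if (id == -1)
        return -1;

    rw = shmat(id, NULL, 0);              /* no flags: read/write */
    if (rw != (void *)-1) {
        ro = shmat(id, NULL, SHM_RDONLY); /* second, read-only view */
        if (ro != (void *)-1) {
            memcpy(rw, msg, sizeof msg);
            rc = (strcmp((const char *)ro, msg) == 0) ? 0 : -1;
            shmdt(ro);
        }
        shmdt(rw);
    }
    shmctl(id, IPC_RMID, NULL);           /* remove the segment */
    return rc;
}
```

Writing through the pointer returned by the SHM_RDONLY attachment would fault, which is why the sketch only reads through it.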

Working of mmap()

I am trying to get an idea of how memory mapping takes place using the mmap system call.
So far I know that mmap takes arguments from the user and returns a logical address at which the file is mapped. When the user accesses that address, it is looked up in the page table, converted to a physical address, and the requested operation is carried out.
However, I found articles with a code example and a theoretical explanation.
They say that memory mapping is carried out by:
A. Using the mmap() system call
B. A file operation that takes (struct file *filp, struct vm_area_struct *vma)
What I am trying to figure out is:
How are the arguments passed in the mmap system call used in the struct vm_area_struct *vma? More generally, how are these two related?
For instance, struct vm_area_struct has fields such as the starting address, ending address, permissions, etc. How are the values sent by the user used to fill in these fields?
I am trying to write a driver, so: does the kernel fill in the fields of the structure for us, so that I simply use them when calling remap_pfn_range?
And a more fundamental question: why is a separate file operation needed? The fact that mmap returns a virtual address means that it has already achieved a mapping, doesn't it?
Finally, I am not clear about how the entire process works across user and kernel space. Any documentation explaining the process in detail would be helpful.
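One way to see how the mmap arguments end up in the vma is to look at /proc/self/maps, which prints one line per vm_area_struct of the process. The kernel builds the vma from the mmap arguments before it calls the driver's mmap handler. The sketch below is user-space only and assumes a Linux system with /proc mounted; struct map_info, find_mapping and map_and_inspect are hypothetical names:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* What /proc/self/maps reports for one mapping. */
struct map_info {
    unsigned long start;  /* corresponds to vma->vm_start */
    unsigned long end;    /* corresponds to vma->vm_end */
    char perms[5];        /* derived from vma->vm_flags, e.g. "r--s" */
    unsigned long offset; /* vma->vm_pgoff expressed in bytes */
};

/* Finds the maps entry whose range contains addr. Returns 0 if found. */
int find_mapping(const void *addr, struct map_info *out)
{
    FILE *fp = fopen("/proc/self/maps", "r");
    char line[512];
    int found = -1;

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof line, fp) != NULL) {
        struct map_info m;

        if (sscanf(line, "%lx-%lx %4s %lx",
                   &m.start, &m.end, m.perms, &m.offset) != 4)
            continue;
        if ((unsigned long)addr >= m.start && (unsigned long)addr < m.end) {
            *out = m;
            found = 0;
            break;
        }
    }
    fclose(fp);
    return found;
}

/* Maps `pages` pages of a temporary file, starting at page `first_page`,
 * read-only and shared, then reports what the kernel recorded for it. */
int map_and_inspect(long first_page, long pages, struct map_info *out)
{
    long page = sysconf(_SC_PAGESIZE);
    char path[] = "/tmp/mmap_demo_XXXXXX";
    int fd = mkstemp(path);
    void *p;
    int rc;

    if (fd < 0)
        return -1;
    unlink(path); /* the file disappears once it is closed and unmapped */
    if (ftruncate(fd, (off_t)(first_page + pages) * page) != 0) {
        close(fd);
        return -1;
    }
    p = mmap(NULL, (size_t)(pages * page), PROT_READ, MAP_SHARED,
             fd, (off_t)first_page * page);
    close(fd);
    if (p == MAP_FAILED)
        return -1;
    rc = find_mapping(p, out);
    if (rc == 0 && out->start != (unsigned long)p)
        rc = -1; /* the returned address should be the start of the vma */
    munmap(p, (size_t)(pages * page));
    return rc;
}
```

The length argument determines end - start, the prot and flags arguments show up in the permission string, and the offset argument appears in the offset column, which is vm_pgoff converted back to bytes. A driver's mmap handler receives these same values already filled in.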

Can I pass an integer to `access_ok()` as its second argument?

In LDD3's example, access_ok() is placed at the beginning of the ioctl method of a kernel module to check whether a pointer passed from userspace is valid. That is correct when the userspace application calls the ioctl() system call and passes it the address of a variable. In some cases, however, ioctl() is invoked with a value instead of a pointer as its third argument, and that value ends up as the second argument of access_ok() in the kernel module.
I've tried passing an integer as access_ok()'s second argument and it works fine. No error was reported. But I'm not sure whether this usage is correct.
For example, suppose I invoke ioctl() in userspace with its third argument set to 3. Then, in the ioctl() method of struct file_operations, access_ok() will receive 3 as its second argument. Because access_ok() expects a pointer, it treats 3 as a userspace pointer. Obviously, that's wrong...
Actually, access_ok()'s check is rough. The description of the function (in the source file) says:
Note that, depending on architecture, this function probably just
checks that the pointer is in the user space range - after calling
this function, memory access functions may still return -EFAULT.
E.g., according to arch/x86/include/asm/uaccess.h, on x86 access_ok just checks that the given address points to the lower area (because the kernel resides in the upper area). So it returns true for an address equal to 3.
It is copy_from_user/copy_to_user that give the final verdict on user memory accessibility.
Userspace programs can give you any random value as a pointer, so access_ok() must be able to handle any random value.
So it is definitely OK to call access_ok() with a non-pointer value.
However, unless you are actually going to try to access that memory location, calling access_ok() is utterly pointless.
(For that matter, you should, if possible, avoid access_ok() and just check the actual userspace accesses (get_user() etc.) for errors.)
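To illustrate why a range-only check accepts the value 3, here is a toy user-space model. It is not the kernel's implementation: TOY_USER_LIMIT is a made-up boundary and toy_access_ok is a hypothetical name:

```c
#include <stdint.h>

/* Hypothetical boundary between user and kernel addresses, chosen only for
 * this sketch; the real limit is architecture-specific. */
#define TOY_USER_LIMIT UINT64_C(0x0000800000000000)

/* Models a range-only check in the spirit of access_ok(): true when
 * [addr, addr + size) lies entirely below the limit. It says nothing about
 * whether that memory is actually mapped. */
int toy_access_ok(uint64_t addr, uint64_t size)
{
    if (size > TOY_USER_LIMIT)
        return 0;
    /* Written as a subtraction so that addr + size cannot overflow. */
    return addr <= TOY_USER_LIMIT - size;
}
```

An address of 3 passes this check because it is below the limit, even though nothing is mapped there. Only the actual access can report that, which is the point made above about copy_from_user/copy_to_user and get_user().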

How is the device name copied in ip_rt_ioctl in fib_frontend.c

I have a question about the ip_rt_ioctl function.
When a route is added, copy_from_user is first called for the struct rtentry, and the copied data is subsequently used in the rtentry_to_fib_config function, including the rtentry.rt_dev field, which is usually the device name.
My understanding is that copy_from_user does a shallow copy. Since the rtentry.rt_dev field is itself a character pointer, the contents it points to will most likely not be copied.
Hence, even after the copy, the device name will still be a pointer to a userspace address.
So is it right to access a userspace address from kernel space?
It's OK to refer to a user-space address from kernel space while the kernel is running in that process's context (this is true for syscall handlers). In that case the proper page table is set up and it's safe to refer to the user process's memory.
However, you should always check the validity of the address, or use copy_from_user(), which does that.
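The shallow-copy concern can be shown with a user-space analogy in which memcpy stands in for copy_from_user. struct toy_rtentry and both helper functions are hypothetical, and the sketch does not claim to reproduce what fib_frontend.c does:

```c
#include <string.h>

#define TOY_IFNAMSIZ 16 /* same value as the kernel's IFNAMSIZ */

/* Hypothetical cut-down rtentry: just the pointer field under discussion. */
struct toy_rtentry {
    char *rt_dev;
};

/* Step 1: copy the structure itself. Only the pointer value is copied,
 * not the string it points to. */
void toy_copy_struct(struct toy_rtentry *dst, const struct toy_rtentry *src)
{
    memcpy(dst, src, sizeof *dst);
}

/* Step 2: copy the name into a caller-owned buffer with a bounded copy and
 * force termination, instead of reading through the foreign pointer later.
 * Returns 0 on success, -1 if no device name was supplied. */
int toy_copy_devname(char devname[TOY_IFNAMSIZ], const struct toy_rtentry *rt)
{
    if (rt->rt_dev == NULL)
        return -1;
    strncpy(devname, rt->rt_dev, TOY_IFNAMSIZ - 1);
    devname[TOY_IFNAMSIZ - 1] = '\0';
    return 0;
}
```

After step 1 the copied rt_dev still holds the caller's address, as the question suspects. The string only becomes a private copy after a second, bounded copy such as step 2, which in kernel code would itself have to go through copy_from_user() or a similar checked accessor.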
